Containers Expo Blog Authors: Elizabeth White, Pat Romanski, Yeshim Deniz, Liz McMillan, Zakia Bouachraoui

Related Topics: Containers Expo Blog

Containers Expo Blog: Article

Ten Mistakes to Avoid When Virtualizing Data

Meeting the ever-changing information needs of today's enterprises

Mistake #3 - Missing the Hybrid Opportunity
In many cases, the best data integration solution is a combination of virtual and physical approaches. There is no reason to be locked into one way or the other. Figure 2 illustrates hybrid use cases, followed by a description of some examples.

  • Physical Data Warehouse and/or Data Mart Schema Extension: This is a way to extend existing schemas, such as adding current operations data to historical repositories.
  • Physical Warehouses, Marts and/or Stores Federation: This is a way to federate multiple physical consolidated sources, such as two or more sales data marts after a merger.
  • Data Warehouse and/or Data Mart Prototyping: This is a way to prototype new warehouses or marts, to accelerate an early stage leading into a larger BI initiative.
  • Data Warehouse and/or Data Mart Source Data Access: This is a way to provide a warehouse or mart with virtual access to source data, such as XML or packaged applications that may not be easily supported by the current ETL tool, or to integrate readily available, already federated views.
  • Data Mart Elimination: This is a way to eliminate or replace physical marts with virtual ones, such as stopping rogue data mart proliferation by providing an easier, more cost-effective virtual option.

Mistake #4 - Assuming Perfect Data Is Prerequisite
Poor data quality is a pervasive problem in enterprises today. While correcting and perfecting source data is the ultimate goal, we may end up leaving our source data alone while settling for cleaning up the data in a warehouse or mart during the consolidation and transformation phases of physical data consolidation.

When data quality issues are simple format discrepancies that reflect implementation details in various systems, data virtualization solutions easily resolve these types of common data discrepancies with zero impact on performance. Some examples include a Part_id field in one source system that reads VARCHAR, while a similar field in another source has INTEGER. Or, Sales_Regions in one system does not match Field_Territories in another. If "heavy-lifting" cleanups are required, integrating with specialized data quality solutions at runtime often meets the business needs, while opening up the opportunity for data virtualization.

Mistake #5 - Anticipating Negative Impact on Operational Systems
Although operational systems are often one of the primary data sources used when virtualizing data, the runtime performance of these systems is not typically impacted as a result. Yet, designers have been schooled to think about data volumes in terms of the size of the physical store and the throughput of the nightly ETLs. When using a virtual approach, designers should instead consider the amount of data that end solutions will actually query on any individual query, and how often these queries will run. If the queries are relatively small (for example, 10,000 rows) and broad (across multiple systems and/or tables), or run relatively infrequently (several hundred times per day), then the impact on operation systems will be light.

System designers and architects anticipating negative impact on operational systems are typically underestimating the speed of the latest data virtualization solutions. Certainly, Moore's Law has accelerated hardware and networks. In addition, 64-bit JVMs, high-performance query optimization algorithms, push-down techniques, caching, clustering and more have advanced the software side of the solution as well.

Taking the time to calculate required data loads helps avoid misjudging the potential impact on the operational systems. One best practice for predicting actual performance impact is to test-drive several of the biggest queries using a data virtualization tool of choice.

Mistake #6 - Failing to Simplify the Problem
While the enterprise data environment is understandably complex, it is usually unnecessary to develop complex data virtualization solutions. The most successful data virtualization projects are broken into smaller components, each addressing pieces of the overall need. This simplification can occur in two ways: by leveraging tools and by right-sizing integration components.

Data virtualization tools help address three fundamental challenges of data integration:

  1. Data Location: Data resides in multiple locations and sources.
  2. Data Structure: Data isn't always in the required form.
  3. Data Completeness: Data frequently needs to be combined with other data to have meaning.

Data virtualization middleware simplifies the location challenge by making all data appear as if it is available from one place, rather than where it is actually stored.

Data abstraction simplifies data complexity by transforming data from its native structure and syntax into reusable views and Web services that are easy for business solutions' developers to understand and business solutions to consume.

Data federation combines data to form more meaningful business information, producing a single view of a customer or a get inventory balances composite service, as examples. Data can be federated from both consolidated stores such as the enterprise data warehouse as well as original sources such as transaction systems.

Successful right-sizing of data integration components requires smart decomposition of requirements. Virtualized views or services built using data virtualization work best when aimed at serving focused needs. These can then be leveraged across multiple use cases and/or combined to support more complex needs.

A recently published book by a team of experts from five technology vendors including Composite Software, An Implementor's Guide to Service Oriented Architecture - Getting It Right, identifies three levels of virtualized data services that allow designers and architects to design smaller, more manageable data integration components as follows:

  • Physical Services: Physical services lie just above the data source, and they transform the data into a form that is easily consumed by higher-level services.
  • Business Services: Business services embody the bulk of the transformation logic that converts data from its physical form into its required business form.
  • Application Services: Application services leverage business services to provide data optimally to the consuming applications.

In this way, solution developers can draw from these simpler, focused data services (relational views work similarly), significantly simplifying their development efforts today, and providing greater reuse and agility tomorrow.

Mistake #7 - Treating SQL/Relational and XML/Hierarchical as Separate Silos
Historically, data integration has focused on supporting business intelligence applications needs, whereas process integration focused on optimizing business processes. These two divergent approaches led to different architectures, tools, middleware, methods, teams and more. However, because today's data virtualization middleware is equally adept at relational and hierarchical data, it is a mistake to silo these key data forms.

This is especially important in cases where a mix of SQL and XML is required; for example, when combining XML data from an outside payroll processor with relational data from an internal sales force automation system to serve XML data within a single view of a sales rep performance portal.

Not only will a unified approach lead to better solutions regardless of data type, but developers and designers will gain experience outside their traditional core areas of expertise.

Mistake #8 - Implementing Data Virtualization Using the Wrong Infrastructure
The loose coupling of data services in a services-oriented architecture (SOA) environment is an excellent use for data virtualization. As a result, SOA is one of data virtualization's most frequent use cases. However, there is sometimes confusion about when to deploy enterprise service bus (ESB) middleware and when to use information servers to design and run the data services typically required.

ESBs are excellent for mediating various transactional and data services. However, they are not designed to support heavy-duty data functions such as high-performance queries, complex federations, XML/SQL transformations, and so forth as required in many of today's enterprise application use cases. On the other hand, data virtualization tools provide an easy-to-use, high-productivity data service development environment and a high-performance, high-reliability runtime information server to meet both design and runtime needs. ESBs can then mediate these services as needed.

Mistake #9 - Segregating Data Virtualization People and Processes
As physical data consolidation technology and approaches have matured, supporting organizations in the form of Integration Competency Centers (ICC) along with best practice methods and processes have grown in support. These centers improve developer productivity, optimize tool usage, reduce project risk, and more. In fact, 10 specific benefits are identified in a book written by two experts at Informatica, Integration Competency Center: An Implementation Methodology.

It would be a mistake to assume that these ICCs, which have evolved from support of physical data consolidation approaches and middleware, can not or should not also be leveraged in support of data virtualization. By embracing data virtualization, ICCs can compound the technology value of data virtualization with complementary people and process resources.

Mistake #10 - Failing to Identify and Communicate Benefits
While data virtualization can accelerate new development, perform quicker change iterations, and reduce both development and operating costs, it's a mistake to assume these benefits sell themselves, especially in tough business times when new technology investment is highly scrutinized.

Fortunately, these benefits can (and should) be measured and communicated.  Here are some ideas for accomplishing this:

  • Start by using the virtual versus physical integration decision tool described previously to identify several data virtualization candidates as a pilot.
  • During the design and development phase for these projects, track the time it takes using data virtualization and contrast it to the time it would have taken using traditional physical approaches.
  • Use this time savings to calculate two additional points of value: time to solution reduction and development cost savings.
  • To measure lifecycle value, estimate the operating costs of extra physical data stores that are saved because of virtualization.
  • Add these hardware operating costs to the estimated development lifecycle cost savings that occur from faster turns on break-fix and enhancement development activities.
  • Finally, package the results of these pilot projects along with an extrapolation across future projects, and communicate them to business and IT leadership.

Industry analysts agree that best practices' leaders draw from portfolios containing both physical and virtual data integration tools to meet the ever-changing information needs of today's enterprises. Multiple use cases across a broad spectrum of industries and government agencies illustrate the mission-critical benefits derived from data virtualization. These benefits include reduced time-to-solution, lower overall costs for both implementation and on-going maintenance, and greater agility to adapt to change. By becoming familiar with common mistakes to avoid, enterprises arm themselves with the wisdom necessary to successfully implement data virtualization in their data integration infrastructures, and thereby begin to reap the benefits.


  • Composite Software, in conjunction with data virtualization users and industry analysts, developed a simple decision-making tool for determining when to use a virtual, physical or hybrid approach to data integration. Free copies are available online.

More Stories By Robert Eve

Robert "Bob" Eve is vice president of marketing at Composite Software. Prior to joining Composite, he held executive-level marketing and business development roles at several other enterprise software companies. At Informatica and Mercury Interactive, he helped penetrate new segments in his role as the vice president of Market Development. Bob ran Marketing and Alliances at Kintana (acquired by Mercury Interactive in 2003) where he defined the IT Governance category. As vice president of Alliances at PeopleSoft, Bob was responsible for more than 300 partners and 100 staff members. Bob has an MS in management from MIT and a BS in business administration with honors from University of California, Berkeley. He is a frequent contributor to publications including SYS-CON's SOA World Magazine and Virtualization Journal.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.

IoT & Smart Cities Stories
Every organization is facing their own Digital Transformation as they attempt to stay ahead of the competition, or worse, just keep up. Each new opportunity, whether embracing machine learning, IoT, or a cloud migration, seems to bring new development, deployment, and management models. The results are more diverse and federated computing models than any time in our history.
At CloudEXPO Silicon Valley, June 24-26, 2019, Digital Transformation (DX) is a major focus with expanded DevOpsSUMMIT and FinTechEXPO programs within the DXWorldEXPO agenda. Successful transformation requires a laser focus on being data-driven and on using all the tools available that enable transformation if they plan to survive over the long term. A total of 88% of Fortune 500 companies from a generation ago are now out of business. Only 12% still survive. Similar percentages are found throug...
At CloudEXPO Silicon Valley, June 24-26, 2019, Digital Transformation (DX) is a major focus with expanded DevOpsSUMMIT and FinTechEXPO programs within the DXWorldEXPO agenda. Successful transformation requires a laser focus on being data-driven and on using all the tools available that enable transformation if they plan to survive over the long term. A total of 88% of Fortune 500 companies from a generation ago are now out of business. Only 12% still survive. Similar percentages are found throug...
Dion Hinchcliffe is an internationally recognized digital expert, bestselling book author, frequent keynote speaker, analyst, futurist, and transformation expert based in Washington, DC. He is currently Chief Strategy Officer at the industry-leading digital strategy and online community solutions firm, 7Summits.
Digital Transformation is much more than a buzzword. The radical shift to digital mechanisms for almost every process is evident across all industries and verticals. This is often especially true in financial services, where the legacy environment is many times unable to keep up with the rapidly shifting demands of the consumer. The constant pressure to provide complete, omnichannel delivery of customer-facing solutions to meet both regulatory and customer demands is putting enormous pressure on...
IoT is rapidly becoming mainstream as more and more investments are made into the platforms and technology. As this movement continues to expand and gain momentum it creates a massive wall of noise that can be difficult to sift through. Unfortunately, this inevitably makes IoT less approachable for people to get started with and can hamper efforts to integrate this key technology into your own portfolio. There are so many connected products already in place today with many hundreds more on the h...
The standardization of container runtimes and images has sparked the creation of an almost overwhelming number of new open source projects that build on and otherwise work with these specifications. Of course, there's Kubernetes, which orchestrates and manages collections of containers. It was one of the first and best-known examples of projects that make containers truly useful for production use. However, more recently, the container ecosystem has truly exploded. A service mesh like Istio addr...
Digital Transformation: Preparing Cloud & IoT Security for the Age of Artificial Intelligence. As automation and artificial intelligence (AI) power solution development and delivery, many businesses need to build backend cloud capabilities. Well-poised organizations, marketing smart devices with AI and BlockChain capabilities prepare to refine compliance and regulatory capabilities in 2018. Volumes of health, financial, technical and privacy data, along with tightening compliance requirements by...
Charles Araujo is an industry analyst, internationally recognized authority on the Digital Enterprise and author of The Quantum Age of IT: Why Everything You Know About IT is About to Change. As Principal Analyst with Intellyx, he writes, speaks and advises organizations on how to navigate through this time of disruption. He is also the founder of The Institute for Digital Transformation and a sought after keynote speaker. He has been a regular contributor to both InformationWeek and CIO Insight...
Andrew Keys is Co-Founder of ConsenSys Enterprise. He comes to ConsenSys Enterprise with capital markets, technology and entrepreneurial experience. Previously, he worked for UBS investment bank in equities analysis. Later, he was responsible for the creation and distribution of life settlement products to hedge funds and investment banks. After, he co-founded a revenue cycle management company where he learned about Bitcoin and eventually Ethereal. Andrew's role at ConsenSys Enterprise is a mul...