Containers Expo Blog: Article

Ten Mistakes to Avoid When Virtualizing Data

Meeting the ever-changing information needs of today's enterprises

Mistake #3 - Missing the Hybrid Opportunity
In many cases, the best data integration solution is a combination of virtual and physical approaches. There is no reason to be locked into one way or the other. Figure 2 illustrates hybrid use cases, followed by a description of some examples.

  • Physical Data Warehouse and/or Data Mart Schema Extension: This is a way to extend existing schemas, such as adding current operations data to historical repositories.
  • Physical Warehouses, Marts and/or Stores Federation: This is a way to federate multiple physical consolidated sources, such as two or more sales data marts after a merger.
  • Data Warehouse and/or Data Mart Prototyping: This is a way to prototype new warehouses or marts, accelerating the early stages of a larger BI initiative.
  • Data Warehouse and/or Data Mart Source Data Access: This is a way to provide a warehouse or mart with virtual access to source data, such as XML or packaged applications that may not be easily supported by the current ETL tool, or to integrate readily available, already federated views.
  • Data Mart Elimination: This is a way to eliminate or replace physical marts with virtual ones, such as stopping rogue data mart proliferation by providing an easier, more cost-effective virtual option.

Mistake #4 - Assuming Perfect Data Is Prerequisite
Poor data quality is a pervasive problem in enterprises today. While correcting and perfecting source data is the ultimate goal, organizations often leave source data untouched and instead settle for cleaning the data in a warehouse or mart during the consolidation and transformation phases of physical data consolidation.

When data quality issues are simple format discrepancies that reflect implementation details in various systems, data virtualization solutions easily resolve these common discrepancies with negligible impact on performance. For example, a Part_id field typed as VARCHAR in one source system may correspond to a similar field typed as INTEGER in another, or Sales_Regions in one system may not match Field_Territories in another. If "heavy-lifting" cleanups are required, integrating with specialized data quality solutions at runtime often meets the business need while keeping the door open for data virtualization.
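The kind of query-time format reconciliation described above can be sketched in a few lines. This is a minimal illustration, not a real virtualization product's API; the source records, field names, and region mapping are all hypothetical.

```python
# Sketch of format reconciliation a data virtualization layer might
# perform at query time. All sources and mappings are hypothetical.

def normalize_part_id(raw):
    """One source stores Part_id as VARCHAR, another as INTEGER;
    normalize both to int so the federated view is consistent."""
    return int(raw)

# Sales_Regions codes in one system vs. Field_Territories names in
# another: a simple lookup reconciles the two vocabularies.
REGION_MAP = {"NE": "Northeast", "SE": "Southeast", "W": "West"}

def normalize_region(code):
    return REGION_MAP.get(code, code)

source_a = [{"Part_id": "1001", "Sales_Region": "NE"}]       # VARCHAR ids
source_b = [{"Part_id": 1002, "Field_Territory": "Northeast"}]  # INTEGER ids

unified = (
    [{"part_id": normalize_part_id(r["Part_id"]),
      "region": normalize_region(r["Sales_Region"])} for r in source_a]
    + [{"part_id": normalize_part_id(r["Part_id"]),
        "region": r["Field_Territory"]} for r in source_b]
)
print(unified)
```

Because the mapping is applied as rows are read, neither source system has to change, which is the point of handling such discrepancies virtually.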

Mistake #5 - Anticipating Negative Impact on Operational Systems
Although operational systems are often one of the primary data sources used when virtualizing data, the runtime performance of these systems is not typically impacted as a result. Yet, designers have been schooled to think about data volumes in terms of the size of the physical store and the throughput of the nightly ETLs. When using a virtual approach, designers should instead consider how much data the end solutions will actually retrieve per query, and how often those queries will run. If the queries are relatively small (for example, 10,000 rows) and broad (across multiple systems and/or tables), or run relatively infrequently (several hundred times per day), then the impact on operational systems will be light.

System designers and architects anticipating negative impact on operational systems are typically underestimating the speed of the latest data virtualization solutions. Certainly, Moore's Law has accelerated hardware and networks. In addition, 64-bit JVMs, high-performance query optimization algorithms, push-down techniques, caching, clustering and more have advanced the software side of the solution as well.

Taking the time to calculate required data loads helps avoid misjudging the potential impact on the operational systems. One best practice for predicting actual performance impact is to test-drive several of the biggest queries using a data virtualization tool of choice.
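The load calculation suggested above is simple arithmetic. The sketch below uses hypothetical figures (rows per query, row width, query frequency); the point is that daily pull volume, not warehouse size, is the number to estimate.

```python
# Back-of-envelope estimate of daily load a virtual query layer places
# on an operational source. All figures are hypothetical examples.
rows_per_query = 10_000   # typical result-set size, per the text
bytes_per_row = 200       # assumed average row width
queries_per_day = 300     # "several hundred times per day"

daily_bytes = rows_per_query * bytes_per_row * queries_per_day
print(f"{daily_bytes / 1e9:.2f} GB/day")  # prints "0.60 GB/day"
```

A fraction of a gigabyte spread across a business day is light duty for most operational databases, which is why test-driving the largest expected queries usually confirms the impact is acceptable.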

Mistake #6 - Failing to Simplify the Problem
While the enterprise data environment is understandably complex, it is usually unnecessary to develop complex data virtualization solutions. The most successful data virtualization projects are broken into smaller components, each addressing pieces of the overall need. This simplification can occur in two ways: by leveraging tools and by right-sizing integration components.

Data virtualization tools help address three fundamental challenges of data integration:

  1. Data Location: Data resides in multiple locations and sources.
  2. Data Structure: Data isn't always in the required form.
  3. Data Completeness: Data frequently needs to be combined with other data to have meaning.

Data virtualization middleware simplifies the location challenge by making all data appear as if it is available from one place, rather than where it is actually stored.

Data abstraction simplifies data complexity by transforming data from its native structure and syntax into reusable views and Web services that are easy for developers to understand and for business solutions to consume.

Data federation combines data to form more meaningful business information, producing, for example, a single view of a customer or a "get inventory balances" composite service. Data can be federated both from consolidated stores, such as the enterprise data warehouse, and from original sources, such as transaction systems.
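The single-view-of-a-customer idea can be illustrated with a small sketch. The two in-memory dictionaries stand in for a consolidated warehouse and an operational order system; all names and figures are hypothetical.

```python
# Minimal sketch of data federation: one customer record assembled from
# a consolidated store and an operational source. Data is hypothetical.
warehouse = {"C42": {"name": "Acme Corp", "lifetime_value": 125_000}}
order_system = {"C42": {"open_orders": 3}}

def customer_view(cid):
    # The caller sees one record regardless of where each field lives.
    merged = {}
    merged.update(warehouse.get(cid, {}))
    merged.update(order_system.get(cid, {}))
    return merged

print(customer_view("C42"))
```

A real data virtualization server does this with distributed queries and optimization rather than dictionary merges, but the consumer-facing contract is the same: one view, many sources.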

Successful right-sizing of data integration components requires smart decomposition of requirements. Virtualized views or services built using data virtualization work best when aimed at serving focused needs. These can then be leveraged across multiple use cases and/or combined to support more complex needs.

An Implementor's Guide to Service Oriented Architecture - Getting It Right, a recently published book by a team of experts from five technology vendors including Composite Software, identifies three levels of virtualized data services that let designers and architects create smaller, more manageable data integration components:

  • Physical Services: Physical services lie just above the data source, and they transform the data into a form that is easily consumed by higher-level services.
  • Business Services: Business services embody the bulk of the transformation logic that converts data from its physical form into its required business form.
  • Application Services: Application services leverage business services to provide data optimally to the consuming applications.

In this way, solution developers can draw from these simpler, focused data services (relational views work similarly), significantly simplifying their development efforts today, and providing greater reuse and agility tomorrow.
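The three service levels above can be sketched as a chain of small functions. This is an illustration of the layering idea only; the function and field names are invented, not taken from the book.

```python
# Illustrative three-level layering of data services. Names hypothetical.

def physical_customer_service(raw_row):
    # Physical service: sits just above the source and converts raw
    # source fields into an easily consumed form.
    return {"id": int(raw_row["CUST_ID"]), "name": raw_row["CUST_NM"].strip()}

def business_customer_service(raw_row):
    # Business service: carries the bulk of the transformation logic
    # from physical form to business form.
    rec = physical_customer_service(raw_row)
    rec["display_name"] = rec["name"].title()
    return rec

def application_customer_service(raw_row):
    # Application service: shapes business data for a specific consumer.
    rec = business_customer_service(raw_row)
    return {"customerId": rec["id"], "customerName": rec["display_name"]}

print(application_customer_service({"CUST_ID": "7", "CUST_NM": " acme corp "}))
```

Because each layer has one focused job, the physical and business services can be reused by other application services, which is the reuse-and-agility payoff the text describes.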

Mistake #7 - Treating SQL/Relational and XML/Hierarchical as Separate Silos
Historically, data integration has focused on supporting the needs of business intelligence applications, whereas process integration has focused on optimizing business processes. These two divergent approaches led to different architectures, tools, middleware, methods, teams and more. However, because today's data virtualization middleware is equally adept at relational and hierarchical data, it is a mistake to silo these key data forms.

This is especially important in cases where a mix of SQL and XML is required: for example, combining XML data from an outside payroll processor with relational data from an internal sales force automation system to deliver a single XML view of sales rep performance to a portal.
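A toy version of that payroll/SFA example shows how naturally the two forms combine once tooling treats them uniformly. The XML snippet, table schema, and values below are all hypothetical stand-ins for the external and internal systems mentioned above.

```python
# Sketch of federating hierarchical (XML) and relational (SQL) data
# into one record. All sources, schemas, and values are hypothetical.
import sqlite3
import xml.etree.ElementTree as ET

# XML feed from an outside payroll processor.
payroll_xml = "<payroll><rep id='7'><commission>4200</commission></rep></payroll>"
commissions = {int(rep.get("id")): int(rep.findtext("commission"))
               for rep in ET.fromstring(payroll_xml)}

# Relational data from an internal sales force automation system.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE reps (id INTEGER, name TEXT, quota INTEGER)")
db.execute("INSERT INTO reps VALUES (7, 'Jordan', 50000)")

# Federate both forms into a single rep-performance view.
rep_id, name, quota = db.execute("SELECT id, name, quota FROM reps").fetchone()
record = {"name": name, "quota": quota, "commission": commissions[rep_id]}
print(record)
```

In production, data virtualization middleware would expose this combined record as a reusable view or XML service instead of an ad hoc script, but the mixing of SQL and XML sources is the same in kind.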

Not only will a unified approach lead to better solutions regardless of data type, but developers and designers will gain experience outside their traditional core areas of expertise.

Mistake #8 - Implementing Data Virtualization Using the Wrong Infrastructure
The loose coupling of data services in a service-oriented architecture (SOA) environment is an excellent fit for data virtualization. As a result, SOA is one of data virtualization's most frequent use cases. However, there is sometimes confusion about when to deploy enterprise service bus (ESB) middleware and when to use information servers to design and run the data services typically required.

ESBs are excellent for mediating various transactional and data services. However, they are not designed to support heavy-duty data functions such as high-performance queries, complex federations, XML/SQL transformations, and so forth as required in many of today's enterprise application use cases. On the other hand, data virtualization tools provide an easy-to-use, high-productivity data service development environment and a high-performance, high-reliability runtime information server to meet both design and runtime needs. ESBs can then mediate these services as needed.

Mistake #9 - Segregating Data Virtualization People and Processes
As physical data consolidation technology and approaches have matured, supporting organizations in the form of Integration Competency Centers (ICCs), along with best-practice methods and processes, have grown up around them. These centers improve developer productivity, optimize tool usage, reduce project risk, and more. In fact, 10 specific benefits are identified in Integration Competency Center: An Implementation Methodology, a book written by two experts at Informatica.

It would be a mistake to assume that these ICCs, which have evolved from support of physical data consolidation approaches and middleware, cannot or should not also be leveraged in support of data virtualization. By embracing data virtualization, ICCs can compound the technology value of data virtualization with complementary people and process resources.

Mistake #10 - Failing to Identify and Communicate Benefits
While data virtualization can accelerate new development, enable quicker change iterations, and reduce both development and operating costs, it's a mistake to assume these benefits sell themselves, especially in tough business times when new technology investment is highly scrutinized.

Fortunately, these benefits can (and should) be measured and communicated. Here are some ideas for accomplishing this:

  • Start by using the virtual versus physical integration decision tool described previously to identify several data virtualization candidates as a pilot.
  • During the design and development phase for these projects, track the time it takes using data virtualization and contrast it to the time it would have taken using traditional physical approaches.
  • Use this time savings to calculate two additional points of value: time to solution reduction and development cost savings.
  • To measure lifecycle value, estimate the operating costs of extra physical data stores that are saved because of virtualization.
  • Add these hardware operating costs to the estimated development lifecycle cost savings that occur from faster turns on break-fix and enhancement development activities.
  • Finally, package the results of these pilot projects along with an extrapolation across future projects, and communicate them to business and IT leadership.
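The savings arithmetic behind the steps above is straightforward. The figures below are purely illustrative placeholders, not benchmarks; substitute the times and rates actually tracked during the pilot.

```python
# Illustrative first-year value calculation for a pilot project.
# Every figure here is a hypothetical placeholder.
physical_dev_days = 60    # estimated effort with a traditional physical approach
virtual_dev_days = 20     # tracked effort using data virtualization
daily_rate = 800          # blended developer day rate, USD
mart_opex_saved = 40_000  # annual operating cost of an avoided physical mart

dev_savings = (physical_dev_days - virtual_dev_days) * daily_rate
total_first_year = dev_savings + mart_opex_saved
print(dev_savings, total_first_year)  # prints "32000 72000"
```

Extrapolating such a per-project figure across the future project portfolio is what turns a pilot result into a message IT and business leadership can act on.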

Industry analysts agree that best-practice leaders draw from portfolios containing both physical and virtual data integration tools to meet the ever-changing information needs of today's enterprises. Multiple use cases across a broad spectrum of industries and government agencies illustrate the mission-critical benefits derived from data virtualization. These benefits include reduced time-to-solution, lower overall costs for both implementation and ongoing maintenance, and greater agility to adapt to change. By becoming familiar with common mistakes to avoid, enterprises arm themselves with the wisdom necessary to successfully implement data virtualization in their data integration infrastructures, and thereby begin to reap the benefits.


  • Composite Software, in conjunction with data virtualization users and industry analysts, developed a simple decision-making tool for determining when to use a virtual, physical or hybrid approach to data integration. Free copies are available online.

More Stories By Robert Eve

Robert "Bob" Eve is vice president of marketing at Composite Software. Prior to joining Composite, he held executive-level marketing and business development roles at several other enterprise software companies. At Informatica and Mercury Interactive, he helped penetrate new segments in his role as the vice president of Market Development. Bob ran Marketing and Alliances at Kintana (acquired by Mercury Interactive in 2003) where he defined the IT Governance category. As vice president of Alliances at PeopleSoft, Bob was responsible for more than 300 partners and 100 staff members. Bob has an MS in management from MIT and a BS in business administration with honors from University of California, Berkeley. He is a frequent contributor to publications including SYS-CON's SOA World Magazine and Virtualization Journal.

