Containers Expo Blog Authors: Liz McMillan, Yeshim Deniz, Pat Romanski, Elizabeth White, Ravi Rajamiyer

Related Topics: Containers Expo Blog

Containers Expo Blog: Article

Ten Mistakes to Avoid When Virtualizing Data

Meeting the ever-changing information needs of today's enterprises

Mistake #3 - Missing the Hybrid Opportunity
In many cases, the best data integration solution is a combination of virtual and physical approaches. There is no reason to be locked into one way or the other. Figure 2 illustrates hybrid use cases, followed by a description of some examples.

  • Physical Data Warehouse and/or Data Mart Schema Extension: This is a way to extend existing schemas, such as adding current operations data to historical repositories.
  • Physical Warehouses, Marts and/or Stores Federation: This is a way to federate multiple physical consolidated sources, such as two or more sales data marts after a merger.
  • Data Warehouse and/or Data Mart Prototyping: This is a way to prototype new warehouses or marts, to accelerate an early stage leading into a larger BI initiative.
  • Data Warehouse and/or Data Mart Source Data Access: This is a way to provide a warehouse or mart with virtual access to source data, such as XML or packaged applications that may not be easily supported by the current ETL tool, or to integrate readily available, already federated views.
  • Data Mart Elimination: This is a way to eliminate or replace physical marts with virtual ones, such as stopping rogue data mart proliferation by providing an easier, more cost-effective virtual option.

Mistake #4 - Assuming Perfect Data Is Prerequisite
Poor data quality is a pervasive problem in enterprises today. While correcting and perfecting source data is the ultimate goal, we may end up leaving our source data alone while settling for cleaning up the data in a warehouse or mart during the consolidation and transformation phases of physical data consolidation.

When data quality issues are simple format discrepancies that reflect implementation details in various systems, data virtualization solutions easily resolve these types of common data discrepancies with zero impact on performance. Some examples include a Part_id field in one source system that reads VARCHAR, while a similar field in another source has INTEGER. Or, Sales_Regions in one system does not match Field_Territories in another. If "heavy-lifting" cleanups are required, integrating with specialized data quality solutions at runtime often meets the business needs, while opening up the opportunity for data virtualization.

Mistake #5 - Anticipating Negative Impact on Operational Systems
Although operational systems are often one of the primary data sources used when virtualizing data, the runtime performance of these systems is not typically impacted as a result. Yet, designers have been schooled to think about data volumes in terms of the size of the physical store and the throughput of the nightly ETLs. When using a virtual approach, designers should instead consider the amount of data that end solutions will actually query on any individual query, and how often these queries will run. If the queries are relatively small (for example, 10,000 rows) and broad (across multiple systems and/or tables), or run relatively infrequently (several hundred times per day), then the impact on operation systems will be light.

System designers and architects anticipating negative impact on operational systems are typically underestimating the speed of the latest data virtualization solutions. Certainly, Moore's Law has accelerated hardware and networks. In addition, 64-bit JVMs, high-performance query optimization algorithms, push-down techniques, caching, clustering and more have advanced the software side of the solution as well.

Taking the time to calculate required data loads helps avoid misjudging the potential impact on the operational systems. One best practice for predicting actual performance impact is to test-drive several of the biggest queries using a data virtualization tool of choice.

Mistake #6 - Failing to Simplify the Problem
While the enterprise data environment is understandably complex, it is usually unnecessary to develop complex data virtualization solutions. The most successful data virtualization projects are broken into smaller components, each addressing pieces of the overall need. This simplification can occur in two ways: by leveraging tools and by right-sizing integration components.

Data virtualization tools help address three fundamental challenges of data integration:

  1. Data Location: Data resides in multiple locations and sources.
  2. Data Structure: Data isn't always in the required form.
  3. Data Completeness: Data frequently needs to be combined with other data to have meaning.

Data virtualization middleware simplifies the location challenge by making all data appear as if it is available from one place, rather than where it is actually stored.

Data abstraction simplifies data complexity by transforming data from its native structure and syntax into reusable views and Web services that are easy for business solutions' developers to understand and business solutions to consume.

Data federation combines data to form more meaningful business information, producing a single view of a customer or a get inventory balances composite service, as examples. Data can be federated from both consolidated stores such as the enterprise data warehouse as well as original sources such as transaction systems.

Successful right-sizing of data integration components requires smart decomposition of requirements. Virtualized views or services built using data virtualization work best when aimed at serving focused needs. These can then be leveraged across multiple use cases and/or combined to support more complex needs.

A recently published book by a team of experts from five technology vendors including Composite Software, An Implementor's Guide to Service Oriented Architecture - Getting It Right, identifies three levels of virtualized data services that allow designers and architects to design smaller, more manageable data integration components as follows:

  • Physical Services: Physical services lie just above the data source, and they transform the data into a form that is easily consumed by higher-level services.
  • Business Services: Business services embody the bulk of the transformation logic that converts data from its physical form into its required business form.
  • Application Services: Application services leverage business services to provide data optimally to the consuming applications.

In this way, solution developers can draw from these simpler, focused data services (relational views work similarly), significantly simplifying their development efforts today, and providing greater reuse and agility tomorrow.

Mistake #7 - Treating SQL/Relational and XML/Hierarchical as Separate Silos
Historically, data integration has focused on supporting business intelligence applications needs, whereas process integration focused on optimizing business processes. These two divergent approaches led to different architectures, tools, middleware, methods, teams and more. However, because today's data virtualization middleware is equally adept at relational and hierarchical data, it is a mistake to silo these key data forms.

This is especially important in cases where a mix of SQL and XML is required; for example, when combining XML data from an outside payroll processor with relational data from an internal sales force automation system to serve XML data within a single view of a sales rep performance portal.

Not only will a unified approach lead to better solutions regardless of data type, but developers and designers will gain experience outside their traditional core areas of expertise.

Mistake #8 - Implementing Data Virtualization Using the Wrong Infrastructure
The loose coupling of data services in a services-oriented architecture (SOA) environment is an excellent use for data virtualization. As a result, SOA is one of data virtualization's most frequent use cases. However, there is sometimes confusion about when to deploy enterprise service bus (ESB) middleware and when to use information servers to design and run the data services typically required.

ESBs are excellent for mediating various transactional and data services. However, they are not designed to support heavy-duty data functions such as high-performance queries, complex federations, XML/SQL transformations, and so forth as required in many of today's enterprise application use cases. On the other hand, data virtualization tools provide an easy-to-use, high-productivity data service development environment and a high-performance, high-reliability runtime information server to meet both design and runtime needs. ESBs can then mediate these services as needed.

Mistake #9 - Segregating Data Virtualization People and Processes
As physical data consolidation technology and approaches have matured, supporting organizations in the form of Integration Competency Centers (ICC) along with best practice methods and processes have grown in support. These centers improve developer productivity, optimize tool usage, reduce project risk, and more. In fact, 10 specific benefits are identified in a book written by two experts at Informatica, Integration Competency Center: An Implementation Methodology.

It would be a mistake to assume that these ICCs, which have evolved from support of physical data consolidation approaches and middleware, can not or should not also be leveraged in support of data virtualization. By embracing data virtualization, ICCs can compound the technology value of data virtualization with complementary people and process resources.

Mistake #10 - Failing to Identify and Communicate Benefits
While data virtualization can accelerate new development, perform quicker change iterations, and reduce both development and operating costs, it's a mistake to assume these benefits sell themselves, especially in tough business times when new technology investment is highly scrutinized.

Fortunately, these benefits can (and should) be measured and communicated.  Here are some ideas for accomplishing this:

  • Start by using the virtual versus physical integration decision tool described previously to identify several data virtualization candidates as a pilot.
  • During the design and development phase for these projects, track the time it takes using data virtualization and contrast it to the time it would have taken using traditional physical approaches.
  • Use this time savings to calculate two additional points of value: time to solution reduction and development cost savings.
  • To measure lifecycle value, estimate the operating costs of extra physical data stores that are saved because of virtualization.
  • Add these hardware operating costs to the estimated development lifecycle cost savings that occur from faster turns on break-fix and enhancement development activities.
  • Finally, package the results of these pilot projects along with an extrapolation across future projects, and communicate them to business and IT leadership.

Industry analysts agree that best practices' leaders draw from portfolios containing both physical and virtual data integration tools to meet the ever-changing information needs of today's enterprises. Multiple use cases across a broad spectrum of industries and government agencies illustrate the mission-critical benefits derived from data virtualization. These benefits include reduced time-to-solution, lower overall costs for both implementation and on-going maintenance, and greater agility to adapt to change. By becoming familiar with common mistakes to avoid, enterprises arm themselves with the wisdom necessary to successfully implement data virtualization in their data integration infrastructures, and thereby begin to reap the benefits.


  • Composite Software, in conjunction with data virtualization users and industry analysts, developed a simple decision-making tool for determining when to use a virtual, physical or hybrid approach to data integration. Free copies are available online.

More Stories By Robert Eve

Robert "Bob" Eve is vice president of marketing at Composite Software. Prior to joining Composite, he held executive-level marketing and business development roles at several other enterprise software companies. At Informatica and Mercury Interactive, he helped penetrate new segments in his role as the vice president of Market Development. Bob ran Marketing and Alliances at Kintana (acquired by Mercury Interactive in 2003) where he defined the IT Governance category. As vice president of Alliances at PeopleSoft, Bob was responsible for more than 300 partners and 100 staff members. Bob has an MS in management from MIT and a BS in business administration with honors from University of California, Berkeley. He is a frequent contributor to publications including SYS-CON's SOA World Magazine and Virtualization Journal.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.

IoT & Smart Cities Stories
Andrew Keys is Co-Founder of ConsenSys Enterprise. He comes to ConsenSys Enterprise with capital markets, technology and entrepreneurial experience. Previously, he worked for UBS investment bank in equities analysis. Later, he was responsible for the creation and distribution of life settlement products to hedge funds and investment banks. After, he co-founded a revenue cycle management company where he learned about Bitcoin and eventually Ethereal. Andrew's role at ConsenSys Enterprise is a mul...
CloudEXPO New York 2018, colocated with DXWorldEXPO New York 2018 will be held November 11-13, 2018, in New York City and will bring together Cloud Computing, FinTech and Blockchain, Digital Transformation, Big Data, Internet of Things, DevOps, AI, Machine Learning and WebRTC to one location.
DXWorldEXPO | CloudEXPO are the world's most influential, independent events where Cloud Computing was coined and where technology buyers and vendors meet to experience and discuss the big picture of Digital Transformation and all of the strategies, tactics, and tools they need to realize their goals. Sponsors of DXWorldEXPO | CloudEXPO benefit from unmatched branding, profile building and lead generation opportunities.
Disruption, Innovation, Artificial Intelligence and Machine Learning, Leadership and Management hear these words all day every day... lofty goals but how do we make it real? Add to that, that simply put, people don't like change. But what if we could implement and utilize these enterprise tools in a fast and "Non-Disruptive" way, enabling us to glean insights about our business, identify and reduce exposure, risk and liability, and secure business continuity?
The deluge of IoT sensor data collected from connected devices and the powerful AI required to make that data actionable are giving rise to a hybrid ecosystem in which cloud, on-prem and edge processes become interweaved. Attendees will learn how emerging composable infrastructure solutions deliver the adaptive architecture needed to manage this new data reality. Machine learning algorithms can better anticipate data storms and automate resources to support surges, including fully scalable GPU-c...
DXWorldEXPO LLC announced today that Telecom Reseller has been named "Media Sponsor" of CloudEXPO | DXWorldEXPO 2018 New York, which will take place on November 11-13, 2018 in New York City, NY. Telecom Reseller reports on Unified Communications, UCaaS, BPaaS for enterprise and SMBs. They report extensively on both customer premises based solutions such as IP-PBX as well as cloud based and hosted platforms.
Digital Transformation: Preparing Cloud & IoT Security for the Age of Artificial Intelligence. As automation and artificial intelligence (AI) power solution development and delivery, many businesses need to build backend cloud capabilities. Well-poised organizations, marketing smart devices with AI and BlockChain capabilities prepare to refine compliance and regulatory capabilities in 2018. Volumes of health, financial, technical and privacy data, along with tightening compliance requirements by...
@DevOpsSummit at Cloud Expo, taking place November 12-13 in New York City, NY, is co-located with 22nd international CloudEXPO | first international DXWorldEXPO and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The widespread success of cloud computing is driving the DevOps revolution in enterprise IT. Now as never before, development teams must communicate and collaborate in a dynamic, 24/7/365 environment. There is no time t...
DXWorldEXPO LLC announced today that "IoT Now" was named media sponsor of CloudEXPO | DXWorldEXPO 2018 New York, which will take place on November 11-13, 2018 in New York City, NY. IoT Now explores the evolving opportunities and challenges facing CSPs, and it passes on some lessons learned from those who have taken the first steps in next-gen IoT services.
SYS-CON Events announced today that Silicon India has been named “Media Sponsor” of SYS-CON's 21st International Cloud Expo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Published in Silicon Valley, Silicon India magazine is the premiere platform for CIOs to discuss their innovative enterprise solutions and allows IT vendors to learn about new solutions that can help grow their business.