
Ten Mistakes to Avoid When Virtualizing Data

Meeting the ever-changing information needs of today's enterprises

Mistake #3 - Missing the Hybrid Opportunity
In many cases, the best data integration solution combines virtual and physical approaches; there is no reason to be locked into one or the other. Figure 2 illustrates hybrid use cases, several of which are described below.

  • Physical Data Warehouse and/or Data Mart Schema Extension: This is a way to extend existing schemas, such as adding current operations data to historical repositories.
  • Physical Warehouses, Marts and/or Stores Federation: This is a way to federate multiple physical consolidated sources, such as two or more sales data marts after a merger.
  • Data Warehouse and/or Data Mart Prototyping: This is a way to prototype new warehouses or marts, to accelerate an early stage leading into a larger BI initiative.
  • Data Warehouse and/or Data Mart Source Data Access: This is a way to provide a warehouse or mart with virtual access to source data, such as XML or packaged applications that may not be easily supported by the current ETL tool, or to integrate readily available, already federated views.
  • Data Mart Elimination: This is a way to eliminate or replace physical marts with virtual ones, such as stopping rogue data mart proliferation by providing an easier, more cost-effective virtual option.

Mistake #4 - Assuming Perfect Data Is Prerequisite
Poor data quality is a pervasive problem in enterprises today. While correcting and perfecting source data is the ultimate goal, in practice teams often leave the source data alone and settle for cleansing it in a warehouse or mart during the consolidation and transformation phases of physical data consolidation.

When data quality issues are simple format discrepancies that reflect implementation details in the various systems, data virtualization solutions resolve them easily and with negligible performance impact. For example, a Part_id field may be typed as VARCHAR in one source system while the equivalent field in another is INTEGER, or Sales_Regions values in one system may not match Field_Territories in another. When "heavy-lifting" cleanups are required, integrating specialized data quality solutions at runtime often meets the business need while preserving the opportunity for data virtualization.
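As a concrete illustration, here is a minimal sketch of the kind of lightweight reconciliation a virtual view performs. The field names match the examples above, but the region-to-territory mapping and the normalization functions are hypothetical; a real data virtualization tool would express this declaratively rather than in application code.

    # Hypothetical sketch of the lightweight reconciliation a virtual view performs.
    # Source A types Part_id as VARCHAR and labels regions as Sales_Regions;
    # source B types Part_id as INTEGER and uses Field_Territories.

    # Assumed mapping between the two naming schemes (illustrative values only).
    REGION_TO_TERRITORY = {
        "Northeast": "NE-1",
        "Southwest": "SW-2",
    }

    def normalize_source_a(row: dict) -> dict:
        """Cast and rename source A's fields into the canonical virtual schema."""
        return {
            "part_id": int(row["Part_id"]),  # VARCHAR -> INTEGER
            "territory": REGION_TO_TERRITORY[row["Sales_Regions"]],
        }

    def normalize_source_b(row: dict) -> dict:
        """Source B already matches the canonical schema; rename only."""
        return {"part_id": row["Part_id"], "territory": row["Field_Territories"]}

    print(normalize_source_a({"Part_id": "1042", "Sales_Regions": "Northeast"}))
    # {'part_id': 1042, 'territory': 'NE-1'}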

Mistake #5 - Anticipating Negative Impact on Operational Systems
Although operational systems are often among the primary data sources used when virtualizing data, their runtime performance is typically not impacted as a result. Yet designers have been schooled to think about data volumes in terms of the size of the physical store and the throughput of the nightly ETL runs. When using a virtual approach, designers should instead consider how much data the end solutions will actually retrieve per query, and how often those queries will run. If the queries are relatively small (for example, 10,000 rows) and broad (across multiple systems and/or tables), or run relatively infrequently (several hundred times per day), then the impact on operational systems will be light.

System designers and architects who anticipate a negative impact on operational systems are typically underestimating the speed of the latest data virtualization solutions. Hardware and network performance have kept climbing along the curve of Moore's Law, and 64-bit JVMs, high-performance query optimization algorithms, push-down techniques, caching, clustering and more have advanced the software side of the solution as well.

Taking the time to calculate required data loads helps avoid misjudging the potential impact on the operational systems. One best practice for predicting actual performance impact is to test-drive several of the biggest queries using a data virtualization tool of choice.
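As a back-of-envelope illustration of such a calculation, using the figures quoted above (the row width and the working-day window are assumptions added here, not figures from the article):

    # Rough load estimate from the figures above; bytes_per_row and the
    # 8-hour query window are assumptions for illustration only.
    rows_per_query = 10_000
    queries_per_day = 300
    bytes_per_row = 200  # assumed average row width

    daily_rows = rows_per_query * queries_per_day        # 3,000,000 rows/day
    daily_mb = daily_rows * bytes_per_row / 1_000_000    # ~600 MB/day
    avg_rows_per_sec = daily_rows / (8 * 3600)           # spread over an 8-hour day

    print(f"{daily_rows:,} rows/day, ~{daily_mb:.0f} MB/day, "
          f"~{avg_rows_per_sec:.0f} rows/sec on average")

At roughly a hundred rows per second on average, this load is modest next to what most operational databases already serve.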

Mistake #6 - Failing to Simplify the Problem
While the enterprise data environment is understandably complex, it is usually unnecessary to develop complex data virtualization solutions. The most successful data virtualization projects are broken into smaller components, each addressing pieces of the overall need. This simplification can occur in two ways: by leveraging tools and by right-sizing integration components.

Data virtualization tools help address three fundamental challenges of data integration:

  1. Data Location: Data resides in multiple locations and sources.
  2. Data Structure: Data isn't always in the required form.
  3. Data Completeness: Data frequently needs to be combined with other data to have meaning.

Data virtualization middleware simplifies the location challenge by making all data appear as if it is available from one place, rather than where it is actually stored.

Data abstraction simplifies the structure challenge by transforming data from its native structure and syntax into reusable views and Web services that are easy for developers to understand and for business solutions to consume.

Data federation addresses the completeness challenge by combining data to form more meaningful business information, producing, for example, a single view of a customer or a composite "get inventory balances" service. Data can be federated from consolidated stores, such as the enterprise data warehouse, and from original sources, such as transaction systems.
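A minimal sketch of what federation produces, assuming two hypothetical sources keyed on customer_id (the field names and rows are invented for illustration):

    # Hypothetical federation sketch: merge warehouse history with live
    # transactions to produce a single view of each customer.

    def federate_customer_view(warehouse_rows, transaction_rows):
        """Join rows from two sources on customer_id into one combined view."""
        view_by_id = {r["customer_id"]: dict(r) for r in warehouse_rows}
        for txn in transaction_rows:
            view = view_by_id.setdefault(
                txn["customer_id"], {"customer_id": txn["customer_id"]}
            )
            view.setdefault("open_orders", []).append(txn["order_id"])
        return list(view_by_id.values())

    warehouse = [{"customer_id": 1, "lifetime_value": 12_500}]
    transactions = [{"customer_id": 1, "order_id": "A-17"}]
    print(federate_customer_view(warehouse, transactions))
    # [{'customer_id': 1, 'lifetime_value': 12500, 'open_orders': ['A-17']}]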

Successful right-sizing of data integration components requires smart decomposition of requirements. Virtualized views or services built using data virtualization work best when aimed at serving focused needs. These can then be leveraged across multiple use cases and/or combined to support more complex needs.

A recently published book by a team of experts from five technology vendors, including Composite Software, An Implementor's Guide to Service Oriented Architecture - Getting It Right, identifies three levels of virtualized data services that allow designers and architects to build smaller, more manageable data integration components:

  • Physical Services: Physical services lie just above the data source, and they transform the data into a form that is easily consumed by higher-level services.
  • Business Services: Business services embody the bulk of the transformation logic that converts data from its physical form into its required business form.
  • Application Services: Application services leverage business services to provide data optimally to the consuming applications.

In this way, solution developers can draw from these simpler, focused data services (relational views work similarly), significantly simplifying their development efforts today, and providing greater reuse and agility tomorrow.
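To make the layering concrete, here is a hypothetical sketch of the three levels in code; the service names, the sample row, and the transformation rules are all invented for illustration:

    # Hypothetical sketch of the three service layers described above.

    def physical_orders_service():
        """Physical layer: fetch raw rows and normalize types near the source."""
        raw = [{"ord_id": "1001", "amt": "49.90"}]  # stand-in for a source query
        return [{"order_id": int(r["ord_id"]), "amount": float(r["amt"])} for r in raw]

    def business_orders_service():
        """Business layer: apply the logic that converts data to business form."""
        return [
            {**row, "amount_band": "small" if row["amount"] < 100 else "large"}
            for row in physical_orders_service()
        ]

    def application_orders_service(limit: int = 10):
        """Application layer: shape business data for a consuming application."""
        return business_orders_service()[:limit]

    print(application_orders_service())
    # [{'order_id': 1001, 'amount': 49.9, 'amount_band': 'small'}]

Because each layer exposes a narrow, reusable interface, an application-level change touches only the top function while the physical and business layers stay stable.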

Mistake #7 - Treating SQL/Relational and XML/Hierarchical as Separate Silos
Historically, data integration focused on supporting business intelligence application needs, whereas process integration focused on optimizing business processes. These two divergent approaches led to different architectures, tools, middleware, methods, teams and more. However, because today's data virtualization middleware is equally adept at relational and hierarchical data, it is a mistake to silo these key data forms.

This is especially important in cases where a mix of SQL and XML is required; for example, when combining XML data from an outside payroll processor with relational data from an internal sales force automation system to serve a unified XML view of sales rep performance within a portal.
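A small sketch of that mix follows; the payroll XML, the CRM rows, and the join key are invented for illustration, but they show how one layer can consume both forms at once:

    # Hypothetical sketch: join XML payroll data with relational CRM rows.
    import xml.etree.ElementTree as ET

    payroll_xml = """<payroll>
      <rep id="7" commission="4200.00"/>
    </payroll>"""

    crm_rows = [{"rep_id": 7, "name": "J. Doe", "quota_attainment": 1.12}]

    # Parse the hierarchical source into a lookup keyed like the relational one.
    commissions = {
        int(rep.get("id")): float(rep.get("commission"))
        for rep in ET.fromstring(payroll_xml).iter("rep")
    }

    # Combine both forms into one view of rep performance.
    performance = [
        {**row, "commission": commissions.get(row["rep_id"])} for row in crm_rows
    ]
    print(performance)
    # [{'rep_id': 7, 'name': 'J. Doe', 'quota_attainment': 1.12, 'commission': 4200.0}]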

Not only will a unified approach lead to better solutions regardless of data type, but developers and designers will gain experience outside their traditional core areas of expertise.

Mistake #8 - Implementing Data Virtualization Using the Wrong Infrastructure
The loose coupling of data services in a service-oriented architecture (SOA) environment is an excellent fit for data virtualization. As a result, SOA is one of data virtualization's most frequent use cases. However, there is sometimes confusion about when to deploy enterprise service bus (ESB) middleware and when to use information servers to design and run the data services typically required.

ESBs are excellent for mediating various transactional and data services. However, they are not designed to support heavy-duty data functions such as high-performance queries, complex federations, XML/SQL transformations, and so forth as required in many of today's enterprise application use cases. On the other hand, data virtualization tools provide an easy-to-use, high-productivity data service development environment and a high-performance, high-reliability runtime information server to meet both design and runtime needs. ESBs can then mediate these services as needed.

Mistake #9 - Segregating Data Virtualization People and Processes
As physical data consolidation technologies and approaches have matured, supporting organizations in the form of Integration Competency Centers (ICCs) have grown up around them, along with best-practice methods and processes. These centers improve developer productivity, optimize tool usage, reduce project risk, and more. In fact, 10 specific benefits are identified in a book written by two experts at Informatica, Integration Competency Center: An Implementation Methodology.

It would be a mistake to assume that these ICCs, which evolved to support physical data consolidation approaches and middleware, cannot or should not also be leveraged in support of data virtualization. By embracing data virtualization, ICCs can compound its technology value with complementary people and process resources.

Mistake #10 - Failing to Identify and Communicate Benefits
While data virtualization can accelerate new development, enable quicker change iterations, and reduce both development and operating costs, it is a mistake to assume these benefits sell themselves, especially in tough business times when new technology investment is highly scrutinized.

Fortunately, these benefits can (and should) be measured and communicated.  Here are some ideas for accomplishing this:

  • Start by using the virtual versus physical integration decision tool described previously to identify several data virtualization candidates as a pilot.
  • During the design and development phase for these projects, track the time it takes using data virtualization and contrast it to the time it would have taken using traditional physical approaches.
  • Use these time savings to calculate two additional points of value: time-to-solution reduction and development cost savings.
  • To measure lifecycle value, estimate the operating costs of the physical data stores that virtualization makes unnecessary.
  • Add those hardware operating costs to the estimated development lifecycle cost savings from faster turns on break-fix and enhancement work, as sketched in the example after this list.
  • Finally, package the results of these pilot projects along with an extrapolation across future projects, and communicate them to business and IT leadership.
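The arithmetic itself is simple; a worksheet-style sketch with placeholder figures (every number below is an assumption to be replaced with your own pilot data) might look like this:

    # Illustrative benefit worksheet; every figure is an assumed placeholder.
    virtual_days, physical_days = 15, 60        # measured pilot vs. estimated ETL effort
    blended_day_rate = 1_000                    # assumed loaded cost per developer-day
    marts_avoided, mart_annual_cost = 2, 25_000 # assumed operating cost per mart/year

    time_saved_days = physical_days - virtual_days
    dev_savings = time_saved_days * blended_day_rate
    lifecycle_savings = marts_avoided * mart_annual_cost

    print(f"Time to solution reduced by {time_saved_days} days; "
          f"development savings ${dev_savings:,}; "
          f"annual operating savings ${lifecycle_savings:,}")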

Industry analysts agree that best-practice leaders draw from portfolios containing both physical and virtual data integration tools to meet the ever-changing information needs of today's enterprises. Multiple use cases across a broad spectrum of industries and government agencies illustrate the mission-critical benefits derived from data virtualization: reduced time to solution, lower costs for both implementation and ongoing maintenance, and greater agility to adapt to change. By becoming familiar with the common mistakes to avoid, enterprises arm themselves with the wisdom necessary to implement data virtualization successfully in their data integration infrastructures, and thereby begin to reap the benefits.


  • Composite Software, in conjunction with data virtualization users and industry analysts, developed a simple decision-making tool for determining when to use a virtual, physical or hybrid approach to data integration. Free copies are available online.

More Stories By Robert Eve

Robert "Bob" Eve is vice president of marketing at Composite Software. Prior to joining Composite, he held executive-level marketing and business development roles at several other enterprise software companies. At Informatica and Mercury Interactive, he helped penetrate new segments in his role as the vice president of Market Development. Bob ran Marketing and Alliances at Kintana (acquired by Mercury Interactive in 2003) where he defined the IT Governance category. As vice president of Alliances at PeopleSoft, Bob was responsible for more than 300 partners and 100 staff members. Bob has an MS in management from MIT and a BS in business administration with honors from University of California, Berkeley. He is a frequent contributor to publications including SYS-CON's SOA World Magazine and Virtualization Journal.
