Containers Expo Blog: Article

Five Ways Data Virtualization Improves Data Warehousing

Data virtualization fills the EDW agility gap

An array of business intelligence (BI), predictive analytics, data and content mining, portal and other applications tap a growing volume of information sourced from enterprise data warehouses (EDW).  However, significant volumes of business-critical enterprise data reside outside the enterprise data warehouse.  To deliver the most comprehensive information to business decision-makers, IT teams are implementing data virtualization to preserve and extend their existing enterprise data warehouse investments.

This article discusses five integration patterns that combine both enterprise data warehouses and data virtualization to solve real business and IT problems along with examples from Composite Software's data virtualization customers.  The five patterns include:

  1. Data Warehouse Augmentation
  2. Data Warehouse Federation
  3. Data Warehouse Hub and Virtual Data Mart Spoke
  4. Complementing the ETL Process
  5. Data Warehouse Prototyping

Maximizing Value from Enterprise Data Warehouse Investments
Supporting critical, yet ever-changing information requirements in an environment of ever-increasing data volumes and complexity is a challenge well understood by large enterprises and government agencies today.

This inexorable pressure has driven, and will continue to drive, the demand for enterprise data warehouses, as an array of BI, predictive analytics, data and content mining, portal and other key applications rely on data sourced from them.

However, business change often outpaces enterprise data warehouse evolution.  And while enterprise data warehouses are useful for physically consolidating and transforming a large portion of enterprise data, significant volumes of enterprise data reside outside their confines.  Further, enterprise data warehouses themselves require support throughout their lifecycles, driving demand for solutions that prototype, migrate, extend, federate and leverage enterprise data warehouse assets.

Data virtualization middleware, an advanced version of earlier data federation or enterprise information integration (EII) middleware, complements enterprise data warehouses by providing a range of flexible data integration techniques that preserve, extend and thereby drive greater business value from existing enterprise data warehouse investments.

1. Data Warehouse Augmentation
Organizations overwhelmed by scattered data silos and exponentially growing data volumes have deployed data warehouses to meet many of their reporting requirements.  However, a number of data sources remain outside the warehouse.  Providing users with complete business insight in support of revenue, cost and risk management goals often requires the following:

  • Historical data from the warehouse and up-to-the-minute data from transaction systems or operational data stores;
  • Summarized data from the warehouse and drill-down detail from transaction systems or operational data stores;
  • Master customer, product or employee data from an MDM hub or warehouse and detail from transaction systems or operational data stores; and
  • Internal data from the warehouse and external data from outside sources including cloud computing.

Data virtualization federates data warehouse information with additional sources, thereby extending existing data warehouse schemas and data.  These complementary views add current data to historical warehouse data, detailed data to summarized warehouse data, and external data to internal warehouse data.
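
The augmentation pattern can be sketched in a few lines of Python.  This is a minimal illustration, not how any particular data virtualization product works: two in-memory SQLite databases stand in for the warehouse and an operational store, and the table names, columns and sample figures (well_history, well_status, well_production) are all hypothetical.  A single view unions historical and current rows so that a consumer queries one object.

```python
import sqlite3


def build_augmented_view() -> sqlite3.Connection:
    """Return a connection exposing one view over warehouse and operational data."""
    conn = sqlite3.connect(":memory:")                 # stand-in for the warehouse
    conn.execute("ATTACH DATABASE ':memory:' AS ops")  # stand-in for the operational store

    # Hypothetical schemas and sample rows, for illustration only.
    conn.execute("CREATE TABLE main.well_history (well_id INTEGER, day TEXT, barrels REAL)")
    conn.execute("CREATE TABLE ops.well_status (well_id INTEGER, day TEXT, barrels REAL)")
    conn.executemany(
        "INSERT INTO main.well_history VALUES (?, ?, ?)",
        [(1, "2015-01-01", 120.0), (1, "2015-01-02", 118.0)],
    )
    conn.executemany(
        "INSERT INTO ops.well_status VALUES (?, ?, ?)",
        [(1, "2015-01-03", 95.0)],
    )

    # SQLite only lets TEMP views span attached databases, hence CREATE TEMP VIEW.
    conn.execute(
        """
        CREATE TEMP VIEW well_production AS
        SELECT well_id, day, barrels, 'warehouse' AS source FROM main.well_history
        UNION ALL
        SELECT well_id, day, barrels, 'operational' AS source FROM ops.well_status
        """
    )
    return conn


def production_for(conn: sqlite3.Connection, well_id: int) -> list:
    """Query the combined view; the caller never sees which store a row came from."""
    return conn.execute(
        "SELECT day, barrels, source FROM well_production WHERE well_id = ? ORDER BY day",
        (well_id,),
    ).fetchall()


if __name__ == "__main__":
    for row in production_for(build_augmented_view(), 1):
        print(row)
```

Because the view is evaluated at query time, a row written to the operational table shows up in the next query without any reload of the warehouse.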

Energy Company Combines Up-to-the-minute and Historical Data - To optimize deployment of repair crews and equipment across more than 10,000 production oil wells, an energy company uses data virtualization to federate real-time crew, equipment and well status data from their wells and SAP's maintenance management system with historical surface, subsurface and business data from their enterprise data warehouse.  The net result is faster repairs for more uptime and thus more revenue.

2. Data Warehouse Federation
A primary reason enterprises implement data warehouses is to overcome the various transaction and analytic system silos typical in most large enterprises and government agencies today.  However, for a number of often pragmatic reasons, the single "enterprise" data warehouse remains elusive.  Instead, for these same reasons, multiple data warehouses and data marts have been developed and deployed, in effect perpetuating, rather than overcoming, the data silo issue.

Optimizing business performance requires data from across these various warehouses and marts.  But physically combining multiple marts and warehouses into a single, complete enterprise-wide data warehouse is often too costly and time-consuming.

Data virtualization federates multiple physical warehouses.  Two examples include combining data from the sales and financial warehouses, or combining two sales data warehouses after a corporate merger. This approach achieves logical consolidation of warehouses by creating an integrated view across them, using abstraction to rationalize the different schema designs.
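
As a rough sketch of this logical consolidation, the Python below queries two independent stores at request time and maps each onto one common record shape.  Everything here is assumed for illustration: the two SQLite databases stand in for sales warehouses inherited through a merger, and the Sale record, table names, column names and amounts are hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass(frozen=True)
class Sale:
    """Common (rationalized) shape that consumers see, whatever the source schema."""
    customer: str
    amount_usd: float
    source: str


def make_warehouse_a() -> sqlite3.Connection:
    # Hypothetical warehouse A: amounts stored in dollars.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (cust_name TEXT, amount_usd REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", [("Acme", 100.0), ("Globex", 250.0)])
    return conn


def make_warehouse_b() -> sqlite3.Connection:
    # Hypothetical warehouse B: different table, column names, and units (cents).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount_cents INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [("Acme", 5000), ("Initech", 12500)])
    return conn


def federated_sales(wh_a: sqlite3.Connection, wh_b: sqlite3.Connection) -> Iterator[Sale]:
    """Read both warehouses on demand and yield rows in the common shape."""
    for name, usd in wh_a.execute("SELECT cust_name, amount_usd FROM sales"):
        yield Sale(name, usd, "warehouse_a")
    for name, cents in wh_b.execute("SELECT customer, amount_cents FROM orders"):
        yield Sale(name, cents / 100, "warehouse_b")  # convert cents to dollars


def total_by_customer(sales: Iterable[Sale]) -> Dict[str, float]:
    """Aggregate across warehouses as if they were a single store."""
    totals: Dict[str, float] = {}
    for sale in sales:
        totals[sale.customer] = totals.get(sale.customer, 0.0) + sale.amount_usd
    return totals


if __name__ == "__main__":
    print(total_by_customer(federated_sales(make_warehouse_a(), make_warehouse_b())))
```

Neither warehouse is copied or restructured; the schema differences are absorbed entirely in the mapping layer, which is the role the abstraction plays in this pattern.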

Investment Bank Federates Financial Trading Data Warehouses - To enable more flexible customer self-service reporting and meet SEC compliance reporting mandates, a prime brokerage uses data virtualization to federate equity, fixed income and other investment positions and trades information from siloed trading data warehouses.  The net result is higher customer satisfaction and lower reporting costs.

3. Data Warehouse Hub and Virtual Data Mart Spoke
A typical data warehouse pattern is a central data warehouse hub with satellite data marts as spokes around the hub.  These marts use a subset of the warehouse data and are used by a subset of the data warehouse users.   Sometimes these marts are created because the analytic tools require data in a different form than the warehouse.  On the other hand, they may be created to work around the controls provided by the warehouse, and thus act as "rogue" data marts.  Regardless of the reason, every additional mart adds cost and compromises data quality.

Data virtualization provides virtual data marts that eliminate, or at least significantly reduce, the need for physical data marts around the data warehouse hubs.  This approach abstracts the warehouse data to meet specific consuming tool and user query requirements, while still preserving the quality and controls inherent in the data warehouse.
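
A virtual data mart can be approximated with an ordinary database view, as in the sketch below.  It assumes a single SQLite database standing in for the warehouse hub; the positions table, the equity_mart view and the sample holdings are hypothetical.  The view reshapes a subset of the warehouse for one group of users without copying any data.

```python
import sqlite3


def build_warehouse() -> sqlite3.Connection:
    """Create a stand-in warehouse hub plus one virtual mart defined as a view."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical hub table and sample holdings.
    conn.execute(
        "CREATE TABLE positions (ticker TEXT, asset_class TEXT, quantity INTEGER, price REAL)"
    )
    conn.executemany(
        "INSERT INTO positions VALUES (?, ?, ?, ?)",
        [("AAA", "equity", 100, 10.0), ("BBB", "bond", 50, 101.5), ("CCC", "equity", 10, 250.0)],
    )
    # The "mart": only equities, in the derived form the analysis tool wants.
    conn.execute(
        """
        CREATE VIEW equity_mart AS
        SELECT ticker, quantity * price AS market_value
        FROM positions
        WHERE asset_class = 'equity'
        """
    )
    return conn


def read_mart(conn: sqlite3.Connection) -> list:
    """What an analyst's tool would run against the virtual mart."""
    return conn.execute(
        "SELECT ticker, market_value FROM equity_mart ORDER BY ticker"
    ).fetchall()


if __name__ == "__main__":
    warehouse = build_warehouse()
    print(read_mart(warehouse))
    # A change in the hub is visible through the mart immediately; nothing is re-copied.
    warehouse.execute("INSERT INTO positions VALUES ('DDD', 'equity', 5, 20.0)")
    print(read_mart(warehouse))
```

Since the mart holds no data of its own, there is no second copy to drift out of step with the hub, which is the quality and control argument made above.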

Mutual Fund Manager Eliminates "Rogue" Financial Data Marts - A mutual fund company uses data virtualization to enable more than 150 financial analysts to build portfolio analysis models with MATLAB® and other analysis tools, leveraging a wide range of equity financial data from a 10-terabyte financial research data warehouse.  Prior to introducing data virtualization, analysts frequently spawned new satellite data marts with useful data subsets for every new project.  To accelerate and simplify data access and to stop the proliferation of costly, unnecessary physical marts, the firm instead used data virtualization to create virtual data marts formed from a set of robust, reusable views that directly access the financial warehouse on demand.  This enables analysts to spend more time on analysis and less on access, thereby improving portfolio returns.  The IT team has also eliminated extra, unneeded marts and all the costs that go with maintaining them.

4. Complementing the ETL Process
Extract, Transform, and Load (ETL) middleware is the tool of choice for loading data warehouses.  However, there are some cases where ETL tools are not the most effective approach.  Some examples include:

  • ETL tools lack interfaces to easily access source data, for example data from packaged applications such as SAP or new technologies such as web services;
  • Readily available, existing virtual views or data services can be reused rather than building new ETL scripts from scratch; and
  • Tight batch windows require access, abstraction and federation activities to be pre-processed and virtually staged in advance of ETL processes.

ETL tools can leverage data virtualization views and data services as inputs to their batch processes, treating them as just another data source.  This integration pattern also brings in data source types that ETL tools cannot easily access and reuses existing views and services, saving time and costs.  Further, these abstractions do not require ETL developers to understand the structure of, or interact directly with, actual data sources, significantly simplifying their work and reducing time to solution.
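
The division of labor described above can be sketched as follows.  This is an assumption-laden illustration: APP_DOCUMENTS is a made-up nested export standing in for a packaged application's data, ledger_view plays the part of the virtual view, and etl_load is a toy batch loader writing to a hypothetical gl_fact table.  The loader only ever sees flat rows and needs no knowledge of the source structure.

```python
import sqlite3
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical nested export from a packaged application (not a real SAP format).
APP_DOCUMENTS: List[Dict] = [
    {"doc": "1001", "lines": [{"account": "4000", "amount": 150.0},
                              {"account": "5000", "amount": -150.0}]},
    {"doc": "1002", "lines": [{"account": "4000", "amount": 75.5}]},
]

Row = Tuple[str, str, float]


def ledger_view(documents: Iterable[Dict]) -> Iterator[Row]:
    """Virtual view: flatten the nested source into (doc, account, amount) rows."""
    for document in documents:
        for line in document["lines"]:
            yield (document["doc"], line["account"], line["amount"])


def etl_load(rows: Iterable[Row], warehouse: sqlite3.Connection) -> int:
    """Toy batch load: treats the view as just another tabular source."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS gl_fact (doc TEXT, account TEXT, amount REAL)"
    )
    warehouse.executemany("INSERT INTO gl_fact VALUES (?, ?, ?)", rows)
    warehouse.commit()
    return warehouse.execute("SELECT COUNT(*) FROM gl_fact").fetchone()[0]


if __name__ == "__main__":
    target = sqlite3.connect(":memory:")
    print(etl_load(ledger_view(APP_DOCUMENTS), target), "rows loaded")
```

If the source structure changes, only ledger_view needs to change; the load step and its target schema stay as they are.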

Energy Company Preprocesses SAP Data - To provide the SAP financial data required for their financial data warehouse, an energy company uses data virtualization to access and abstract SAP R/3 FICO data.  This replaces an error-prone, SAP data-expert-intensive, flat-file-extraction process that would not scale across a complex SAP landscape.  The results include more complete and timely data in the financial data warehouse enabling better performance management.

5. Data Warehouse Prototyping
Building a new data warehouse from scratch is a large undertaking that requires significant design, development and deployment efforts.  One of the biggest issues is schema change, a frequent activity early in a warehouse's lifecycle.   This change process requires modification of both the ETL scripts and physical data in the warehouse and thus becomes a bottleneck that slows new warehouse deployments.  This problem does not go away later in the lifecycle; it just lessens as the pace of change slows.

Data virtualization middleware can serve as the prototype development environment for a new data warehouse.  In this prototype stage, a virtual data warehouse is built rather than a physical one, saving the time needed to build the physical warehouse.  This virtual warehouse includes a full schema that is easy to iterate on, as well as a complete functional testing environment.  Performance testing is somewhat constrained at this stage, however.
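
A minimal sketch of this prototyping loop, again using SQLite as a stand-in: the warehouse schema starts life as a view that can be redefined in seconds, and is copied into a physical table only once the design settles.  The names src_orders, fact_orders and fact_orders_physical, along with the sample rows, are hypothetical.

```python
import sqlite3
from typing import List


def make_sources() -> sqlite3.Connection:
    """Stand-in source system with a couple of sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE src_orders (id INTEGER, customer TEXT, qty INTEGER, unit_price REAL)"
    )
    conn.executemany(
        "INSERT INTO src_orders VALUES (?, ?, ?, ?)",
        [(1, "Acme", 2, 10.0), (2, "Globex", 1, 99.5)],
    )
    return conn


def define_fact_view(conn: sqlite3.Connection, select_sql: str) -> None:
    """(Re)define the prototype schema; no data is moved or reloaded."""
    conn.execute("DROP VIEW IF EXISTS fact_orders")
    conn.execute(f"CREATE VIEW fact_orders AS {select_sql}")


def columns(conn: sqlite3.Connection, name: str) -> List[str]:
    """Column names of a table or view, read from the cursor metadata."""
    cursor = conn.execute(f"SELECT * FROM {name} LIMIT 0")
    return [col[0] for col in cursor.description]


def materialize(conn: sqlite3.Connection) -> None:
    """Once the schema is agreed, copy the view's rows into a physical table."""
    conn.execute("CREATE TABLE fact_orders_physical AS SELECT * FROM fact_orders")


if __name__ == "__main__":
    db = make_sources()
    define_fact_view(db, "SELECT id, customer, qty FROM src_orders")
    print(columns(db, "fact_orders"))
    # Schema change requested: add revenue. Only the view definition changes.
    define_fact_view(db, "SELECT id, customer, qty, qty * unit_price AS revenue FROM src_orders")
    print(columns(db, "fact_orders"))
    materialize(db)
    print(db.execute("SELECT * FROM fact_orders_physical ORDER BY id").fetchall())
```

Each schema iteration here costs one view redefinition, whereas the physical equivalent would mean altering tables, rewriting load scripts and reloading data.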

Once the actual warehouse is deployed, the views and data services built during the prototype stage still have value.  These are useful for prototyping and testing subsequent warehouse schema changes that arise as business needs or underlying data sources change.

Government Agency Prototypes New Data Warehouses - To reduce data warehousing time-to-solution for new data warehouse projects and changes to existing ones, a government agency uses data virtualization.  Getting the data right this way has proven to be four times faster than directly building the ETL and warehouse, even when the subsequent translation of these working views into ETL scripts and physical warehouse schemas is factored in.

Key Takeaways
As data sources proliferate, including many web-based and cloud computing sources outside the traditional enterprise data warehouse, enterprises and government agencies are deploying solutions that combine enterprise data warehouses and data virtualization to deliver the most comprehensive information to decision-makers.  The results are extended life for existing information system investments, greater agility for adding new BI and other analytic technologies, and less disruption from corporate activities such as mergers and acquisitions.

More Stories By Robert Eve

Robert Eve is the EVP of Marketing at Composite Software, the data virtualization gold standard, and co-author of Data Virtualization: Going Beyond Traditional Data Integration to Achieve Business Agility. Bob's experience includes executive-level roles at leading enterprise software companies such as Mercury Interactive, PeopleSoft, and Oracle. Bob holds a Master of Science from the Massachusetts Institute of Technology and a Bachelor of Science from the University of California at Berkeley.
