Welcome!

Virtualization Authors: Roger Strukhoff, Trevor Parsons, Carmen Gonzalez, JP Morgenthal, Paul Speciale

Related Topics: Virtualization

Virtualization: Article

NoSQL and Data Virtualization - Soon to Be Best Friends

Big Data is increasing the popularity of NoSQL; the challenge - data integration

According to a recent McKinsey Big Data report , "The amount of data in our world has been exploding. Companies capture trillions of bytes of information about their customers, suppliers, and operations, and millions of networked sensors are being embedding in the physical world in devices such as mobile phones and automobiles, sensing, creating and communicating data."

NoSQL is increasingly being adopted with the expansion of “big data” use cases. The challenge to enterprises is to integrate disparate NoSQL systems, each with its unique and non-standard API, with their traditional enterprise data sources.

Today’s evolved data virtualization platforms provide access to information from almost any type of data store and therefore are inherently well-suited to integrate NoSQL data stores into the enterprise.

NoSQL and Data Virtualization Defined
NoSQL data stores manage data that is not strictly tabular and relational. Beyond being non-relational, NoSQL data stores are typically distributed, open-source, and horizontally scalable, although there are exceptions for specific NoSQL data stores.

Data virtualization, has over the past eight years expanded its adoption among enterprises and government agencies due to its ability to evolve rapidly and incorporate the latest IT innovations. Originally limited to relational sources and business intelligence (BI) consumers, data virtualization today supports a wide range of sources including multi-dimensional stores, Web and data services, XML documents, analytic appliances, on- and off-premises applications and more. NoSQL data stores are the newest source type supported by data virtualization.

The NoSQL Data Stores Landscape
Although the original emergence of NoSQL data stores was motivated by Web-scale data, the movement has grown to encompass a wide variety of data stores that do not use SQL as their primary processing language. NoSQL data stores can be categorized as:

-- Tabular/Columnar Data Stores: Storing sparse tabular data, these stores look most like traditional tabular databases. Examples include Hadoop/HBase (Yahoo!), BigTable (Google), Hypertable, and VoltDB.

-- Document Stores: These sources store unstructured (e.g., text) or semi-structured (e.g., XML) documents. Examples include MongoDB, Mark Logic, and CouchDB.

-- Graph Databases: These NoSQL sources store graph-oriented data with nodes, edges and properties, and are commonly used to store associations in social networks. Examples include Neo4J, AllegroGraph, and FlockDB.

-- Key/Value Stores: These sources store simple key/value pairs like a traditional hash table. They are further subdivided into in-memory and disk-based solutions. This category of NoSQL systems probably has the largest number of members, each embodying slightly different characteristics. Examples include Memcached, Cassandra (Facebook), SimpleDB, Dynamo (Amazon), Voldemort (LinkedIn), and Kyoto Cabinet.

-- Object and Multi-value Databases: These types of stores preceded the NoSQL movement but they have found new life as part of the movement. Object databases store objects (as in object-oriented programming). Multi-value databases store tabular data, but individual cells can store multiple values. Examples include Objectivity, GemStone, and Unidata.

-- Miscellaneous Sources: Several other data stores can be classified as NoSQL stores, although they don’t fit into any of the categories above. Examples include GT.M, IBM Lotus/Domino, and the ISIS family.

Virtualizing NoSQL Data Store Sources
Data virtualization platforms provide a complete toolset for accessing, federating, abstracting, and delivering information from diverse sources. Access is typically done via standards-based protocols and APIs; for example, JDBC and ODBC for SQL-based sources, HTTP and SOAP for Web services, JMS for messages, and APIs for enterprise and cloud-based applications. Through these methods, source data is securely exposed from a single virtual location, regardless of how and where it is physically stored.

Although NoSQL access standards have yet to develop fully, each implementation provides a Java-based development API appropriate for accessing that type of NoSQL data. Data virtualization platforms typically use these APIs to access and integrate data. Three kinds of NoSQL systems are particularly suited for the data virtualization platform: tabular/columnar data, XML documents, and key-value stores.

How to Integrate Tabular/Columnar Data Stores
Because data virtualization platforms were originally designed for tabular data, retrieving and processing data from this category is a natural fit. The preferred data retrieval paradigm for tabular/columnar data stores leverages “table functions” in the FROM clause of a SQL statement. That is, a procedure resource that returns a cursor can be dropped into the data virtualization development environment as a table, where it will show up in the FROM clause of the SQL statement.

Tabular/columnar NoSQL data sources typically store very large data sets. Table function implementations should ensure sufficient data reduction within the source by leveraging input parameters. Also, the processing of large data sets can take a long time, so some form of caching may be prudent to retain the results for reuse.

This approach provides full access to the data in the underlying NoSQL source, and it will likely be sufficient for most near-term needs. However, more generic filtering and aggregation might be possible with the underlying NoSQL source, and purpose-built table functions provide only a limited interface to the data virtualization platform. If a particular NoSQL tabular data store becomes quite popular, expect data virtualization platform providers to develop a custom adapter that more fully integrates and leverages that data source’s specific capabilities.

How to Integrate XML Document Stores
Because XML document store sources leverage XQuery as their preferred data retrieval paradigm, data virtualization platforms with embedded XQuery engines (and XML as a native data type) can easily retrieve and further process documents from this category of NoSQL data store.

For a specific NoSQL XML document store, a minimum of two custom procedures can be implemented that leverage the NoSQL system’s Java API. Both procedures would return an XML document that can be further manipulated by any of the upstream XML manipulation functionality (e.g., XSLT transformations). The first procedure takes a document handle (unique identifier) as its input argument and leverages the API to retrieve and return that document. The second procedure takes an XQuery specification as its input argument and leverages the API to execute the query and return the results as a single document. Of course, additional procedures accepting more specific parameters could also be implemented, making integration into multiple views easier.

How to Integrate Key/Value Stores
Data virtualization platforms can integrate key/value stores in two ways. The first is as a simple custom SQL function. This function can be created so that it takes the key as a parameter and returns the value. This common function can then be used in SQL statements throughout the data virtualization platform.

The second leverages an in-memory key/value store as a cache target. This approach is best for small data sets or procedure results; it doesn’t work very well for large tabular data sets. Further, this form of cache integration is often challenged by the impedance mismatch between cached tabular data and cached key/value data (the cached data is opaque inside the key/value store), so the entire set must be retrieved for processing.

Key Takeaways
Web analytics, predictive analytics, voice-of-the-customer, churn, fraud, sensor-tracking and other “big data” use cases are accelerating demand  for NoSQL data stores as well as for the integration of NoSQL data with enterprise data.  Data virtualization, a more modern and versatile approach to data integration, is proving a successful solution to this fast growing problem.

More Stories By Robert Eve

Robert Eve is the EVP of Marketing at Composite Software, the data virtualization gold standard and co-author of Data Virtualization: Going Beyond Traditional Data Integration to Achieve Business Agility. Bob's experience includes executive level roles at leading enterprise software companies such as Mercury Interactive, PeopleSoft, and Oracle. Bob holds a Masters of Science from the Massachusetts Institute of Technology and a Bachelor of Science from the University of California at Berkeley.

@ThingsExpo Stories
The Internet of Things will greatly expand the opportunities for data collection and new business models driven off of that data. In her session at Internet of @ThingsExpo, Esmeralda Swartz, CMO of MetraTech, will discuss how for this to be effective you not only need to have infrastructure and operational models capable of utilizing this new phenomenon, but increasingly service providers will need to convince a skeptical public to participate. Get ready to show them the money! Speaker Bio: Esmeralda Swartz, CMO of MetraTech, has spent 16 years as a marketing, product management, and busin...
Samsung VP Jacopo Lenzi, who headed the company's recent SmartThings acquisition under the auspices of Samsung's Open Innovaction Center (OIC), answered a few questions we had about the deal. This interview was in conjunction with our interview with SmartThings CEO Alex Hawkinson. IoT Journal: SmartThings was developed in an open, standards-agnostic platform, and will now be part of Samsung's Open Innovation Center. Can you elaborate on your commitment to keep the platform open? Jacopo Lenzi: Samsung recognizes that true, accelerated innovation cannot be driven from one source, but requires a...
SYS-CON Events announced today that Red Hat, the world's leading provider of open source solutions, will exhibit at Internet of @ThingsExpo, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Red Hat is the world's leading provider of open source software solutions, using a community-powered approach to reliable and high-performing cloud, Linux, middleware, storage and virtualization technologies. Red Hat also offers award-winning support, training, and consulting services. As the connective hub in a global network of enterprises, partners, a...
P2P RTC will impact the landscape of communications, shifting from traditional telephony style communications models to OTT (Over-The-Top) cloud assisted & PaaS (Platform as a Service) communication services. The P2P shift will impact many areas of our lives, from mobile communication, human interactive web services, RTC and telephony infrastructure, user federation, security and privacy implications, business costs, and scalability. In his session at Internet of @ThingsExpo, Robin Raymond, Chief Architect at Hookflash Inc., will walk through the shifting landscape of traditional telephone a...
SYS-CON Events announced today that Matrix.org has been named “Silver Sponsor” of Internet of @ThingsExpo, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Matrix is an ambitious new open standard for open, distributed, real-time communication over IP. It defines a new approach for interoperable Instant Messaging and VoIP based on pragmatic HTTP APIs and WebRTC, and provides open source reference implementations to showcase and bootstrap the new standard. Our focus is on simplicity, security, and supporting the fullest feature set.
BSQUARE is a global leader of embedded software solutions. We enable smart connected systems at the device level and beyond that millions use every day and provide actionable data solutions for the growing Internet of Things (IoT) market. We empower our world-class customers with our products, services and solutions to achieve innovation and success. For more information, visit www.bsquare.com.
How do APIs and IoT relate? The answer is not as simple as merely adding an API on top of a dumb device, but rather about understanding the architectural patterns for implementing an IoT fabric. There are typically two or three trends: Exposing the device to a management framework Exposing that management framework to a business centric logic • Exposing that business layer and data to end users. This last trend is the IoT stack, which involves a new shift in the separation of what stuff happens, where data lives and where the interface lies. For instance, it’s a mix of architectural style...
SYS-CON Events announced today that SOA Software, an API management leader, will exhibit at SYS-CON's 15th International Cloud Expo®, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. SOA Software is a leading provider of API Management and SOA Governance products that equip business to deliver APIs and SOA together to drive their company to meet its business strategy quickly and effectively. SOA Software’s technology helps businesses to accelerate their digital channels with APIs, drive partner adoption, monetize their assets, and achieve a...
From a software development perspective IoT is about programming "things," about connecting them with each other or integrating them with existing applications. In his session at @ThingsExpo, Yakov Fain, co-founder of Farata Systems and SuranceBay, will show you how small IoT-enabled devices from multiple manufacturers can be integrated into the workflow of an enterprise application. This is a practical demo of building a framework and components in HTML/Java/Mobile technologies to serve as a platform that can integrate new devices as they become available on the market.
SYS-CON Events announced today that Utimaco will exhibit at SYS-CON's 15th International Cloud Expo®, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Utimaco is a leading manufacturer of hardware based security solutions that provide the root of trust to keep cryptographic keys safe, secure critical digital infrastructures and protect high value data assets. Only Utimaco delivers a general-purpose hardware security module (HSM) as a customizable platform to easily integrate into existing software solutions, embed business logic and build s...
Connected devices are changing the way we go about our everyday life, from wearables to driverless cars, to smart grids and entire industries revolutionizing business opportunities through smart objects, capable of two-way communication. But what happens when objects are given an IP-address, and we rely on that connection, sometimes with our lives? How do we secure those vast data infrastructures and safe-keep the privacy of sensitive information? This session will outline how each and every connected device can uphold a core root of trust via a unique cryptographic signature – a “bir...
Internet of @ThingsExpo Silicon Valley announced on Thursday its first 12 all-star speakers and sessions for its upcoming event, which will take place November 4-6, 2014, at the Santa Clara Convention Center in California. @ThingsExpo, the first and largest IoT event in the world, debuted at the Javits Center in New York City in June 10-12, 2014 with over 6,000 delegates attending the conference. Among the first 12 announced world class speakers, IBM will present two highly popular IoT sessions, which will take place November 4-6, 2014 at the Santa Clara Convention Center in Santa Clara, Calif...
Almost everyone sees the potential of Internet of Things but how can businesses truly unlock that potential. The key will be in the ability to discover business insight in the midst of an ocean of Big Data generated from billions of embedded devices via Systems of Discover. Businesses will also need to ensure that they can sustain that insight by leveraging the cloud for global reach, scale and elasticity.
WebRTC defines no default signaling protocol, causing fragmentation between WebRTC silos. SIP and XMPP provide possibilities, but come with considerable complexity and are not designed for use in a web environment. In his session at Internet of @ThingsExpo, Matthew Hodgson, technical co-founder of the Matrix.org, will discuss how Matrix is a new non-profit Open Source Project that defines both a new HTTP-based standard for VoIP & IM signaling and provides reference implementations.

SUNNYVALE, Calif., Oct. 20, 2014 /PRNewswire/ -- Spansion Inc. (NYSE: CODE), a global leader in embedded systems, today added 96 new products to the Spansion® FM4 Family of flexible microcontrollers (MCUs). Based on the ARM® Cortex®-M4F core, the new MCUs boast a 200 MHz operating frequency and support a diverse set of on-chip peripherals for enhanced human machine interfaces (HMIs) and machine-to-machine (M2M) communications. The rich set of periphera...

SYS-CON Events announced today that Aria Systems, the recurring revenue expert, has been named "Bronze Sponsor" of SYS-CON's 15th International Cloud Expo®, which will take place on November 4-6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Aria Systems helps leading businesses connect their customers with the products and services they love. Industry leaders like Pitney Bowes, Experian, AAA NCNU, VMware, HootSuite and many others choose Aria to power their recurring revenue business and deliver exceptional experiences to their customers.
The Internet of Things (IoT) is going to require a new way of thinking and of developing software for speed, security and innovation. This requires IT leaders to balance business as usual while anticipating for the next market and technology trends. Cloud provides the right IT asset portfolio to help today’s IT leaders manage the old and prepare for the new. Today the cloud conversation is evolving from private and public to hybrid. This session will provide use cases and insights to reinforce the value of the network in helping organizations to maximize their company’s cloud experience.
The Internet of Things (IoT) is making everything it touches smarter – smart devices, smart cars and smart cities. And lucky us, we’re just beginning to reap the benefits as we work toward a networked society. However, this technology-driven innovation is impacting more than just individuals. The IoT has an environmental impact as well, which brings us to the theme of this month’s #IoTuesday Twitter chat. The ability to remove inefficiencies through connected objects is driving change throughout every sector, including waste management. BigBelly Solar, located just outside of Boston, is trans...
SYS-CON Events announced today that Matrix.org has been named “Silver Sponsor” of Internet of @ThingsExpo, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Matrix is an ambitious new open standard for open, distributed, real-time communication over IP. It defines a new approach for interoperable Instant Messaging and VoIP based on pragmatic HTTP APIs and WebRTC, and provides open source reference implementations to showcase and bootstrap the new standard. Our focus is on simplicity, security, and supporting the fullest feature set.
Predicted by Gartner to add $1.9 trillion to the global economy by 2020, the Internet of Everything (IoE) is based on the idea that devices, systems and services will connect in simple, transparent ways, enabling seamless interactions among devices across brands and sectors. As this vision unfolds, it is clear that no single company can accomplish the level of interoperability required to support the horizontal aspects of the IoE. The AllSeen Alliance, announced in December 2013, was formed with the goal to advance IoE adoption and innovation in the connected home, healthcare, education, aut...