NoSQL and Data Virtualization - Soon to Be Best Friends

Big Data is increasing the popularity of NoSQL; the challenge is data integration

According to a recent McKinsey Big Data report, "The amount of data in our world has been exploding. Companies capture trillions of bytes of information about their customers, suppliers, and operations, and millions of networked sensors are being embedded in the physical world in devices such as mobile phones and automobiles, sensing, creating and communicating data."

NoSQL is increasingly being adopted with the expansion of “big data” use cases. The challenge to enterprises is to integrate disparate NoSQL systems, each with its unique and non-standard API, with their traditional enterprise data sources.

Today’s evolved data virtualization platforms provide access to information from almost any type of data store and therefore are inherently well-suited to integrate NoSQL data stores into the enterprise.

NoSQL and Data Virtualization Defined
NoSQL data stores manage data that is not strictly tabular and relational. Beyond being non-relational, NoSQL data stores are typically distributed, open-source, and horizontally scalable, although there are exceptions for specific NoSQL data stores.

Over the past eight years, data virtualization has expanded its adoption among enterprises and government agencies due to its ability to evolve rapidly and incorporate the latest IT innovations. Originally limited to relational sources and business intelligence (BI) consumers, data virtualization today supports a wide range of sources including multi-dimensional stores, Web and data services, XML documents, analytic appliances, on- and off-premises applications and more. NoSQL data stores are the newest source type supported by data virtualization.

The NoSQL Data Stores Landscape
Although the original emergence of NoSQL data stores was motivated by Web-scale data, the movement has grown to encompass a wide variety of data stores that do not use SQL as their primary processing language. NoSQL data stores can be categorized as:

-- Tabular/Columnar Data Stores: Storing sparse tabular data, these stores look most like traditional tabular databases. Examples include Hadoop/HBase (Yahoo!), BigTable (Google), Hypertable, and VoltDB.

-- Document Stores: These sources store unstructured (e.g., text) or semi-structured (e.g., XML) documents. Examples include MongoDB, Mark Logic, and CouchDB.

-- Graph Databases: These NoSQL sources store graph-oriented data with nodes, edges and properties, and are commonly used to store associations in social networks. Examples include Neo4J, AllegroGraph, and FlockDB.

-- Key/Value Stores: These sources store simple key/value pairs like a traditional hash table. They are further subdivided into in-memory and disk-based solutions. This category of NoSQL systems probably has the largest number of members, each embodying slightly different characteristics. Examples include Memcached, Cassandra (Facebook), SimpleDB, Dynamo (Amazon), Voldemort (LinkedIn), and Kyoto Cabinet.

-- Object and Multi-value Databases: These types of stores preceded the NoSQL movement but they have found new life as part of the movement. Object databases store objects (as in object-oriented programming). Multi-value databases store tabular data, but individual cells can store multiple values. Examples include Objectivity, GemStone, and Unidata.

-- Miscellaneous Sources: Several other data stores can be classified as NoSQL stores, although they don’t fit into any of the categories above. Examples include GT.M, IBM Lotus/Domino, and the ISIS family.

Virtualizing NoSQL Data Store Sources
Data virtualization platforms provide a complete toolset for accessing, federating, abstracting, and delivering information from diverse sources. Access is typically done via standards-based protocols and APIs; for example, JDBC and ODBC for SQL-based sources, HTTP and SOAP for Web services, JMS for messages, and APIs for enterprise and cloud-based applications. Through these methods, source data is securely exposed from a single virtual location, regardless of how and where it is physically stored.

Although NoSQL access standards have yet to develop fully, each implementation provides a Java-based development API appropriate for accessing that type of NoSQL data. Data virtualization platforms typically use these APIs to access and integrate data. Three kinds of NoSQL systems are particularly suited for the data virtualization platform: tabular/columnar data stores, XML document stores, and key/value stores.

How to Integrate Tabular/Columnar Data Stores
Because data virtualization platforms were originally designed for tabular data, retrieving and processing data from this category is a natural fit. The preferred data retrieval paradigm for tabular/columnar data stores leverages “table functions” in the FROM clause of a SQL statement. That is, a procedure resource that returns a cursor can be dropped into the data virtualization development environment as a table, where it will show up in the FROM clause of the SQL statement.

Tabular/columnar NoSQL data sources typically store very large data sets. Table function implementations should ensure sufficient data reduction within the source by leveraging input parameters. Also, the processing of large data sets can take a long time, so some form of caching may be prudent to retain the results for reuse.
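
The table-function pattern described above can be sketched roughly as follows. This is a minimal illustration, not any vendor's implementation: FAKE_STORE, clicks_by_region, and federate are hypothetical names, a Python dict stands in for the columnar store, and an in-memory SQLite database stands in for the data virtualization layer. The input parameters reduce the data inside the "source," and functools.lru_cache plays the role of the result cache.

```python
import sqlite3
from functools import lru_cache

# Hypothetical stand-in for a tabular/columnar NoSQL store:
# row key -> sparse set of columns (not every row has every column).
FAKE_STORE = {
    "row1": {"region": "EMEA", "customer_id": 1, "clicks": 120},
    "row2": {"region": "APAC", "customer_id": 2, "clicks": 75},
    "row3": {"region": "EMEA", "customer_id": 2, "clicks": 300},
    "row4": {"region": "EMEA", "customer_id": 1},  # sparse: no "clicks"
}


@lru_cache(maxsize=32)  # retain results for reuse, as the text suggests
def clicks_by_region(region, min_clicks=0):
    """Table function: the parameters filter rows at the source so that
    only a reduced row set reaches the virtualization layer."""
    rows = []
    for row_key, cols in FAKE_STORE.items():
        if cols.get("region") != region:
            continue
        clicks = cols.get("clicks", 0)
        if clicks < min_clicks:
            continue
        rows.append((row_key, cols["customer_id"], clicks))
    return tuple(rows)  # immutable, so the cached value cannot be altered


def federate(region, min_clicks):
    """Expose the table function's rows as a table in a FROM clause and
    join them with ordinary relational data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
    )
    conn.execute(
        "CREATE TEMP TABLE clicks "
        "(row_key TEXT, customer_id INTEGER, clicks INTEGER)"
    )
    conn.executemany(
        "INSERT INTO clicks VALUES (?, ?, ?)",
        clicks_by_region(region, min_clicks),
    )
    result = conn.execute(
        "SELECT c.name, SUM(k.clicks) "
        "FROM clicks AS k JOIN customers AS c ON c.id = k.customer_id "
        "GROUP BY c.name ORDER BY c.name"
    ).fetchall()
    conn.close()
    return result
```

Calling federate("EMEA", 100) joins only the rows that survive the source-side filter, and a second call with the same arguments is served from the cache rather than rescanning the store. A real platform would stream a cursor instead of copying rows into a temporary table; the copy here only keeps the sketch self-contained.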

This approach provides full access to the data in the underlying NoSQL source, and it will likely be sufficient for most near-term needs. However, more generic filtering and aggregation might be possible with the underlying NoSQL source, and purpose-built table functions provide only a limited interface to the data virtualization platform. If a particular NoSQL tabular data store becomes quite popular, expect data virtualization platform providers to develop a custom adapter that more fully integrates and leverages that data source’s specific capabilities.

How to Integrate XML Document Stores
Because XML document store sources leverage XQuery as their preferred data retrieval paradigm, data virtualization platforms with embedded XQuery engines (and XML as a native data type) can easily retrieve and further process documents from this category of NoSQL data store.

For a specific NoSQL XML document store, a minimum of two custom procedures can be implemented that leverage the NoSQL system’s Java API. Both procedures would return an XML document that can be further manipulated by any of the upstream XML manipulation functionality (e.g., XSLT transformations). The first procedure takes a document handle (unique identifier) as its input argument and leverages the API to retrieve and return that document. The second procedure takes an XQuery specification as its input argument and leverages the API to execute the query and return the results as a single document. Of course, additional procedures accepting more specific parameters could also be implemented, making integration into multiple views easier.
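
A rough sketch of the two procedures follows, under loud assumptions: FAKE_XML_STORE, get_document, and query_documents are hypothetical names, a Python dict stands in for the document store and its API, and the limited XPath subset in Python's xml.etree.ElementTree stands in for XQuery, which the standard library does not provide.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an XML document store: handle -> XML text.
FAKE_XML_STORE = {
    "doc-1": "<order id='1'><item sku='A1' qty='2'/><item sku='B7' qty='1'/></order>",
    "doc-2": "<order id='2'><item sku='A1' qty='5'/></order>",
}


def get_document(handle):
    """Procedure 1: take a document handle and return that document."""
    return ET.fromstring(FAKE_XML_STORE[handle])


def query_documents(path_expr):
    """Procedure 2: run a query over the store and return all matches
    wrapped in a single <results> document.

    Assumption: path_expr uses ElementTree's XPath subset as a stand-in
    for an XQuery specification."""
    results = ET.Element("results")
    for handle in sorted(FAKE_XML_STORE):
        doc = ET.fromstring(FAKE_XML_STORE[handle])
        for match in doc.findall(path_expr):
            match.set("source", handle)  # record which document matched
            results.append(match)
    return results
```

For instance, query_documents(".//item[@sku='A1']") returns one results document containing the matching item elements from both orders, ready for further XML processing such as an XSLT transformation.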

How to Integrate Key/Value Stores
Data virtualization platforms can integrate key/value stores in two ways. The first is as a simple custom SQL function. This function can be created so that it takes the key as a parameter and returns the value. This common function can then be used in SQL statements throughout the data virtualization platform.
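
The custom SQL function approach might look like the sketch below. It assumes a Python dict (FAKE_KV_STORE, a hypothetical name) in place of a real key/value store, and uses SQLite's create_function to register the lookup; kv_get, open_virtual_db, and customer_tiers are illustrative names only.

```python
import sqlite3

# Hypothetical stand-in for a key/value store.
FAKE_KV_STORE = {"cust:1": "gold", "cust:2": "silver"}


def kv_get(key):
    """Take the key as a parameter and return the value (None if absent)."""
    return FAKE_KV_STORE.get(key)


def open_virtual_db():
    """Create a small relational source and register kv_get as a SQL function."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("kv_get", 1, kv_get)
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex"), (3, "Initech")],
    )
    return conn


def customer_tiers(conn):
    """Use the key/value lookup inline in an ordinary SQL statement."""
    return conn.execute(
        "SELECT name, kv_get('cust:' || id) AS tier "
        "FROM customers ORDER BY id"
    ).fetchall()
```

Once registered, the function can appear in any SQL statement on that connection; a key with no entry simply yields NULL for that row.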

The second leverages an in-memory key/value store as a cache target. This approach is best for small data sets or procedure results; it doesn’t work very well for large tabular data sets. Further, this form of cache integration is often challenged by the impedance mismatch between cached tabular data and cached key/value data (the cached data is opaque inside the key/value store), so the entire set must be retrieved for processing.
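
The impedance mismatch can be illustrated with a small sketch. The names FAKE_CACHE, cache_result_set, and cached_rows_where are hypothetical, and a Python dict stands in for an in-memory key/value store. The cached result set is stored as one opaque value, so even a selective filter has to fetch and decode the whole set first.

```python
import json

# Hypothetical stand-in for an in-memory key/value store used as a cache.
FAKE_CACHE = {}


def cache_result_set(key, rows):
    """Store an entire result set under one key as an opaque serialized value."""
    FAKE_CACHE[key] = json.dumps(rows)


def cached_rows_where(key, predicate):
    """Filter a cached result set. The store cannot look inside the value,
    so the whole set is retrieved and decoded before any row is tested."""
    blob = FAKE_CACHE.get(key)
    if blob is None:
        return None  # cache miss: the caller must go back to the source
    rows = json.loads(blob)
    return [row for row in rows if predicate(row)]
```

For a small result set this overhead is negligible, which is why the approach suits small data sets or procedure results; for a large tabular set, every query pays the cost of moving and decoding all rows.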

Key Takeaways
Web analytics, predictive analytics, voice-of-the-customer, churn, fraud, sensor-tracking and other “big data” use cases are accelerating demand for NoSQL data stores as well as for the integration of NoSQL data with enterprise data. Data virtualization, a more modern and versatile approach to data integration, is proving a successful solution to this fast-growing problem.

More Stories By Robert Eve

Robert Eve is the EVP of Marketing at Composite Software, the data virtualization gold standard, and co-author of Data Virtualization: Going Beyond Traditional Data Integration to Achieve Business Agility. Bob's experience includes executive-level roles at leading enterprise software companies such as Mercury Interactive, PeopleSoft, and Oracle. Bob holds a Master of Science from the Massachusetts Institute of Technology and a Bachelor of Science from the University of California at Berkeley.