|By Jason Bloomberg||
|February 28, 2013 08:00 AM EST||
Scenario #1: out of the blue, your boss calls, looking for some long-forgotten entry in a spreadsheet from 1989. Where do you look? Or consider scenario #2: said boss calls again, only this time she wants you to analyze customer purchasing behavior...going back to 1980. Similar problem, only instead of finding a single datum, you must find years of ancient information and prepare it for analysis with a modern business intelligence tool.
The answer, of course, is archiving. Fortunately, you (or your predecessor, or predecessor's predecessor) have been archiving important-or potentially important-corporate data since your organization first started using computers back in the 1960s. So all you have to do to keep your boss happy is find the appropriate archives, recover the necessary data, and you're good to go, right?
Not so fast. There are a number of gotchas to this story, some more obvious than others. Cloud to the rescue? Perhaps, but many archiving challenges remain, and the Cloud actually introduces some new speed bumps as well. Now factor in Big Data. Sure, Big Data are big, so archiving Big Data requires a big archive. Lucky you-vendors have already been knocking on your door peddling Big Data archiving solutions. Now can you finally breathe easy? Maybe, maybe not. Here's why.
Archiving: The Long View
So much of our digital lives have taken place over the last twenty years or so that we forget that digital computing dates back to the 1940s-and furthermore, we forget that this sixty-odd year lifetime of the Information Age is really only the first act of perhaps centuries of computing before humankind either evolves past zeroes and ones altogether or kills itself off in the process. Our technologies for archiving information, however, are woefully shortsighted, for several reasons:
- Hardware obsolescence (three to five years) - Using a hard drive or tape drive for archiving? It won't be long till the hardware is obsolete. You may get more life out of the gear you own, but one it wears out, you'll be stuck. Anyone who archived to laser disk in the 1980s has been down this road.
- File format obsolescence (five to ten years) - True, today's Office products can probably read that file originally saved in the Microsoft Excel version 1 file format back in the day, but what about those VisiCalc or Lotus 123 files? Tools that will convert such files to their modern equivalents will eventually grow increasingly scarce, and you always risk the possibility that they won't handle the conversion properly, leading to data corruption. If your data are encrypted, then your encryption format falls into the file format obsolescence bucket as well. And what about the programs themselves? From simple spreadsheet formulas to complex legacy spaghetti code, how do you archive algorithms in an obsolescence-proof format?
- Media obsolescence (ten to fifteen years) - CD-ROMs and digital backup tapes have an expected lifetime. Keeping them cool and dry can extend their life, but actually using them will shorten it. Do you really want to rely upon a fifteen-year-old backup tape for critical information?
- Computing paradigm obsolescence (fifty years perhaps; it's anybody's guess) - will quantum computing or biological processors or some other futuristic gear drive binary digital technologies into the Stone Age? Only time will tell. But if you are forward thinking enough to archive information for the 22nd century, there's no telling what you'll need to do to maintain the viability of your archives in a post-binary world.
Cloud to the Rescue?
On the surface, letting your Cloud Service Provider (CSP) archive your data solves many of these issues. Not only are the new archiving services like Amazon Glacier impressively cost-effective, but we can feel reasonably comfortable counting on today's CSPs to migrate our data from one hardware/media platform to the next over time as technology advances. So, can Cloud solve all your archiving issues?
At some point the answer may be yes, but Cloud Computing is still far too immature to jump to such a conclusion. Will your CSP still be in business decades from now? As the CSP market undergoes its inevitable consolidation phase, will the new CSP who bought out your old CSP handle your archive properly? Only time will tell.
But even if the CSPs rise to the archiving challenge, you may still have the file format challenge. Sure, archiving those old Lotus 123 files in the Cloud is a piece of cake, but that doesn't mean that your CSP will return them in Excel version 21.3 format ten years hence-an unfortunate and unintentional example of garbage in the Cloud.
The Big Data Old Tail
You might think that the challenges inherent in archiving Big Data are simply a matter of degree: bigger storage for bigger data sets, right? But thinking of Big Data as little more than extra-large data sets misses the big picture of the importance of Big Data.
The point to Big Data is that the indicated data sets continue to grow in size on an ongoing basis, continually pushing the limits of existing technology. The more capacity available for storage and processing, the larger the data sets we end up with. In other words, Big Data are by definition a moving target.
One familiar estimate states that the quantity of data in the world doubles every two years. Your organization's Big Data may grow somewhat faster or slower than this convenient benchmark, but in any case, the point is that Big Data growth is exponential. So, taking the two-year doubling factor as a rule of thumb, we can safely say that at any point in time, half of your Big Data are less than two years old, while the other half of your Big Data are more than two years old. And of course, this ZapFlash is concerned with the older half.
The Big Data archiving challenge, therefore, is breaking down the more-than-two-years-old Big Data sets. Remember that this two-year window is true at any point in time. Thinking about the problem mathematically, then, you can conclude that a quarter of your Big Data are more than four years old, an eighth are more than six years old, etc.
Combine this math with the lesson of the first part of this ZapFlash, and a critical point emerges: byte for byte, the cost of maintaining usable archives increases the older those archives become. And yet, the relative size of those archives is vanishingly small relative to today's and tomorrow's Big Data. Furthermore, this problem will only get worse over time, because the size of the Old Tail continues to grow exponentially.
We call this Big Data archiving problem the Big Data Old Tail. Similar to the Long Tail argument, which focuses on the value inherent in summing up the Long Tail of customer demand for niche products, the Big Data Old Tail focuses on the costs inherent in maintaining archives of increasingly small, yet increasingly costly data as we struggle to deal with older and older information. True, perhaps the fact that the Old Tail data sets from a particular time period are small will compensate for the fact that they are costly to archive, but remember that the Old Tail continues to grow over time. Unless we deal with the Old Tail, it threatens to overwhelm us.
The ZapThink Take
The obvious question that comes to mind is whether we need to save all those old data sets anyway. After all, who cares about, say, purchasing data from 1982? And of course, you may have a business reason for deleting old information. Since information you preserve may be subject to lawsuits or other unpleasantness, you may wish to delete data once it's legal to do so.
Fair enough. But there are perhaps far more examples of Big Data sets that your organization will wish to preserve indefinitely than data sets you're happy to delete. From scientific data to information on market behavior to social trends, the richness of our Big Data do not simply depend on the information from the last year or two or even ten. After all, if we forget the mistakes of the past then we are doomed to repeat them. Crunching today's Big Data can give us business intelligence, but only by crunching yesterday's Big Data as well can we ever expect to glean wisdom from our information.
SYS-CON Events announced today that BMC Software has been named "Siver Sponsor" of SYS-CON's 18th Cloud Expo, which will take place on June 7-9, 2015 at the Javits Center in New York, New York. BMC is a global leader in innovative software solutions that help businesses transform into digital enterprises for the ultimate competitive advantage. BMC Digital Enterprise Management is a set of innovative IT solutions designed to make digital business fast, seamless, and optimized from mainframe to mo...
May. 28, 2016 09:45 AM EDT Reads: 2,238
What a difference a year makes. Organizations aren’t just talking about IoT possibilities, it is now baked into their core business strategy. With IoT, billions of devices generating data from different companies on different networks around the globe need to interact. From efficiency to better customer insights to completely new business models, IoT will turn traditional business models upside down. In the new customer-centric age, the key to success is delivering critical services and apps wit...
May. 28, 2016 09:15 AM EDT Reads: 1,170
Join us at Cloud Expo | @ThingsExpo 2016 – June 7-9 at the Javits Center in New York City and November 1-3 at the Santa Clara Convention Center in Santa Clara, CA – and deliver your unique message in a way that is striking and unforgettable by taking advantage of SYS-CON's unmatched high-impact, result-driven event / media packages.
May. 28, 2016 09:00 AM EDT Reads: 2,408
In his keynote at 18th Cloud Expo, Andrew Keys, Co-Founder of ConsenSys Enterprise, will provide an overview of the evolution of the Internet and the Database and the future of their combination – the Blockchain. Andrew Keys is Co-Founder of ConsenSys Enterprise. He comes to ConsenSys Enterprise with capital markets, technology and entrepreneurial experience. Previously, he worked for UBS investment bank in equities analysis. Later, he was responsible for the creation and distribution of life ...
May. 28, 2016 08:45 AM EDT Reads: 1,962
SYS-CON Events announced today that MobiDev will exhibit at SYS-CON's 18th International Cloud Expo®, which will take place on June 7-9, 2016, at the Javits Center in New York City, NY. MobiDev is a software company that develops and delivers turn-key mobile apps, websites, web services, and complex software systems for startups and enterprises. Since 2009 it has grown from a small group of passionate engineers and business managers to a full-scale mobile software company with over 200 develope...
May. 28, 2016 07:15 AM EDT Reads: 2,632
SoftLayer operates a global cloud infrastructure platform built for Internet scale. With a global footprint of data centers and network points of presence, SoftLayer provides infrastructure as a service to leading-edge customers ranging from Web startups to global enterprises. SoftLayer's modular architecture, full-featured API, and sophisticated automation provide unparalleled performance and control. Its flexible unified platform seamlessly spans physical and virtual devices linked via a world...
May. 28, 2016 06:00 AM EDT Reads: 2,237
Companies can harness IoT and predictive analytics to sustain business continuity; predict and manage site performance during emergencies; minimize expensive reactive maintenance; and forecast equipment and maintenance budgets and expenditures. Providing cost-effective, uninterrupted service is challenging, particularly for organizations with geographically dispersed operations.
May. 28, 2016 05:00 AM EDT Reads: 2,098
SYS-CON Events announced today TechTarget has been named “Media Sponsor” of SYS-CON's 18th International Cloud Expo, which will take place on June 7–9, 2016, at the Javits Center in New York City, NY, and the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. TechTarget is the Web’s leading destination for serious technology buyers researching and making enterprise technology decisions. Its extensive global networ...
May. 28, 2016 05:00 AM EDT Reads: 3,219
SYS-CON Events announced today that Commvault, a global leader in enterprise data protection and information management, has been named “Bronze Sponsor” of SYS-CON's 18th International Cloud Expo, which will take place on June 7–9, 2016, at the Javits Center in New York City, NY, and the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Commvault is a leading provider of data protection and information management...
May. 28, 2016 04:15 AM EDT Reads: 3,179
As cloud and storage projections continue to rise, the number of organizations moving to the cloud is escalating and it is clear cloud storage is here to stay. However, is it secure? Data is the lifeblood for government entities, countries, cloud service providers and enterprises alike and losing or exposing that data can have disastrous results. There are new concepts for data storage on the horizon that will deliver secure solutions for storing and moving sensitive data around the world. ...
May. 28, 2016 03:00 AM EDT Reads: 1,307
SYS-CON Events announced today Object Management Group® has been named “Media Sponsor” of SYS-CON's 18th International Cloud Expo, which will take place on June 7–9, 2016, at the Javits Center in New York City, NY, and the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA.
May. 28, 2016 03:00 AM EDT Reads: 2,527
SYS-CON Events announced today that MangoApps will exhibit at SYS-CON's 18th International Cloud Expo®, which will take place on June 7-9, 2016, at the Javits Center in New York City, NY. MangoApps provides modern company intranets and team collaboration software, allowing workers to stay connected and productive from anywhere in the world and from any device. For more information, please visit https://www.mangoapps.com/.
May. 28, 2016 02:30 AM EDT Reads: 815
The essence of data analysis involves setting up data pipelines that consist of several operations that are chained together – starting from data collection, data quality checks, data integration, data analysis and data visualization (including the setting up of interaction paths in that visualization). In our opinion, the challenges stem from the technology diversity at each stage of the data pipeline as well as the lack of process around the analysis.
May. 28, 2016 01:30 AM EDT Reads: 1,450
The IoTs will challenge the status quo of how IT and development organizations operate. Or will it? Certainly the fog layer of IoT requires special insights about data ontology, security and transactional integrity. But the developmental challenges are the same: People, Process and Platform. In his session at @ThingsExpo, Craig Sproule, CEO of Metavine, will demonstrate how to move beyond today's coding paradigm and share the must-have mindsets for removing complexity from the development proc...
May. 28, 2016 01:00 AM EDT Reads: 1,930
Designing IoT applications is complex, but deploying them in a scalable fashion is even more complex. A scalable, API first IaaS cloud is a good start, but in order to understand the various components specific to deploying IoT applications, one needs to understand the architecture of these applications and figure out how to scale these components independently. In his session at @ThingsExpo, Nara Rajagopalan is CEO of Accelerite, will discuss the fundamental architecture of IoT applications, ...
May. 28, 2016 01:00 AM EDT Reads: 1,257
A strange thing is happening along the way to the Internet of Things, namely far too many devices to work with and manage. It has become clear that we'll need much higher efficiency user experiences that can allow us to more easily and scalably work with the thousands of devices that will soon be in each of our lives. Enter the conversational interface revolution, combining bots we can literally talk with, gesture to, and even direct with our thoughts, with embedded artificial intelligence, wh...
May. 28, 2016 12:30 AM EDT Reads: 2,019
SYS-CON Events announced today that Tintri Inc., a leading producer of VM-aware storage (VAS) for virtualization and cloud environments, will exhibit at the 18th International CloudExpo®, which will take place on June 7-9, 2016, at the Javits Center in New York City, New York, and the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA.
May. 28, 2016 12:15 AM EDT Reads: 2,431
In his session at 18th Cloud Expo, Bruce Swann, Senior Product Marketing Manager at Adobe, will discuss how the Adobe Marketing Cloud can help marketers embrace opportunities for personalized, relevant and real-time customer engagement across offline (direct mail, point of sale, call center) and digital (email, website, SMS, mobile apps, social networks, connected objects). Bruce Swann has more than 15 years of experience working with digital marketing disciplines like web analytics, social med...
May. 28, 2016 12:00 AM EDT Reads: 1,337
SYS-CON Events announced today that EastBanc Technologies will exhibit at SYS-CON's 18th International Cloud Expo®, which will take place on June 7-9, 2016, at the Javits Center in New York City, NY. EastBanc Technologies has been working at the frontier of technology since 1999. Today, the firm provides full-lifecycle software development delivering flexible technology solutions that seamlessly integrate with existing systems – whether on premise or cloud. EastBanc Technologies partners with p...
May. 27, 2016 11:30 PM EDT Reads: 2,328
The IoT is changing the way enterprises conduct business. In his session at @ThingsExpo, Eric Hoffman, Vice President at EastBanc Technologies, discuss how businesses can gain an edge over competitors by empowering consumers to take control through IoT. We'll cite examples such as a Washington, D.C.-based sports club that leveraged IoT and the cloud to develop a comprehensive booking system. He'll also highlight how IoT can revitalize and restore outdated business models, making them profitable...
May. 27, 2016 07:45 PM EDT Reads: 2,897