Containers Expo Blog: Article

Direct Indexing Enables Management of Legacy Tape Data

Tape remediation is quickly becoming the preferred method

"How many backup tapes do you have?"
"I have no idea - probably thousands."

"Do you need to keep them?"

"Why don't you recycle them?"
"Legal won't let us."

This might be a typical storage manager's response when questioned about a company's backup tape stockpile. These tapes were created to meet a key objective of any IT organization: protecting enterprise data assets. Over time a mountain of old backup tapes has been amassed, most of which have long outlived their disaster recovery purpose. Why not recycle or destroy all these old tapes? Federal regulations forbid it. Data on these tapes "may" be necessary to support current or future litigation. What data? A very small percentage of what exists, typically less than 1 percent. Why then keep all these tapes? Because it has been next to impossible to separate the useless data from what legal requires.

Sometime down the road, if not already, legal will request specific data from backup tapes. Some corporate legal teams have proactively issued a mandate not to touch tapes; others have been forced to do so. Either way, stricter regulations are forcing the issue. The June 2009 California Electronic Discovery Act, for example, declares that all electronically stored information should be accessible and requires it to be produced. In January 2010 Judge Scheindlin, who presided over the groundbreaking Zubulake v. UBS Warburg case, issued an opinion in Pension Committee v. Banc of America Securities in which she found several parties grossly negligent in preserving and collecting electronically stored information and imposed sanctions. The opinion lists the failure to preserve backup tapes that are the sole source of relevant information among the lapses that can support such a finding. The courts are ruling more frequently against firms that do not produce data, including tape data, in a timely manner. Many cases exist today where fines have been imposed for the botched collection of historical files and email. Will your company be next?

Storing old tapes is not only a potential liability but also a wasted expense. Even if it costs only a few dollars a month to store a tape, those dollars quickly add up. In addition, since these old tapes cannot be recycled, new tapes must be purchased for ongoing tape backups. This expense, combined with the storage costs, quickly becomes a large item in the budget. This IT expense could easily be allocated to something more useful for the organization. This article discusses how to take a mountain of stored tapes and turn them into a molehill by extracting the relevant data and eliminating unnecessary tapes.

Consider Remediation
In the past it was far too expensive and difficult to understand the detailed content of old backup tapes. The content would first need to be restored and analyzed in order to determine what to keep and what is safe to purge. The restoration process uses the original backup software to read data from tape and bring it back online so that discovery can begin. Restoring thousands or tens of thousands of tapes would be out of the question, requiring too much time, money and legacy infrastructure. As a result IT departments have let the mountain of tapes grow taller every day, with no end in sight.

The problem has now been solved by an approach that eliminates the need for expensive and time-consuming backup restoration. Direct indexing and extraction significantly streamlines the collection of ESI (electronically stored information) from tape.

Direct indexing technology scans tapes and then searches and extracts specific files and email without requiring the original backup software. This allows you to only deal with relevant files (less than 1 percent of the tape content) and not the bulk of useless content (the other 99-plus percent). In significantly less time an IT department can process tapes in-house, find what legal needs, archive it and make it available when it is needed. This efficient, cost-effective process enables tape remediation, allowing IT departments to recapture tape-storage budgets, while supporting legal with the data they need.

Automated Direct Indexing Illustrated
The new automated process is simple - no specialized skills or software are required. Assume a situation where there are 10,000 tapes in offsite storage. The first step would be to catalog the tapes to profile the content. Using a tape library, tape headers can be scanned in minutes, only requiring manpower to load the tapes. Once the scan is complete, the indexing technology can analyze the catalog and eliminate incremental backups, as well as backups of non-user data servers and blank tapes. This typically reduces the volume by 80 percent, turning a 10,000-tape job into a 2,000-tape job. Stopping here eliminates 80 percent of the tapes and achieves significant cost savings.
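The cataloging step above can be sketched in a few lines of Python. This is only an illustration of the filtering logic, not the behavior of any particular product: the `TapeRecord` fields, the role names and the sample barcodes are all hypothetical, and a real catalog would be built from the scanned tape headers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TapeRecord:
    """One catalog entry built from a scanned tape header (hypothetical schema)."""
    barcode: str
    backup_type: str   # "full" or "incremental"; empty for a blank tape
    server_role: str   # e.g. "mail", "fileshare", "database"
    is_blank: bool = False


# Assumption: only these server roles hold user-created files and email.
USER_DATA_ROLES = {"mail", "fileshare"}


def needs_full_scan(tape: TapeRecord) -> bool:
    """Keep only full backups of user-data servers on non-blank tapes."""
    if tape.is_blank:
        return False
    if tape.backup_type != "full":
        return False
    return tape.server_role in USER_DATA_ROLES


def triage(catalog):
    """Split the catalog into tapes to scan fully and tapes to set aside."""
    keep = [t for t in catalog if needs_full_scan(t)]
    drop = [t for t in catalog if not needs_full_scan(t)]
    return keep, drop


# Made-up sample catalog of five tapes.
catalog = [
    TapeRecord("T0001", "full", "mail"),
    TapeRecord("T0002", "incremental", "mail"),
    TapeRecord("T0003", "full", "database"),
    TapeRecord("T0004", "", "", is_blank=True),
    TapeRecord("T0005", "incremental", "fileshare"),
]

keep, drop = triage(catalog)
print([t.barcode for t in keep])   # ['T0001']
print(len(drop) / len(catalog))    # 0.8
```

In this made-up sample four of the five tapes drop out, which mirrors the 80 percent reduction described above. The actual ratio depends entirely on the backup history of the site.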

Once the cataloging is done the remaining set of tapes contains potentially responsive data that will support current and future litigation. The next step requires a full scan of the tapes. This generates a searchable index of the content and metadata without copying or modifying the existing tapes. IT and legal then collaborate to define the search queries (the management team's email, files related to a sensitive project, intellectual property documents, etc.). Legal can then search the index, tag what they want and request that the data be extracted. IT then runs an extract job and all the tagged files and emails are extracted from tape, keeping all the content and metadata intact. When this process is complete the tapes can be recycled.
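The search, tag and extract workflow just described might look roughly like the following. This is a minimal sketch under stated assumptions: the `IndexEntry` fields, the helper names and the sample paths are hypothetical, and the search covers only metadata (path, custodian, date), whereas a real index would also cover document content.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class IndexEntry:
    """One indexed file or email on a tape (hypothetical schema)."""
    tape: str
    path: str
    custodian: str
    modified: date
    tagged: bool = False


def search(index, custodians=None, keyword=None, after=None):
    """Return entries that match every criterion that was supplied."""
    hits = []
    for entry in index:
        if custodians and entry.custodian not in custodians:
            continue
        if keyword and keyword.lower() not in entry.path.lower():
            continue
        if after and entry.modified < after:
            continue
        hits.append(entry)
    return hits


def tag(entries):
    """Mark entries that legal wants extracted."""
    for entry in entries:
        entry.tagged = True


def extraction_plan(index):
    """Group tagged paths by tape so each tape only has to be mounted once."""
    plan = {}
    for entry in index:
        if entry.tagged:
            plan.setdefault(entry.tape, []).append(entry.path)
    return plan


# Made-up index entries for illustration.
index = [
    IndexEntry("T0001", "/mail/ceo/inbox/merger-plan.msg", "ceo", date(2008, 3, 14)),
    IndexEntry("T0001", "/mail/intern/inbox/lunch.msg", "intern", date(2008, 3, 15)),
    IndexEntry("T0007", "/share/projects/falcon/design.doc", "cfo", date(2007, 11, 2)),
]

# Legal asks for everything belonging to the management team.
tag(search(index, custodians={"ceo", "cfo"}))
plan = extraction_plan(index)
print(plan)
```

Grouping the tagged items by tape is a design choice of this sketch: it lets the extract job pull everything it needs from a tape in a single pass.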

[Table not reproduced: details of a typical tape remediation project with 10,000 tapes using direct indexing.]

The combined cost of storing tapes offsite and acquiring new tapes to support the existing backup process is $430,000 per year. As the volume of tapes grows each week, this number will continue to increase over time. To compute the payback for such a project you would need to break out the costs for the acquisition of a direct indexing product, the dedicated tape library, and manpower. Even with manpower, tape libraries, hardware, and software included, the project will pay for itself in less than one year. This does not include any costs associated with ongoing litigation where tapes are pulled from storage for restoration. Such litigation support costs could easily reach hundreds of thousands of dollars annually, which would shorten the payback period further.
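The payback arithmetic can be made concrete with a short calculation. Only the $430,000 annual figure comes from the text above. The project cost breakdown and the litigation support figure below are hypothetical placeholders, and the sketch assumes the full annual tape cost is recovered once remediation is complete.

```python
ANNUAL_TAPE_COST = 430_000  # offsite storage plus new tape purchases, per the article


def payback_years(project_cost: float, annual_savings: float) -> float:
    """Simple payback period: one-time cost divided by yearly savings."""
    if annual_savings <= 0:
        raise ValueError("annual_savings must be positive")
    return project_cost / annual_savings


# Hypothetical one-time project costs; substitute real quotes.
project_costs = {
    "direct_indexing_software": 150_000,
    "dedicated_tape_library": 60_000,
    "manpower": 90_000,
}
total_cost = sum(project_costs.values())  # 300,000

years = payback_years(total_cost, ANNUAL_TAPE_COST)
print(f"Payback on tape costs alone: {years:.2f} years")  # 0.70 years

# Hypothetical: add avoided litigation restoration costs to the savings.
AVOIDED_LITIGATION_COST = 200_000
years_with_litigation = payback_years(
    total_cost, ANNUAL_TAPE_COST + AVOIDED_LITIGATION_COST
)
print(f"Payback including litigation savings: {years_with_litigation:.2f} years")  # 0.48 years
```

With these placeholder numbers the project pays for itself in under a year, and counting avoided litigation costs shortens the period further.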

In the past it was not cost-effective to remediate the mountains of tape stored offsite. Direct indexing technology now makes this feasible and is quickly becoming a best practice for any organization that regularly faces legal matters involving legacy data. Extraction using direct indexing technology does not require the backup software to access tape content. In addition, extraction leverages the index to understand data at the file and email level. By using direct indexing and extraction you can review the contents of a tape, find relevant content and extract only what is needed. Direct indexing is a non-invasive scan of the tape that gathers intelligence about the contents (file types, dates, custodians, etc.) and allows specific content to be selected and collected. Restoration, by contrast, requires you to restore all the data before you can find the relevant content; it's a radically different process. The benefits of direct indexing over restoration are clear savings of both time and money. As legal and IT work together, tape remediation is quickly becoming the preferred method to reduce corporate liability and stretch IT's ever-shrinking budget.

More Stories By Jim McGann

Jim McGann serves as Vice President of Information Discovery for Index Engines. He has extensive experience in eDiscovery and information management. He is currently contributing to the Sedona working group addressing electronic document retention and production. Jim is also a frequent speaker for industry organizations such as ARMA and ILTA, and has authored multiple articles for legal technology and information management publications.

In recent years Jim has worked for technology-based start-ups that provided financial services and information management solutions. Prior to Index Engines, he worked for leading software firms, including Information Builders and the French-based engineering software provider Dassault Systemes. Jim was responsible for the business development of Scopeware at Mirror Worlds Technologies, the knowledge management software firm founded by Dr. David Gelernter of Yale University. Jim graduated from Villanova University with a degree in Mechanical Engineering.
