Containers Expo Blog Authors: Elizabeth White, Pat Romanski, Yeshim Deniz, Liz McMillan, Zakia Bouachraoui

Containers Expo Blog: Article

Direct Indexing Enables Management of Legacy Tape Data

Tape remediation is quickly becoming the preferred method

"How many backup tapes do you have?"
"I have no idea - probably thousands."

"Do you need to keep them?"

"Why don't you recycle them?"
"Legal won't let us."

This might be a typical storage manager's response when questioned about a company's backup tape stockpile. These tapes are often created in response to a key objective of any IT organization - to protect enterprise data assets. Thus a mountain of old backup tapes has been amassed, largely tapes that have long outlived their disaster recovery purpose. Why not recycle or destroy all these old tapes? Federal regulations forbid it. Data on these tapes "may" be necessary to support current or future litigation. What data? A very, very small percentage of what exists, typically less than 1 percent. Why then keep all these tapes? Because it has been next to impossible to separate the useless data from what legal requires.

Sometime down the road, if not already, legal will request specific data from backup tapes. Some corporate legal teams have proactively issued a mandate not to touch tapes; others have been forced to do so. Either way, stricter regulations are forcing the issue. The June 2009 California Electronic Discovery Act, for example, declares that all electronically stored information should be accessible and requires it to be produced. In January 2010 Judge Scheindlin, the judge on the groundbreaking Zubulake v. UBS Warburg case, issued an opinion in which she rejected the undue-burden argument, called out the defendant as grossly negligent, and issued sanctions against UBS Warburg for not collecting data from backup tapes to support the case. The courts are ruling more frequently against firms that do not produce data, including tape data, in a timely manner. In many cases today, fines have been imposed for the botched collection of historical files and email. Will your company be next?

Storing old tapes is not only a potential liability but also a wasted expense. Even if it costs only a few dollars a month to store a tape, those dollars quickly add up. In addition, since these old tapes cannot be recycled, new tapes must be purchased for ongoing tape backups. This expense, combined with the storage costs, quickly becomes a large item in the budget. This IT expense could easily be allocated to something more useful for the organization. This article discusses how to take a mountain of stored tapes and turn them into a molehill by extracting the relevant data and eliminating unnecessary tapes.

Consider Remediation
In the past it was far too expensive and difficult to understand the detailed content of old backup tapes. The content first had to be restored and analyzed to determine what to keep and what was safe to purge. Restoration uses the original backup software to read data from tape and bring it back online before the discovery process can begin. Restoring thousands or tens of thousands of tapes was out of the question: it demanded too much time, money and legacy infrastructure. As a result IT departments have let the mountain of tapes grow taller every day - with no end in sight.

The problem has now been solved by a more intelligent approach that eliminates the need for expensive and time-consuming backup restoration. Direct indexing and extraction significantly streamlines the collection of ESI (electronically stored information) from tape.

Direct indexing technology scans tapes and then searches and extracts specific files and email without requiring the original backup software. This allows you to only deal with relevant files (less than 1 percent of the tape content) and not the bulk of useless content (the other 99-plus percent). In significantly less time an IT department can process tapes in-house, find what legal needs, archive it and make it available when it is needed. This efficient, cost-effective process enables tape remediation, allowing IT departments to recapture tape-storage budgets, while supporting legal with the data they need.

Automated Direct Indexing Illustrated
The new automated process is simple - no specialized skills or software are required. Assume a situation where there are 10,000 tapes in offsite storage. The first step would be to catalog the tapes to profile the content. Using a tape library, tape headers can be scanned in minutes, only requiring manpower to load the tapes. Once the scan is complete, the indexing technology can analyze the catalog and eliminate incremental backups, as well as backups of non-user data servers and blank tapes. This typically reduces the volume by 80 percent, turning a 10,000-tape job into a 2,000-tape job. Stopping here eliminates 80 percent of the tapes and achieves significant cost savings.
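The catalog triage described above can be sketched in a few lines of Python. This is only an illustration, not the interface of any particular indexing product: the `TapeRecord` fields, the `USER_DATA_ROLES` set and the sample barcodes are all hypothetical stand-ins for whatever a real header scan reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TapeRecord:
    """One row of a (hypothetical) catalog built from scanned tape headers."""
    barcode: str
    backup_type: str      # "full" or "incremental"
    server_role: str      # e.g. "mail", "fileshare", "database"
    is_blank: bool = False


# Assumption: only these server roles hold user-generated data.
USER_DATA_ROLES = {"mail", "fileshare"}


def needs_full_scan(tape: TapeRecord) -> bool:
    """Apply the three elimination rules: blank, incremental, non-user data."""
    if tape.is_blank:
        return False
    if tape.backup_type != "full":
        return False
    return tape.server_role in USER_DATA_ROLES


def triage(catalog):
    """Split the catalog into tapes to scan fully and tapes that can be set aside."""
    keep = [t for t in catalog if needs_full_scan(t)]
    dropped = [t for t in catalog if not needs_full_scan(t)]
    return keep, dropped


catalog = [
    TapeRecord("T0001", "full", "mail"),
    TapeRecord("T0002", "incremental", "mail"),
    TapeRecord("T0003", "full", "database"),
    TapeRecord("T0004", "full", "fileshare"),
    TapeRecord("T0005", "full", "mail", is_blank=True),
]

keep, dropped = triage(catalog)
print([t.barcode for t in keep])     # ['T0001', 'T0004']
print(len(dropped), "tapes set aside")
```

In this toy catalog three of five tapes drop out; the 80 percent figure quoted above is the article's typical result, not something the sketch demonstrates.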

Once the cataloging is done, the remaining set of tapes contains potentially responsive data that will support current and future litigation. The next step is a full scan of these tapes. This generates a searchable index of the content and metadata without copying or modifying the existing tapes. In collaboration with legal, IT defines the search queries (the management team's email, files related to a sensitive project, intellectual property documents, etc.). Legal can then search the index, tag what they want and request that the data be extracted. IT then runs an extract job, and all the tagged files and emails are pulled from tape with their content and metadata intact. When this process is complete the tapes can be recycled.
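The search, tag and extract workflow can be illustrated with a small in-memory model. This is a sketch under stated assumptions: `IndexEntry`, `search`, `tag`, `extraction_plan` and the "legal-hold" label are hypothetical names chosen for the example, and a real index would be far larger and held on disk.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class IndexEntry:
    """Metadata for one file or mailbox found on tape (hypothetical schema)."""
    tape: str
    path: str
    custodian: str
    modified: date
    tags: set = field(default_factory=set)


def search(index, custodians=None, keyword=None):
    """Return entries matching every criterion supplied; the tapes are never touched."""
    hits = []
    for entry in index:
        if custodians is not None and entry.custodian not in custodians:
            continue
        if keyword is not None and keyword.lower() not in entry.path.lower():
            continue
        hits.append(entry)
    return hits


def tag(entries, label):
    """Mark entries that legal wants extracted."""
    for entry in entries:
        entry.tags.add(label)


def extraction_plan(index, label):
    """Group tagged paths by tape so each tape is mounted only once."""
    plan = {}
    for entry in index:
        if label in entry.tags:
            plan.setdefault(entry.tape, []).append(entry.path)
    return plan


index = [
    IndexEntry("T0001", "/mail/ceo/inbox.pst", "ceo", date(2006, 3, 1)),
    IndexEntry("T0001", "/mail/intern/inbox.pst", "intern", date(2006, 3, 1)),
    IndexEntry("T0004", "/share/project-x/design.doc", "cto", date(2007, 8, 15)),
    IndexEntry("T0004", "/share/cafeteria/menu.doc", "facilities", date(2007, 8, 15)),
]

tag(search(index, custodians={"ceo"}), "legal-hold")
tag(search(index, keyword="project-x"), "legal-hold")

plan = extraction_plan(index, "legal-hold")
print(plan)
# {'T0001': ['/mail/ceo/inbox.pst'], 'T0004': ['/share/project-x/design.doc']}
```

Grouping the tagged paths by tape reflects the point made above: only the tagged items come off tape, and everything else stays where it is until the tape is recycled.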

Details of a typical tape remediation project with 10,000 tapes using direct indexing are as follows:

The combined cost of storing tapes offsite and acquiring new tapes to support the existing backup process is $430,000 per year. Because the volume of tapes grows each week, this number will continue to increase over time. To compute the payback for such a project, you would need to break out the costs of acquiring a direct indexing product and a dedicated tape library, plus manpower. The expenditure for manpower, tape libraries, hardware, and software will be recovered in less than one year. This does not include any costs associated with ongoing litigation where tapes are pulled from storage for restoration. Such litigation support costs could easily reach hundreds of thousands of dollars annually, and avoiding them would shorten the payback period further.
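The payback arithmetic can be sketched as follows. Only the $430,000 annual figure comes from the text above; the three project cost items and the litigation savings are hypothetical placeholders, and the sketch assumes the full annual storage and tape spend is avoided once remediation is complete.

```python
ANNUAL_STORAGE_AND_TAPE_COST = 430_000  # from the article: offsite storage plus new tapes

# Hypothetical one-time project costs; substitute real quotes.
project_costs = {
    "direct_indexing_software": 150_000,
    "dedicated_tape_library": 50_000,
    "manpower": 100_000,
}


def payback_months(one_time_cost, annual_savings):
    """Months needed for the annual savings to cover the one-time cost."""
    if annual_savings <= 0:
        raise ValueError("annual_savings must be positive")
    return one_time_cost / annual_savings * 12


total_cost = sum(project_costs.values())
base = payback_months(total_cost, ANNUAL_STORAGE_AND_TAPE_COST)

# Hypothetical: restoration work for litigation that would no longer be needed.
litigation_savings = 200_000
with_litigation = payback_months(
    total_cost, ANNUAL_STORAGE_AND_TAPE_COST + litigation_savings
)

print(f"Payback on storage and tape savings alone: {base:.1f} months")
print(f"Payback including avoided litigation costs: {with_litigation:.1f} months")
```

With these placeholder costs the payback falls under twelve months, and adding the avoided litigation costs shortens it, which matches the direction of the argument above.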

In the past it was not cost-effective to remediate the mountains of tape stored offsite. Direct indexing technology now makes this feasible and is quickly becoming a best practice for any organization that faces constant legal events involving legacy data. Extraction using direct indexing technology does not require the backup software to access tape content. In addition, extraction leverages the index to understand data at the file and email level. By using direct indexing and extraction you can review the contents of a tape, find the relevant content and extract only what is needed. Direct indexing is a non-invasive scan of the tape that yields intelligence about the contents (file types, dates, custodians, etc.) and allows specific content to be selected and gathered. Restoration requires you to restore all the data before you can find the relevant content; it's a radically different process. The benefit of direct indexing over restoration is a clear savings of both time and money. As legal and IT work together, tape remediation is quickly becoming the preferred method to reduce corporate liability and stretch IT's ever-shrinking budget.

More Stories By Jim McGann

Jim McGann serves as Vice President of Information Discovery for Index Engines. He has extensive experience with eDiscovery and information management. He is currently contributing to the Sedona working group addressing electronic document retention and production. Jim is also a frequent speaker for industry organizations such as ARMA and ILTA, and has authored multiple articles for legal technology and information management publications.

In recent years Jim has worked for technology-based start-ups that provided financial services and information management solutions. Prior to Index Engines, he worked for leading software firms, including Information Builders and the French-based engineering software provider Dassault Systemes. Jim was responsible for the business development of Scopeware at Mirror Worlds Technologies, the knowledge management software firm founded by Dr. David Gelernter of Yale University. Jim graduated from Villanova University with a degree in Mechanical Engineering.
