Containers Expo Blog Authors: Liz McMillan, Yeshim Deniz, Pat Romanski, Elizabeth White, Ravi Rajamiyer

Containers Expo Blog: Article

Direct Indexing Enables Management of Legacy Tape Data

Tape remediation is quickly becoming the preferred method

"How many backup tapes do you have?"
"I have no idea - probably thousands."

"Do you need to keep them?"

"Why don't you recycle them?"
"Legal won't let us."

This might be a typical storage manager's response when questioned about a company's backup tape stockpile. These tapes are often created in response to a key objective of any IT organization - to protect enterprise data assets. Thus a mountain of old backup tapes has been amassed, largely tapes that have long outlived their disaster recovery purpose. Why not recycle or destroy all these old tapes? Federal regulations forbid it. Data on these tapes "may" be necessary to support current or future litigation. What data? A very, very small percentage of what exists, typically less than 1 percent. Why then keep all these tapes? Because it has been next to impossible to separate the useless data from what legal requires.

Sometime down the road, if not already, specific data from backup tapes will be requested by legal. Some corporate legal teams have proactively issued a mandate to not touch tapes; others have been forced to do so. Either way, stricter regulations are forcing the issue. The June 2009 California Electronic Discovery Act, for example, declares that all electronically stored information should be accessible and requires it to be produced. In January 2010 Judge Scheindlin, the judge on the groundbreaking Zubulake v. UBS Warburg case, issued an opinion in which she rejected the undue-burden argument, called out the defendant as grossly negligent, and issued sanctions against UBS Warburg for not collecting data from backup tapes to support the case. The courts are ruling more frequently against firms that do not produce data, including tape data, in a timely manner. Many cases exist today where fines have been imposed for the botched collection of historical files and email. Will your company be next?

Storing old tapes is not only a potential liability but also a wasted expense. Even if it costs only a few dollars a month to store a tape, those dollars quickly add up. In addition, since these old tapes cannot be recycled, new tapes must be purchased for ongoing tape backups. This expense, combined with the storage costs, quickly becomes a large item in the budget. This IT expense could easily be allocated to something more useful for the organization. This article discusses how to take a mountain of stored tapes and turn them into a molehill by extracting the relevant data and eliminating unnecessary tapes.

Consider Remediation
In the past it was far too expensive and difficult to understand the detailed content of old backup tapes. The content would first need to be restored and analyzed in order to determine what to keep and what is safe to purge. The restoration process uses the original backup software to copy data off tape and bring it back online before the discovery process can begin. Restoring thousands or tens of thousands of tapes would be out of the question, requiring too much time, money and legacy infrastructure. As a result IT departments have let the mountain of tapes grow taller every day - with no end in sight.

The problem has now been solved by an approach that eliminates the need for expensive and time-consuming backup restoration. Direct indexing and extraction significantly streamlines the collection of ESI (electronically stored information) from tape.

Direct indexing technology scans tapes and then searches and extracts specific files and email without requiring the original backup software. This allows you to only deal with relevant files (less than 1 percent of the tape content) and not the bulk of useless content (the other 99-plus percent). In significantly less time an IT department can process tapes in-house, find what legal needs, archive it and make it available when it is needed. This efficient, cost-effective process enables tape remediation, allowing IT departments to recapture tape-storage budgets, while supporting legal with the data they need.

Automated Direct Indexing Illustrated
The new automated process is simple - no specialized skills or software are required. Assume a situation where there are 10,000 tapes in offsite storage. The first step would be to catalog the tapes to profile the content. Using a tape library, tape headers can be scanned in minutes, only requiring manpower to load the tapes. Once the scan is complete, the indexing technology can analyze the catalog and eliminate incremental backups, as well as backups of non-user data servers and blank tapes. This typically reduces the volume by 80 percent, turning a 10,000-tape job into a 2,000-tape job. Stopping here eliminates 80 percent of the tapes and achieves significant cost savings.
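The catalog pass described above can be pictured as a simple filter over tape-header records. The following is only a minimal sketch: the `TapeRecord` fields, the `USER_DATA_ROLES` set and the sample barcodes are hypothetical names chosen for illustration, not the data model of any particular direct indexing product.

```python
from dataclasses import dataclass


# Hypothetical catalog record; the fields a real tape header exposes
# depend on the backup format that wrote the tape.
@dataclass(frozen=True)
class TapeRecord:
    barcode: str
    backup_type: str   # "full" or "incremental"; empty for blank tapes
    server_role: str   # e.g. "mail", "fileshare", "dns"
    is_blank: bool = False


# Assumption: only these server roles hold user-created content.
USER_DATA_ROLES = {"mail", "fileshare"}


def needs_full_scan(tape: TapeRecord) -> bool:
    """Return True if the tape survives the catalog pass."""
    if tape.is_blank:
        return False
    if tape.backup_type == "incremental":
        return False
    return tape.server_role in USER_DATA_ROLES


def triage(catalog):
    """Split a catalog into (tapes to scan fully, tapes that can be set aside)."""
    keep, set_aside = [], []
    for tape in catalog:
        (keep if needs_full_scan(tape) else set_aside).append(tape)
    return keep, set_aside


# Made-up sample catalog, for illustration only.
catalog = [
    TapeRecord("T0001", "full", "mail"),
    TapeRecord("T0002", "incremental", "mail"),
    TapeRecord("T0003", "full", "dns"),
    TapeRecord("T0004", "full", "fileshare"),
    TapeRecord("T0005", "", "", is_blank=True),
]
keep, set_aside = triage(catalog)
print([t.barcode for t in keep])        # tapes that go on to the full scan
print(len(set_aside) / len(catalog))    # fraction eliminated at this stage
```

Which tapes can actually be set aside is a decision to make with legal; the filter rules here simply mirror the three categories named in the text (incremental backups, non-user data servers and blank tapes).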

Once the cataloging is done the remaining set of tapes contains potentially responsive data that will support current and future litigation. The next step requires a full scan of the tapes. This generates a searchable index of the content and metadata without copying or modifying the existing tapes. Collaborating with legal, the search queries are defined (the management team's email, files related to a sensitive project, intellectual property documents, etc.). Legal can then search the index, tag what they want and request the data be extracted. IT will then run an extract job and all the tagged files and emails will be ripped from tape, keeping all the content and metadata intact. When this process is complete the tapes can then be recycled.
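The search, tag and extract workflow above can be sketched in a few lines. This is a hedged illustration under stated assumptions: `IndexEntry`, `search`, `tag` and `extraction_job` are hypothetical names, the sample entries are invented, and the keyword match looks only at file paths, whereas the index described in the article also covers content.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date


# Hypothetical index entry: one row of metadata per file or email on tape.
@dataclass
class IndexEntry:
    tape: str
    path: str
    custodian: str
    kind: str        # "email" or "file"
    modified: date
    tags: set = field(default_factory=set)


def search(index, custodians=None, kind=None, keyword=None):
    """Return entries matching every criterion that was supplied."""
    hits = []
    for entry in index:
        if custodians is not None and entry.custodian not in custodians:
            continue
        if kind is not None and entry.kind != kind:
            continue
        # Simplification: match on the path only, not on document content.
        if keyword is not None and keyword.lower() not in entry.path.lower():
            continue
        hits.append(entry)
    return hits


def tag(entries, label):
    """Mark entries that legal wants extracted."""
    for entry in entries:
        entry.tags.add(label)


def extraction_job(index, label):
    """Group tagged paths by tape so each tape is read once (an assumed design)."""
    job = defaultdict(list)
    for entry in index:
        if label in entry.tags:
            job[entry.tape].append(entry.path)
    return {tape: sorted(paths) for tape, paths in job.items()}


# Invented sample index, for illustration only.
index = [
    IndexEntry("T0001", "/mail/ceo/inbox/0001.msg", "ceo", "email", date(2006, 3, 14)),
    IndexEntry("T0001", "/mail/intern/inbox/0042.msg", "intern", "email", date(2006, 3, 15)),
    IndexEntry("T0004", "/projects/falcon/design.doc", "cto", "file", date(2007, 1, 9)),
    IndexEntry("T0004", "/projects/misc/lunch-menu.doc", "cto", "file", date(2007, 1, 10)),
]
tag(search(index, custodians={"ceo", "cfo"}, kind="email"), "matter-A")
tag(search(index, keyword="falcon"), "matter-A")
print(extraction_job(index, "matter-A"))
```

The point of the sketch is the division of labor: legal works only against the index, and the tapes themselves are touched again only when IT runs the extract job for the tagged items.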

Details of a typical tape remediation project with 10,000 tapes using direct indexing are as follows:

The cost to store tapes offsite, combined with the cost to acquire new tapes in support of the existing backup process, comes to $430,000 per year. As the volume of tapes grows each week, this number will continue to increase over time. To compute the payback for such a project you would need to break out the costs for the acquisition of a direct indexing product, the dedicated tape library, and manpower. The expenditure for manpower, tape libraries, hardware, and software will pay for itself in less than one year. This does not include any costs associated with ongoing litigation where tapes are pulled from storage for restoration. Such litigation support costs could easily reach hundreds of thousands of dollars annually, and eliminating them would shorten the payback period further.
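The payback reasoning can be written out as simple arithmetic. The sketch below assumes, for simplicity, that remediation recovers the full $430,000 annual figure; the project cost is left as an input because the article does not state it, and `payback_months` is a hypothetical helper name.

```python
# Annual offsite storage plus new-tape cost, as quoted in the article.
ANNUAL_TAPE_COST = 430_000


def payback_months(project_cost, annual_savings=ANNUAL_TAPE_COST,
                   litigation_savings=0):
    """Months until cumulative savings cover the one-time project cost.

    project_cost       -- software, tape library, hardware and manpower
                          (not given in the article; supply your own figure)
    annual_savings     -- recurring tape cost eliminated per year
    litigation_savings -- optional avoided restoration cost per year
    """
    total = annual_savings + litigation_savings
    if total <= 0:
        raise ValueError("annual savings must be positive")
    return 12 * project_cost / total


# Break-even check: payback lands under one year exactly when the project
# costs less than the annual savings it produces.
print(payback_months(430_000))                               # the one-year boundary
print(payback_months(430_000, litigation_savings=430_000))   # with avoided litigation cost
```

Under these assumptions, the claim of a payback in less than one year amounts to saying that the total project cost is below the annual savings, and any avoided litigation support cost shortens the period proportionally.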

In the past it was not cost-effective to remediate the mountains of tape stored offsite. Direct indexing technology now makes this feasible and is quickly becoming a best practice for any organization that faces constant legal events involving legacy data. Extraction using direct indexing technology does not require the backup software to access tape content. In addition, extraction leverages the index to understand data at the file and email level. By using direct indexing and extraction you can review the contents of a tape, find relevant content and extract what is of interest. Direct indexing is a non-invasive scan of the tape that yields intelligence about the contents (file types, dates, custodians, etc.) and allows specific content to be selected and gathered. Restoration, by contrast, requires you to restore the data before you can find the relevant content; it is a radically different process. The benefit of direct indexing over restoration is a clear savings of both time and money. As legal and IT work together, tape remediation is quickly becoming the preferred method to reduce corporate liability and stretch IT's ever-shrinking budget.

More Stories By Jim McGann

Jim McGann serves as Vice President of Information Discovery for Index Engines. He has extensive experience with eDiscovery and information management. He is currently contributing to the Sedona working group addressing electronic document retention and production. Jim is also a frequent speaker for industry organizations such as ARMA and ILTA, and has authored multiple articles for legal technology and information management publications.

In recent years Jim has worked for technology-based start-ups that provided financial services and information management solutions. Prior to Index Engines, he worked for leading software firms, including Information Builders and the French engineering software provider Dassault Systemes. Jim was responsible for the business development of Scopeware at Mirror Worlds Technologies, the knowledge management software firm founded by Dr. David Gelernter of Yale University. Jim graduated from Villanova University with a degree in Mechanical Engineering.
