Applying Information Lifecycle Management Today
Separating value from visions

With volumes of stored data growing seemingly without limits, organizations are struggling to meet their burgeoning storage demands. While the price of high-performance disk storage continues to drop, it is not dropping fast enough to accommodate the annual doubling of data in more data-intensive environments. The only alternative for many has been manually archiving data from primary disk to tape or other forms of storage - a time-consuming and error-prone process that can inhibit or even prevent access to critical data when it's needed.

Increasingly, Information Lifecycle Management (ILM) is being discussed as the solution to these problems. While much of this concept depends on future developments, a real and significant piece of the functionality proposed by ILM is available today. That piece, referred to as Data Lifecycle Management (DLM), delivers immediate value for data-intensive environments.

ILM - the Promise

In theory, under ILM all data is classified and then managed from cradle to grave to ensure that it is automatically stored on cost-appropriate storage devices and given the appropriate level of data protection. In most cases, data goes through a fairly predictable life cycle. It is accessed most heavily in the first few weeks after creation, and then that access frequency drops off significantly as the data ages. Data may eventually be deleted, but an increasing amount of data must be retained indefinitely.

As shown in Figure 1, step 1 of the ILM process is categorization and includes considerations such as criticality of data as well as compliance requirements. In step 2, policies are created to ensure that each category has an appropriate level of access, protection, recoverability, etc. These policies are implemented automatically in step 3. Step 4 verifies that the system is working as intended, with adjustments made if necessary.

ILM - the Reality

The bad news is that three of the four steps are still manual. The good news is that the DLM solutions available today perform the "automate policies" step, which can save a lot of time and money while helping to manage risk. Policy automation (DLM) solutions keep data available to users and applications while moving it seamlessly among different types of storage without administrative intervention to yield:
  • Lower Total Cost of Ownership (TCO) versus buying an all-disk solution to store live data
  • Higher productivity versus traditional, off-line archiving
  • Lower risk to data availability and integrity versus manual data migration
  • Lower data-related liability risk because flexible policies can accommodate a wide range of current and potential compliance requirements
The manual ILM steps and their application are outlined below, followed by a customer example that illustrates the outcome and the typical benefits of DLM today.

Categorization establishes the information about the data and can be driven by productivity or accountability requirements. Productivity requirements dictate that data should remain available as long as it contributes more to the productivity or quality of the work than it costs to keep it available. Accountability elements highlight conditions where a company rule or an outside regulation requires the data to be retained in a certain way, or for a certain length of time.

Categorization Factors

  • Productivity Elements
    - Owner
    - Age: When created
    - Size
    - Format
    - Frequency of access and how it changes over time
    - Speed of access and how it changes over time
    - Access Permissions and how they change over time
  • Accountability Elements
    - Subject to company policies
    - Subject to compliance or regulatory rules or laws
The goal of policy creation is to ensure that all the factors specified in the categorization process are accommodated in how the data is retained over time, while taking budgetary constraints into consideration. If productivity and accountability elements have been conscientiously determined, they should clearly dictate the policy for each data category. Policies need to ensure that frequently used data is on the fastest access media, that no critical data is lost or deleted, and that less frequently accessed data is moved to slower media to save money.

Policy Creation Considerations

  • Persistence: How long data must be available
  • Location: On what storage media
  • Access protection: Degree to which data access is protected
  • Data protection: Degree to which data is protected from loss
  • TCO/ROI considerations: Cost of retention vs. the value of the data over time
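
As a rough sketch, the considerations above could be captured as one record per data category. The class name, field names, and sample values here are illustrative assumptions and do not come from any particular DLM product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetentionPolicy:
    """One policy per data category; all names here are hypothetical."""
    category: str
    persistence_years: float   # persistence: how long data must be available
    location: str              # location: storage media the data should live on
    access_protection: str     # degree to which data access is protected
    data_protection: str       # degree to which data is protected from loss
    max_annual_cost: float     # budget ceiling for TCO/ROI comparisons

    def must_retain(self, age_years: float) -> bool:
        """True while the data is still inside its retention period."""
        return age_years <= self.persistence_years


# Sample values are made up purely to show the shape of a policy record.
example = RetentionPolicy(
    category="example-category",
    persistence_years=7,
    location="low-cost disk",
    access_protection="department only",
    data_protection="two copies",
    max_annual_cost=10_000.0,
)
```

Keeping each consideration as an explicit field makes it easier to see when a category has been left without a decision on one of them.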
Policy automation is illustrated in the case study presented below.

Verification is the last step and should be performed at recurring, fixed intervals. Verification consists of checking that the current state of the data fits with the requirements determined in the data categorization and policy creation steps.
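
A verification pass of this kind can be sketched as a comparison between where each piece of data is stored now and where the policy says it should be. The function name, the inventory layout, and the two-tier sample policy are assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Tuple


def verify_placement(
    items: Dict[str, Tuple[float, str]],
    expected_tier: Callable[[float], str],
) -> List[str]:
    """Return the names of items whose current tier differs from the policy.

    items maps an item name to (age in years, tier it is stored on now).
    """
    return sorted(
        name
        for name, (age_years, current_tier) in items.items()
        if expected_tier(age_years) != current_tier
    )


# Hypothetical two-tier policy used only to exercise the check.
def sample_policy(age_years: float) -> str:
    return "fast" if age_years <= 1 else "slow"


inventory = {
    "new-report": (0.2, "fast"),
    "old-report": (3.0, "fast"),  # violates the policy: should be on "slow"
}
mismatches = verify_placement(inventory, sample_policy)
```

Any names returned by such a check point either to a policy that is not being applied or to a policy that no longer matches the requirements and needs adjusting.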

Case Study - Widget Co.

Categorization
The design department of Widget Co. designs all of the company's products. A typical design cycle lasts six months and the department needs immediate access to current design-cycle data. To avoid undesirable design elements as well as time-wasting re-invention, they compare each new design against the design data of all products shipped in the past 10 years. These comparisons involve large amounts of data and, because they affect product shipping dates, need to be completed quickly. While no current regulatory rule applies to the retention of this data, the company believes rules are likely to be created in the future. The company therefore requires that data remain accessible for 25 years to protect against unforeseen liabilities from either product defect claims or the introduction of industry-wide compliance rules on data retention.

Policy Creation
Design-related data must be accessible at the fastest possible rates for six months. Design data for the past 10 years needs to be accessible quickly enough to allow same-day comparison and analysis, which protects ship schedules. Because the perceived value to the company of keeping data over 10 years is solely for liability reasons, the only time constraint is that it be retrievable within a reasonable legal discovery period, which they determine to be one month.

The requirements for data access and retention indicated by the categorization and policy creation steps confirm that Widget Co. needs an ILM solution. Attempting to address these with an all Fibre Channel RAID (FC RAID) disk solution would provide the fastest data access and remove the risk, cost, and complexity of manual migration, but would cost almost three times the total current and projected IT budget for storage. Using FC RAID only for current design-cycle data while placing older data on a low-cost off-line archive would fix the cost problem, but all policy implementation would be manual and the responsibility of the IT department, and historic designs would need to be restored from archive, adding an estimated 20 days to every release.

Sourcing a DLM Solution
Now the IT department has all the core information they need to source a solution for the "automate policies" step. From the categorization and policy definition steps they know the design department needs the fastest possible access to all current design-cycle data and access to data from the last 20 design cycles within a few hours. The legal department needs to be able to access all design data on product releases in the past 25 years in under a month. All design data up to 25.5 years old is considered important, but only current design-cycle data is considered critical.

Using the policy implementation requirements above, Widget Co. issues an RFI for a solution. The resulting submissions all take roughly the same approach: a mix of FC RAID for the current data and less expensive storage media such as Serial ATA (SATA) and tape, with policy implementation automated by DLM software intelligence (see Figure 2). The benefits are that current design data is on the fastest media and that, because of the automated DLM implementation, design data under 10.5 years old is stored on less expensive media but does not have to be restored from archive. Design data over 10.5 years old is automatically identified by the DLM software as ready for archive, human error is removed from the policy implementation, and the system conforms to data access standards. No drawbacks to this approach are identified.
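
The age-based placement described here can be sketched as a simple rule. This is only an illustration of the idea, not how any DLM product works: the function name and tier labels are assumptions, and the treatment of data that is exactly 0.5 or 10.5 years old is an arbitrary choice, since the article does not specify it.

```python
CURRENT_CYCLE_YEARS = 0.5     # one six-month design cycle
ONLINE_LIMIT_YEARS = 10.5     # current cycle plus 10 years of shipped designs


def target_tier(age_years: float) -> str:
    """Pick a storage tier for design data from its age (hypothetical labels)."""
    if age_years < 0:
        raise ValueError("age_years must not be negative")
    if age_years <= CURRENT_CYCLE_YEARS:
        return "FC RAID"               # current design-cycle data, fastest media
    if age_years <= ONLINE_LIMIT_YEARS:
        return "SATA/tape"             # cheaper media, no restore from archive
    return "ready for archive"         # kept only for liability reasons


placements = [target_tier(age) for age in (0.25, 5, 12)]
```

In a real deployment the rule would likely also weigh access frequency and the other categorization factors, not age alone.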

Refining DLM Solution Criteria
Widget Co. requests quotes for DLM solutions. The submissions reveal the importance of several criteria the company had not allowed for. They then refine their criteria, specifying that the right DLM solution should:

  • Be proven to scale to a capacity that addresses 10-year projected growth
  • Be proven to scale to a performance level at which comparisons against historic data can still be completed within one day at the data sizes projected for 10 years in the future
  • Have standard interfaces and allow the maximum policy flexibility to accommodate possible future accountability requirements

Results
The IT department estimates that over the next 10 years they will save:
  • Over 65% on storage capacity versus an all-disk solution, since SATA and tape make up the majority of their planned capacity
  • Approximately 50% on storage management costs through the removal of manual monitoring and provisioning of available capacity and data movement
The design department estimates that over the next 10 years they will save almost $1 million in personnel costs by eliminating the 20 days per release that retrieving data from the offline archive would add to comparisons with historic product designs. An additional benefit to the company is that this approach reduces the design cycle by 20%, allowing them to become 20% more productive.

Widget Co. is satisfied that the system will allow them to find any data for which they could reasonably have any liability and to demonstrate its integrity. Because it also works with their backup system and fits the IT budget, no additional analysis is considered necessary to approve the implementation.

They decide to perform a verification of the system function every six months, adjusting policies to meet any new requirements or change any that are not achieving the desired results.

Conclusion

As the Widget Co. example illustrates, ILM solutions do not yet deliver on all of their promised areas, but the DLM solutions available today do offer significant value for environments seeking to reduce costs, increase productivity, and meet specific retention requirements.
About Laura Shepard
Laura Shepard is marketing product line manager for SGI InfiniteStorage (www.sgi.com/storage). Having worked with data-intensive computing and storage environments for seven years, Laura has seen data solutions grow from gigabytes to petabytes.
