Friday, December 22, 2006

ILM: the promises and the problems

Information Lifecycle Management (ILM) is a new storage paradigm that has been embraced to some degree by almost every enterprise storage hardware, software and systems vendor. We believe that the time has come to take a critical look at ILM, and to explore both its benefits and its challenges.

What is ILM?

Where Did it Come From?

The concept behind Information Lifecycle Management is both simple and powerful. With ILM, information is always stored in the right place. In this definition, "always" recognizes that the value of information, and its access requirements, change at different points in its lifetime, from creation to eventual destruction. "Right place" means the least expensive storage resource available that meets the operational requirements for that piece of information at that particular point in time. These operational requirements may include such variables as time-to-access, levels of protection and security, and retention characteristics.
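The "right place" rule can be read as a small selection problem: filter the resources that satisfy the requirements, then take the cheapest. The sketch below is only an illustration of that reading. The `Resource` fields, the `right_place` function and every number in the sample catalog are placeholders made up for the example, not figures from any vendor.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Resource:
    name: str
    cost_per_gb: float      # placeholder relative cost, not a real price
    access_seconds: float   # assumed worst-case time-to-access
    protected: bool         # e.g. mirrored or otherwise protected

def right_place(resources: List[Resource],
                max_access_seconds: float,
                needs_protection: bool) -> Optional[Resource]:
    """Return the cheapest resource that meets the requirements, or None."""
    candidates = [
        r for r in resources
        if r.access_seconds <= max_access_seconds
        and (r.protected or not needs_protection)
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.cost_per_gb)

# Illustrative catalog; all values are invented for the example.
CATALOG = [
    Resource("raid_array", cost_per_gb=10.0, access_seconds=0.01, protected=True),
    Resource("tape_library", cost_per_gb=1.0, access_seconds=120.0, protected=True),
    Resource("offline_tape", cost_per_gb=0.2, access_seconds=86400.0, protected=False),
]
```

Because the requirements change over the life of the information, the same call would be repeated with different arguments as the data ages, and the answer would drift toward cheaper resources.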

ILM is strongly related to HSM (Hierarchical Storage Management). Where HSM was developed for direct-attached storage, first in the mainframe environment and then adapted for client/server systems, ILM applies to networked storage. HSM is a 2-dimensional structure where data is automatically migrated from primary to secondary storage when certain policy parameters are met, such as the age of the data or the time since it was last accessed. Additional layers, or tiers, may be deployed to take further advantage of cost savings. For example, the primary storage may be a RAID array that stores all newly created or modified data files. After 60 days, a file would be moved to an automated tape library that provides near-line access but is much less expensive than the RAID system. After a year, the file may be migrated to an off-line tape that is stored on a shelf, further reducing the cost of retaining this information.
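The age-based example above can be sketched as a simple threshold rule. This is a minimal illustration, assuming the 60-day and one-year limits from the example; the tier names and the `target_tier` function are hypothetical and do not come from any HSM product.

```python
from datetime import datetime, timedelta

# Thresholds taken from the example in the text.
NEARLINE_AFTER = timedelta(days=60)
OFFLINE_AFTER = timedelta(days=365)

def target_tier(last_modified: datetime, now: datetime) -> str:
    """Return the tier a file belongs on, based only on its age."""
    age = now - last_modified
    if age >= OFFLINE_AFTER:
        return "offline_tape"   # shelf storage, cheapest
    if age >= NEARLINE_AFTER:
        return "tape_library"   # near-line access
    return "raid"               # primary storage for new or modified files
```

A real policy could key on the time since last access instead of the modification time; only the timestamp passed in would change.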

The migration happens automatically and in the background, and is transparent to users and applications. In effect, the capacity of the primary storage resource is extended or expanded by the capacity of the secondary resource(s). This is done by virtualizing, or combining, all of these resources into a single file system. The HSM engine may keep track of where migrated files are physically located, or it may leave pointers at the original file location on the primary device to indicate where the file was moved. In either case, when access to a migrated file is requested, the HSM software retrieves it from the secondary storage and delivers it to the user or application, just as if it were stored in its original location. The only difference may be in the amount of time it takes to retrieve the file, since many secondary storage resources, such as tape and optical disk libraries, rely on robotics to move a cartridge from a shelf in the library and load it into a drive to read and write data. For this reason, the timeout values for applications that interface with HSM (and ILM) may need to be adjusted.
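The pointer approach can be illustrated with a toy in-memory model. It is a sketch under simplifying assumptions: the `HsmStore` class, its method names and the `tape:` location format are invented for the example, and two dictionaries stand in for the primary and secondary devices.

```python
class HsmStore:
    """Toy model of HSM stubs: primary holds either data or a pointer."""

    def __init__(self):
        self.primary = {}    # path -> ("data", bytes) or ("stub", location)
        self.secondary = {}  # location -> bytes

    def write(self, path: str, data: bytes) -> None:
        self.primary[path] = ("data", data)

    def migrate(self, path: str) -> None:
        """Move the file's contents to secondary storage, leaving a stub."""
        kind, payload = self.primary[path]
        if kind == "stub":
            return  # already migrated
        location = f"tape:{path}"  # hypothetical location format
        self.secondary[location] = payload
        self.primary[path] = ("stub", location)

    def read(self, path: str) -> bytes:
        """Return the contents, following the stub if the file was migrated."""
        kind, payload = self.primary[path]
        if kind == "stub":
            # In a real system this is the slow path (robotics, tape load).
            return self.secondary[payload]
        return payload
```

The caller uses `read` the same way before and after migration, which is the transparency the paragraph describes; only the latency of the stub branch would differ in practice.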

ILM is more of a 3-dimensional model. It pools all of the available storage resources on a storage network into a single, large virtual repository. These resources are then organized into storage classes, each with its own value proposition (e.g., cost vs. performance). Programmed policies then monitor all of the information stored in this pool, and when conditions change such that the requirements of a policy are met, the affected data is moved to the storage class specified in the policy. Instead of using the tiered approach of the HSM model, ILM can move data to and between any of the devices in the storage network. These devices may include enterprise-class disk arrays (with appropriate levels of mirroring and data protection), NAS and CAS filers, tape and optical disk libraries, and even off-site storage.
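A policy engine of this kind might be sketched as a list of condition and target pairs evaluated against each item. Everything here is an assumption made for illustration: the `Item` and `Policy` types, the first-match-wins rule, the storage class names and the sample policies are not taken from any ILM product.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Item:
    name: str
    age_days: int
    storage_class: str
    legal_hold: bool = False

@dataclass
class Policy:
    description: str
    condition: Callable[[Item], bool]
    target_class: str

def apply_policies(item: Item, policies: List[Policy]) -> str:
    """Move the item to the class named by the first policy that matches."""
    for policy in policies:
        if policy.condition(item):
            item.storage_class = policy.target_class
            break
    return item.storage_class

# Hypothetical policies; a target may be any class, not just the next tier down.
POLICIES = [
    Policy("records under legal hold go to CAS", lambda i: i.legal_hold, "cas"),
    Policy("data older than a year goes to tape", lambda i: i.age_days > 365, "tape_library"),
    Policy("data older than 60 days goes to NAS", lambda i: i.age_days > 60, "nas"),
]
```

The difference from the HSM ladder is that the target is named per policy, so an item already on tape could be sent to the CAS class if its legal-hold flag changes.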

But in practice, almost all ILM policies will define an HSM-like tiered approach to data management. There are few applications that, as a rule, have data sets that need to be moved from secondary storage back to primary storage at predetermined times. For example, old medical records may need to be retrieved prior to a patient's visit, or an archived legal transcript may be needed when a new proceeding is started; but these are event-driven access requirements and cannot be programmed into a global policy.

Benefits of ILM

The introduction of storage networking allowed IT administrators to centralize the management of diverse storage resources, using common tools from a single console to handle tasks such as resource utilization, adding and removing capacity, provisioning and data protection. ILM extends that concept to centralizing the management of the data that is created and used by perhaps hundreds of applications and thousands of users. But the big benefit of ILM, and really the only reason to deploy it, is cost savings.