Automated Tiered Storage Definition
Automated tiered storage (also automated storage tiering) is the automated progression or demotion of data across different tiers (types) of storage devices and media. The movement of data takes place in an automated way with the help of a software or embedded firmware and is assigned to the related media according to performance and capacity requirements. More advanced implementations include the ability to define rules and policies that dictate if and when data can be moved between the tiers, and in many cases provides the ability to pin data to tiers permanently or for specific periods of time. Implementations vary, but are classed into two broad categories: pure software based implementations that run on general purpose processors supporting most forms of general purpose storage media and embedded automated tiered storage controlled by firmware as part of a closed embedded storage system such as a SAN disk array. Software Defined Storage architectures commonly include a component of tiered storage as part of their primary functions.
In the most general definition, Automated Tiered Storage is a form of Hierarchical Storage Management. However, the term automated tiered storage has emerged to accommodate newer forms of real-time performance optimized data migration driven by the proliferation of solid state disks and storage class memory. Furthermore, where traditional HSM systems act on files and move data between storage tiers in a batch, scheduled like fashion, automated storage tiered systems are capable of operating at sub-file level both in batch and real-time modes. In the case of the latter, data is moved almost as soon as it enters the storage system or relocated based on its activity levels within seconds of data being accessed, whereas more traditional tiering tends to operate on an hourly, daily or even weekly schedule. Some more background on the relative differences between HSM, ILM and automated tiered storage is available at SNIA web site. A general comparison of different approaches can also be found in this ‘comparison article on auto tiered storage.
OS and Software Based Automated Tiered Storage
Most server oriented software automated tiered storage vendors offer tiering as a component of a general storage virtualization stack offering, an example being Microsoft with their Tiered Storage Spaces. However, automated tiering is now becoming a common part of industry standard operating systems such as Linux and Microsoft Windows, and in the case of consumer PCs, Apple OSX with its Fusion Drive.This solution allowed a single SSD and hard disk drive to be combined into a single automated tiered storage drive that ensured that the most frequently accessed data was stored on the SSD portion of the virtual disk. A more OS agnostic version was introduced by Enmotus which supports real-time tiering with its FuzeDrive product for Linux and Windows operating systems, extending support to storage class memory offerings such as NVDIMM and NVRAM devices.
SAN Based Tiered Storage
An example of automated tiered storage in a hardware storage array is a feature called Data Progression from Compellent Technologies. Data Progression has the capability to transparently move blocks of data between different drive types and RAID groups such as RAID 10 and RAID 5. The blocks are part of the “same virtual volume even as they span different RAID groups and drive types. Compellent can do this because they keep metadata about every block — which allows them to keep track of each block and its associations.”. Another strong example of SAN based tiering is DotHill’s Autonomous Tiered Storage which moves data between tiers of storage within the SAN disk array with decisions made every few seconds”.
Automated Tiered Storage vs. SSD Caching
While tiering solutions and caching may look the same on the surface, the fundamental differences lie in the way the faster storage is utilized and the algorithms used to detect and accelerate frequently accessed data. SSD caching operates much like SRAM-DRAM caches do i.e. they make a copy of frequently accessed blocks of data, for example in 4K cache page sizes, and store the copy in the SSD and use this copy instead of the original data source on the slower backend storage. Every time a storage IO occurs, the caching software look to see if a copy of this data already exists using a variety of algorithms and service the host request from the SSD if it is found. The SSD is used in this case as a lookaside device as it is not part of the primary storage. While some good caching algorithms can demonstrate native SSD performance on reads and short bursts of writes, caching typically operates well below the maximum sustainable rate of the underlying SSD devices as overhead CPU cycles are introduced during the host IO commands that increasingly impact performance as the amount of data cached grows. Tiering on the other hand operates very differently. Using the specific case of SSDs, once data is identified as frequently used, the identified blocks of data are moved in the background to the SSD and not copied as the SSD is being utilized as a primary storage tier, not a look aside copy area. When the data is subsequently accessed, the IOs occur at or near the native performance of the SSDs as there area are few if any CPU cycles needed to do the simpler virtual to physical addressing translations.