Ash Ashutosh argues that a move to expensive, fast data storage technologies such as SSD needs to run in parallel with data cleansing and de-duplication through techniques like Copy Data Virtualisation.
Data centre owners today face a number of challenges as data centre technology undergoes radical change. The storage media trend is moving away from tape and hard disk drive (HDD) towards flash memory (SSD), while some companies opt for a mixed mode. Either way, the complexity of the IT landscape in the data centre increases. At the same time, cost pressure is rising, so compute, storage and network resources need to be consolidated. Energy costs are an issue and it is key to increase data centre efficiency. This is all the more urgent as the avalanche of data grows and the vast storage requirements are significantly reflected in operating costs. Companies also want to draw valuable insights from this huge mountain of ‘Big Data’.
Then there is the issue of security. All this data must be accessible but secure. This is a difficult balancing act, and one that becomes more complex as more data copies float around and enlarge the "attack surface". If fewer copies are created, the number of security-related targets falls, which in turn lowers administrative and operating costs.
The volume of data grows daily through the unchecked proliferation of data copies. Multiple copies of data are generated in separate silos for different purposes such as backup, disaster recovery, testing, snapshots or migrations. In 2013, IDC estimated that up to 120 copies of specific production data can circulate within a company, and that the cost of managing the flood of data copies had reached 44 billion dollars worldwide. As a net result, managing these copies now takes more resources within a company than managing the actual production data.
Use Data Virtualisation to stem the flow
Data virtualisation has proven to be an effective way to make data management more efficient. It combines data de-duplication with optimised network utilisation to handle data efficiently. Since less bandwidth and storage are required, short recovery times can be achieved.
One core principle is the use of a so-called "virtual pipeline", a distributed object file system in which the fundamentals of data management are virtualised. With this approach, virtual copies of the data as it stood at a specific time can be assembled from the collection of unique data blocks at any time. When data must be restored, it is extracted from the underlying object file system of the Copy Data Management solution and analysed at a user-defined recovery point, in any application format. Data is mounted directly on a server with no data movement required, which leads to quick recovery times. The recovered data is then immediately available.
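The idea of assembling a point-in-time copy from a pool of unique blocks can be illustrated with a minimal Python sketch. It is not the vendor's implementation: the `BlockStore` class, its method names and the tiny block size are all hypothetical, chosen only to show how each recovery point can be stored as a list of references into a shared set of blocks.

```python
import hashlib

BLOCK_SIZE = 4  # assumption: tiny block size, chosen only to keep the example readable


class BlockStore:
    """Hypothetical content-addressed store: each unique block is kept once."""

    def __init__(self):
        self.blocks = {}     # digest -> block bytes
        self.snapshots = {}  # recovery point label -> ordered list of digests

    def capture(self, label, data):
        """Record a recovery point as a list of references to unique blocks."""
        digests = []
        for start in range(0, len(data), BLOCK_SIZE):
            block = data[start:start + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # store the block only if it is new
            digests.append(digest)
        self.snapshots[label] = digests

    def mount(self, label):
        """Assemble the virtual copy for a recovery point from the stored blocks."""
        return b"".join(self.blocks[d] for d in self.snapshots[label])


store = BlockStore()
monday = b"AAAABBBBCCCC"
tuesday = b"AAAAXXXXCCCC"  # only the middle block changed
store.capture("monday", monday)
store.capture("tuesday", tuesday)
unique_blocks = len(store.blocks)
```

Both recovery points can be mounted in full, yet the two unchanged blocks are held only once in the store.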
Data handling efficiency
The virtual data pipeline is used to collect, manage and provide data as efficiently as possible. After a single complete snapshot has been created and stored, only the changed blocks of application data are captured, using Change Block Tracking on an incremental-forever principle. Data is collected at the block level because this is the most efficient way to track and transfer changes. The data is used and stored in its native format, so there is no need to create backup files or to restore data from them.
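The incremental-forever principle can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a real tracking driver: production systems typically record changed blocks as writes happen, whereas the hypothetical `changed_blocks` function below finds them by comparing two states. The sketch also assumes the data only grows or changes in place, never shrinks.

```python
BLOCK_SIZE = 4  # assumption: tiny block size for readability


def split_blocks(data):
    """Cut a byte string into fixed-size blocks."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def changed_blocks(previous, current):
    """Return {block_index: new_block} for blocks that are new or differ."""
    old = split_blocks(previous)
    new = split_blocks(current)
    return {i: b for i, b in enumerate(new) if i >= len(old) or old[i] != b}


def apply_changes(base, changes):
    """Rebuild a later state from a base state plus one increment."""
    blocks = split_blocks(base)
    for index in sorted(changes):
        if index < len(blocks):
            blocks[index] = changes[index]   # block changed in place
        else:
            blocks.append(changes[index])    # block appended at the end
    return b"".join(blocks)


full = b"AAAABBBBCCCC"        # the single complete snapshot
day2 = b"AAAAXXXXCCCC"        # one block changed
day3 = b"AAAAXXXXCCCCDDDD"    # one block appended

inc1 = changed_blocks(full, day2)
inc2 = changed_blocks(day2, day3)
restored = apply_changes(apply_changes(full, inc1), inc2)
```

After the first full capture, each increment holds a single block, and applying the increments in order reproduces the latest state.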
The data is captured on the basis of Service Level Agreements (SLAs) that can be set by the administrator. These include the frequency of the snapshots, the type of storage in which to keep them and the retention period. The SLAs can also specify whether the data is to be replicated to a remote location or to a cloud service provider. Once an SLA is created, any application or virtual machine can access the data.
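As a rough illustration, the SLA settings listed above can be modelled as a small policy object. The `SlaPolicy` class, its field names and the example values are assumptions made for this sketch and do not correspond to any product's configuration format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class SlaPolicy:
    """Hypothetical SLA: how often to snapshot, where to keep it, for how long."""
    snapshot_interval: timedelta
    storage_tier: str
    retention: timedelta
    replicate_to: Optional[str] = None  # remote site or cloud provider, if any

    def snapshot_due(self, last_snapshot, now):
        """True when the snapshot interval has elapsed since the last snapshot."""
        return now - last_snapshot >= self.snapshot_interval

    def expired(self, taken_at, now):
        """True when a snapshot is older than the retention period."""
        return now - taken_at > self.retention


# Example values are invented for illustration only.
gold = SlaPolicy(
    snapshot_interval=timedelta(hours=4),
    storage_tier="ssd",
    retention=timedelta(days=30),
    replicate_to="remote-site",
)
now = datetime(2024, 1, 31, 12, 0)
```

A scheduler could then call `snapshot_due` to decide when to capture the next increment and `expired` to decide which recovery points may be released.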
For the data management element, a "master copy" of the production data is kept permanently available and is updated with incremental changes forever. An unlimited number of virtual copies of selected production data can be provisioned for testing, development, analysis and so on within minutes without affecting production. This "golden master copy" can also be transferred to an outsourced location for disaster recovery.
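One common way to provide many writable copies from a single master is a copy-on-write overlay, sketched below in Python. The source does not say which mechanism is actually used, so this is an assumed technique, and the `VirtualCopy` class is hypothetical.

```python
class VirtualCopy:
    """Hypothetical writable view over a shared, read-only master copy."""

    def __init__(self, master_blocks):
        self._master = master_blocks  # shared with every other virtual copy
        self._overlay = {}            # block index -> block written by this copy

    def read(self, index):
        """Return this copy's own block if it wrote one, else the master's."""
        return self._overlay.get(index, self._master[index])

    def write(self, index, block):
        """Store the change privately; the master is never touched."""
        self._overlay[index] = block

    def private_blocks(self):
        """Number of blocks this copy actually occupies on its own."""
        return len(self._overlay)


master = (b"AAAA", b"BBBB", b"CCCC")  # a tuple, so it cannot be modified
test_copy = VirtualCopy(master)
dev_copy = VirtualCopy(master)
test_copy.write(1, b"TEST")
```

Creating a copy costs no storage up front, and a copy only consumes space for the blocks it changes, which is why such copies can be provisioned quickly without affecting production.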
Positive impact on the data centre
The virtualisation of data copies relieves production systems and supports data backup, disaster recovery and business continuity almost as a side effect. For server backup, the conventional NDMP backup server becomes obsolete. Full backups are not necessary, as an image of any arbitrary point in time can be mounted instantly. For long-term retention, data can be efficiently deduplicated and compressed. Since data is captured in its native format, even older images can be mounted immediately, and lost or damaged data can be recovered in minutes.
Copy Data Management also supports disaster recovery requirements within the business. Data can be replicated to a remote location in several ways. Synchronous or asynchronous LUN mirroring is possible. With special De-dup-Async replication technology, the bandwidth required is significantly reduced. With De-dup-Backup replication, only the required blocks, already deduplicated and compressed, have to be transferred via the WAN for long-term storage.
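The bandwidth saving from deduplicated replication can be illustrated generically in Python. This sketch does not reproduce the De-dup-Async or De-dup-Backup technologies named above; the `replicate` and `restore` functions and the dictionary standing in for the remote site are assumptions made for the example.

```python
import hashlib
import zlib


def digest(block):
    """Content fingerprint used to recognise blocks the remote site already holds."""
    return hashlib.sha256(block).hexdigest()


def replicate(local_blocks, remote_store):
    """Send only missing blocks, compressed; return the number of bytes sent."""
    sent = 0
    for block in local_blocks:
        key = digest(block)
        if key not in remote_store:
            payload = zlib.compress(block)
            remote_store[key] = payload
            sent += len(payload)
    return sent


def restore(manifest, remote_store):
    """Rebuild data at the remote site from an ordered list of fingerprints."""
    return b"".join(zlib.decompress(remote_store[key]) for key in manifest)


remote = {}  # stands in for the remote site's block store
first = [b"A" * 1024, b"B" * 1024, b"A" * 1024]   # contains one duplicate block
second = [b"A" * 1024, b"B" * 1024, b"C" * 1024]  # one block is new

first_sent = replicate(first, remote)
second_sent = replicate(second, remote)
recovered = restore([digest(b) for b in second], remote)
```

The second replication run transfers only the one block the remote site has not yet seen, so it sends fewer bytes than the first run even though both describe three blocks of data.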
Copy Data Virtualisation also plays to its strengths in application development, testing and analysis. Virtual images can be mounted on any device immediately, without the need to create a complete copy. This results in lower storage consumption.
Copy Data Virtualisation pays
Copy Data Virtualisation can be used for a subset of the data backup in coexistence with existing applications and infrastructure. The greatest efficiency potential unfolds when existing isolated silo systems can gradually be replaced. A Copy Data Virtualisation Platform then acts as a central platform for data management and solves the growing problem of redundant data copies.
Companies that try to absorb the rapid growth of data by investing in additional storage hardware will sooner or later be thwarted by the cost explosion. SSD, as a new and efficient but still expensive storage technology, should not be filled to its capacity limits with redundant copies of data. With Copy Data Virtualisation, data is consolidated and freed of redundancies. This gives the available storage resources room to cope with data growth in the coming years and reduces operating costs for the data centre.
About the Author
Ash Ashutosh is CEO of Actifio and a recognised leader and architect in the storage industry where he has spearheaded several major industry initiatives, including iSCSI and storage virtualisation, and led the authoring of numerous storage industry standards.
Ashutosh was most recently a Partner with Greylock Partners where he focused on making investments in enterprise IT companies. Prior to Greylock, he was Vice President and Chief Technologist for HP Storage.