The explosion of new types of data (social, graph and video, arriving from sources such as the web and connected devices), combined with sheer growth in record volumes, has put tremendous pressure on the Enterprise Data Warehouse (EDW).
A recent IDC report estimated that businesses would be dealing with 2.8 zettabytes of data in 2012 (a zettabyte is one billion terabytes, or a one followed by 21 zeros), growing to 40 zettabytes by 2020. The majority (85%) of this growth is expected to come from new data types, with machine-generated data projected to increase 15x by 2020.
In response to this disruption, an increasing number of organisations have turned to new methods of dealing with this tsunami of data, methods that both manage the enormous increase in volume and maintain the coherence of the EDW.
The Journey to a Data Lake
This paper discusses Apache Hadoop as a solution to the EDW problem. It examines Hadoop's capabilities as a data platform, and how the Hadoop core and its surrounding ecosystem of solution vendors meet the enterprise requirements to integrate alongside the EDW and other enterprise data systems as part of a modern data architecture, a step on the journey toward an enterprise 'Data Lake'.
An enterprise 'data lake' delivers the following core benefits:
- Data architecture efficiencies, through a significantly lower cost of storage and through optimisation of data processing workloads such as data transformation and integration.
- New opportunities through flexible ‘schema-on-read’ access to all enterprise data, and through multi-use and multi-workload data processing on the same sets of data: from batch to real-time.
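To make 'schema-on-read' concrete, here is a minimal sketch (not from the paper; the record fields and helper name are illustrative assumptions): raw records land in the lake without an enforced schema, and each consumer applies its own structure only at read time, so the same data can serve multiple workloads.

```python
import json

# Raw events stored as-is in the lake: no schema enforced at write time,
# and records need not share the same fields.
raw_events = [
    '{"ts": "2020-01-01T00:00:00", "user": "alice", "action": "login", "device": "mobile"}',
    '{"ts": "2020-01-01T00:05:00", "user": "bob", "action": "purchase", "amount": 19.99}',
]

def read_with_schema(lines, fields):
    """Apply a schema at read time: project only the requested fields,
    filling missing ones with None rather than rejecting the record."""
    for line in lines:
        record = json.loads(line)
        yield {f: record.get(f) for f in fields}

# Two consumers read the same raw data through different schemas.
security_view = list(read_with_schema(raw_events, ["ts", "user", "device"]))
finance_view = list(read_with_schema(raw_events, ["user", "amount"]))
```

Because the schema lives in the reader rather than the storage layer, adding a new use case means writing a new projection, not re-modelling and reloading the warehouse.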
To read more about data lakes, how they can improve your business, and how they can reduce your dependence on the EDW, click on the white paper below.