Tuesday, 08 September 2015 02:24

The four Vs of data and mitigating big data latency

Posted by Graham Jarvis

Big data needs a steady supply of real-time data to be effective, but most business networks weren’t designed for this sort of throughput. Graham Jarvis looks at the problem of the four Vs of data: volume, velocity, variety and veracity.

Data has become the currency of business. It is a highly valuable asset, the lifeblood of a modern enterprise, and hence the flow and distribution of that data is now critical for business decisions. Business intelligence and analytics, for example, require that data flow to be as close to real time as possible. Markets such as financial insurance services and airlines rely on updates from a myriad of sources to decide pricing, including deal flows, demand, social events and even climate changes. However, any inability to deliver the spread and depth of information, particularly if the network is bottlenecked by latency issues, means organisations will miss out on opportunities to maximise profit.

Managing the Four Vs

What organisations are looking for is a solution that can manage all of the ‘four Vs of data’: volume, velocity, variety and veracity. As David Trossell, CEO of Bridgeworks, explains: “We can now gather more data in half the time than we used to be able to do, and with it organisations can examine scenarios in a more informed way – it’s not necessarily about having just two or three of the four Vs, you need them all: volume, variety, velocity and veracity to be there in order to be able to gain a complete picture.”

So what are the ‘four Vs of data’?

  • Volume is the measure of data quantity that is generated per unit of time, and currently it is estimated that there are
  • Variety is about the different types and sources of data, such as medical data or video footage.
  • Velocity is the speed at which large datasets can be read and analysed.
  • Veracity is the extent to which the data can be trusted under analysis. For example, factual data from social media is likely to be more trusted than people’s opinions.

Latency’s impact

Velocity and volume are key problems, particularly when looking at big data, as Trossell explains: “Network latency can slow down the process and your ability to move data around – including into the cloud.” He adds that if you have lots of data points and you can’t get them into the data centre, then your data is going to be old. Big data, therefore, needs to be analysed while it’s still fresh, but your ability to manage it depends on your infrastructure and the size of your organisation. “I don’t think we’re here to tell people what’s the right or what’s the wrong way, but it is essentially about how quickly you can move the data within your infrastructure capability”, says Claire Buchanan, Chief Commercial Officer for Bridgeworks.

“Some data uses, such as video replay or instantaneous credit card authorisation across the world, require very low latency for the business use of the data to be viable”, suggests Bryan Foss, a visiting professor at Bristol Business School. Foss adds: “While some data is stored and little used, other data is used extensively – in real time and in parallel, in environments where any delay such as buffering can destroy its value and use, and so latency delays may be acceptable in some environments but completely forbidden in others.”

Trossell agrees: “It’s best to have the data as fresh as possible from wherever you like, but you need to ask what you want to do with it that day: keep the data or throw it away?” Much depends on demand and the questions asked. Organisations may want to analyse both historic and real-time data in order to know what customers have bought, what they are buying and what they might buy in the future.

Managing storage

Additionally, with data volumes increasing, organisations need to plan to ensure that they have adequate storage capacity, now and into the future, to cope with the needs of their big data and analytics solutions. This could be either in-house or with a cloud service provider.

“If you have a high utilisation of CPU and storage for big data then in-house may be the route to go but if you have low utilisation, then off-premise may be the answer”, explains Trossell. There is, in his experience, a tipping point at the 50% utilisation mark, at which it might be worth considering the business case for managing it all in-house rather than outsourcing storage to a cloud or managed service provider.

He argues that there is a point at which it becomes more expensive to manage data and storage in the cloud than in-house because of cloud computing's utility pricing model. However, Phil Taylor, Director and Founder of Flex/50, thinks that managing it all in-house can “lead to organisations being burdened with capital and project costs that can be hard to scale according to changes in demand. Cloud means that organisations can upscale and downscale, for example, to meet the demand of seasonal market trends, or unexpected demands of another kind.”
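
The tipping-point argument can be made concrete with a minimal Python sketch. It assumes a deliberately simple model that is not taken from the article's sources: in-house infrastructure costs a fixed amount per month regardless of how busy it is, while cloud cost scales linearly with utilisation. The function names and cost figures are hypothetical, chosen only so that the break-even lands at the 50% mark Trossell mentions.

```python
def break_even_utilisation(in_house_fixed_cost: float,
                           cloud_cost_at_full_use: float) -> float:
    """Utilisation (0.0 to 1.0) at which both options cost the same.

    Assumed model: in-house cost is fixed; cloud cost is
    utilisation * cloud_cost_at_full_use (utility pricing).
    """
    if cloud_cost_at_full_use <= 0:
        raise ValueError("cloud_cost_at_full_use must be positive")
    return in_house_fixed_cost / cloud_cost_at_full_use


def cheaper_option(utilisation: float,
                   in_house_fixed_cost: float,
                   cloud_cost_at_full_use: float) -> str:
    """Return which option is cheaper at a given utilisation level."""
    if not 0.0 <= utilisation <= 1.0:
        raise ValueError("utilisation must be between 0.0 and 1.0")
    cloud_cost = utilisation * cloud_cost_at_full_use
    return "in-house" if in_house_fixed_cost < cloud_cost else "cloud"


if __name__ == "__main__":
    # Hypothetical monthly figures, for illustration only.
    IN_HOUSE = 10_000.0        # fixed cost of running it yourself
    CLOUD_FULL_USE = 20_000.0  # cloud bill if the capacity were used 100%

    print(break_even_utilisation(IN_HOUSE, CLOUD_FULL_USE))
    for level in (0.3, 0.8):
        print(level, cheaper_option(level, IN_HOUSE, CLOUD_FULL_USE))
```

The sketch leaves out the flexibility Taylor describes. A fixed in-house cost cannot be scaled down when demand drops, so the real break-even depends on how variable the workload is.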

Lastly, the issues around veracity and velocity come into play when organisations start to use data from many different places, including multiple on-premise systems and external data sources. As Buchanan details, moving data internally within a data centre is straightforward, but as soon as you take data from multiple data centres then “the issues of moving the data begin and latency will affect the volume and the speed of the data transfer.” Furthermore, if you are sending the data to the cloud, she rightly suggests that you’d want it to be as secure as possible.
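
A rough Python sketch shows why latency limits both the speed and the volume of a transfer. It assumes something the article does not specify: a single TCP-style stream that can have at most one window of unacknowledged data in flight per round trip, so throughput is capped at window size divided by round-trip time. The 64 KiB window, the round-trip times and the function names are illustrative assumptions.

```python
def max_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on single-stream throughput in megabits per second.

    Assumed model: at most one window of data is delivered per round trip.
    """
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    bits_per_round_trip = window_bytes * 8
    return bits_per_round_trip / (rtt_ms / 1000.0) / 1_000_000


def transfer_hours(data_gb: float, throughput_mbps: float) -> float:
    """Hours needed to move data_gb gigabytes (decimal) at a given rate."""
    if throughput_mbps <= 0:
        raise ValueError("throughput_mbps must be positive")
    total_bits = data_gb * 8 * 1000**3
    seconds = total_bits / (throughput_mbps * 1_000_000)
    return seconds / 3600.0


if __name__ == "__main__":
    WINDOW = 64 * 1024  # assumed 64 KiB window, no window scaling
    for rtt in (1.0, 20.0, 100.0):  # assumed LAN, regional and long-haul RTTs
        rate = max_throughput_mbps(WINDOW, rtt)
        print(f"RTT {rtt:>5} ms: {rate:8.2f} Mbit/s, "
              f"1 TB takes {transfer_hours(1000, rate):8.1f} h")
```

Under these assumptions the ceiling falls from about 524 Mbit/s at 1 ms to about 5.2 Mbit/s at 100 ms, whatever the link's bandwidth. That is the sense in which latency, and not only capacity, decides how fresh the data is when it arrives.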

Finding solutions

To reduce the latencies affecting all four Vs described above to an acceptable level for real-time data applications like big data would require a complete network re-architecture, but most organisations don’t have the time and money to attempt such a drastic solution. However, there are solutions that allow you to reduce latencies in the network without having to ‘rip and replace’, and one such solution is Self-Configuring Infrastructure Optimised Networks (SCION).

By using SCION, organisations can mitigate the effects of latency and maximise network infrastructure utilisation by as much as 95-98%, claims Buchanan. “Using machine intelligence, SCIONs offer optimal performance at all times, requiring no manual intervention, in a way that dramatically reduces IT’s operational costs and helps to enable the four Vs. And let’s face it, if you can’t have freshness, volume and speed it’s going to affect your ability to be agile from a business decision perspective.”

Gartner’s ‘Hype Cycle for Infrastructure Services, 2015’ report agrees, describing SCION solutions such as Bridgeworks’ WANrockIT as going “beyond just keeping the lights on to now actively managing the configuration and performance tuning of the network between discrete points.”

About the author

Graham Jarvis is a contributing editor to CCi and an expert in cloud technologies.

1 comment

  • CyberH, Tuesday, 08 September 2015 20:32

    Graham, I think it’s worth mentioning HPCC Systems to tackle the four Vs of big data. Designed by data scientists, HPCC Systems is an open source data-intensive supercomputing platform to process and solve Big Data analytical problems. It is a mature platform and provides for a data delivery engine together with a data transformation and linking system. The main advantages over other alternatives are the real-time delivery of data queries and the extremely powerful ECL language programming model. More info at http://hpccsystems.com

