Nuno Godinho, Director of Cloud Services, Europe, at Aditi Technologies, discusses how cloud technologies can enable business analytics and big data.
Ever heard of data scientists? Well, the Harvard Business Review named data scientist the sexiest job of the 21st century, and the role has quickly risen to prominence in a number of industries, including retail, oil and gas, telecommunications and financial services. So what has this got to do with the cloud? Should we all be packing up our RESTful APIs and retraining? Not at all. Before looking at what the cloud has to offer data scientists and the IT departments that work with them, let's take a quick dash through history and explain the role.
Data analysis has gone through a number of significant changes over the years: from looking at raw numbers by hand, to spreadsheets, to Business Intelligence and, more recently, to visualisations and complex real-time predictive analytics. One thing that has remained largely consistent throughout that period is that both the data and the analysis have stayed on-premise. For a long time this was for technical reasons. Those reasons have now largely been overcome, with many vendors offering PaaS and SaaS versions of their tools and platforms, so most of the remaining barrier comes down to the commercial sensitivity of data and the concern about it leaving the firewall. Will the service be reliable, safe from hackers and adequately protected with encryption?
As for data scientists themselves, the role has in truth been around for a long time. Essentially, it boils down to extracting deep meaning from data. The key to doing this, however, is understanding the data itself: not just the results, but the raw data, its sources, formats and relationships, and then combining that understanding with strategies to analyse it from descriptive, prescriptive and predictive perspectives. The data scientist needs a mind that combines analytic, business and IT skills. Previously, representatives from each of these departments would (read: might) have worked together to produce the required analysis, normally using some kind of data warehouse tool.
Traditional business intelligence reviews and presents historic data, whether that data is two seconds or two years old. For years, analysts have taken those same datasets and used them to build prescriptive models that describe the relationships between data elements and how the numbers interact. The latest frontier, though, is real-time predictive analytics: using those models to predict, for example, an action, value or preference. Usually an event, such as an attempted credit card transaction, triggers a model to run against the transaction details to estimate the likelihood that it is fraudulent, or to recommend other products an online shopper might like. In financial services, trading-floor systems make thousands of these predictions a second to assess stock movements.
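To make the event-triggered pattern concrete, here is a minimal sketch of scoring a card transaction as it arrives. It assumes a logistic regression scorer, which the article does not name; the features (`amount`, `foreign`, `hour`), the weights in `WEIGHTS` and the 0.5 threshold are all hypothetical stand-ins for a model trained offline on historic data.

```python
import math
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float   # transaction value
    foreign: bool   # card used outside its home country (hypothetical feature)
    hour: int       # local hour of day, 0-23


# Hypothetical coefficients standing in for a model trained offline on
# historic transactions; real values would come from that training step.
WEIGHTS = {"bias": -4.0, "amount": 0.002, "foreign": 1.5, "night": 1.0}


def fraud_probability(tx: Transaction) -> float:
    """Score one transaction with a logistic model: p = 1 / (1 + e^-z)."""
    night = 1.0 if tx.hour < 6 else 0.0
    z = (WEIGHTS["bias"]
         + WEIGHTS["amount"] * tx.amount
         + WEIGHTS["foreign"] * float(tx.foreign)
         + WEIGHTS["night"] * night)
    return 1.0 / (1.0 + math.exp(-z))


def on_transaction(tx: Transaction, threshold: float = 0.5) -> str:
    """Event handler: runs the model as each transaction arrives."""
    return "review" if fraud_probability(tx) >= threshold else "approve"
```

With these made-up weights, a 50-unit daytime domestic purchase scores about 0.02 and is approved, while a 1,500-unit foreign purchase at 3 a.m. scores about 0.82 and is flagged for review. The expensive part, fitting the weights, happens offline; the per-event work is a handful of multiplications, which is what makes thousands of predictions a second feasible.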
So where can the cloud fit into all this, and should it? The answer is that the cloud can be of benefit throughout the process of collecting data and getting it to the point where it can be used in predictive analytics. Firstly, it can act as an aggregation point. If your data sources are sensors distributed across an oil field, or even mobile sources such as truck geo-location data, the cloud can be the point where all those sources are brought together into a single data set for further processing.
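As a rough illustration of the aggregation step, the sketch below merges per-source streams (say, oil-field sensors and truck trackers) into one time-ordered stream, as a cloud ingestion point might. It assumes each source already delivers its readings in timestamp order; the `Reading` layout and source IDs are hypothetical.

```python
import heapq
from typing import Dict, Iterable, Iterator, NamedTuple


class Reading(NamedTuple):
    timestamp: float  # seconds since epoch
    source: str       # e.g. "well-07" or "truck-12" (hypothetical IDs)
    value: float      # pressure, temperature, speed, etc.


def aggregate(*streams: Iterable[Reading]) -> Iterator[Reading]:
    """Merge already time-ordered per-source streams into one ordered stream."""
    return heapq.merge(*streams, key=lambda r: r.timestamp)


def latest_by_source(readings: Iterable[Reading]) -> Dict[str, float]:
    """Most recent value per source, given a time-ordered stream."""
    latest: Dict[str, float] = {}
    for r in readings:
        latest[r.source] = r.value  # later readings overwrite earlier ones
    return latest
```

`heapq.merge` consumes its inputs lazily, so the same function works whether the streams are small lists or long-running iterators.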
Raw compute power from the cloud can also be used to process the big data associated with predictive analytics. Creating models can be an intensive task, depending on the size of the data sets, and if you don't need to do it often, why make the capital investment when you can simply buy the machine time?
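The buy-versus-rent question comes down to simple arithmetic. This sketch compares the cost of owning a machine over its lifetime with paying an hourly rate only for the hours you actually use; every figure in the example below is hypothetical.

```python
def ownership_cost(capital_cost: float, yearly_running_cost: float,
                   years: float) -> float:
    """Purchase price plus running costs (power, space, support) over the period."""
    return capital_cost + yearly_running_cost * years


def break_even_hours(capital_cost: float, yearly_running_cost: float,
                     years: float, cloud_rate_per_hour: float) -> float:
    """Hours of rented machine time that cost the same as owning the hardware."""
    return ownership_cost(capital_cost, yearly_running_cost, years) / cloud_rate_per_hour


def cheaper_to_rent(hours_needed: float, capital_cost: float,
                    yearly_running_cost: float, years: float,
                    cloud_rate_per_hour: float) -> bool:
    """True if renting for `hours_needed` costs less than owning."""
    rented = hours_needed * cloud_rate_per_hour
    return rented < ownership_cost(capital_cost, yearly_running_cost, years)
```

For example, a 20,000 server costing 5,000 a year to run over three years totals 35,000, which buys 3,500 hours at a rate of 10 per hour. Retraining models for 20 hours a month (720 hours over three years) would cost 7,200 rented. The sketch deliberately ignores data transfer, storage and staff costs, which would shift the break-even point in practice.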
The cloud can also be used to enrich your data sets by providing additional data sources for your models. There are hundreds, if not thousands, of sources that can enhance your data, whether you need traffic data, government information or simply temperature data. Many of these sources are validated and reliable, and they can substantially improve the quality of your models while reducing costs.
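Enrichment is essentially a join between your own records and an external data set on a shared key. In this minimal sketch the key is the date; the sales rows and the `temperature_c` feed are hypothetical values standing in for internal data and a third-party weather source.

```python
from datetime import date

# Internal data: daily sales per store (hypothetical figures).
sales = [
    {"date": date(2014, 3, 1), "store": "north", "units": 120},
    {"date": date(2014, 3, 2), "store": "north", "units": 95},
]

# External data set: daily mean temperature in Celsius, standing in for a
# feed obtained from a third-party provider (hypothetical values).
temperature_c = {date(2014, 3, 1): 8.5, date(2014, 3, 2): 3.0}


def enrich(rows, external, field):
    """Left-join each row with an external value keyed by date (None if missing)."""
    return [{**row, field: external.get(row["date"])} for row in rows]


enriched = enrich(sales, temperature_c, "temp_c")
```

A left join keeps every internal row even when the external source has a gap, so missing values show up as `None` and can be handled explicitly before modelling rather than silently dropping records.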
The cloud can, of course, also run the predictive analysis itself, and it is here more than anywhere else that you have to consider how quickly you need the results and whether speed and reliability demand that the infrastructure be on-site. For example, if you are processing thousands of transactions a second that rely on predictive analytics and the internet connection to your cloud provider is lost, what happens? You may be able to switch to a backup line, but how long does that take and what is the impact?
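One way to limit the impact of a lost connection is to keep a simpler model on-site and fall back to it whenever the cloud-hosted model cannot be reached. The sketch below assumes the `remote` scorer enforces its own timeout and raises `ConnectionError` or `TimeoutError` on failure; `local_rules` and `unreachable` are hypothetical stand-ins used only for illustration.

```python
from typing import Callable, Dict, Tuple

Features = Dict[str, float]
Scorer = Callable[[Features], float]


def score_with_fallback(features: Features, remote: Scorer,
                        local: Scorer) -> Tuple[float, str]:
    """Prefer the cloud-hosted model; fall back to a local one if the link fails.

    `remote` is assumed to enforce its own timeout and to raise
    ConnectionError or TimeoutError when the provider is unreachable.
    """
    try:
        return remote(features), "cloud"
    except (ConnectionError, TimeoutError):
        # Link down: answer from the simpler on-site model instead of blocking.
        return local(features), "local"


def local_rules(features: Features) -> float:
    """Hypothetical on-site fallback: a crude rule on transaction amount."""
    return 0.9 if features["amount"] > 1000 else 0.1


def unreachable(features: Features) -> float:
    """Hypothetical remote scorer simulating a lost connection."""
    raise ConnectionError("link to cloud provider is down")
```

Returning the source label alongside the score lets you count how many decisions were made in degraded mode, which is one way to measure the impact the paragraph above asks about.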
Data scientists and IT departments alike should not ignore the role the cloud can play in any analytics scenario. That is not to say it is right for all of them, but, as explained above, cloud computing can contribute in a number of ways, and it doesn't have to be all or nothing. It can enhance models, lower costs and give smaller companies access to intelligence they would otherwise not be able to afford. These kinds of activities and analyses require huge amounts of compute power and storage space, which is why the cloud is the perfect partner for big data.
So, to say ‘No’ outright is to deny yourself the possibility of improving or simplifying the way in which analytics is executed in your company.
Nuno Godinho has been a Microsoft MVP for the last six years: first in ASP.NET and, for the last three years, in Windows Azure. He is also a speaker at some of Microsoft's key events, such as TechEd North America, TechEd Europe, Tech Days Worldwide Online and TechDays Netherlands, and at community events such as GASP (Grupo de Arquitectura de Software Português), the Windows Azure UK User Group and the Azure BE UG. He is a prolific blogger and community creator.