Thursday, 10 December 2015 11:52

Hortonworks pushes Spark development

Hortonworks is enhancing its support for Spark with closer ties into the Hortonworks Data Platform, support for Spark SQL and Spark Streaming, and commitments to new enterprise and data science enhancements.

Hortonworks has firmed up its commitment to the Spark real-time big data platform by announcing that it will integrate Apache Spark 1.5.2’s in-memory analytics, along with support for Spark SQL and Spark Streaming, into the next release of the Hortonworks Data Platform (HDP).

Hortonworks has included Spark in HDP for over a year: it started with Spark 1.2.1 in HDP 2.2 back in December 2014, and HDP 2.3 now includes Spark 1.3.1. Future versions of HDP will have closer links with Spark, with users able to deploy Spark-based applications alongside Hadoop workloads in what the company describes as a “consistent, predictable and reliable way.” Hortonworks has also said that, in response to customer demand for access to multiple data sources, it will improve Spark’s integration with YARN, HDFS, Hive, HBase and ORC, and will work to further optimise data access via a new Data Source API. It promises that Spark SQL users will be able to take advantage of the following capabilities:

  • ORC File instantiation as a table
  • Column pruning
  • Language integrated queries
  • Predicate pushdown
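
For readers unfamiliar with the terms, column pruning and predicate pushdown can be illustrated with a toy sketch in plain Python. This is not Spark's or ORC's actual API: the `Stripe` class, the `scan` function and the sample data are all hypothetical, and the sketch only assumes that a columnar file stores rows in blocks with minimum and maximum statistics per column.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Stripe:
    """Hypothetical block of rows stored column by column."""
    columns: Dict[str, List[int]]

    def max_value(self, name: str) -> int:
        # A real columnar file would keep this statistic in its metadata;
        # the toy version simply computes it on demand.
        return max(self.columns[name])


def scan(stripes: List[Stripe], wanted: List[str],
         filter_column: str, lower_bound: int) -> Tuple[list, int]:
    """Return the wanted columns of rows where filter_column >= lower_bound,
    plus the number of stripes that actually had to be read."""
    rows = []
    stripes_read = 0
    for stripe in stripes:
        # Predicate pushdown: skip a whole stripe when its statistics show
        # that no row in it can satisfy the filter.
        if stripe.max_value(filter_column) < lower_bound:
            continue
        stripes_read += 1
        keep = [i for i, value in enumerate(stripe.columns[filter_column])
                if value >= lower_bound]
        # Column pruning: only the requested columns are touched; the
        # "region" column below is never read.
        rows.extend(tuple(stripe.columns[c][i] for c in wanted) for i in keep)
    return rows, stripes_read


stripes = [
    Stripe({"id": [1, 2, 3], "amount": [10, 20, 30], "region": [1, 1, 2]}),
    Stripe({"id": [4, 5, 6], "amount": [110, 250, 90], "region": [2, 3, 3]}),
]
rows, stripes_read = scan(stripes, wanted=["id", "amount"],
                          filter_column="amount", lower_bound=100)
print(rows, stripes_read)
```

In this sketch the first stripe is skipped entirely because its largest `amount` is below the threshold, which is the kind of saving a query engine aims for when it pushes a filter down to the storage format.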

The business is also making a commitment to enterprises to enhance its versions of Spark with “enterprise security, governance, operations and overall readiness for real-world production deployment.”

Hortonworks is also looking to help the data science market by increasing its commitment to Apache Zeppelin, a data analytics and visualisation project. It will be contributing additional Spark algorithms and packages to the project, including Project Magellan, an open source library that builds upon Spark to support geospatial queries and analytics on geospatial data at scale.

Hortonworks has also launched Hortonworks Community Connection (HCC), a new online collaboration destination for developers, DevOps, and partners to get answers to questions, collaborate on technical articles and share code examples from GitHub.
