Hortonworks has firmed up its commitment to the Spark real-time big data platform by announcing that it will integrate Apache Spark 1.5.2, including its in-memory analytics and support for Spark SQL and Spark Streaming, into the next release of the Hortonworks Data Platform (HDP).
Hortonworks has included Spark in HDP for over a year now. It started with Spark 1.2.1 in HDP 2.2 back in December 2014, and HDP 2.3 now includes Spark 1.3.1. Future versions of HDP will have closer links with Spark, with users able to deploy Spark-based applications alongside Hadoop workloads in what the company describes as a “consistent, predictable and reliable way.”
In response to customer demand for access to multiple data sources, Hortonworks has also said it will improve Spark’s integration with YARN, HDFS, Hive, HBase and ORC, and will work to further optimise data access via a new Data Source API. It promises that Spark SQL users will be able to take advantage of the following capabilities:
- ORC File instantiation as a table
- Column pruning
- Language integrated queries
- Predicate pushdown
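For readers unfamiliar with two of these terms, the following is a minimal toy sketch in plain Python of what column pruning and predicate pushdown mean for a columnar format with per-block statistics. It is not Spark's or ORC's implementation; the `Stripe` class, the `scan` function and the sample data are all hypothetical names invented for illustration.

```python
class Stripe:
    """Toy stand-in for a block of rows stored column by column (hypothetical)."""

    def __init__(self, columns):
        self.columns = columns  # column name -> list of values
        # Min/max statistics are computed once, when the block is "written".
        self.stats = {name: (min(vals), max(vals)) for name, vals in columns.items()}


def scan(stripes, wanted, filter_column, lower_bound):
    """Return rows where filter_column > lower_bound, keeping only `wanted` columns."""
    rows = []
    stripes_read = 0
    for stripe in stripes:
        _, highest = stripe.stats[filter_column]
        if highest <= lower_bound:
            # Predicate pushdown: the statistics show no row can match,
            # so the whole block is skipped without reading its data.
            continue
        stripes_read += 1
        # Column pruning: only the requested columns plus the filter column
        # are touched; every other column (here, "note") is never read.
        needed = set(wanted) | {filter_column}
        data = {name: stripe.columns[name] for name in needed}
        for i, value in enumerate(data[filter_column]):
            if value > lower_bound:
                rows.append(tuple(data[name][i] for name in wanted))
    return rows, stripes_read


stripes = [
    Stripe({"id": [1, 2, 3], "amount": [10, 20, 30], "note": ["a", "b", "c"]}),
    Stripe({"id": [4, 5, 6], "amount": [150, 40, 200], "note": ["d", "e", "f"]}),
]

# Roughly: SELECT id, amount FROM t WHERE amount > 100
rows, stripes_read = scan(stripes, ["id", "amount"], "amount", 100)
print(rows, stripes_read)
```

In this toy model the first block is skipped entirely because its largest `amount` is 30, and the `note` column is never read from the second block. A real engine applies the same two ideas at the file-format level, so less data leaves disk in the first place.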
The business is also making a commitment to enterprises to enhance its versions of Spark with “enterprise security, governance, operations and overall readiness for real-world production deployment.”
Hortonworks is also looking to help the data science market by increasing its commitment to Apache Zeppelin, a data analytics and visualisation project. It will contribute additional Spark algorithms and packages to the project, including Project Magellan, an open source library that builds on Spark to support geospatial queries and analytics on geospatial data at scale.
Hortonworks has also launched Hortonworks Community Connection (HCC), a new online collaboration destination for developers, DevOps, and partners to get answers to questions, collaborate on technical articles and share code examples from GitHub.