Apache Livy - Simplified Apache Spark Integration
Overview
Apache Livy provides a REST interface for interacting with Apache Spark. Prior to Livy, using Apache Spark typically meant running spark-submit from the command line or relying on tools that ran spark-submit on your behalf. This was not feasible in many situations and made securing Spark hard.
Apache Livy History
Cloudera originally built Livy to solve these problems by providing an interface through which Spark jobs can be submitted and monitored easily. Hortonworks decided to support and improve Livy as indicated here and here. Livy was then contributed to the Apache Software Foundation and is currently in the incubator process. Many other companies and tools have started using Apache Livy as an integration point for interacting with Apache Spark. Outlined below is an example of what Apache Livy enables.
Apache Livy Architecture
Integration with Apache Livy
As diagrammed above, Apache Livy integrates with many different tools to enable users to quickly and securely use Apache Spark. Microsoft with Azure HDInsight supports Apache Livy for connecting to Spark clusters. Jupyter Notebook, an open source web-based notebook, can use Livy with sparkmagic to interact with Spark. Another web-based notebook solution, Apache Zeppelin, integrates natively with Livy. Anaconda, which supports both Jupyter and Apache Zeppelin, works with Livy (video) as well. Recently Apache NiFi added support for submitting Spark jobs via Livy. Finally, Apache Knox can provide LDAP authentication in front of Apache Livy.
Thanks to Apache Livy, all of the integrations above make it easier to use Apache Spark without requiring spark-submit. Building on top of Apache Livy also provides a useful abstraction: you do not have to worry about where the Spark job will run.
What is next?
Over the past year, I have been working with my team and multiple analytics teams to simplify the experience of getting started with and using Apache Spark. Apache Livy provides the capabilities necessary to do this without compromising on ease of use or security. Since much of the documentation for Apache Spark revolves around spark-submit, I have been looking into converting those examples to work with Apache Livy.
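As a rough illustration of what such a conversion can look like, the sketch below maps the main inputs of a spark-submit call onto a request to Livy's batch endpoint (POST /batches). It is a minimal sketch under a few assumptions, not a definitive recipe: the server address (localhost on Livy's default port 8998), the jar path, and the helper names build_batch_payload and build_batch_request are all hypothetical and chosen only for this example. The request is built but not sent, so the code runs without a Livy server.

```python
import json
import urllib.request

# Assumption: Livy is reachable here; 8998 is Livy's default port.
LIVY_URL = "http://localhost:8998"


def build_batch_payload(app_file, class_name=None, args=None, conf=None):
    """Map the main spark-submit inputs onto the JSON body of POST /batches."""
    payload = {"file": app_file}
    if class_name:
        payload["className"] = class_name  # spark-submit --class
    if args:
        payload["args"] = [str(a) for a in args]  # application arguments
    if conf:
        payload["conf"] = dict(conf)  # spark-submit --conf key=value
    return payload


def build_batch_request(payload, livy_url=LIVY_URL):
    """Create (but do not send) the HTTP request that submits the batch."""
    return urllib.request.Request(
        url=f"{livy_url}/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Rough equivalent of:
#   spark-submit --class org.apache.spark.examples.SparkPi \
#       --conf spark.executor.memory=1g \
#       hdfs:///path/to/spark-examples.jar 10
payload = build_batch_payload(
    "hdfs:///path/to/spark-examples.jar",  # hypothetical path
    class_name="org.apache.spark.examples.SparkPi",
    args=[10],
    conf={"spark.executor.memory": "1g"},
)
request = build_batch_request(payload)

# To actually submit, run the lines below against a live Livy server:
# with urllib.request.urlopen(request) as response:
#     batch = json.loads(response.read())
#     print(batch["id"], batch["state"])
```

One difference from spark-submit worth keeping in mind: the application file generally needs to live somewhere the cluster can read, such as HDFS, because Livy starts the job on the server side and not from the machine making the request.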