Overview

Apache Livy provides a REST interface for interacting with Apache Spark. When a Spark job reads from or writes to Apache HBase secured with Kerberos, it must obtain an HBase delegation token, and token delegation tends to pose problems. spark-submit solves this by obtaining a delegation token on your behalf when the job is submitted, but for that to work, the HBase configuration and JAR files must be on the spark-submit classpath. Exactly which configurations and JAR files are required is explained in multiple references. Livy does not currently expose a way to dynamically add the required configuration and JARs to the spark-submit classpath. A long-term solution could be explored with LIVY-414, which could allow the appropriate environment variables to be set when a Spark job is submitted.
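
For comparison, this is roughly what a direct spark-submit run looks like once the HBase configuration and JARs are on its classpath (a minimal sketch; the class name, JAR path, and table argument mirror the Livy example later in this article):

#!/usr/bin/env bash

# Because the HBase configuration directory and client JARs are on the
# spark-submit classpath (set below via spark-env.sh and spark-defaults.conf),
# spark-submit can obtain an HBase delegation token before launching the job.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class SparkHBaseKerberos \
  hdfs:///user/${USER}/spark-hbase-kerberos-1.0-SNAPSHOT.jar \
  tableName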

Assumptions

This walkthrough assumes a Kerberized HDP cluster managed by Ambari, with Spark, HBase, and Livy installed, and Livy reachable through a Knox gateway (as reflected in the submission URL below).

Set Up the Spark Environment with Ambari

Add the following to the bottom of spark-env.sh:

export SPARK_CLASSPATH="/etc/hbase/conf:/usr/hdp/current/hadoop-client/hadoop-common.jar:/usr/hdp/current/hadoop-client/lib/guava-11.0.2.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop2-compat.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar"
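
The JAR names in this classpath (for example guava-11.0.2.jar and htrace-core-3.1.0-incubating.jar) are pinned to a particular HDP release, so it is worth confirming that every entry actually exists on the node. A quick sketch, assuming the standard HDP config location /etc/spark/conf:

#!/usr/bin/env bash

# Report any classpath entry that does not exist on this node;
# JAR file names are version-pinned and differ across HDP releases.
source /etc/spark/conf/spark-env.sh
IFS=':' read -ra ENTRIES <<< "${SPARK_CLASSPATH}"
for entry in "${ENTRIES[@]}"; do
  [ -e "${entry}" ] || echo "Missing: ${entry}"
done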

Add the following to spark-defaults.conf:

spark.driver.extraClassPath /etc/hbase/conf:/usr/hdp/current/hadoop-client/hadoop-common.jar:/usr/hdp/current/hadoop-client/lib/guava-11.0.2.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop2-compat.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar
spark.executor.extraClassPath /etc/hbase/conf:/usr/hdp/current/hadoop-client/hadoop-common.jar:/usr/hdp/current/hadoop-client/lib/guava-11.0.2.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop2-compat.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar
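
Ambari only applies these changes after the Spark service is restarted. A quick way to confirm the settings were written out (again assuming the standard HDP config location):

#!/usr/bin/env bash

# After restarting Spark from Ambari, confirm both classpath
# properties landed in the generated spark-defaults.conf.
grep extraClassPath /etc/spark/conf/spark-defaults.conf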

Submitting an Example Spark HBase Job with Livy

Livy submission script

#!/usr/bin/env bash

# Submit the job as a Livy batch through the Knox gateway.
# --location-trusted re-sends credentials if Knox redirects the request,
# and proxyUser runs the Spark job as the authenticated user.
curl \
  -u ${USER} \
  --location-trusted \
  -H 'X-Requested-By: livy' \
  -H 'Content-Type: application/json' \
  -X POST \
  https://localhost:8443/gateway/default/livy/v1/batches \
  --data "{
    \"proxyUser\": \"${USER}\",
    \"file\": \"hdfs:///user/${USER}/spark-hbase-kerberos-1.0-SNAPSHOT.jar\",
    \"className\": \"SparkHBaseKerberos\",
    \"args\": [
      \"tableName\"
    ]
  }"

Example SparkHBaseKerberos Class

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkHBaseKerberos {
  public static void main(String[] args) throws Exception {
    System.out.println("Starting");

    // The HBase table to scan is passed as the first job argument.
    String tableName = args[0];
    System.out.println("tableName: " + tableName);

    SparkConf sparkConf = new SparkConf().setAppName(SparkHBaseKerberos.class.getCanonicalName());
    // try-with-resources stops the context when the block exits.
    try (JavaSparkContext jsc = new JavaSparkContext(sparkConf)) {
      // Build an HBase configuration from the hbase-site.xml found on the
      // classpath (/etc/hbase/conf, added above) and point TableInputFormat
      // at the requested table.
      Configuration config = HBaseConfiguration.create();
      config.set(TableInputFormat.INPUT_TABLE, tableName);

      // Read the table as an RDD of (row key, row) pairs and count the rows.
      JavaPairRDD<ImmutableBytesWritable, Result> rdd =
          jsc.newAPIHadoopRDD(config, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);
      System.out.println("Number of Records found: " + rdd.count());

      System.out.println("Done");
    }
  }
}
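
Before the submission script can run, the class has to be packaged and uploaded to the HDFS path the request references. A sketch, assuming a Maven project whose artifact name matches the JAR used above:

#!/usr/bin/env bash

# Build the job JAR (assumes a Maven project producing the artifact below)
# and upload it to the HDFS path referenced in the Livy submission script.
mvn clean package
hdfs dfs -put -f target/spark-hbase-kerberos-1.0-SNAPSHOT.jar /user/${USER}/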