Overview

Apache Livy provides a REST interface for interacting with Apache Spark. When a Spark job reads from or writes to Apache HBase secured with Kerberos, it must obtain an HBase delegation token, and token delegation tends to pose problems. spark-submit solves this by obtaining a delegation token on your behalf when the job is submitted, but for that to work, the HBase configuration and JAR files must be on the spark-submit classpath. Exactly which configurations and JAR files are required is explained in multiple references. Livy does not currently expose a way to dynamically add the required configuration and JARs to the spark-submit classpath. A long-term solution could be explored with LIVY-414, which could allow the appropriate environment variables to be set when a Spark job is submitted.
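
For comparison, this is roughly what a direct spark-submit run looks like once the HBase configuration and JARs are on its classpath (a minimal sketch; the class name, JAR path, and table argument mirror the Livy example later in this article):

#!/usr/bin/env bash

# Because the HBase configuration directory and client JARs are on the
# spark-submit classpath (set below via spark-env.sh and spark-defaults.conf),
# spark-submit can obtain an HBase delegation token before launching the job.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class SparkHBaseKerberos \
  hdfs:///user/${USER}/spark-hbase-kerberos-1.0-SNAPSHOT.jar \
  tableName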

Assumptions

This walkthrough assumes a Kerberized HDP cluster managed by Ambari, with Spark, HBase, and Livy installed, and Livy reachable through a Knox gateway (as reflected in the submission URL below).

Set Up the Spark Environment with Ambari

Add the following to the bottom of spark-env.sh:

export SPARK_CLASSPATH="/etc/hbase/conf:/usr/hdp/current/hadoop-client/hadoop-common.jar:/usr/hdp/current/hadoop-client/lib/guava-11.0.2.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop2-compat.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar"
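
The JAR names in this classpath (for example guava-11.0.2.jar and htrace-core-3.1.0-incubating.jar) are pinned to a particular HDP release, so it is worth confirming that every entry actually exists on the node. A quick sketch, assuming the standard HDP config location /etc/spark/conf:

#!/usr/bin/env bash

# Report any classpath entry that does not exist on this node;
# JAR file names are version-pinned and differ across HDP releases.
source /etc/spark/conf/spark-env.sh
IFS=':' read -ra ENTRIES <<< "${SPARK_CLASSPATH}"
for entry in "${ENTRIES[@]}"; do
  [ -e "${entry}" ] || echo "Missing: ${entry}"
done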

Add the following to spark-defaults.conf:

spark.driver.extraClassPath /etc/hbase/conf:/usr/hdp/current/hadoop-client/hadoop-common.jar:/usr/hdp/current/hadoop-client/lib/guava-11.0.2.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop2-compat.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar
spark.executor.extraClassPath /etc/hbase/conf:/usr/hdp/current/hadoop-client/hadoop-common.jar:/usr/hdp/current/hadoop-client/lib/guava-11.0.2.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop2-compat.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar
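
Ambari only applies these changes after the Spark service is restarted. A quick way to confirm the settings were written out (again assuming the standard HDP config location):

#!/usr/bin/env bash

# After restarting Spark from Ambari, confirm both classpath
# properties landed in the generated spark-defaults.conf.
grep extraClassPath /etc/spark/conf/spark-defaults.conf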

Submitting an Example Spark HBase Job with Livy

Livy submission script

#!/usr/bin/env bash

# Submit the job as a Livy batch through the Knox gateway.
# --location-trusted re-sends credentials if Knox redirects the request,
# and proxyUser runs the Spark job as the authenticated user.
curl \
  -u ${USER} \
  --location-trusted \
  -H 'X-Requested-By: livy' \
  -H 'Content-Type: application/json' \
  -X POST \
  https://localhost:8443/gateway/default/livy/v1/batches \
  --data "{
    \"proxyUser\": \"${USER}\",
    \"file\": \"hdfs:///user/${USER}/spark-hbase-kerberos-1.0-SNAPSHOT.jar\",
    \"className\": \"SparkHBaseKerberos\",
    \"args\": [
      \"tableName\"
    ]
  }"

Example SparkHBaseKerberos Class

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkHBaseKerberos {
  public static void main(String[] args) throws Exception {
    System.out.println("Starting");

    // The HBase table to scan is passed as the first job argument.
    String tableName = args[0];
    System.out.println("tableName: " + tableName);

    SparkConf sparkConf = new SparkConf().setAppName(SparkHBaseKerberos.class.getCanonicalName());
    // try-with-resources stops the context when the block exits.
    try (JavaSparkContext jsc = new JavaSparkContext(sparkConf)) {
      // Build an HBase configuration from the hbase-site.xml found on the
      // classpath (/etc/hbase/conf, added above) and point TableInputFormat
      // at the requested table.
      Configuration config = HBaseConfiguration.create();
      config.set(TableInputFormat.INPUT_TABLE, tableName);

      // Read the table as an RDD of (row key, row) pairs and count the rows.
      JavaPairRDD<ImmutableBytesWritable, Result> rdd =
          jsc.newAPIHadoopRDD(config, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);
      System.out.println("Number of Records found: " + rdd.count());

      System.out.println("Done");
    }
  }
}
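
Before the submission script can run, the class has to be packaged and uploaded to the HDFS path the request references. A sketch, assuming a Maven project whose artifact name matches the JAR used above:

#!/usr/bin/env bash

# Build the job JAR (assumes a Maven project producing the artifact below)
# and upload it to the HDFS path referenced in the Livy submission script.
mvn clean package
hdfs dfs -put -f target/spark-hbase-kerberos-1.0-SNAPSHOT.jar /user/${USER}/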