Can you use spark-shell programmatically?

Problem description:

Is it possible to run a spark-shell from a Java or Scala program? In other words, can you start a spark-shell session inside a Java program, pass Spark code to it, read back the response, and continue the interaction inside the code?

This is a working solution on top of Spark 1.6.0 and Scala 2.10: create a SparkIMain with Settings, then bind the variables and values, together with their types, so the interpreted code can refer to them.

import org.apache.spark.repl.SparkIMain
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

import scala.tools.nsc.GenericRunnerSettings
class TestMain {
  def exec(): Unit = {
    // Use the Java classpath so the interpreter can see Spark and application classes.
    val settings = new GenericRunnerSettings(println _)
    settings.usejavacp.value = true
    val interpreter = new SparkIMain(settings)

    val conf = new SparkConf().setAppName("TestMain").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Spark code to be evaluated by the embedded interpreter.
    val methodChain =
      """
        val df = sqlContext.read
              .format("com.databricks.spark.csv")
              .option("header", "false")
              .option("inferSchema", "true")
              .option("treatEmptyValuesAsNulls", "true")
              .option("parserLib", "univocity")
              .load("example-data.csv")

        df.show()
      """

    // Expose the SQLContext to the interpreted code under the name "sqlContext".
    interpreter.bind("sqlContext", "org.apache.spark.sql.SQLContext", sqlContext)

    // interpret returns a Results.Result (Success, Error or Incomplete).
    val resultFlag = interpreter.interpret(methodChain)
  }
}

object TestInterpreter {

  def main(args: Array[String]): Unit = {
    val testMain = new TestMain()
    testMain.exec()
    System.exit(0)
  }
}
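
To "read back the response and continue the interaction", as asked in the question, you can inspect the Results.Result value that interpret returns and then pull a value defined by the snippet back into the host program. The sketch below is an assumption on top of the answer: it relies on valueOfTerm, which is part of Scala's standard IMain API and which SparkIMain in Spark 1.x (a fork of IMain) is assumed to expose as well; the rowCount name is only illustrative.

import scala.tools.nsc.interpreter.Results

// Assumption: `interpreter` and `sqlContext` are the instances created and bound above.
val snippet =
  """
    val rowCount = sqlContext.read
      .format("com.databricks.spark.csv")
      .load("example-data.csv")
      .count()
  """

interpreter.interpret(snippet) match {
  case Results.Success =>
    // valueOfTerm comes from Scala's IMain; assumed to be available on SparkIMain too.
    // It returns an Option[AnyRef] holding the value of the named term, if defined.
    val rowCount = interpreter.valueOfTerm("rowCount")
    println(s"row count read back into the host program: $rowCount")
  case Results.Error      => println("the snippet failed to compile or run")
  case Results.Incomplete => println("the snippet is syntactically incomplete")
}

Checking the Result before reading anything back lets the host program decide whether to send the next snippet, retry, or abort, which is the "continue the interaction inside the code" part of the original question.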