Can you use spark-shell programmatically?
Question:
Is it possible to run a spark-shell from a Java or Scala program? In other words, start a spark-shell session inside a Java program, pass Spark code to it, read back the response, and continue the interaction inside the code.
Answer:
This is a working solution on top of Spark 1.6.0 and Scala 2.10. Create a SparkIMain with Settings, and bind the variables and values together with their types.
import org.apache.spark.repl.SparkIMain
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}
import scala.tools.nsc.GenericRunnerSettings
class TestMain {

  def exec(): Unit = {
    // Interpreter settings: print compiler output and use the Java classpath
    // so that the Spark classes on the host classpath are visible in the REPL.
    val settings = new GenericRunnerSettings(println _)
    settings.usejavacp.value = true
    val interpreter = new SparkIMain(settings)

    // Regular Spark setup in the host program, outside the interpreter.
    val conf = new SparkConf().setAppName("TestMain").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Spark code to be executed by the embedded REPL, passed in as a plain string.
    val methodChain =
      """
        val df = sqlContext.read
          .format("com.databricks.spark.csv")
          .option("header", "false")
          .option("inferSchema", "true")
          .option("treatEmptyValuesAsNulls", "true")
          .option("parserLib", "univocity")
          .load("example-data.csv")

        df.show()
      """

    // Expose the SQLContext to the REPL under the name "sqlContext",
    // then interpret the snippet and keep the result flag.
    interpreter.bind("sqlContext", "org.apache.spark.sql.SQLContext", sqlContext)
    val resultFlag = interpreter.interpret(methodChain)
  }
}

object TestInterpreter {

  def main(args: Array[String]) {
    val testMain = new TestMain()
    testMain.exec()
    System.exit(0)
  }
}
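To read results back out of the session and keep the interaction going, the same interpreter instance can be reused. Below is a minimal sketch of that pattern; it assumes SparkIMain exposes interpret and valueOfTerm with the same behaviour as the standard Scala 2.10 IMain it is derived from, and the rowCount name is purely illustrative.

import scala.tools.nsc.interpreter.Results

// Hypothetical continuation, reusing the `interpreter` created in exec():
// define a value inside the REPL session...
val status = interpreter.interpret(
  """val rowCount = sqlContext.read
       .format("com.databricks.spark.csv")
       .option("header", "false")
       .load("example-data.csv")
       .count()""")

// ...check whether the snippet compiled and ran...
status match {
  case Results.Success =>
    // ...and read the value defined in the REPL back into the host program.
    // valueOfTerm is assumed to behave as in the standard IMain.
    val rowCount = interpreter.valueOfTerm("rowCount")
    println(s"rows counted inside the REPL: $rowCount")
  case Results.Error      => println("the snippet failed to compile or run")
  case Results.Incomplete => println("the snippet was syntactically incomplete")
}

This keeps the round trip entirely inside the host program: interpret a snippet, inspect the returned result flag, pull values back out, and interpret the next snippet, which is the "continue the interaction inside the code" part of the question.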