What are the benefits of SparkLauncher compared with java -jar fat-jar?

Question:

I know SparkLauncher is used to launch a Spark application programmatically instead of using the spark-submit script, but I am a bit confused about when to use SparkLauncher and what benefit it offers.

The following code uses SparkLauncher to launch a Spark application whose main class is "org.apache.spark.launcher.WordCountApp":

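import org.apache.spark.launcher.SparkLauncher

object WordCountSparkLauncher {
  def main(args: Array[String]): Unit = {
    // Configure the application and start it as a child process.
    val proc = new SparkLauncher()
      .setAppName("WordCountSparkLauncherApp")
      .setMaster("local")
      .setSparkHome("D:/spark-2.2.0-bin-hadoop2.7")
      .setAppResource("file:///d:/spark-2.2.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.2.0.jar")
      .setVerbose(true)
      .setMainClass("org.apache.spark.launcher.WordCountApp")
      .launch()

    // Drain the child's stdout and stderr on separate threads so it cannot block on a full pipe.
    // IORunnable is a helper class that is not shown in the question.
    new Thread(new IORunnable(proc.getInputStream, "proc-input-stream")).start()
    new Thread(new IORunnable(proc.getErrorStream, "proc-error-input-stream")).start()

    // Block until the launched application exits.
    proc.waitFor()
  }
}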

It works fine, but there is another option:

Create a runnable fat jar with the maven shade plugin, packing all the Spark-related dependencies into one jar. That way I could still run the Spark application with java -jar thefatjar.

What are the benefits of SparkLauncher over a runnable fat jar?

Think of the different ways you can launch a Spark application and what integration options you have.

With a fat jar, you have to have Java installed, and launching the Spark application requires executing java -jar [your-fat-jar-here]. That is hard to automate if you want to, say, launch the application from a web application.

With SparkLauncher you get the option of launching a Spark application from another application, e.g. the web application above. It is simply much easier.

While both give you integration points in some way, SparkLauncher is simpler to work with from another JVM-based application. You don't have to fall back to the command line (which has its own "niceties").
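For contrast, here is a minimal sketch of what driving the fat jar from another JVM application might look like: you assemble and run the java -jar command yourself. The object name FatJarRunner, its methods, and the jar name and arguments are hypothetical, chosen only for illustration.

```scala
object FatJarRunner {
  // Builds the argument list for `java -jar <jar> <args...>`.
  def command(jarPath: String, appArgs: Seq[String]): Seq[String] =
    Seq("java", "-jar", jarPath) ++ appArgs

  // Starts the command as a child process, forwards its output to this
  // process's console, and returns the exit code. Assumes `java` is on the PATH.
  def run(jarPath: String, appArgs: Seq[String]): Int = {
    val process = new ProcessBuilder(command(jarPath, appArgs): _*)
      .inheritIO()
      .start()
    process.waitFor()
  }
}
```

Calling FatJarRunner.run("thefatjar.jar", Seq("input.txt")) would block until the child JVM exits. Locating the java binary, quoting arguments, and handling the output streams are all left to the caller, which is the kind of work the SparkLauncher builder in the question takes over.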

If I want to run a Spark application within another program, I would simply create a SparkContext inside the web application, so that Spark is used as a normal framework within the web application.

For one, that would tightly couple the web application and the Spark application, and it would keep compute resources (like threads) busy while the Spark application executes. HTTP requests are short-lived, while Spark jobs are long-lived.
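One way to picture the short-lived request versus long-lived job mismatch is the sketch below, in which a request handler hands the long-running work to a background pool and returns a job id immediately. JobRegistry, submit, and isDone are hypothetical names, and the pool size of 2 is an arbitrary assumption. This is not something the Spark API provides.

```scala
import java.util.UUID
import java.util.concurrent.{ConcurrentHashMap, Executors, Future}

object JobRegistry {
  // Daemon threads, so a still-running job does not keep the JVM alive on shutdown.
  private val pool = Executors.newFixedThreadPool(2, (r: Runnable) => {
    val t = new Thread(r)
    t.setDaemon(true)
    t
  })

  private val jobs = new ConcurrentHashMap[String, Future[_]]()

  // Queues the job on the background pool and returns its id without waiting for it.
  def submit(job: Runnable): String = {
    val id = UUID.randomUUID().toString
    jobs.put(id, pool.submit(job))
    id
  }

  // True once the job with this id has finished; false if it is still running or unknown.
  def isDone(id: String): Boolean = {
    val future = jobs.get(id)
    future != null && future.isDone
  }
}
```

In a web application, the job body could be the launch() and waitFor() calls from the question. The handler would return the id right away, and a later request could poll isDone(id) while the Spark application runs in its own process.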