What is the benefit of SparkLauncher compared with java -jar fat-jar?



I know SparkLauncher is used to launch a Spark application programmatically instead of using the spark-submit script, but I am a bit confused about when to use SparkLauncher and what its benefit is.


The following code uses SparkLauncher to launch a Spark application whose main class is "org.apache.spark.launcher.WordCountApp":


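import org.apache.spark.launcher.SparkLauncher

object WordCountSparkLauncher {
  def main(args: Array[String]): Unit = {
    // launch() starts the application as a child process (java.lang.Process).
    val proc = new SparkLauncher()
      .setMainClass("org.apache.spark.launcher.WordCountApp")
      // other settings (app jar, master, ...) are not shown in the post
      .launch()

    // IORunnable is the poster's own stream-draining helper (not shown).
    new Thread(new IORunnable(proc.getInputStream, "proc-input-stream")).start()
    new Thread(new IORunnable(proc.getErrorStream, "proc-error-input-stream")).start()
  }
}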




It works fine, but there is another option:


Create a runnable fat jar with the Maven Shade plugin that packs all the Spark-related dependencies into one jar. That way, I can still run the Spark application with java -jar thefatjar.


What are the benefits of SparkLauncher over a runnable fat jar?


Think of the different ways you launch a Spark application and what integration options you have.


With a fat jar, you have to have Java installed, and launching the Spark application requires executing java -jar [your-fat-jar-here]. That is hard to automate if you want to, say, launch the application from a web application.
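To make the automation cost concrete, here is a minimal sketch of what another JVM program would have to do to run the fat jar itself. The object name FatJarRunner and its methods are hypothetical, and the sketch assumes a java executable is on the PATH; it uses only the standard java.lang.ProcessBuilder.

```scala
object FatJarRunner {
  // Builds the same command line a user would type in a shell.
  def buildCommand(jarPath: String, appArgs: Seq[String]): Seq[String] =
    Seq("java", "-jar", jarPath) ++ appArgs

  // Starts the fat jar as a child process and blocks until it exits.
  // Returns the exit code; output goes to this JVM's stdout and stderr.
  def run(jarPath: String, appArgs: Seq[String]): Int = {
    val builder = new ProcessBuilder(buildCommand(jarPath, appArgs): _*)
    builder.inheritIO()
    builder.start().waitFor()
  }
}
```

The caller only gets an exit code back, so anything finer-grained (application state, application id, cancellation) would have to be built by hand, for example by parsing the process output.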


With SparkLauncher you're given the option of launching a Spark application from another application, e.g. the web application above. It is just much easier.


While both give you integration points in some way, SparkLauncher is simply easier to work with from another JVM-based application. You don't have to resort to the command line (which has its own "niceties").
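As an illustration, launching from another JVM application might look roughly like the sketch below. It assumes the spark-launcher artifact is on the classpath and that a Spark installation can be found (for example through the SPARK_HOME environment variable). The object name LaunchFromApp and its parameters are made up for this example.

```scala
import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

object LaunchFromApp {
  // Starts the application without blocking the caller and returns a handle
  // that can be inspected or stopped later.
  def submit(appJar: String, mainClass: String, master: String): SparkAppHandle = {
    val listener = new SparkAppHandle.Listener {
      // Called whenever the application moves to a new state.
      override def stateChanged(handle: SparkAppHandle): Unit =
        println(s"state: ${handle.getState}")

      // Called when other information, such as the application id, changes.
      override def infoChanged(handle: SparkAppHandle): Unit =
        println(s"app id: ${handle.getAppId}")
    }

    new SparkLauncher()
      .setAppResource(appJar) // path supplied by the caller (hypothetical)
      .setMainClass(mainClass)
      .setMaster(master)
      .startApplication(listener)
  }
}
```

A caller could then write something like LaunchFromApp.submit("/path/to/app.jar", "org.apache.spark.launcher.WordCountApp", "local[*]"), where the jar path is a placeholder, and keep the returned handle to check getState or call stop() later.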


If I want to run a Spark application within another program, I can simply create a SparkContext inside the web application, so Spark is used like any other framework in the web application.


For one, that would tightly couple the web application and the Spark application. It would also keep compute resources (like threads) busy while the Spark application executes. HTTP requests are short-lived, while Spark jobs are long-lived.
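One way to picture the decoupling is a small registry: the request that starts a job gets an id back immediately, and later requests ask for the status. This is only a sketch of the idea, not something from the Spark API. JobHandle and JobRegistry are hypothetical names, and the launch function is assumed to return promptly, for instance by starting the Spark application as a separate process.

```scala
import java.util.UUID
import java.util.concurrent.ConcurrentHashMap

// Hypothetical minimal view of a launched job.
trait JobHandle { def state: String }

// `launch` is assumed to return promptly instead of running the job inline.
class JobRegistry(launch: () => JobHandle) {
  private val jobs = new ConcurrentHashMap[String, JobHandle]()

  // Called by the "start job" request: returns an id right away.
  def submit(): String = {
    val id = UUID.randomUUID().toString
    jobs.put(id, launch())
    id
  }

  // Called by later "status" requests; None if the id is unknown.
  def status(id: String): Option[String] =
    Option(jobs.get(id)).map(_.state)
}
```

With this shape, no request thread waits for the Spark job to finish; the web application only holds a lightweight handle per job.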