Stopping a running Spark application

Problem description:

I'm running a Spark cluster in standalone mode.

I've submitted a Spark application in cluster mode using the options:

--deploy-mode cluster --supervise

to make the job fault-tolerant.
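For context, a full submission along those lines might look like this (a sketch only: the master URL, class name, and jar path are placeholders, not from the original post):

./bin/spark-submit \
  --master spark://<master-host>:7077 \
  --deploy-mode cluster \
  --supervise \
  --class com.example.MyApp \
  /path/to/my-app.jar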

Now I need to keep the cluster running but stop the application from running.

Things I've tried:

  • Stopping the cluster and restarting it. But the application resumes execution when I do that.
  • Using kill -9 on a daemon named DriverWrapper, but the job resumes again after that.
  • I've also removed temporary files and directories and restarted the cluster, but the job resumes again.

So the running application is really fault-tolerant.

Question: Based on the above scenario, can someone suggest how I can stop the job from running, or what else I can try to stop the application from running while keeping the cluster running?

Something just occurred to me: if I call sparkContext.stop() that should do it, but that requires a bit of work in the code, which is OK. Can you suggest any other way that doesn't involve a code change?
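As a minimal sketch of that code-change approach (the app name and sentinel-file trigger below are hypothetical, not from the original post), the driver can poll a file on a shared filesystem and call stop() when it appears, so the application exits cleanly instead of being killed and resupervised:

import java.nio.file.{Files, Paths}

import org.apache.spark.sql.SparkSession

object StoppableApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("StoppableApp").getOrCreate()
    val stopFlag = Paths.get("/shared/stop-my-app") // hypothetical sentinel path

    // Long-running work, checking the sentinel between batches.
    while (!Files.exists(stopFlag)) {
      spark.range(1000000L).count() // placeholder workload
    }

    // Exiting cleanly via stop() means the driver finishes with a zero
    // exit code; --supervise only restarts drivers that exit with failure.
    spark.stop()
  }
}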

If you wish to kill an application that is failing repeatedly, you may do so through:

./bin/spark-class org.apache.spark.deploy.Client kill <master url> <driver ID>

You can find the driver ID through the standalone Master web UI at http://<master url>:8080.

From the Spark documentation.
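As a concrete usage example (the master URL and driver ID below are made up; copy the real ID from the Master web UI):

./bin/spark-class org.apache.spark.deploy.Client kill spark://master-host:7077 driver-20170101000000-0001

Unlike kill -9 on the DriverWrapper process, this asks the master to kill and remove the driver, so --supervise does not relaunch it.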