Do the stages in a Spark application run in parallel?

Problem description:

I have a doubt about how stages execute in a Spark application. Is there a fixed order of stage execution that can be defined by the programmer, or is it derived by the Spark engine?

Check the entities (stages, partitions) in this picture:

Image credit

Do the stages in a job (a Spark application?) run in parallel in Spark?

Yes, they can be executed in parallel if there is no sequential dependency between them.

Here the partitions of Stage 1 and Stage 2 can be executed in parallel, but not the partitions of Stage 0, because Stage 0 depends on the partitions produced by Stages 1 and 2, which have to be processed first.
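To see this kind of parallelism, here is a minimal sketch (the object name, app name and `local[*]` master are my own assumptions, not from the question). Two independent lineages each end in a shuffle-map stage; the final join stage depends on both. Given enough free executor cores, Spark's DAGScheduler can submit the two independent upstream stages concurrently, while the join stage has to wait for both of them.

```scala
import org.apache.spark.sql.SparkSession

object ParallelStagesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parallel-stages-sketch")
      .master("local[*]")          // assumption: local mode, just for illustration
      .getOrCreate()
    val sc = spark.sparkContext

    // Two independent lineages: each reduceByKey closes its own shuffle-map stage.
    val left  = sc.parallelize(1 to 1000).map(i => (i % 10, i)).reduceByKey(_ + _)
    val right = sc.parallelize(1 to 1000).map(i => (i % 10, i * 2)).reduceByKey(_ + _)

    // The join produces a third stage that depends on both shuffle outputs.
    // The two parent stages have no dependency on each other, so they can
    // run in parallel; the join stage runs only after both have finished.
    val joined = left.join(right)
    joined.collect()

    spark.stop()
  }
}
```

Whether the two parent stages actually overlap in time depends on available resources, but logically nothing forces them to run one after the other.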

Is there a fixed order of stage execution that can be defined by the programmer, or is it derived by the Spark engine?

Stage boundaries are defined by where data shuffling happens among partitions (check the pink lines in the picture).
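As a small illustration of how a shuffle creates a stage boundary, the sketch below (object name, app name and sample data are assumptions of mine) chains narrow transformations into one stage and lets `reduceByKey` start a new one. `RDD.toDebugString` prints the lineage, and the indentation in its output marks the shuffle boundary between the map side and the reduce side.

```scala
import org.apache.spark.sql.SparkSession

object StageBoundarySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("stage-boundary-sketch")
      .master("local[*]")          // assumption: local mode, just for illustration
      .getOrCreate()
    val sc = spark.sparkContext

    val words = sc.parallelize(Seq("a b a", "b c", "a c c"))
      .flatMap(_.split(" "))       // narrow dependency: stays in the same stage
      .map(w => (w, 1))            // narrow dependency: stays in the same stage

    val counts = words.reduceByKey(_ + _)  // wide dependency: the shuffle here starts a new stage

    // The lineage printout shows where the stage boundary falls.
    println(counts.toDebugString)
    counts.collect().foreach(println)

    spark.stop()
  }
}
```

The programmer does not declare stages directly; the Spark engine derives them from the DAG of transformations, cutting a new stage at every shuffle dependency like the one above.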