Why is using a dynamic start_date in Airflow not recommended?

Question:

I've read the Airflow FAQ entry "What's the deal with start_date?", but it still isn't clear to me why using a dynamic start_date is recommended against.

As I understand it, a DAG's execution_date is determined by the minimum start_date across all of the DAG's tasks, and subsequent DAG Runs are run at the latest execution_date + schedule_interval.

If I set the start_date in my DAG's default_args to, say, yesterday at 20:00:00, with a schedule_interval of 1 day, how would that break or confuse the scheduler, if at all? If I understand correctly, the scheduler would trigger the DAG with an execution_date of yesterday at 20:00:00, and the next DAG Run would be scheduled for today at 20:00:00.

Is there some concept I'm missing?

The first run would be at start_date + schedule_interval. Airflow doesn't run the DAG at start_date itself; it always runs at start_date + schedule_interval.
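
To make the arithmetic concrete, here is a minimal sketch in plain Python (no Airflow import). The helper `first_run` is a hypothetical name used only for illustration, and the dates are made up; it models just the rule stated above, not the real scheduler.

```python
from datetime import datetime, timedelta


def first_run(start_date: datetime, schedule_interval: timedelta):
    """Return (execution_date, trigger_time) for the first scheduled run.

    Simplified model (an assumption, not Airflow's code): the first run is
    labelled with start_date but is only triggered once one full
    schedule_interval has elapsed after it.
    """
    execution_date = start_date
    trigger_time = start_date + schedule_interval
    return execution_date, trigger_time


# Example values chosen for illustration only.
start = datetime(2024, 1, 1, 20, 0, 0)
execution_date, trigger_time = first_run(start, timedelta(days=1))
print(execution_date)  # 2024-01-01 20:00:00
print(trigger_time)    # 2024-01-02 20:00:00
```

Applied to the question's scenario, a start_date of yesterday at 20:00:00 with a 1-day interval gives a first run that carries yesterday's execution_date but is triggered today at 20:00:00.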

As the documentation mentions, if you give a dynamic start_date, e.g. datetime.now(), together with some schedule_interval (say 1 hour), that run will never execute: now() moves along with time, so the moment datetime.now() + 1 hour is never reached.
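
The moving-target effect can be sketched with a toy model in plain Python. It assumes the start_date expression is re-evaluated every time the scheduler checks the DAG; `first_due_check`, `get_start_date`, and the 10-minute check cadence are hypothetical names and values for illustration, and the real scheduler is considerably more involved.

```python
from datetime import datetime, timedelta


def first_due_check(get_start_date, schedule_interval, check_times):
    """Return the first check time at which the first run is due, or None.

    get_start_date(now) is called on every check, mimicking a start_date
    expression that is re-evaluated each time the DAG file is parsed.
    """
    for now in check_times:
        start_date = get_start_date(now)
        if now >= start_date + schedule_interval:
            return now
    return None


interval = timedelta(hours=1)
t0 = datetime(2024, 1, 1, 0, 0, 0)
# Simulated scheduler checks every 10 minutes over roughly 5 hours.
checks = [t0 + timedelta(minutes=10 * i) for i in range(30)]

# Dynamic start_date: behaves like datetime.now(), so it moves with the clock.
dynamic = first_due_check(lambda now: now, interval, checks)
# Static start_date: fixed at t0, so the clock eventually passes t0 + interval.
static = first_due_check(lambda now: t0, interval, checks)

print(dynamic)  # None
print(static)   # 2024-01-01 01:00:00
```

With the dynamic start_date the condition `now >= now + interval` can never hold, which is the situation the answer describes; with a fixed start_date the first run becomes due as soon as one interval has elapsed.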