Why is it not recommended to use a dynamic start_date in Airflow?
I've read Airflow's FAQ about "What's the deal with start_date?", but it still isn't clear to me why it is recommended against using a dynamic start_date.
To my understanding, a DAG's execution_date is determined by the minimum start_date among all of the DAG's tasks, and subsequent DAG Runs are run at the latest execution_date + schedule_interval.
If I set the start_date in my DAG's default_args to, say, yesterday at 20:00:00, with a schedule_interval of 1 day, how would that break or confuse the scheduler, if at all? If I understand correctly, the scheduler would trigger the DAG with an execution_date of yesterday at 20:00:00, and the next DAG Run would be scheduled for today at 20:00:00.
Am I misunderstanding some concept?
The first run happens at start_date + schedule_interval. Airflow does not run the DAG at start_date itself; the first run is always triggered at start_date + schedule_interval.
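To make the arithmetic concrete, here is a minimal sketch in plain Python (no Airflow import). The helper name first_trigger_time and the concrete date are made up for illustration; the sketch only mirrors the rule stated above and is not Airflow's actual scheduler code.

```python
from datetime import datetime, timedelta


def first_trigger_time(start_date, schedule_interval):
    """Hypothetical helper: the wall-clock time at which the first run is
    triggered, i.e. the end of the first schedule interval."""
    return start_date + schedule_interval


# A fixed (static) start_date, as in the question: some day at 20:00:00.
start_date = datetime(2024, 1, 1, 20, 0, 0)
schedule_interval = timedelta(days=1)

# The first run covers the interval that begins at start_date,
# but it is only triggered once that interval has ended.
first_run_at = first_trigger_time(start_date, schedule_interval)
print("interval starts:", start_date)
print("first run triggered at:", first_run_at)
```

So with a start_date of yesterday at 20:00:00 and a 1 day interval, nothing is triggered yesterday; the first trigger falls today at 20:00:00.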
As the documentation mentions, if you give a dynamic start_date, e.g. datetime.now(), together with some schedule_interval (say 1 hour), that run will never execute: now() moves along with time, so the point datetime.now() + 1 hour is never reached.
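The effect can be illustrated with a small simulation in plain Python. It is a simplified model, not Airflow's scheduler: it assumes the DAG file is re-evaluated on every scheduler pass (so a datetime.now() start_date is recomputed each time), and the names is_due and first_due_tick as well as the 10 minute tick are invented for the example.

```python
from datetime import datetime, timedelta

schedule_interval = timedelta(hours=1)


def is_due(start_date, now):
    # A run becomes due only once a full interval has passed since start_date.
    return now >= start_date + schedule_interval


def first_due_tick(ticks, dynamic):
    """Advance a simulated clock in 10 minute steps and return the first
    time at which a run would be due, or None if that never happens."""
    clock = datetime(2024, 1, 1, 0, 0, 0)
    fixed_start = clock
    for _ in range(ticks):
        # dynamic=True models start_date=datetime.now(): it is recomputed
        # on every pass, so it always equals the current time.
        start_date = clock if dynamic else fixed_start
        if is_due(start_date, clock):
            return clock
        clock += timedelta(minutes=10)
    return None


static_result = first_due_tick(12, dynamic=False)
dynamic_result = first_due_tick(12, dynamic=True)
print("static start_date, first due at:", static_result)
print("dynamic start_date, first due at:", dynamic_result)
```

With the fixed start_date the run becomes due after one hour of simulated time; with the moving start_date the condition now >= now + 1 hour can never hold, however many ticks are simulated.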