The catchup behavior is documented in the Airflow documentation:
https://airflow.apache.org/scheduler.html?highlight=backfill#backfill-and-catchup

Having been designed for ETL, Airflow's objective is to ensure that data generated in all completed intervals since the start_date are processed by the DAG. When you unpause a DAG, it sets about achieving this objective in one of two ways:
- catchup=True: Airflow generates a DAG run for every completed interval between the start_date and the time you unpause your DAG. It will ignore intervals that have already been processed (as evidenced by a DAG run in the Airflow metadata DB). The DAG is expected to only process data for a single interval, hence the need to spawn a DAG run for every interval.
- catchup=False: Airflow generates a single DAG run for the most-recently completed interval. The DAG is assumed to have backfill logic integrated into it, meaning that it can potentially process more than one interval's worth of data.
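
To make the difference concrete, here is a minimal sketch in plain Python that models the two modes described above. It is a simplified illustration, not Airflow's scheduler code: the function `runs_to_create` and its parameters (`already_run` in particular) are hypothetical names made up for this example, and it assumes fixed-length intervals.

```python
from datetime import datetime, timedelta


def runs_to_create(start_date, interval, now, catchup, already_run=()):
    """Return the start of each interval that would get a DAG run on unpause.

    Simplified model of the behaviour described above (hypothetical helper,
    not part of Airflow).
    """
    # Collect the start of every interval that has fully completed by `now`.
    completed = []
    cursor = start_date
    while cursor + interval <= now:
        completed.append(cursor)
        cursor += interval

    if not completed:
        # start_date is in the future, or the first interval has not ended yet.
        return []

    if catchup:
        # One run per completed interval, skipping those already processed.
        return [d for d in completed if d not in already_run]

    # catchup=False: only the most recently completed interval is considered.
    latest = completed[-1]
    return [] if latest in already_run else [latest]


start = datetime(2024, 1, 1)
day = timedelta(days=1)
now = datetime(2024, 1, 4, 12)  # the moment the DAG is unpaused

print(runs_to_create(start, day, now, catchup=True))   # three daily runs
print(runs_to_create(start, day, now, catchup=False))  # only the Jan 3 interval
```

With catchup=True the model yields one run each for Jan 1, Jan 2 and Jan 3; with catchup=False it yields only the Jan 3 interval, which is why the DAG itself has to handle any older data.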
If you want to delay the first run of your DAG, you'll need to set the start_date in the future. Note, however, that changing the start_date of an existing DAG will confuse Airflow; you'll need to assign a different DAG ID. (source)
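
As a rough illustration of how a future start_date delays the first run: Airflow triggers a run at the end of the interval it covers, so the earliest trigger is one interval after the start_date. The helper names below (`first_trigger_time`, `is_first_run_delayed`) are hypothetical and exist only for this sketch, which again assumes a fixed-length interval.

```python
from datetime import datetime, timedelta


def first_trigger_time(start_date, interval):
    # A run covers [start_date, start_date + interval) and is triggered
    # once that interval has ended.
    return start_date + interval


def is_first_run_delayed(start_date, interval, now):
    # True when no interval has completed yet, so nothing runs on unpause.
    return first_trigger_time(start_date, interval) > now


now = datetime(2024, 1, 4, 12)
day = timedelta(days=1)

print(first_trigger_time(datetime(2024, 2, 1), day))         # earliest possible run
print(is_first_run_delayed(datetime(2024, 2, 1), day, now))  # future start_date
print(is_first_run_delayed(datetime(2024, 1, 1), day, now))  # past start_date
```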
Best,
Wilson