wait_for_downstream only waits for the immediate neighbor in the DAG


mj...@about.com

Nov 3, 2015, 3:26:48 PM
to Airflow
When a task's wait_for_downstream is set to True, the task will run as soon as its immediate downstream task has finished for the previous day.

For example, if the DAG is

a -> b -> c

then a will run on day 2 as soon as b is done for day 1, even if c has not finished for day 1. I would expect Airflow to wait for ALL downstream tasks to complete on day 1 before running a on day 2, but it doesn't. That was unexpected.
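To make the difference concrete, here is a minimal pure-Python model of the two behaviors. It is not Airflow's implementation; the adjacency map, the `(task, day)` "done" set, and the helper names `all_downstream` and `may_start` are hypothetical. With `transitive=False` it checks only the previous day's immediate downstream tasks, matching what is reported above; with `transitive=True` it models what the poster expected. It also requires the previous instance of the task itself to be done, since wait_for_downstream implies depends_on_past.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical model of the DAG from the post: task -> immediate downstream tasks.
DAG: Dict[str, List[str]] = {"a": ["b"], "b": ["c"], "c": []}

# (task_id, day) pairs that have finished successfully.
Done = Set[Tuple[str, int]]


def all_downstream(dag: Dict[str, List[str]], task: str) -> Set[str]:
    """Every task reachable from `task`, not just its direct children."""
    seen: Set[str] = set()
    stack = list(dag[task])
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(dag[t])
    return seen


def may_start(dag: Dict[str, List[str]], done: Done, task: str, day: int,
              transitive: bool = False) -> bool:
    """Sketch of a wait_for_downstream-style gate for `task` on `day`.

    transitive=False: check only the previous day's immediate downstream
    tasks (the reported behavior). transitive=True: check every task
    downstream (the expected behavior). The previous instance of `task`
    must also be done, mirroring depends_on_past.
    """
    prev = day - 1
    if prev < 1:  # first day: there is no previous run to wait for
        return True
    waits_on = all_downstream(dag, task) if transitive else set(dag[task])
    return (task, prev) in done and all((t, prev) in done for t in waits_on)


# Day 1: a and b have finished, c is still running.
done: Done = {("a", 1), ("b", 1)}
print(may_start(DAG, done, "a", 2))                   # True: only b is checked
print(may_start(DAG, done, "a", 2, transitive=True))  # False: c is not done
```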

mj...@about.com

Nov 3, 2015, 3:31:33 PM
Forgot my question!

We get around this by configuring the DAG to look like this:

a -> b -> c

and

a -> c

Is there a better way?
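The extra-edge workaround can be generalized for longer chains. The sketch below is a hypothetical pure-Python model, not Airflow code: `with_direct_edges` copies an adjacency map and adds a direct edge from a task to every task transitively downstream of it (for a -> b -> c -> d it would add a -> c and a -> d). Because those edges are already implied by the existing ones, they do not change execution order; they only widen the set of "immediate" neighbors that an immediate-only check looks at. In Airflow itself, each added edge would correspond to a `set_downstream` call such as `a.set_downstream(c)`.

```python
from typing import Dict, List, Set, Tuple


def all_downstream(dag: Dict[str, List[str]], task: str) -> Set[str]:
    """Every task reachable from `task` through one or more edges."""
    seen: Set[str] = set()
    stack = list(dag[task])
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(dag[t])
    return seen


def with_direct_edges(dag: Dict[str, List[str]], task: str) -> Dict[str, List[str]]:
    """Copy `dag`, adding a direct edge from `task` to each transitive
    downstream task (the a -> c edge from the post, generalized)."""
    new = {t: list(down) for t, down in dag.items()}
    for t in sorted(all_downstream(dag, task)):
        if t not in new[task]:
            new[task].append(t)
    return new


def immediate_downstream_done(dag: Dict[str, List[str]],
                              done: Set[Tuple[str, int]],
                              task: str, prev_day: int) -> bool:
    """Immediate-neighbor check, as described in the first post."""
    return all((t, prev_day) in done for t in dag[task])


dag = {"a": ["b"], "b": ["c"], "c": []}
patched = with_direct_edges(dag, "a")
print(patched)  # {'a': ['b', 'c'], 'b': ['c'], 'c': []}

done = {("a", 1), ("b", 1)}  # c is still running for day 1
print(immediate_downstream_done(dag, done, "a", 1))      # True
print(immediate_downstream_done(patched, done, "a", 1))  # False
```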

Maxime Beauchemin

Nov 4, 2015, 12:09:25 AM
The original use case that led me to create wait_for_downstream was migrating a set of poorly designed pipelines that reprocessed all of history every day and dropped and recreated a table on each run. wait_for_downstream prevents a race condition: when catching up, the next run could drop the table while a downstream task from the previous run was still querying it.

Another way to get what you want is to add a converging point at the end of your pipeline and use an ExternalTaskSensor to depend on that very last step from the previous day (ExternalTaskSensor lets you wait for a previous run).
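The converging-point idea can be modeled in a few lines. This is a pure-Python sketch under stated assumptions, not Airflow code: `add_join` routes every leaf task into one hypothetical join task (`pipeline_done`), and `root_may_start` gates the first task on yesterday's join having succeeded. `sensor_target` mirrors how ExternalTaskSensor's `execution_delta` parameter is documented to work (the sensor looks at the current execution date minus that delta); in Airflow, such a sensor would sit upstream of the first task, pointing at the join task of the same DAG.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Set, Tuple

JOIN = "pipeline_done"  # hypothetical name for the converging task


def add_join(dag: Dict[str, List[str]], join: str = JOIN) -> Dict[str, List[str]]:
    """Copy `dag` and route every leaf task into a single `join` task."""
    new = {t: list(down) for t, down in dag.items()}
    for t, down in dag.items():
        if not down:  # a leaf: nothing is downstream of it yet
            new[t].append(join)
    new[join] = []
    return new


def sensor_target(execution_date: datetime,
                  execution_delta: timedelta = timedelta(days=1)) -> datetime:
    """The run a sensor with this execution_delta would look at."""
    return execution_date - execution_delta


def root_may_start(done: Set[Tuple[str, datetime]], execution_date: datetime,
                   join: str = JOIN) -> bool:
    """Gate the first task on the previous run's join having succeeded."""
    return (join, sensor_target(execution_date)) in done


dag = {"a": ["b", "c"], "b": [], "c": []}  # hypothetical fan-out pipeline
print(add_join(dag))
# {'a': ['b', 'c'], 'b': ['pipeline_done'], 'c': ['pipeline_done'], 'pipeline_done': []}

day1 = datetime(2015, 11, 3)
day2 = day1 + timedelta(days=1)
print(root_may_start(set(), day2))          # False: yesterday's join not done
print(root_may_start({(JOIN, day1)}, day2))  # True
```

Unlike the extra-edge approach, this gates on one task regardless of how many branches the pipeline has. Note that in this model the very first run has no previous join to wait for, so it would need separate handling.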

Maybe the long-term vision is something around "trigger rules", allowing arbitrarily complex combinations of predefined rules.

Max