how to rerun job


Zhengjun Chen

Oct 5, 2015, 8:32:18 PM
to Airflow

Airflow records successful tasks in the db.
If you try to rerun a job, Airflow first checks the latest task runs in the db.
Since the tasks completed successfully, Airflow will not rerun them before the next schedule.

Is it possible to force a rerun? One way is to remove the records from the db.
Is there a better way?

Maxime Beauchemin

Oct 5, 2015, 10:55:35 PM
to Airflow
There are many ways to do this, but the most flexible is to use the CLI with a combination of the `airflow clear` and `airflow backfill` commands. Both subcommands have many options that let you precisely rerun only what you need: date ranges, task_id regexes, including upstream and/or downstream tasks, ...
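For example, a typical clear-then-backfill sequence might look like the sketch below. This is only an illustration: `example_dag`, the task_id regex, and the date range are placeholders, and exact flag names can vary between Airflow versions, so check `airflow clear --help` and `airflow backfill --help` for your install.

```shell
# Clear the recorded state of task instances in a date range, restricted to
# task_ids matching a regex, so the scheduler considers them runnable again.
airflow clear example_dag -s 2015-10-01 -e 2015-10-05 -t "load_.*"

# Then rerun the cleared window by backfilling the same date range.
airflow backfill example_dag -s 2015-10-01 -e 2015-10-05
```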

For simpler cases you can clear and force-rerun individual tasks from the UI. You may also want to read the docs on how the scheduler works, since in many cases you can just clear task instances from the UI and the scheduler will rerun them.

Max

Andrey Oskin

Oct 10, 2015, 8:01:57 AM
to Airflow
Actually, it's still not simple to rerun a DAG. Yes, backfill, regexes, etc. work, but it's all too complex. It would be very, very nice if there were a CLI command for a DAG like

```
airflow test_run some_arbitrary_dag 2015-10-01
```

in the same manner as the `test` command for individual tasks. What was the reason for not implementing such a feature? It looks like a must-have.

Maxime Beauchemin

Oct 10, 2015, 11:47:06 AM
to Airflow
Well, for the dependency engine to run, it needs to record state in the db, which `airflow test` explicitly does not touch. But backfill does essentially what you describe; the arguments you may find too complex are optional, so you can just run `airflow backfill some_arbitrary_dag -s 2015-10-01`. Note that it will record the state of tasks in the db.