ideas for troubleshooting deadlock problem?

Dennis O'Brien

Apr 27, 2016, 8:41:34 PM
to Airflow
Hi,

Yesterday I activated the CeleryExecutor using Redis as the backend, and I'm seeing this problem when running a backfill job.

... many errors similar to the next two error lines ...
[2016-04-28 00:13:07,497] {jobs.py:947} ERROR - Task instance ('etl_gsn_daily_kpi_email', 't_validate_ww_dim_users', datetime.datetime(2016, 4, 10, 10, 0)) failed
[2016-04-28 00:13:07,499] {jobs.py:947} ERROR - Task instance ('etl_gsn_daily_kpi_email', 't_validate_gsnmobile_events_dau', datetime.datetime(2016, 4, 10, 10, 0)) failed
[2016-04-28 00:13:07,499] {jobs.py:1020} INFO - [backfill progress] | waiting: 22 | succeeded: 0 | kicked_off: 28 | failed: 28 | skipped: 0 | deadlocked: 0
[2016-04-28 00:13:12,446] {jobs.py:1020} INFO - [backfill progress] | waiting: 0 | succeeded: 0 | kicked_off: 28 | failed: 28 | skipped: 0 | deadlocked: 22
Traceback (most recent call last):
  File "/home/airflow/venv/bin/airflow", line 6, in <module>
    exec(compile(open(__file__).read(), __file__, 'exec'))
  File "/home/airflow/venv/src/airflow/airflow/bin/airflow", line 15, in <module>
    args.func(args)
  File "/home/airflow/venv/src/airflow/airflow/bin/cli.py", line 117, in backfill
    pool=args.pool)
  File "/home/airflow/venv/src/airflow/airflow/models.py", line 2997, in run
    job.run()
  File "/home/airflow/venv/src/airflow/airflow/jobs.py", line 173, in run
    self._execute()
  File "/home/airflow/venv/src/airflow/airflow/jobs.py", line 1047, in _execute
    raise AirflowException(err)
airflow.exceptions.AirflowException: ---------------------------------------------------
BackfillJob is deadlocked. These tasks were unable to run:
set([ ... every TaskInstance in my dag ...])

This happens for all the example dags as well.

Any ideas on what to try in order to debug this problem?

Some things I'll try next:
- Try SequentialExecutor or LocalExecutor.
- Tweak 'parallelism', 'dag_concurrency', and 'max_active_runs_per_dag' in airflow.cfg.
- Try a stable version like 1.7.0 (instead of tag 'airbnb_1.7.1rc3').
- Debug celery or redis?
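
For reference, the concurrency knobs I mean all live in the [core] section of airflow.cfg. Something like this (the values here are just illustrative, not recommendations):

```ini
[core]
# max number of task instances that can run concurrently across the whole installation
parallelism = 32
# max number of task instances allowed to run concurrently within a single DAG
dag_concurrency = 16
# max number of active DAG runs per DAG
max_active_runs_per_dag = 16
```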

Any ideas are much appreciated.

cheers,
Dennis

Maxime Beauchemin

Apr 27, 2016, 10:46:55 PM
to Airflow
It's not a database deadlock or anything; it just means some tasks failed, so your DAG can't make progress all the way through. Troubleshoot the tasks that have failed, then re-run the backfill command and it will pick up where it left off.
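
For example, you can run one of the failing tasks on its own with `airflow test` to see the actual error, then resume the backfill. (DAG/task names below are taken from your log; the dates are illustrative.)

```shell
# Run a single failing task in isolation and print its output
airflow test etl_gsn_daily_kpi_email t_validate_ww_dim_users 2016-04-10

# Once the tasks pass, re-run the backfill; it resumes where it left off
airflow backfill etl_gsn_daily_kpi_email -s 2016-04-10 -e 2016-04-10
```

Note that `airflow test` doesn't record state in the metadata database, so it's safe to run repeatedly while debugging.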

Max