Airflow not scheduling at the right interval, and sending mail on retries and errors


Vignesh Kalai

Oct 13, 2015, 7:07:09 AM
to Airflow

My DAG looks like the attached image.

The code defining the DAG is as follows:

    from airflow import DAG
    from airflow.operators import BashOperator
    from datetime import datetime, timedelta

    default_args = {
        'owner': 'Vignesh',
        'depends_on_past': False,
        'start_date': datetime(2015, 6, 1),
        'email': ['air...@airflow.com'],
        'email_on_failure': False,
        'email_on_retry': False,
        'retries': 1,
        'retry_delay': timedelta(minutes=5),
    }

    dag = DAG('POC', default_args=default_args,
              schedule_interval=timedelta(minutes=5))

    run_this_first = BashOperator(
        task_id='Start',
        bash_command='python ~/airflow/initial_db.py',
        dag=dag,
    )

    for i in range(5):
        t = BashOperator(
            task_id='Orders' + str(i),
            bash_command='python ~/airflow/process_db.py',
            dag=dag,
        )
        t.set_upstream(run_this_first)

As you can see, I set a five-minute interval between successive runs of the DAG. The tasks Orders0, Orders1, ... run concurrently once the Start task completes.

The problem is that the tasks do not respect this interval and run continuously.

Here is a sample of my config file:

    [core]
    airflow_home = /home/perlzuser/airflow

    dags_folder = /home/perlzuser/airflow/dags

    base_log_folder = /home/perlzuser/airflow/logs
    executor = LocalExecutor
    sql_alchemy_conn = mysql://dbuser:xxx@localhost/airflow
    parallelism = 30
    load_examples = False
    plugins_folder = /home/perlzuser/airflow/plugins

    [webserver]

    base_url = http://localhost:8080
    web_server_host = 0.0.0.0
    web_server_port = 8080
    secret_key = temporary_key
    expose_config = true


    [smtp]
    smtp_host = localhost
    smtp_starttls = True
    smtp_user = airflow
    smtp_port = 25
    smtp_password = airflow
    smtp_mail_from = air...@airflow.com

    [celery]

    celery_app_name = airflow.executors.celery_executor

    celeryd_concurrency = 16
    worker_log_server_port = 8793
    broker_url = sqla+mysql://airflow:airflow@localhost:3306/airflow
    celery_result_backend = db+mysql://airflow:airflow@localhost:3306/airflow
    flower_port = 8383

    default_queue = default

    [scheduler]

    job_heartbeat_sec = 5
    scheduler_heartbeat_sec = 5

From my task-timing picture it is clear that the tasks are running back to back, without any interval.

I scheduled it using `airflow scheduler`

Also, I would like to send mail to a recipient when a task fails. I was not able to find any sample code for that; could you link an example and show where it should be added?

Thanks, and sorry if this is a simple or bad question. Thanks again for your time, cheers.

Maxime Beauchemin

Oct 13, 2015, 11:17:51 AM
to Airflow
Hi Vignesh,

The Airflow scheduler does not work like cron. Please read the scheduling section of the docs, as well as the start_date section, and report back if it's not behaving as described. If the task schedule got misaligned, you can clear everything, or pick a recent date and mark success on all tasks on that schedule. This will "re-seed" a starting point.
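As a sketch, re-seeding from the command line might look like this (the DAG id is the one from the thread; the dates are illustrative, and the flags are assumed from the `airflow clear` CLI help):

```shell
# Wipe the state of all task instances for the POC dag between the
# given dates, so the scheduler starts from a recent point instead
# of working through the whole backlog.
airflow clear POC -s 2015-10-13 -e 2015-10-16
```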

As far as email goes, you need to configure your SMTP server in `airflow.cfg`, and you can test with a simple EmailOperator. I still have to add a section on this to the installation docs, though `airflow.cfg` should be self-explanatory.
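For the failure mails specifically, a minimal sketch (assuming the `[smtp]` section of `airflow.cfg` above works) is to flip the flags in the `default_args` from the first post; Airflow then mails every address in `email` when a task instance fails or is retried:

```python
from datetime import datetime, timedelta

# The default_args from the thread, with failure/retry mail turned
# on. No operator code changes are needed; the flags apply to every
# task that uses these defaults.
default_args = {
    'owner': 'Vignesh',
    'depends_on_past': False,
    'start_date': datetime(2015, 10, 13),  # a recent date, to avoid backfill
    'email': ['air...@airflow.com'],
    'email_on_failure': True,   # mail on task failure
    'email_on_retry': True,     # mail on each retry
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}
```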

Max

Vignesh Kalai

Oct 13, 2015, 2:33:48 PM
to Airflow
Thanks for your reply, Max.

But the tasks still start before the interval.

I used dag = DAG('POC', default_args=default_args, schedule_interval=timedelta(minutes=5)) to schedule the DAG at a five-minute interval,

but the second set of task instances starts before the five-minute mark. I followed the same steps as the tutorial page. Could you point out what I am doing wrong?
I think I missed something very obvious.

For now I just want to schedule a DAG every five minutes.

Thanks,

Vignesh Kalai

Oct 15, 2015, 8:33:35 AM
to Airflow
Max, this problem still exists. The tasks just run back to back: the first task runs, then the second, then the first again, with no schedule_interval between the first and second instance of the first task. They just keep running without any scheduled interval.

Currently two commands are running:

airflow webserver -p 8080 and airflow scheduler

When I ran airflow backfill, all the tasks started to run the same way as under the scheduler, again without any time interval.

The scheduler runs only when I give the command

airflow scheduler

and when I try

sudo airflow scheduler it throws an error.

We have to get this into production this week, so any help would be appreciated.

Thanks,
Vignesh

Steven Yvinec-Kruyk

Oct 15, 2015, 10:40:18 AM
to Airflow
Vignesh, as Max mentioned, the Airflow scheduler is not like cron.

I'm not sure I follow: above you say the tasks are running randomly without any interval, but when I look at the attached image and follow just the Start task, there does seem to be an interval of five minutes from 06/01, in the correct order. Maybe I am misunderstanding what you mean.

I think I can see what is happening.

Your DAG has a start_date of datetime(2015, 6, 1), so every five-minute schedule since 2015-06-01 00:00:00 has already been met.
That means 288 executions per day between 6/1 and today.

Also, I believe 'depends_on_past': False causes every task instance whose schedule has been met (i.e. the 288 per day above) to run regardless of whether the instance directly before it has completed.

That may explain the behavior you are seeing.
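The arithmetic above can be checked with plain datetime math (the dates are the ones from the thread; this only counts the backlog, it is not Airflow code):

```python
from datetime import datetime, timedelta

start_date = datetime(2015, 6, 1)   # start_date from the DAG
today = datetime(2015, 10, 15)      # date of this post
interval = timedelta(minutes=5)     # schedule_interval

# 5-minute intervals per day, and the total backlog of runs the
# scheduler will work through before it "catches up" to today.
runs_per_day = timedelta(days=1) // interval
backlog = (today - start_date) // interval

print(runs_per_day)  # 288
print(backlog)       # 39168
```

With nothing pacing those 39,000-odd backfill runs except available slots, tasks firing back to back is exactly what you would expect to see.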

Maxime Beauchemin

Oct 15, 2015, 2:03:52 PM
to Airflow
At this point we probably just need to make sure that everyone confused about this reads the docs, and helps us clarify the items that are unclear.

Vignesh Kalai

Oct 15, 2015, 2:31:57 PM
to Airflow
Thanks, Max. Airflow is a wonderful product; I am just quite new to DAGs and scheduling, so I got a little lost. Sorry if this confused others; you may delete this thread if it causes future confusion.

Maxime Beauchemin

Oct 15, 2015, 2:41:51 PM
to Airflow
Vignesh, no worries man! This discussion (and many others) just shows that some aspects of Airflow could be more intuitive or better documented.

Max