About start_date

Jason Chen

Apr 18, 2016, 9:53:00 PM
to Airflow

 Hi,

  I have a DAG running with
  (1) 'start_date': datetime(2016, 4, 18, 17, 51)
  (2) schedule_interval=timedelta(minutes=1)

It ran fine (starting from 4/18 17:51). At some point, I decided to change the start_date to datetime(2016, 4, 18, 18, 30).
I ran "airflow clean" after changing the DAG.
However, when I run "airflow scheduler", the DAG still starts from "4/18 17:51", whereas I expected it to start from "4/18 18:30".
Any thoughts?

Thanks.

Jason

Jeremiah Lowin

Apr 20, 2016, 8:12:38 PM
to Airflow
Hi Jason,

DAGs are not easy to update, since syncing the new information can be tough. One approach is to completely delete the DAG from the db. There's an unfinished PR here which may help: https://github.com/airbnb/airflow/pull/1344

Maxime Beauchemin

Apr 21, 2016, 3:16:56 PM
to Airflow

from: http://pythonhosted.org/airflow/faq.html


What’s the deal with ``start_date``?

start_date is partly legacy from the pre-DagRun era, but it is still relevant in many ways. When creating a new DAG, you probably want to set a global start_date for your tasks using default_args. The first DagRun to be created will be based on the min(start_date) for all your tasks. From that point on, the scheduler creates new DagRuns based on your schedule_interval, and the corresponding task instances run as your dependencies are met. When introducing new tasks to your DAG, you need to pay special attention to start_date, and may want to reactivate inactive DagRuns to get the new task onboarded properly.
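A minimal sketch of that rule in plain Python, not Airflow's actual scheduler code. The helper name first_run_dates and the second task's start date are illustrative assumptions; the values otherwise come from the question above.

```python
from datetime import datetime, timedelta


def first_run_dates(task_start_dates, schedule_interval, count):
    """Return the first `count` schedule dates, beginning at the earliest task start_date."""
    first = min(task_start_dates)
    return [first + i * schedule_interval for i in range(count)]


# Two hypothetical tasks in one DAG with different start dates.
task_start_dates = [
    datetime(2016, 4, 18, 18, 30),
    datetime(2016, 4, 18, 17, 51),
]

runs = first_run_dates(task_start_dates, timedelta(minutes=1), 3)
for run in runs:
    print(run.isoformat())
```

The earliest start_date among the tasks decides where the sequence begins; the later one has no effect on the first run.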

We recommend against using dynamic values as start_date, especially datetime.now() as it can be quite confusing. The task is triggered once the period closes, and in theory an @hourly DAG would never get to an hour after now as now() moves along.
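To see why a moving start_date never fires, here is a rough simulation in plain Python. It is a simplified model, not Airflow code: run_is_due is a hypothetical helper that only encodes "the run triggers once the period closes", and the clock values are made up for illustration.

```python
from datetime import datetime, timedelta


def run_is_due(start_date, schedule_interval, now):
    """A period's run becomes due only once the period has closed."""
    return start_date + schedule_interval <= now


interval = timedelta(hours=1)

# Fixed start_date: one hour later the first period has closed.
fixed_start = datetime(2016, 4, 18, 17, 0)
fixed_due = run_is_due(fixed_start, interval, datetime(2016, 4, 18, 18, 0))

# Dynamic start_date: re-evaluated as "now" on every check, so the end of
# the first period always sits one interval ahead of the clock.
dynamic_due = []
for hours_later in range(5):
    clock = datetime(2016, 4, 18, 17, 0) + timedelta(hours=hours_later)
    dynamic_start = clock  # stands in for datetime.now() evaluated at parse time
    dynamic_due.append(run_is_due(dynamic_start, interval, clock))

print(fixed_due, dynamic_due)
```

With the fixed date the check eventually succeeds; with the moving date it fails on every pass, however long the loop runs.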

We also recommend using rounded start_date in relation to your schedule_interval. This means an @hourly would be at 00:00 minutes:seconds, a @daily job at midnight, a @monthly job on the first of the month. You can use any sensor or a TimeDeltaSensor to delay the execution of tasks within that period. While schedule_interval does allow specifying a datetime.timedelta object, we recommend using the macros or cron expressions instead, as it enforces this idea of rounded schedules.
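One possible way to obtain a rounded start_date from an arbitrary timestamp, sketched in plain Python. floor_to_interval is a hypothetical helper, not part of Airflow, and it assumes the interval divides a day evenly (it does not cover monthly schedules).

```python
from datetime import datetime, timedelta


def floor_to_interval(dt, interval):
    """Round dt down to a multiple of interval, counted from midnight of the same day."""
    midnight = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = dt - midnight
    # timedelta // timedelta yields the whole number of intervals elapsed.
    return midnight + (elapsed // interval) * interval


raw = datetime(2016, 4, 18, 17, 51)
hourly_start = floor_to_interval(raw, timedelta(hours=1))
daily_start = floor_to_interval(raw, timedelta(days=1))
print(hourly_start.isoformat(), daily_start.isoformat())
```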

When using depends_on_past=True, it’s important to pay special attention to start_date, as the past dependency is enforced on every schedule except the one that matches the start_date specified for the task. It’s also important to watch DagRun activity status over time when introducing new depends_on_past=True tasks, unless you are planning on running a backfill for the new task(s).

Also important to note is that the task’s start_date, in the context of a backfill CLI command, gets overridden by the backfill command’s start_date. This allows a backfill on tasks that have depends_on_past=True to actually start; if that weren’t the case, the backfill just wouldn’t start.

Jason Chen

May 2, 2016, 1:38:49 AM
to Airflow
 
 Hi Maxime,
 
 Thanks for your reply.
 That was not quite my question, though.
 My question was: "Is it possible to change the start_date of a DAG?"
 I tried, but the scheduler does not seem to respect the new start_date I set.

 

 Jason