How to properly run a backfill?


tomas....@unacast.com

Dec 21, 2018, 5:30:17 PM
to cloud-composer-discuss
We have problems running backfills on our pipeline. It works fine day to day, but we have run into two serious problems when doing a backfill.

First problem:

* When running backfills through `gcloud beta composer environments run <env> --location=<location> backfill -- <args>`, I expected all the runs to be scheduled, with only the number allowed by max active runs actually starting.
* What happened was that only two runs were started, since max active runs is set to two, and then the command was stuck waiting for the rest to start. That is not very convenient when you are backfilling a pipeline where each run takes 4-5 hours.

Question 1: Is there a way to schedule everything up front, so we don't have to follow along and start the backfill two days at a time?

Second problem:

* During the backfill, some tasks get stuck in "Scheduled" and never move to "Queued" or "Running".
* Clearing doesn't help.
* Running `gcloud beta composer environments run <env> --location=<location> run` manages to start a task, but then the next task doesn't start automatically.
* This all works fine in the daily runs, so I don't understand what might cause the problem in the backfill.

Any ideas? Or have I misunderstood backfills?

Feng Lu

Jan 4, 2019, 3:04:07 AM
to tomas....@unacast.com, cloud-composer-discuss
Hi Tomas, 

Please see my reply inline:

On Fri, Dec 21, 2018 at 2:30 PM <tomas....@unacast.com> wrote:
> We have problems running backfills on our pipeline. It works fine day to day, but we have run into two serious problems when doing a backfill.
>
> First problem:
>
> * When running backfills through `gcloud beta composer environments run <env> --location=<location> backfill -- <args>`, I expected all the runs to be scheduled, with only the number allowed by max active runs actually starting.
> * What happened was that only two runs were started, since max active runs is set to two, and then the command was stuck waiting for the rest to start. That is not very convenient when you are backfilling a pipeline where each run takes 4-5 hours.
>
> Question 1: Is there a way to schedule everything up front, so we don't have to follow along and start the backfill two days at a time?

You can override max_active_runs on a per-DAG basis, so you may want to set it to a larger value for this specific DAG.
Unfortunately, it's not possible to force the backfill job to schedule more dag_runs than max_active_runs allows (see the code here). You could, however, try running the airflow backfill job directly in one of the worker pods as a background job, so you don't have to wait for the gcloud CLI to terminate, if that's your concern.

Alternatively, you could try running multiple backfill jobs concurrently by splitting the start_date to end_date range into smaller ranges.
Note that the backfill job may get killed prematurely if the worker pod is forced to restart due to issues like OOM. 
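
As a rough illustration of the range-splitting suggestion, the following Python sketch cuts one date range into smaller inclusive ranges and builds one backfill command per range. The environment name, location, and DAG id (`my-env`, `us-central1`, `my_dag`) are hypothetical placeholders, and it assumes the Airflow 1.x `backfill` flags `-s`/`-e` with both dates treated as inclusive. It only prints the commands; whether running them concurrently is safe depends on your environment.

```python
from datetime import date, timedelta


def split_date_range(start, end, chunk_days):
    """Split the inclusive range [start, end] into consecutive inclusive chunks."""
    if chunk_days < 1:
        raise ValueError("chunk_days must be at least 1")
    chunks = []
    current = start
    while current <= end:
        # The last chunk may be shorter than chunk_days.
        chunk_end = min(current + timedelta(days=chunk_days - 1), end)
        chunks.append((current, chunk_end))
        current = chunk_end + timedelta(days=1)
    return chunks


def backfill_command(env, location, dag_id, start, end):
    """Build the gcloud argument list for one backfill range (not executed here)."""
    return [
        "gcloud", "beta", "composer", "environments", "run", env,
        "--location=" + location,
        "backfill", "--",
        "-s", start.isoformat(),
        "-e", end.isoformat(),
        dag_id,
    ]


if __name__ == "__main__":
    # Hypothetical values; replace with your own environment, location, and DAG.
    for s, e in split_date_range(date(2018, 12, 1), date(2018, 12, 10), 4):
        print(" ".join(backfill_command("my-env", "us-central1", "my_dag", s, e)))
```

Each printed command could be started in its own shell or background process. Keep in mind that every backfill job is still subject to the DAG's max_active_runs.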

> Second problem:
>
> * During the backfill, some tasks get stuck in "Scheduled" and never move to "Queued" or "Running".
> * Clearing doesn't help.
> * Running `gcloud beta composer environments run <env> --location=<location> run` manages to start a task, but then the next task doesn't start automatically.
> * This all works fine in the daily runs, so I don't understand what might cause the problem in the backfill.

Could you make sure your airflow backfill process is still alive in this case?
If the airflow backfill job is terminated for some reason, its scheduled tasks will stay stuck in that state until the next backfill job, because the scheduler ignores them.


--
You received this message because you are subscribed to the Google Groups "cloud-composer-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cloud-composer-di...@googlegroups.com.
To post to this group, send email to cloud-compo...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cloud-composer-discuss/693c9a95-d73b-4bec-b813-adae49b8c152%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Tomas Jansson

Jan 4, 2019, 5:23:16 AM
to Feng Lu, cloud-composer-discuss
Thanks Feng. I think running it in one of the pods would have been the best option. What I ended up doing was using `trigger_dag` instead, since that just adds the DAG run, which then runs when it can. It was a little bit tedious since you can only trigger one run at a time (I didn't write a script for it this time).
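
For anyone who wants to script the `trigger_dag` approach, here is a minimal Python sketch that builds one trigger command per day in a range. It assumes a daily schedule and the Airflow 1.x `trigger_dag` flag `-e` for the execution date; the environment name, location, and DAG id are hypothetical placeholders. It only prints the commands rather than running them.

```python
from datetime import date, timedelta


def daily_execution_dates(start, end):
    """Return every date from start to end, both inclusive (empty if end < start)."""
    days = (end - start).days
    return [start + timedelta(days=i) for i in range(days + 1)]


def trigger_command(env, location, dag_id, execution_date):
    """Build the gcloud argument list that triggers one DAG run (not executed here)."""
    return [
        "gcloud", "beta", "composer", "environments", "run", env,
        "--location=" + location,
        "trigger_dag", "--",
        dag_id,
        "-e", execution_date.isoformat(),
    ]


if __name__ == "__main__":
    # Hypothetical values; replace with your own environment, location, and DAG.
    for day in daily_execution_dates(date(2018, 12, 1), date(2018, 12, 5)):
        print(" ".join(trigger_command("my-env", "us-central1", "my_dag", day)))
```

To actually run the commands, each argument list could be passed to `subprocess.run(cmd, check=True)` in the loop instead of being printed.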
--

 

Tomas Jansson

Sr. Director of Software Engineering

+47 91862293 | @tomasjansson | skype:mastoj

Karl Johans gate 21, 0159 Oslo, Norway


   


Alexander Nordin

Dec 12, 2019, 9:47:41 AM
to cloud-composer-discuss
Hey Tomas and Feng,

Tomas, did you eventually get large-scale backfills to work properly on Composer?

I frequently run into reliability issues when backfilling large workloads (~10k tasks).

Best,
Alex
