Running luigi scheduler in the cloud


Dan Davis

Feb 7, 2019, 11:52:18 AM
to Luigi
So, I'm starting to look at moving my data workflow management to the cloud. My company is committed to AWS.
I could run luigi with the local scheduler, but that doesn't seem great.

I'm interested in what architectures people have used. Thanks.

Lars Albertsson

Feb 8, 2019, 7:25:47 PM
to Dan Davis, Luigi
Going from a local scheduler to a central one is easy. Do you have a central scheduler on premises today? There is not much difference in the cloud.

Spin up a container or an instance running luigid. You don't need a database for task history, but it is convenient to have persistent storage for the pickled scheduler state, so that it survives a luigid upgrade.
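
As a rough illustration of the luigid setup described above, here is a minimal sketch that builds the command line for a luigid whose state file sits on persistent storage. The helper name luigid_command and the path /var/lib/luigi/state.pickle are made up for the example; --port, --state-path and --logdir are luigid command-line options, and 8082 is the default port.

```python
import shlex


def luigid_command(state_path, port=8082, logdir=None):
    """Build an argv list for luigid with its pickled state on persistent storage.

    Hypothetical helper for illustration; it only assembles the command,
    it does not start anything.
    """
    argv = ["luigid", "--port", str(port), "--state-path", state_path]
    if logdir is not None:
        argv += ["--logdir", logdir]
    return argv


# The state path is an assumption: point it at a volume that outlives
# the container or instance.
cmd = luigid_command("/var/lib/luigi/state.pickle")
print(shlex.join(cmd))
```

The resulting list can be used as a container entrypoint or handed to subprocess.run.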

Put the name of the luigid service in the clients' config files, and you are good to go. We run luigid as a standard Kubernetes service, and luigi worker containers scheduled with CronJob, which works fine. You might want to expose the luigid web UI for easier monitoring, but it is not required.
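
To make the client side concrete, here is a minimal sketch that writes the [core] section of a worker's luigi.cfg. The host name "luigid" is an assumption (for example, the name of the Kubernetes service), and client_config is a hypothetical helper; default-scheduler-host and default-scheduler-port are the Luigi settings that point a worker at a central scheduler.

```python
import configparser
import io


def client_config(scheduler_host, scheduler_port=8082):
    """Return luigi.cfg text that points a worker at a central scheduler."""
    cfg = configparser.ConfigParser()
    cfg["core"] = {
        "default-scheduler-host": scheduler_host,
        "default-scheduler-port": str(scheduler_port),
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


# "luigid" is an assumed service name; use whatever your deployment exposes.
text = client_config("luigid")
print(text)
```

With this file in place, workers started without --local-scheduler should register with the central luigid.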

Make sure to never run two luigid instances at once. In a split-brain scenario, two identical jobs might run concurrently, resulting in a corrupted output dataset.

To avoid split brain when upgrading luigid, bring the old one down before you bring up the new version. If the state is not pickled properly, let all jobs run to completion or kill them before bringing up the new version.
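
One way to guard the upgrade step is to refuse to start a new luigid while the old one still accepts connections. The sketch below is only a TCP reachability check with hypothetical function names; it cannot see a scheduler that is unreachable from the checking host, so treat it as a safety net rather than a guarantee.

```python
import socket


def scheduler_is_up(host, port, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, or name resolution failure.
        return False


def safe_to_start_new_scheduler(host, port):
    """Allow a new luigid only when the old one no longer answers."""
    return not scheduler_is_up(host, port)
```

A deploy script could call safe_to_start_new_scheduler("luigid", 8082) and abort when it returns False; the host name and port here are assumptions.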


You may also find relevant information in the batch processing section at https://www.mapflat.com/lands/resources/reading-list/index.html

Overall, keep it simple, both the deployment and the workflow DAGs. 

BTW, if you are using S3 for storage, you might face race conditions between the jobs, irrespective of the workflow scheduling mechanism. Steve Loughran explains it in https://youtu.be/UOE2m_XUr3U

Regards,


Lars Albertsson
Data engineering entrepreneur

