Luigi across multiple hosts


tr...@dose.com

Jan 22, 2016, 10:59:49 AM
to Luigi
I've been reading carefully through the docs to get a sense of the Luigi model, and my current impression is that Luigi tasks _can_ be run across multiple hosts, with the work distributed by the scheduler. However, it seems that the central scheduler _must_ be run on a single instance to keep track of state.

Is this correct?

The reason I ask is that at my company our deployment process assumes that you can have multiple instances of your application running at any time, and from the looks of it, the central scheduling server does not support that. I've toyed with executing tasks via Celery, but then I lose all the benefits of the central scheduler. Has anyone else run into this problem?

Erik Bernhardsson

Jan 22, 2016, 10:37:40 PM
to tr...@dose.com, Luigi
The central scheduler is run on a single instance, yes. The reason is that it does a lot of locking and transactional stuff, so it would be very hard to parallelize.



Lars Albertsson

Jan 27, 2016, 4:11:11 AM
to Erik Bernhardsson, tr...@dose.com, Luigi
Erik is correct - Luigi uses a single scheduler instance model, like
most workflow managers.

What do you mean by "multiple instances of your application"? If you
want long-running services, Luigi is not the tool to use. You should
instead look at e.g. Marathon, Kubernetes, Aurora, Asgard.

If you are scheduling batch processes, perhaps you can describe what
you want to do, and I can try to help figure out how to do it with
Luigi.

Regards,


Lars Albertsson
Data engineering consultant
www.mapflat.com
+46 70 7687109

the...@gmail.com

May 10, 2016, 3:31:50 PM
to Luigi
AdRoll has a blog post about using a task queue with Luigi to distribute jobs across hosts:

http://tech.adroll.com/blog/data/2015/10/15/luigi.html

I've been using Luigi to track my job dependencies, but I'd like to be able to distribute jobs across several machines too. I'm using AdRoll's strategy of defining a Docker image within a Luigi task and calling out to a task queue (and waiting), but their queue 'quentin' is not open source. I chose Celery for my task queue, but I can't get it to report that a task has completed or failed so that my Luigi task will finish. If I watch the Celery command line, it says my tasks have completed (the output files are there), but it won't return the result state reliably. I'm new to Celery (just this week), so I'm probably configuring it wrong or something. Let me know if this helps or if you've found a clear
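
For anyone trying the same "call out to a task queue and wait" pattern, here is a minimal sketch of the waiting half in Python. It assumes the handle you get back from the queue exposes `ready()` and `successful()`, as Celery's `AsyncResult` does; the function name `wait_for_result` and its parameters are made up for illustration. If the state never becomes available, it may be worth checking that a Celery result backend is configured and that the task is not set to ignore results, though that is only a guess about this particular setup.

```python
import time


def wait_for_result(async_result, poll_interval=5.0, timeout=3600.0):
    """Block until the remote job finishes; return True on success, False on failure.

    `async_result` is assumed to expose ready() and successful(), as
    Celery's AsyncResult does. Raises TimeoutError if the job has not
    reached a final state within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    # Poll instead of blocking forever, so a lost result cannot hang the caller.
    while not async_result.ready():
        if time.monotonic() >= deadline:
            raise TimeoutError("remote job did not reach a final state")
        time.sleep(poll_interval)
    return async_result.successful()
```

Inside a Luigi task's `run()` you would call this on the handle returned by the queue and raise an exception when it returns False, so that Luigi marks the task as failed rather than done.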

MJ Tung

Aug 13, 2020, 3:28:30 AM
to Luigi
Do you have any new insights into this problem? Did you end up implementing the task queue in Celery, or did you find another solution for running distributed tasks with Luigi? What did you end up doing for the setup?

MJ

Lars Albertsson

Sep 30, 2020, 9:10:22 AM
to MJ Tung, Luigi
Nowadays, the easiest way to scale out to multiple nodes for Luigi
pipelines is to run Luigi in containers inside a Kubernetes cluster.
Or some other cloud container-as-a-service.

If you don't want to use some elastic container service, you can use
the simple heathen method: run identical Luigi jobs on multiple
machines, connecting to the same luigid. The tasks would then be
divided among the nodes. There is no resource awareness, however, so
you might get lower utilisation or robustness than the container
variant.

Regards,

Lars Albertsson
Data engineering entrepreneur
www.scling.com, www.mapflat.com
https://twitter.com/lalleal
+46 70 7687109