Luigi on YARN + Scaling luigi

carol...@gmail.com

unread,

Aug 28, 2014, 11:48:12 AM8/28/14

to luigi...@googlegroups.com

Let's say I have to run 1000 tasks
FooTask(param1=1,param2=2)
FooTask(param1=10,param2=20)
...
FooTask(param1=100,param2=40)

every day (tasks are independent of each other) and I have a cluster with some machines. Luigi doesn't do resource management itself, but we already have YARN installed to run Map Reduce stuff (and the Luigi outputs are on places common to all the machines on the clusters, so if machine A has finished task T, machine B can see its output). It theoretically is possible to launch YARN containers for every luigi task (and then YARN figures out where to run it), right? Has anybody tried this already? I am also curious what technologies spotify itself uses to scale up luigi.

Thanks

df.rodr...@gmail.com

unread,

Aug 29, 2014, 11:05:49 AM8/29/14

to luigi...@googlegroups.com, carol...@gmail.com

That's a very interesting idea, you could also use something like Mesos (http://mesos.apache.org/) to distribute the workload on a Mesos cluster. I have never used YARN or Mesos and i am not sure how they compare, i have seen people doing a lot of Mesos lately.

There is a simpler solution that i have been using to some success. I have multiple boxes running a simple rabbitmq daemon reading from a queue that starts the luigi worker when it receives a message. The luigi worker connects to the luigi central scheduler that tells if another worker is executing the tasks so work is not done twice.

I hacked the solution in a couple of days including a "master" node that is in charge of creating the rabbitmq messages in the correct format.

I have used with more 20 machines and its been working fine. The amount of messaging is minimal so i think it should scale relatively fine to more boxes.

My next plan is to do better routing in rabbitmq so specific tasks can be executed in specific boxes, right now there is only one type of worker.

I would like to see more ideas on how people is been "scaling" luigi to multiple machines. I notice that spotify has another opensource project called scalegrease(https://github.com/spotify/scalegrease) but documentation is very mininal so i don't know how that works.

Erik Bernhardsson

unread,

Sep 2, 2014, 11:56:04 AM9/2/14

to Daniel Rodriguez, luigi...@googlegroups.com, Caroline Alexiou

This is a really interesting discussion.

There's two aspects that can be scaled:

1. Scheduling

2. Execution

I'm not convinced #1 is really a scalability issue, rather than just a triggering issue. I.e. how can we avoid relying on cron? I think it would be nice if Luigi workers can fetch things to schedule from the central planner and then schedule everything.

#2 can be achieved by decoupling scheduling from execution. I've been playing around with the idea of having Luigi "slaves" that pull work from the central scheduler. The main issue is how do you distribute the code? But I think you could rely on some external mechanism to do that and assume you can just load the same module. This could be done using a same mechanism as https://github.com/spotify/luigi/pull/378

Regarding the framework, I don't really care about YARN or Mesos or whatever. I think it would be great if this is an implementation detail from Luigi's point of view. But I really don't know much about either of those. Would be great to hear your thoughts

--
You received this message because you are subscribed to the Google Groups "Luigi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to luigi-user+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--

Erik Bernhardsson

Engineering Manager, Spotify, New York

Daniel Rodriguez

unread,

Sep 2, 2014, 7:13:55 PM9/2/14

to Erik Bernhardsson, luigi...@googlegroups.com, Caroline Alexiou

That idea of luigi "slaves" is exactly what I implemented using rabbitmq. I did think of hacking the luigi worker code but i didn't like that much the idea of polling the scheduler for tasks, also i already had rabbitmq running so it was natural for me to use it. Finally, with rabbitmq its possible do some complex routing like having specific workers for specific tasks, I haven't implemented that part though, mainly because I am not sure on how the syntax should be.

I do assume that the luigi tasks are importable on the slave nodes, i use salt for config management. I have one git repo only with luigi tasks so when the developers push the code to that repo Jenkins triggers a salt state and the new code is available on all the slave boxes.

It would be nice for luigi to have a built in distributor of tasks, or something like that, it will make it compete more directly with other frameworks like chronos. I do have to admit that i have trouble convincing people to use luigi, they always ask: "How do i do parallel tasks?" and when i say there is not something built in its almost like a lost battle from that point.

My rationale is something like: I am willing to write the code for scaling luigi as long as i can maintain the clear syntax for tasks, targets, parameters and so on, most of the people disagree with that position i am afraid.

Would you consider this a main priority?, how often is this type of question asked? What i mean is that its easy to write luigi tasks for hadoop or spark and let those tools do the heavy lifting of "scaling" an specific task.

Sorry if this post was mainly an opinion than ideas :D

Daniel Rodriguez

Dan Frank

unread,

Sep 3, 2014, 12:33:44 PM9/3/14

to df.rodr...@gmail.com, erik...@spotify.com, carol...@gmail.com, luigi...@googlegroups.com

Just thought I'd add another datapoint here - we've recently had some luck distributing a daily workflow doing something pretty simple. We just deploy all the code that workers need to run to a set of machines, and give each of them an identical crontab to run their workflow. We let the scheduler server take care of ensuring that tasks are distributed across the worker machines (and no two are running the same task). Of course, we need to ensure that we exclusively use targets that can be shared between machines (HdfsTarget and S3Target, for us), but that would be the case with just about any distributed execution pattern.

Erik, does that sound reasonable? It is a bit odd to give out the crontab to multiple machines, but it's easy to implement and does the job. Thus far, we've been served pretty well by having luigi remain ignorant of triggering (your #1), and the central scheduler does a good job of handling #2. I think the only improvement we'd really like to see in this area would be to back the scheduler server with a database so we can be more resilient to hardware failure there.

Always curious to hear others' approaches to this stuff

df

Daniel Rodriguez

unread,

Sep 3, 2014, 1:22:37 PM9/3/14

to Dan Frank, Erik Bernhardsson, Caroline Alexiou, luigi...@googlegroups.com

Michael, yes i know about that feature, i love it. It makes it possible to develop a parallel step locally and have some confidence on how its going to work on multiple boxes.

On my case I use S3 targets to ensure that the data is available for all the workers.

I like the idea of backing the scheduler with a database i think right now you have the option of creating a pickle file, right?. I am using the task history luigi has and i gave it an UI using sandman. Of course this only gives events and not a real backup in case the scheduler fails.

Erik Bernhardsson

unread,

Sep 3, 2014, 1:44:15 PM9/3/14

to Daniel Rodriguez, luigi...@googlegroups.com, Caroline Alexiou

On Tue, Sep 2, 2014 at 7:13 PM, Daniel Rodriguez <df.rodr...@gmail.com> wrote:

That idea of luigi "slaves" is exactly what I implemented using rabbitmq. I did think of hacking the luigi worker code but i didn't like that much the idea of polling the scheduler for tasks, also i already had rabbitmq running so it was natural for me to use it. Finally, with rabbitmq its possible do some complex routing like having specific workers for specific tasks, I haven't implemented that part though, mainly because I am not sure on how the syntax should be.

I do assume that the luigi tasks are importable on the slave nodes, i use salt for config management. I have one git repo only with luigi tasks so when the developers push the code to that repo Jenkins triggers a salt state and the new code is available on all the slave boxes.

If you have code (even ugly), would love to take a look at this.

It would be nice for luigi to have a built in distributor of tasks, or something like that, it will make it compete more directly with other frameworks like chronos. I do have to admit that i have trouble convincing people to use luigi, they always ask: "How do i do parallel tasks?" and when i say there is not something built in its almost like a lost battle from that point.

When you say "parallel", you mean parallel as in horizontal scaling, right? I mean being able to run things on multiple machines. There are clearly ways of doing so (see Dan Frank's answer) but they are not super beautiful imho.

My rationale is something like: I am willing to write the code for scaling luigi as long as i can maintain the clear syntax for tasks, targets, parameters and so on, most of the people disagree with that position i am afraid.

Which is understandably :) Luigi was always built with clarity and simplicity as the main feature, over things like scalability. I think it's easier to go in the direction of adding scalability once you nail simplicity, but not the other way around. Again I'd love to take a look at your code if you feel comfortable sharing it.

Would you consider this a main priority?, how often is this type of question asked? What i mean is that its easy to write luigi tasks for hadoop or spark and let those tools do the heavy lifting of "scaling" an specific task.

Luigi isn't necessarily driven by "priorities" since it's not the day job of any of the authors to add new features to Luigi. So Luigi has had a tendency to grow very organically and incrementally. This is great for stability, but I think the current model isn't great for big features. I hope to have more time in the future to put together scaffolds for more complicated features like scalability etc.

What do you mean with Hadoop/Spark? Those are examples where you get scalability because it's handled by something else. But we're at the point where some people (including Spotify) run 10k-100k tasks per day, and we also need to scale the launching of those tasks :)

Erik Bernhardsson

unread,

Sep 3, 2014, 1:49:39 PM9/3/14

to Dan Frank, Daniel Rodriguez, Caroline Alexiou, luigi...@googlegroups.com

That's how we do it at Spotify too, but it's a bit too much glue imo. Luigi was always designed to be a system where development and production are as identical as possible, but having scalability only supported in production breaks the symmetry. If that makes sense :)

It's getting more clear to me that a robust database-backed central scheduler is probably the first step towards a lot of future features, so that's probably where we should start. Once there's persistent state in the scheduler, then it's easy to implement triggering, for instance.

Slave-workers is a separate thing to work on. I think it's pretty easy to build a prototype for it, but I'm concerned that overlooks the subtle aspects of it. Distributing computation is always a can of works of dealing with heterogenous machines and failures you never expected. But I think it would be cool to put together a simple version of it that brave people can use at their own risk :)

Daniel Rodriguez

unread,

Sep 3, 2014, 2:29:12 PM9/3/14

to Erik Bernhardsson, luigi...@googlegroups.com, Caroline Alexiou

Daniel Rodriguez

On Wed, Sep 3, 2014 at 12:44 PM, Erik Bernhardsson <erik...@spotify.com> wrote:

On Tue, Sep 2, 2014 at 7:13 PM, Daniel Rodriguez <df.rodr...@gmail.com> wrote:

That idea of luigi "slaves" is exactly what I implemented using rabbitmq. I did think of hacking the luigi worker code but i didn't like that much the idea of polling the scheduler for tasks, also i already had rabbitmq running so it was natural for me to use it. Finally, with rabbitmq its possible do some complex routing like having specific workers for specific tasks, I haven't implemented that part though, mainly because I am not sure on how the syntax should be.

I do assume that the luigi tasks are importable on the slave nodes, i use salt for config management. I have one git repo only with luigi tasks so when the developers push the code to that repo Jenkins triggers a salt state and the new code is available on all the slave boxes.

If you have code (even ugly), would love to take a look at this.

It would be nice for luigi to have a built in distributor of tasks, or something like that, it will make it compete more directly with other frameworks like chronos. I do have to admit that i have trouble convincing people to use luigi, they always ask: "How do i do parallel tasks?" and when i say there is not something built in its almost like a lost battle from that point.

When you say "parallel", you mean parallel as in horizontal scaling, right? I mean being able to run things on multiple machines. There are clearly ways of doing so (see Dan Frank's answer) but they are not super beautiful imho.

Yes, i do mean being able to run thinks on multiple machines.

My rationale is something like: I am willing to write the code for scaling luigi as long as i can maintain the clear syntax for tasks, targets, parameters and so on, most of the people disagree with that position i am afraid.

Which is understandably :) Luigi was always built with clarity and simplicity as the main feature, over things like scalability. I think it's easier to go in the direction of adding scalability once you nail simplicity, but not the other way around. Again I'd love to take a look at your code if you feel comfortable sharing it.

I have no problems sharing the code, i would have to ask first since i wrote it for the company i work for that is not very open source friendly :(

Would you consider this a main priority?, how often is this type of question asked? What i mean is that its easy to write luigi tasks for hadoop or spark and let those tools do the heavy lifting of "scaling" an specific task.

Luigi isn't necessarily driven by "priorities" since it's not the day job of any of the authors to add new features to Luigi. So Luigi has had a tendency to grow very organically and incrementally. This is great for stability, but I think the current model isn't great for big features. I hope to have more time in the future to put together scaffolds for more complicated features like scalability etc.

What do you mean with Hadoop/Spark? Those are examples where you get scalability because it's handled by something else. But we're at the point where some people (including Spotify) run 10k-100k tasks per day, and we also need to scale the launching of those tasks :)

Indeed, if you have that amount of tasks you need something else to handle how to launch them.

Michael Placentra

unread,

Sep 3, 2014, 2:35:30 PM9/3/14

to Daniel Rodriguez, Erik Bernhardsson, luigi...@googlegroups.com, Caroline Alexiou

We've integrated Luigi with Amazon's Simple Workflow Service, and we are able to distribute tasks within the same workflow across many machines of different classes that way (Hadoop workers, high-memory workers, etc with different security). We are hoping to be able to share that project soon.

Mike Placentra | RUN | Ad Tech for a Connected World | www.runads.com

df.rodr...@gmail.com

unread,

Sep 4, 2014, 3:10:25 PM9/4/14

to luigi...@googlegroups.com, daniel...@gmail.com, df.rodr...@gmail.com, carol...@gmail.com

Having a DB backed scheduler would also to make the UI side the scheduler more resilient to failures, since it would be possible to have multiple servers running the UI and a simple load balancer in front of those. If one fails the other ones would still be available.

Christopher Severs

unread,

Sep 15, 2014, 8:14:42 PM9/15/14

to luigi...@googlegroups.com, daniel...@gmail.com, df.rodr...@gmail.com, carol...@gmail.com

If you go the DB backend and worker route will it be as a separate addon? I think one of the best parts of Luigi is its simplicity, and there are a lot of good options for scalability and fault tolerance (I like Mesos/Chronos for this, works perfectly with Luigi).

Erik Bernhardsson

unread,

Sep 15, 2014, 9:05:59 PM9/15/14

to Christopher Severs, luigi...@googlegroups.com, Dan Frank, Daniel Rodriguez, Caroline Alexiou

Yeah, the scheduler storage will definitely be a separate component. I started factoring out the interface between the storage layer and the scheduler and the interface is actually pretty small. https://github.com/spotify/luigi/pull/446

Reply all

Reply to author

Forward