behavior of yield in requires vs run

Chris Beaumont

Mar 5, 2016, 9:12:09 PM

to Luigi

As I understand it, yielding a task within the run method guarantees that the yielded task completes before the yielding task continues. However, yielding a task in requires() does not have this effect -- it seems that tasks yielded from requires are all scheduled concurrently.

Is there any fundamental reason for this difference in behavior? If requires behaved like run then WrapperTask would be more expressive -- you could for example express simple combinations of sequential pipelines and fan-in/fan-out behavior with less boilerplate:

from luigi import WrapperTask

class SimplePipeline(WrapperTask):
    def requires(self):
        yield SetupTask()  # wait to finish
        yield [FanOut() for i in range(5)]  # schedule concurrently, wait to finish
        yield Reducer()

Also, naively I would assume that the behavior of yielding a task would be easier to understand, since run and requires would do the same thing (though maybe I'm missing some subtlety here).

cheers,

Chris

Dave Buchfuhrer

Mar 5, 2016, 10:46:24 PM

to Chris Beaumont, Luigi

requires is called during scheduling to build up a dependency graph. We don't run anything until the whole dependency graph is built, so it would be impossible to run SetupTask() before your requires has finished. Because Luigi is meant for batch scheduling, SetupTask() could potentially run hours or days after it's scheduled, so it doesn't make much sense to pause scheduling to wait for that. The scheduler will tend to prefer tasks that are scheduled earlier, so SetupTask() will run before the FanOut tasks, which will run before Reducer() unless priority or other dependencies get in the way.
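To make the two phases concrete, here is a rough toy model in plain Python. It does not use Luigi's API at all: Task, build_graph, run_all and the three example tasks are hypothetical names made up for illustration. It only shows the idea that every requires() is exhausted first, and that ordering then comes from the dependency edges (Reducer requires the FanOut tasks, each FanOut requires Setup) rather than from the order of the yields.

```python
# Toy model only: hypothetical classes, not Luigi's scheduler or API.
class Task:
    name = "task"

    def requires(self):
        return []

    def run(self, log):
        log.append(self.name)


def flatten(deps):
    """Expand yielded items, which may be single tasks or lists of tasks."""
    for dep in deps:
        if isinstance(dep, (list, tuple)):
            yield from dep
        else:
            yield dep


def build_graph(root):
    """Phase 1: exhaust every requires() before anything runs."""
    graph, stack = {}, [root]
    while stack:
        task = stack.pop()
        if task.name in graph:
            continue  # tasks with the same name count as the same task
        deps = list(flatten(task.requires()))
        graph[task.name] = (task, deps)
        stack.extend(deps)
    return graph


def run_all(graph):
    """Phase 2: run each task only after all of its dependencies have run."""
    log, done = [], set()

    def visit(name):
        if name in done:
            return
        task, deps = graph[name]
        for dep in deps:
            visit(dep.name)
        task.run(log)
        done.add(name)

    for name in graph:
        visit(name)
    return log


class Setup(Task):
    name = "setup"


class FanOut(Task):
    def __init__(self, i):
        self.name = f"fanout-{i}"

    def requires(self):
        yield Setup()


class Reducer(Task):
    name = "reducer"

    def requires(self):
        yield [FanOut(i) for i in range(3)]


order = run_all(build_graph(Reducer()))
print(order)  # ['setup', 'fanout-0', 'fanout-1', 'fanout-2', 'reducer']
```

In this sketch nothing runs inside build_graph, so a yield in requires has no way to wait for an earlier yield; the sequencing falls out of which task requires which.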

Erik Bernhardsson

Mar 5, 2016, 11:38:15 PM

to Chris Beaumont, Luigi

Luigi was first built around requires, and the dynamic dependency support is still somewhat experimental.

In theory you could argue that the requires() methods are useless, since you can do everything from run(). The difference is that dynamic requirements might end up running a task several times (so the task has to be idempotent) and have some other limitations with scheduling.
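A small sketch of why idempotence matters here, again as a toy model in plain Python rather than Luigi's worker code. The assumption in this sketch is that the driver restarts run() from the top whenever it yields a dependency that has not finished yet; drive, completed and side_effects are hypothetical names for illustration.

```python
# Toy model only: hypothetical driver, not Luigi's worker code.
completed = set()     # names of dependencies that have finished
side_effects = []     # records the work run() does around its yields


def run():
    side_effects.append("prepare")  # repeated on every restart
    yield "dep-a"
    yield "dep-b"
    side_effects.append("finish")


def drive(run_fn):
    """Restart run_fn from the top whenever it yields an unfinished dependency."""
    restarts = 0
    while True:
        for dep in run_fn():
            if dep not in completed:
                completed.add(dep)  # pretend the dependency runs now
                restarts += 1
                break               # abandon this generator and start over
        else:
            return restarts         # run() reached its end without waiting


restarts = drive(run)
print(restarts)      # 2
print(side_effects)  # ['prepare', 'prepare', 'prepare', 'finish']
```

Under that assumption the code before the first yield executes three times for two dynamic dependencies, so anything it does has to be safe to repeat.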
