behavior of yield in requires vs run


Chris Beaumont

Mar 5, 2016, 9:12:09 PM
to Luigi
As I understand it, yielding a task within run() guarantees that the yielded task completes before the yielding task continues. However, yielding a task in requires() does not have this effect -- it seems that tasks yielded from requires() are all scheduled concurrently.

Is there any fundamental reason for this difference in behavior? If requires behaved like run then WrapperTask would be more expressive -- you could for example express simple combinations of sequential pipelines and fan-in/fan-out behavior with less boilerplate:

class SimplePipeline(WrapperTask):
    def requires(self):
        yield SetupTask()  # wait to finish
        yield [FanOut() for i in range(5)]  # schedule concurrently, wait to finish
        yield Reducer()

Also, naively I would assume that the behavior of yielding a task would be easier to understand, since run and requires would do the same thing (though maybe I'm missing some subtlety here).

cheers,
Chris

Dave Buchfuhrer

Mar 5, 2016, 10:46:24 PM
to Chris Beaumont, Luigi
requires() is called during scheduling to build up the dependency graph. We don't run anything until the whole dependency graph is built, so it would be impossible to run SetupTask() before your requires() has finished. Because Luigi is meant for batch scheduling, SetupTask() could potentially run hours or days after it's scheduled, so it doesn't make much sense to pause scheduling to wait for that. The scheduler will tend to prefer tasks that are scheduled earlier, so SetupTask() will run before the FanOut tasks, which will run before Reducer(), unless priority or other dependencies get in the way.
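The two phases described above can be sketched with a toy model. This is not Luigi's scheduler or API: ToyTask, build_graph and run_order are hypothetical names used only for illustration. The sketch shows that requires() generators are fully consumed while the graph is built, and that a guaranteed order comes from dependency edges, so each stage declares the previous stage as a requirement instead of relying on yield order inside one wrapper.

```python
# Toy model only: these classes and functions are hypothetical, not Luigi's API.

class ToyTask:
    def __init__(self, name, deps=()):
        self.name = name
        self.deps = list(deps)

    def requires(self):
        # A generator, like the requires() in the question above.
        for dep in self.deps:
            yield dep


def build_graph(root):
    """Phase 1: exhaust every requires() generator; nothing runs here."""
    graph = {}
    stack = [root]
    while stack:
        task = stack.pop()
        if task.name in graph:
            continue
        deps = list(task.requires())  # the whole generator is consumed at once
        graph[task.name] = [d.name for d in deps]
        stack.extend(deps)
    return graph


def run_order(graph):
    """Phase 2: run a task only after everything it requires has run."""
    done, order = set(), []

    def visit(name):
        if name in done:
            return
        for dep in graph[name]:
            visit(dep)
        done.add(name)
        order.append(name)

    for name in graph:
        visit(name)
    return order


# Sequencing is expressed as edges: Reducer -> FanOut* -> SetupTask.
setup = ToyTask("SetupTask")
fan_out = [ToyTask(f"FanOut{i}", deps=[setup]) for i in range(3)]
reducer = ToyTask("Reducer", deps=fan_out)
pipeline = ToyTask("SimplePipeline", deps=[reducer])

graph = build_graph(pipeline)
order = run_order(graph)
print(order)
```

With the edges in place, SetupTask precedes every FanOut task and Reducer comes after all of them, regardless of the order in which the tasks were discovered during graph building.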


Erik Bernhardsson

Mar 5, 2016, 11:38:15 PM
to Chris Beaumont, Luigi
Luigi was first built around requires(), and the dynamic dependency support is still somewhat experimental.

In theory, you could argue that requires() methods are useless, since you can do everything from run(). The difference is that dynamic requirements might end up running a task several times (so the task has to be idempotent) and have some other scheduling limitations.
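The idempotency point can be illustrated with a toy model. It assumes that a worker restarts run() from the top whenever run() yields a dependency that is not yet complete; that is my reading of the "several times" remark, and Luigi's actual worker differs in detail. toy_worker, run_report and the two bookkeeping variables are hypothetical names, not Luigi's API.

```python
# Toy model only: not Luigi's worker. It assumes run() is restarted from the
# top whenever it yields a dependency that is not complete yet.

completed = set()     # names of dependencies that have finished
side_effects = []     # records each time a non-yield step of run() executes


def run_report():
    """A hypothetical run() body with two dynamic dependencies."""
    side_effects.append("prepare")       # executes on every (re)start
    yield "SetupTask"
    yield "FanOut"
    side_effects.append("write output")  # reached only on the final pass


def toy_worker(run):
    """Drive run() to completion, restarting it after each missing dependency."""
    starts = 0
    while True:
        starts += 1
        for dep in run():
            if dep not in completed:
                completed.add(dep)  # stand-in for actually running the dependency
                break               # abandon this generator and start over
        else:
            return starts           # run() finished with no missing dependency


starts = toy_worker(run_report)
print(starts, side_effects)
```

Under that assumption, any code placed before a yield executes once per restart, which is why work done there needs to be safe to repeat.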
