Host resources


Zoltan

Feb 7, 2021, 3:26:03 AM
to luigi...@googlegroups.com
Hi,

I need a way to schedule tasks based on host capabilities (I called it
worker_resources in the code, but that can be changed), so for example
one task can be executed by a worker with a capability (specified in a
parameter on startup), but not by the other (where the capability is not
in the startup parameters).
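
To make the idea concrete, here is a minimal sketch of the matching rule in plain Python. It is neither luigi's API nor the code in the fork: the names TaskSpec, can_run and eligible_tasks, and the "gpu" capability, are hypothetical and only illustrate the assumption that a worker may take a task when it offers every capability the task requires.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical task description: a name plus the capabilities it needs."""
    name: str
    required: frozenset = field(default_factory=frozenset)


def can_run(worker_capabilities, task):
    """A worker is eligible when it offers every capability the task requires."""
    return task.required <= frozenset(worker_capabilities)


def eligible_tasks(worker_capabilities, tasks):
    """Return the names of the tasks this worker may be handed, in order."""
    return [t.name for t in tasks if can_run(worker_capabilities, t)]


tasks = [
    TaskSpec("Preprocess"),                      # no special requirement
    TaskSpec("TrainModel", frozenset({"gpu"})),  # needs the "gpu" capability
]

plain_worker = set()   # started without any capability (assumed)
gpu_worker = {"gpu"}   # started with the "gpu" capability (assumed)

print(eligible_tasks(plain_worker, tasks))
print(eligible_tasks(gpu_worker, tasks))
```

With these assumptions the plain worker would only be offered Preprocess, while the worker started with "gpu" would be offered both tasks.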

I know, this can be done with external requirements, but I don't want to
take that path if I don't have to. If you see any other ways to make
this work without code change in luigi, please let me know.

I found a similar idea implemented previously, but not merged:
https://github.com/spotify/luigi/pull/1669

I forked luigi and made changes in the code (see commit message):
https://github.com/kz0ltan/luigi/commit/2bab8685a9483dcaf91195900b0446fd66a3af7f

If you know the inner workings of luigi, I'd like to ask you to take a
look at it and let me know if you see any glaring problems with this
change.

Thanks

HadoopMarc

Feb 7, 2021, 5:42:42 AM
to Luigi
Hi Zoltan

I don't know whether the luigi scheduler supports heterogeneous workers, or whether tasks can indicate what type of worker they need. I am answering because I know that GitLab CI has a nice feature for this, which might be useful to you:


You can still use luigi and trigger a gitlab pipeline from a luigi task.

Best wishes,     Marc

Zoltan

Feb 7, 2021, 7:04:17 AM
to Luigi
Hi Marc,

Thanks for the tip. The reason I'm looking for a solution with luigi only
is that my use-case includes a lot of dependencies requiring one type of
worker or the other. The whole problem could be solved with the
mentioned ExternalTasks, but putting everything in one pipeline seems
considerably cleaner and simpler to handle. Also, this is a home
project, so using GitLab would be a bit overkill because of pricing,
etc. :)

Lars Albertsson

Feb 10, 2021, 3:35:23 PM
to Zoltan, Luigi
Hi,

This is easy to achieve without explicit support in Luigi with
ExternalTasks, as you mention. There is a convenience function
externalize() that you can use in your requirements, e.g.:

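from luigi import Task
from luigi.task import externalize

class A(Task):
    ...

class SpecialTaskB(Task):
    def requires(self):
        return A()

class C(Task):
    def requires(self):
        # Treat SpecialTaskB as external: a worker running C waits for
        # its output instead of running it.
        return externalize(SpecialTaskB())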


Now schedule A and C on all machines, and schedule SpecialTaskB on the
machine with the special resources.

This solution is simple enough that we probably shouldn't add
complexity to Luigi to solve it. It is IMHO to be preferred over
a capability filter function, since it is more explicit.

If you would like to create a PR with some documentation on the
pattern, I am happy to review. (I am not a maintainer, however.) We
are short on documentation of good workflow patterns, so it would be a
great contribution.


Regards,

Lars Albertsson
Data engineering entrepreneur
www.scling.com, www.mapflat.com
+46 70 7687109
https://twitter.com/lalleal, https://www.linkedin.com/in/larsalbertsson/

