Workflow that includes manual steps


Hari Krishna Dara

Jun 18, 2020, 10:54:17 AM
to Luigi
I am very new to Luigi (and workflow libraries in general) and so far have gone through some tutorial blogs. I have a sense that I can use it to solve our workflow problem, though it would involve some contraptions. Our workflow involves interacting with a few web services to create objects based on some configuration that is fed into the workflow. I am thinking of representing each such interaction as a task, with the parameters to the remote service becoming the parameters to the task. I can then arrange the task dependencies and let the workflow engine take care of running them in the right order with retries (on failures) and automatically keeping track of the completed tasks. The latter is important for me because I want to make use of the state_path property so that I can run it as a batch job instead of a service. Every time my program runs, it would restore the state of all workflows from the file, progress as far as possible, and then shut down cleanly while saving the state back to the file.

Now, some of these tasks would take an unpredictable amount of time to complete, either because manual intervention is required (such as manual signoffs) or because the remote service could take an unpredictable amount of time to make the result available, which becomes the input for the next task. I would like to understand the right approach here. E.g., when the workflow enters a state where a manual signoff is required, I can have a task that checks the signoff status and, if it is not yet ready, fails by raising an exception. But since there is no way to predict how long a human will take to sign off, a set number of retries will not work at all. I need an unlimited number of retries, but without actually trying again until the next batch run. I can probably create a marker file and fail the task early as long as the file exists. I can clear these marker files before the next batch run so that the task can check the signoff status once more and either proceed or recreate the marker file to block itself again.

I am just wondering if Luigi would be an appropriate library for the above workflow or would it be stretching it too far? I would really appreciate any feedback. Please feel free to ask for more clarity or context where required.

Thank you,
Hari

Lars Albertsson

Oct 27, 2020, 8:35:31 AM
to Hari Krishna Dara, Luigi
Hi,

Sorry that you had to wait so long for an answer. You can build
workflows that include manual steps. The manual interaction itself is
outside Luigi's scope, however.

You are on the right track with markers. Your full pipeline will be a
combination of human and machine tasks. The challenge is the
integration between them. When you have a human task before a machine
task, Luigi could poll for completion of the human task. In the general
case, you can use a custom ExternalTask, where you override the
complete() method. You can also poll for existence of files by
overriding output(), or run a database query, depending on how you
want the human to signal completion. If you use GSuite or Office 365,
a tick box or a row in an online spreadsheet might be a good signal
for handover.

Luigi will check for completion every time it runs. In order to make
the pipeline reliable, you schedule Luigi with a cron job or similar
service to poll regularly until the pipeline is completed.

This pattern is also applicable for data collection, where data is
pushed from source systems, rather than provided by a human.

Luigi will know nothing about the human part of the process. If you
would like a more integrated experience with UI visibility into the
pipeline, you will need a tool that has explicit support for human
workflows: either manual process tools, such as Jira, or workflow
engine tools that support mixed workflows. If you already use a CI/CD
tool that has support for a mixed pipeline, e.g. Jenkins or GoCD, that
might be a good candidate.

Regards,

Lars Albertsson
Data engineering entrepreneur
www.scling.com, www.mapflat.com
https://twitter.com/lalleal
+46 70 7687109

Hari Krishna Dara

Oct 28, 2020, 11:20:48 AM
to Lars Albertsson, Luigi
Thank you Lars, for the response, much appreciated!

We have done a POC that combines Luigi with a SQLite database for storing the output, Jenkins to play the role of an external scheduler, and a git repo, and this seems to work well. When it comes to the tasks that require human input, we already have a service that registers this state, so the task simply checks for the value and, if it is unavailable, raises an exception to cause the workflow to fail. The next time Jenkins triggers the build, Luigi automatically skips the tasks that have been completed and reruns the task that checks for human input.

Lars Albertsson

Oct 28, 2020, 3:47:26 PM
to Hari Krishna Dara, Luigi
That sounds like a good approach to me. Looking forward to the blog post. ;-)

Lars Albertsson
