Workflow managers - airflow, luigi, ...?

fangohr

Nov 5, 2018, 1:42:49 PM
to Jupyter at Research Facilities, fangohr
Hi all,

has anybody got experience with workflow/task managers such as Apache Airflow https://airflow.apache.org or http://luigi.readthedocs.io or similar tools to manage data analysis jobs?

I anticipate computational jobs that take between a few minutes and a few hours, and I would like to use both dedicated hardware and HPC hardware from a cluster. Jobs need to be triggerable on demand, and I’d like a graphical overview of the current progress, failing jobs/compute resources, etc., ideally in a web interface. One type of job would be executing Jupyter Notebooks as scripts (for example using papermill).
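
To make the notebook case concrete, here is a minimal sketch of running a notebook as a batch job through the papermill command line (`papermill INPUT OUTPUT -p NAME VALUE`). The notebook paths and the `run_number` parameter are hypothetical, and it assumes papermill is installed and the notebook has a cell tagged `parameters`:

```python
import shlex
import subprocess
from typing import Dict, List


def papermill_command(input_nb: str, output_nb: str,
                      parameters: Dict[str, object]) -> List[str]:
    """Build the argument list for the papermill command-line tool."""
    cmd = ["papermill", input_nb, output_nb]
    for name, value in parameters.items():
        # Each -p flag passes one parameter as a name/value pair.
        cmd += ["-p", name, str(value)]
    return cmd


def run_notebook(input_nb: str, output_nb: str,
                 parameters: Dict[str, object]) -> int:
    """Execute the notebook and return papermill's exit code (0 on success)."""
    cmd = papermill_command(input_nb, output_nb, parameters)
    print("Running:", " ".join(shlex.quote(part) for part in cmd))
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Hypothetical file names and parameter, for illustration only.
    exit_code = run_notebook("analysis.ipynb", "analysis_run42.ipynb",
                             {"run_number": 42})
    print("papermill exit code:", exit_code)
```

A workflow manager could use the exit code to mark the job as succeeded or failed, while the executed output notebook serves as the record of the run.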

Airflow looks like a good candidate, but I’d love to hear if anybody else here has relevant experience.

Many thanks, 

Hans






Samuel Lelièvre

Nov 6, 2018, 4:07:15 AM
to Jupyter at Research Facilities

Mon 2018-11-05 19:42:49 UTC+1, fangohr:
Just four hours later on a different list:

https://groups.google.com/d/topic/jupyter/_fTy7L0k7QE/discussion


Carsten Fortmann-Grote

Nov 6, 2018, 4:45:36 AM
to jupyter-resea...@googlegroups.com


On 11/6/18 10:07 AM, Samuel Lelièvre wrote:
> Just four hours later on a different list:
>
> https://groups.google.com/d/topic/jupyter/_fTy7L0k7QE/discussion
>
Blue Waters (https://bluewaters.ncsa.illinois.edu/webinars/workflows)
has a series of webinars on scientific workflows; maybe you will find
these helpful.

c.
>
> --
> You received this message because you are subscribed to the Google
> Groups "Jupyter at Research Facilities" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to jupyter-research-fa...@googlegroups.com
> <mailto:jupyter-research-fa...@googlegroups.com>.
> To post to this group, send email to
> jupyter-resea...@googlegroups.com
> <mailto:jupyter-resea...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/jupyter-research-facilities/a420b6a5-9cf1-4b92-ad95-ab2bea5b6079%40googlegroups.com
> <https://groups.google.com/d/msgid/jupyter-research-facilities/a420b6a5-9cf1-4b92-ad95-ab2bea5b6079%40googlegroups.com?utm_medium=email&utm_source=footer>.
> For more options, visit https://groups.google.com/d/optout.

--
Dr. Carsten Fortmann-Grote
Scientist for Scientific Simulations / Wissenschaftler fuer
wissenschaftliche Simulationen
European XFEL GmbH
Holzkoppel 4
22869 Schenefeld
Germany

Phone: +49 (0)40 8998-5603
Fax: +49 (0)40 8998-1905
Email: carste...@xfel.eu
Web: www.xfel.eu

Managing Directors: Prof. Dr. Robert Feidenhans'l, Dr. Nicole Elleuche

Registered as European X-Ray Free-Electron Laser Facility GmbH
at Amtsgericht Hamburg, HRB 111165

Christophe Duong

Nov 7, 2018, 4:11:08 AM
to Jupyter at Research Facilities
Hello,
I am working on a tool that would handle what you are describing:
Jupyter + Airflow + Papermill + Docker = http://www.aiscalate.com

> I anticipate computational jobs that can vary between a few minutes and a few hours, would like to use dedicated hardware and HPC hardware from a cluster;

I designed it so that you can work on a version locally on your machine, for example to iterate on a data analysis, before pushing it through version control to a dedicated cluster where longer computations (on more data) can run. The environment should be reproducible through Docker images.

> I’d like a graphical overview of the current progress, failing jobs/compute resources, etc, ideally in a web interface.
 
Yes, Airflow is becoming really popular for such use.
My project still lacks a custom web interface apart from Airflow's, but it is still a work in progress...

Chris

Allan, Daniel

Nov 13, 2018, 12:23:17 PM
to fangohr, Jupyter at Research Facilities

Hi Hans,


We have written plans to evaluate a couple of workflow systems like this on several scientific use cases and then decide on a recommended solution for NSLS-II. The member of our group who was taking the lead on this recently left for industry, so the plans are currently on hold. I will reach out when we resume; in the meantime, we'd be interested in your thoughts.


Thanks,

Dan


Daniel B. Allan, Ph.D
Associate Computational Scientist, Brookhaven National Lab


Tim Head

Nov 13, 2018, 5:28:49 PM
to dal...@bnl.gov, hans.f...@xfel.eu, jupyter-resea...@googlegroups.com
What is the experience/story like with Airflow for "ad-hoc" workflows,
or workflows that I want to run only when a human or an external
condition decides they need executing, instead of on a schedule?

Airflow always comes up as a "cron++", which is a bit different.

Does anyone have thoughts on that?

T

Hans Fangohr

Nov 14, 2018, 4:30:46 AM
to Tim Head, Hans Fangohr, Allan, Daniel, 'Tyler Erickson' via Jupyter at Research Facilities
Hi all,

(Thank you for the various contributions, which I haven’t digested yet.)

> On 13 Nov 2018, at 23:28, Tim Head <bet...@gmail.com> wrote:
>
> What is the experience/story like with airflow for "ad-hoc" workflows
> or workflows that I want to run only when a human/external condition
> thinks they need executing instead of on a schedule?

I haven’t actually tried Airflow yet, but the documentation suggests that running a task on demand should be possible (https://airflow.apache.org/cli.html#run).
I agree that this is important for our use case, and that a ‘better cron’ is not sufficient.
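
As a rough, untested sketch of what on-demand triggering could look like: besides `airflow run` (which runs a single task instance), the Airflow 1.x CLI has `airflow trigger_dag`, which starts a whole DAG run, and a DAG defined with `schedule_interval=None` is only run when triggered. The helper below just builds that command line; the DAG id `process_run` and the `run_number` setting are hypothetical, and the `airflow dags trigger` form is what, as far as I know, later 2.x releases use:

```python
import json
from typing import Dict, List, Optional


def trigger_command(dag_id: str,
                    conf: Optional[Dict[str, object]] = None,
                    airflow_major: int = 1) -> List[str]:
    """Build the CLI call that starts one run of a DAG on demand."""
    if airflow_major >= 2:
        cmd = ["airflow", "dags", "trigger"]   # Airflow 2.x spelling
    else:
        cmd = ["airflow", "trigger_dag"]       # Airflow 1.x spelling
    if conf:
        # --conf takes a JSON string that is attached to the DAG run.
        cmd += ["--conf", json.dumps(conf, sort_keys=True)]
    cmd.append(dag_id)
    return cmd


if __name__ == "__main__":
    # Hypothetical DAG id and run configuration, for illustration only.
    print(" ".join(trigger_command("process_run", {"run_number": 42})))
```

The resulting list can be passed to `subprocess.run` on a machine where Airflow is installed; a human or an external event handler would call it whenever a run is needed.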

Best wishes,

Hans