Luigi vs. Airflow


Nicholas Chammas

Jan 13, 2016, 4:53:34 PM
to Luigi

I am just learning about Python libraries for building and managing data pipelines, and the two that I am most interested in are Luigi and Airflow.

I’m trying to get a handle on the high-level differences between the two projects that might push a data engineer towards using one or the other.

Looking at the tutorials for each project (Luigi tutorial, Airflow tutorial), it seems like one difference is that you can optionally run your actual data processing and streaming in Luigi, whereas Airflow seems more strictly focused on pipeline definition and management. In other words, Airflow doesn’t touch any data directly, whereas Luigi lets you do that if you want.

The picture is still not clear enough for me, though, and the difference I tried explaining above is still quite abstract for me since I haven’t used either library.

If you’ve used both libraries and ended up choosing one over the other, can you share why? What are the key differences that pushed you one way or the other?

Arash Rouhani Kalleh

Jan 13, 2016, 9:37:10 PM
to Nicholas Chammas, Luigi, maximebe...@gmail.com, es...@spotify.com
This is a very interesting topic. I have not used Airflow, but it seems really cool. Looking at its community, I think it seems very active and has a culture of contributing, similar to Luigi's. So I wouldn't be afraid to use it.

I'm looping in mistercrunch (author of Airflow) and es...@spotify.com (who evaluated Airflow when I worked at Spotify), so they have the chance to comment here too.

I'm also very interested to hear from people who've used/tried both and picked one over the other.


Brian Caswell

Jan 18, 2016, 2:44:19 PM
to Nicholas Chammas, Luigi

I've used both Airflow and Luigi. The main reason I prefer Luigi is that I need far less tool-specific knowledge to get my job done than with other workflow systems, including Airflow.

* Task parameterization is done in Python, rather than in XML or Jinja templates.

* Dynamic graph creation can be done in Python with 'yield', rather than with project-specific operations.

* Built-in support for idempotent tasks.
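
To make the first and third points above concrete, here is a minimal sketch using only the standard library. It is not Luigi's real API: the `Task` base class, the `build` helper, and the `Extract` and `Total` tasks are hypothetical stand-ins that only imitate the general shape (parameters as plain Python arguments, a task counting as complete once its output exists).

```python
import os
import tempfile


class Task:
    """Hypothetical stand-in for a Luigi-style task (not Luigi's real API)."""

    def __init__(self, **params):
        # Parameters are ordinary Python keyword arguments.
        self.params = params

    def requires(self):
        return []

    def output(self):
        raise NotImplementedError

    def complete(self):
        # A task counts as done once its output file exists.
        return os.path.exists(self.output())

    def run(self):
        raise NotImplementedError


def build(task, log):
    """Run `task` and its dependencies, skipping any that are already complete."""
    if task.complete():
        return
    for dep in task.requires():
        build(dep, log)
    task.run()
    log.append(type(task).__name__)


class Extract(Task):
    def output(self):
        return os.path.join(self.params["workdir"], "raw-%s.txt" % self.params["day"])

    def run(self):
        with open(self.output(), "w") as f:
            f.write("3\n4\n5\n")


class Total(Task):
    def requires(self):
        return [Extract(**self.params)]

    def output(self):
        return os.path.join(self.params["workdir"], "total-%s.txt" % self.params["day"])

    def run(self):
        with open(self.requires()[0].output()) as src:
            total = sum(int(line) for line in src)
        with open(self.output(), "w") as dst:
            dst.write(str(total))


def demo():
    log = []
    with tempfile.TemporaryDirectory() as workdir:
        task = Total(workdir=workdir, day="2016-01-18")
        build(task, log)  # first call runs Extract, then Total
        build(task, log)  # second call finds the output and does nothing
        with open(task.output()) as f:
            result = f.read()
    return log, result
```

Calling `demo()` builds the same task twice; the second call is a no-op because the output file already exists, which is the sense of "idempotent" assumed here. The sketch writes its output directly and makes no attempt at atomic writes.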

Static analysis (pyflakes/pylint), code reformatting (yapf), and refactoring (rope/pycharm) are all possible within Luigi.  Not so much with Airflow.

Airflow's UI is useful for visualizing ongoing tasks, but I find writing my own tasks and using my existing log analysis infrastructure to be *far* superior.

Airflow's primary benefit is the use of a distributed task manager (Celery, in particular).  In a handful of cases, I have had to redesign my task structure due to Luigi's distribution model.

Brian



Brian Bloniarz

Jan 20, 2016, 1:24:51 PM
to Brian Caswell, Nicholas Chammas, Luigi
We evaluated both here at Opendoor and ported one of our more complex workflows over to both, as a test.

We chose Luigi -- the main reasons were: understandable code, simple execution model, few moving parts. At the time we weren't really sure whether either was going to fill our needs, so committing to the simpler library was easier to stomach.

We also depend on dynamic task creation (yield) in Luigi to control the degree of parallelization at runtime, which wasn't a feature supported by Airflow (ticket here: https://github.com/airbnb/airflow/issues/289). It's incredibly handy -- in our use case, there's no underlying Spark or Hadoop that we're farming out work to; we run heavyweight Python stuff directly in our workflow or in a subprocess, so being able to dynamically subdivide work was important.
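
As a rough illustration of subdividing work at run time with `yield`, here is a standard-library sketch. It does not use Luigi itself: `SumAll`, `SumChunk`, `chunked`, and the toy `execute` driver are hypothetical names, and the driver runs subtasks one after another where a real scheduler could hand them to separate workers.

```python
def chunked(items, size):
    """Split `items` into consecutive lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


class SumChunk:
    """Hypothetical leaf task: sums one chunk of numbers."""

    def __init__(self, chunk):
        self.chunk = chunk
        self.result = None

    def run(self):
        self.result = sum(self.chunk)


class SumAll:
    """Hypothetical parent task that decides its fan-out at run time."""

    def __init__(self, numbers, chunk_size):
        self.numbers = numbers
        self.chunk_size = chunk_size
        self.result = None

    def run(self):
        # The number of subtasks depends on the data seen at run time.
        subtasks = [SumChunk(c) for c in chunked(self.numbers, self.chunk_size)]
        yield subtasks  # hand the subtasks to the driver and wait
        self.result = sum(t.result for t in subtasks)


def execute(task):
    """Toy driver: runs a task, executing any subtasks it yields first."""
    outcome = task.run()
    if outcome is None:  # plain run() method, no dynamic dependencies
        return
    for batch in outcome:  # generator: each yielded batch runs before resuming
        for sub in batch:
            execute(sub)
```

For example, `SumAll(list(range(10)), 4)` splits its input into three chunks when `execute` runs it; changing `chunk_size` changes the fan-out without touching the task graph definition.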

This was a while back, when Airflow was still young; I definitely still think they're both promising projects. As of now, we're interested in ways to make the testability & development of workflow code better, but otherwise are happy with where we're at.

-Brian