I am just learning about Python libraries for building and managing data pipelines, and the two that I am most interested in are Luigi and Airflow.
I’m trying to get a handle on the high-level differences between the two projects that might push a data engineer towards using one or the other.
Looking at the tutorials for each project (Luigi tutorial, Airflow tutorial), it seems like one difference is that you can optionally run your actual data processing and streaming in Luigi, whereas Airflow seems more strictly focused on pipeline definition and management. In other words, Airflow doesn’t touch any data directly, whereas Luigi lets you do that if you want.
The picture is still not clear enough for me, though, and the difference I tried explaining above is still quite abstract for me since I haven’t used either library.
If you’ve used both libraries and ended up choosing one over the other, can you share why? What are the key differences that pushed you one way or the other?
--
You received this message because you are subscribed to the Google Groups "Luigi" group.
I've used both Airflow and Luigi. The primary driver behind my preference for Luigi is that I need far less framework-specific knowledge to get my job done than with other workflow systems, including Airflow.
* Task parameterization is done in Python, rather than in XML or Jinja templates.
* Dynamic graph creation can be done in Python with 'yield', rather than with project-specific operations.
* Built-in support for idempotent tasks.
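To make those three points a bit more concrete, here is a rough plain-Python sketch of the pattern. It is not Luigi's actual API: `Task`, `build`, `Square`, `SumOfSquares`, and the file names are all made up for illustration, and the real scheduler handles yielded dependencies differently in detail. The idea it tries to show is that parameters are ordinary constructor arguments, dependencies are discovered at runtime with 'yield', and a task whose output already exists is skipped.

```python
import os
import tempfile


class Task:
    """Hypothetical base class: a task counts as complete once its output file exists."""

    def output(self):
        raise NotImplementedError

    def complete(self):
        return os.path.exists(self.output())

    def run(self):
        # Either a plain method, or a generator that yields dependencies.
        raise NotImplementedError


def build(task, log):
    """Run `task` unless it is already complete, building yielded dependencies first."""
    if task.complete():
        return  # idempotent: finished work is never redone
    result = task.run()
    if result is not None:
        # run() is a generator; each yielded task is a dependency found at runtime.
        # The code after the last yield runs when the generator is exhausted.
        for dep in result:
            build(dep, log)
    log.append(type(task).__name__)


class Square(Task):
    def __init__(self, workdir, n):  # parameters are plain Python arguments
        self.workdir, self.n = workdir, n

    def output(self):
        return os.path.join(self.workdir, f"square_{self.n}.txt")

    def run(self):
        with open(self.output(), "w") as f:
            f.write(str(self.n * self.n))


class SumOfSquares(Task):
    def __init__(self, workdir, upto):
        self.workdir, self.upto = workdir, upto

    def output(self):
        return os.path.join(self.workdir, f"sum_{self.upto}.txt")

    def run(self):
        # The set of dependencies is computed in ordinary Python code.
        deps = [Square(self.workdir, n) for n in range(1, self.upto + 1)]
        for dep in deps:
            yield dep
        total = 0
        for dep in deps:
            with open(dep.output()) as f:
                total += int(f.read())
        with open(self.output(), "w") as f:
            f.write(str(total))


workdir = tempfile.mkdtemp()
log = []
build(SumOfSquares(workdir, 3), log)
first_run = list(log)
build(SumOfSquares(workdir, 3), log)  # output exists, so nothing runs again
```

Because everything here is ordinary Python, linters and refactoring tools see the whole pipeline definition, which is the property I care about most.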
Static analysis (pyflakes/pylint), code reformatting (yapf), and refactoring (rope/pycharm) are all possible within Luigi. Not so much with Airflow.
Airflow's UI is useful for visualizing ongoing tasks, but I find adding in my own tasks and using my existing log analysis infrastructure to be *far* superior.
Airflow's primary benefit is the use of a distributed task manager (Celery, in particular). In a handful of cases, I have had to redesign my task structure due to Luigi's distribution model.
Brian