Ruffus vs. Luigi

Saeed Al Turki

Nov 10, 2014, 5:55:38 PM

to ruffus_...@googlegroups.com

Hi,

Our NGS pipelines are currently managed by bash scripts, and we plan to rewrite them using either Ruffus or Luigi (Luigi is a Python module that helps build complex pipelines of batch jobs).

https://github.com/spotify/luigi

We are trying to:
- Track all jobs and capture the failed jobs (i.e. tasks monitoring, email the pipeline manager when things go wrong).
- Re-run the pipeline from the last failed task and not from scratch.
- Define the tasks in a modular fashion (e.g. alignment task) so they can be used in other pipelines.
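
As a rough illustration of the second requirement (resuming from the last failed task), here is a minimal plain-Python sketch. It uses neither the Ruffus nor the Luigi API. As far as I understand, both frameworks decide what to re-run from the state of each task's output files; this sketch imitates that idea with a marker file per step. The names `run_pipeline`, `demo` and the `.done` marker convention are hypothetical, chosen only for this example.

```python
import os
import tempfile


def run_pipeline(steps, workdir):
    """Run (name, func) steps in order, skipping steps that already finished.

    Hypothetical convention: a step writes `<name>.done` in `workdir` only
    after it succeeds, so a rerun resumes at the first unfinished step.
    """
    executed = []
    for name, func in steps:
        marker = os.path.join(workdir, name + ".done")
        if os.path.exists(marker):
            continue  # completed in an earlier run, do not repeat it
        func()  # an exception here stops the run; no marker is written
        with open(marker, "w") as handle:
            handle.write("ok\n")
        executed.append(name)
    return executed


def demo():
    """Fail once in the middle step, then rerun and report what was executed."""
    attempts = {"align": 0}

    def trim():
        pass

    def align():
        attempts["align"] += 1
        if attempts["align"] == 1:
            raise RuntimeError("simulated alignment failure")

    def call_variants():
        pass

    steps = [("trim", trim), ("align", align), ("call_variants", call_variants)]
    with tempfile.TemporaryDirectory() as workdir:
        try:
            run_pipeline(steps, workdir)  # first run stops at "align"
        except RuntimeError:
            pass
        return run_pipeline(steps, workdir)  # second run resumes there


if __name__ == "__main__":
    print(demo())
```

On the second run the `trim` step is skipped because its marker already exists, and only `align` and `call_variants` are executed.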

Both Luigi and Ruffus have a lot to offer and both are easy to use. Personally, I like Ruffus more for a few reasons, mainly because we have LSF on our cluster and Ruffus supports submitting LSF jobs via DRMAA.

I'd love to hear your general thoughts on Luigi and how it compares to Ruffus. Any pointers are greatly appreciated.

Thanks a lot,
Saeed

Radhouane Aniba

Nov 10, 2014, 6:26:06 PM

to ruffus_...@googlegroups.com

Hello Saeed

For Luigi, here is an example of how to write a bioinformatics pipeline with it: http://coderscrowd.com/app/codes/view/229

I liked it for prototyping pipelines, but never used it in production. I liked its notification system and its scheduler, and it seems to be well supported.

When I used Luigi for the first time, someone at CodersCrowd pointed me to http://ratatosk.readthedocs.org/en/latest/index.html, which is a framework for bioinformatics pipeline development based on Luigi. You may want to give it a try, but I don't think it is supported.

To be fair (sorry Leo :) ), you may want to look at snakemake as well, but I prefer Ruffus to it for a lot of reasons. I was about to dig deep into snakemake because of Python 3, but now that Python 3 is supported by Ruffus, I prefer polishing my pipelines using the same framework.

Voila,

Hope that helps

Rad

--
You received this message because you are subscribed to the Google Groups "ruffus_discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ruffus_discus...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--

Radhouane Aniba
Bioinformatics Scientist
BC Cancer Agency, Vancouver, Canada

Saeed H. Al Turki

Nov 11, 2014, 7:29:17 PM

to ruffus_...@googlegroups.com

Thank you Rad, these are helpful suggestions.

I'm curious: which one of them do you use in your work, and why?

Saeed H. Al Turki

Nov 11, 2014, 7:38:14 PM

to ruffus_...@googlegroups.com

I mean, it seems that you are using Ruffus, but why not Luigi? I just need to make a case for or against Luigi when I discuss the 'Ruffus vs. Luigi' topic with our team in a few days.

Aniba, Radhouane

Nov 11, 2014, 7:49:51 PM

to ruffus_...@googlegroups.com

I am using Ruffus because we have been using it for years in our group. All of these solutions are great. You should think about the following points and investigate further to make the right decision:

- Scalability

- Ease of use

- Portability

- Dependency

Ideally, the less code you write, the better. Luigi has a clear syntax for developing a Task, but Ruffus is better for customizing and reusing pipelines.

It is really a personal decision that you need to make. My suggestion: pick a subject, such as a pipeline for indel calling. A good exercise would be to create a GATK best practices pipeline, where you’ll have to wrap GATK, Picard tools, etc.

Do that with Ruffus, Luigi and any other framework. You’ll get your hands dirty, and you’ll end up discovering the pros and cons of each one while implementing things.

and share your experience @ CodersCrowd too :)

When you do this, you’ll know :)

Rad

Saeed H. Al Turki

Nov 11, 2014, 7:51:44 PM

to ruffus_...@googlegroups.com

Fair enough, thanks Rad :)

Leo Goodstadt

Nov 12, 2014, 6:49:48 AM

to ruffus_...@googlegroups.com

Dear Saeed,

Ruffus seems to have a slightly different functionality set from Luigi.

Ruffus is aimed squarely at managing scalability, running many (e.g. bioinformatics) operations in parallel.

Ruffus manages task dependencies like Luigi does, though unlike Luigi, no one has yet contributed (!) a run-time workflow graph GUI visualization for Ruffus. (You do get static views of the workflow / pipeline.)

However, a key part of the design for Ruffus is that each of these dependent tasks can comprise multiple parallel operations which have their own dependencies, can merge together or be split up and then transformed in multiple steps.

So for example, in bioinformatics, you might need to (1) split up a fastq file into small chunks, (2) run bwa to align them onto reference genome(s), (3) run stampy to refine the alignment, (4) merge all these alignments back together and (5) sort and compress to give a bam file. Each of these steps would be a separate task with hundreds of component files running in parallel. Many of these components may fail but that should not cause the whole pipeline to be rerun.
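
The split / parallel / merge pattern above can be sketched in plain Python. This is only an illustration under stated assumptions, not Ruffus code: `split_records`, `process_chunk`, `run_chunks` and `merge_chunks` are hypothetical names, upper-casing a string stands in for running an aligner, and a thread pool stands in for cluster jobs. The point it tries to show is that a failed chunk is recorded and retried on its own, without redoing the chunks that already succeeded.

```python
from concurrent.futures import ThreadPoolExecutor


def split_records(records, chunk_size):
    """Step 1: split the input into fixed-size chunks."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def process_chunk(chunk):
    """Stand-in for steps 2-3; a real pipeline would run an aligner here."""
    return [read.upper() for read in chunk]


def run_chunks(chunks, done, worker=process_chunk, max_workers=4):
    """Process only chunks missing from `done`; collect failures, do not abort."""
    pending = [i for i in range(len(chunks)) if i not in done]
    failed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {i: pool.submit(worker, chunks[i]) for i in pending}
        for i, future in futures.items():
            try:
                done[i] = future.result()
            except Exception:
                failed.append(i)  # leave this chunk for a later rerun
    return failed


def merge_chunks(done, n_chunks):
    """Stand-in for steps 4-5: concatenate the chunk results and sort them."""
    merged = [item for i in range(n_chunks) for item in done[i]]
    return sorted(merged)


def demo():
    """One chunk fails on the first pass and is the only one retried."""
    chunks = split_records(["d", "a", "c", "b", "e"], 2)
    done = {}
    already_failed = set()

    def flaky(chunk):
        key = tuple(chunk)
        if "c" in chunk and key not in already_failed:
            already_failed.add(key)
            raise RuntimeError("simulated failure")
        return process_chunk(chunk)

    first_failed = run_chunks(chunks, done, worker=flaky)
    second_failed = run_chunks(chunks, done, worker=flaky)
    return first_failed, second_failed, merge_chunks(done, len(chunks))


if __name__ == "__main__":
    print(demo())
```

In a framework such as Ruffus the `done` dictionary would correspond to output files on disk, so the record of what has finished would survive between runs.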

My understanding is that these [single input ->parallel->single output] operations would be combined into one Hadoop map reduce task in Luigi. This makes some parts of the pipeline look simpler (All these complicated operations get hidden as a single task) but you still need to manage the underlying dependencies and complexities and failures one way or another.

The other part of Ruffus is that I am very wary of monolithic designs. I try to ensure that Ruffus plays nicely with other libraries and setups. So I try to make sure that bioinformatics groups can use Ruffus on a shared computation cluster without asking to take over job scheduling, worrying about whether Hadoop needs to be installed and supported, etc. This is a design philosophy and has both pros and cons (and obviously makes less sense for a single big company like Spotify, where one team can make IT decisions for the whole company).

Leo

Leo Goodstadt

Nov 12, 2014, 7:10:30 AM

to ruffus_...@googlegroups.com

Dear Saeed,

We hope to provide great improvements for writing modular pipeline building blocks in Ruffus. Clare Sloggett and Bernie Pope contributed new syntax which allows groups of tasks to be placed in a sub-pipeline (e.g. in a separate Python module). Hopefully, this will allow the Ruffus community to start sharing full-fledged modules for DNA alignment pipelines, SNP calling, etc.

I am also always up for improving Ruffus. We have a plan on how to slowly add functionality to Ruffus to make it more powerful and flexible while maintaining 100% backwards compatibility.

However, it is often the small suggestions (e.g. emailing the pipeline manager on errors), which involve relatively little work and can be prioritised, that make a big difference in the usability of Ruffus.
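
For reference, emailing the pipeline manager on errors can be sketched with the Python standard library alone. This is not an existing Ruffus feature or API. The function names, the example addresses and the assumption that an SMTP relay is reachable on `localhost` are all hypothetical and would need adapting to a real setup.

```python
import smtplib
import traceback
from email.message import EmailMessage


def build_failure_email(task_name, error, sender, recipient):
    """Compose a message that describes which task failed and why."""
    msg = EmailMessage()
    msg["Subject"] = f"[pipeline] task '{task_name}' failed"
    msg["From"] = sender
    msg["To"] = recipient
    details = "".join(
        traceback.format_exception(type(error), error, error.__traceback__)
    )
    msg.set_content(f"Task {task_name} failed with:\n\n{details}")
    return msg


def send_via_smtp(msg, host="localhost"):
    """Send the message; assumes an SMTP relay is reachable at `host`."""
    with smtplib.SMTP(host) as server:
        server.send_message(msg)


def run_with_notification(task_name, func, notify=send_via_smtp,
                          sender="pipeline@example.org",
                          recipient="manager@example.org"):
    """Run `func`; on any exception, notify the manager and re-raise."""
    try:
        return func()
    except Exception as error:
        notify(build_failure_email(task_name, error, sender, recipient))
        raise
```

Passing a different `notify` callable (for example `sent.append` on a list) makes it possible to check the generated message without a mail server.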

Please let me know which way you jump.

I hadn't heard of Luigi and it does look very interesting. We might try it out ourselves... :-)

Leo

Saeed H. Al Turki

Nov 14, 2014, 1:38:58 PM

to ruffus_...@googlegroups.com

Dear Leo,

Many thanks for your detailed and very helpful response.

I'm writing a small pipeline in both Luigi and Ruffus for testing purposes. I'll share my results and findings here when I finish.

Saeed

Akshita Dutta

Apr 1, 2015, 12:40:52 PM

to ruffus_...@googlegroups.com

Hi Saeed,

Did you happen to compare these two libraries? I am relatively new to both, and it would be helpful to decide which one to choose.

./akshita
