Ruffus and Testing

Rad

Jul 8, 2014, 11:33:36 PM

to ruffus_...@googlegroups.com

Hello all,

I was wondering whether there are any recommendations for unit testing Ruffus tasks. I have a Ruffus-based pipeline and would like to set up CI with full testing of each task and function in the pipeline. But tests should be fast, as is always recommended. So imagine a task that runs a tool such as blast or bwa: what should be tested in that task? Should the test go through the whole core function of the task (i.e. the alignment), or should it be broken into small tests, such as whether a file exists, whether an alignment is generated, etc.? And for that, is using the Ruffus decorators recommended to 'fake' the existence of inputs/outputs?
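One common way to keep such tests fast, sketched here under my own assumptions, is to keep the real work in a plain Python function and make the Ruffus task a thin wrapper that only calls it. The function can then be unit-tested on a tiny input without running the pipeline or the external tool. `count_fasta_records` is a hypothetical stand-in for a real blast/bwa-related step; only the standard library (`unittest`, `tempfile`) is used.

```python
import os
import tempfile
import unittest


def count_fasta_records(input_file, output_file):
    """Hypothetical core step: count FASTA records and write the count.

    In a real pipeline, a thin Ruffus task (e.g. one decorated with
    @transform) would simply call this function with its file names.
    """
    with open(input_file) as handle:
        count = sum(1 for line in handle if line.startswith(">"))
    with open(output_file, "w") as handle:
        handle.write(f"{count}\n")


class CountFastaRecordsTest(unittest.TestCase):
    def test_counts_records_on_tiny_input(self):
        with tempfile.TemporaryDirectory() as tmp:
            fasta = os.path.join(tmp, "reads.fa")
            counts = os.path.join(tmp, "reads.count")
            with open(fasta, "w") as handle:
                handle.write(">r1\nACGT\n>r2\nGGCC\n")

            count_fasta_records(fasta, counts)

            # The output file exists and holds the expected record count.
            self.assertTrue(os.path.exists(counts))
            with open(counts) as handle:
                self.assertEqual(handle.read().strip(), "2")
```

Run it with `python -m unittest` or pytest. The slow end-to-end run of the real aligner can then be kept to a separate, less frequent integration test.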

Thanks

Rad

Leo Goodstadt

Jul 9, 2014, 7:32:18 AM

to ruffus_...@googlegroups.com

Dear Rad,

You can just "touch" the output files, instead of actually running the jobs, with

pipeline_run(..., touch_files_only=True)

However, this does not work for pipelines where the identity of output files depends on a runtime decision (e.g. @split or @originate).
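Why touching helps, in simplified form: Ruffus treats a job as up to date when its outputs exist and are newer than its inputs (recent versions also keep a job-history database), so freshly touched outputs make a task look complete. The sketch below models only that timestamp rule; `touch_outputs` and `looks_up_to_date` are hypothetical helpers, not Ruffus's own code.

```python
import os
from pathlib import Path


def touch_outputs(output_files):
    """Create empty placeholders, or bump the mtime of existing files,
    instead of running the job that would normally produce them."""
    for name in output_files:
        Path(name).touch()


def looks_up_to_date(input_files, output_files):
    """Simplified freshness rule (not Ruffus's implementation): every
    output exists and is at least as new as the newest input."""
    if not output_files or not all(os.path.exists(n) for n in output_files):
        return False
    newest_input = max((os.path.getmtime(n) for n in input_files), default=0.0)
    oldest_output = min(os.path.getmtime(n) for n in output_files)
    return oldest_output >= newest_input
```

This also shows why outputs decided at run time are a problem: if the output names are not known in advance, there is nothing to touch.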

Leo

Valentin Krasontovitsch

Jul 6, 2017, 4:29:06 AM

to ruffus_discuss, llewgo...@gmail.com

Dear Leo,

I am also interested in writing some tests for a pipeline written with ruffus.

As a first step, I want to test that my pipeline is set up correctly, in the following sense:

- existence of tasks (i.e. all tasks that should be run are present in the pipeline)

- dependency relations of tasks

- each task produces the correct output, given certain input
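The first two checks (existence of tasks and their dependency relations) can be phrased as plain assertions on a dependency graph. The sketch assumes the pipeline structure has already been extracted into a dict mapping each task name to its upstream task names; getting that mapping out of Ruffus is exactly the open question, so the data and the helper `check_pipeline_structure` are hypothetical. It uses the standard-library `graphlib` (Python 3.9+) to also confirm that a valid run order exists.

```python
from graphlib import CycleError, TopologicalSorter


def check_pipeline_structure(actual, expected):
    """Return a list of problems; an empty list means the structure matches.

    Both arguments map a task name to the set of its upstream task names.
    """
    problems = []
    missing = set(expected) - set(actual)
    extra = set(actual) - set(expected)
    if missing:
        problems.append(f"missing tasks: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected tasks: {sorted(extra)}")
    for task in sorted(set(expected) & set(actual)):
        if set(actual[task]) != set(expected[task]):
            problems.append(
                f"{task}: upstream {sorted(actual[task])}, "
                f"expected {sorted(expected[task])}"
            )
    try:
        # static_order() raises CycleError if no valid run order exists.
        tuple(TopologicalSorter(actual).static_order())
    except CycleError as err:
        problems.append(f"dependency cycle: {err.args[1]}")
    return problems


# Hand-written example of the expected structure (hypothetical task names).
expected = {"trim": set(), "align": {"trim"}, "call_variants": {"align"}}
assert check_pipeline_structure(dict(expected), expected) == []
```

Returning a list of problems rather than failing on the first mismatch makes a broken pipeline definition easier to diagnose in CI output.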

The last point in particular gives me some difficulty:

I have managed to make the pipeline deterministic without having to run it, i.e. a printout alone lists all the input and output files.

But instead of going through the output from the printout manually, I would like to write some tests that can be run automatically.

I think that running a printout, capturing and parsing its output, and using that for automated tests is not really a good approach.

Instead, I would love to be able to get, programmatically within Python, the information that I see in a printout, i.e.:

- the task names in order of being run

- input files for each task

- output files for each task
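If that information were available as data, the tests would reduce to simple assertions. The sketch assumes a hypothetical list of `TaskPlan` records (task name, input files, output files) in run order; this is not an existing Ruffus structure, so the records are filled in by hand. `run_order`, `outputs_of` and `unproduced_inputs` are likewise hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskPlan:
    """Hypothetical record of what a printout shows for one task."""

    name: str
    inputs: tuple = ()
    outputs: tuple = ()


def run_order(plan):
    """Task names in the order they would be run."""
    return [task.name for task in plan]


def outputs_of(plan, name):
    """Output files of the named task (KeyError if the task is absent)."""
    return {task.name: task.outputs for task in plan}[name]


def unproduced_inputs(plan, source_files):
    """Inputs that are neither source files nor outputs of an earlier task."""
    available = set(source_files)
    missing = []
    for task in plan:
        missing.extend(f for f in task.inputs if f not in available)
        available.update(task.outputs)
    return missing


# Example plan for a two-step pipeline (hand-written, hypothetical data).
plan = [
    TaskPlan("align", inputs=("s1.fq",), outputs=("s1.bam",)),
    TaskPlan("index", inputs=("s1.bam",), outputs=("s1.bam.bai",)),
]
assert run_order(plan) == ["align", "index"]
assert outputs_of(plan, "index") == ("s1.bam.bai",)
assert unproduced_inputs(plan, source_files={"s1.fq"}) == []
```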

To achieve this at the moment, I could, for example, mimic the function `pipeline_printout`, but then I would be relying on non-public methods such as `_pipeline_prepare_to_run`.

While the Ruffus code does not seem to have changed much lately, so some stability of these methods may be assumed, it would still be nice to have an "official" way of achieving this :)

This might also be connected to this issue: we handle status reporting with our own reporter, called in @posttask. If the information I need (or, more generally, the state of the pipeline and of its tasks) were exposed, it would be easy to write both custom status reporting and automated tests.
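A reporter like this can stay testable if it is an ordinary object that records events, with Ruffus merely calling it. Functions passed to @posttask are called with no arguments, so the sketch hands out zero-argument callbacks bound to a task name. `StatusReporter` and `done_callback` are hypothetical names, and the Ruffus wiring appears only in a comment, not as executed code.

```python
import time


class StatusReporter:
    """Hypothetical reporter that records which tasks have completed."""

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable so tests can use a fake clock
        self.completed = []  # (task_name, timestamp) in completion order

    def done_callback(self, task_name):
        """Return a zero-argument function, the form @posttask calls."""
        def _record():
            self.completed.append((task_name, self._clock()))
        return _record

    def completed_tasks(self):
        return [name for name, _ in self.completed]


# In a pipeline this might be wired up as (not executed here):
#   reporter = StatusReporter()
#   @posttask(reporter.done_callback("align"))
#   @transform(...)
#   def align(input_file, output_file): ...
```

Note that Ruffus calls @posttask functions even when a task's jobs were skipped as up to date, so "completed" here means "passed through", not necessarily "re-run".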