Ruffus and Testing


Rad

Jul 8, 2014, 11:33:36 PM
to ruffus_...@googlegroups.com
Hello all,

I was wondering whether there is any recommendation for unit testing Ruffus tasks. I have a Ruffus-based pipeline and would like to set up CI with full testing of each task and function in the pipeline. Tests should be fast, as is always recommended. With that in mind, imagine a task that runs a tool such as blast or bwa: what should be tested in that task? Should we exercise the whole core function of the task (the alignment itself), or break the test into small checks, such as whether a file exists, whether an alignment is generated, etc.? And for that, is using Ruffus decorators to 'fake' the existence of inputs and outputs recommended?
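To make the "small checks" option concrete, here is a minimal sketch under stated assumptions: the task body is kept as an ordinary Python function, the external aligner is replaced by a stub, and the test asserts only on what the task itself is responsible for (the command it builds and the file it writes). The function `align_reads`, the helper names, and the `bwa mem` command line are hypothetical illustrations, and the Ruffus decorator is left out so the sketch runs without Ruffus installed.

```python
import os
import subprocess
import tempfile
from unittest import mock


def align_reads(input_fastq, output_sam, reference="ref.fa"):
    """Hypothetical task body: run an aligner and write its stdout to output_sam."""
    cmd = ["bwa", "mem", reference, input_fastq]
    with open(output_sam, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)
    return cmd


def fake_run(cmd, stdout=None, check=False):
    """Stub for subprocess.run: writes one placeholder line instead of aligning."""
    stdout.write("@HD\tVN:1.0\n")
    return subprocess.CompletedProcess(cmd, 0)


def run_align_test():
    """Call the task body with the aligner stubbed out and report what happened."""
    with tempfile.TemporaryDirectory() as tmp:
        fastq = os.path.join(tmp, "sample.fastq")
        sam = os.path.join(tmp, "sample.sam")
        with open(fastq, "w") as handle:
            handle.write("@read1\nACGT\n+\nIIII\n")
        # Patch subprocess.run so no real aligner is needed and the test stays fast.
        with mock.patch("subprocess.run", side_effect=fake_run) as stub:
            cmd = align_reads(fastq, sam)
        with open(sam) as handle:
            content = handle.read()
        return cmd, stub.call_count, content
```

A test like this says nothing about alignment quality; that would belong in a slower integration test that runs the real tool on a tiny input.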

Thanks

Rad

Leo Goodstadt

Jul 9, 2014, 7:32:18 AM
to ruffus_...@googlegroups.com
Dear Rad,

You can just "touch" the files with
pipeline_run(..., touch_files_only = True)
However, this does not work for pipelines where the identity of the output files depends on a runtime decision (i.e. @split or @originate).
Leo



Valentin Krasontovitsch

Jul 6, 2017, 4:29:06 AM
to ruffus_discuss, llewgo...@gmail.com
Dear Leo,

I am also interested in writing some tests for a pipeline written with ruffus.

At the first stage, I want to test that my setup of the pipeline is correct in the following sense:

- existence of tasks (i.e. all tasks that should be run are present in the pipeline)
- dependency relations of tasks
- each task will produce correct output, assuming certain input

Especially the last point gives me some difficulty:

I have managed to make the pipeline deterministic without having to run it, i.e. if I just do a printout, I get all the input and output files.

But instead of going through the output from the printout manually, I would like to write some tests that can be run automatically.

I think that running a printout, capturing the output, parsing it and using that for automated tests is not really optimal.

Instead, I would love to be able to programmatically, within python, get the information that I see during a printout, i.e.

- the task names in order of being run
- input files for each task
- output files for each task

To achieve this at the moment, I could for example mimic the function `pipeline_printout`, but then I'm using unexposed methods, like `_pipeline_prepare_to_run`. 

Since the ruffus code doesn't seem to be changing much lately, these methods can probably be assumed to be fairly stable, but it would still be nice to have an "official" way of achieving this : )
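Until there is such an official way, one workaround that avoids private methods is to keep the file-naming rules in plain data and derive the expected task order, inputs, and outputs from that. This is only a sketch and not a Ruffus API: the `STAGES` table, the `plan` function, and the stage names are hypothetical. It also assumes the same suffixes are what the pipeline's decorators are given, for example via `suffix(...)` in `@transform`.

```python
# Hypothetical stage table: (task name, input suffix, output suffix), in run order.
STAGES = [
    ("align", ".fastq", ".sam"),
    ("sort", ".sam", ".bam"),
    ("index", ".bam", ".bam.bai"),
]


def plan(start_files, stages=STAGES):
    """Return [(task_name, inputs, outputs), ...] without touching the disk."""
    result = []
    current = list(start_files)
    for name, in_suffix, out_suffix in stages:
        # Each stage consumes the previous stage's outputs that match its suffix.
        inputs = [f for f in current if f.endswith(in_suffix)]
        outputs = [f[: -len(in_suffix)] + out_suffix for f in inputs]
        result.append((name, inputs, outputs))
        current = outputs
    return result
```

An automated test can then compare `plan(["a.fastq"])` against the expected task names and file lists. The limitation is that this checks the naming rules rather than what Ruffus itself would schedule.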

This might also be connected to this issue: we handle status reporting with our own reporter, which we call in @posttask. If the information I need, or more generally the state of the pipeline and of its tasks, were exposed, it would be easy to write both custom status reporting and automated tests.

Cheers,
Valentin