Dear Leo,
I am also interested in writing some tests for a pipeline written with ruffus.
At the first stage, I want to test that my setup of the pipeline is correct in the following sense:
- existence of tasks (i.e. all tasks that should be run are present in the pipeline)
- dependency relations of tasks
- each task will produce correct output, assuming certain input
Especially the last point gives me some difficulty:
I have managed to set up the pipeline so that its inputs and outputs are fully determined without having to run it, i.e. if I just do a printout, I get all the input and output files.
But instead of going through the output from the printout manually, I would like to write some tests that can be run automatically.
I think that running a printout, capturing the output, parsing it and using that for automated tests is not really optimal.
Instead, I would love to be able to programmatically, within python, get the information that I see during a printout, i.e.
- the task names in order of being run
- input files for each task
- output files for each task
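To illustrate the kind of tests I have in mind, here is a minimal sketch. It assumes the planned jobs were available as a plain list of records; the `PlannedJob` type, the helper functions, and the hand-written example plan are all hypothetical and not part of ruffus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedJob:
    """Hypothetical record for one job, as a printout would list it."""
    task: str
    inputs: tuple
    outputs: tuple


def task_order(plan):
    """Return task names in the order in which they first appear in the plan."""
    seen = []
    for job in plan:
        if job.task not in seen:
            seen.append(job.task)
    return seen


def inputs_of(plan, task):
    """Collect all input files of all jobs belonging to `task`."""
    return [name for job in plan if job.task == task for name in job.inputs]


def outputs_of(plan, task):
    """Collect all output files of all jobs belonging to `task`."""
    return [name for job in plan if job.task == task for name in job.outputs]


def runs_before(plan, upstream, downstream):
    """True if `upstream` first appears in the plan before `downstream` does."""
    order = task_order(plan)
    return order.index(upstream) < order.index(downstream)


# Hand-written stand-in for the information a printout shows (made-up names).
example_plan = [
    PlannedJob("align", ("sample1.fastq",), ("sample1.bam",)),
    PlannedJob("align", ("sample2.fastq",), ("sample2.bam",)),
    PlannedJob("merge", ("sample1.bam", "sample2.bam"), ("merged.bam",)),
]

# Existence of tasks, and the order in which they would run.
assert task_order(example_plan) == ["align", "merge"]
# Dependency relation: "merge" runs after "align" and consumes its outputs.
assert runs_before(example_plan, "align", "merge")
assert set(inputs_of(example_plan, "merge")) <= set(outputs_of(example_plan, "align"))
# Expected output for a given input.
assert outputs_of(example_plan, "merge") == ["merged.bam"]
```

With something like this, the only missing piece is a supported way to obtain the list of planned jobs from the pipeline itself.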
To achieve this at the moment, I could for example mimic the function `pipeline_printout`, but then I'm using unexposed methods, like `_pipeline_prepare_to_run`.
Since the ruffus code doesn't seem to be changing much lately, a certain stability of these methods may be assumed, but it would still be nice to have an "official" way of achieving this : )
This might also be connected to this issue: we solve status reporting with our own reporter, which we call in @posttask. If the information I need, or more generally the state of the pipeline and of its tasks, were exposed, it would be easy to write both custom status reporting and automated tests.
Cheers,
Valentin