The way to turn it off would be to remove its downstream dependencies, I suppose. Would that make sense? I could add an extra keyword @inactive or @active, but only if it would be of general use, and I can't see that at the moment.
What about a pipeline_remove_task function?
The @active or @inactive keyword was intended exactly in the sense of @conditional_run. The major point about adding decorators isn't the changes required but making sure that the number of decorators doesn't spiral out of control: keywords are a precious resource!
How about
1) Let me ask around to see if anybody objects to having some such facility, has other opinions on it, or perhaps is desperate for such a feature.
2) We try to come up with a really good name. Something more immediately obvious than @conditional_run (This is my subjective judgement: I am open to persuasion.)
Suggestions include
@active_if
@include_task_if
@run_if
3) If we implement this, let us not export this keyword automatically. So you would need to write either
from ruffus import conditional_run
or
import ruffus
rf = ruffus
What do you think?
>> @active_if
>
>
> + 1 for active_if -- much better than my conditional_run
Agreed
>
>
>>
>>
>> 3) If we implement this, let us not export this keyword automatically.
>
> this would be fine-- I never do a 'from ruffus import *' anyway as I like to know where those functions and decorators are coming from.
Agreed. That is what I do as well. But other people do
from ruffus import *
which is, I guess, my fault for the examples in the docs.
So I was talking nonsense. What I actually meant was having an *additional* nested name space within ruffus for less mainstream or more experimental decorators or syntax.
So it would have to be
from ruffus import XXX.active_if
(Is this the right python syntax?)
Suggestions for XXX are also welcome.
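On the syntax question: Python does not accept a dotted name after "import" in a from-import, so a nested namespace would be imported as "from ruffus.XXX import active_if". Here is a small check, using the standard library's os.path purely as a stand-in for the hypothetical ruffus.XXX sub-module (the helper name is_valid_syntax is made up for this example):

```python
# os.path stands in for the hypothetical nested namespace ruffus.XXX.

def is_valid_syntax(statement):
    """Return True if Python can compile the statement, False on a SyntaxError."""
    try:
        compile(statement, "<example>", "exec")
        return True
    except SyntaxError:
        return False

# A dotted name after "import" is rejected by the parser.
dotted_after_import = is_valid_syntax("from os import path.join")

# Putting the dotted path after "from" is accepted.
dotted_after_from = is_valid_syntax("from os.path import join")

print(dotted_after_import, dotted_after_from)
```

The other option would be "import ruffus.XXX" followed by "@ruffus.XXX.active_if", which keeps the origin of the decorator visible at the point of use.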
Names turn out to be the second most painful part of changing ruffus. Writing docs is a whole other level of torture...
I am on the road so this will have to await my return next week.
Leo
> 2) We try to come up with a really good name. Something more immediately obvious than
> @conditional_run (This is my subjective judgement: I am open to persuasion.)
> Suggestions include
> @active_if
> @include_task_if
> @run_if
How about:
@when
(which was inspired by the monadic function in Haskell of the same name).
Cheers,
Bernie.
@when would be among my first choices but are not such simple names a liability in terms of clashes with variable names?
> So I was talking nonsense. What I actually meant was having an *additional* nested name space within ruffus for less mainstream or more experimental decorators or syntax.
> So it would have to be
> from ruffus import XXX.active_if
> (Is this the right python syntax?)
> Suggestions for XXX are also welcome.
Cheers,
Bernie.
Just wanted to say I think this is a *necessary* feature for the future of ruffus.
-- Ryan Dale, PhD Bioinformatics Scientist, NIH/NIDDK Contractor, Kelly Government Solutions
So I am extremely, extremely loath to make things more complicated rather than simpler.
I like the "if use_task_1 else None" idea with @transform. Assuming the addition of @active_if, I think the code below would run the example pipeline in my previous message (I could be wrong though) :
Perhaps another option would be to allow string task names, like @follows does. Then the trick is to decide what should be considered a task name and what should be a filename.
@transform(((task1 if use_task_1 else IGNORE_TASK), (task2 if use_task_2 else IGNORE_TASK)), suffix(".bam"), ".alt.bam")
def real_bammer(infile, outfile):
    pass
This syntax is more python-y rather than ruffus-y in approach.
The alternative would be to have some sort of conditional wrapper around each task, sort of along the lines you suggest, and analogous to @active_if, to indicate whether the task should be a real dependency.
Here is some possible syntax:
The new keyword would be get_input_if (this is a really awful name: suggestions welcome).
I wonder whether it is correct to have 'if' statements in a pipeline.
In principle, a real pipeline (the one with the tubes and water in it) doesn't have "if" conditions. The water cannot choose whether or not to enter a pipe given some conditions, and the pipeline does not change its structure depending on whether oil or water is running through it.
With Makefiles, you usually tend to avoid conditions and loops because they make the code difficult to understand.
What I am saying is that you should keep your pipeline as simple as possible and leave the if conditions inside the function being called. So, if you think that it is not appropriate to call GLITR every time, just write the condition within the function that calls it, and this will make your pipeline easier to understand. In principle, each run of the pipeline, with different data, should follow the same steps.
2) If the inactive task was up to date and did not produce any output
I think this means that all task functions of an entire branch *may* have to be tagged with @active_if.
This does mean that if you have the following scenario
      -> task2 -> task3 -> task4
task1 -> task8
      -> task5 -> task6 -> task7

and task3 is inactive, then task2 will still run as well as task1-task8.
task4 will have no inputs unless it has extra dependencies which are not inactive.
In order to avoid conditions, you just have to put the decisions inside a function. For example:
[jobs_that_need_macs, other_jobs] = determine_which_jobs_need_macs()
make_macs(jobs_that_need_macs)
make_other_jobs(other_jobs)
resume_results()
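A runnable sketch of the same idea, for illustration only. The job dictionaries, the "needs_macs" flag and the string results are all made up here; the point is just that the branching lives in one function while the sequence of steps stays the same on every run:

```python
def determine_which_jobs_need_macs(jobs):
    """Split jobs into those that need the MACS step and the rest.

    The "needs_macs" flag is a made-up field used only for this sketch.
    """
    jobs_that_need_macs = [job for job in jobs if job["needs_macs"]]
    other_jobs = [job for job in jobs if not job["needs_macs"]]
    return jobs_that_need_macs, other_jobs


def make_macs(jobs):
    # Placeholder for the real MACS step.
    return ["macs:" + job["name"] for job in jobs]


def make_other_jobs(jobs):
    # Placeholder for whatever the remaining jobs need.
    return ["other:" + job["name"] for job in jobs]


def run_pipeline(jobs):
    # The same steps run every time; only the job lists differ.
    jobs_that_need_macs, other_jobs = determine_which_jobs_need_macs(jobs)
    return make_macs(jobs_that_need_macs) + make_other_jobs(other_jobs)


example_jobs = [
    {"name": "sample_a", "needs_macs": True},
    {"name": "sample_b", "needs_macs": False},
]
results = run_pipeline(example_jobs)
print(results)
```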
________________________________________
From: Jacob Biesinger [jake.bi...@gmail.com]
Sent: Friday, February 25, 2011 8:30 PM
To: ruffus_...@googlegroups.com
Cc: Dale, Ryan (NIH/NIDDK) [C]; Leo Goodstadt
Subject: Re: Easy way to turn *off* certain jobs?
Sorry to dig up an old thread. I find myself in the position of needing conditionals in my pipeline and I can't see an easy workaround.
I wonder if this feature has been implemented/included?
Cheers,
Bernie.
> <task.py.active_if.diff><ruffus_test.py><ruffus_utils.py>
On 30/09/2011, at 4:55 PM, Jacob Biesinger wrote:
> It takes a boolean, a list of booleans, or a function that returns one of those, and doesn't run if any of the booleans are False.
That looks quite cool.
> Another way of doing the conditionals is shown in Ryan Dale's code here:
> https://github.com/daler/pipeline-example/blob/master/pipeline-3/pipeline.py
Oh great, that looks like it will do what I want.
Cheers,
Bernie.
I implemented the @active_if decorator in https://github.com/jakebiesinger/ruffus
# I thought overwriting the param_generator_func et al would lead to
# trouble with the order of decorator nesting, but it seems to work fine
Another way of doing the conditionals is shown in Ryan Dale's code here:
https://github.com/daler/pipeline-example/blob/master/pipeline-3/pipeline.py
On 30 September 2011 07:55, Jacob Biesinger <jake.bi...@gmail.com> wrote:
> I implemented the @active_if decorator in https://github.com/jakebiesinger/ruffus
Your @active_if code looks great. Would you mind contributing it to the main Ruffus code?
Would you like to be listed as a contributor / co-author?
pipeline_active_if = None
#
#    task with active state switching
#
@active_if(lambda: pipeline_active_if)
@transform(...)
def test_task(...):
    pass

# activate task
pipeline_active_if = True
pipeline_run([test_task], ...)

# inactivate task
pipeline_active_if = False
pipeline_run([test_task], ...)
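To make the switching behaviour concrete, here is a toy stand-in written without ruffus. It is not the ruffus implementation: toy_active_if and is_active are hypothetical names, and the only thing borrowed from the thread is the described contract (a boolean, a list of booleans, or a function returning one of those, with the task skipped if any value is False). The condition is evaluated each time the task is called, which is why flipping the flag between runs has an effect:

```python
import functools


def is_active(condition):
    """Evaluate a boolean, a list of booleans, or a callable returning either."""
    if callable(condition):
        condition = condition()  # evaluated lazily, at call time
    if isinstance(condition, (list, tuple)):
        return all(condition)  # inactive if any entry is False
    return bool(condition)


def toy_active_if(condition):
    """Toy decorator (not ruffus): skip the wrapped function when inactive."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not is_active(condition):
                return None  # task is inactive: do nothing
            return func(*args, **kwargs)
        return wrapper
    return decorator


pipeline_active = None


@toy_active_if(lambda: pipeline_active)
def toy_task():
    return "ran"


pipeline_active = True
first = toy_task()

pipeline_active = False
second = toy_task()

print(first, second)
```

Passing a plain boolean instead of the lambda would freeze the decision at decoration time, so the lambda is what allows the same task to be switched on and off between calls to pipeline_run.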
self.param_generator_func self.needs_update_func
make_job_parameter_generator(...) printout(...) signal(...)
In a task which merges/collates output from several tasks, how can I include only the output from those previous steps which have not been deactivated?
I.e. the following gives an error if features_lm_target has been deactivated:
@active_if(cfg.exists_lm(target_language))
@transform(data_fetch, suffix(".orig.jcml"), ".lm.%s.f.jcml" % target_language, target_language, cfg.get_lm_name(target_language))
def features_lm_target(input_file, output_file, language, lm_name):
    # code goes here
    pass
@collate([data_fetch] + [features_lm_target] if cfg.exists_lm(target_language) else [], regex(r"([^.]+)\.(.+)\.f.jcml"), r"\1.all.f.jcml")
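One thing worth checking in the @collate line, independent of ruffus: a conditional expression binds more loosely than "+", so the input list as written becomes empty (rather than just [data_fetch]) when the language model does not exist. I can't tell whether this is what triggers the error, but it is easy to verify in plain Python. In the sketch below, strings stand in for the task functions and the two helper functions are made up for the comparison:

```python
# Strings stand in for the ruffus task functions from the message above.
data_fetch = "data_fetch"
features_lm_target = "features_lm_target"


def inputs_as_written(lm_exists):
    # Parsed as: ([data_fetch] + [features_lm_target]) if lm_exists else []
    return [data_fetch] + [features_lm_target] if lm_exists else []


def inputs_parenthesised(lm_exists):
    # data_fetch is always kept; only the optional task is conditional.
    return [data_fetch] + ([features_lm_target] if lm_exists else [])


print(inputs_as_written(False))     # []
print(inputs_parenthesised(False))  # ['data_fetch']
```

With the extra parentheses, the collating task still receives data_fetch when features_lm_target is switched off.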