Nested workflow best practices

Fred Loney

Mar 7, 2013, 1:32:14 PM3/7/13
to nipy...@googlegroups.com
I am exploring different ways to organize a large workflow. Suppose I have a composite workflow named pipeline with two workflow components, stage and register, which I often execute consecutively but sometimes execute separately. A nipype Workflow cannot serve as a parent Workflow node. There are two techniques to run stage and register from pipeline:

  • Wrap stage and register in Function adapters and make pipeline nodes from those interfaces (sketched just below this list)
  • Make Pipeline, Stage and Register Interface adapters rather than Workflows and call them imperatively using Memory.cache
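
For the first technique, a rough sketch of what I have in mind (the workflow factory and the output path are made up):

import nipype.pipeline.engine as pe
from nipype.interfaces.utility import Function

def run_stage(in_file):
    # Hypothetical factory that builds the stage child workflow.
    from mypackage.workflows import make_stage_workflow
    import os
    wf = make_stage_workflow(in_file)
    wf.run()
    # Hypothetical: collect the staged file from the child work area.
    return os.path.abspath('stage/out_file.nii.gz')

stage_node = pe.Node(Function(input_names=['in_file'],
                              output_names=['out_file'],
                              function=run_stage),
                     name='stage')

pipeline = pe.Workflow(name='pipeline')
pipeline.add_nodes([stage_node])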
The cache technique is appealing because it looks more like normal Python. Each Interface's _run_interface method creates its own Memory cache in its working directory by calling mem = Memory('.'). Pipeline calls mem.cache(Stage) and mem.cache(Register) to make cached instances that run these child workflows, which in turn create their own cache contexts.
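
A minimal sketch of the nested cache idea (Stage and Register stand for Interface subclasses defined elsewhere):

from nipype.caching import Memory
from nipype.interfaces.base import (BaseInterface, BaseInterfaceInputSpec,
                                    File)

class PipelineInputSpec(BaseInterfaceInputSpec):
    in_file = File(exists=True, mandatory=True)

class Pipeline(BaseInterface):
    input_spec = PipelineInputSpec

    def _run_interface(self, runtime):
        # Each Interface roots its own cache in its working directory.
        mem = Memory('.')
        # Stage and Register are hypothetical Interface classes; each
        # creates its own nested Memory('.') in the same way.
        staged = mem.cache(Stage)(in_file=self.inputs.in_file)
        mem.cache(Register)(in_file=staged.outputs.out_file)
        return runtime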

If I then run pipeline as follows:

mem = Memory('/tmp/mypipeline')
mem.cache(Pipeline)(in_arg=...).run()

then is /tmp/mypipeline the parent directory of all the cached work areas?

Is this a reasonable workflow modularization approach? Is it OK to make multiple cache contexts like this? Is there a better alternative?

Note that this example is a simplification. The general problem is how to scale up workflows in a modular way rather than use a single flat workflow.

Thanks,

Fred

Fred Loney

Mar 7, 2013, 2:27:48 PM3/7/13
to nipy...@googlegroups.com
Correction: pipeline is run by the constructor, not the run method:

result = mem.cache(Pipeline)(in_arg=...)

Chris Filo Gorgolewski

Mar 8, 2013, 5:32:04 AM3/8/13
to nipy...@googlegroups.com
Interesting... I have not used the Memory interfaces very extensively, but maybe Gael could chime in.

Best,
Chris



Satrajit Ghosh

Mar 8, 2013, 8:08:54 AM3/8/13
to nipy-user
hi fred,

nipype is meant to be flexible and to allow many different use cases. i think how you construct and maintain your workflows is completely up to you, but some choices affect which features of nipype are available.

in this context there are a few considerations:

- distributed execution may be limited depending on how you construct these things and what your cluster supports - for example, a workflow running inside a "Workflow" interface won't parallelize except with MultiProc (see the sketch after this list).
- extracting a static version of the workflow (https://github.com/nipy/nipype/pull/517) - the intent is to generate a python script that explicitly contains all the defaults used to run that workflow.
- while each interface will generate formal provenance (https://github.com/nipy/nipype/pull/451), the overall workflow provenance may not be available, and as we move towards making such info available together with data, you might have to generate it yourself.
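
for example, a minimal sketch of plugin execution (the empty workflow is just a stand-in for a real one):

import nipype.pipeline.engine as pe

# stand-in for a real top-level workflow built elsewhere
pipeline = pe.Workflow(name='pipeline')

# a top-level Workflow parallelizes through an execution plugin;
# a workflow wrapped inside an interface would not get this for free
pipeline.run(plugin='MultiProc', plugin_args={'n_procs': 4})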

i am listing the above here, just to give a sense of where things might be headed and also to solicit feedback on how things might be improved. you should totally feel free to explore options that suit your needs and we can discuss how some of the above considerations could be made to work in other scenarios.

cheers,

satra

ps. gael's on vacation till the end of the month.
pps. i'm mostly in grant mode till monday - so won't necessarily respond till after the weekend.

Fred Loney

Mar 13, 2013, 8:40:30 PM3/13/13
to nipy...@googlegroups.com
Thanks for the caveats, Satra.

Can only a Workflow parallelize via a plugin? A Memory cache cannot, correct?

Thus, in a nutshell, Workflow buys you parallelization and provenance, whereas a Memory cache buys you rerun currency checking and Python task control. Is that right? If so, are there plans to merge the two? I understand that you can't infer node connection edges from Python control structures, but it would be nice to support more flexible "node glue" besides io, MapNode and Function. Perhaps generalize the connection edge with a generator yield expression callback.

As I understand it, child workflows can be added to a parent workflow as nodes using add_nodes, but a child workflow cannot be connected to other nodes. In that case, the parent expands the child workflows into a single flat graph before execution, and the expanded graph is what is parallelized. In fact, child workflows can even share nodes, right? If so, a common node could bridge two workflows as a sync point. Is that right?
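
Roughly what I mean, as an untested sketch (the identity node is just a placeholder for a real shared step):

import nipype.pipeline.engine as pe
from nipype.interfaces.utility import IdentityInterface

# A candidate sync point shared by both child workflows.
sync = pe.Node(IdentityInterface(fields=['ref_file']), name='sync')

stage = pe.Workflow(name='stage')
register = pe.Workflow(name='register')
stage.add_nodes([sync])      # stage would feed the shared node
register.add_nodes([sync])   # register would read from it

pipeline = pe.Workflow(name='pipeline')
pipeline.add_nodes([stage, register])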

Thanks,

Fred

Chris Filo Gorgolewski

Mar 14, 2013, 7:28:31 AM3/14/13
to nipy...@googlegroups.com
On 14 March 2013 01:40, Fred Loney <lon...@ohsu.edu> wrote:
> Thanks for the caveats, Satra.

> Can only a Workflow parallelize via a plugin? A Memory cache cannot, correct?

> Thus, in a nutshell, Workflow buys you parallelization and provenance, whereas a Memory cache buys you rerun currency checking and Python task control. Is that right?
Workflows also check if a node/interface needs to be rerun.

> If so, are there plans to merge the two? I understand that you can't infer node connection edges from Python control structures, but it would be nice to support more flexible "node glue" besides io, MapNode and Function. Perhaps generalize the connection edge with a generator yield expression callback.
Could you provide an example of how this would look in code?
 
> As I understand it, child workflows can be added to a parent workflow as nodes using add_nodes, but a child workflow cannot be connected to other nodes. In that case, the parent expands the child workflows into a single flat graph before execution, and the expanded graph is what is parallelized. In fact, child workflows can even share nodes, right? If so, a common node could bridge two workflows as a sync point. Is that right?
Hmm... sharing nodes between workflows seems like a bad idea, but I have not tried it. Workflows and Nodes inherit from the same class and can do basically the same things. You can have a mix of interconnected workflows and nodes inside a workflow ad infinitum.
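
E.g., a minimal sketch (toy names; the identity interfaces stand in for real processing):

import nipype.pipeline.engine as pe
from nipype.interfaces.utility import IdentityInterface

def make_child(name):
    # Toy child workflow with explicit input/output boundary nodes.
    wf = pe.Workflow(name=name)
    inputnode = pe.Node(IdentityInterface(fields=['in_file']),
                        name='inputnode')
    outputnode = pe.Node(IdentityInterface(fields=['out_file']),
                         name='outputnode')
    wf.connect(inputnode, 'in_file', outputnode, 'out_file')
    return wf

stage = make_child('stage')
register = make_child('register')

pipeline = pe.Workflow(name='pipeline')
# A child workflow connects like a node, addressed via its inner nodes.
pipeline.connect(stage, 'outputnode.out_file',
                 register, 'inputnode.in_file')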

Best,
Chris 