Help needed: how to collect metadata by running a first Node and then use it in later workflows?


Michael

Jul 29, 2013, 4:51:46 PM
to nipy...@googlegroups.com
Before running the main workflow MainWf, I need to read some information from the data and record it in the InputNode containing all the necessary information (e.g. DTI_path, number_of_B0s), so that a subsequent function can choose which workflow to run next based on the InputNode. How can this be implemented?

Since Nipype builds the whole pipeline up front rather than dynamically at runtime, it seems I would need to first run an initialization workflow IO to gather all the information from the files (for example number_of_B0s) into InputNode, and only then build and run the rest of the pipeline:

inputnode = pe.Node(interface=util.IdentityInterface(fields=["DTI_path", "number_of_B0s"]),
                    name="inputnode")
IO.connect([(infosource, datasource, [('subject_id', 'subject_id')]),
            (datasource, inputnode, [('DTI_path', 'DTI_path')]),
            (datasource, GetNumberOfB0s, [('DTI_path', 'DTI_path')]),
            (GetNumberOfB0s, inputnode, [('number_of_B0s', 'number_of_B0s')])])
IO.run()  # The aim is for the IO workflow variables to be set at this point,
          # but they are all still empty, as before execution

def get_my_next_workflow(inputnode):
    if inputnode.inputs.number_of_B0s == 1:
        MainWf = ...
    else:
        MainWf = ...
    return MainWf

MainWf = get_my_next_workflow(IO.inputnode)  # Not working: IO.inputnode is empty,
                                             # and so is inputnode.inputs.number_of_B0s

MasterWf = connect(IO, MainWf)  # pseudocode: chain the IO and MainWf workflows
MasterWf.run()

When working with files, the solution is to use the base_dir variable: save the data to a specific directory with the first workflow, run it, and then collect the data from that same directory with a DataGrabber in the second workflow. But how can this be done with metadata only, without creating files? I only want to initialize IO.inputnode with all the information I need (which requires running IO once) in order to decide which workflows to run in the pipeline. Any idea on how to do this would be greatly appreciated.
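The file-based pattern I mean looks roughly like this (the base_dir and template values are illustrative only):

```
import nipype.pipeline.engine as pe
import nipype.interfaces.io as nio

IO.base_dir = '/scratch/pipeline'  # the first workflow materializes its outputs here
IO.run()

# the second workflow grabs the saved outputs back from disk
grabber = pe.Node(nio.DataGrabber(outfields=['number_of_B0s'], sort_filelist=True),
                  name='grabber')
grabber.inputs.base_directory = '/scratch/pipeline'
grabber.inputs.template = 'IO/GetNumberOfB0s/*.txt'  # hypothetical output file
```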

Satrajit Ghosh

Jul 29, 2013, 5:12:25 PM
to nipy-user
hi michael,

you could do it with a function node.

fnode = Node(Function(input_names=['in_path'], output_names=['info'],
                      function=getinfo), name='getinfo')
inode = Node(IdentityInterface(fields=['info']), name='inputnode')

wf.connect(fnode, 'info', inode, 'info')
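for illustration, a minimal getinfo along these lines might be (a sketch; the bvals-file argument and the bval == 0 convention for b0 volumes are assumptions):

```
def getinfo(bvals_path):
    # runs inside the Function node at execution time, so the import
    # must live in the function body
    import numpy as np
    bvals = np.genfromtxt(bvals_path)
    return int((bvals == 0).sum())  # number of b0 volumes
```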

you could also write a function that creates your workflow based on the input info.

cheers,

satra


Michael

Jul 29, 2013, 5:26:55 PM
to nipy...@googlegroups.com

Hi Satra,

I don't see how this would solve the problem. The result of getinfo will not be available when the pipeline is built (i.e. before it is run), so the condition will evaluate to None in the workflow-choosing function, which will thus always return the same workflow.

Satrajit Ghosh

Jul 30, 2013, 9:11:25 AM
to nipy-user
hi michael,

if your io workflow is not too complex, you can simply use a python paradigm here:

```
def create_workflow(subject_id):
    info = get_info(subject_id)
    if info.GetNumberOfB0s == 1:
        MainWf = ...
    else:
        MainWf = ...
    return MainWf

meta_workflow = Workflow('big_one')
meta_workflow.add_nodes([create_workflow(subject) for subject in subjects])
```

and you can use DataGrabber/DataFinder simply as an interface in get_info().

your other options are:
1. indeed, as you proposed: execute workflow1 and then run workflow2
2. change workflow2 to use MapNodes, which are meant to handle dynamic cases (if you are able to share your workflow, we can take a look to see if this is feasible); a minimal sketch follows below.
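for illustration only (the node names and the use of fsl.ExtractROI are assumptions, not from the thread): a MapNode runs its interface once per element of an input list, so a runtime-determined number of b0 volumes can be handled without rebuilding the graph.

```
from nipype import MapNode
from nipype.interfaces import fsl

# one ExtractROI execution per b0 volume; the index list arrives at runtime
# from an upstream node, so the count need not be known at graph-build time
extract_b0 = MapNode(fsl.ExtractROI(t_size=1),
                     iterfield=['t_min'],
                     name='extract_b0')
# wf.connect(get_indices_node, 'b0_indices', extract_b0, 't_min')
# wf.connect(datasource, 'DMRIdata', extract_b0, 'in_file')
```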
   
cheers,

satra

Michael

Jul 30, 2013, 10:49:16 AM
to nipy...@googlegroups.com
Hi Satra,

Many thanks for this. I couldn't get option #1 to work while passing only metadata (node and workflow parameters were not set after workflow execution). It would be fantastic if you could have a look. I'm going to write a minimal test function with some data and post it here.

Michael

Jul 30, 2013, 5:13:30 PM
to nipy...@googlegroups.com
The data and scripts relating to the issue can be found here: https://www.dropbox.com/sh/lwylg03er15i7bi/IZ47jZ_r3b
The main script collects the paths of the DMRI data (1 b0 and 3 non-b0 volumes for subject sub1, 2 b0 and 2 non-b0 volumes for subject sub2) and the bvals and bvecs files (comment or uncomment line 18/19 to change subject). The objective of the first workflow, IO, is to set all the data in the inputnode, especially GetBvalsIndices, which collects the volume indices related to each b value (from which the number of b0s can be inferred). The objective of the second workflow is to use the resulting inputnode info to calculate the mean b0, where the method depends on the number of b0s.

The main problem is that after IO is run, no variables relating to IO or inputnode remain set, so when get_meanb0_worflow is called with IO.inputnode, the error "'Workflow' object has no attribute 'inputnode'" is thrown. A similar error is thrown if get_meanb0_worflow is called with inputnode.

I reduced the scripts to the minimum, although I deliberately let the main script (dmri_main) import a function from the second script (dmrihelp) (I may have a question about this later). In the main script, three paths have to be changed before running it:
- the path to the data on line 26: datasource.inputs.base_directory = '/path/to/my/data'
- the path to the basedir of the first worflow on line 55: IO.base_dir = '/how/does/this/path/relate/to/metaflow/path'
- the path to the master workflow on line 81: metaflowDMRI.base_dir = '/metaflow/path'

Any help would be greatly appreciated, and I'll answer any questions you may have about the scripts as soon as I can.

Michael

Aug 5, 2013, 11:23:37 AM
to nipy...@googlegroups.com
Is there any chance someone could have a look at the script? Satra? I still haven't found a clean solution for first generating metadata and then passing it to subsequent workflows that are initialized based on that metadata. In the scripts found at the link above, the computed metadata is the bvals --> DWI volume indices dictionary, and the workflow depending on this metadata is the one calculating the mean b0.

Satrajit Ghosh

Aug 7, 2013, 9:14:00 AM
to nipy-user
hi michael,

i see what you are trying to do. in the example you sent me, you could do the get_meanb0_workflow without the if clause, as it would work for `b0vals_number >= 1`. but assuming you have something more complicated where you do need quite different workflows, this is what you can do: basically, isolate the outermost datagrabbing loop and integrate it into the workflow generation function. since the datagrabber is fairly lightweight, doing this has almost zero performance penalty.

```
import numpy as np
import nipype.interfaces.io as nio
from nipype import Workflow

def get_subject_workflow(subject):
    datasource = nio.DataGrabber(sort_filelist=True,
                                 infields=['subject_id'],
                                 outfields=['DMRIdata', 'DMRIbvals', 'DMRIbvecs'])
    datasource.inputs.base_directory = '/path/to/my/data'
    datasource.inputs.template = '%s/%s/%s.%s'
    datasource.inputs.template_args = dict(DMRIdata=[['subject_id', 'DMRI', 'DMRI', 'nii.gz']],
                                           DMRIbvals=[['subject_id', 'DMRI', 'testbvals', 'bval']],
                                           DMRIbvecs=[['subject_id', 'DMRI', 'testbvecs', 'bvec']])
    datasource.inputs.subject_id = subject
    results = datasource.run()
    # indices of the b0 volumes; len(b0_indices) gives the number of b0s
    b0_indices = np.nonzero(np.genfromtxt(results.outputs.DMRIbvals) == 0)[0]
    # create subject workflow
    ...
    return subject_workflow


metawf = Workflow(name="meta")
for subject in subject_list:
    subject_wf = get_subject_workflow(subject)
    metawf.add_nodes([subject_wf])
```

cheers,

satra


Michael

Aug 7, 2013, 12:08:32 PM
to nipy...@googlegroups.com
Many thanks for this solution, Satra; it seems to solve the problem (the example in the script was trivial on purpose). I will try it now.

Could you comment on the advantages/disadvantages of using "for-loop + embedded DataGrabber" compared to the standard "DataGrabber + iterables" procedure (aside from the fact that "for-loop + embedded DataGrabber" can be used for conditional workflows)? Is there any impact on parallelization, DataSink management, etc.?



Satrajit Ghosh

Aug 13, 2013, 1:21:17 PM
to nipy-user
hi michael,


iterables are essentially for loops over the subgraph of the node on which they are set, and they are expanded before the workflow runs. so mostly this should not have any impact on the parallelization of a workflow. the executable graph will look very similar in both cases.
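for comparison, a sketch of the iterables version of the same loop (node names assumed from earlier in the thread):

```
from nipype import Node, Workflow
from nipype.interfaces.utility import IdentityInterface

# the 'subject_id' iterable clones the downstream subgraph once per
# subject when the graph is expanded, before execution starts
infosource = Node(IdentityInterface(fields=['subject_id']), name='infosource')
infosource.iterables = ('subject_id', subject_list)

wf = Workflow(name='main')
wf.connect(infosource, 'subject_id', datasource, 'subject_id')  # datasource as before
```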

cheers,

satra