Nipype: A simple solution for using an existing workflow and saving outputs?


Jason

Oct 13, 2013, 5:56:52 PM
to nipy...@googlegroups.com
Hello, 

I'm a Nipype newbie and I have a relatively straightforward problem. I'm hoping for both a specific solution for this situation, and ideally a more generic solution for this type of problem, since I anticipate using Nipype a lot in the future in these kinds of situations. 

I have a bunch of images that I want to smooth using FSL's SUSAN and I stumbled upon a readymade workflow, create_susan_smooth. I can run the workflow fine by manually giving it inputs, but in order to structure the output images in a nice way, I need to use a Datasink. Also, in order to give it multiple inputs (to iterate over subjects) I should also use a DataGrabber. I think I am creating all 3 pieces correctly in isolation, but I keep getting errors when trying to link them up. Here's what I have so far, copied from the documentation: 

import nipype.pipeline.engine as pe          # pypeline engine
import nipype.interfaces.io as nio           # Data i/o
import os, os.path as op
from nipype.workflows.fmri.fsl.preprocess import * #this is where create_susan_smooth is

# Specify the location of the data.
data_dir = os.path.abspath('/Users/jason/Desktop/testsmooth')
# Specify the subject directories
subject_list = ['0001167','0000322']

# Map field names to individual subject runs.
info = dict(func=[['subject_id', ['func']]],
            mask=[['subject_id','mask']])

datasource = nio.DataGrabber(infields=['subject_id'], outfields=['func', 'mask'])
datasource.inputs.base_directory = data_dir
datasource.inputs.template = '%s/r1.feat/*.nii.gz'
datasource.inputs.field_template = dict(func='%s/r1.feat/filtered_func_data.nii.gz',
                                        mask='%s/r1.feat/mask.nii.gz')
datasource.inputs.template_args = dict(func=[['subject_id']],
                                       mask=[['subject_id']])
datasource.inputs.subject_id = subject_list
datasource.inputs.sort_filelist = True

smoother = create_susan_smooth()
smoother.inputs.inputnode.fwhm = 4

datasink = pe.Node(interface=nio.DataSink(), name="datasink")
datasink.inputs.base_directory = '/Users/jason/Desktop/smoothing_test'

#Initiation of the metaflow
metaflow = pe.Workflow(name="metaflow")

#Define where the workingdir of the metaflow should be stored at
metaflow.base_dir = '/Users/jason/Desktop/smoothing_test'


#Connect up all components
metaflow.connect([(datasource,smoother,[('func','inputspec.in_files')]),
                  (smoother,datasink,[('inputspec.smoothed_files',
                                      'smoothed')
                                       ])
                  ])
                  
and I always get the following error when running the last bit:

AttributeError                            Traceback (most recent call last)
<ipython-input-6-63200700350b> in <module>()
      9 metaflow.connect([(datasource,smoother,[('func','inputspec.in_files')]),
     10                   (smoother,datasink,[('inputspec.smoothed_files',
---> 11                                       'smoothed')
     12                                        ])
     13                   ])

/usr/local/lib/python2.7/site-packages/nipype/pipeline/engine.pyc in connect(self, *args, **kwargs)
    316                 newnodes.append(destnode)
    317         if newnodes:
--> 318             self._check_nodes(newnodes)
    319             for node in newnodes:
    320                 if node._hierarchy is None:

/usr/local/lib/python2.7/site-packages/nipype/pipeline/engine.pyc in _check_nodes(self, nodes)
    790         node_lineage = [node._hierarchy for node in self._graph.nodes()]
    791         for node in nodes:
--> 792             if node.name in node_names:
    793                 idx = node_names.index(node.name)
    794                 if node_lineage[idx] in [node._hierarchy, self.name]:

AttributeError: 'DataGrabber' object has no attribute 'name'

The DataGrabber is giving me what I want when I run it independently, and I think that the datasink is working correctly too. I'm just unsure about how to connect everything up. 

This brings me to my generic question. Let's assume I have a folder "mydata" that has a subfolder for each subject (101,102, etc). I have found a workflow that performs some function that I want. I want to feed that workflow the inputs from my folder structure, run it, then save the outputs in an orderly way (a separate folder with a subfolder for each subject, or in the same location as the original inputs). This is a general use of Nipype that I could see myself and many other people using very frequently. While the documentation provides many comprehensive examples of large, complex workflows, it does not seem to address this basic "building block" of how to re-use workflows (or perhaps I have just missed it). It's a hurdle that has pushed me away from using Nipype many times in favor of creating throwaway shell scripts. 

Generally, I would like to know: 

1. What's the most efficient way to figure out the necessary inputs that the workflow needs?
2. What's the most efficient way to see what outputs the workflow creates, and how do I access them?
3. What's the most straightforward way to feed my existing data to the workflow (assuming the folder structure above)? 
4. What's the most straightforward way to save those outputs in a reasonably organized way (i.e., a subfolder for each subject)?

Thanks for any help you could provide!
Jason

P.S. -- I realize that FEAT can do the smoothing, but for various reasons we have to do it at a later stage. 

Satrajit Ghosh

Oct 14, 2013, 10:38:59 AM
to nipy-user
hi jason,

to the extent that workflows are created with some standard, there should be a node representing the needed inputs and outputs. in this case:


it should be outputnode.smoothed_files instead of inputspec.smoothed_files
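for reference, a rough sketch of how the corrected connections could look (field names taken from create_susan_smooth's inputnode/outputnode; this also assumes the datagrabber is wrapped in a Node, which comes up further down the thread):

metaflow.connect([(datasource, smoother, [('func', 'inputnode.in_files')]),
                  (smoother, datasink, [('outputnode.smoothed_files', 'smoothed')])])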

This brings me to my generic question. Let's assume I have a folder "mydata" that has a subfolder for each subject (101,102, etc). I have found a workflow that performs some function that I want. I want to feed that workflow the inputs from my folder structure, run it, then save the outputs in an orderly way (a separate folder with a subfolder for each subject, or in the same location as the original inputs). This is a general use of Nipype that I could see myself and many other people using very frequently. While the documentation provides many comprehensive examples of large, complex workflows, it does not seem to address this basic "building block" of how to re-use workflows (or perhaps I have just missed it). It's a hurdle that has pushed me away from using Nipype many times in favor of creating throwaway shell scripts. 
please do post here and let us know where we can improve. 

Generally, I would like to know: 

1. What's the most efficient way to figure out the necessary inputs that the workflow needs?
nothing better than looking at the workflow (workflow code, workflow graph, workflow.export). within nipype we try to give each workflow an input/output node to help this situation - if you spot otherwise please let us know. outside nipype it's up to the creator.
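as a concrete sketch of that inspection step, using the susan workflow from this thread:

smoother = create_susan_smooth()
print(smoother.inputs)            # every input of every node in the workflow
print(smoother.inputs.inputnode)  # just the fields the workflow expects you to set or connect
smoother.write_graph()            # writes a graph file of the nodes and connections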

2. What's the most efficient way to see what outputs the workflow creates, and how do I access them?
perhaps this intro might help with the basics of nipype a bit, but the most efficient way for outputs is the same as the strategy for inputs above.
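the same idea for outputs, as a sketch:

print(smoother.outputs.outputnode)              # e.g. smoothed_files
print(smoother.get_node('outputnode').outputs)  # or inspect the output node directly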


3. What's the most straightforward way to feed my existing data to the workflow (assuming the folder structure above)? 
this part can get complicated depending on one's situation. in your situation i would use one of the three input interfaces (SelectFiles, DataFinder, or DataGrabber) to first get one subject's files, and then run iterables over subjects for the datagrabber, or even an identity node prior to that. most of the examples use this pattern. for example see:


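a rough sketch of that pattern (SelectFiles plus an identity node carrying the subject_id iterable; the templates below are placeholders modelled on jason's folder layout, not a verified example):

infosource = pe.Node(util.IdentityInterface(fields=['subject_id']), name='infosource')
infosource.iterables = [('subject_id', subject_list)]

templates = {'func': '{subject_id}/r1.feat/filtered_func_data.nii.gz',
             'mask': '{subject_id}/r1.feat/mask.nii.gz'}
datasource = pe.Node(nio.SelectFiles(templates, base_directory=data_dir), name='datasource')

metaflow.connect(infosource, 'subject_id', datasource, 'subject_id')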
4. What's the most straightforward way to save those outputs in a reasonably organized way (i.e., a subfolder for each subject)?
use the `container` field for DataSink and feed it the subject id. remember in nipype you cannot connect one input field to another input field; you'll have to use an identityinterface to distribute an input/parameter to multiple nodes.
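a minimal sketch of that container pattern (essentially what jason's final script further down ends up doing; the output path is a placeholder):

datasink = pe.Node(nio.DataSink(base_directory='/path/to/output'), name='datasink')
metaflow.connect(infosource, 'subject_id', datasink, 'container')  # one subfolder per subject
metaflow.connect(smoother, 'outputnode.smoothed_files', datasink, 'smoothed')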
 
cheers,

satra

Jason

Oct 14, 2013, 10:58:08 AM
to nipy...@googlegroups.com
Hi Satra, 

Thank you, this is very helpful! It actually clears things up a lot. However, for my current problem, I am still having problems connecting the datasource to the smoothing workflow. Specifically, it looks like it's trying to access the "name" attribute in datasource when trying to connect, and it keeps giving me the error "DataGrabber object has no attribute 'name'". I even tried including 'name="datasource"' when creating it, but that didn't seem to work either. 

I even tried the simplified method of using connect: metaflow.connect(datasource, 'func',smoother,'inputspec.in_files') and I got the same error as in my original post. Any thoughts? 

Thanks again
Jason

Satrajit Ghosh

Oct 14, 2013, 12:58:25 PM
to nipy-user
hi jason,

However, for my current problem, I am still having problems connecting the datasource to the smoothing workflow. Specifically, it looks like it's trying to access the "name" attribute in datasource when trying to connect, and it keeps giving me the error "DataGrabber object has no attribute 'name'". I even tried including 'name="datasource"' when creating it, but that didn't seem to work either. 

I even tried the simplified method of using connect: metaflow.connect(datasource, 'func',smoother,'inputspec.in_files') and I got the same error as in my original post. Any thoughts? 

you need to encapsulate DataGrabber in a Node, much like you have done for DataSink.
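i.e., something along the lines of the DataSink node from the first post:

datasource = pe.Node(nio.DataGrabber(infields=['subject_id'], outfields=['func', 'mask']),
                     name='datasource')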

cheers,

satra

Jason

Oct 14, 2013, 2:57:06 PM
to nipy...@googlegroups.com
Great, thanks! I actually stumbled across that solution just before receiving your response. I finally got everything to work, and have posted the code below (in case it's helpful to others). The one last problem I have concerns the output: using the datasink "container" I am able to create a separate subfolder for each subject. However, it creates a series of subfolders, for example:

0001167 -> smoothed ->  _subject_id_0001167 -> _smooth0 -> [smoothed file]

If I want a flatter organization, with the smoothed file saved in the "0001167" folder, what's the best way to do that? I managed to use the code below to essentially find-and-replace the folder names to remove them, but it feels like a hack: 

datasink.inputs.substitutions = [('_smooth0', ''), ('smoothed', '')]
datasink.inputs.regexp_substitutions = [('_subject_id_.*', '')]

here's the rest of the code: 

import nipype.pipeline.engine as pe          # pypeline engine
import nipype.interfaces.io as nio           # Data i/o
import os, os.path as op
import nipype.interfaces.utility as util     # utility
from nipype.workflows.fmri.fsl.preprocess import *


# Specify the location of the data.
data_dir = os.path.abspath('/Users/jason/Desktop/testsmooth')

# Specify the subject directories
subject_list = ['0001167','0000322']
infosource = pe.Node(util.IdentityInterface(fields=['subject_id']), name='infosource')
infosource.iterables = [('subject_id', subject_list)]

# grab all the files with a DataGrabber
datasource = pe.Node(nio.DataGrabber(infields=['subject_id'], outfields=['func', 'mask']), name='datasource')
datasource.inputs.base_directory = data_dir
datasource.inputs.template = '%s/r1.feat/*.nii.gz'
datasource.inputs.field_template = dict(func='%s/r1.feat/filtered_func_data.nii.gz',
                                        mask='%s/r1.feat/mask.nii.gz')
datasource.inputs.template_args = dict(func=[['subject_id']], mask=[['subject_id']])
datasource.inputs.sort_filelist = True


#smoothing workflow
smoother = create_susan_smooth()
#smoother.base_dir='/Users/jason/Desktop/smoothing_test'
smoother.inputs.inputnode.fwhm = 4

#Node: Datasink - Create a datasink node to store important outputs
datasink = pe.Node(interface=nio.DataSink(), name="datasink")
datasink.inputs.base_directory = '/Users/jason/Desktop/smoothing_test'
#datasink.inputs.substitutions = [('_smooth0', ''), ('smoothed', '')]
#datasink.inputs.regexp_substitutions = [('_subject_id_.*', '')]

#Initiation of the metaflow
metaflow = pe.Workflow(name="metaflow")

#Define where the workingdir of the metaflow should be stored at
metaflow.base_dir = '/Users/jason/Desktop/smoothing_test'

metaflow.connect(infosource, 'subject_id', datasource, 'subject_id')
metaflow.connect(datasource, 'func', smoother, 'inputnode.in_files')
metaflow.connect(datasource, 'mask', smoother, 'inputnode.mask_file')
metaflow.connect(infosource, 'subject_id', datasink, 'container')
metaflow.connect(smoother, 'outputnode.smoothed_files', datasink, 'smoothed')
metaflow.run()


Thanks!
Jason



Satrajit Ghosh

Oct 14, 2013, 3:00:17 PM
to nipy-user
hi jason,

given that the only iterable in your workflow is subject_id, you can set parameterization=False as an argument for DataSink.  
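for example (a sketch; either form should be equivalent):

datasink = pe.Node(nio.DataSink(parameterization=False), name='datasink')
# or on an already-created node:
datasink.inputs.parameterization = False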

cheers,

satra


Jason

Oct 14, 2013, 5:24:16 PM
to nipy-user
perfect! I just added 

datasink.inputs.parameterization = False

and in order to keep it from making the "smoothed" subfolder, I just added @ in front of "smoothed" when connecting it to the workflow:

metaflow.connect(smoother, 'outputnode.smoothed_files', datasink, '@smoothed')

thanks, 
Jason