[nipype] Parameterize datasink by runs (or how to prevent files from being given the same location)


Chris Johnson

Oct 23, 2013, 5:23:31 PM
to nipy...@googlegroups.com
Hi all,

I'm working from a FreeSurfer directory. Given a session SESS, there are a number of runs, each of which contains a raw functional, a motion-corrected functional, and a set of motion parameters.

Run directory: $FUNCTIONALS_DIR/$SESS/bold/???/
Raw functional: f.nii
Motion-corrected functional: fmc.nii.gz
Motion parameters: fmc.mcdat

FreeSurfer uses 'mc-afni' to perform motion correction.

For the purposes of this post, let's assume that I have two functional runs: 004 and 005.

I would like to use rapidart to detect outliers; following is a slightly modified version of my workflow:

infosource = pe.Node(interface=util.IdentityInterface(fields=['session_id']), name='infosource')
infosource.iterables=('session_id', sessions_list)

ds = pe.Node(interface=nio.DataGrabber(
                infields=['session_id'],
                outfields=['functionals', 'motion_corrected', 'mcparams'],
                base_directory=os.environ['FUNCTIONALS_DIR'],
                template='%s/bold/???/%s',
                template_args={'functionals': [['session_id', 'f.nii']],
                               'motion_corrected': [['session_id', 'fmc.nii.gz']],
                               'mcparams': [['session_id', 'fmc.mcdat']]},
                sort_filelist=True), name='datasource')

afni_artdetect = pe.Node(
    interface=ra.ArtifactDetect(parameter_source='AFNI',
                                mask_type='spm_global',
                                norm_threshold=1.0,
                                zintensity_threshold=3.0,
                                use_differences=[True, False]),
    name='afni_artdetect')

sinker = pe.Node(nio.DataSink(base_directory='./nipype_test', parameterization=False), name='sinker')

multiworkflow = pe.Workflow(name='workflow')
multiworkflow.connect([
    (infosource, ds, [('session_id', 'session_id')]),
    (infosource, sinker, [('session_id', 'container')]),
    (ds, sinker, [('motion_corrected', 'motion_corrected'),
                  ('mcparams', 'mcparams')]),
    (ds, afni_artdetect, [('motion_corrected', 'realigned_files'),
                          ('mcparams', 'realignment_parameters')]),
    (afni_artdetect, sinker,
        [('plot_files','afni.plot_files')])
    ])


This results in the following directory structure:

./nipype_test/
    $SESS/
        afni/
            plot_files/
                plot.fmc.png

All runs are being processed, but the outputs are mapped to the same file name. What I would like is something like the following:

./nipype_test/
    $SESS/
        afni/
            plot_files/
                004/
                    plot.fmc.png

                005/
                    plot.fmc.png


Sinker seems to have a DSL that would be useful for resolving this problem, but the documentation is completely opaque. Any assistance in differentiating file names would be helpful.
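[For readers hitting the same wall: DataSink's renaming "DSL" is essentially two input lists, `substitutions` (literal search/replace pairs) and `regexp_substitutions` (regex pairs), applied in order to each output path. A minimal plain-Python sketch of those semantics, no nipype required; `apply_substitutions` is an illustrative helper, not part of the nipype API:

```python
import re

def apply_substitutions(path, substitutions, regexp_substitutions=()):
    """Approximate how DataSink rewrites an output path:
    literal (search, replace) pairs first, then regex pairs."""
    for search, repl in substitutions:
        path = path.replace(search, repl)
    for pattern, repl in regexp_substitutions:
        path = re.sub(pattern, repl, path)
    return path

# e.g. strip the iterable prefix and inject a run number
print(apply_substitutions(
    '_session_id_SESS01/plot.fmc.png',
    substitutions=[('_session_id_SESS01/', '')],
    regexp_substitutions=[(r'plot\.fmc', 'plot004.fmc')]))
# -> plot004.fmc.png
```

The pairs are applied in list order, so a later regex pair can see the result of an earlier literal one.]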

Chris

Satrajit Ghosh

Oct 23, 2013, 5:34:35 PM
to nipy-user
hi chris,

try setting parameterization=True in Datasink.

i do think we need to revamp the datasink help.

cheers,

satra



Chris Johnson

Oct 23, 2013, 7:02:57 PM
to nipy...@googlegroups.com
Satra,

Setting parameterization=True simply changes the file structure to:

./nipype_test/
    $SESS/
        afni/
            plot_files/
                _session_id_$SESS/
                    plot.fmc.png


This is because the iterable is over sessions, not runs; for each session, the datagrabber gets a collection of files (one from each run) using globbing.

I was hoping I could add something like:

ds.inputs.template_args['runs'] = [['session_id', '']]

This would give me a 'runs' parameter that takes on the values 004 and 005. But it's not clear how I would be able to hook that into datasink to parameterize each file. I would also be fine with something like plot004.fmc.png in plot_files.
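[The run values can also be derived from the grabbed paths themselves rather than from a second template. A plain-Python sketch of the extraction, assuming the FS-FAST layout described at the top of the thread; `run_from_path` is an illustrative helper that could be wrapped in a `util.Function` node:

```python
import re

def run_from_path(path):
    """Pull the three-digit run directory out of an FS-FAST path
    like 'SESS/bold/004/fmc.nii.gz'."""
    match = re.search(r'/bold/(\d{3})/', path)
    if match is None:
        raise ValueError('no run directory in %s' % path)
    return match.group(1)

paths = ['SESS/bold/004/fmc.nii.gz', 'SESS/bold/005/fmc.nii.gz']
print([run_from_path(p) for p in paths])
# -> ['004', '005']
```
]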

Thanks,
Chris

Satrajit Ghosh

Oct 23, 2013, 7:29:36 PM
to nipy-user
hi chris,

ah i see, this is the fsfast structure where every file has the same name.

can you check the working directory of the art node to see if a file is created for each input run? i think that's where the problem is - that interface is overwriting each output.

in this case you should use the Rename node at the beginning to change the names of the files - see example three of Rename.

Chris Johnson

Oct 24, 2013, 12:32:42 PM
to nipy...@googlegroups.com
Hi Satra,

As written, this doesn't quite meet my needs, as the run information is available as part of the directory name, but only the basename of the in_file is taken: https://github.com/nipy/nipype/blob/3219fc659f1b9a55037c1ff1ddf8ea4a1f31127b/nipype/interfaces/utility.py#L213

The natural way to write this would be
rename = util.Rename(parse_string=r'/bold/(?P<run>\d{3})/(?P<base>[^\.]+)\.(?P<ext>.+)$',
                     format_string='%(base)s_%(run)s.%(ext)s')
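[For reference, `Rename`'s `format_string` uses Python `%`-style mapping keys, so each field needs an explicit conversion type such as `s` (`%(base)` alone raises `ValueError`). The intended result can be checked directly:

```python
# Named groups captured by parse_string become the mapping keys.
fields = {'base': 'fmc', 'run': '004', 'ext': 'nii.gz'}
print('%(base)s_%(run)s.%(ext)s' % fields)
# -> fmc_004.nii.gz
```
]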

I've also tried to pipe in both the file and the run directory, as follows:

ds = pe.Node(
    interface=nio.DataGrabber(
        infields=['session_id'],
        outfields=['runs', 'motion_corrected'],
        base_directory=os.environ['FUNCTIONALS_DIR'],
        template='%s/bold/???%s',
        template_args={'runs': [['session_id', '']],
                       'motion_corrected': [['session_id', '/fmc.nii.gz']]},
        sort_filelist=True),
    name='datasource')

rename = pe.MapNode(
    interface=util.Rename(
        parse_string=r"(?P<base>[^\.]+)",
        keep_ext=True,
        format_string="%(base)s_%(run)s"),
    iterfield=['in_file', 'run'], name='rename')

sinker = pe.Node(nio.DataSink(base_directory='./nipype_test3'), name='sinker')

workflow = pe.Workflow(name='renamer')

workflow.connect([
    (infosource, ds, [('session_id', 'session_id')]),
    (ds, rename, [('runs', 'run'), ('motion_corrected', 'in_file')]),
    (rename, sinker, [('out_file', 'out_file')])
])

This doesn't work, either. Is there a better way to pass run parameters? Unfortunately, not all sessions have the same number or set of runs, so I would like to use the directory structure to derive run information.
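[The rename Chris is after can be checked end to end in plain Python before wiring it into a MapNode. `unique_name` is a hypothetical helper assuming the `/bold/RUN/` layout from earlier in the thread:

```python
import re

def unique_name(path):
    """Map 'SESS/bold/004/fmc.nii.gz' -> 'fmc_004.nii.gz' so that
    per-run outputs no longer collide in the DataSink."""
    run = re.search(r'/bold/(\d{3})/', path).group(1)
    base = path.rsplit('/', 1)[-1]      # 'fmc.nii.gz'
    stem, ext = base.split('.', 1)      # 'fmc', 'nii.gz'
    return '%s_%s.%s' % (stem, run, ext)

print([unique_name(p) for p in
       ['SESS/bold/004/fmc.nii.gz', 'SESS/bold/005/fmc.nii.gz']])
# -> ['fmc_004.nii.gz', 'fmc_005.nii.gz']
```

Later nipype releases also expose a `use_fullpath` trait on `Rename` that applies `parse_string` to the whole path rather than just the basename, which would address the limitation noted above; check whether your version has it.]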

Thanks,
Chris

Satrajit Ghosh

Oct 24, 2013, 2:00:22 PM
to nipy-user

Chris Johnson

Oct 24, 2013, 4:32:40 PM
to nipy...@googlegroups.com
Satra,

That works great!

Thanks,
Chris