Multiple Runs per subject - Datasink container for each subject and run?


Jason

May 31, 2014, 12:23:31 AM
to nipy...@googlegroups.com
Hi, 

I'm working on a modification of the SPM, DARTEL example and generally it's working well. This is an experiment that has multiple runs per subject and I'm trying to make the script fairly general-purpose. I need to run a 3-level model (i.e, a regular level 1 analysis, a fixed effects analysis at level 2 to combine runs for each subject, and random effects at level 3 across subjects). First off, if someone has a good suggestion for running this type of analysis besides the solution I'm using I'm happy to hear it. 

In order to use the subjectinfo function to pull the appropriate information for each run, I need to provide "run" as an iterable for InfoSource. When running the workflow, in the working directory I get a separate subfolder for each run and subject ("_run_r2_subject_id_0000133"). Things appear to run smoothly, but when it comes to the DataSink, I have 2 options: if I set parameterization to True, then it separates everything by the run, but also creates lots of unnecessary subfolders for each contrast. When parameterization is False, then it stores everything in 1 subfolder for each subject, but only 1 set of results-- instead of a set of contrasts for each run. 

I'm wondering if there's a way to configure the DataSink so the files are output like this: 

subject_id/run/[all contrast images]

so, in the example below, each subject would have 2 subfolders, each containing all contrast images for that run.

I can attach my full workflow, but here's the snippet that's relevant: 


runs = ['r2','r3']
struct_name = 'anat_full'

file_template = {'struct': '%s/'+struct_name+'.nii*',
                'func': '%s/%s.nii'}

file_args = {'struct': [['subject_id']],
            'func': [['subject_id', 'run']]}

subject_list = ['0000133', '0000322']

infosource = pe.Node(interface=util.IdentityInterface(fields=['subject_id','run']), name="infosource")
infosource.iterables = [('subject_id', subject_list),('run',runs)]


datasource = pe.Node(interface=nio.DataGrabber(infields=['subject_id','run'],
                                               outfields=['func', 'struct']),
                     name = 'datasource')

datasource.inputs.base_directory = data_dir
datasource.inputs.template = '*'
datasource.inputs.field_template = file_template
datasource.inputs.template_args = file_args
datasource.inputs.sort_filelist = True

datasink = pe.Node(interface=nio.DataSink(), name="datasink")
datasink.inputs.base_directory = os.path.abspath(data_output_dir)
datasink.inputs.parameterization = False #added by Jason, so we don't need getstripdir

report = pe.Node(interface=nio.DataSink(infields=['subject_id','run']), name='report')
report.inputs.base_directory = os.path.abspath(report_dir)
report.inputs.parameterization = False


# store relevant outputs from various stages of the 1st level analysis
level1.connect([(infosource, datasink,[('subject_id','container')]),
                (l1pipeline, datasink,[('analysis.contrastestimate.con_images','contrasts.@con'),
                                       ('analysis.contrastestimate.spmT_images','contrasts.@T')]),
                (infosource, report,[('subject_id', 'container')]),
                (l1pipeline, report,[('analysis.slicestats.out_file', '@report')]),
                ])


I realize that the original script was capable of processing multiple runs, but I couldn't see how to get the correct timing info for each run without a "run" field from InfoSource. Also, it looks like the example script simply concatenates the runs together, which isn't exactly what I want. 

Any help you could offer would be great. 

Thanks!
Jason

Satrajit Ghosh

May 31, 2014, 8:31:18 AM
to nipy-user
hi jason,

do all participants have the same number of runs? if so, you don't need runs as an iterable. you would just pick up all the runs in order using DataGrabber/SelectFiles/DataFinder.
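For readers following along, here is a minimal sketch of what "picking up all the runs in order" means, using plain glob on hypothetical paths (SelectFiles, or DataGrabber with sort_filelist=True, does the equivalent with its template):

```python
# Sketch with hypothetical paths: a wildcard over the run folder picks
# up every run, and sorting makes the order deterministic -- this is
# what replaces the 'run' iterable when all subjects have the same runs.
import glob
import os
import tempfile

root = tempfile.mkdtemp()
subject = "0000133"
for run in ["r3", "r2"]:  # created out of order on purpose
    os.makedirs(os.path.join(root, subject, run))
    open(os.path.join(root, subject, run, "func.nii"), "w").close()

# One template with a wildcard for the run replaces the 'run' iterable.
template = os.path.join(root, subject, "r*", "func.nii")
funcs = sorted(glob.glob(template))
print([p.split(os.sep)[-2] for p in funcs])  # runs come back in order
```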

re: datasink look into substitutions/regex_substitutions. these work by altering the output path and in general depending on the complexity of the substitution tree can generate any output configuration.

the openfmri example has some regex substitutions.
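To illustrate, this is the kind of rewrite regex_substitutions performs on each DataSink output path; the pattern and the example path below are hypothetical, modeled on the parameterized folder names from the first message:

```python
# Illustration only: DataSink.inputs.regex_substitutions applies a list
# of (pattern, replacement) pairs to every output path. This pattern
# rewrites the parameterized folder '_run_<run>_subject_id_<subj>'
# into the desired '<subj>/<run>' layout.
import re

subs = [(r"_run_(?P<run>[^_]+)_subject_id_(?P<subj>\d+)",
         r"\g<subj>/\g<run>")]

path = "_run_r2_subject_id_0000133/contrasts/con_0001.img"
for pattern, repl in subs:
    path = re.sub(pattern, repl, path)
print(path)  # -> 0000133/r2/contrasts/con_0001.img
```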

which example script are you building from? i'd like to see if it really concatenates or does a typical SPM style first level model (different in implementation from an FSL style first level model). In SPM first level+fixed effects are rolled into one, in FSL it's two stages. Both are theoretically identical.

cheers,

satra

--

---
You received this message because you are subscribed to the Google Groups "NiPy Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nipy-user+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jason

Jun 1, 2014, 5:14:18 PM
to nipy...@googlegroups.com
Hi Satra, 

Ok, that makes sense. Maybe it is doing the right thing, then. I’m used to FSL so I didn’t realize that the SPM method was equivalent to the first level + fixed effects. I’m building from the fmri_spm_dartel example here:


Regarding the timing information, is it safe to assume that nipype will always pull the runs in order, so if I simply have the subjectinfo function return the data for both runs in the same order, then everything should match up properly? 

I will see if I can use renaming or regex substitutions to solve the datasink issue. However, the root of the problem is about the hierarchy of the folders— parameterization creates too many subfolders, but turning parameterization off creates too few (i.e., it only saves a set of data from 1 run, instead of both). It would be nice if I could selectively specify folders to not be created, but maybe it’s just unavoidable.

Also, I forgot to mention that I found a couple potential mistakes in this example. First, starting on line 297:

def pickFieldFlow(dartel_flow_fields, subject_id):
    from nipype.utils.filemanip import split_filename
    for f in dartel_flow_fields:
        _, name, _ = split_filename(f)
        if name.find("subject_id_%s"%subject_id):
            return f

When I tried this on my own data, it kept choosing the first subject’s flow field, because name.find returns -1 when there’s no match, and -1 evaluates as True. When I changed it to: 

        if name.find("subject_id_%s" % subject_id) >= 0:
            return f

it appeared to fix the problem. 
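The root cause is easy to demonstrate: str.find returns an index, and -1 (no match) is truthy, so a bare `if name.find(...)` test is True for almost every string. A membership test sidesteps the issue entirely:

```python
# str.find returns an index (-1 on no match), and -1 is truthy, so
# 'if name.find(sub):' is True whenever there is NO match at position
# 0 -- including when there is no match at all.
name = "u_rc1_subject_id_0000322"
assert name.find("subject_id_0000133") == -1  # no match...
assert bool(name.find("subject_id_0000133"))  # ...but still truthy!

# The unambiguous membership test:
assert "subject_id_0000322" in name
assert "subject_id_0000133" not in name
```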

I also had an issue with normalizing and smoothing the functionals with DARTEL which required me to add the input: 

normalize_and_smooth_func.inputs.modulate = False

Otherwise, I had an issue where the normalized functionals were truncated. By default this is set to True, which is the preferred option for running VBM, setting it to False is the preferred option for fMRI (according to the SPM help when you fire up the DARTEL normalize to MNI module). 


Thanks!
Jason



Jason

Jun 3, 2014, 6:49:31 PM
to nipy...@googlegroups.com
Hi, 

I have reverted to the original way the example script was made, but simply added a static field, "runs", to infosource. That way, I can pass it to my subjectinfo function and always pull the timing information for both runs in the same order. I've pasted subjectinfo below-- you give it a subject number, a file name, and a list of runs ([2,3]) and it will subset the data accordingly, assuming the data is in "long" format, with a separate row for each unique event. 

The workflow runs without errors. However, I was checking the psyscript_level1design.m script to double-check how the model was set up, and I wanted to check what is being done with the timing. 

I see that both runs are concatenated, but there's a regressor "UR1" that specifies which volumes are from run 1. The timing information for the first run is accurate, but it looks like the timing information for the second run was transformed, and I wasn't sure how. The onsets have been shifted as if it were a continuation of a long run (i.e., the first onset of the second run, instead of being ~10 seconds is ~800). However, when I compare the timing between onsets, it's clear that they are not simply shifted in time-- the timing between events is sometimes similar, sometimes very different from the original. I'm pretty confident that the subjectinfo function is pulling out the correct timing information (it returns a list with 2 Bunches, 1 for each run).  
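If the onsets were being shifted by simple concatenation, the arithmetic would look like this (the TR and run length here are hypothetical numbers, chosen only to match the ~800 s shift described above):

```python
# Hypothetical numbers: if run 1 had 400 volumes at TR = 2.0 s, then
# concatenating runs would shift every run-2 onset by 400 * 2.0 = 800 s,
# so a first run-2 onset of ~10 s would appear at ~810 s.
TR = 2.0
n_scans_run1 = 400

run2_onsets = [10.0, 25.5, 47.0]
shifted = [t + n_scans_run1 * TR for t in run2_onsets]
print(shifted)  # [810.0, 825.5, 847.0]
```

Note that a pure shift of this kind preserves the intervals between events, which is not what the Excel comparison shows.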

I've attached an excel sheet comparing the raw timing information (for 1 event) to what was produced in spm.level1design. It's possible it's doing what it's supposed to, but I just don't know what's happening to the timing after it grabs it from subjectinfo. 


infosource = pe.Node(interface=util.IdentityInterface(fields=['subject_id','runs']), name="infosource")
infosource.inputs.runs = [2,3]


def subjectinfo(subject_id,datafile,runs):
    import pandas as pd
    import os
    from nipype.interfaces.base import Bunch
    temp = pd.read_csv(os.path.expanduser(datafile),sep='\t')
    temp = temp.convert_objects(convert_numeric=True)
    temp['name'] = temp['name'].astype(str)
    
    temp = temp[(temp.id==int(subject_id))]
    
    output = []

    for run_num in runs:

        subset = temp[(temp.run==run_num)]
    
        condnames=pd.unique(subset.name)
        
        onsets = []
        durations = []
        amplitudes = []
    
    
        for c in condnames:
            sub_subset = subset.loc[subset.name==c]
            
            onsets.append(sub_subset.onset.tolist())
            durations.append(sub_subset.duration.tolist())
            amplitudes.append(sub_subset.amplitude.tolist())
    
    
        run_output = Bunch(conditions = condnames,
                       onsets = onsets,
                       durations = durations,
                       amplitudes=amplitudes,
                       tmod=None,
                       pmod=None,
                       regressor_names=None,
                       regressors=None)

        output.append(run_output)

    return output

get_subject_info = pe.Node(name='get_subject_info',
               interface=util.Function(input_names=['subject_id','datafile','runs'],
                                  output_names=['subject_info'],
                                  function=subjectinfo))


data_file = pe.Node(interface=util.IdentityInterface(fields=['file']), name="data_file")
data_file.inputs.file = os.path.abspath('/path/to/timing/file.txt')


level1.connect([ (infosource, get_subject_info, [('subject_id','subject_id'),
                ('runs','runs')]),
                (data_file, get_subject_info, [('file','datafile')]),
                (get_subject_info,l1pipeline,[('subject_info', 'analysis.modelspec.subject_info')])
                ])


Thanks!
Jason
spm_dartel_timing_check.xlsx

Satrajit Ghosh

Jun 4, 2014, 1:08:53 AM
to nipy-user
hi jason,

make sure concatenate_runs is not set to True - you can leave it unset.

if you can paste your entire script as a gist (gist.github.com), i can take a look later today.

cheers,

satra
