[derivatives] Changing file naming scheme


Chris Gorgolewski

Jul 7, 2018, 11:04:49 PM
to bids-discussion
Issue: All derivatives are named with the following scheme:

<source_file>_<key1>-<value1>_<key2>-<value2>_<suffix>.<extension>

where <source_file> is already in a form of  

<key1>-<value1>_<key2>-<value2>_<suffix>

Example:
sub-01_T1w_brainmask.nii.gz

This breaks with the overarching rule in the main specification of each filename consisting of key/value pairs followed by a single suffix. The suffix from the source file breaks this pattern.


Proposed solution: turn the suffix from the source_file into a key/value pair with the key set to "srctype". For example:

sub-01_srctype-T1w_brainmask.nii.gz
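
The rule at stake can be checked mechanically. The sketch below is only an illustration, not part of any BIDS tool: it assumes chunks are separated by underscores, that a key-value pair is any chunk containing a hyphen, and that everything after the first dot is the extension. The function names are made up for this example.

```python
def keyless_chunks(filename):
    """Return the underscore-separated chunks that are not key-value pairs."""
    stem = filename.partition(".")[0]  # drop the extension, e.g. "nii.gz"
    return [chunk for chunk in stem.split("_") if "-" not in chunk]


def follows_main_rule(filename):
    """True when the name is key-value pairs followed by exactly one suffix."""
    stem = filename.partition(".")[0]
    chunks = stem.split("_")
    keyless = [i for i, chunk in enumerate(chunks) if "-" not in chunk]
    # The only keyless chunk allowed is the very last one (the suffix).
    return keyless == [len(chunks) - 1]


print(keyless_chunks("sub-01_T1w_brainmask.nii.gz"))             # current scheme
print(follows_main_rule("sub-01_T1w_brainmask.nii.gz"))          # current scheme
print(follows_main_rule("sub-01_srctype-T1w_brainmask.nii.gz"))  # proposed scheme
```

Under these assumptions the current derivative name has two keyless chunks and fails the check, while the "srctype" form passes.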


Please let me know what you think about this proposal.

Best,
Chris

Tal Yarkoni

Jul 7, 2018, 11:27:10 PM
to bids-di...@googlegroups.com

+1

--
You received this message because you are subscribed to the Google Groups "bids-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bids-discussi...@googlegroups.com.
To post to this group, send email to bids-di...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bids-discussion/CAAQzouOj-UiihgapSdHK0TseBR5XAJN0kpjf7vwCfDXawwKwNw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

James Kent

Jul 8, 2018, 12:15:20 AM
to bids-discussion
Is there a proper definition of what a suffix is (or should be) in BIDS? I've implicitly used it to indicate the type of data represented in the file. 
Just to turn your proposal on its head: what would be the implications of using something like drvtype instead of srctype, i.e. keeping the suffix the same as in the source data file and changing the type of derivative?

This configuration makes more sense to me in cases when the underlying file type remains the same, but has been cleaned or otherwise preprocessed. For example, the preproc output from fmriprep is still a "bold" file in my mind, but creating a statistical map from a source bold file changes what the file represents. 
In the former case (preproc) I would prefer drvtype, but for the latter case (statistical map) I would prefer srctype since the data the file is representing is different. 

If we unilaterally adopted either srctype or drvtype we would violate my implicit definition of a suffix.

sensible srctype:
sub-01_srctype-T1w_brainmask.nii.gz

less sensible srctype:
sub-01_srctype-bold_preproc.nii.gz


sensible drvtype:
sub-01_drvtype-preproc_bold.nii.gz

less sensible drvtype:
sub-01_drvtype-brainmask_T1w.nii.gz

Chris Gorgolewski

Jul 8, 2018, 12:25:43 AM
to bids-discussion
Interesting idea. It's kind of arbitrary which way we go, but I have to say that srctype feels more intuitive to me. Probably because the suffix stands out more so in a folder full of derivatives coming from the same source it is easier to find (visually - programmatically it makes no difference) the derivative you are looking for.

Best,
Chris



Henk-Jan Mutsaerts

Jul 8, 2018, 3:29:39 AM
to bids-discussion
This makes sense, the only difference being that the key "srctype" may not be needed in many instances where the value is self-explanatory, leading to unnecessarily long file names. Would it be an option to keep the key to 3 letters like "sub", e.g. "src"? Keeping keys in filenames to 3 letters would make their use in BIDS more feasible (thinking of future expansions).

BTW: I might be missing something, but why not keep the source file type where possible (e.g. anat in this example) and mark the "derivative" part with "der" for derivative?

One great feature of the BIDS naming convention is that when the directory structure is lost, it can be restored from the file names. "der" would make it easy to differentiate between raw data and derivatives.

My 5cts, thanks again for the nice work!


Thomas Nichols

Jul 8, 2018, 7:17:35 AM
to BIDS Discussion
+1 for making the keyword just "src". As long as we ensure "src" isn't used anywhere else in the (primary data) BIDS spec, the presence of a "src" keyword would unambiguously flag the file as a derivative.
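
A minimal sketch of that check, assuming "src" ends up reserved for derivatives (which this thread has not settled) and that key-value chunks are hyphenated and underscore-separated. The helper name is hypothetical.

```python
def has_key(filename, key="src"):
    """True when any key-value chunk of the filename uses the given key."""
    stem = filename.partition(".")[0]  # ignore the extension
    for chunk in stem.split("_"):
        chunk_key, separator, _ = chunk.partition("-")
        # Only hyphenated chunks are key-value pairs; compare the key part.
        if separator and chunk_key == key:
            return True
    return False


print(has_key("sub-01_src-T1w_brainmask.nii.gz"))  # derivative under this proposal
print(has_key("sub-01_T1w.nii.gz"))                # raw data
```

Matching on the key rather than on the substring "src" avoids false positives from values that happen to contain those letters.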




--
__________________________________________________________
Thomas Nichols, PhD
Professor of Neuroimaging Statistics
Nuffield Department of Population Health | University of Oxford
Big Data Institute | Li Ka Shing Centre for Health Information and Discovery
Old Road Campus | Headington | Oxford | OX3 7LF | United Kingdom
T: +44 1865 743590 | E: thomas....@bdi.ox.ac.uk
W: http://nisox.org | http://www.bdi.ox.ac.uk

Harms, Michael

Jul 8, 2018, 8:09:58 AM
to bids-di...@googlegroups.com

 

I’m not sure if I follow the proposal here.  Are you proposing

sub-01_srctype-T1w_brainmask_<key1>-<value1>_<key2>-<value2>_<suffix>.<extension>

 

If underscores separate <key-value> pairs, that still looks like you have two suffixes to me…

 

Cheers,

-MH

 

-- 

Michael Harms, Ph.D.

-----------------------------------------------------------

Associate Professor of Psychiatry

Washington University School of Medicine

Department of Psychiatry, Box 8134

660 South Euclid Ave.                        Tel: 314-747-6173

St. Louis, MO  63110                          Email: mha...@wustl.edu

 



Chris Gorgolewski

Jul 8, 2018, 11:36:35 AM
to bids-discussion, Henk-Jan Mutsaerts
@Henk-Jan Mutsaerts re: "der": this is analogous to what James Kent proposed; see my comments in my reply to him. BTW, as Tom mentioned, "src" would also make it possible to differentiate derivatives from raw data.

@Tom: I would have to check, but I don't think "src" is used in the main spec.

@MH: This is not what I was proposing, because it would indeed lead to two "keyless values". The proposal was:

<srckey1>-<srcvalue1>_<srckey2>-<srcvalue2>_srctype-<source_suffix>_<derkey1>-<dervalue1>_<derkey2>-<dervalue2>_<dersuffix>.<extension>
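
The pattern above can be sketched as a small builder. This is an illustration under stated assumptions, not an existing API: the function name is made up, the extension is simply carried over from the source file, and the derivative key "space" in the usage lines is only a placeholder for whatever derivative keys end up being defined.

```python
def derivative_name(source_filename, der_suffix, der_entities=None, src_key="srctype"):
    """Build <src pairs>_srctype-<source_suffix>_<der pairs>_<der_suffix>.<ext>."""
    stem, _, extension = source_filename.partition(".")
    *src_pairs, src_suffix = stem.split("_")
    if "-" in src_suffix:
        raise ValueError("source filename has no keyless suffix to convert")
    der_pairs = [f"{key}-{value}" for key, value in (der_entities or {}).items()]
    # The source suffix becomes a key-value pair; only der_suffix stays keyless.
    chunks = src_pairs + [f"{src_key}-{src_suffix}"] + der_pairs + [der_suffix]
    return "_".join(chunks) + "." + extension


print(derivative_name("sub-01_T1w.nii.gz", "brainmask"))
print(derivative_name("sub-01_task-rest_bold.nii.gz", "preproc", {"space": "MNI"}))
```

With the first call the result matches the example from the original post, sub-01_srctype-T1w_brainmask.nii.gz.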

Best,
Chris


Franco Pestilli

Jul 8, 2018, 12:32:41 PM
to bids-di...@googlegroups.com
This looks good to me.

Regards,
Franco


Mainak Jas

Jul 8, 2018, 11:46:37 PM
to bids-discussion
Maybe a silly question, but why not:

<srckey1>-<srcvalue1>_<srckey2>-<srcvalue2>_<derkey1>-<dervalue1>_<derkey2>-<dervalue2>_<source_suffix>.<extension>

Is there a specific reason source suffix and derivative suffix must be different?

Mainak


James Kent

Jul 9, 2018, 12:32:21 AM
to bids-discussion
Hi Chris,

I agree with your assessment that the suffix would stand out more visually with the srctype (or just src, I like that abbreviation) key. I cannot speculate as to whether drvtype versus srctype would lead to a higher frequency of sensible violations (given my implicit definition of the BIDS suffix), so I cannot rank the options based on that metric. So in the absence of some concise/programmatic rule that can accommodate the scenarios where the file type changes (from bold to a statistical map) or remains the same (from raw bold to processed bold), I vote for your original proposal of srctype (specifically the shortened key src).

My remaining question (still pertaining to this proposal) is how to handle 2nd order derivatives or more generally nth order derivatives. The output from fmriprep is still an intermediate step and further specification of a model is needed to achieve statistical maps or whatever the final output should be.

The two options I can see are to carry the src-<label> from the first order derivative, or to adopt the suffix of the first order derivative.

Or, if we want to go key crazy, we can add another optional key that represents the pipeline the output was generated from and src would still adopt the suffix of the first order derivative (or the n-1th order derivative). To complete the option space, I added a fourth option.

1. Keep the src-<label> from the first order derivative
(this example is pulling data from the fmriprep directory)
sub-01_task-rest_src-bold_zstat.nii.gz

2. Adopt the suffix of the first order derivative
(this example is pulling data from the fmriprep directory)
sub-01_task-rest_src-preproc_zstat.nii.gz

3. Key crazy option (preferred way)
sub-01_task-rest_pipe-fmriprep_src-preproc_zstat.nii.gz

4. Key crazy option (less preferred way)
sub-01_task-rest_pipe-fmriprep_src-bold_zstat.nii.gz

With the third option, while it wouldn't be immediately obvious that the zstat image was derived from bold data, you are given enough information to track the provenance of the file back to its roots, and this would work with derivatives to the nth degree. With the fourth option, you would always know the srctype from the BIDS dataset, but you wouldn't necessarily know what derivative you used in the previous pipeline (e.g. did I use preproc or confounds?). Context may make it obvious, but that may not be the case for every derivatives pipeline.
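
The four options can be compared side by side with a short sketch. It assumes the parent file is a first-order derivative that already follows the src scheme (here the hypothetical sub-01_task-rest_src-bold_preproc.nii.gz), and the function name and parameters are made up for illustration.

```python
def next_order_name(parent, new_suffix, carry_src=True, pipeline=None):
    """Name a derivative of `parent` under options 1 to 4 listed above."""
    stem, _, extension = parent.partition(".")
    *pairs, parent_suffix = stem.split("_")
    old_src = next((p for p in pairs if p.startswith("src-")), None)
    kept = [p for p in pairs if not p.startswith("src-")]
    if carry_src and old_src is not None:
        src = old_src                    # options 1 and 4: keep the original src
    else:
        src = f"src-{parent_suffix}"     # options 2 and 3: adopt the parent suffix
    pipe = [f"pipe-{pipeline}"] if pipeline else []
    return "_".join(kept + pipe + [src, new_suffix]) + "." + extension


parent = "sub-01_task-rest_src-bold_preproc.nii.gz"
print(next_order_name(parent, "zstat", carry_src=True))                        # option 1
print(next_order_name(parent, "zstat", carry_src=False))                       # option 2
print(next_order_name(parent, "zstat", carry_src=False, pipeline="fmriprep"))  # option 3
print(next_order_name(parent, "zstat", carry_src=True, pipeline="fmriprep"))   # option 4
```

Only options 2 and 3 keep working unchanged at the nth order, because they always look one step back rather than to the raw data.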

Best,
James

Michael Hanke

Jul 9, 2018, 1:36:05 AM
to The Brain Imaging Data Structure (BIDS) discussion
+1


Henk-Jan Mutsaerts

Jul 9, 2018, 2:41:22 AM
to bids-discussion
I would keep the complete provenance details in the sidecar, (was this already proposed?) and keep the file name short, for visual purposes, hence only the most recent processing step in the filename.

However, we could define a few single letter prefixes that come before the last suffix, for default processing steps, eg:

s = smoothed
r = resampled

But again the exact details of smoothing/resampling go into the sidecar provenance field.

Would this work?

yarikoptic

Jul 9, 2018, 10:19:19 AM
to bids-discussion
Keeping in mind my possibly unconventional philosophy that "a BIDS dataset is a derivative dataset" (i.e. derived from DICOMs using a specific conversion pipeline or manually by a human), and that a derivative dataset could itself be derived from another derivative dataset, I would vote to replace `_suffix` with an explicit `_key=value` pair that could be 'inherited' by the filenames of the derivative dataset (and its derivatives) as is, without any mutation. That is why IMHO the key name shouldn't contain `src`. So why not just `type`, e.g. `_type-bold`, `_type-T1w`? I do not think it is used in the BIDS spec yet.
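
As a rough sketch of what this would mean for existing names, the snippet below rewrites a trailing keyless suffix into a `type` pair. The key name `type` is only the suggestion made in this post, and the function is hypothetical.

```python
def suffix_to_type(filename, key="type"):
    """Rewrite a trailing keyless suffix as an explicit key-value pair."""
    stem, _, extension = filename.partition(".")
    *pairs, last = stem.split("_")
    if "-" in last:
        return filename  # already ends in a key-value pair, nothing to convert
    return "_".join(pairs + [f"{key}-{last}"]) + "." + extension


print(suffix_to_type("sub-01_task-rest_bold.nii.gz"))
print(suffix_to_type("sub-01_T1w.nii.gz"))
```

Because the converted name contains no keyless chunk, a derivative could append its own pairs without touching the inherited `type` pair.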

Robert Oostenveld

Jul 9, 2018, 10:34:00 AM
to bids-di...@googlegroups.com
This principle would work in cases where one file has multiple results (branching), but not when multiple files have a single result (merging), e.g. multimodal parcellations, or mapping MEG activity onto anatomy using source estimation techniques. Derivatives should allow for the merging pattern, so I wonder whether the full filename(s) of the source file(s) should go in the derived file name, or whether they should go in the sidecar metadata file (as a list in the JSON).

I do like the consistent key-value pairs in the filenames. 

Robert


Vince Calhoun

Jul 9, 2018, 10:36:34 AM
to bids-di...@googlegroups.com

Will this handle multiple sources/types?  E.g. if we do an analysis using multiple derivatives from multiple modalities?

 

VDC

 


yarikoptic

Jul 9, 2018, 10:37:55 AM
to bids-discussion
> I would keep the complete provenance details in the sidecar, (was this already proposed?) and keep the file name short, for visual purposes, hence only the most recent processing step in the filename

I would like to see that too -- file name lengths shouldn't grow indefinitely.  But  I guess it should be a separate topic of discussion on how we could "shortcut" or "alias" them.  I've started a new thread for that

yarikoptic

Jul 9, 2018, 10:43:53 AM
to bids-discussion


On Monday, July 9, 2018 at 10:36:34 AM UTC-4, vcalhoun wrote:
> Will this handle multiple sources/types?  E.g. if we do an analysis using multiple derivatives from multiple modalities?


Any incongruent "key" (or suffix, for that matter) should probably be dropped anyway in cases where there is no clear original "type" because the file comes from multiple types/modalities/etc. ATM the BIDS derivatives spec seems to rely heavily on having a <source_file> prefix, assuming "a single major source file". Maybe we should reserve a "source_files" field in the sidecar to point to multiple files whenever relevant? But IMHO that is also orthogonal to the original topic here (_suffix -> _key=value), and if it has not yet been discussed/settled upon, we should start a new thread.
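
To make the sidecar idea concrete, here is a minimal sketch of a JSON sidecar listing several source files for a merged derivative. The field name "source_files" is only the tentative name floated above, not an adopted BIDS field, and the file names and "Description" text are invented for illustration.

```python
import json


def build_sidecar(source_files, **metadata):
    """Collect metadata plus a list of every file the derivative was built from."""
    sidecar = dict(metadata)
    sidecar["source_files"] = list(source_files)  # hypothetical field name
    return sidecar


sidecar = build_sidecar(
    ["sub-01_T1w.nii.gz", "sub-01_task-rest_meg.fif"],
    Description="hypothetical MEG source estimate mapped onto anatomy",
)
print(json.dumps(sidecar, indent=2))
```

A list handles the merging case without lengthening the filename, at the cost that tools have to open the sidecar to recover the sources.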

Thomas Nichols

Jul 9, 2018, 10:50:36 AM
to BIDS Discussion
What about appending multiple key-less values at the end? 

<srckey1>-<srcvalue1>_<srckey2>-<srcvalue2>_<derkey1>-<dervalue1>_<derkey2>-<dervalue2>_<source_suffix>_<der_suffix>.<extension>

It breaks the current syntax but it is well defined and reads well, e.g.:

sub-01_T1w_brainmask.nii.gz

and better, IMHO, than sub-01_srctype-T1w_brainmask.nii.gz

Also, it scales to multiple successive derivatives.
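
Parsing such names stays well defined as long as all keyless values sit at the end. A minimal sketch under that assumption (the function name is made up):

```python
def split_suffix_chain(filename):
    """Split a name into (key-value pairs, trailing keyless values, extension)."""
    stem, _, extension = filename.partition(".")
    chunks = stem.split("_")
    suffixes = []
    # Peel keyless chunks off the end; they form the source/derivative suffix chain.
    while chunks and "-" not in chunks[-1]:
        suffixes.insert(0, chunks.pop())
    stray = [chunk for chunk in chunks if "-" not in chunk]
    if stray:
        raise ValueError(f"keyless value before a key-value pair: {stray}")
    entities = dict(chunk.split("-", 1) for chunk in chunks)
    return entities, suffixes, extension


print(split_suffix_chain("sub-01_T1w_brainmask.nii.gz"))
print(split_suffix_chain("sub-01_task-rest_bold_preproc_zstat.nii.gz"))
```

The chain grows by one entry per derivative step, which is what lets it scale to successive derivatives and also what makes the names longer.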

-Tom 



Henk-Jan Mutsaerts

Jul 9, 2018, 11:31:33 AM
to bids-discussion
+1

Best wishes/hartelijke groet,

 

 

Henk(-Jan) Mutsaerts, MD PhD

VUmc Amsterdam/AMC Amsterdam/UMCU Utrecht
UMCG Groningen/
Sunnybrook Toronto/RIT NY

Phone: +31 6 4390 8284; Skype: hj.mutsaerts



eric...@gmail.com

Jul 9, 2018, 11:31:49 AM
to bids-discussion
Hi All,

I'm late to the party, but I wanted to give folks a heads up that we will be making a tool in our lab to do filename mapping from the as-processed name to a derivative name as specified by the user in a JSON "mapping" file.  I don't think this changes the discussion, but it's more of a heads up that it will be easier to go from "what you have" to "what BIDS requires" in the next month or two.  I'll email the list on a new thread when we have a prototype on GitHub.

Inspired by an old BIDS Extension Proposal (BEP015):

~Eric

JB Poline

Jul 9, 2018, 11:59:19 AM
to The Brain Imaging Data Structure (BIDS) discussion
That sounds like a good idea. I guess we need to see how that would impact things in pybids and other packages that use the key-value pattern. The alternative of having a sidecar JSON file is interesting, but wouldn't we need unique strings? That may also be a little bit harder for applications to deal with.
cheers
JB


Harms, Michael

Jul 9, 2018, 12:00:15 PM
to bids-di...@googlegroups.com

 

+1

IMO, attempting to (effectively) encode provenance info as part of the file name is going to lead to some very long file names…

 


Tal Yarkoni

Jul 9, 2018, 12:20:21 PM
to bids-di...@googlegroups.com

One approach to this issue that seems reasonable to me is to say that any key-value pairs that apply to the file as it currently exists should be encoded in the filename, and any key-value pairs (or any other metadata) that applied to one or more source files used to generate the current derivative file should go in the JSON file as metadata.

Problem: handling it this way would seem to demand that a modality like "bold" or "t1w" should go in the filename, not in the sidecar, because it's pretty important to know that one is dealing with BOLD data, even if it also happens to be, say, "preproc". Personally I'd be pretty apprehensive about relegating "bold" to a JSON sidecar, as (a) it means one can no longer tell at a glance what kind of file one is looking at (because there's no guarantee a derivative file will have a comprehensible modality value), and (b) it means any BIDS-compliant software will necessarily have to inspect the sidecar on every access.

Suggestion: Inelegant as it may seem, there doesn't seem to be anything terribly wrong syntactically with allowing multiple values for modality just before the suffix. And as Tom pointed out above, it reads fairly well. It's also not clear to me that it's actually necessary to distinguish source modality from derivative modality in many/most cases. Consider the case of a preprocessed BOLD image: it would make sense to call this "...run-1_preproc_bold.nii.gz". Provided both modalities apply to the current file, one does not need to determine whether the bold part has been around since some predecessor file, or was newly introduced. BUT, in a case where the predecessor file *was* a BOLD image, but the present file is not, then "bold" would (optionally) go in the JSON sidecar. (We could also stipulate that new modalities must be inserted *before* any current ones, giving at least an implicit sense of the history of the file).
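
A sketch of that stipulation, assuming all keyless values trail the key-value pairs: new modalities are inserted before the existing ones, and modalities that no longer apply are dropped from the name (they could then be recorded in the sidecar instead). The function and its parameters are hypothetical.

```python
def derive(filename, add=None, drop=()):
    """Insert a new modality before the current ones and drop stale ones."""
    stem, _, extension = filename.partition(".")
    chunks = stem.split("_")
    # Index of the first keyless chunk; everything from there on is a modality.
    first_keyless = next(
        (i for i, chunk in enumerate(chunks) if "-" not in chunk), len(chunks)
    )
    pairs, modalities = chunks[:first_keyless], chunks[first_keyless:]
    modalities = [m for m in modalities if m not in drop]
    if add:
        modalities.insert(0, add)  # newest modality goes first
    return "_".join(pairs + modalities) + "." + extension


print(derive("sub-01_task-rest_run-1_bold.nii.gz", add="preproc"))
print(derive("sub-01_task-rest_run-1_preproc_bold.nii.gz", add="zstat", drop=("preproc", "bold")))
```

The first call keeps "bold" because it still applies to the preprocessed image; the second drops it because a statistical map is no longer BOLD data.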

Tal



Harms, Michael

Jul 9, 2018, 12:30:37 PM
to bids-di...@googlegroups.com

 

Is there a firm definition of what constitutes a “modality” vs a “suffix”?  You seem to be viewing them as distinct, but in the previous email chain “T1w” (for example) has been treated as a “suffix”…

 


Tal Yarkoni

Jul 9, 2018, 12:34:38 PM
to bids-di...@googlegroups.com

Section 8.3.2 of the spec defines valid modality labels for anatomical images; that's what I'm basing my usage on. But I do think we probably need to introduce an explicit "modality" definition at the beginning alongside all the other terms.

Harms, Michael

Jul 9, 2018, 1:07:40 PM
to bids-di...@googlegroups.com

 

Yes, and perhaps more explicitly distinguish between “modality” and “suffix” in the documentation. Currently, “suffix” is used to refer not only to modality labels, but also to optional key/label pairs (e.g., “_run-1”) and to additions like “_physio” and “_stim”.

 

There is also the odd allowance (IMO) for anatomy data to end with a “_defacemask” suffix (although not explicitly called a suffix), in which case the modality can be *optionally* specified with a “mod-<label>” key/value pair…

 

Cheers,

-MH

Thomas Nichols

Jul 9, 2018, 4:00:05 PM
to BIDS Discussion
> Consider the case of a preprocessed BOLD image: it would make sense to call this "...run-1_preproc_bold.nii.gz". Provided both modalities apply to the current file, one does not need to determine whether the bold part has been around since some predecessor file, or was newly introduced. BUT, in a case where the predecessor file *was* a BOLD image, but the present file is not, then "bold" would (optionally) go in the JSON sidecar. (We could also stipulate that new modalities must be inserted *before* any current ones, giving at least an implicit sense of the history of the file).

I was just going to second this, but I'm now struggling to come up with examples of 'bright lines' when you would vs. would not want to list the predecessor *suffix*.

Maybe we need to clarify that while all modalities are suffixes, not all suffixes are modalities; predecessor modalities need to be preserved while non-modalities don't need to be?

Also, and this is fun, note this from the spec, Sec 8.3.2:

> When there is only one scan of a given type the suffix MAY be omitted.

So, when we're all thinking about use cases, are we thinking about the cases where an image from an anat, func, or dwi directory has *no* suffix? (Is this something we want to revise, and make modality suffixes mandatory?)

-Tom
 

Tal Yarkoni

Jul 9, 2018, 4:14:51 PM
to bids-di...@googlegroups.com

On my reading of the relevant section, I think this is a misuse of the term "suffix", as the example is clearly referring to keywords like "run". It's basically saying that you don't need to explicitly add a run index if you only have a single run. That seems reasonable. I definitely don't think it's reasonable to make the suffix optional, but I think this is just a matter of fixing the language in the spec, not of changing the intended meaning.

With respect to the more general issue, my suggestion is that we always propagate *all* suffixes that remain applicable to the current file. I haven't thought about this at length, but all the cases I can think of seem pretty cut and dried. E.g., if the derivative file still has the modality "bold", "T1w", "physio", and so on, then you would keep that suffix. If the modality changes, you would drop it. It doesn't seem like there's much ambiguity. I can see there being some uncertainty down the line in the case of multiple chained derivatives; e.g., if you do some further processing of a "*_preproc_bold.nii.gz" image, you would definitely need to keep the "bold" suffix, but would you keep the "preproc" modality as well? I'm not sure. But I kind of feel like weird things are going to happen once we start talking about second-order derivatives anyway, so I'm not sure how much of a concern this is. We would have similar problems even if we didn't adopt the convention I'm proposing (e.g., how do we encode the various steps that constitute the full provenance of a file in the JSON sidecar?).


Harms, Michael

Jul 9, 2018, 4:35:42 PM
to bids-di...@googlegroups.com

 

I think it will be important that any proposal deal from the get-go with the issue of chained derivatives.

 

Along those lines, given the infinite number of ways to “preprocess” data, and chain together different preprocessing steps, is it meaningful to have a catch-all “suffix” that is simply “_preproc” ?

 

-- 

Michael Harms, Ph.D.

-----------------------------------------------------------

Associate Professor of Psychiatry

Washington University School of Medicine

Department of Psychiatry, Box 8134

660 South Euclid Ave.                        Tel: 314-747-6173

St. Louis, MO  63110                          Email: mha...@wustl.edu


Thomas Nichols

unread,
Jul 9, 2018, 4:37:26 PM7/9/18
to BIDS Discussion
On my reading of the relevant section, I think this is a misuse of the term "suffix", as the example is clearly referring to keywords like "run". It's basically saying that you don't need to explicitly add a run index if you only have a single run. That seems reasonable. I definitely don't think it's reasonable to make the suffix optional, but I think this is just a matter of fixing the language in the spec, not of changing the intended meaning.

Phew!  I've just proposed this change in the 1.1.1 draft:

If several scans of the same modality are acquired they MUST be indexed with a key-value pair [was: suffix]: _run-1, _run-2, _run-3 etc. (only integers are allowed as run labels). When there is only one scan of a given type the run key [was: suffix] MAY be omitted.
 
As a minor thing, I used "run key-value pair", but I see the "value" is always called "label".  It's a minor point, but I don't know if there can be any improved clarity between what's a "value" in a key-value pair vs. a label.

With respect to the more general issue, my suggestion is that we always propagate *all* suffixes that remain applicable to the current file. I haven't thought about this at length, but all the cases I can think of seem pretty cut and dried. E.g., if the derivative file still has the modality "bold", "T1w", "physio", and so on, then you would keep that suffix. If the modality changes, you would drop it. It doesn't seem like there's much ambiguity. I can see there being some uncertainty down the line in the case of multiple chained derivatives; e.g., if you do some further processing of a "*_preproc_bold.nii.gz" image, you would definitely need to keep the "bold" suffix, but would you keep the "preproc" modality as well? I'm not sure. But I kind of feel like weird things are going to happen once we start talking about second-order derivatives anyway, so I'm not sure how much of a concern this is. We would have similar problems even if we didn't adopt the convention I'm proposing (e.g., how do we encode the various steps that constitute the full provenance of a file in the JSON sidecar?).

So, just to be clear, you're proposing to keep the 'modality' suffix at the end of the name, adding on derived "pre-suffixes" before.  This makes sense to me, as if an image is mainly "bold" then that's the first thing you'll want to see at the end.

-Tom
 

Harms, Michael

unread,
Jul 9, 2018, 4:46:45 PM7/9/18
to bids-di...@googlegroups.com

 

Not sure if I’m a fan of reading backwards from the end of the file name to imply the history of the processing, esp. in the context of chained processing…

 

Thomas Close

unread,
Jul 9, 2018, 8:27:39 PM7/9/18
to bids-di...@googlegroups.com
Apologies if I missed mention of this earlier in the thread, as I have only skimmed it, but wouldn't you also want to have derivatives that are derived from more than one source image (e.g. freesurfer, which uses T2w for the pial surface)?

Although in this case having the source image(s) as kw pairs would definitely work better.



--
THOMAS G. CLOSE, PHD
Senior Informatics Officer

Monash Biomedical Imaging
Monash University
Room 139, 770 Blackburn Rd
Clayton Campus, Clayton VIC 3800
Australia

Chris Gorgolewski

unread,
Jul 15, 2018, 11:23:06 PM7/15/18
to bids-discussion
What a monster of a thread! This is great - it means that we have an engaged community with lots of opinions. Now we need to harness this energy in a constructive and structured way.

Some clarifications.
- Current draft of the BIDS Derivatives specification does not explicitly deal with files that have more than one main origin (however, @Thomas Close it does have specific provisions for freesurfer outputs)
- Current draft of the BIDS Derivatives specification does not explicitly deal with chaining derivatives or second order derivatives

The proposal I outlined in this thread was not intended to introduce those missing features - it was simply attempting to change the filename syntax for existing use cases (single-source primary derivatives). Those problems (missing features) are very real, but we need to be smart about tackling them. I would encourage people interested in those use cases (chaining derivatives and multi-source derivatives) to start new threads with concrete suggestions on how to change the current draft. This will allow us to have a focused conversation and tackle issues one at a time.

Now back to the topic. To summarize, the current spec has:

sub-01_T1w_brainmask.nii.gz
sub-01_task-rest_bold_variant-smoothed_preproc.nii.gz

which breaks with the convention of having only one keyless value - the suffix. Existing proposals:

1)  "srctype" key
sub-01_srctype-T1w_brainmask.nii.gz
sub-01_task-rest_srctype-bold_variant-smoothed_preproc.nii.gz

2) "src" key
sub-01_src-T1w_brainmask.nii.gz
sub-01_task-rest_src-bold_variant-smoothed_preproc.nii.gz 

3) "drvtype" key - flip things around
sub-01_drvtype-brainmask_T1w.nii.gz
sub-01_task-rest_drvtype-preproc_variant-smoothed_bold.nii.gz

4) remove the type and put it inside the JSON file
sub-01_brainmask.nii.gz
sub-01_brainmask.json
sub-01_task-rest_variant-smoothed_preproc.nii.gz
sub-01_task-rest_variant-smoothed_preproc.json

5) keep all keyless values at the end
sub-01_T1w_brainmask.nii.gz
sub-01_task-rest_variant-smoothed_bold_preproc.nii.gz 
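
To make the "one keyless value" rule concrete, here is a minimal Python sketch that splits each of the T1w examples above into key-value pairs and keyless values. The function name split_name and the parsing rule (a part containing a hyphen is a key-value pair, anything else is keyless) are assumptions made for illustration, not something defined by the spec or by any existing BIDS tool.

```python
def split_name(filename):
    """Split a BIDS-style filename into key-value pairs and keyless values.

    Assumption: a part that contains "-" is a key-value pair,
    any other part is a keyless value.
    """
    stem = filename.split(".", 1)[0]  # drop the extension(s), e.g. ".nii.gz"
    pairs, keyless = [], []
    for part in stem.split("_"):
        if "-" in part:
            key, value = part.split("-", 1)
            pairs.append((key, value))
        else:
            keyless.append(part)
    return pairs, keyless


# The T1w brain mask example under the current draft and each proposal.
examples = {
    "current": "sub-01_T1w_brainmask.nii.gz",
    "1": "sub-01_srctype-T1w_brainmask.nii.gz",
    "2": "sub-01_src-T1w_brainmask.nii.gz",
    "3": "sub-01_drvtype-brainmask_T1w.nii.gz",
    "4": "sub-01_brainmask.nii.gz",
    "5": "sub-01_T1w_brainmask.nii.gz",
}

# Number of keyless values per naming scheme.
keyless_counts = {label: len(split_name(name)[1]) for label, name in examples.items()}

for label, name in examples.items():
    print(label, name, split_name(name)[1])
```

Under these assumptions the current scheme and proposal 5 yield two keyless values, while proposals 1 to 4 yield exactly one.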

I personally like 2) the most since: 
- it is shorter than 1)
- makes finding (visually) the right derivative easier than 3)
- does not require reading sidecar JSON files like 4)
- keeps compatibility with the previous convention (in contrast to 5)

What do others think?

Best,
Chris



Thomas Nichols

unread,
Jul 16, 2018, 2:02:35 AM7/16/18
to BIDS Discussion
Hi folks,

I can’t say any of these proposals excite me, but I guess 4 and 5 seem the most elegant.  #4 keeps the file names shorter and, usually, the context will make it clear what the source is but can be checked in the JSON.  #5 offers the next shortest option and I find it very readable (but, of course, breaks current convention).

-Tom

PS: I assume this ship has sailed, but have we committed to 7 whole letters to spell out “variant”?  Seems excessive.

--
We are all colleagues working together to shape brain imaging for tomorrow, please be respectful, gracious, and patient with your fellow group members.

Chris Gorgolewski

unread,
Jul 16, 2018, 10:42:24 AM7/16/18
to bids-discussion
On Sun, Jul 15, 2018, 11:02 PM Thomas Nichols <thomas....@bdi.ox.ac.uk> wrote:
PS: I assume this ship has sailed, but have we committed to 7 whole letters to spell out “variant”?  Seems excessive.
Nothing is set in stone about BIDS Derivatives, but let's stay focused. If you would like to change something about the variant key please start a new thread with a proposal.


Chris Markiewicz

unread,
Jul 31, 2018, 8:46:54 AM7/31/18
to bids-discussion
Hi all,

My preference is #5, though it strikes me as equivalent to the status quo up to aesthetics. I don't see a parsing issue with accumulating a list of every keyless value, rather than insisting that all key-value pairs precede all keyless values. If this is the direction we go, I would suggest that we implement the parser so that datasets preprocessed under the existing draft spec can still be handled well.

I think #5 might be the closest we can get to correctly handling converging inputs in filenames: an unordered list of input types followed by the result type.
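
A rough Python sketch of what such a tolerant parser could look like: it accepts keyless values anywhere in the name (as in the existing draft) and rewrites the name with all keyless values at the end, in their original relative order (as in #5). The function name to_trailing_keyless is hypothetical, and the sketch assumes that keyless values never contain a hyphen and that the first "." starts the extension.

```python
def to_trailing_keyless(filename):
    """Move every keyless value behind the key-value pairs.

    The relative order of the key-value pairs and of the keyless
    values is preserved; only their grouping changes.
    """
    # Everything after the first "." is treated as the extension.
    stem, dot, extension = filename.partition(".")
    parts = stem.split("_")
    pairs = [part for part in parts if "-" in part]
    keyless = [part for part in parts if "-" not in part]
    return "_".join(pairs + keyless) + dot + extension


draft_name = "sub-01_task-rest_bold_variant-smoothed_preproc.nii.gz"
print(to_trailing_keyless(draft_name))
```

Names that already follow #5, such as sub-01_T1w_brainmask.nii.gz, pass through unchanged, so the same code path could serve datasets written under either convention.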

With regard to #4, we are getting close to forcing an ad hoc provenance spec. I vaguely recall that last time this discussion came up it was thought that starting to encode provenance was opening a can of worms, and that we should wait on other, non-BIDS efforts. Am I misremembering, or are we at the point where we really can't push it off any further?

Of the remaining options, #2 is preferable to #1 and #3.

Chris


Satrajit Ghosh

unread,
Jul 31, 2018, 9:03:13 AM7/31/18
to bids-di...@googlegroups.com
hi chris,

for the current discussion i like 5 as well.

regarding the provenance spec, i think we will be having a conversation soon about the intersection/union of nidm and bids. some of it will happen at the upcoming neuroinformatics conference and we will post back to the list. and then we will have a chance to do something during the derivatives meeting

my worry about ad hoc approaches to provenance is that they have never really helped beyond very limited use cases. nidm (nidm.nidash.org) is a more comprehensive effort but will require user-friendly tools to go along with the spec. principally, we really should be thinking of provenance in a side-car rather than in the naming convention. 

well there was this - my first ever project in grad school was very much along these lines - we called it "final_figure4.tif":


cheers,

satra



Auer, Tibor

unread,
Jul 31, 2018, 9:34:25 AM7/31/18
to bids-di...@googlegroups.com

Hi,

 

I agree with Satra regarding waiting for NIDM-BIDS union; however, I am not sure how far we are from there.

 

I would prefer #2 due to provenance and filename-length considerations. As long as src values are clear and unambiguous, they can provide an alias for the source, so that any derivative would refer only to its closest source(s) by the src key. It can also be used to indicate more than one source, which cannot be done with #5. E.g.:

1.

sub-01_src-T1w_brainmask.nii.gz

sub-01_task-rest_src-bold_variant-smoothed_processedfunctional.nii.gz (I would use a less ambiguous and fully written suffix than preproc)

2.

sub-01_task-rest_src-T1w_src-processedfunctional_mean.tsv.gz

Con.: What to do with multiple variants/versions of the same source? But that is something the (NIDM) provenance model could cover.
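
A small Python sketch of how a repeated src key could be read back from such a filename. Allowing the same key more than once is an assumption of this example (the main spec uses each key at most once per name), and parse_entities is a hypothetical helper, not an existing API.

```python
from collections import defaultdict


def parse_entities(filename):
    """Collect key-value pairs, allowing a key such as "src" to repeat."""
    stem = filename.split(".", 1)[0]  # drop the extension(s), e.g. ".tsv.gz"
    entities = defaultdict(list)
    suffixes = []
    for part in stem.split("_"):
        key, sep, value = part.partition("-")
        if sep:
            entities[key].append(value)  # repeated keys accumulate in order
        else:
            suffixes.append(part)
    return dict(entities), suffixes


entities, suffixes = parse_entities(
    "sub-01_task-rest_src-T1w_src-processedfunctional_mean.tsv.gz"
)
print(entities["src"], suffixes)
```

Every value becomes a list here, so single-valued keys such as sub and multi-valued keys such as src can be handled uniformly.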

 

#5 would be my second choice.

Con.1.: It looks less intuitive. Does variant-smoothed refer to the bold or to the preprocessed file (again, I would use a fully written suffix)? Contextually/semantically, this ambiguity can be resolved in perhaps most cases, because variant is not a usual/allowed(?) key for bold.

Con.2.: How to handle multiple sources? Some/most(?) cases (like the example above) may be straightforward, because one can see that the brainmask has been applied on the preprocessed, so

2.

sub-01_task-rest_variant-smoothed_bold_preprocessed_mean.tsv.gz

 

Vale,

Tibor

 

Auer, Tibor (Ph '99)

Henk-Jan Mutsaerts

unread,
Jul 31, 2018, 10:55:36 AM7/31/18
to bids-discussion
This seems to be a balance between filename length, readability, and provenance desires.

I +1 most of the others: either option 2 or 5. I regard it as important to be able to see the source in the filename, but this wouldn't require a specific "src" key for me. I would go for the shortest filenames possible without hurting the BIDS essentials (as Thom points out, preferring "var" rather than "variant"). Do we need the "src" prefix, since "T1w" or "BOLD" are names reserved for src (i.e. they cannot belong to another key)?

It would be great if provenance could be stored inside the JSON, this probably goes too far for the filename. However, we could use some default letters that allow for short names and overview:
e.g.

r = resampled
s = smooth
etc. which is already used by some software packages (e.g. SPM).

We could also be pragmatic and simply keep the last "operation" in the filename (e.g. r or s) and leave the earlier operations in the sidecar. 
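
As a sketch of that pragmatic split, assuming the operations are available as an ordered list: the last operation becomes the label used in the filename and the earlier ones go into a dictionary that could be written to the sidecar. The field name "EarlierOperations" and the function split_provenance are made up for this example and are not part of any BIDS or NIDM specification.

```python
import json


def split_provenance(operations):
    """Return (filename label, sidecar dict) for an ordered list of operations."""
    if not operations:
        raise ValueError("at least one operation is required")
    *earlier, last = operations
    # "EarlierOperations" is a hypothetical sidecar field name.
    sidecar = {"EarlierOperations": earlier}
    return last, sidecar


label, sidecar = split_provenance(["resampled", "smooth"])
print(label)
print(json.dumps(sidecar))
```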


Best wishes/hartelijke groet,

 

 

Henk(-Jan) Mutsaerts, MD PhD

VUmc Amsterdam/AMC Amsterdam/UMCU Utrecht
UMCG Groningen/
Sunnybrook Toronto/RIT NY

Phone: +31 6 4390 8284; Skype: hj.mutsaerts


Chris Gorgolewski

unread,
Jul 31, 2018, 11:10:32 AM7/31/18
to bids-discussion
On Tue, Jul 31, 2018 at 7:55 AM Henk-Jan Mutsaerts <henkjanm...@gmail.com> wrote:
I would go for the shortest filenames possible without hurting the BIDS essentials (as Thom points out, preferring "var" rather than "variant").
If you feel strongly about making this change please start a new thread for clarity of discussion.

Do we need the "src" prefix, since "T1w", or "BOLD" are names reserved for src (e.g. cannot belong to another key)?
New keywords will be added in the future, which might lead to conflicts. However, my main point (which motivated this thread) was that in the main BIDS spec each file has only one "keyless value". In derivatives (and proposal 5) there will be two.

It would be great if provenance could be stored inside the JSON, this probably goes too far for the filename. However, we could use some default letters that allow for short names and overview:
e.g.

r = resampled
s = smooth
etc. which is already used by some software packages (e.g. SPM).

We could also be pragmatic and simply keep the last "operation" in the filename (e.g. r or s) and leave the earlier operations in the sidecar. 
If you feel strongly about making this change please start a new thread for clarity of discussion. Sorry for being annoying, but we need to keep the conversations focused.

 

Auer, Tibor

unread,
Jul 31, 2018, 11:39:55 AM7/31/18
to bids-di...@googlegroups.com

+1 for var

 

I am not sure whether proposal 5 limits the number of ‘keyless values’. Or do we consider keeping only the last ‘keyless value’ of the source, i.e. omitting T1w for the second order derivative in the example?

 

I am not sure how scalable the ‘letter-based provenance’ is. There are certainly more operations than letters, and there are multiple tools that perform the same operation.

I would keep the var(iant) values to store free-text labels (like acq for raw data). I think they can help to distinguish different flavours/variants of the same data even within the same operation (e.g. stronglysmoothed/lightlysmoothed, normalisedTal/normalisedMNI).

 

Vale,

Tibor

 

Auer, Tibor (Ph '99)

 
