[BEP] layout file for derivatives


Satrajit Ghosh

Apr 10, 2017, 10:51:03 AM
to bids-di...@googlegroups.com
hi folks,

just like bids has a structure that could be represented with a schema or a layout file (see BIDSLayout here: https://github.com/INCF/pybids/blob/master/bids/grabbids/config/bids.json), we can scale this architecture for derivatives, while maintaining a dictionary for commonly used terms (e.g., sub, ses, etc.).

i would propose that the author of any app or process creating a bids derivative optionally provide a layout.json file at the root of the derivative structure (/derivatives/{derivative_name}/layout.json).

the paths in the layout file should be interpreted relative to derivatives/{derivative_name}.
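as a rough sketch (purely illustrative - the pipeline name, entity names, and patterns below are made up, and i'm assuming the grabbids-style config format linked above), such a layout.json could look like what this snippet writes out:

import json
import os

# illustrative layout for a hypothetical derivative; the "entities" list
# mirrors the name/pattern structure used by the pybids/grabbids bids.json
layout = {
    "name": "my_pipeline",
    "entities": [
        {"name": "subject", "pattern": "sub-([a-zA-Z0-9]+)"},
        {"name": "session", "pattern": "ses-([a-zA-Z0-9]+)"},
        # a pipeline-specific key, which would have to be registered in the
        # shared bids dictionary before use
        {"name": "hemi", "pattern": "hemi-([LR])"},
    ],
}

os.makedirs("derivatives/my_pipeline", exist_ok=True)
with open("derivatives/my_pipeline/layout.json", "w") as f:
    json.dump(layout, f, indent=2)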

cheers,

satra

Chris Gorgolewski

Apr 10, 2017, 11:43:37 AM
to bids-discussion
Hi,

A couple of questions:
- How would this play with BIDS Derivatives? Would there be an intrinsic default layout people would extend?
- Do you want to control the key names in this layout?

Best,
Chris


Michael Hanke

Apr 10, 2017, 2:25:18 PM
to bids-di...@googlegroups.com, fran...@indiana.edu
Hi,

On Mon, Apr 10, 2017 at 08:43:15AM -0700, Chris Gorgolewski wrote:
> Hi,
>
> A couple of questions:
> - How would this play with BIDS Derivatives
> <https://docs.google.com/document/d/1Wwc4A6Mow4ZPPszDIWfCUCRNstn7d_zzaWPcfcHmgI4/edit>?

Quick question: Has there been any change to the concept of
incorporating derivatives inside a dataset (in a derivatives/ folder)?

I have previously expressed my concern that this brings a range of
practical problems, e.g.:

- a shared namespace is established: who gets to decide whose derivative
of a particular kind can take up a slot?

- dataset hosting and update bottleneck: any new derivative requires an
update of the original dataset. Who gets to decide when it is worth
it? What is the take of data portal maintainers on this? Requests like
"I made A from B, please update dsXXX with it" are a likely
consequence.

- distribution efficiency: a growing blob of original data with
incorporated derivatives becomes increasingly more difficult to
handle, and the likelihood of anyone needing all pieces goes down at
the same time

- rights: who decides what licence derivatives are under when they go
into an original dataset? For example, "my" original data is CC0.
Nothing prevents people from producing "all rights reserved" works of
wonder from them. However, I will be up in arms if anyone manages to
incorporate this kind of thing into a dataset, at the location
where I published the original CC0 dataset -- which would then maybe
get its licence amended and the author list extended by a new friend.
On the other side of the spectrum would be some kind of GPL-like viral
behavior, where derivatives have to share the license of the original
-- which is equally bad, although in a totally different way.

I recently had a discussion about this design with Franco Pestilli
(CC'ed), and it appears that I am not the only one with these kinds of
thoughts.

My suggestion was, and is, to have one dataset per derivative, and only
use a more or less convenient reference to the original dataset (at
least a README, but preferably a subdataset relationship like
datalad provides).

https://github.com/psychoinformatics-de/studyforrest-data-aligned/tree/master/src

The above link shows how one derivative dataset (spatially normalized
data for the studyforrest dataset) references its input/raw/original
datasets. In the studyforrest project we organize data in this way (one
study/acquisition == one dataset, one derivative/analysis == one
dataset). With the growing list of such datasets we are able to flexibly
combine intermediate datasets as source for new derivatives, without
feeling the pain of never ending updates to our foundation datasets --
something that we started with but had to abandon for manpower reasons
at a very early stage of the project.

Best,

Michael

Chris Gorgolewski

Apr 10, 2017, 2:47:10 PM
to bids-discussion, fran...@indiana.edu
Hi Michael and Franco!

On Mon, Apr 10, 2017 at 11:25 AM, Michael Hanke <michae...@gmail.com> wrote:
Hi,

On Mon, Apr 10, 2017 at 08:43:15AM -0700, Chris Gorgolewski wrote:
> Hi,
>
> A couple of questions:
> - How would this play with BIDS Derivatives
> <https://docs.google.com/document/d/1Wwc4A6Mow4ZPPszDIWfCUCRNstn7d_zzaWPcfcHmgI4/edit>?

Quick question: Has there been any change to the concept of
incorporating derivatives inside a dataset (in a derivatives/ folder)?
Yes! A while ago we added a <pipeline_name> folder level. Quoting from the draft of the spec:

All of the following data types go under derivatives/ subfolder in the root of the dataset folder to make a clear distinction between raw data and results of data processing.
Each pipeline has a dedicated folder under which it stores all of its outputs. For example:
<dataset>/derivatives/fmri_preprocess/sub-0001
<dataset>/derivatives/spm/sub-0001
<dataset>/derivatives/vbm/sub-0001

This simple change should solve the problems you outlined - let me go into the details below.

I have previously expressed my concern that this brings a range of
practical problems, e.g.:

- a shared namespace is established: who gets to decide whose derivative
  of a particular kind can take up a slot?
There is no shared namespace anymore - each pipeline has its own subfolder.

- dataset hosting and update bottleneck: any new derivative requires an
  update of the original dataset. Who gets to decide when it is worth
  it? What is the take of data portal maintainers on this? Requests like
  "I made A from B, please update dsXXX with it" are a likely
  consequence.
You can add derivative packages to existing datasets since each new derivative set will have its own subfolder.
 
- distribution efficiency: a growing blob of original data with
  incorporated derivatives becomes increasingly more difficult to
  handle, and the likelihood of anyone needing all pieces goes down at
  the same time
You can create separate compressed archives that include only derivatives from one pipeline. Those can be downloaded independently from the raw data.
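As a sketch (the dataset and pipeline names are made up), something along these lines would produce a per-pipeline archive that can be distributed on its own:

import tarfile

# package only one pipeline's derivatives for independent distribution
with tarfile.open("ds000xxx_fmriprep_derivatives.tar.gz", "w:gz") as tar:
    tar.add("ds000xxx/derivatives/fmriprep", arcname="derivatives/fmriprep")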

- rights: who decides what licence derivatives are under when they go
  into an original dataset? For example, "my" original data is CC0.
  Nothing prevents people from producing "all rights reserved" works of
  wonder from them. However, I will be up in arms if anyone manages to
  incorporate this kind of thing into a dataset, at the location
  where I published the original CC0 dataset -- which would then maybe
  get its licence amended and the author list extended by a new friend.
  On the other side of the spectrum would be some kind of GPL-like viral
  behavior, where derivatives have to share the license of the original
  -- which is equally bad, although in a totally different way.
We have not included this before, but I just added license info to the <dataset>/derivatives/<pipeline_name>/pipeline_description.json
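Roughly along these lines - treat the field names (other than the file location) as placeholders, since they are still up for discussion in the draft:

import json
import os

# sketch of license (plus a pointer back to the source data) in
# pipeline_description.json; only the file location comes from the draft
description = {
    "Name": "fmri_preprocess",
    "Version": "1.0.0",
    "License": "PDDL",
    "SourceDataset": "https://example.org/ds000xxx",
}

os.makedirs("derivatives/fmri_preprocess", exist_ok=True)
with open("derivatives/fmri_preprocess/pipeline_description.json", "w") as f:
    json.dump(description, f, indent=2)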
 
I recently had a discussion about this design with Franco Pestilli
(CC'ed), and it appears that I am not the only one with these kinds of
thoughts.

My suggestion was, and is, to have one dataset per derivative, and only
use a more or less convenient reference to the original dataset (at
least a README, but preferably a subdataset relationship like
datalad provides).

https://github.com/psychoinformatics-de/studyforrest-data-aligned/tree/master/src

The above link shows how one derivative dataset (spatially normalized
data for the studyforrest dataset) references its input/raw/original
datasets. In the studyforrest project we organize data in this way (one
study/acquisition == one dataset, one derivative/analysis == one
dataset). With the growing list of such datasets we are able to flexibly
combine intermediate datasets as source for new derivatives, without
feeling the pain of never ending updates to our foundation datasets --
something that we started with but had to abandon for manpower reasons
at a very early stage of the project.
Nothing prevents the <pipeline_name> subfolders from being treated as independent datasets - as long as there is a way for people to link them to the source. That link could be included in pipeline_description.json, but I expect only the most tech-savvy neuroimagers to use it. In most cases I expect people to store the raw data on hard drives, without the will to upload the data online or the ability to generate a permanent URL that could be included in the description of the derivatives folder.

I hope this helps (and solves most of the problems)!

Best,
Chris


Satrajit Ghosh

Apr 10, 2017, 5:55:25 PM
to bids-di...@googlegroups.com
hi chris,

- How would this play with BIDS Derivatives? Would there be an intrinsic default layout people would extend?

the list of possible derivatives and pipelines is, well, close to infinite. therefore, if layout.json is not provided, it could default to a basic layout:

/derivatives/pipeline_name/ses-X/sub-Y (if you are still doing things at the ses/sub level)

or simply

/derivatives/pipeline_name/... (for any non ses/subj level)

things inside these levels can be represented in whatever form the author of the pipeline desires. i'm not confident there are that many derivatives that can be grouped. for example, "registered to MNI152" in one pipeline can mean something quite different from another pipeline if different registration algorithms are used. at that point you would be looking at full provenance.
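as a quick sketch of that fallback behavior (the function and the pipeline name are just illustrative):

import os

def default_derivative_path(pipeline_name, ses=None, sub=None):
    # default location when no layout.json is provided
    parts = ["derivatives", pipeline_name]
    if ses is not None:
        parts.append("ses-%s" % ses)
    if sub is not None:
        parts.append("sub-%s" % sub)
    return os.path.join(*parts)

print(default_derivative_path("fmri_preprocess", ses="01", sub="001"))
# -> derivatives/fmri_preprocess/ses-01/sub-001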
 
- Do you want to control the key names in this layout?

the key names will be controlled by the following rules:

- you cannot take an existing bids key to mean something else, i.e., the intent of bids keys is preserved throughout the structure
- if you want to create a new key that is not already defined, send a request for the key to be included in the bids dictionary.

the issue with keys is going to be how you prevent accidental use via a validator. that would likely require a registry, etc. happy to elaborate on all the possible solutions here :)
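for example, a validator check could be as simple as this (the registry below is made up purely for illustration):

import re

# made-up registry standing in for a shared bids key dictionary
REGISTERED_KEYS = {"sub", "ses", "task", "run", "acq", "space"}

def unknown_keys(filename):
    # pull out every key-<value> entity and report keys not in the registry
    keys = set(re.findall(r"([a-zA-Z]+)-[a-zA-Z0-9]+", filename))
    return keys - REGISTERED_KEYS

print(unknown_keys("sub-01_ses-02_myfancykey-3_bold.nii.gz"))
# -> {'myfancykey'}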

cheers,

satra



Chris Gorgolewski

Apr 10, 2017, 9:46:25 PM
to bids-discussion
Seems useful! Would you like to add a comprehensive paragraph with an example about it to the BIDS Derivatives spec?

Best,
Chris

Michael Hanke

Apr 11, 2017, 1:07:36 AM
to bids-di...@googlegroups.com, fran...@indiana.edu
Hi,

thanks for the explanations, please see my response below.

On Mon, Apr 10, 2017 at 11:46:48AM -0700, Chris Gorgolewski wrote:
> On Mon, Apr 10, 2017 at 11:25 AM, Michael Hanke <michae...@gmail.com>
> > - a shared namespace is established: who gets to decide whose derivative
> > of a particular kind can take up a slot?
> >
> There is not shared namespace anymore - each pipeline has its own subfolder.

Indeed the namespace conflict is limited to the pipeline names. Do you
envision it to become common practice to encode a person/lab's name in
the pipeline name?

> - dataset hosting and update bottleneck: any new derivative requires an
> > update of the original dataset. Who gets to decide when it is worth
> > it? What is the take of data portal maintainers on this? Requests like
> > "I made A from B, please update dsXXX with it" are a likely
> > consequence.
> >
> You can add derivative packages to existing datasets since each new
> derivative set will have its own subfolder.

True. However, I am still struggling to see how to apply this design in
the general case. Other examples from our current practice are
derivatives that:

- need other derivatives as input in addition to raw data. I assume the
pipeline folder names are persistent with respect to updates, and pipeline
code will reference data inputs via a ../<derivpath> relative path.

- need more than one raw dataset as input.

This one I have no idea how to frame in the spirit of the current
spec. The new derivative has no exclusive parent-relation to either of
the two raw datasets. Where would it go?

You pointed out the derivatives are associated "by proximity" (i.e.
have to be extracted into the right dataset directory). So I guess I
could extract them into both input raw datasets. However, that still
wouldn't make the code that produced the derivative run in either
location without having to go beyond the root directory of a raw
dataset.

> You can create separate compressed archives that include only derivatives
> from one pipeline. Those can be downloaded independently from the raw data.
> <snip>
> We have not included this before, but I just added license info to the
> <dataset>/derivatives/<pipeline_name>/pipeline_description.json

Yes, that makes sense -- I see now that derivatives are considered stand-alone.

Thanks,

Michael

Chris Gorgolewski

Apr 11, 2017, 1:25:40 AM
to bids-discussion, fran...@indiana.edu
On Mon, Apr 10, 2017 at 10:07 PM, Michael Hanke <michae...@ovgu.de> wrote:
Hi,

thanks for the explanations, please see my response below.

On Mon, Apr 10, 2017 at 11:46:48AM -0700, Chris Gorgolewski wrote:
> On Mon, Apr 10, 2017 at 11:25 AM, Michael Hanke <michae...@gmail.com>
> > - a shared namespace is established: who gets to decide whose derivative
> >   of a particular kind can take up a slot?
> >
> There is not shared namespace anymore - each pipeline has its own subfolder.

Indeed the namespace conflict is limited to the pipeline names. Do you
envision it to become common practice to encode a person/lab's name in
the pipeline name?
This seems reasonable. Another case would be different versions of the same pipeline.
 
> - dataset hosting and update bottleneck: any new derivative requires an
> >   update of the original dataset. Who gets to decide when it is worth
> >   it? What is the take of data portal maintainers on this? Requests like
> >   "I made A from B, please update dsXXX with it" are a likely
> >   consequence.
> >
> You can add derivative packages to existing datasets since each new
> derivative set will have its own subfolder.

True. However, I am still struggling to see how to apply this design in
the general case. Another example from our current practice are
derivatives that:

- need other derivatives as input in addition to raw data. I assume the
  pipeline folder names are persistent with respect to updates, and pipeline
  code will reference data inputs via a ../<derivpath> relative path.
If a piece of software needs derivatives from multiple pipelines, it can access their respective subfolders under /derivatives/pipeline1 and /derivatives/pipeline2. This seems pretty straightforward unless I am missing something (I don't quite understand what you mean by "derivatives needing derivatives").

- need more than one raw dataset as input.

  This one I have no idea how to frame in the spirit of the current
  spec. The new derivative has no exclusive parent-relation to either of
  the two raw datasets. Where would it go?

  You pointed out the derivatives are associated "by proximity" (i.e.
  have to be extracted into the right dataset directory). So I guess I
  could extract them into both input raw datasets. However, that still
  wouldn't make the code that produced the derivative run in either
  location without having to go beyond the root directory of a raw
  dataset.
Even though this is a valid case (again - I don't quite understand what "derivatives needing derivatives" means, so I assume you meant software needing multiple datasets and producing one derivative), it does strike me as being closer to the 20 than the 80 percent of use cases. I don't have a good solution to this (especially one that would not require datasets to have permanent URLs). Reading between the lines of your particular use case, I do think it would be easier for consumers of studyforrest if it was a single multi-session dataset (which coincidentally eliminates this use case). Having T1ws in one dataset and the corresponding bold files in another makes it impossible for most BIDS Apps to analyze. Just something to think about :)

Best,
Chris



Michael Hanke

Apr 11, 2017, 2:13:25 AM
to bids-di...@googlegroups.com, fran...@indiana.edu
On Mon, Apr 10, 2017 at 10:25:18PM -0700, Chris Gorgolewski wrote:
> > True. However, I am still struggling to see how to apply this design in
> > the general case. Another example from our current practice are
> > derivatives that:
> >
> > - need other derivatives as input in addition to raw data. I assume the
> > pipeline folder names are persistent with respect to updates, and
> > pipeline
> > code will reference data inputs via a ../<derivpath> relative path.
> >
> If a piece of software needs derivatives from multiple pipelines they can
> access their respective subfolders under /derivatives/pipeline1 and
> /derivatives/pipeline2. This seems pretty straightforward unless I am
> missing something (I don't quite understand what you mean by
> "derivatives needing derivatives").

You got it right. "derivatives needing derivatives" means the
output of one "pipeline" is required as input to another pipeline.

> - need more than one raw dataset as input.
> >
> > This one I have no idea how to frame in the spirit of the current
> > spec. The new derivative has no exclusive parent-relation to either of
> > the two raw datasets. Where would it go?
> >
> > You pointed out the derivatives are associated "by proximity" (i.e.
> > have to be extracted into the right dataset directory). So I guess I
> > could extract them into both input raw datasets. However, that still
> > wouldn't make the code that produced the derivative run in either
> > location without having to go beyond the root directory of a raw
> > dataset.
> >
> Even though this is a valid case (again - I don't quite understand what
> "derivatives needing derivatives" means, so I assume you meant software
> needing multiple datasets and producing one derivative), it does strike
> me as being closer to the 20 than the 80 percent of use cases. I don't
> have a good solution to this (especially one that would not require
> datasets to have permanent URLs). Reading between the lines of your
> particular use case, I do think it would be easier for consumers of
> studyforrest if it was a single multi-session dataset (which
> coincidentally eliminates this use case). Having T1ws in one dataset and
> the corresponding bold files in another makes it impossible for most
> BIDS Apps to analyze. Just something to think about :)

This is one way to think about it. However, I don't think it scales.
What if I am investigating some network model and I am pooling over a
range of datasets from the functional connectomes initiative? Wouldn't
your proposal mean that I have to restructure their work into a single
pretend multi-session raw dataset? That seems like adjusting the problem
to fit the solution ;-)

I still don't see the particular advantage of having derivative datasets
contained in a particular raw dataset. I think the proposal would work in
exactly the same spirit if derivative datasets "contained" their
inputs.

The current proposal uses an implicit root path for the input raw dataset
(only a single one is possible) of ../.. from the perspective of a
derivative, and ../<name of derivative> for other derivatives.

My proposal changes this so that everything is always referenced from the
perspective of the derivative dataset:

src/<name of the input dataset>

("src" could also be "inputs" or similar)

This is the only difference as far as I can see. Partial tarballs to be
extracted into some directory relative to a reference still work. Code
can rely on relative paths for an arbitrary number of input datasets.
Namespace conflicts are constrained to a single folder.
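To make the path difference concrete (the dataset and pipeline names below
are purely illustrative):

import os.path as op

# current draft: a derivative lives inside its (single) raw dataset
here = "ds000xxx/derivatives/mypipeline"
raw = op.normpath(op.join(here, "..", ".."))           # ds000xxx
other = op.normpath(op.join(here, "..", "otherpipe"))  # ds000xxx/derivatives/otherpipe

# my proposal: the derivative dataset references all of its inputs under src/,
# whether they are raw datasets or other derivative datasets
here = "mypipeline"
raw_inputs = [op.join(here, "src", d) for d in ("ds000113", "ds000114")]
other = op.join(here, "src", "otherpipe")              # mypipeline/src/otherpipe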


Michael

Guillaume Flandin

Apr 12, 2017, 6:39:14 AM
to bids-di...@googlegroups.com
Hi Satra,

Before focusing on BIDS Derivatives, if we go down this route, could the BIDS layout file become an official part of the specs (structure of the JSON file, etc)? Unless I miss something, the file you point to defines a number of entities but is far from exhaustively describing the entire BIDS layout.

Thanks,
Guillaume.


Chris Gorgolewski

Apr 12, 2017, 11:21:40 AM
to bids-discussion, fran...@indiana.edu
Just to clarify. You envision a structure like this:

/src/ #raw data
     sub-01
     sub-02
/fmriprep/ # derivative 1
          sub-01
          sub-02
/freesurfer/ # derivative 2
            sub-01
            sub-02

This is an interesting idea and it can work. It's slightly counter-intuitive since in the beginning (when you only have raw data - no derivatives) you need to add an extra directory on top to accommodate the derivatives that will come in the future.

I am not sure, however, how this would solve the issue of derivatives coming from multiple datasets. Would you add an extra level under /src/? Like /src/ds000113b and /src/ds000113d.

Best,
Chris

Chris Gorgolewski

Apr 12, 2017, 11:27:42 AM
to bids-discussion
On Wed, Apr 12, 2017 at 3:39 AM, Guillaume Flandin <guillaum...@gmail.com> wrote:
Hi Satra,

Before focusing on BIDS Derivatives, if we go down this route, could the BIDS layout file become an official part of the specs (structure of the JSON file, etc)? Unless I miss something, the file you point to defines a number of entities but is far from exhaustively describing the entire BIDS layout.
Technically this file has been a formal representation of the BIDS Spec used internally in pybids. In current BIDS (not BIDS Derivatives) it would be redundant to include this file with each dataset since it would always be the same. Am I missing something?

Best,
Chris

 



Guillaume Flandin

Apr 12, 2017, 11:56:10 AM
to bids-di...@googlegroups.com
Hi Chris,

On 12 April 2017 at 16:27, Chris Gorgolewski <krzysztof....@gmail.com> wrote:
On Wed, Apr 12, 2017 at 3:39 AM, Guillaume Flandin <guillaum...@gmail.com> wrote:
Hi Satra,

Before focusing on BIDS Derivatives, if we go down this route, could the BIDS layout file become an official part of the specs (structure of the JSON file, etc)? Unless I miss something, the file you point to defines a number of entities but is far from exhaustively describing the entire BIDS layout.
Technically this file has been a formal representation of the BIDS Spec used internally in pybids. In current BIDS  (not BIDS Derivatives) it would be redundant to include this file with each dataset since it would always be the same. Am I missing something?

Sorry if I was unclear: I meant to have it part of the BIDS specification document, not included in each dataset. And if this format ends up being recommended to be used in BIDS Derivatives, you will anyway have to describe it formally - unless you consider that pybids is part of the specs.

Best,
Guillaume.

 

Chris Gorgolewski

Apr 12, 2017, 12:00:52 PM
to bids-discussion
On Wed, Apr 12, 2017 at 8:56 AM, Guillaume Flandin <guillaum...@gmail.com> wrote:
Hi Chris,


On 12 April 2017 at 16:27, Chris Gorgolewski <krzysztof.gorgolewski@gmail.com> wrote:
On Wed, Apr 12, 2017 at 3:39 AM, Guillaume Flandin <guillaum...@gmail.com> wrote:
Hi Satra,

Before focusing on BIDS Derivatives, if we go down this route, could the BIDS layout file become an official part of the specs (structure of the JSON file, etc)? Unless I miss something, the file you point to defines a number of entities but is far from exhaustively describing the entire BIDS layout.
Technically this file has been a formal representation of the BIDS Spec used internally in pybids. In current BIDS  (not BIDS Derivatives) it would be redundant to include this file with each dataset since it would always be the same. Am I missing something?

Sorry if I was unclear: I meant to have it part of the BIDS specification document, not included in each dataset. And if this format ends up being recommended to be used in BIDS Derivatives, you will anyway have to describe it formally
Got ya. I totally agree - that's why I asked Satra to add a description to the BIDS Derivatives document.
 
- unless you consider that pybids is part of the specs.
Nope - sorry if I gave that impression.

Best,
Chris

JB Poline

Apr 12, 2017, 3:05:03 PM
to The Brain Imaging Data Structure (BIDS) discussion, Pestilli, Franco
Hi,

I must say I kind of like the idea. It sounds like eventually we will need tags rather than a fixed hierarchical description, such that you could have 'mypipeline/inputs/{list of ds????}' but also 'ds????/mypipeline'.

cheers
JB


Alejandro De La Vega

Apr 12, 2017, 9:08:07 PM
to bids-discussion
I like JB's suggestion -- I think allowing some flexibility here is important, as there is a wide range of use cases. If both derivatives and original datasets are self-contained, is it too confusing to have both be acceptable organizations?
It seems that for the average user, a dataset-centric viewpoint is more intuitive, and I agree with Chris that 80% (or more) of use cases are likely to be single-dataset.
(Funnily enough, when I was working with Michael's data, I just naturally plopped 'studyforrest-aligned-data' into 'derivatives/'.)

Best,
Alejandro.