Hi all,
I'm emailing the list because an issue came up on the GitHub tracker that might fly under the radar of those interested, and I think there is a larger meta-spec conversation here.
For context, BIDS has a lot of metadata (JSON sidecar) fields that are marked as RECOMMENDED in the specification. Some of these have raised validator warnings when absent from very early on, but many others have not. The schema-based validator took the position that, if RECOMMENDED is to mean anything, then absence should trigger a warning. The alternative was a large set of custom checks in the schema that replicated the pick-and-choose selections of the original validator.
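To make that position concrete, here is a minimal sketch of the blanket rule in Python, with invented names (the actual validator is schema-driven and not written like this):

```python
# Sketch of the blanket position: any field marked RECOMMENDED in the spec
# emits a warning when absent from the sidecar. The field names are real BIDS
# terms, but the set and function here are hypothetical.
RECOMMENDED_FIELDS = {"Manufacturer", "FlipAngle", "PartialFourier"}  # tiny subset

def warn_on_missing_recommended(sidecar: dict) -> list[str]:
    """Return one warning per RECOMMENDED field absent from the sidecar."""
    return [
        f"Recommended field '{field}' is not present."
        for field in sorted(RECOMMENDED_FIELDS - sidecar.keys())
    ]
```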
An issue was recently opened by a researcher who converted their dataset with HeuDiConv/dcm2niix, which populated as many fields as could reasonably be populated automatically, leading to warnings for 28 different fields on 10k+ files:
https://github.com/bids-standard/bids-specification/issues/2040. I looked through all of the fields on that issue, and I believe that at least 23 of them should be demoted to OPTIONAL, though some could remain RECOMMENDED under certain conditions. For example, "PartialFourierDirection" only makes sense if "PartialFourier" is defined.
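For illustration, a conditional recommendation like that could be implemented as a check along these lines (hypothetical Python, not the schema's actual expression language):

```python
def check_partial_fourier(sidecar: dict) -> list[str]:
    """Warn about a missing PartialFourierDirection only when PartialFourier
    is defined, since the direction is meaningless without it."""
    if "PartialFourier" in sidecar and "PartialFourierDirection" not in sidecar:
        return ["'PartialFourierDirection' is recommended when 'PartialFourier' is defined."]
    return []
```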
I would particularly appreciate a review of my response to that issue from qMRI and MRS researchers, as some of these fields may determine analytical choices in those domains, whereas for many of us they are basically just provenance.
This feels like a good opportunity to think about what justifies marking a field RECOMMENDED, so that validation warnings correspond to problems that can reasonably be fixed and don't drown the user in nice-to-haves. Just to kick off the discussion, I have a few suggestions:
1) Fields should be required/recommended if they require human intervention to populate and significantly contribute to the understandability of the dataset. For example, describing tasks either in plain language or by reference to protocols.
2) Fields can be made conditionally required/recommended if a narrowly targeted condition can be established from the file name or from the values of other, related metadata. For example, requiring EchoTime for files with `echo-` in the file name (see the sketch after this list).
3) Fields that can be populated automatically by a conversion tool should be required or optional, not recommended. If their absence is not severe enough to be an error, we should presume the tool was unable to detect the metadata, in which case a warning is not actionable.
4) Fields that exist to record deviations from typical usage must be optional, with an explicit default value specified in the term description. For example, SliceEncodingDirection is generally assumed to be "k" (also shown in the sketch below).
5) Fields that are commonly scrubbed for anonymization purposes must be optional.
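To make points 2 and 4 concrete, here is a rough sketch of both, again in hypothetical Python rather than the schema's actual selector syntax:

```python
import re

def check_echo_time(filename: str, sidecar: dict) -> list[str]:
    # Point 2: EchoTime becomes required when the file name carries an
    # echo- entity, since multi-echo data are unusable without it.
    if re.search(r"_echo-\d+_", filename) and "EchoTime" not in sidecar:
        return ["ERROR: 'EchoTime' is required for files with 'echo-' in the name."]
    return []

def effective_slice_encoding_direction(sidecar: dict) -> str:
    # Point 4: optional deviation-recording fields get an explicit default
    # that consumers fall back to, so absence never needs a warning.
    return sidecar.get("SliceEncodingDirection", "k")
```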
Another possibility would be to move beyond the OPTIONAL/RECOMMENDED/REQUIRED terminology and think about different kinds of recommendation. I don't have a specific proposal here, but I would imagine keeping OPTIONAL/REQUIRED and adding categories such as "provenance" or "analysis", distinguishing metadata that makes your dataset more understandable to someone attempting to replicate your experiment from metadata needed by someone attempting to analyze it. A validator could then sort out different kinds of warnings.
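As a strawman of what that could look like on the tooling side (all names invented here):

```python
from dataclasses import dataclass
from enum import Enum

class Category(Enum):
    PROVENANCE = "provenance"  # helps someone replicate the experiment
    ANALYSIS = "analysis"      # affects how someone analyzes the data

@dataclass
class ValidatorWarning:
    field: str
    category: Category
    message: str

def filter_warnings(warnings: list[ValidatorWarning],
                    show: set[Category]) -> list[ValidatorWarning]:
    """A CLI flag like a hypothetical --warn=analysis could map onto this."""
    return [w for w in warnings if w.category in show]
```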
This is all coming from an MRI perspective, where DICOM inputs are the norm. Input from ephys/PET/NIRS researchers would be appreciated as well.
Best,