I'd like to share this idea/project for the community's benefit, in case anyone is interested in taking it up or collaborating:
I am personally concerned about the ton of studies coming out in the literature that analyze cortical thickness estimates on large multi-site datasets like ABCD without sufficient QC (none at all in some cases, and in others QC so nominal or flaky as to be useless). One idea I have (which I have already mentioned to a few of you here) is to crowd-source that large task (visual QC of 10K+ subjects) into many small tasks (500 to 1000 subjects per lab, which may take 2-4 weeks depending on each lab's commitment and resources). It is rather straightforward: establish training guides, set up concordance protocols, rate the quality, and pool the ratings. The goal being: we only have to QC the dataset once, the ABCD folks then share it via the NDA or elsewhere, and everyone works off it. This not only produces the highest-quality QCed dataset but also saves everyone a ton of effort going forward.
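For concreteness, here is a rough Python sketch of that batching, with a shared overlap subset for later concordance checks; the batch size, overlap size, and lab names below are made up for illustration:

    # Hypothetical sketch (names and sizes are made up): split a large subject
    # list into per-lab batches, with a common overlap subset that every lab
    # rates so that concordance across labs can be checked later.
    import random

    def make_batches(subject_ids, labs, batch_size=500, n_overlap=50, seed=42):
        rng = random.Random(seed)
        ids = list(subject_ids)
        rng.shuffle(ids)
        overlap = ids[:n_overlap]                 # rated by every lab
        remaining = ids[n_overlap:]
        batches = {}
        for i, lab in enumerate(labs):
            chunk = remaining[i * batch_size:(i + 1) * batch_size]
            batches[lab] = overlap + chunk        # shared subset + unique chunk
        return batches

    subjects = ["sub-%05d" % i for i in range(10000)]     # placeholder IDs
    labs = ["lab%02d" % j for j in range(10)]             # placeholder lab names
    batches = make_batches(subjects, labs)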
I did pitch it to the ABCD folks (at OHBM'19 in Rome), but unfortunately I haven't been successful in convincing them to take it up. I am sure they have many good reasons for that, besides already being extremely busy managing the dataset.
An alternative until we get there is to QC only a small subset (say 50 subjects per site, N ~ 1500), build an error-detecting machine learning model based on reliable and accurate quality ratings (which do not yet exist in the literature), and share it with the community to prevent folks from using the rather poor methods being employed right now.
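A minimal sketch of what such an error detector could look like (Python/scikit-learn; the CSV layout, feature names, and site column are assumptions for illustration, not a recommendation of specific features):

    # Hypothetical sketch: train an error detector on expert QC ratings and
    # evaluate it with leave-site-out cross-validation, since the model should
    # generalize to sites it has never seen.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GroupKFold, cross_val_score

    df = pd.read_csv("qc_ratings.csv")              # one row per subject (assumed layout)
    X = df[["euler_number", "cnr", "snr"]]          # image-derived features (examples only)
    y = df["rating"]                                # expert label: 1 = fail, 0 = pass

    clf = RandomForestClassifier(n_estimators=500, class_weight="balanced", random_state=0)
    scores = cross_val_score(clf, X, y, groups=df["site"],
                             cv=GroupKFold(n_splits=5), scoring="roc_auc")
    print(scores.mean(), scores.std())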
This is, of course, a generic framework that can be applied to most niQC tasks (across modalities) that need accurate ground truth to build automated ML models.
Thanks,
Pradeep
Hello,
This sounds like a nice objective, and I would like to collaborate, but I am not sure this would be easy or straightforward. First, one needs a well-defined protocol (training guidelines, etc.). Is there anything already in place?
Let's say we have it; we then need to be sure every rater is well trained, and this can be done only with a consistent overlap in the subjects being rated, so that consistency can be checked. So the objective cannot be to QC the dataset only once (although this may be a good start). The more overlap we have, the more confident we will be in the quality of the ratings.
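For the overlapping subjects, the consistency check could be as simple as pairwise Cohen's kappa between raters; a minimal Python sketch, with made-up file and column names:

    # Hypothetical sketch: pairwise rater agreement on the shared overlap subset.
    from itertools import combinations
    import pandas as pd
    from sklearn.metrics import cohen_kappa_score

    ratings = pd.read_csv("overlap_ratings.csv")    # assumed columns: subject, rater, rating
    wide = ratings.pivot(index="subject", columns="rater", values="rating")

    for r1, r2 in combinations(wide.columns, 2):
        pair = wide[[r1, r2]].dropna()              # subjects rated by both
        print(r1, "vs", r2, "kappa =", round(cohen_kappa_score(pair[r1], pair[r2]), 2))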
Even the existence of a unique ground truth is not obvious to me: qualifying data as good or bad is very much relative to the task at hand (clinical diagnosis such as tumor detection, quantitative morphometry such as segmentation, etc.). Even with a single task in mind, it then depends on the exact software you use (and the new deep learning methods may greatly change the task, if they are as robust as claimed ...).
I do not know about the ABCD protocol; can we get the data easily?
We are working in this direction with a private dataset and our local QC procedure, and we are also considering the ABIDE dataset that was initially rated by the MRIQC people.
On this subject, it is interesting to read the article "Improving out-of-sample prediction of quality of MRIQC" by O. Esteban et al., which shows that simply re-rating the doubtful images helps achieve much better classifier performance. This shows that the "exactness" of the ground truth is indeed important.
Even with the MRIQC people, who do a great job of sharing the code and the data, it is not easy to get precise information on the exact QC ratings (there are different rating files in the MRIQC repository, and I could not get an answer about which one to use; see https://github.com/poldracklab/mriqc/issues/806). I also recently contacted the first author to get the new annotations related to the above article, but without success.
cheers
Romain
Thanks, Romain, for sharing your thoughts in detail. I strongly appreciate and share your concerns. In fact, I am routinely the "difficult guy" (or Reviewer 2) in QC discussions, pushing everyone to obtain an acceptable ground truth. Some recent papers and some tools in use that are not based on acceptable ground truth are what prompted me to send this email.
That said, for FreeSurfer QC (at least for cortical parcellations, which are often the salient output), acceptable protocols do exist; see e.g. our preprint https://www.biorxiv.org/content/10.1101/2020.09.07.286807v3
Training folks to use VisualQC, which is rather easy to use, is straightforward (we have detailed manuals), and so is establishing concordance across raters/labs. Hence, I'd say the primary difficulty is getting 5-10 labs to sign up and commit to this.
I do agree with you that our definitions of what's good/bad might not align with those of a few others, but given the flexibility of the rating systems (allowing multiple labels/tags, etc.), we can achieve consensus or account for this easily.
Yes, the ABCD dataset is straightforward to obtain; just follow their application process.
I am sorry to hear about the lack of response to the requests you made. You are not alone in that regard, as this is a rather well-known problem, and we hope to change the culture in our field to improve transparency.
Thanks,
Pradeep
What about https://github.com/OpenNeuroLab/braindr (https://braindr.us/#/) from the tireless https://github.com/akeshavan ?
We would just need to deal with the mess of making that data "available" only to NDA-approved folks. I didn't look inside braindr, but if it operates directly on NIfTIs, then it should be quite doable, as long as there is easy-ish access to individual NIfTIs from the NDA. ATM access to S3 is still possible, so it could be quite easy to set up, and credentials would not need to leave the client/participant's browser (we would just mint a token to access the NDA).
I am now even thinking -- it might be a cool project to make it easy to bolt this onto any data hosting portal, such as http://datasets.datalad.org or https://openneuro.org, which would pop up braindr somewhere in the corner, allow visitors to do QC while browsing the portal, and establish some centralized "sink" of QA results across them.
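A rough server-side sketch of that access pattern (the bucket, key, and temporary credentials below are placeholders, not the actual NDA token workflow):

    # Placeholder sketch: fetch a single NIfTI from S3 using temporary
    # credentials. How the credentials are minted (the NDA token step) is
    # deliberately left out; the bucket and key are made up.
    import boto3

    s3 = boto3.client(
        "s3",
        aws_access_key_id="TEMP_KEY",
        aws_secret_access_key="TEMP_SECRET",
        aws_session_token="TEMP_TOKEN",
    )
    s3.download_file("hypothetical-nda-bucket",
                     "sub-0001/anat/sub-0001_T1w.nii.gz",
                     "sub-0001_T1w.nii.gz")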
Thanks, Yarik and Ariel. Web-based tools (like braindr) are quite useful for niQC tasks. They are more accessible for certain types of users (by virtue of the browser interface, etc.) and useful for QC tasks that are relatively easy to perform (identifying easily detectable artefacts based on simple visualizations) and relatively cheap to compute. For complicated QC tasks, which is often the case in niQC (e.g. FreeSurfer, advanced fMRI artefacts), requiring advanced visualizations and/or resource-intensive operations (such as ML on data-derived features to generate outlier alerts), they become much slower and/or less accurate, in addition to posing technical challenges with access to offline datasets, difficulties in initial setup, and other issues of relying on a cloud setup.
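To illustrate the kind of outlier alerts I mean, here is a small sketch on data-derived features (the input table, e.g. one row of regional cortical thickness values per subject, is an assumption):

    # Hypothetical sketch: flag subjects whose FreeSurfer-derived features look
    # unusual, so a rater can review those first. The CSV is assumed to hold one
    # row per subject with numeric features only (e.g. regional thickness).
    import pandas as pd
    from sklearn.ensemble import IsolationForest

    features = pd.read_csv("freesurfer_features.csv", index_col="subject")
    iso = IsolationForest(contamination=0.05, random_state=0).fit(features)
    alerts = features.index[iso.predict(features) == -1]
    print("review first:", list(alerts))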
Given that the primary motivation for this project is to generate accurate and reliable ground truth, I am much more inclined to use QC tools that are custom-designed for the task and that emphasize rating accuracy (VisualQC for FreeSurfer; yes, I have a huge conflict of interest here 😊). The reason for crowd-sourcing is to split the burden of producing a QCed dataset that would not require any further maintenance once it is done.
That said, there are a ton of niQC tasks (some yet to be studied) that may benefit from braindr and other web-based tools. Lei and team at USC used it for a stroke/lesion QC task that some of us contributed to, and Ariel also just mentioned another recent study.
We can match the right tool to the right task going forward, based on the goals of the project and its users. Perhaps this tool table needs to be expanded to help with that.
Yes
I would like to apologize; it was not fair to accuse the MRIQC folks of not answering.
You are the first to push toward open source (the code, the documentation, the data, and the ratings), and besides, I really appreciate your work and this database-sharing initiative (with the ratings, even if they are not ideal), since I do work with it.
I guess I mentioned the issue on your GitHub just to give an example that sharing is not always easy, but I realize that this is by no means productive, and particularly unfair to you, who are much more involved in sharing than I am. So if I could, I would remove my comment on MRIQC ...
So to really apologize, I now need to participate and give something more useful back to the community. So I am in to review some data ...
Cheers
Romain
QC of FreeSurfer segmentation output is indeed easier to define, but I thought we were talking about rating the raw T1 images ... ?
Romain
I am sorry to hear about the lack of response to the requests you made. You are not alone in that regard, as this is a rather well-known problem, and we hope to change the culture in our field to improve transparency.
No, I do not agree; "the culture of transparency" does not mean you have to respond to every user email request ... Again, it was my fault to mention it in the first place, because it was by no means meant to blame the MRIQC folks: they have had a culture of sharing data and code for a long time now. Anyway, let's just forget it.
Yes, I am guessing the JavaScript neuroimaging libraries are neither as comprehensive nor as mature as their Python counterparts.
PS: Not trying to digress (as some of you know my stance well), but I would strongly discourage any divestment from Python 😊: https://crossinvalidation.com/2018/05/03/lets-focus-our-neuroinformatics-community-efforts-in-python-and-on-software-validation/
PPS: Yarik, I didn't get that email or see it on the Google Groups web interface (some issues with Google Groups email delivery, I guess).