Crowd-sourcing for accurate ground truth

Raamana, Pradeep Reddy

Jul 3, 2021, 11:59:24 AM
to niQC

I'd like to share this idea/project for the community's benefit, in case anyone is interested in taking it up or collaborating:

 

I am personally concerned about the large number of studies coming out in the literature that analyze cortical thickness estimates on large multi-site datasets like ABCD without sufficient QC (none at all in some cases, and in others QC so nominal or flaky as to be useless). One idea I have (which I have already mentioned to a few of you here) is to crowd-source that one large task (visual QC on 10K+ subjects) into many small tasks (500 to 1,000 subjects per lab, which may take 2-4 weeks depending on their commitment and resources). It is rather straightforward: establish training guides, have concordance protocols, rate the quality, and pool the ratings. The goal being: we only have to QC the dataset once, the ABCD folks then share it via the NDA or elsewhere, and everyone works off it. This not only produces the highest-quality QCed dataset but also saves everyone a ton of effort going forward.

 

I did pitch it to the ABCD folks (at OHBM'19 in Rome), and unfortunately I haven't been successful in convincing them to take it up. I am sure they have many good reasons for that, besides already being extremely busy managing the dataset.

 

An alternative until we get there is to QC only a small subset of ~50 subjects per site (N ~ 1,500), build an error-detector machine learning model based on reliable and accurate quality ratings (which does not yet exist in the literature), and share it with the community to prevent folks from using the other horrible methods being employed right now.
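To make the error-detector idea concrete, here is a minimal sketch of what such a model could look like once the ratings exist; the file name, column names, and feature choice below are hypothetical placeholders, not an actual ABCD workflow:

```python
# Minimal sketch of an error-detector model trained on visual QC ratings.
# The CSV, its columns, and the feature set are hypothetical placeholders:
# one row per subject, a pass/fail 'rating' from visual QC, a 'site' column,
# and numeric image-derived features (e.g., morphometrics or IQMs).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

df = pd.read_csv("abcd_qc_subset.csv")            # hypothetical file name
y = df["rating"]                                  # e.g., 0 = pass, 1 = fail
groups = df["site"]                               # hold out whole sites
X = df.drop(columns=["subject_id", "site", "rating"])

clf = RandomForestClassifier(n_estimators=500, class_weight="balanced",
                             random_state=0)

# Leave-site-out cross-validation gives a more honest estimate of how the
# detector generalizes to sites it has never seen.
cv = GroupKFold(n_splits=5)
scores = cross_val_score(clf, X, y, groups=groups, cv=cv, scoring="roc_auc")
print("Per-fold AUC:", scores.round(3), "Mean AUC:", scores.mean().round(3))
```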

 

This is of course a generic framework that can be applied to most niQC tasks (across modalities) that need accurate ground truth to build automatic ML models.

 

Thanks,

Pradeep

romain

Jul 5, 2021, 4:22:47 PM
to ni...@googlegroups.com

Hello

This sounds like a nice objective, and I would like to collaborate, but I am not sure this would be easy or straightforward. First, one needs a well-defined protocol (training guidelines etc.) ... is there anything already out there?

Let's say we have it; then we need to be sure every rater is well trained, and that can be checked only with consistent overlap in the subjects being rated, so that consistency across raters can be verified. So the objective cannot be to QC the dataset only once (although this may be a good start). The more overlap we have, the more confident we will be in the quality of the ratings.
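To make the consistency check concrete, a minimal sketch of computing chance-corrected agreement (Cohen's kappa) on the deliberately overlapping subjects could look like this; the file and column names are hypothetical:

```python
# Minimal sketch: quantify rater consistency on the overlapping subjects.
# Assumes a hypothetical long-format CSV with columns: subject_id, rater, rating.
from itertools import combinations

import pandas as pd
from sklearn.metrics import cohen_kappa_score

ratings = pd.read_csv("overlap_ratings.csv")
wide = ratings.pivot(index="subject_id", columns="rater", values="rating")

# Pairwise Cohen's kappa over every pair of raters, using only the subjects
# that both members of the pair actually rated.
for r1, r2 in combinations(wide.columns, 2):
    both = wide[[r1, r2]].dropna()
    kappa = cohen_kappa_score(both[r1], both[r2])
    print(f"{r1} vs {r2}: kappa = {kappa:.2f} (n = {len(both)})")
```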


Even the existence of a unique ground truth is not obvious to me: qualifying data as good or bad is very much relative to the task at hand (clinical diagnosis, like tumor detection, versus quantitative morphometry, like segmentation). Even with a single task in mind, it then depends on the exact software you use ... (and new deep learning methods may greatly change the task, if they are as robust as claimed ...)


I do not know about the ABCD protocol; can we get the data easily?

We are working in this direction with a private dataset and our local QC procedure, and we are also considering the ABIDE dataset that was initially rated by the MRIQC folks.

On this subject, it is interesting to read the article "Improving out-of-sample prediction of quality of MRIQC" by O. Esteban, which shows that simply re-rating the doubtful images helps achieve much better classifier performance.

So this shows that the "exactness" of the ground truth is indeed important.

Even with the MRIQC folks, who do a great job of sharing the code and the data, it is not easy to get the proper information on the exact QC ratings (there are different rating files in the MRIQC repository, and I could not get an answer about which one to use; see https://github.com/poldracklab/mriqc/issues/806).
I also recently contacted the first author to get the new annotations related to the above article ... unsuccessfully ...


cheers

Romain


Raamana, Pradeep Reddy

Jul 7, 2021, 10:31:50 AM
to romain, ni...@googlegroups.com

Thanks, Romain, for sharing your thoughts in detail. I genuinely appreciate and share your concerns. In fact, I am routinely the "difficult guy" (or Reviewer 2) in QC discussions, pushing everyone to obtain an acceptable ground truth. Some recent papers coming out, and some tools in use, that are not based on acceptable ground truth are what prompted me to send this email.

 

That said, for Freesurfer QC (at least for cortical parcellation, which is often the salient output), there are acceptable protocols; see e.g. our preprint https://www.biorxiv.org/content/10.1101/2020.09.07.286807v3

 

Training folks to use VisualQC is very easy (it is designed to be easy to use, and we have detailed manuals), and establishing concordance across raters/labs is also straightforward. Hence, I'd say the primary difficulty is getting 5-10 labs to sign up and commit to this.

 

I do agree with you that our definitions of what's good/bad might not align with those of a few others, but given some flexibility in the rating system (allowing multiple labels/tags etc.), we can reach consensus or account for this easily.

 

Yes, the ABCD dataset is straightforward to obtain; just follow their application process.

 

I am sorry to hear about the lack of response to the requests you made; you are not alone in that regard, as this is a rather well-known problem, and we hope to change the culture in our field to improve transparency.

 

Thanks,

Pradeep

Oscar Esteban

Jul 7, 2021, 11:20:57 AM
to ni...@googlegroups.com
Hi there, two cents from the first author of the referenced paper:

- Questions about MRIQC (or any other tool we develop) should be addressed to the GitHub repos, the users lists (mriqc-users for the case at hand) or NeuroStars.org. I typically do not reply to private requests for support, and this is one of the reasons: everything stays public and transparent, as Pradeep wished. Instead, I send a canned response redirecting to the aforementioned venues.
- That said, my first "unsuccessful" reply to your first email took literally 8 minutes... The second has been delayed for 7 days (and I have already mentioned why privately).

That being said, I'd just encourage anyone who can to join Pradeep in this proposal. Unfortunately, I don't have the time or resources to help with it this time.

Cheers,
Oscar



(My postdoc appointment with Stanford has concluded and this account will eventually be closed, please update your contacts - I can be reached at oscar....@unil.ch or p...@oscaresteban.es, thanks!)

___________________________
Oscar Esteban, Ph.D.
Research and Teaching Ambizione FNS Fellow
Dept. of Radiology, CHUV, University of Lausanne

+1 (650) 733 33 82


Yaroslav Halchenko

Jul 7, 2021, 12:11:01 PM
to niQC

just would need to deal with a mess of making that data "available" only for NDA-approved folks.  Didn't look inside braindr, but if it is operating directly on NIfTIs, then it should be quite doable as long as there is easy-ish access to individual NIfTIs from NDA.  ATM access to S3 is still possible so it could be quite easy to set up, and credentials would not need to leave the client/participant's browser (would just mint a token to access NDA).
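As one possible flavour of that (an assumption on my part, not necessarily how NDA access works in detail): someone who already holds approved S3 credentials could mint short-lived presigned URLs for the individual NIfTIs and hand only those to the browser tool, e.g.:

```python
# Rough sketch: mint time-limited presigned URLs for individual NIfTIs so a
# browser-based QC tool can fetch them without ever holding AWS credentials.
# The bucket and key names are hypothetical; the actual NDA layout and token
# workflow may differ.
import boto3

s3 = boto3.client("s3")                           # uses locally held credentials
bucket = "hypothetical-abcd-bucket"
keys = [
    "sub-0001/anat/sub-0001_T1w.nii.gz",
    "sub-0002/anat/sub-0002_T1w.nii.gz",
]

urls = [
    s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=3600,                           # each URL valid for one hour
    )
    for key in keys
]
# The resulting URL list could then be fed to a browser-based rating tool.
print("\n".join(urls))
```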

I am now even thinking -- it might be a cool project to make it easy to bolt it on any data hosting portal etc, such as http://datasets.datalad.org https://openneuro.org etc, which would pop up braindr somewhere in the corner and allow visitors to do QC while browsing the portal etc, and establish some centralized "sink" of QA results across them.

Ariel Rokem

Jul 7, 2021, 1:46:01 PM
to niQC
On Wednesday, July 7, 2021 at 9:11:01 AM UTC-7 Yaroslav Halchenko wrote:


Thanks for mentioning braindr, Yarik! I have been meaning to send a message about it, and I'm glad you beat me to it.

I'll just point out that Anisha also created tools to make braindr relatively easy to replicate with other datasets: https://docs.swipesforscience.org/#/. We have since expanded the approach to include diffusion-weighted MRI from the HBN dataset: https://fibr.dev/#/
 
just would need to deal with a mess of making that data "available" only for NDA-approved folks.  Didn't look inside braindr, but if it is operating directly on NIfTIs, then it should be quite doable as long as there is easy-ish access to individual NIfTIs from NDA.  ATM access to S3 is still possible so it could be quite easy to set up, and credentials would not need to leave the client/participant's browser (would just mint a token to access NDA).


Access to NDA is indeed a bit of a limitation on our ability to do things like this right now. But even under restricted-access conditions, I think that SwipesForScience would be very useful as a tool for what Pradeep originally proposed. I would also advocate opening this up to citizen scientists as well. We've found that if you have a mixture of experts and citizens, you can combine the information quite effectively (the details are in Anisha's paper: https://www.frontiersin.org/articles/10.3389/fninf.2019.00029/full).
 
I am now even thinking -- it might be a cool project to make it easy to bolt it on any data hosting portal etc, such as http://datasets.datalad.org https://openneuro.org etc, which would pop up braindr somewhere in the corner and allow visitors to do QC while browsing the portal etc, and establish some centralized "sink" of QA results across them.

I love this idea! 

Cheers, 

Ariel

Raamana, Pradeep Reddy

Jul 7, 2021, 2:12:08 PM
to Ariel Rokem, niQC

Thanks, Yarik and Ariel. Web-based tools (like braindr) are quite useful for niQC tasks. They are more accessible for certain types of users (by virtue of the browser interface etc.) and useful for QC tasks that are relatively easy to do (identifying easily detectable artefacts from simple visualizations) and relatively cheap to compute. For complicated QC tasks, which are common in niQC (e.g. Freesurfer, advanced fMRI artefacts) and require advanced visualizations and/or resource-intensive operations (such as ML on data-derived features to generate outlier alerts), they become much slower and/or less accurate, in addition to posing technical challenges with access to offline datasets, difficulties in initial setup, and other challenges of relying on a cloud setup.
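For the "outlier alerts" part, a minimal sketch of what such a data-derived screen could look like is below; it assumes region-wise Freesurfer features have already been exported to a hypothetical CSV, and is an illustration rather than any tool's actual implementation:

```python
# Minimal sketch: data-derived outlier alerts to prioritize visual review.
# Assumes a hypothetical CSV of Freesurfer-derived features (e.g., region-wise
# cortical thickness), one row per subject.
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

feats = pd.read_csv("freesurfer_features.csv", index_col="subject_id")
X = StandardScaler().fit_transform(feats)

iso = IsolationForest(contamination=0.05, random_state=0).fit(X)
scores = iso.decision_function(X)                 # lower = more anomalous

# Flag the most anomalous subjects for priority visual QC, not auto-exclusion.
alerts = feats.index[scores.argsort()[:25]]
print("Review these subjects first:", list(alerts))
```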

 

Given that the primary motivation for this project is to generate accurate and reliable ground truth, I am much more inclined to use QC tools that are custom-designed for that task and that emphasize rating accuracy (VisualQC for Freesurfer; yes, I have a huge conflict of interest here 😊). The reason for crowd-sourcing is to split the burden of producing a QCed dataset that would not require any further maintenance once it is done.

 

That said, there are a ton of niQC tasks (some yet to be studied) that may benefit from braindr and other web-based tools. Lei and team at USC used it for a stroke/lesion QC task that some of us contributed to, and Ariel also just mentioned another recent study.

 

We can match the right tool to the right task going forward, based on the goals of the project and its users. Perhaps this tool table needs to be expanded to help with that matching:

https://incf.github.io/niQC/tools


Yaroslav Halchenko

Jul 7, 2021, 2:12:49 PM
to niQC
Somehow I do not see this part of the reply above, but I do see it in an email I got:

>   I love this idea! The main wrinkle I see is that swipes currently requires                                                                                                        
>   a conversion from the original data format into a more "standard" image                                                                                                           
>   format, such as jpeg. 

Maybe https://github.com/rii-mango/NIFTI-Reader-JS or something similar could be easily bolted on to operate directly on the NIfTI from a URL. With some CORS tune-up it could even operate across portals etc.  FWIW, filed https://github.com/OpenNeuroLab/braindr/issues/5 for starters ;)

romain

Jul 7, 2021, 2:15:46 PM
to ni...@googlegroups.com

Yes,

I would like to apologize;

it was not fair to accuse the MRIQC folks of not answering.

You are the first to push toward open source (the code, the documentation, the data, and the ratings), and besides, I really do appreciate your work and this data-sharing initiative (with the ratings, even if they are not ideal), since I work with it.

I guess I mentioned the issue in your GitHub just to give an example that sharing is not always easy. But I realize that this was by no means productive, and particularly unfair to you, who are much more involved in sharing than I am ...

If only I could remove my comment on MRIQC ...

So to really apologize, I now need to participate and give something more useful back to the community.

So I am in to review some data ...


Cheers

Romain

romain

Jul 7, 2021, 2:15:51 PM
to ni...@googlegroups.com

QC of Freesurfer segmentation output is indeed easier to define,

but I thought we were talking about rating the raw T1 images ... ?


Romain


On 07/07/2021 16:31, Raamana, Pradeep Reddy wrote:

[...]
I am sorry to hear about the lack of response to the requests you made; you are not alone in that regard, as this is a rather well-known problem, and we hope to change the culture in our field to improve transparency.

No, I do not agree: "a culture of transparency" does not mean you have to respond to every user mail request ... Again, it was my fault to mention it in the first place, because it was by no means meant as blame on the MRIQC folks: they have had a culture of sharing data and code for a long time now ... Anyway, let's just forget it.



Raamana, Pradeep Reddy

Jul 7, 2021, 3:23:24 PM
to Yaroslav Halchenko, niQC

Yes – I am guessing the javascript neuroimaging libraries are neither as comprehensive nor as mature as the python counterparts.

 

PS: Not trying to digress, (as some of you know my stand well) I would strongly discourage any divestment from python 😊 : https://crossinvalidation.com/2018/05/03/lets-focus-our-neuroinformatics-community-efforts-in-python-and-on-software-validation/

 

PPS: Yarik, I didn't get that email or see it on the Google Groups web interface (some issues with Google Groups email delivery, I guess)

 


Anisha Keshavan

Jul 7, 2021, 5:08:59 PM
to Raamana, Pradeep Reddy, Yaroslav Halchenko, niQC
Hi All! Sorry I'm a bit late to this thread, I will try to address multiple things here!

just would need to deal with a mess of making that data "available" only for NDA-approved folks. 

So Damien Fair is working on getting a few versions of braindr up for ABCD data for NDA-approved folks to collaboratively QC. I'm not sure what the status is for this now, so I'd reach out to Damien on this (if he's not already on this mailing list)!

Didn't look inside braindr, but if it is operating directly on NIfTIs, then it should be quite doable as long as there is easy-ish access to individual NIfTIs from NDA

Actually braindr (or, more generally, Swipes for Science) operates on any 2D image (png, jpeg, gif, etc.).

I am now even thinking -- it might be a cool project to make it easy to bolt it on any data hosting portal etc, such as http://datasets.datalad.org https://openneuro.org etc, which would pop up braindr somewhere in the corner and allow visitors to do QC while browsing the portal etc, and establish some centralized "sink" of QA results across them

great idea! braindr just needs a list of URLs to images; they can be hosted anywhere. (Also, wouldn't it be cool to do a Captcha-type thing, except instead of selecting all images with traffic lights, you swipe on 5 images? LOL)

They are more accessible for certain types of users (by virtue of the browser interface etc.) and useful for QC tasks that are relatively easy to do (identifying easily detectable artefacts from simple visualizations) and relatively cheap to compute. For complicated QC tasks, which are common in niQC (e.g. Freesurfer, advanced fMRI artefacts) and require advanced visualizations and/or resource-intensive operations (such as ML on data-derived features to generate outlier alerts), they become much slower and/or less accurate, in addition to posing technical challenges with access to offline datasets, difficulties in initial setup, and other challenges of relying on a cloud setup.
 
Given that the primary motivation for this project is to generate accurate and reliable ground truth, I am much more inclined to use QC tools that are custom-designed for that task and that emphasize rating accuracy (VisualQC for Freesurfer; yes, I have a huge conflict of interest here 😊). The reason for crowd-sourcing is to split the burden of producing a QCed dataset that would not require any further maintenance once it is done.

Here is the issue: crowdsourcing of complicated QC tasks is really difficult. Not many people like doing it. If you can break down a complicated QC task into a set of quick binary decisions, you're more likely to get more raters rating more images. The purpose of braindr/swipesforscience is to enable the scientist to focus on generating QC images where a quick decision can be made, and to abstract away all the cloud deployment/web stuff. 

So I would think of braindr as a complement to VisualQC, not an alternative: you can rate VisualQC images on braindr. Accuracy is a concern, but we've found that with a small "training set" of expertly-labelled images, you can filter out the bad raters and still end up with a large amount of labelled data.
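To sketch what that filtering could look like in practice (the file names, column names, and the 0.8 threshold below are arbitrary placeholders; the actual braindr analysis is more sophisticated):

```python
# Minimal sketch: screen crowd raters against a small expert-labelled subset,
# then aggregate only the raters who clear an accuracy threshold.
import pandas as pd

crowd = pd.read_csv("crowd_ratings.csv")    # hypothetical: subject_id, rater, rating (0/1)
expert = pd.read_csv("expert_labels.csv")   # hypothetical: subject_id, label (0/1)

# Per-rater accuracy on the expert-labelled subset.
merged = crowd.merge(expert, on="subject_id", how="inner")
accuracy = (merged["rating"] == merged["label"]).groupby(merged["rater"]).mean()
good_raters = accuracy[accuracy >= 0.8].index     # arbitrary cut-off

# Simple majority vote over the retained raters for every image.
kept = crowd[crowd["rater"].isin(good_raters)]
consensus = kept.groupby("subject_id")["rating"].mean()
labels = (consensus >= 0.5).astype(int)
print(labels.value_counts())
```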

Yes – I am guessing the javascript neuroimaging libraries are neither as comprehensive nor as mature as the python counterparts.
 
PS: Not trying to digress, (as some of you know my stand well) I would strongly discourage any divestment from python 😊 

I'm a fan of using the best tool/language for the job. Python is great for most things (e.g. generating QC images), but you can't beat JS when it comes to crowdsourcing. Just check out the leaderboard of fibr.dev (https://fibr.dev/#/leaderboard) -- there are 393 total raters, and 148 of those rated over 3000 images!

Another crowdsourcing alternative is Zooniverse: https://www.zooniverse.org/ -- you can post more complex QC tasks there if that's the route you want to go. Pierre Bellec's group has used it, so I would reach out to him (if he's not already on this mailing list) to discuss the pros/cons of this.

An alternative until we get there is to QC only a small subset of ~50 subjects per site (N ~ 1,500), build an error-detector machine learning model based on reliable and accurate quality ratings (which does not yet exist in the literature), and share it with the community to prevent folks from using the other horrible methods being employed right now.

 

I'm sad I didn't think of this back when I wrote the braindr paper: how many image quality ratings would you need to train an accurate MRIQC classifier for a new site? Maybe we didn't need to rate so many images. In theory, someone could grab the braindr ratings and the mriqc IQMs on the HBN dataset to estimate this. Unfortunately I don't have the bandwidth right now.
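If someone does pick this up, the estimate could be as simple as a learning curve; a minimal sketch, assuming the braindr-derived labels and the MRIQC IQMs have been merged into a hypothetical CSV:

```python
# Minimal sketch: how does classifier performance grow with the number of
# rated images? Assumes a hypothetical table of MRIQC IQMs plus a binary
# braindr-derived label per image.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import learning_curve

df = pd.read_csv("hbn_iqms_with_braindr_labels.csv")
y = df["label"]
X = df.drop(columns=["image_id", "label"])

sizes, train_scores, test_scores = learning_curve(
    GradientBoostingClassifier(random_state=0), X, y,
    train_sizes=np.linspace(0.05, 1.0, 10), cv=5, scoring="roc_auc")

# Where the curve plateaus tells you roughly how many ratings are "enough".
for n, auc in zip(sizes, test_scores.mean(axis=1)):
    print(f"{n:5d} rated images -> mean CV AUC {auc:.3f}")
```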
 
- Anisha


Alexandre Franco

Jul 9, 2021, 9:12:11 PM
to Anisha Keshavan, Raamana, Pradeep Reddy, Yaroslav Halchenko, niQC
Hi Folks, 

Another shout-out to Braindr. We already have two papers that use Braindr to perform visual quality control on our data, and we are now working on a third. We love this tool!

One of these papers is under review and is about the data release of the Rockland Sample. Take a look at Figure 9 there: we compared the Braindr score with the Euler number extracted from Freesurfer, and as you can see in panel C of that figure, there is pretty good correspondence between the Braindr scores and the Euler number.
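For anyone who wants to run that kind of check on their own data, a minimal sketch (assuming the Euler numbers and the braindr scores already sit in two hypothetical CSVs):

```python
# Minimal sketch: correlate a crowd-sourced QC score with the Freesurfer
# surface Euler number. File and column names are hypothetical; the Euler
# numbers are assumed to have been extracted beforehand.
import pandas as pd
from scipy.stats import spearmanr

braindr = pd.read_csv("braindr_scores.csv")       # subject_id, braindr_score
euler = pd.read_csv("euler_numbers.csv")          # subject_id, lh_euler, rh_euler

df = braindr.merge(euler, on="subject_id")
df["mean_euler"] = df[["lh_euler", "rh_euler"]].mean(axis=1)

rho, p = spearmanr(df["braindr_score"], df["mean_euler"])
print(f"Spearman rho = {rho:.2f} (p = {p:.1e}, n = {len(df)})")
```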

Figure 8 might also be interesting for this discussion since it shows longitudinal trajectories of Freesurfer outputs. As can be seen in panel F, mean cortical thickness estimates are all over the place. 

Also of note for this discussion, there is a group from USC that is training a model to predict the quality of Freesurfer segmentations.
Unfortunately, that conference paper is behind a paywall and my trusty alternative method isn't working, so I can't access it.

Best, 
Alex







