One of the hardest problems we've faced in our experience with MRIQC is the reproducibility of measures.
We have identified two sources of variability:
- The obvious one: logical changes in the workflow, along with changes in the definition of metrics. The latest changes of this kind were made before the 0.9.6 release. Therefore, unless the second item of this list applies, you should not notice changes between 0.9.6-10.
- The tricky one: one-to-one run reproducibility is not ensured when running C++ code compiled with commonly used flags. That is the case for ANTs within MRIQC. So, if you want to get the exact same values from two MRIQC runs of the exact same version, make sure you use the flag --ants-nthreads 1 to avoid parallelism. Expect a much longer runtime, though (https://github.com/poldracklab/mriqc/pull/596).
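To illustrate why parallelism can change the numbers at all, here is a minimal sketch of one plausible mechanism: floating-point addition is not associative, so splitting a sum across threads can change the result in the last digits. This is an assumption about the general cause, not a confirmed diagnosis of what happens inside ANTs, and `chunked_sum` is a hypothetical helper written only for this example.

```python
import random


def chunked_sum(values, n_chunks):
    """Sum values as a parallel reduction might: one partial sum per chunk, then combine."""
    size = -(-len(values) // n_chunks)  # ceiling division
    partials = [sum(values[i:i + size]) for i in range(0, len(values), size)]
    return sum(partials)


# The same three numbers, grouped differently, give different doubles.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b, a, b)

# Same data, different chunking (a stand-in for a different number of threads).
rng = random.Random(0)
values = [rng.uniform(-1, 1) * 10 ** rng.randint(-8, 8) for _ in range(100_000)]
serial = chunked_sum(values, 1)
parallel_like = chunked_sum(values, 8)
print(serial - parallel_like)  # typically tiny, but not guaranteed to be zero
```

With a single thread the order of operations is fixed, which is consistent with --ants-nthreads 1 giving repeatable values.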
The extent to which this numerical variability affects the results in our paper is largely unknown, and any effort to investigate it would be very welcome.
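For anyone who wants to start looking into this, a rough sketch of how two runs on the same image could be compared follows. It assumes each run produced a JSON file with the metrics as top-level numeric fields. `load_iqms` and `compare_iqms` are hypothetical helpers, not part of MRIQC, and the values in the example are made up.

```python
import json
import math
from pathlib import Path


def load_iqms(path):
    """Read a JSON file and keep only top-level numeric fields (assumed to be the metrics)."""
    data = json.loads(Path(path).read_text())
    return {
        key: value
        for key, value in data.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }


def compare_iqms(run_a, run_b, rel_tol=0.0):
    """Return {metric: (a, b, |a - b|)} for metrics that differ beyond rel_tol."""
    diffs = {}
    for key in sorted(set(run_a) & set(run_b)):
        a, b = run_a[key], run_b[key]
        # With rel_tol=0.0 this is an exact comparison.
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=0.0):
            diffs[key] = (a, b, abs(a - b))
    return diffs


# Made-up values standing in for two runs of the same version on the same image.
run_a = {"snr": 12.5, "cjv": 0.4}
run_b = {"snr": 12.5, "cjv": 0.4000001}
diffs = compare_iqms(run_a, run_b)
print(diffs)
```

Running it once with rel_tol=0.0 and once with a loose tolerance separates exact run-to-run mismatches from differences large enough to matter for a given analysis.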
I'll leave this thread open for anyone to comment, knowing the importance of the topic.