Running MRIQC on big dataset


zeit problem

May 17, 2023, 4:23:11 AM
to mriqc-users
Hi there! 

I have a large-ish dataset of ~10k T1w MRI images from 2400 subjects, so I'm trying to run MRIQC in parallel. I'm using a Singularity container built from the MRIQC Docker image (v23.1.0rc0). My issue is this: MRIQC runs in parallel when I assign e.g. 10 cores and 10 subjects. All good. But when I try to run the entire dataset on 60 cores, it takes a while to start and then runs as only one process.

Is there anything I'm missing here? 

Also, I tried to set --bids-database-dir to make it more efficient, but I'm not quite sure how to create such a directory using PyBIDS; layout.save(dir_of_choice) did not seem to work.

Best,
Ruben. 

zeit problem

May 17, 2023, 4:32:48 AM
to mriqc-users
Here is an example of a command I'm running:

  nice singularity run --cleanenv \
    --bind $BIDS_DIR:/data \
    --bind $OUT_DIR:/out \
    --bind $BIDS_DIR/layout:/db \
    $IMAGE_DIR/mriqc-23.1.0rc0.sif /data /out participant \
    --nprocs 60 --mem_gb 100 -f --no-sub

Oscar Esteban

May 18, 2023, 3:12:54 AM
to mriqc-users, zeit problem
Dear Ruben,

> But when I want to run the entire dataset on 60 cores, it takes a while to start and only runs as one process.

Could you let me know how you checked that it's running with only one process? I'm not suggesting you checked it incorrectly, but we need to know whether Python is spawning 60 workers of which only one is used, or whether it is not spawning the workers at all.
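
One rough way to tell these two cases apart is to look at the process list on the host while MRIQC is running. The sketch below assumes a host with POSIX `ps` and `awk`, and that the MRIQC processes carry the string `mriqc` in their command line. The function name `busy_workers` and the 50% threshold are hypothetical, chosen only for illustration.

```shell
# Summarise a `ps` listing: how many processes match a pattern, and how many
# of those are at or above a CPU threshold.
# stdin: lines of "PID %CPU COMMAND-LINE"; $1: pattern; $2: threshold (default 50)
busy_workers() {
    # Pass values through the environment so the pattern does not appear in
    # awk's own command line (which ps would otherwise list as a match).
    PAT="$1" THR="${2:-50}" awk '
        BEGIN { pat = ENVIRON["PAT"]; thr = ENVIRON["THR"] + 0 }
        index($0, pat) {
            total++
            if ($2 + 0 >= thr) busy++
        }
        END { printf "%d matching, %d busy\n", total, busy }'
}

# Usage on the host while MRIQC is running:
#   ps -e -o pid=,pcpu=,args= | busy_workers mriqc 50
```

Many matching processes with only one busy would point towards workers that were spawned but sit idle. A single matching process would point towards workers that were never spawned. Note that the `pcpu` column of `ps` may be averaged over the lifetime of the process rather than instantaneous, so an interactive tool such as `top` is a useful cross-check.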

Other thoughts:
- The time it takes to index the dataset should be the same for 10 or for 1000 subjects (as long as you are not using any trick to speed things up with fewer subjects, such as creating a "view" of the dataset with only 10 subjects using soft links; obviously, PyBIDS is not going to look outside the BIDS root).
- --bids-database-dir should work: the first run will take a little longer, but after that, provided you set the exact same folder, it should be very fast. After 23.0.0 you don't need to set --bids-database-dir because the database will be written out to the top of the derivatives folder (i.e., if you don't wipe the output folder, it should remain there between runs).
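
For the explicit variant, here is a sketch of the command from earlier in the thread with `--bids-database-dir` pointed at the `/db` bind mount. It reuses the `BIDS_DIR`, `OUT_DIR` and `IMAGE_DIR` variables from that command. The wrapper function `run_mriqc` and the `DRY_RUN` switch are hypothetical additions for illustration, not part of MRIQC.

```shell
# Wrap the original command and pass the bound /db folder as the PyBIDS
# database directory. Set DRY_RUN=1 to print the command instead of running it.
run_mriqc() {
    # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is set and non-empty
    ${DRY_RUN:+echo} nice singularity run --cleanenv \
        --bind "$BIDS_DIR:/data" \
        --bind "$OUT_DIR:/out" \
        --bind "$BIDS_DIR/layout:/db" \
        "$IMAGE_DIR/mriqc-23.1.0rc0.sif" /data /out participant \
        --bids-database-dir /db \
        --nprocs 60 --mem_gb 100 -f --no-sub
}

# Usage:
#   DRY_RUN=1; run_mriqc    # inspect the command first
#   DRY_RUN=;  run_mriqc    # then run it for real
```

The host folder bound to `/db` has to be the same on every run for the index to be reused.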

Please let us know if you have any new findings.

Cheers,
oe

