Multi-threading and reproducibility

Reece Hill

Aug 27, 2025, 1:20:04 PM
to DSI Studio
Hi Frank,

Puzzled by this one - wonder if you could help. Each time I run --action=trk I get different results.

I have fixed the seed sequence (--random_seed) to no avail. I'm using DSI Studio version: Chen"陳" Apr 26 2024.

/home/reece/dsi-studio/dsi_studio \
--action=trk \
--source=/[...]/automated.fib.gz \
--thread_count=10 \
--fiber_count=200000 \
--seed_count=1000000000000.0 \
--method=1 \
--fa_threshold=0 \
--step_size=0.5 \
--turning_angle=60 \
--smoothing=0 \
--min_length=10 \
--max_length=300 \
--check_ending=1 \
--ref=/[...]/automated_reg_aparc+aseg.nii.gz \
--tip_iteration=0 \
--delete_repeat=0.125 \
--output=/[...]/L_1m0.trk \
--seed=/[...]/mni_dti_lh_precentral_paracentral.mask.nii.gz \
--end=/[...]/mni_dti_lh_precentral_paracentral.mask.nii.gz \
--end2=/[...]/mni_dti_lh_precentral_paracentral.mask.nii.gz \
--random_seed=1315760028

Could this be a race condition or some other multithreading problem? I wonder whether the order in which streamlines are returned is non-deterministic, so that different streamlines are kept when "delete_repeat" is applied?
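The hypothesis above can be illustrated with a toy model (not DSI Studio's implementation): a greedy duplicate filter that keeps a streamline unless it lies within the threshold of one already kept. With the same inputs in a different order, it keeps a different set, and even a different number, of streamlines. Each "streamline" is reduced to a single coordinate here, and `greedy_dedup` and `canonical_order` are hypothetical names.

```python
def greedy_dedup(streamlines, threshold):
    """Keep each item unless it lies within `threshold` of an item already kept.

    Toy model: each "streamline" is a single float coordinate.
    """
    kept = []
    for s in streamlines:
        if all(abs(s - k) > threshold for k in kept):
            kept.append(s)
    return kept


def canonical_order(streamlines):
    # Sorting by a content-derived key removes any dependence on the
    # order in which worker threads happened to return their results.
    return sorted(streamlines)


run1 = [0.0, 0.1, 0.2]  # one possible thread-completion order
run2 = [0.1, 0.0, 0.2]  # the same streamlines, finished in another order

print(len(greedy_dedup(run1, 0.15)))  # 2
print(len(greedy_dedup(run2, 0.15)))  # 1
print(greedy_dedup(canonical_order(run1), 0.15)
      == greedy_dedup(canonical_order(run2), 0.15))  # True
```

Because each keep/drop decision depends on earlier decisions, fixing the input order (for example, by sorting) is enough to make this kind of filter reproducible.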

Any thoughts? Ideally I want to ensure the streamlines are identical. How would you approach this problem?  

Many thanks in advance,
Reece 

Frank Yeh

Aug 27, 2025, 6:29:15 PM
to Reece Hill, DSI Studio
What I would do is try a simple command first, before putting all the
parameters together.
This would help rule out each factor in turn.

Also, you may consider using the latest version. There have been many
revisions since the 2024 Chen version.

Best regards,
Frank

On Wed, Aug 27, 2025 at 6:12 PM Reece Hill <reece...@gmail.com> wrote:
>
> Thanks, Frank.
>
> I had submitted this message whilst thread_count=1 was running - I was hoping to report back before you'd messaged! Thanks for such a fast reply.
>
> Whilst setting thread_count=1 certainly reduces the problem, I'm still getting inconsistent results.
>
> Is there a way for me to check the seeds/parameters used in the code? I feel like random_seed is not having the effect I'd hoped...
>
> Is there randomness introduced by delete_repeat? I now set this parameter when running --action=ana (to merge two hemispheres).
>
> Thanks
>
> On Wed, 27 Aug 2025, 18:22 Frank Yeh, <fran...@gmail.com> wrote:
>>
>> I would set --thread_count=1 to rule out racing issues first.

Frank Yeh

Aug 28, 2025, 8:54:08 AM
to Reece Hill, DSI Studio
Hi Reece,

Thank you for isolating the cause. I will see if I can fix this
problem asap.

Best regards,
Frank

Reece Hill

Aug 28, 2025, 9:57:02 AM
to Frank Yeh, DSI Studio
Thanks, Frank.

I've isolated this issue to the --delete_repeat parameter, whether appended to --action=trk or --action=ana.

How to reproduce:
1) Get multiple .trk files
Run the command below X times (to collect 20k*X fibres):
--action=trk --source=<sourceFile> --thread_count=1 --fiber_count=20000 --seed_count=99999999999 --method=1 --fa_threshold=0 --output=<L_1m0_threadX.trk> --random_seed=567 .....

// For clarity: each <L_1m0_threadX.trk> is consistent between runs. This works fine.

2) Attempt to merge the .trk files with --delete_repeat set.
However, consistency is lost when I merge the <L_1m0_threadX.trk> files into <L_1m0.trk>:
--action=ana --source=<sourceFile> --tract=<L_1m0_thread0.trk>,...,<L_1m0_thread9.trk> --output=<L_1m0_run1.trk> --delete_repeat=0.125
--action=ana --source=<sourceFile> --tract=<L_1m0_thread0.trk>,...,<L_1m0_thread9.trk> --output=<L_1m0_run2.trk> --delete_repeat=0.125

Each time we merge the files with delete_repeat, we get different streamlines: <L_1m0_run1.trk> is not equal to <L_1m0_run2.trk>.
The number of streamlines differs (run1: 122,691; run2: 122,658).
I have tested stability by repeatedly increasing delete_repeat, but the inconsistency remains.

If we merge files WITHOUT delete_repeat, the output is identical.
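One way to check "identical" mechanically is to compare checksums of two outputs. This is a hypothetical helper, not part of DSI Studio. It hashes the decompressed content of `.gz` files, because the gzip header stores a modification time that can differ between otherwise identical files. It assumes the .trk writer itself embeds nothing run-specific (such as a timestamp); if it does, the check will report a mismatch even when the streamlines agree.

```python
import gzip
import hashlib
from pathlib import Path


def tract_digest(path):
    """SHA-256 of a tract file's contents, decompressing .gz files first."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    digest = hashlib.sha256()
    with opener(path, "rb") as f:
        # Read in 1 MiB chunks so large tract files are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_output(path_a, path_b):
    """True if both files have identical (decompressed) contents."""
    return tract_digest(path_a) == tract_digest(path_b)
```

For example, `same_output("L_1m0_run1.trk", "L_1m0_run2.trk")` would be expected to return False for the two merge runs described above, and True for repeated single-threaded --action=trk runs.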

Problem
Looking at the source code, I wonder whether a data race is to blame, specifically in tipl::adaptive_par_for().
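A common source of this kind of non-determinism (assumed here, not confirmed against the tipl code) is a parallel loop whose workers append results to a shared container in whatever order they finish. Writing each result into a slot reserved for its input index keeps the output order fixed regardless of thread scheduling. A Python analogue using `threading`; `ordered_parallel_map` is a hypothetical name, and the sketch illustrates ordering, not speed.

```python
import threading


def ordered_parallel_map(fn, items, n_threads=4):
    """Apply fn to every item on several threads; results keep the input order.

    Each worker writes only to results[i] for its own indices i, so the
    output does not depend on which thread finishes first.
    """
    results = [None] * len(items)

    def worker(start):
        # Strided partition: thread t handles indices t, t + n_threads, ...
        for i in range(start, len(items), n_threads):
            results[i] = fn(items[i])

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results
```

Per-streamline work (for example, resampling or computing candidate distances) can run this way in parallel, while the greedy keep/drop pass over the ordered results stays sequential, since each decision depends on the ones before it.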

Fix
I don't think there is a quick fix I could make at the command line for this, and I'm locked to the DSI Studio version installed on my institution's hardware. I will probably look at implementing the same logic in Python with fixes (I don't know C++ very well).

I hope the above is helpful to you.

Kind regards,

Frank Yeh

Aug 28, 2025, 10:33:04 AM
to Reece Hill, DSI Studio
Hi Reece,

I fixed the multi-thread reproducibility issue in the "delete
repeated" tract function.
The new release is being built, and the download links will be
updated within an hour (Mac versions may take two or three hours).

Should there still be any remaining issue, please feel free to let me know.

Thank you again for reporting this problem!

Best,
Frank

Reece Hill

Aug 28, 2025, 3:14:18 PM
to Frank Yeh, DSI Studio
Thanks, Frank.

I've experimented with the new build - looks like the results from --action=ana with --delete_repeat are consistent. Nice job!

Few points on new build:
* I had to ensure --output was .trk.gz (compressed) to get it to work.
* I can't get the DSI Studio GUI to work properly (on loading .fib files, I get a black screen [this issue]). I'll continue sticking with CLI for now anyway.
* I get bad_alloc errors if delete_repeat drops below ~0.25: 48 GB RAM, running NVIDIA GeForce GTX 980 Ti


Thanks for sorting this.

Frank Yeh

Aug 28, 2025, 3:14:37 PM
to Reece Hill, DSI Studio
> * I had to ensure --output was .trk.gz (compressed) to get it to work.

I will look into this issue.

> * I can't get the DSI Studio GUI to work properly (on loading .fib files, I get a black screen [this issue]). I'll continue sticking with CLI for now anyway.

The error comes from the OpenGL driver. Possible solutions:

Update graphics drivers: this is the most common solution. The
graphics card drivers may be outdated, corrupted, or not properly
installed. Make sure you have the latest drivers for your specific
GPU (NVIDIA, AMD, or Intel).

For NVIDIA, use the GeForce Experience application.
For AMD, use the Radeon Software.
For Intel, use the Intel Driver & Support Assistant.

Enable GLX or EGL: The error message "neither GLX nor EGL are enabled"
points to a problem with the graphics server on a Linux system. You
might need to check the X11 configuration to ensure that GLX (OpenGL
Extension to the X Window System) is properly configured and enabled.

> * I get bad_alloc errors if delete_repeat drops below ~0.25: 48 GB RAM, running NVIDIA GeForce GTX 980 Ti

48 GB of RAM may not be enough for 0.25. Usually, delete_repeat should be
greater than 1.
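The thread later attributes the memory use to a lookup table that speeds up the distance check. Its layout isn't described, so this back-of-the-envelope sketch assumes a dense cubic grid whose cell size equals the delete_repeat distance, over a hypothetical 200 mm field; the layout, the extent, and `bytes_per_cell` are all assumptions. Under that model the cell count scales as (extent / threshold)^3, so going from 1 to 0.25 multiplies it by 64, which would make small values expensive very quickly.

```python
import math


def dense_grid_cells(extent_mm, threshold_mm):
    """Number of cells in a hypothetical dense cubic grid whose cell size
    equals the duplicate-distance threshold (assumed layout, not DSI Studio's)."""
    per_axis = math.ceil(extent_mm / threshold_mm)
    return per_axis ** 3


def fits_in_budget(extent_mm, threshold_mm, bytes_per_cell, budget_bytes):
    """Size check before allocating, so a program can report a clear error
    instead of failing with bad_alloc. bytes_per_cell is a hypothetical value."""
    return dense_grid_cells(extent_mm, threshold_mm) * bytes_per_cell <= budget_bytes


# Halving the threshold multiplies the cell count by 8.
for t in (1.0, 0.5, 0.25, 0.125):
    print(f"threshold={t}: {dense_grid_cells(200, t):,} cells")
```

Only the scaling is meaningful here; absolute numbers depend on the real table layout.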

Frank Yeh

Aug 30, 2025, 10:29:29 PM
to Reece Hill, DSI Studio
I updated DSI Studio (the build will be ready in an hour); please check
whether it solves the bad allocation problem.
Best,
Frank

On Sat, Aug 30, 2025 at 8:17 PM Frank Yeh <fran...@gmail.com> wrote:
>
> Thanks for the tips to handle this issue.
>
> I will see what I can do on my side to avoid this in the future.
>
> The memory issue is due to a lookup table used to speed up the distance check. I can add code to check whether it is too large, to avoid the bad allocation.
>
> I will come up with a quick fix in a day or two.
>
> Best
> Frank
>
>> 1) Fixing DSI Studio Black Screen Issue
>> I use DSI Studio almost exclusively from the CLI, but for other users I document my attempts to get the GUI working below. I managed to get it working, but with some workarounds. Hopefully it's useful to some.
>> Following your advice for black screen, I updated my NVIDIA drivers. This made no change and black screen persisted.
>>
>> However, just for documentation, I then checked whether hardware acceleration was the problem by forcing software rendering: `LIBGL_ALWAYS_SOFTWARE=1 ./dsi_studio`
>> Result: a quick fix, but perhaps not a permanent solution. Best to upgrade from Ubuntu 20 to 22+; otherwise...
>>
>> I suspect the issue resides in the use of WSL2 and Ubuntu 20.04. Wayland is outdated (≥1.20 required) and Qt6 is unavailable out-of-the-box for Ubuntu 20.04.
>> Hence:
>> `QT_QPA_PLATFORM=wayland ./dsi_studio` does not show DSI Studio at all.
>> `QT_QPA_PLATFORM=xcb ./dsi_studio` is the black screen problem.
>>
>> A more permanent fix/bypass:
>> We can bypass WSLg by using an external X server (like we did for WSL1): I use XLaunch (https://sourceforge.net/projects/vcxsrv/).
>> 1) Start the XLaunch app with native OpenGL selected and access control disabled.
>> 2) Inside WSL2: `export DISPLAY=$(hostname).local:0 && export LIBGL_ALWAYS_INDIRECT=0`
>> 3) Run `glxgears`. If you see moving gears, WSLg is bypassed and rendering is handled by the X server.
>> 4) Now if you launch DSI Studio `./dsi_studio` and load a .fib, it should work fine.
>>
>> 2) bad_alloc issues for small delete_repeat values
>> Thanks, Frank. I managed to implement a Python version that (loosely) follows your C++ logic. It seems to keep memory demands manageable by batching, but despite using multiprocessing it will never be faster than a similar solution in C++. As you've now got a working solution, I suppose this is low priority for you, but if there's a way to avoid memory overload in the C++ function, it would be much appreciated.
>>
>> Kind regards
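For a memory-bounded alternative, one option (a different technique from the batching described above) is a sparse spatial hash: a dict keyed by grid cell, so memory grows with the number of kept streamlines rather than with the volume. This is a sketch under stated assumptions, not DSI Studio's algorithm: streamlines are already resampled to the same number of points; the distance is the mean point-to-point distance, taking the smaller of the two orientations (an MDF-style metric that may differ from DSI Studio's definition); `dedup_spatial_hash` is a hypothetical name; `math.dist` needs Python 3.8+. The centroid lookup is safe because two streamlines within mean distance t have centroids within t, so any match lies in one of the 27 neighbouring cells.

```python
import math
from itertools import product


def mean_distance(a, b):
    """Mean point-wise distance, minimised over the two orientations of b.

    Assumes a and b have the same number of points.
    """
    forward = sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)
    backward = sum(math.dist(p, q) for p, q in zip(a, reversed(b))) / len(a)
    return min(forward, backward)


def centroid(s):
    n = len(s)
    return tuple(sum(p[k] for p in s) / n for k in range(3))


def dedup_spatial_hash(streamlines, threshold):
    """Greedy duplicate removal; returns indices of kept streamlines in input order."""
    cells = {}  # grid cell -> indices of kept streamlines whose centroid falls there
    kept = []
    for i, s in enumerate(streamlines):
        cell = tuple(math.floor(c / threshold) for c in centroid(s))
        neighbours = (
            tuple(c + d for c, d in zip(cell, offset))
            for offset in product((-1, 0, 1), repeat=3)
        )
        is_repeat = any(
            mean_distance(s, streamlines[j]) <= threshold
            for nb in neighbours
            for j in cells.get(nb, ())
        )
        if not is_repeat:
            kept.append(i)
            cells.setdefault(cell, []).append(i)
    return kept
```

Like any greedy filter, the result depends on input order, so it is reproducible only when the streamlines arrive in a fixed order.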