Inquiry about Batch Processing Performance and CLI-GUI Differences in Fiber Tracking


Yonglun Ji

Apr 26, 2024, 12:14:36 AM
to DSI Studio
Hi Frank,

I am currently encountering some issues with batch processing and CLI-based fiber tracking, and I would greatly appreciate your insights.

1. Batch Processing Speed: Batch processing of fiber tracking is far slower than the GUI approach. Processing a single subject via batch takes over 15 hours, whereas the GUI workflow (Whole Brain Seeding -> Fiber Tracking -> Recognize and Cluster) completes in roughly 20 minutes for the same subject. Why is there such a drastic difference in processing time between the two methods?

2. CLI vs. GUI Results: With the same parameter ID, the results from CLI-based fiber tracking are substantially different from those obtained through the GUI, and I cannot find the "Recognize and Cluster" step in the CLI. Since the CLI does not seem to offer a way to process all tracts at once, I work around this by storing all tract IDs and processing them one at a time in a loop. I believe this approach differs significantly from the fiber tracking method used in the GUI.

Below is a snippet of my code for handling the tracking in CLI, which might help illustrate my current method:

rem The !var! syntax below needs delayed expansion; the full script presumably enables it already
setlocal enabledelayedexpansion

for /r "%SUBJECTS_DIR%" %%x in (*.fib.gz) do (
    rem Extract the subject label, e.g. sub-A001, from the file name
    set "filename=%%~nx"
    for /f "delims=_ tokens=1" %%i in ("!filename!") do set "subject_name=%%i"

    for %%t in (%TRACK_IDS%) do (
        rem Construct the output and log file names
        set "output_file=!OUTPUT_DIR!\!subject_name!_%%t.tt.gz"
        set "log_file=!OUTPUT_DIR!\!subject_name!_%%t.log.txt"

        rem Run DSI Studio tracking for one tract ID, then export its statistics
        rem Example command; adjust according to your actual command-line options
        call "%DSI_PATH%\dsi_studio.exe" --thread_count=20 --action=trk --source="%%x" --track_id="%%t" --output="!output_file!" --parameter_id=c9A99193Fb803FcCDCCCC3DbF041b484340420FdcaCDCC4C3Ec > "!log_file!"
        call "%DSI_PATH%\dsi_studio.exe" --thread_count=20 --action=ana --source="%%x" --tract="!output_file!" --export=stat
        echo Processed: %%x with track ID %%t
    )
    echo Processed: %%x
)

I would greatly appreciate any advice or guidance you can offer to help resolve these issues. Thank you very much for your time and support.

Best,
Yonglun

Frank Yeh

Apr 26, 2024, 8:46:53 AM
to yj3...@nyu.edu, DSI Studio
> 1. Batch Processing Speed: Batch processing of fiber tracking is far slower than the GUI approach. Processing a single subject via batch takes over 15 hours, whereas the GUI workflow (Whole Brain Seeding -> Fiber Tracking -> Recognize and Cluster) completes in roughly 20 minutes for the same subject. Why is there such a drastic difference in processing time between the two methods?

You may check the console output (GUI vs. CLI) to see if there are any differences.

>
> 2. CLI vs. GUI Results: With the same parameter ID, the results from CLI-based fiber tracking are substantially different from those obtained through the GUI, and I cannot find the "Recognize and Cluster" step in the CLI. Since the CLI does not seem to offer a way to process all tracts at once, I work around this by storing all tract IDs and processing them one at a time in a loop. I believe this approach differs significantly from the fiber tracking method used in the GUI.
>

I will review the code to see if we can have "recognize and cluster" in the CLI.

Yonglun Ji

May 6, 2024, 5:44:17 AM
to DSI Studio
Thank you for your response, Frank.

I still have some questions regarding the process.

1. The GUI pipeline for fiber tracking doesn't generate extensive logs, which makes it hard to compare directly with the batch processing output. I understand that the parameters of the two methods are not identical, but I am puzzled by the substantial difference in processing time. I've included the available logs from both methods below. Could you help me identify any significant differences or potential issues that might explain the discrepancy?

2. Since "Recognize and Cluster" is not available in the CLI, I'm concerned about the potential impact on the results when using only the CLI for fiber tracking. I've observed considerable differences in the statistics (such as the number of tracts and QA values) for the same regions between my GUI and CLI results. Could these differences be due to the absence of the "Recognize and Cluster" step in the CLI? How critical is this step for the accuracy and reliability of the results?

## My GUI pipeline (Whole Brain Seeding -> Fiber Tracking -> Recognize and Cluster) output ##
loading tractography atlas
|-checking existing mapping file
| | loading mapping fields from E:/wjh/Pre_data/adult_1st/step1.qsiprep/Adult_1st/qsirecon/sub-001/dwi/sub-001_dir-AP_space-T1w_desc-preproc_gqi.fib.gz.icbm152_adult.map.gz
| |_626 ms
|-loading
| |_131 ms
| host space (mni):
| -1 -0 0 78
| -0 -1 0 76
| -0 -0 1 -50
| 0 0 0 1
| tractography space (mni):
| -1 0 0 78
| 0 -1 0 76
| 0 0 1 -50
| 0 0 0 1
|_1.085 s
loading tractography atlas
|_0 ms
##

## Batch processing output (one region) ##
-automatic fiber tracking
| | processing sub-002_dir-AP_space-T1w_desc-preproc_gqi.fib.gz
| |-tracking pathways
| | | tracking Association_ArcuateFasciculusL
| | |-open FIB file sub-002_dir-AP_space-T1w_desc-preproc_gqi.fib.gz
| | | | using index file for accelerated loading: E:/wjh/Pre_data/adult_1st/step1.qsiprep/Adult_1st/qsirecon/sub-002/dwi/sub-002_dir-AP_space-T1w_desc-preproc_gqi.fib.gz.idx
| | | | loading fiber and image data
| | | |-loading image volumes
| | | | |_80 ms
| | | | initiating data
| | | | default template set to young adult
| | | | FIB file loaded
| | | |_1.013 s
| | | template 0: ICBM152_adult
| | | template 1: C57BL6_mouse
| | | template 2: dHCP_neonate
| | | template 3: INDI_rhesus
| | | template 4: Pitt_marmoset
| | | template 5: WHS_SD_rat
| | |-loading tractography atlas
| | | |-checking existing mapping file
| | | | | loading mapping fields from E:/wjh/Pre_data/adult_1st/step1.qsiprep/Adult_1st/qsirecon/sub-002/dwi/sub-002_dir-AP_space-T1w_desc-preproc_gqi.fib.gz.icbm152_adult.map.gz
| | | | |_638 ms
| | | |-loading
| | | | |_122 ms
| | | | host space (mni):
| | | | -1 -0 0 78
| | | | -0 -1 0 76
| | | | -0 -0 1 -50
| | | | 0 0 0 1
| | | | tractography space (mni):
| | | | -1 0 0 78
| | | | 0 -1 0 76
| | | | 0 0 1 -50
| | | | 0 0 0 1
| | | |_1.019 s
| | | otsu_threshold=0.6
| | | fa_threshold=0
| | | turning_angle=0
| | | step_size=0
| | | smoothing=0
| | | min_length(mm): 21.3762
| | | max_length(mm): 146.695
| | | tip_iteration=32
| | | check_ending=1
| | |-tracking Association_ArcuateFasciculusL
| | | |-[thread]loading tractography atlas
| | | | [thread]convert tolerance distance of 22 from ICBM mm to 14.2508 subject voxels
| | | | [thread]creating limiting region to limit tracking results
| | | | [thread]apply left limiting mask for Association_ArcuateFasciculusL
| | | | [thread]checking additional ROI for refining tracking
| | | | yield rate (tract generated per seed): 0.0158454
| | | | tract yield rate (tracts per second): 5716.77
| | | | seed yield rate (seeds per second): 360785
| | | |-save trajectories to sub-002_dir-AP_space-T1w_desc-preproc_gqi.Association_ArcuateFasciculusL.tt.gz
| | | | |-compressing trajectories
| | | | | |_59 ms
| | | | |-saving file
| | | | | |_908 ms
| | | | |_1.012 s
| | | |_31.412 s
| | |-export tracts statistics
##


Frank Yeh

May 6, 2024, 8:16:47 AM
to yj3...@nyu.edu, DSI Studio
> 1. The GUI pipeline for fiber tracking doesn't generate extensive logs, which makes it hard to compare directly with the batch processing output. I understand that the parameters of the two methods are not identical, but I am puzzled by the substantial difference in processing time. I've included the available logs from both methods below. Could you help me identify any significant differences or potential issues that might explain the discrepancy?

Click on the [Console] button to see the output.

>
> 2. Since "Recognize and Cluster" is not available in the CLI, I'm concerned about the potential impact on the results when using only the CLI for fiber tracking.

It is available in the CLI: add --recognize=cluster_info
https://dsi-studio.labsolver.org/doc/cli_t3.html
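
One possible way to act on the suggestion to compare the console outputs is sketched below. It pulls `name=value` and `name: value` lines (such as otsu_threshold=0.6 or min_length(mm): 21.3762 in the batch log above) out of two console logs saved as text and lists the parameters that differ. The function names are hypothetical, and the sketch assumes that both logs follow the line format shown in this thread.

```python
import re

# Matches lines such as "otsu_threshold=0.6" or "min_length(mm): 21.3762"
# once the tree-drawing prefix ("|", "-", "_", spaces) has been stripped.
PARAM_RE = re.compile(r"^([A-Za-z_][\w()]*)\s*(?:=|:)\s*(-?\d+(?:\.\d+)?)$")


def extract_parameters(log_text):
    """Return {name: value-as-string} for every numeric parameter line."""
    params = {}
    for raw in log_text.splitlines():
        line = raw.lstrip("|-_ \t").strip()
        match = PARAM_RE.match(line)
        if match:
            params[match.group(1)] = match.group(2)
    return params


def diff_parameters(log_a, log_b):
    """List (name, value_in_a, value_in_b) for parameters that differ or are missing."""
    a, b = extract_parameters(log_a), extract_parameters(log_b)
    names = sorted(set(a) | set(b))
    return [(n, a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)]
```

Timing lines, matrix rows, and free-text lines do not match the pattern and are ignored, so only tracking parameters end up in the comparison.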

Yonglun Ji

May 7, 2024, 4:39:49 AM
to DSI Studio
Many thanks! I have added the --recognize=cluster option in the CLI, and it outputs label.txt and name.txt. I have a few questions about this:

1. What does label.txt represent? What specific information does it contain?

2. After running "Recognize and Cluster" in the GUI, we can obtain tract data for different regions and export it in txt form. How can I achieve similar results with the CLI? It seems that --export only outputs overall results.

Thank you again for your assistance!

Frank Yeh

May 9, 2024, 3:14:53 PM
to yj3...@nyu.edu, DSI Studio
> 1. What does label.txt represent? What specific information does it contain?

Cluster number

>
> 2. After running "Recognize and Cluster" in the GUI, we can obtain tract data for different regions and export it in txt form. How can I achieve similar results with the CLI? It seems that --export only outputs overall results.

You may use MATLAB or Python to separate the tracts (saved as .mat or .txt) into multiple files.

Best,
Frank
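
A minimal Python sketch of the separation step suggested above. It rests on two assumptions that the thread does not confirm: that the tract .txt export holds one track per line, and that label.txt holds one integer cluster number per track in the same order. The function names and the output naming scheme are hypothetical.

```python
from collections import defaultdict


def split_by_label(track_lines, labels):
    """Group per-track text lines by their cluster label.

    track_lines: one string per track (assumed layout of the tract .txt export)
    labels:      one integer per track, in the same order (assumed layout of label.txt)
    """
    if len(track_lines) != len(labels):
        raise ValueError("number of tracks and labels differ")
    groups = defaultdict(list)
    for line, label in zip(track_lines, labels):
        groups[label].append(line)
    return dict(groups)


def split_files(tract_txt, label_txt, out_prefix):
    """Write one '<out_prefix>_<label>.txt' file per cluster label."""
    with open(tract_txt) as f:
        tracks = [ln.rstrip("\n") for ln in f if ln.strip()]
    with open(label_txt) as f:
        labels = [int(tok) for tok in f.read().split()]
    for label, lines in split_by_label(tracks, labels).items():
        with open(f"{out_prefix}_{label}.txt", "w") as out:
            out.write("\n".join(lines) + "\n")
```

The length check is deliberate: if the two files do not line up one-to-one, the assumed layout is wrong and splitting would silently mislabel tracks.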


Yonglun Ji

May 21, 2024, 7:30:14 AM
to DSI Studio
Thank you for your previous responses. I have a few follow-up questions based on your last email:

1. I now understand what the label file represents. By counting the occurrences of each number in the label file, I can determine how many tracks belong to each fiber tract.

2. I apologize for any confusion in my previous message. The results I exported with --export appear to be overall results for the entire brain, not separate results for each fiber tract as shown in the attached Screenshot 1. How should I proceed if I want results for each individual tract?

3. Regarding auto fiber tracking versus manual fiber tracking: following your advice, I compared the console outputs of the two methods. I noticed differences in the parameter settings (e.g., min/max length), which might explain the variation in results between them. However, I still don't understand why step B4 (auto fiber tracking) of the GUI batch processing is so slow (more than 10 hours), whereas Whole Brain Seeding -> Fiber Tracking -> Recognize and Cluster takes only about 20 minutes for the same dataset. As seen in Screenshot 2, each fiber tract seems to be processed individually in ATK. Could you explain this discrepancy? Additionally, as I aim to obtain results for each individual tract, are the differences between these two methods acceptable?

Thank you once again for your assistance.

Best,
Yonglun

Attachments: 图片1.png, 图片2.png
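
The counting described in point 1 of this message can be sketched in a few lines of Python. This assumes that label.txt contains whitespace-separated integer cluster numbers, one per track; the function name is hypothetical.

```python
from collections import Counter


def count_tracks_per_label(label_text):
    """Count how many tracks carry each cluster number."""
    return Counter(int(tok) for tok in label_text.split())
```

Under that assumption, `count_tracks_per_label(open("label.txt").read())` would give the number of tracks per cluster number.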

Frank Yeh

May 21, 2024, 9:19:55 AM
to yj3...@nyu.edu, DSI Studio

> 2. I apologize for any confusion in my previous message. The results I exported with --export appear to be overall results for the entire brain, not separate results for each fiber tract as shown in the attached Screenshot 1. How should I proceed if I want results for each individual tract?

1. Save the tracts in a tt.gz file.
2. Use --action=ana to export the metrics.

> 3. Regarding auto fiber tracking versus manual fiber tracking: following your advice, I compared the console outputs of the two methods. I noticed differences in the parameter settings (e.g., min/max length), which might explain the variation in results between them. However, I still don't understand why step B4 (auto fiber tracking) of the GUI batch processing is so slow (more than 10 hours), whereas Whole Brain Seeding -> Fiber Tracking -> Recognize and Cluster takes only about 20 minutes for the same dataset. As seen in Screenshot 2, each fiber tract seems to be processed individually in ATK. Could you explain this discrepancy?

The fiber tracking settings are different, including the parameters and the ROI/ROA combinations.

> Additionally, as I aim to obtain results for each individual tract, are the differences between these two methods acceptable?

The differences will be substantial.
Whether they are acceptable depends on the experimental design and scientific reasoning.

Best,
Frank
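
The two steps above (save tracts as tt.gz, then export metrics with --action=ana) could be scripted roughly as follows. This is only a sketch: the flags are the ones quoted in this thread (--action=ana, --source, --tract, --export=stat), the function names and the `run` parameter are hypothetical, and the exact options should be checked against the CLI documentation for the installed DSI Studio version.

```python
import subprocess
from pathlib import Path


def build_ana_command(dsi_studio, fib_file, tract_file, export="stat"):
    """Return the argument list for one --action=ana call (flags as used in this thread)."""
    return [
        str(dsi_studio),
        "--action=ana",
        f"--source={fib_file}",
        f"--tract={tract_file}",
        f"--export={export}",
    ]


def export_all(dsi_studio, fib_file, tract_dir, run=subprocess.run):
    """Run one export per *.tt.gz file found in tract_dir; return the commands issued."""
    commands = []
    for tract in sorted(Path(tract_dir).glob("*.tt.gz")):
        cmd = build_ana_command(dsi_studio, fib_file, tract)
        run(cmd, check=True)  # raises CalledProcessError if the call fails
        commands.append(cmd)
    return commands
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.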
 

Yonglun Ji

May 21, 2024, 11:33:46 AM
to DSI Studio
Sorry, by "If I want to obtain data results for each individual tract" I meant results for each cluster (or region, e.g., Commissure_CorpusCallosum_Body). I have only one tt.gz file per subject after running fiber tracking with recognize and cluster. In the GUI, after Recognize and Cluster, we can get the number of tracts in each region and export the data (e.g., QA, FA) for each region. Can I do the same in the CLI? Thanks!

Frank Yeh

May 21, 2024, 11:40:44 AM
to yj3...@nyu.edu, DSI Studio
You may need to write a script to separate each tract into different files; this is not available in the DSI Studio CLI.
One approach is to save the tracts as .txt or .mat and use MATLAB or Python to separate them.


Wenjun Huang

Jul 12, 2024, 1:55:48 AM
to DSI Studio
Hi, Frank

I used the command call "%DSI_PATH%\dsi_studio.exe" --action=ana --source="%%x" --tract="%RESOURSE_DIR%\!EM_name!.t25.length30.dec.tt.gz" --recognize=cluster --export=qa, but the exported QA differs from the QA exported directly from the GUI, as shown in the figures.
From GUI: 1.png
From CLI: 2.png
The data obtained when exporting QA with the command looks as shown in the figure: 3.png
Because the number of rows in qa.txt exactly matches the number of labels in label.txt, my approach is to first calculate the average of each row in qa.txt and then match the row-averaged QA with the labels to calculate the average QA for each label (I hope this is clear). My questions are:
1. Is my approach correct?
2. If it is correct, why is the computed result different from the GUI output? Can the difference be ignored, given that it is not significant?

Best,
Wenjun
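
For reference, the averaging approach described in this message can be written as the Python sketch below. It assumes that each row of qa.txt holds the whitespace-separated QA values of one track and that label.txt supplies one integer label per row in the same order; the function name is hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def mean_qa_per_label(qa_rows, labels):
    """Average each row first, then average the row means within each label."""
    if len(qa_rows) != len(labels):
        raise ValueError("qa rows and labels must have the same length")
    per_label = defaultdict(list)
    for row, label in zip(qa_rows, labels):
        values = [float(v) for v in row.split()]
        if values:  # skip rows without any values
            per_label[label].append(fmean(values))
    return {label: fmean(means) for label, means in per_label.items()}
```

Averaging the row means gives every track the same weight regardless of how many points it contains. The thread does not state how the GUI weights tracks, so this is only one of several plausible conventions.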

DSI Studio

Jul 18, 2024, 8:10:14 AM
to DSI Studio
--export=qa exports the QA values along the entire length of every track.
--export=stat exports the averaged values.

The difference is likely due to round-off error when saving track coordinates.

Best regards,
Frank
