Replay issue in demo_example_replay.m

9 views
Skip to first unread message

Xiuxiu Zhang

unread,
Feb 2, 2026, 10:04:18 PM (12 days ago) Feb 2
to mmc-users
Dear Dr. Fang, 

I ran demo_example_replay.m on my laptop with an Apple M4 Pro chip. The initial Monte Carlo simulation runs without issue and detects ~3000 photons. However, during the replay step (newcfg), only ~4 photons are detected and the terminal reports that the replay is not successful. I did not modify the sample code. The script still generates the Jacobian figure (attached).

A coworker was able to run the same demo successfully on an NVIDIA GPU, so I was wondering whether this behavior could be related to the Apple Silicon OpenCL backend. Please let me know if there are recommended settings or known limitations for running replay on macOS.

Thank you,
Xiuxiu ZhangScreenshot 2026-02-02 at 9.51.10 PM.pngScreenshot 2026-02-02 at 9.41.23 PM.png

Qianqian Fang

unread,
Feb 3, 2026, 10:45:32 PM (11 days ago) Feb 3
to mmc-...@googlegroups.com, Xiuxiu Zhang

hi Xiuxiu,

yes, this behavior is a combination of an recently enabled OpenCL optimization in MMC kernel, and differences in OpenCL thread scheduling (i.e. the orders of the execution of many threads) on different devices.

In the most recent release, v2025.10, I increased the kernel optimization level (cfg.optlevel=3, the optlevels are explained in our MCXCL paper [1] - each level enables extra optimization strategies to improve simulation speed on OpenCL, previously, the default optlevel is 1).

after this optimization, I noticed that for devices that the execution order is more stable/repeatable, such as NVIDIA GPU, the replay is still quite reliable, but for some devices, the replay seems to be sensitive to these additional optimization strategies (especially the block-based workload balance).


there is a simple workaround to this issue, you just need to set cfg.optlevel=1 to reverse to the previous setting.


below is the session I ran on an M2 Pro processor. Initially, using the new default optlevel=3, the replay failed, just like you reported, but after setting newcfg.optlevel=1, the replay is successful.


hope this explanation helps.

Qianqian


fangq@yugong:~/Downloads/mmclab/example$ sw_vers 
ProductName:        macOS
ProductVersion:        14.5
BuildVersion:        23F79
fangq@yugong:~/Downloads/mmclab/example$ sysctl -a machdep.cpu
machdep.cpu.cores_per_package: 10
machdep.cpu.core_count: 10
machdep.cpu.logical_per_package: 10
machdep.cpu.thread_count: 10
machdep.cpu.brand_string: Apple M2 Pro
fangq@yugong:~/Downloads/mmclab/example$ 
fangq@yugong:~/Downloads/mmclab/example$ matlab -nojvm


                                                     < M A T L A B (R) >
                                           Copyright 1984-2024 The MathWorks, Inc.
                                       R2024a Update 3 (24.1.0.2603908) 64-bit (maca64)
                                                         May 2, 2024

 
For online documentation, see https://www.mathworks.com/support
For product information, visit www.mathworks.com.
 
>> 
>> cd ..
>> addpath(pwd)
>> cd example/
>> demo_example_replay             
Launching MMCLAB - Mesh-based Monte Carlo for MATLAB & GNU Octave ...
Running simulations for configuration #1 ...
mmc.nphoton=1e+06;
mmc.seed=1648335518;
mmc.nn=29791;
mmc.elem=[162000,4];
mmc.ne=162000;
mmc.srcpos=[30.1 30.2 0];
mmc.srcdir=[0 0 1 0];
mmc.tstart=0;
mmc.tend=5e-09;
mmc.tstep=5e-09;
mmc.prop=1;
mmc.debuglevel='TP';
mmc.method='elem';
mmc.isreflect=0;
mmc.detnum=4;
mmc.issaveexit=1;
mmc.issaveseed=1;
mmc.facenb=[162000,4];
mmc.evol=162000;
mmc.e0=27466;
    done    7
simulating ... 
###############################################################################
#                     Mesh-based Monte Carlo (MMC) - OpenCL                   #
#          Copyright (c) 2010-2025 Qianqian Fang <q.fang at neu.edu>          #
#              https://mcx.space/#mmc  &  https://neurojson.io                #
#                                                                             #
#Computational Optics & Translational Imaging (COTI) Lab  [http://fanglab.org]#
#   Department of Bioengineering, Northeastern University, Boston, MA, USA    #
###############################################################################
#    The MCX Project is funded by the NIH/NIGMS under grant R01-GM114365      #
###############################################################################
#  Open-source codes and reusable scientific data are essential for research, #
# MCX proudly developed human-readable JSON-based data formats for easy reuse.#
#                                                                             #
#Please visit our free scientific data sharing portal at https://neurojson.io #
# and consider sharing your public datasets in standardized JSON/JData format #
###############################################################################
$Rev::0684c2$v2025.10$Date::2025-10-20 11:05:07 -04$ by $Author::Qianqian Fang$
###############################################################################
- variant name: [MMC-OpenCL] compiled with OpenCL version [1]
- compiled with: [RNG] xorshift128+ RNG [Seed Length] 4
initializing streams ...    init complete : 0 ms
Building kernel with option: -cl-mad-enable -DMCX_USE_NATIVE -DMCX_SIMPLIFY_BRANCH -DMCX_VECTOR_INDEX -DUSE_MACRO_CONST -DMCX_SRC_PENCIL   -DUSE_ATOMIC -DMCX_SAVE_DETECTORS -DMCX_SAVE_SEED -DUSE_BLBADOUEL -Dgcfgdebuglevel=2560  -Dgcfgdstep=1.0000000000e+00f  -Dgcfge0=27466  -Dgcfgelemlen=4  -Dgcfgfocus=0.0000000000e+00f  -Dgcfgframelen=162000  -Dgcfgisextdet=0  -Dgcfgismomentum=0  -Dgcfgisreflect=0  -Dgcfgissavedet=1  -Dgcfgissaveexit=1  -Dgcfgissaveref=0  -Dgcfgisspecular=0  -Dgcfgmaxdetphoton=1000000  -Dgcfgmaxjumpdebug=10000000  -Dgcfgmaxmedia=1  -Dgcfgmaxgate=1  -Dgcfgmaxpropdet=6  -Dgcfgmethod=3  -Dgcfgminenergy=9.9999999748e-07f  -Dgcfgnormbuf=998  -Dgcfgnout=1.0000000000e+00f  -Dgcfgoutputtype=0  -Dgcfgreclen=9  -Dgcfgroulettesize=1.0000000000e+01f  -DgcfgRtstep=2.0000000000e+08f  -Dgcfgsrcelemlen=0  -Dgcfgsrctype=0  -Dgcfgvoidtime=1  -Dgcfgseed=1648335518  -Dgcfgsrcnum=1  -Dgcfgdetnum=4  -Dgcfgissaveseed=1  -Dgcfgnf=10800 
build program complete : 3 ms
- [device 0(1): Apple M2 Pro] threadph=81 oddphotons=4672 np=1000000.0 nthread=12288 nblock=64 repetition=1
requesting 2816 bytes of shared memory
set kernel arguments complete : 3 ms 3
lauching mcx_main_loop for time window [0.0ns 5.0ns] ...
simulation run# 1 ... 
Progress: [==============================================================] 100%
kernel complete:      1436 ms
retrieving flux ...     detected 2999 photons, total: 2999    transfer complete:        1453 ms
source 1    total simulated energy: 1000000.000000    absorbed: 17.66706%    normalizor=209.381
simulated 1000000 photons (1000000) with 1 devices (ray-tet 228338960)
MCX simulation speed: 697.84 photon/ms
total simulated energy: 1000000.00    absorbed: 17.66706%
(loss due to initial specular reflection is excluded in the total)
    done    1494
Launching MMCLAB - Mesh-based Monte Carlo for MATLAB & GNU Octave ...
Running simulations for configuration #1 ...
mmc.nphoton=1e+06;
mmc.nphoton=2999;
mmc.nn=29791;
mmc.elem=[162000,4];
mmc.ne=162000;
mmc.srcpos=[30.1 30.2 0];
mmc.srcdir=[0 0 1 0];
mmc.tstart=0;
mmc.tend=5e-09;
mmc.tstep=5e-09;
mmc.prop=1;
mmc.debuglevel='TP';
mmc.method='elem';
mmc.isreflect=0;
mmc.detnum=4;
mmc.issaveexit=1;
mmc.issaveseed=1;
mmc.facenb=[162000,4];
mmc.evol=162000;
mmc.e0=27466;
mmc.replaydet=1;
mmc.detphotons=[10 2999];
mmc.isnormalized=0;
mmc.outputtype='wl';
    done    13
simulating ... 
###############################################################################
#                     Mesh-based Monte Carlo (MMC) - OpenCL                   #
#          Copyright (c) 2010-2025 Qianqian Fang <q.fang at neu.edu>          #
#              https://mcx.space/#mmc  &  https://neurojson.io                #
#                                                                             #
#Computational Optics & Translational Imaging (COTI) Lab  [http://fanglab.org]#
#   Department of Bioengineering, Northeastern University, Boston, MA, USA    #
###############################################################################
#    The MCX Project is funded by the NIH/NIGMS under grant R01-GM114365      #
###############################################################################
#  Open-source codes and reusable scientific data are essential for research, #
# MCX proudly developed human-readable JSON-based data formats for easy reuse.#
#                                                                             #
#Please visit our free scientific data sharing portal at https://neurojson.io #
# and consider sharing your public datasets in standardized JSON/JData format #
###############################################################################
$Rev::0684c2$v2025.10$Date::2025-10-20 11:05:07 -04$ by $Author::Qianqian Fang$
###############################################################################
- variant name: [MMC-OpenCL] compiled with OpenCL version [1]
- compiled with: [RNG] xorshift128+ RNG [Seed Length] 4
initializing streams ...    init complete : 0 ms
Building kernel with option: -cl-mad-enable -DMCX_USE_NATIVE -DMCX_SIMPLIFY_BRANCH -DMCX_VECTOR_INDEX -DUSE_MACRO_CONST -DMCX_SRC_PENCIL   -DUSE_ATOMIC -DMCX_SAVE_DETECTORS -DUSE_BLBADOUEL -Dgcfgdebuglevel=2560  -Dgcfgdstep=1.0000000000e+00f  -Dgcfge0=27466  -Dgcfgelemlen=4  -Dgcfgfocus=0.0000000000e+00f  -Dgcfgframelen=162000  -Dgcfgisextdet=0  -Dgcfgismomentum=0  -Dgcfgisreflect=0  -Dgcfgissavedet=1  -Dgcfgissaveexit=1  -Dgcfgissaveref=0  -Dgcfgisspecular=0  -Dgcfgmaxdetphoton=1000000  -Dgcfgmaxjumpdebug=10000000  -Dgcfgmaxmedia=1  -Dgcfgmaxgate=1  -Dgcfgmaxpropdet=6  -Dgcfgmethod=3  -Dgcfgminenergy=9.9999999748e-07f  -Dgcfgnormbuf=998  -Dgcfgnout=1.0000000000e+00f  -Dgcfgoutputtype=4  -Dgcfgreclen=9  -Dgcfgroulettesize=1.0000000000e+01f  -DgcfgRtstep=2.0000000000e+08f  -Dgcfgsrcelemlen=0  -Dgcfgsrctype=0  -Dgcfgvoidtime=1  -Dgcfgseed=4294966297  -Dgcfgsrcnum=1  -Dgcfgdetnum=4  -Dgcfgissaveseed=0  -Dgcfgnf=10800 
build program complete : 3 ms
- [device 0(1): Apple M2 Pro] threadph=0 oddphotons=702 np=702.0 nthread=12288 nblock=64 repetition=1
requesting 2816 bytes of shared memory
set kernel arguments complete : 3 ms 3
lauching mcx_main_loop for time window [0.0ns 5.0ns] ...
simulation run# 1 ... 
Progress: [==============================================================] 100%
kernel complete:      234 ms
retrieving flux ...     detected 2 photons, total: 2    transfer complete:        261 ms
simulated 702 photons (702) with 1 devices (ray-tet 175748)
MCX simulation speed: 3.04 photon/ms
total simulated energy: 702.00    absorbed: 19.15085%
(loss due to initial specular reflection is excluded in the total)
    done    294
replay failed :-(
Error using figure
This functionality is not supported under the -nojvm startup option.

Error in demo_example_replay (line 66)
figure;
 
>> 
>> newcfg.optlevel=1;              
>> 
>> [cube3, detp3] = mmclab(newcfg);
Launching MMCLAB - Mesh-based Monte Carlo for MATLAB & GNU Octave ...
Running simulations for configuration #1 ...
mmc.nphoton=1e+06;
mmc.nphoton=2999;
mmc.nn=29791;
mmc.elem=[162000,4];
mmc.ne=162000;
mmc.srcpos=[30.1 30.2 0];
mmc.srcdir=[0 0 1 0];
mmc.tstart=0;
mmc.tend=5e-09;
mmc.tstep=5e-09;
mmc.prop=1;
mmc.debuglevel='TP';
mmc.method='elem';
mmc.isreflect=0;
mmc.detnum=4;
mmc.issaveexit=1;
mmc.issaveseed=1;
mmc.facenb=[162000,4];
mmc.evol=162000;
mmc.e0=27466;
mmc.replaydet=1;
mmc.detphotons=[10 2999];
mmc.isnormalized=0;
mmc.outputtype='wl';
mmc.optlevel=1;
    done    17
simulating ... 
###############################################################################
#                     Mesh-based Monte Carlo (MMC) - OpenCL                   #
#          Copyright (c) 2010-2025 Qianqian Fang <q.fang at neu.edu>          #
#              https://mcx.space/#mmc  &  https://neurojson.io                #
#                                                                             #
#Computational Optics & Translational Imaging (COTI) Lab  [http://fanglab.org]#
#   Department of Bioengineering, Northeastern University, Boston, MA, USA    #
###############################################################################
#    The MCX Project is funded by the NIH/NIGMS under grant R01-GM114365      #
###############################################################################
#  Open-source codes and reusable scientific data are essential for research, #
# MCX proudly developed human-readable JSON-based data formats for easy reuse.#
#                                                                             #
#Please visit our free scientific data sharing portal at https://neurojson.io #
# and consider sharing your public datasets in standardized JSON/JData format #
###############################################################################
$Rev::0684c2$v2025.10$Date::2025-10-20 11:05:07 -04$ by $Author::Qianqian Fang$
###############################################################################
- variant name: [MMC-OpenCL] compiled with OpenCL version [1]
- compiled with: [RNG] xorshift128+ RNG [Seed Length] 4
initializing streams ...    init complete : 0 ms
Building kernel with option: -cl-mad-enable -DMCX_USE_NATIVE -DMCX_SRC_PENCIL   -DUSE_ATOMIC -DMCX_SAVE_DETECTORS -DUSE_BLBADOUEL
build program complete : 4 ms
- [device 0(1): Apple M2 Pro] threadph=0 oddphotons=702 np=702.0 nthread=12288 nblock=64 repetition=1
requesting 2816 bytes of shared memory
set kernel arguments complete : 4 ms 3
lauching mcx_main_loop for time window [0.0ns 5.0ns] ...
simulation run# 1 ... 
Progress: [==============================================================] 100%
kernel complete:      236 ms
retrieving flux ...     detected 702 photons, total: 702    transfer complete:        263 ms
simulated 702 photons (702) with 1 devices (ray-tet 243296)
MCX simulation speed: 3.03 photon/ms
total simulated energy: 702.00    absorbed: 35.66643%
(loss due to initial specular reflection is excluded in the total)
    done    298

On 2/2/26 22:04, Xiuxiu Zhang wrote:
Dear Dr. Fang, 

I ran demo_example_replay.m on my laptop with an Apple M4 Pro chip. The initial Monte Carlo simulation runs without issue and detects ~3000 photons. However, during the replay step (newcfg), only ~4 photons are detected and the terminal reports that the replay is not successful. I did not modify the sample code. The script still generates the Jacobian figure (attached).

A coworker was able to run the same demo successfully on an NVIDIA GPU, so I was wondering whether this behavior could be related to the Apple Silicon OpenCL backend. Please let me know if there are recommended settings or known limitations for running replay on macOS.

Thank you,
Xiuxiu Zhang

--
You received this message because you are subscribed to the Google Groups "mmc-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mmc-users+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/mmc-users/dc1625c5-874f-420d-b459-83e17a960b45n%40googlegroups.com.

Qianqian Fang

unread,
Feb 3, 2026, 10:52:12 PM (11 days ago) Feb 3
to mmc-...@googlegroups.com, Xiuxiu Zhang

sorry, forgot to provide the MCXCL paper link

[1] Leiming Yu, Fanny Nina-Paravecino, David Kaeli, Qianqian Fang*, "Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms," J. Biomed. Opt. 23(1), 010504 (2018).

https://doi.org/10.1117/1.JBO.23.1.010504


a quick way to tell whether you are using optlevel=3 or 1 is to see the printed log, particularly the line starting with "Building kernel with option", if you see a long line with many macros, you are using optlevel=3


Building kernel with option: -cl-mad-enable -DMCX_USE_NATIVE -DMCX_SIMPLIFY_BRANCH -DMCX_VECTOR_INDEX -DUSE_MACRO_CONST -DMCX_SRC_PENCIL   -DUSE_ATOMIC -DMCX_SAVE_DETECTORS -DMCX_SAVE_SEED -DUSE_BLBADOUEL -Dgcfgdebuglevel=2560  -Dgcfgdstep=1.0000000000e+00f  -Dgcfge0=27466  -Dgcfgelemlen=4  -Dgcfgfocus=0.0000000000e+00f  -Dgcfgframelen=162000  -Dgcfgisextdet=0  -Dgcfgismomentum=0  -Dgcfgisreflect=0  -Dgcfgissavedet=1  -Dgcfgissaveexit=1  -Dgcfgissaveref=0  -Dgcfgisspecular=0  -Dgcfgmaxdetphoton=1000000  -Dgcfgmaxjumpdebug=10000000  -Dgcfgmaxmedia=1  -Dgcfgmaxgate=1  -Dgcfgmaxpropdet=6  -Dgcfgmethod=3  -Dgcfgminenergy=9.9999999748e-07f  -Dgcfgnormbuf=998  -Dgcfgnout=1.0000000000e+00f  -Dgcfgoutputtype=0  -Dgcfgreclen=9  -Dgcfgroulettesize=1.0000000000e+01f  -DgcfgRtstep=2.0000000000e+08f  -Dgcfgsrcelemlen=0  -Dgcfgsrctype=0  -Dgcfgvoidtime=1  -Dgcfgseed=1648335518  -Dgcfgsrcnum=1  -Dgcfgdetnum=4  -Dgcfgissaveseed=1  -Dgcfgnf=10800 


if you use optlevel=1, you will see only a much shorter list of options

Building kernel with option: -cl-mad-enable -DMCX_USE_NATIVE -DMCX_SRC_PENCIL   -DUSE_ATOMIC -DMCX_SAVE_DETECTORS -DUSE_BLBADOUEL


Xiuxiu Zhang

unread,
Feb 4, 2026, 1:57:22 PM (11 days ago) Feb 4
to mmc-users
Thanks so much for the guidance, Dr. Fang. The replay works on my end now. 

One other thing I noticed in the demo replay code: https://github.com/fangq/mmc/blob/master/mmclab/example/demo_example_replay.m--------
Before the reply success check in line 59, you might want to also add:  detp.data = detp.data(:, detp.data(1, :) == newcfg.replaydet) to make sure its only check detp.data for detector #1. You had this line for wp replay but not for wl replay. Let me know if I am understanding the code incorrectly.

Best,
Xiuxiu
Reply all
Reply to author
Forward
0 new messages