Replay issue in demo_example

Xiuxiu Zhang


Feb 2, 2026, 10:04:18 PM

to mmc-users

Dear Dr. Fang,

I ran demo_example_replay.m on my laptop with an Apple M4 Pro chip. The initial Monte Carlo simulation runs without issue and detects ~3000 photons. However, during the replay step (newcfg), only ~4 photons are detected and the terminal reports that the replay is not successful. I did not modify the sample code. The script still generates the Jacobian figure (attached).

A coworker was able to run the same demo successfully on an NVIDIA GPU, so I was wondering whether this behavior could be related to the Apple Silicon OpenCL backend. Please let me know if there are recommended settings or known limitations for running replay on macOS.

Thank you,

Xiuxiu Zhang

[Attachment: Screenshot 2026-02-02 at 9.51.10 PM.png]

Qianqian Fang


Feb 3, 2026, 10:45:32 PM

to mmc-...@googlegroups.com, Xiuxiu Zhang

hi Xiuxiu,

Yes, this behavior results from a combination of a recently enabled OpenCL optimization in the MMC kernel and differences in OpenCL thread scheduling (i.e., the order in which the many threads execute) across devices.

In the most recent release, v2025.10, I raised the default kernel optimization level to cfg.optlevel=3; previously, the default was optlevel=1. The optlevels are explained in our MCXCL paper [1]: each level enables additional optimization strategies that improve simulation speed on OpenCL.

After this change, I noticed that on devices whose execution order is more stable and repeatable, such as NVIDIA GPUs, the replay is still quite reliable. On some other devices, however, the replay seems to be sensitive to these additional optimization strategies (especially the block-based workload balancing).

There is a simple workaround for this issue: set cfg.optlevel=1 to revert to the previous setting.
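A minimal sketch of that workaround, assuming newcfg is the replay configuration struct that demo_example_replay.m has already built and that mmclab is on the MATLAB/Octave path. The placeholder struct and the exist/isfield guards are illustrative additions so the snippet can run on its own; they are not part of the demo script.

```matlab
% In demo_example_replay.m, newcfg already exists at this point.
% The empty struct below is a hypothetical placeholder for standalone runs.
if ~exist('newcfg', 'var')
    newcfg = struct();
end

% Revert to the pre-v2025.10 default optimization level before replaying.
newcfg.optlevel = 1;

% Re-run the replay only if mmclab is available and newcfg looks like a
% replay configuration (replaydet is set by the demo script).
if exist('mmclab', 'file') > 0 && isfield(newcfg, 'replaydet')
    [cube3, detp3] = mmclab(newcfg);
end
```

Setting the field on newcfg rather than on the original cfg leaves the baseline simulation at its default optimization level and changes only the replay run.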

Below is the session I ran on an M2 Pro processor. Initially, with the new default optlevel=3, the replay failed, just as you reported; after setting newcfg.optlevel=1, the replay succeeded.

hope this explanation helps.

Qianqian

fangq@yugong:~/Downloads/mmclab/example$ sw_vers
ProductName: macOS
ProductVersion: 14.5
BuildVersion: 23F79
fangq@yugong:~/Downloads/mmclab/example$ sysctl -a machdep.cpu
machdep.cpu.cores_per_package: 10
machdep.cpu.core_count: 10
machdep.cpu.logical_per_package: 10
machdep.cpu.thread_count: 10
machdep.cpu.brand_string: Apple M2 Pro
fangq@yugong:~/Downloads/mmclab/example$
fangq@yugong:~/Downloads/mmclab/example$ matlab -nojvm

< M A T L A B (R) >
Copyright 1984-2024 The MathWorks, Inc.
R2024a Update 3 (24.1.0.2603908) 64-bit (maca64)
May 2, 2024

For online documentation, see https://www.mathworks.com/support
For product information, visit www.mathworks.com.

>>
>> cd ..
>> addpath(pwd)
>> cd example/
>> demo_example_replay
Launching MMCLAB - Mesh-based Monte Carlo for MATLAB & GNU Octave ...
Running simulations for configuration #1 ...
mmc.nphoton=1e+06;
mmc.seed=1648335518;
mmc.nn=29791;
mmc.elem=[162000,4];
mmc.ne=162000;
mmc.srcpos=[30.1 30.2 0];
mmc.srcdir=[0 0 1 0];
mmc.tstart=0;
mmc.tend=5e-09;
mmc.tstep=5e-09;
mmc.prop=1;
mmc.debuglevel='TP';
mmc.method='elem';
mmc.isreflect=0;
mmc.detnum=4;
mmc.issaveexit=1;
mmc.issaveseed=1;
mmc.facenb=[162000,4];
mmc.evol=162000;
mmc.e0=27466;
done 7
simulating ...
###############################################################################
# Mesh-based Monte Carlo (MMC) - OpenCL #
# Copyright (c) 2010-2025 Qianqian Fang <q.fang at neu.edu> #
# https://mcx.space/#mmc & https://neurojson.io #
# #
#Computational Optics & Translational Imaging (COTI) Lab [http://fanglab.org]#
# Department of Bioengineering, Northeastern University, Boston, MA, USA #
###############################################################################
# The MCX Project is funded by the NIH/NIGMS under grant R01-GM114365 #
###############################################################################
# Open-source codes and reusable scientific data are essential for research, #
# MCX proudly developed human-readable JSON-based data formats for easy reuse.#
# #
#Please visit our free scientific data sharing portal at https://neurojson.io #
# and consider sharing your public datasets in standardized JSON/JData format #
###############################################################################
$Rev::0684c2$v2025.10$Date::2025-10-20 11:05:07 -04$ by $Author::Qianqian Fang$
###############################################################################
- variant name: [MMC-OpenCL] compiled with OpenCL version [1]
- compiled with: [RNG] xorshift128+ RNG [Seed Length] 4
initializing streams ...
init complete : 0 ms
Building kernel with option: -cl-mad-enable -DMCX_USE_NATIVE -DMCX_SIMPLIFY_BRANCH -DMCX_VECTOR_INDEX -DUSE_MACRO_CONST -DMCX_SRC_PENCIL -DUSE_ATOMIC -DMCX_SAVE_DETECTORS -DMCX_SAVE_SEED -DUSE_BLBADOUEL -Dgcfgdebuglevel=2560 -Dgcfgdstep=1.0000000000e+00f -Dgcfge0=27466 -Dgcfgelemlen=4 -Dgcfgfocus=0.0000000000e+00f -Dgcfgframelen=162000 -Dgcfgisextdet=0 -Dgcfgismomentum=0 -Dgcfgisreflect=0 -Dgcfgissavedet=1 -Dgcfgissaveexit=1 -Dgcfgissaveref=0 -Dgcfgisspecular=0 -Dgcfgmaxdetphoton=1000000 -Dgcfgmaxjumpdebug=10000000 -Dgcfgmaxmedia=1 -Dgcfgmaxgate=1 -Dgcfgmaxpropdet=6 -Dgcfgmethod=3 -Dgcfgminenergy=9.9999999748e-07f -Dgcfgnormbuf=998 -Dgcfgnout=1.0000000000e+00f -Dgcfgoutputtype=0 -Dgcfgreclen=9 -Dgcfgroulettesize=1.0000000000e+01f -DgcfgRtstep=2.0000000000e+08f -Dgcfgsrcelemlen=0 -Dgcfgsrctype=0 -Dgcfgvoidtime=1 -Dgcfgseed=1648335518 -Dgcfgsrcnum=1 -Dgcfgdetnum=4 -Dgcfgissaveseed=1 -Dgcfgnf=10800
build program complete : 3 ms
- [device 0(1): Apple M2 Pro] threadph=81 oddphotons=4672 np=1000000.0 nthread=12288 nblock=64 repetition=1
requesting 2816 bytes of shared memory
set kernel arguments complete : 3 ms 3
lauching mcx_main_loop for time window [0.0ns 5.0ns] ...
simulation run# 1 ...
Progress: [==============================================================] 100%
kernel complete: 1436 ms
retrieving flux ...
detected 2999 photons, total: 2999
transfer complete: 1453 ms
source 1 total simulated energy: 1000000.000000 absorbed: 17.66706% normalizor=209.381
simulated 1000000 photons (1000000) with 1 devices (ray-tet 228338960)
MCX simulation speed: 697.84 photon/ms
total simulated energy: 1000000.00 absorbed: 17.66706%
(loss due to initial specular reflection is excluded in the total)
done 1494
Launching MMCLAB - Mesh-based Monte Carlo for MATLAB & GNU Octave ...
Running simulations for configuration #1 ...
mmc.nphoton=1e+06;
mmc.nphoton=2999;
mmc.nn=29791;
mmc.elem=[162000,4];
mmc.ne=162000;
mmc.srcpos=[30.1 30.2 0];
mmc.srcdir=[0 0 1 0];
mmc.tstart=0;
mmc.tend=5e-09;
mmc.tstep=5e-09;
mmc.prop=1;
mmc.debuglevel='TP';
mmc.method='elem';
mmc.isreflect=0;
mmc.detnum=4;
mmc.issaveexit=1;
mmc.issaveseed=1;
mmc.facenb=[162000,4];
mmc.evol=162000;
mmc.e0=27466;
mmc.replaydet=1;
mmc.detphotons=[10 2999];
mmc.isnormalized=0;
mmc.outputtype='wl';
done 13
simulating ...
[... MMC banner identical to the first run ...]
- variant name: [MMC-OpenCL] compiled with OpenCL version [1]
- compiled with: [RNG] xorshift128+ RNG [Seed Length] 4
initializing streams ...
init complete : 0 ms
Building kernel with option: -cl-mad-enable -DMCX_USE_NATIVE -DMCX_SIMPLIFY_BRANCH -DMCX_VECTOR_INDEX -DUSE_MACRO_CONST -DMCX_SRC_PENCIL -DUSE_ATOMIC -DMCX_SAVE_DETECTORS -DUSE_BLBADOUEL -Dgcfgdebuglevel=2560 -Dgcfgdstep=1.0000000000e+00f -Dgcfge0=27466 -Dgcfgelemlen=4 -Dgcfgfocus=0.0000000000e+00f -Dgcfgframelen=162000 -Dgcfgisextdet=0 -Dgcfgismomentum=0 -Dgcfgisreflect=0 -Dgcfgissavedet=1 -Dgcfgissaveexit=1 -Dgcfgissaveref=0 -Dgcfgisspecular=0 -Dgcfgmaxdetphoton=1000000 -Dgcfgmaxjumpdebug=10000000 -Dgcfgmaxmedia=1 -Dgcfgmaxgate=1 -Dgcfgmaxpropdet=6 -Dgcfgmethod=3 -Dgcfgminenergy=9.9999999748e-07f -Dgcfgnormbuf=998 -Dgcfgnout=1.0000000000e+00f -Dgcfgoutputtype=4 -Dgcfgreclen=9 -Dgcfgroulettesize=1.0000000000e+01f -DgcfgRtstep=2.0000000000e+08f -Dgcfgsrcelemlen=0 -Dgcfgsrctype=0 -Dgcfgvoidtime=1 -Dgcfgseed=4294966297 -Dgcfgsrcnum=1 -Dgcfgdetnum=4 -Dgcfgissaveseed=0 -Dgcfgnf=10800
build program complete : 3 ms
- [device 0(1): Apple M2 Pro] threadph=0 oddphotons=702 np=702.0 nthread=12288 nblock=64 repetition=1
requesting 2816 bytes of shared memory
set kernel arguments complete : 3 ms 3
lauching mcx_main_loop for time window [0.0ns 5.0ns] ...
simulation run# 1 ...
Progress: [==============================================================] 100%
kernel complete: 234 ms
retrieving flux ...
detected 2 photons, total: 2
transfer complete: 261 ms
simulated 702 photons (702) with 1 devices (ray-tet 175748)
MCX simulation speed: 3.04 photon/ms
total simulated energy: 702.00 absorbed: 19.15085%
(loss due to initial specular reflection is excluded in the total)
done 294
replay failed :-(
Error using figure
This functionality is not supported under the -nojvm startup option.

Error in demo_example_replay (line 66)
figure;

>>
>> newcfg.optlevel=1;
>>
>> [cube3, detp3] = mmclab(newcfg);
Launching MMCLAB - Mesh-based Monte Carlo for MATLAB & GNU Octave ...
Running simulations for configuration #1 ...
mmc.nphoton=1e+06;
mmc.nphoton=2999;
mmc.nn=29791;
mmc.elem=[162000,4];
mmc.ne=162000;
mmc.srcpos=[30.1 30.2 0];
mmc.srcdir=[0 0 1 0];
mmc.tstart=0;
mmc.tend=5e-09;
mmc.tstep=5e-09;
mmc.prop=1;
mmc.debuglevel='TP';
mmc.method='elem';
mmc.isreflect=0;
mmc.detnum=4;
mmc.issaveexit=1;
mmc.issaveseed=1;
mmc.facenb=[162000,4];
mmc.evol=162000;
mmc.e0=27466;
mmc.replaydet=1;
mmc.detphotons=[10 2999];
mmc.isnormalized=0;
mmc.outputtype='wl';
mmc.optlevel=1;
done 17
simulating ...
[... MMC banner identical to the first run ...]
- variant name: [MMC-OpenCL] compiled with OpenCL version [1]
- compiled with: [RNG] xorshift128+ RNG [Seed Length] 4
initializing streams ...
init complete : 0 ms
Building kernel with option: -cl-mad-enable -DMCX_USE_NATIVE -DMCX_SRC_PENCIL -DUSE_ATOMIC -DMCX_SAVE_DETECTORS -DUSE_BLBADOUEL
build program complete : 4 ms
- [device 0(1): Apple M2 Pro] threadph=0 oddphotons=702 np=702.0 nthread=12288 nblock=64 repetition=1
requesting 2816 bytes of shared memory
set kernel arguments complete : 4 ms 3
lauching mcx_main_loop for time window [0.0ns 5.0ns] ...
simulation run# 1 ...
Progress: [==============================================================] 100%
kernel complete: 236 ms
retrieving flux ...
detected 702 photons, total: 702
transfer complete: 263 ms
simulated 702 photons (702) with 1 devices (ray-tet 243296)
MCX simulation speed: 3.03 photon/ms
total simulated energy: 702.00 absorbed: 35.66643%
(loss due to initial specular reflection is excluded in the total)
done 298


Qianqian Fang


Feb 3, 2026, 10:52:12 PM

to mmc-...@googlegroups.com, Xiuxiu Zhang

sorry, forgot to provide the MCXCL paper link

[1] Leiming Yu, Fanny Nina-Paravecino, David Kaeli, Qianqian Fang*, "Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms," J. Biomed. Opt. 23(1), 010504 (2018).

https://doi.org/10.1117/1.JBO.23.1.010504

A quick way to tell whether you are using optlevel=3 or 1 is to check the printed log, particularly the line starting with "Building kernel with option". If you see a long line with many macros, you are using optlevel=3:

Building kernel with option: -cl-mad-enable -DMCX_USE_NATIVE -DMCX_SIMPLIFY_BRANCH -DMCX_VECTOR_INDEX -DUSE_MACRO_CONST -DMCX_SRC_PENCIL -DUSE_ATOMIC -DMCX_SAVE_DETECTORS -DMCX_SAVE_SEED -DUSE_BLBADOUEL -Dgcfgdebuglevel=2560 -Dgcfgdstep=1.0000000000e+00f -Dgcfge0=27466 -Dgcfgelemlen=4 -Dgcfgfocus=0.0000000000e+00f -Dgcfgframelen=162000 -Dgcfgisextdet=0 -Dgcfgismomentum=0 -Dgcfgisreflect=0 -Dgcfgissavedet=1 -Dgcfgissaveexit=1 -Dgcfgissaveref=0 -Dgcfgisspecular=0 -Dgcfgmaxdetphoton=1000000 -Dgcfgmaxjumpdebug=10000000 -Dgcfgmaxmedia=1 -Dgcfgmaxgate=1 -Dgcfgmaxpropdet=6 -Dgcfgmethod=3 -Dgcfgminenergy=9.9999999748e-07f -Dgcfgnormbuf=998 -Dgcfgnout=1.0000000000e+00f -Dgcfgoutputtype=0 -Dgcfgreclen=9 -Dgcfgroulettesize=1.0000000000e+01f -DgcfgRtstep=2.0000000000e+08f -Dgcfgsrcelemlen=0 -Dgcfgsrctype=0 -Dgcfgvoidtime=1 -Dgcfgseed=1648335518 -Dgcfgsrcnum=1 -Dgcfgdetnum=4 -Dgcfgissaveseed=1 -Dgcfgnf=10800

If you use optlevel=1, you will see a much shorter list of options:

 Building kernel with option:
        -cl-mad-enable -DMCX_USE_NATIVE -DMCX_SRC_PENCIL   -DUSE_ATOMIC
        -DMCX_SAVE_DETECTORS -DUSE_BLBADOUEL
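The same check can be sketched in MATLAB/Octave. The two strings below are abbreviated from the "Building kernel with option" lines in this thread. The test itself (looking for -Dgcfg... constants and counting -D macros) is only a heuristic inferred from these two logs, not a documented mapping between macros and optlevels; has_gcfg and count_macros are hypothetical helper names.

```matlab
% Abbreviated "Building kernel with option" lines taken from this thread.
log_long  = ['Building kernel with option: -cl-mad-enable -DMCX_USE_NATIVE ' ...
             '-DMCX_SIMPLIFY_BRANCH -DMCX_VECTOR_INDEX -DUSE_MACRO_CONST ' ...
             '-DMCX_SRC_PENCIL -DUSE_ATOMIC -DMCX_SAVE_DETECTORS -Dgcfgdetnum=4'];
log_short = ['Building kernel with option: -cl-mad-enable -DMCX_USE_NATIVE ' ...
             '-DMCX_SRC_PENCIL -DUSE_ATOMIC -DMCX_SAVE_DETECTORS -DUSE_BLBADOUEL'];

% Heuristic (assumption): in the logs above, only the long optlevel=3 line
% carries -Dgcfg... constants; the short optlevel=1 line has none.
has_gcfg     = @(line) ~isempty(strfind(line, '-Dgcfg'));
count_macros = @(line) numel(strfind(line, ' -D'));

fprintf('long line:  %d macros, gcfg constants present: %d\n', ...
        count_macros(log_long), has_gcfg(log_long));
fprintf('short line: %d macros, gcfg constants present: %d\n', ...
        count_macros(log_short), has_gcfg(log_short));
```

To apply it to a real run, paste the full "Building kernel with option" line from your own log into one of the strings.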

Xiuxiu Zhang


Feb 4, 2026, 1:57:22 PM

to mmc-users

Thanks so much for the guidance, Dr. Fang. The replay works on my end now.

One other thing I noticed in the demo replay code: https://github.com/fangq/mmc/blob/master/mmclab/example/demo_example_replay.m

Before the replay success check on line 59, you might also want to add: detp.data = detp.data(:, detp.data(1, :) == newcfg.replaydet) to make sure it only checks detp.data for detector #1. You have this line for the wp replay but not for the wl replay. Let me know if I am misunderstanding the code.
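To illustrate what that line does, here is a small sketch on a synthetic matrix. It assumes, as the suggested line implies, that detp.data holds one column per detected photon with the detector ID in the first row; the values are made up and do not come from an actual MMC run.

```matlab
% Synthetic detected-photon data: one column per photon, row 1 = detector ID.
% The second row is an arbitrary placeholder for the remaining photon fields.
detp.data = [1   2   1   3   1;
             0.5 0.7 0.9 1.1 1.3];
newcfg.replaydet = 1;   % replay only photons captured by detector #1

% Keep only the columns whose detector ID matches the replayed detector.
detp.data = detp.data(:, detp.data(1, :) == newcfg.replaydet);

disp(size(detp.data, 2));   % number of photons left for the comparison
```

Without the filter, photons from the other detectors would stay in detp.data and be compared against a replay that only covers detector #1.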
