

hi Xiuxiu,
yes, this behavior is a combination of an recently enabled OpenCL optimization in MMC kernel, and differences in OpenCL thread scheduling (i.e. the orders of the execution of many threads) on different devices.
In the most recent release, v2025.10, I increased the kernel optimization level (cfg.optlevel=3, the optlevels are explained in our MCXCL paper [1] - each level enables extra optimization strategies to improve simulation speed on OpenCL, previously, the default optlevel is 1).
after this optimization, I noticed that for devices that the execution order is more stable/repeatable, such as NVIDIA GPU, the replay is still quite reliable, but for some devices, the replay seems to be sensitive to these additional optimization strategies (especially the block-based workload balance).
there is a simple workaround to this issue, you just need to set cfg.optlevel=1 to reverse to the previous setting.
below is the session I ran on an M2 Pro processor. Initially, using the new default optlevel=3, the replay failed, just like you reported, but after setting newcfg.optlevel=1, the replay is successful.
hope this explanation helps.
Qianqian
fangq@yugong:~/Downloads/mmclab/example$ sw_vers
ProductName: macOS
ProductVersion: 14.5
BuildVersion: 23F79
fangq@yugong:~/Downloads/mmclab/example$ sysctl -a
machdep.cpu
machdep.cpu.cores_per_package: 10
machdep.cpu.core_count: 10
machdep.cpu.logical_per_package: 10
machdep.cpu.thread_count: 10
machdep.cpu.brand_string: Apple M2 Pro
fangq@yugong:~/Downloads/mmclab/example$
fangq@yugong:~/Downloads/mmclab/example$ matlab -nojvm
< M A T
L A B (R) >
Copyright 1984-2024
The MathWorks, Inc.
R2024a Update 3
(24.1.0.2603908) 64-bit (maca64)
May 2,
2024
For online documentation, see https://www.mathworks.com/support
For product information, visit www.mathworks.com.
>>
>> cd ..
>> addpath(pwd)
>> cd example/
>> demo_example_replay
Launching MMCLAB - Mesh-based Monte Carlo for MATLAB & GNU
Octave ...
Running simulations for configuration #1 ...
mmc.nphoton=1e+06;
mmc.seed=1648335518;
mmc.nn=29791;
mmc.elem=[162000,4];
mmc.ne=162000;
mmc.srcpos=[30.1 30.2 0];
mmc.srcdir=[0 0 1 0];
mmc.tstart=0;
mmc.tend=5e-09;
mmc.tstep=5e-09;
mmc.prop=1;
mmc.debuglevel='TP';
mmc.method='elem';
mmc.isreflect=0;
mmc.detnum=4;
mmc.issaveexit=1;
mmc.issaveseed=1;
mmc.facenb=[162000,4];
mmc.evol=162000;
mmc.e0=27466;
done 7
simulating ...
###############################################################################
# Mesh-based Monte Carlo (MMC) - OpenCL
#
# Copyright (c) 2010-2025 Qianqian Fang <q.fang at
neu.edu> #
# https://mcx.space/#mmc &
https://neurojson.io #
#
#
#Computational Optics & Translational Imaging (COTI) Lab
[http://fanglab.org]#
# Department of Bioengineering, Northeastern University,
Boston, MA, USA #
###############################################################################
# The MCX Project is funded by the NIH/NIGMS under grant
R01-GM114365 #
###############################################################################
# Open-source codes and reusable scientific data are essential
for research, #
# MCX proudly developed human-readable JSON-based data formats
for easy reuse.#
#
#
#Please visit our free scientific data sharing portal at
https://neurojson.io #
# and consider sharing your public datasets in standardized
JSON/JData format #
###############################################################################
$Rev::0684c2$v2025.10$Date::2025-10-20 11:05:07 -04$ by
$Author::Qianqian Fang$
###############################################################################
- variant name: [MMC-OpenCL] compiled with OpenCL version [1]
- compiled with: [RNG] xorshift128+ RNG [Seed Length] 4
initializing streams ... init complete : 0 ms
Building kernel with option: -cl-mad-enable -DMCX_USE_NATIVE
-DMCX_SIMPLIFY_BRANCH -DMCX_VECTOR_INDEX -DUSE_MACRO_CONST
-DMCX_SRC_PENCIL -DUSE_ATOMIC -DMCX_SAVE_DETECTORS
-DMCX_SAVE_SEED -DUSE_BLBADOUEL -Dgcfgdebuglevel=2560
-Dgcfgdstep=1.0000000000e+00f -Dgcfge0=27466 -Dgcfgelemlen=4
-Dgcfgfocus=0.0000000000e+00f -Dgcfgframelen=162000
-Dgcfgisextdet=0 -Dgcfgismomentum=0 -Dgcfgisreflect=0
-Dgcfgissavedet=1 -Dgcfgissaveexit=1 -Dgcfgissaveref=0
-Dgcfgisspecular=0 -Dgcfgmaxdetphoton=1000000
-Dgcfgmaxjumpdebug=10000000 -Dgcfgmaxmedia=1 -Dgcfgmaxgate=1
-Dgcfgmaxpropdet=6 -Dgcfgmethod=3
-Dgcfgminenergy=9.9999999748e-07f -Dgcfgnormbuf=998
-Dgcfgnout=1.0000000000e+00f -Dgcfgoutputtype=0
-Dgcfgreclen=9 -Dgcfgroulettesize=1.0000000000e+01f
-DgcfgRtstep=2.0000000000e+08f -Dgcfgsrcelemlen=0
-Dgcfgsrctype=0 -Dgcfgvoidtime=1 -Dgcfgseed=1648335518
-Dgcfgsrcnum=1 -Dgcfgdetnum=4 -Dgcfgissaveseed=1
-Dgcfgnf=10800
build program complete : 3 ms
- [device 0(1): Apple M2 Pro] threadph=81 oddphotons=4672
np=1000000.0 nthread=12288 nblock=64 repetition=1
requesting 2816 bytes of shared memory
set kernel arguments complete : 3 ms 3
lauching mcx_main_loop for time window [0.0ns 5.0ns] ...
simulation run# 1 ...
Progress:
[==============================================================]
100%
kernel complete: 1436 ms
retrieving flux ... detected 2999 photons, total: 2999
transfer complete: 1453 ms
source 1 total simulated energy: 1000000.000000 absorbed:
17.66706% normalizor=209.381
simulated 1000000 photons (1000000) with 1 devices (ray-tet
228338960)
MCX simulation speed: 697.84 photon/ms
total simulated energy: 1000000.00 absorbed: 17.66706%
(loss due to initial specular reflection is excluded in the
total)
done 1494
Launching MMCLAB - Mesh-based Monte Carlo for MATLAB & GNU
Octave ...
Running simulations for configuration #1 ...
mmc.nphoton=1e+06;
mmc.nphoton=2999;
mmc.nn=29791;
mmc.elem=[162000,4];
mmc.ne=162000;
mmc.srcpos=[30.1 30.2 0];
mmc.srcdir=[0 0 1 0];
mmc.tstart=0;
mmc.tend=5e-09;
mmc.tstep=5e-09;
mmc.prop=1;
mmc.debuglevel='TP';
mmc.method='elem';
mmc.isreflect=0;
mmc.detnum=4;
mmc.issaveexit=1;
mmc.issaveseed=1;
mmc.facenb=[162000,4];
mmc.evol=162000;
mmc.e0=27466;
mmc.replaydet=1;
mmc.detphotons=[10 2999];
mmc.isnormalized=0;
mmc.outputtype='wl';
done 13
simulating ...
###############################################################################
# Mesh-based Monte Carlo (MMC) - OpenCL
#
# Copyright (c) 2010-2025 Qianqian Fang <q.fang at
neu.edu> #
# https://mcx.space/#mmc &
https://neurojson.io #
#
#
#Computational Optics & Translational Imaging (COTI) Lab
[http://fanglab.org]#
# Department of Bioengineering, Northeastern University,
Boston, MA, USA #
###############################################################################
# The MCX Project is funded by the NIH/NIGMS under grant
R01-GM114365 #
###############################################################################
# Open-source codes and reusable scientific data are essential
for research, #
# MCX proudly developed human-readable JSON-based data formats
for easy reuse.#
#
#
#Please visit our free scientific data sharing portal at
https://neurojson.io #
# and consider sharing your public datasets in standardized
JSON/JData format #
###############################################################################
$Rev::0684c2$v2025.10$Date::2025-10-20 11:05:07 -04$ by
$Author::Qianqian Fang$
###############################################################################
- variant name: [MMC-OpenCL] compiled with OpenCL version [1]
- compiled with: [RNG] xorshift128+ RNG [Seed Length] 4
initializing streams ... init complete : 0 ms
Building kernel with option: -cl-mad-enable -DMCX_USE_NATIVE
-DMCX_SIMPLIFY_BRANCH -DMCX_VECTOR_INDEX -DUSE_MACRO_CONST
-DMCX_SRC_PENCIL -DUSE_ATOMIC -DMCX_SAVE_DETECTORS
-DUSE_BLBADOUEL -Dgcfgdebuglevel=2560
-Dgcfgdstep=1.0000000000e+00f -Dgcfge0=27466 -Dgcfgelemlen=4
-Dgcfgfocus=0.0000000000e+00f -Dgcfgframelen=162000
-Dgcfgisextdet=0 -Dgcfgismomentum=0 -Dgcfgisreflect=0
-Dgcfgissavedet=1 -Dgcfgissaveexit=1 -Dgcfgissaveref=0
-Dgcfgisspecular=0 -Dgcfgmaxdetphoton=1000000
-Dgcfgmaxjumpdebug=10000000 -Dgcfgmaxmedia=1 -Dgcfgmaxgate=1
-Dgcfgmaxpropdet=6 -Dgcfgmethod=3
-Dgcfgminenergy=9.9999999748e-07f -Dgcfgnormbuf=998
-Dgcfgnout=1.0000000000e+00f -Dgcfgoutputtype=4
-Dgcfgreclen=9 -Dgcfgroulettesize=1.0000000000e+01f
-DgcfgRtstep=2.0000000000e+08f -Dgcfgsrcelemlen=0
-Dgcfgsrctype=0 -Dgcfgvoidtime=1 -Dgcfgseed=4294966297
-Dgcfgsrcnum=1 -Dgcfgdetnum=4 -Dgcfgissaveseed=0
-Dgcfgnf=10800
build program complete : 3 ms
- [device 0(1): Apple M2 Pro] threadph=0 oddphotons=702 np=702.0
nthread=12288 nblock=64 repetition=1
requesting 2816 bytes of shared memory
set kernel arguments complete : 3 ms 3
lauching mcx_main_loop for time window [0.0ns 5.0ns] ...
simulation run# 1 ...
Progress:
[==============================================================]
100%
kernel complete: 234 ms
retrieving flux ... detected 2 photons, total: 2 transfer
complete: 261 ms
simulated 702 photons (702) with 1 devices (ray-tet 175748)
MCX simulation speed: 3.04 photon/ms
total simulated energy: 702.00 absorbed: 19.15085%
(loss due to initial specular reflection is excluded in the
total)
done 294
replay failed :-(
Error using figure
This functionality is not supported under the -nojvm startup
option.
Error in demo_example_replay (line 66)
figure;
>>
>> newcfg.optlevel=1;
>>
>> [cube3, detp3] = mmclab(newcfg);
Launching MMCLAB - Mesh-based Monte Carlo for MATLAB & GNU
Octave ...
Running simulations for configuration #1 ...
mmc.nphoton=1e+06;
mmc.nphoton=2999;
mmc.nn=29791;
mmc.elem=[162000,4];
mmc.ne=162000;
mmc.srcpos=[30.1 30.2 0];
mmc.srcdir=[0 0 1 0];
mmc.tstart=0;
mmc.tend=5e-09;
mmc.tstep=5e-09;
mmc.prop=1;
mmc.debuglevel='TP';
mmc.method='elem';
mmc.isreflect=0;
mmc.detnum=4;
mmc.issaveexit=1;
mmc.issaveseed=1;
mmc.facenb=[162000,4];
mmc.evol=162000;
mmc.e0=27466;
mmc.replaydet=1;
mmc.detphotons=[10 2999];
mmc.isnormalized=0;
mmc.outputtype='wl';
mmc.optlevel=1;
done 17
simulating ...
###############################################################################
# Mesh-based Monte Carlo (MMC) - OpenCL
#
# Copyright (c) 2010-2025 Qianqian Fang <q.fang at
neu.edu> #
# https://mcx.space/#mmc &
https://neurojson.io #
#
#
#Computational Optics & Translational Imaging (COTI) Lab
[http://fanglab.org]#
# Department of Bioengineering, Northeastern University,
Boston, MA, USA #
###############################################################################
# The MCX Project is funded by the NIH/NIGMS under grant
R01-GM114365 #
###############################################################################
# Open-source codes and reusable scientific data are essential
for research, #
# MCX proudly developed human-readable JSON-based data formats
for easy reuse.#
#
#
#Please visit our free scientific data sharing portal at
https://neurojson.io #
# and consider sharing your public datasets in standardized
JSON/JData format #
###############################################################################
$Rev::0684c2$v2025.10$Date::2025-10-20 11:05:07 -04$ by
$Author::Qianqian Fang$
###############################################################################
- variant name: [MMC-OpenCL] compiled with OpenCL version [1]
- compiled with: [RNG] xorshift128+ RNG [Seed Length] 4
initializing streams ... init complete : 0 ms
Building kernel with option: -cl-mad-enable -DMCX_USE_NATIVE
-DMCX_SRC_PENCIL -DUSE_ATOMIC -DMCX_SAVE_DETECTORS
-DUSE_BLBADOUEL
build program complete : 4 ms
- [device 0(1): Apple M2 Pro] threadph=0 oddphotons=702 np=702.0
nthread=12288 nblock=64 repetition=1
requesting 2816 bytes of shared memory
set kernel arguments complete : 4 ms 3
lauching mcx_main_loop for time window [0.0ns 5.0ns] ...
simulation run# 1 ...
Progress:
[==============================================================]
100%
kernel complete: 236 ms
retrieving flux ... detected 702 photons, total: 702
transfer complete: 263 ms
simulated 702 photons (702) with 1 devices (ray-tet 243296)
MCX simulation speed: 3.03 photon/ms
total simulated energy: 702.00 absorbed: 35.66643%
(loss due to initial specular reflection is excluded in the
total)
done 298
Dear Dr. Fang,
I ran demo_example_replay.m on my laptop with an Apple M4 Pro chip. The initial Monte Carlo simulation runs without issue and detects ~3000 photons. However, during the replay step (newcfg), only ~4 photons are detected and the terminal reports that the replay is not successful. I did not modify the sample code. The script still generates the Jacobian figure (attached).
A coworker was able to run the same demo successfully on an NVIDIA GPU, so I was wondering whether this behavior could be related to the Apple Silicon OpenCL backend. Please let me know if there are recommended settings or known limitations for running replay on macOS.
Thank you,Xiuxiu Zhang
--
You received this message because you are subscribed to the Google Groups "mmc-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mmc-users+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/mmc-users/dc1625c5-874f-420d-b459-83e17a960b45n%40googlegroups.com.
sorry, forgot to provide the MCXCL paper link
[1] Leiming Yu, Fanny Nina-Paravecino, David Kaeli, Qianqian Fang*, "Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms," J. Biomed. Opt. 23(1), 010504 (2018).
https://doi.org/10.1117/1.JBO.23.1.010504
a quick way to tell whether you are using optlevel=3 or 1 is to see the printed log, particularly the line starting with "Building kernel with option", if you see a long line with many macros, you are using optlevel=3
Building kernel with option: -cl-mad-enable
-DMCX_USE_NATIVE -DMCX_SIMPLIFY_BRANCH -DMCX_VECTOR_INDEX
-DUSE_MACRO_CONST -DMCX_SRC_PENCIL -DUSE_ATOMIC
-DMCX_SAVE_DETECTORS -DMCX_SAVE_SEED -DUSE_BLBADOUEL
-Dgcfgdebuglevel=2560 -Dgcfgdstep=1.0000000000e+00f
-Dgcfge0=27466 -Dgcfgelemlen=4 -Dgcfgfocus=0.0000000000e+00f
-Dgcfgframelen=162000 -Dgcfgisextdet=0 -Dgcfgismomentum=0
-Dgcfgisreflect=0 -Dgcfgissavedet=1 -Dgcfgissaveexit=1
-Dgcfgissaveref=0 -Dgcfgisspecular=0
-Dgcfgmaxdetphoton=1000000 -Dgcfgmaxjumpdebug=10000000
-Dgcfgmaxmedia=1 -Dgcfgmaxgate=1 -Dgcfgmaxpropdet=6
-Dgcfgmethod=3 -Dgcfgminenergy=9.9999999748e-07f
-Dgcfgnormbuf=998 -Dgcfgnout=1.0000000000e+00f
-Dgcfgoutputtype=0 -Dgcfgreclen=9
-Dgcfgroulettesize=1.0000000000e+01f
-DgcfgRtstep=2.0000000000e+08f -Dgcfgsrcelemlen=0
-Dgcfgsrctype=0 -Dgcfgvoidtime=1 -Dgcfgseed=1648335518
-Dgcfgsrcnum=1 -Dgcfgdetnum=4 -Dgcfgissaveseed=1
-Dgcfgnf=10800
Building kernel with option:
-cl-mad-enable -DMCX_USE_NATIVE -DMCX_SRC_PENCIL -DUSE_ATOMIC
-DMCX_SAVE_DETECTORS -DUSE_BLBADOUEL