MMC: GPU memory saturation in Matlab when repeating simulations

Andrea Farina

Sep 5, 2024, 9:05:38 PM

to mmc-users

Dear Qianqian,

I'm experiencing an issue that was probably also reported for MCX some years ago. I run MMC on an NVIDIA GeForce RTX 3080 Ti with 12 GB of memory. The code is quite simple:

for i = 1:10
mmclab(newcfg);
disp(i);
end

Monitoring the GPU with nvtop, I see that the memory usage keeps growing with the number of simulations. The screenshot shows the state after 10 loops.

As you can see, after 10 repetitions the memory usage stays at roughly 25% unless I restart MATLAB.
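A minimal sketch of how the growth could be logged from inside MATLAB instead of watching nvtop. It assumes nvidia-smi is on the PATH; the helper name gpuMemUsedMiB is hypothetical and not part of mmclab.

```matlab
function usedMiB = gpuMemUsedMiB(smiOutput)
% Return the used GPU memory in MiB, one value per GPU.
% With no argument, nvidia-smi is queried (assumed to be on the PATH);
% with a char argument, that text is parsed instead.
if nargin < 1
    [status, smiOutput] = system( ...
        'nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits');
    if status ~= 0
        error('gpuMemUsedMiB:queryFailed', 'nvidia-smi failed: %s', smiOutput);
    end
end
% nvidia-smi prints one integer per GPU, one per line
usedMiB = str2double(regexp(smiOutput, '\d+', 'match'));
end
```

Printing mat2str(gpuMemUsedMiB()) after each mmclab(newcfg) call in the loop would give one reading per repetition.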

The problem is that when I need to replay many photons for the Jacobian, the memory saturates and MMC reports an error. This happens regardless of the number of simulated photons, the number of nodes/elements, and the pause between simulations (I also tried 10 seconds).

I also have the same board with 24 GB under Windows 10, but there it stops after 39 repetitions although the memory is not saturated... the error is "no GPU device found".

Do you have any solution or idea for that?

I also tried the command reset(gpuDevice), but nothing happens...

Can you suggest a workaround for that?

Thank you very much

Best regards

Andrea

Andrea Farina

Sep 8, 2024, 6:15:09 AM

to mmc-users

Dear Qianqian,

I'm trying to compile the CUDA version (trinity) with the option -DBUILD_CUDA=on to work around the memory leak. The build gets as far as generating mmciii.mexa64 and then stops with an error while linking the binary mmciii, but the mex file is still produced.

When I run mmciii in MATLAB, I get this error:

Invalid MEX-file
'/home/andreafarina/Documents/MCXStudio/MATLAB/mmclab/mmciii.mexa64':
/home/andreafarina/Documents/MCXStudio/MATLAB/mmclab/mmciii.mexa64: undefined
symbol: benchjson

Is the trinity version still maintained? I've also seen that mmciii.mexa64 is missing from the nightly build...

Thank you very much

Best regards

Andrea

Qianqian Fang

Sep 9, 2024, 5:06:25 PM

to mmc-...@googlegroups.com, Andrea Farina

hi Andrea,

I am sorry for my late response. I was a bit busy at the start of the semester, combined with traveling ...

for the mmc-trinity build, if you use cmake, can you add the following two lines to CMakeLists.txt, and regenerate the makefile and binary?

fangq@taote:~/space/git/Project/github/mmc/src/build$ git diff
diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index 3036195..0bf9550 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -88,6 +88,7 @@ add_library(mmc STATIC
 mmc_rand_xorshift128p.c
 mmc_rand_xorshift128p.h
 mmc_bench.h
+mmc_bench.c
 mmc_tictoc.c
 mmc_tictoc.h
 mmc_cl_utils.c
@@ -179,6 +180,7 @@ if(BUILD_MEX AND Matlab_FOUND)
 mmc_rand_xorshift128p.c
 mmc_rand_xorshift128p.h
 mmc_bench.h
+mmc_bench.c
 mmc_tictoc.c
 mmc_tictoc.h
 mmc_cl_utils.c

otherwise, you could directly use "make cudamex" inside the src folder. the Makefile approach does link with mmc_bench.o, but the cmake file seems to be missing it.

let me know if that fixes the error you saw.

Qianqian

--
You received this message because you are subscribed to the Google Groups "mmc-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mmc-users+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mmc-users/3b6e29ea-fce5-4eac-ae20-ba5f39331659n%40googlegroups.com.

Qianqian Fang

Sep 9, 2024, 5:09:00 PM

to mmc-...@googlegroups.com, Andrea Farina

hi Andrea,

this is a known memory leak bug in cuda's opencl runtime. I reported it in 2020, but there has not been a fix yet:

https://forums.developer.nvidia.com/t/how-to-force-nvidia-opencl-to-release-gpu-context-to-avoid-memory-leak/119484

yes, using the cuda backend with the trinity version of mmc/mmclab would avoid this.

Qianqian


Andrea Farina

Sep 9, 2024, 6:52:46 PM

to mmc-users

Dear Qianqian,

thank you very much for the support. I've added the two lines to CMakeLists.txt, but now, when I simply type mmciii in MATLAB, I get another error:

Invalid MEX-file '/home/andreafarina/Documents/MCXStudio/MATLAB/mmclab/mmciii.mexa64':
/home/andreafarina/Documents/MCXStudio/MATLAB/mmclab/mmciii.mexa64: undefined symbol: mesh_saveweight

I've also looked through the mmclab.m file, and there seems to be no if-case that handles the mmciii mex.

Going the make cuda and make cudamex route, I get a classic compilation error, probably due to a GLIBC version mismatch...

andreafarina@kepler-Precision-Tower-7910:~/Documents/MCXStudio/MCXSuite/mmc/src$ make cudamex
Building ../bin/mmc
mex CC='cc' CXX='g++' LINKLIBS="-L"\$MATLABROOT/extern/lib/\$ARCH" -L"\$MATLABROOT/bin/\$ARCH" -lmx -lmex -fopenmp" COMPFLAGS='' DEFINES='' CXXLIBS='$CXXLIBS -fopenmp -lOpenCL -L/usr/local/cuda/lib64 -lcudart' CXXFLAGS='$CXXFLAGS -c -Wall -g -DMCX_EMBED_CL -fno-strict-aliasing -m64 -DMMC_USE_SSE -DHAVE_SSE2 -msse -msse2 -msse3 -mssse3 -msse4.1 -O3 -fopenmp -fPIC -DMCX_CONTAINER -DUSE_OS_TIMER -DUSE_OPENCL -DMMC_XORSHIFT -DUSE_CUDA' -cxx -outdir ../mmclab mmclab.cpp -I../src -I../src/zmat/easylzma -I../src/ubj -output ../bin/mmc built/mmc_rand_xorshift128p.o built/mmc_mesh.o built/mmc_raytrace.o built/mmc_utils.o built/mmc_tictoc.o built/mmc_host.o built/mmc_highorder.o built/mmc_bench.o built/mmc_cl_utils.o built/mmc_cl_host.o built/mmc_cu_host.o
Building with 'g++'.
/usr/bin/ld: built/mmc_mesh.o: relocation R_X86_64_TPOFF32 against `pos.8056' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: built/mmc_highorder.o: relocation R_X86_64_PC32 against symbol `_ZTTNSt7__cxx1119basic_ostringstreamIcSt11char_traitsIcESaIcEEE@@GLIBCXX_3.4.21' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status

make: *** [../commons/Makefile_common.mk:291: ../bin/mmc] Error 255

Anyway, I'll update MATLAB to the latest version on the Linux machine; I'm currently using 2021a, which is quite old... I hope that after the update the mmciii file will work! Otherwise, I'll kindly ask you for a pre-compiled mex file mmciii.mexa64!

I'll get back to you with an update.

Thanks a lot

Andrea

Qianqian Fang

Sep 9, 2024, 7:04:03 PM

to mmc-...@googlegroups.com, Andrea Farina

try downloading a linux binary I built on Ubuntu 18.04 with cuda 11.3 from

https://mcx.space/nightly/linux64/mmc_trinity_mex.tar.gz

let me know if it works.

for your own builds, did you run make clean or remove the cmake cached files before building again?

Andrea Farina

Sep 9, 2024, 8:43:00 PM

to mmc-users

Dear Qianqian,

thank you again for the valuable support. I've tried your binary: it runs, but the memory leak is still present, probably because I have CUDA 11.1 (not yet updated...) and MATLAB 2021a. Moreover, this binary also shows the mua > 0 issue that I posted here:

https://github.com/fangq/mmc/issues/91#issuecomment-2329406979.

This may be a hint that helps you debug it.

To circumvent the memory issue, my idea is to run the executable /mmc/bin/mmc, compiled with OpenCL (which builds fine for me through cmake), and to automate the generation of the JSON file in MATLAB through your script mmc2json.m, although it will be a little slower because of the reads/writes from disk...

Initially I thought that the memory leak was present both in MATLAB and from the command line, but it is not...

If I run this code in MATLAB:

newcfg = mmclab(cfg, 'prep'); % preprocessing of the mesh to get the missing fields
mmc2json(newcfg,'sim');
for i = 1:40
! ../../../MCXStudio/MCXSuite/mmc/bin/mmc -f ./sim.json -s sim -G 1 -d 1
end

It works and no memory leakage is present! So I think that the problem is mainly due to MATLAB MEX and not to incompatibility between NVIDIA and OpenCL.
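A minimal sketch of wrapping that command-line call in a function, so that a failed run is detected instead of passing silently. The flags -f, -s, -G and -d are copied from the command above; the function name runMmcBinary, its arguments and the dryRun switch are hypothetical.

```matlab
function [status, cmd] = runMmcBinary(mmcPath, jsonFile, session, gpuId, dryRun)
% Build the mmc command line used in the loop above and optionally run it.
% dryRun = true only returns the command string without executing it.
if nargin < 5
    dryRun = false;
end
cmd = sprintf('"%s" -f "%s" -s %s -G %d -d 1', ...
              mmcPath, jsonFile, session, gpuId);
status = 0;
if ~dryRun
    % Each call starts a separate process, so its GPU resources are
    % released when that process exits.
    status = system(cmd);
    if status ~= 0
        error('runMmcBinary:failed', 'mmc exited with status %d', status);
    end
end
end
```

Inside the loop, the shell-escape line would then become runMmcBinary('../../../MCXStudio/MCXSuite/mmc/bin/mmc', './sim.json', 'sim', 1).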

Hope this can help investigate the problem...

Anyway, in the next few days I'll update MATLAB to 2024 and try to recompile...

Concerning my build issue: every time I run cmake I delete the contents of the /src/build folder, or I run make clean from /src.

Thanks a lot!

Best regards!

Andrea

Andrea Farina

Sep 12, 2024, 8:54:54 AM

to mmc-users

Dear Qianqian,

I'm back to report some further findings on the memory leak.

I have updated my Linux machine to CUDA 12.6 with the latest driver and MATLAB 2024b. Here is what I've found:

1. After running the mex function mmclab, whether compiled with CUDA or OpenCL, the memory usage increases, and this is intrinsic to MATLAB. If you run the command:

clear mex

the mex is flushed from memory. I verified this by adding some printf calls to mmclab.cpp and to the function MexAtExit, which helps to see when the mex is released.

With the CUDA-compiled mex (make cudamex) this solves the problem: the GPU memory usage drops to zero. This doesn't happen with the OpenCL-compiled version.

2. I realized later that, to use CUDA, you have to set cfg.compute = 'cuda'. I found this by digging a little through the source files, but it is worth mentioning in the help of mmclab.m.

3. Concerning compiling the binaries: starting from a clean state (deleting both src/built and src/build and running make clean), I see the following:

a. I can successfully run either make (or make cuda) or make mex (or make cudamex), but not both together. If I reverse the order of the two commands, I get an error:

/usr/bin/ld: built/mmc_mesh.o: relocation R_X86_64_TPOFF32 against `pos.8056' can not be used when making a shared object; recompile with -fPIC

/usr/bin/ld: built/mmc_highorder.o: relocation R_X86_64_PC32 against symbol `_ZTTNSt7__cxx1119basic_ostringstreamIcSt11char_traitsIcESaIcEEE@@GLIBCXX_3.4.21' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status

Probably I need to run make clean between the two commands. This build generates a single mex named mmc.mexa64 and, with the option cfg.compute = 'cuda' in MATLAB, I can switch between CUDA and OpenCL.

b. Using cmake without options, everything goes smoothly, and both mmc.mexa64 and the binary mmc are built with OpenCL. Compiling with cmake .. -DBUILD_CUDA=on creates two files, mmc.mexa64 and mmciii.mexa64, but the linker gives an error when generating the binary mmciii. However, if I run mmclab with cfg.compute = 'cuda', this option is overridden and the engine is still OpenCL. If I execute mmciii from the MATLAB command line, I get the error:

Invalid MEX-file '/home/andreafarina/Documents/MCXStudio/MATLAB/mmclab/mmciii.mexa64':
/home/andreafarina/Documents/MCXStudio/MATLAB/mmclab/mmciii.mexa64: undefined symbol: mesh_saveweight

I guess the two compiling paths are slightly different...
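Findings 1 and 2 above (unloading the mex with clear mex, and selecting the backend with cfg.compute = 'cuda') could be combined as in this minimal sketch. The wrapper name runAndRelease is hypothetical, and the simulator is passed as a function handle so that the loop does not depend on a particular build.

```matlab
function out = runAndRelease(simfun, cfg, nrep)
% Call simfun(cfg) nrep times and unload all MEX files after each call.
% simfun is a function handle such as @mmclab.
out = cell(1, nrep);
for i = 1:nrep
    out{i} = simfun(cfg);
    % Unload the MEX files so that their exit handlers run. According to
    % the findings above, this frees the GPU memory only for the
    % CUDA-built mex (make cudamex), not for the OpenCL build.
    clear mex
end
end
```

With the CUDA-built mex, the call would look like cfg.compute = 'cuda'; res = runAndRelease(@mmclab, cfg, 10);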

Anyway, when I build with make cudamex I can work with CUDA, but there are two issues:

1. When I run the function I get the alert:

Unrecognized function or variable 'stdout'.

Error in mmclab (line 430)
[varargout{1:mmcout}] = mmc(cfg);

The simulation keeps going, and for the forward data the output is OK.

2. When I use replay, unfortunately the output detp is empty (so it is not possible to check whether the replay succeeded), and the results differ between OpenCL and CUDA. Using CUDA for WL, the total pathlength (sum over the nodes) is roughly half of the total pathlength calculated at the detector; using OpenCL, the two numbers are the same.

I've also used the mex you sent me, compiled with CUDA 11.3. It works without the alert above, but it has the same effect on the output in replay mode: no output detp, and numbers that are not consistent with the detector.

I think I'll use the command-line version run from inside MATLAB. There will be some overhead from the write/read cycles to disk, but it will ensure no memory leaks!

I hope this helps you find a way to release the mex file from MATLAB, or to align the output under CUDA with the one under OpenCL.

Kind regards
