MCXLABCL memory saturation after crash


Alex Antrobus

May 14, 2021, 11:06:12 AM
to mcx-users
Dear Dr Fang & community

Perhaps this is not an MCX-specific question, so apologies if not...

I am running MCXLABCL on Fedora 33 with an NVIDIA GeForce RTX 2080 SUPER.
When my mcxlabcl calls crash, they seem to 'keep' their GPU memory resources. This makes debugging hard - unless I restart MATLAB each time or call from the command line (no GUI!)

I've tried resetting the GPU from MATLAB (g=gpuDevice(1); reset(g);) but it does nothing.

Any insights appreciated, 
Thanks and regards,

Alex Antrobus

Qianqian Fang

May 15, 2021, 5:23:30 PM
to mcx-...@googlegroups.com, Alex Antrobus
On 5/14/21 11:06 AM, Alex Antrobus wrote:
Dear Dr Fang & community

Perhaps this is not an MCX-specific question, so apologies if not...


no worries, mcx/mcxcl share the same mailing list


I am running MCXLABCL on Fedora 33 with an NVIDIA GeForce RTX 2080 SUPER.
When my mcxlabcl calls crash, they seem to 'keep' their GPU memory resources. This makes debugging hard - unless I restart MATLAB each time or call from the command line (no GUI!)


I have had similar observations recently when debugging mcxlab in MATLAB R2020 on Ubuntu 20.04 with an RTX 2060 (driver 460.73.01). However, on my other machines running Ubuntu 16.04/18.04 with older driver versions, I rarely saw this - I am not sure if this is a new issue caused by later GPU drivers.

as a test, I added a cudaDeviceReset() call in the error-handling function, see

https://github.com/fangq/mcx/blob/master/src/mcx_core.cu#L1900

but it does not seem to solve the issue; I still had to restart MATLAB.

For OpenCL/mcxcl, I have not yet been able to locate a function that forces a GPU reset.

I've tried resetting the GPU from MATLAB (g=gpuDevice(1); reset(g);) but it does nothing.


I suppose it does the same as cudaDeviceReset; I am not entirely sure whether it has any impact on OpenCL/mcxcl handles.
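For anyone wanting to see whether reset(g) actually frees anything, a small sketch of the check is below (assuming the Parallel Computing Toolbox; the FreeMemory property is named AvailableMemory in newer MATLAB releases):

```matlab
% Query the first GPU and reset MATLAB's own CUDA context on it.
% Note: this only resets the context that MATLAB itself owns; it is
% unlikely to release memory held by an OpenCL context created inside
% a mex file such as mcxlabcl.
g = gpuDevice(1);
fprintf('free GPU memory before reset: %.0f bytes\n', g.FreeMemory);
reset(g);
g = gpuDevice(1);   % re-query the device after the reset
fprintf('free GPU memory after reset:  %.0f bytes\n', g.FreeMemory);
```

If the "after" number does not grow back toward the card's total memory, the leaked allocations are being held by a context outside MATLAB's control.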


I am more curious about what caused the crash - I would prefer to fix the crash itself (if you can share a script so I can reproduce it) rather than fixing the device reset after a crash.

please let me know.


Qianqian



Any insights appreciated, 
Thanks and regards,

Alex Antrobus

Alex Antrobus

May 17, 2021, 5:00:39 AM
to Qianqian Fang, mcx-...@googlegroups.com
Thanks Dr Fang. I will try to put together an MWE that causes the crash and send it over.

On a different point: I watched the online seminar you gave last Monday and found it very informative about the general MCX landscape - what packages are out there and how they relate. I was wondering if you would be willing to share the slides from your talk? Even a reduced/abridged set, removing anything you would rather not share, would be fine.
It would be a great little reference.

Many thanks,
Alex


Qianqian Fang

May 21, 2021, 3:36:41 PM
to Alex Antrobus, mcx-...@googlegroups.com
On 5/17/21 5:00 AM, Alex Antrobus wrote:
Thanks Dr Fang. I will try to put together an MWE that causes the crash and send it over.

On a different point: I watched the online seminar you gave last Monday and found it very informative about the general MCX landscape - what packages are out there and how they relate.


hi Alex

one of my slides contains a diagram (attached here) showing the overall tools we are providing/developing. Basically, we maintain three code bases - mcx (CUDA), mcxcl (OpenCL), and mmc (SSE/OpenCL/CUDA) - and each tool has two interfaces: a standalone executable and a MATLAB/Octave mex file. That makes six combinations (mcx, mcxcl, mmc, mcxlab, mcxlabcl and mmclab). On top of that, we have a unified GUI, MCXStudio, to help create JSON input files for each tool. For more info, please check out our wiki

http://mcx.space/wiki/?Learn

Each of the codes contains numerous options/flags to support different types of simulations and settings - a list of the options can be found here

http://mcx.space/#optionhelp   (then click on "7-Options")

If you install our all-in-one MCXStudio package/installer (see https://twitter.com/FangQ/status/1279935567095087106), it will install all 6 components, plus other small utilities (mcx/utils, mmc/matlab ...)

Our packages are also available on dockerhub for cloud based processing/automation

https://hub.docker.com/u/fangqq


I was wondering if you would be willing to share the slides from your talk? Even a reduced/abridged set, removing anything you would rather not share, would be fine.


sent it to you offline. please check your email.

Qianqian

mcx_tools.png

Fang, Qianqian

May 26, 2021, 4:36:55 PM
to Alex Antrobus, mcx-...@googlegroups.com
hi Alex,

I received your MWE and was able to reproduce the issue - thanks for putting it together. The nvtop tool you mentioned is also a very neat utility that I was not aware of.

regarding the reported memory leakage issue, I was able to fix a few minor leaks detected by valgrind, see this commit


however, this is not enough to fix the memory accumulation issue you found. After digging into this further, I finally recalled a similar finding I made last year (sorry for my short memory); see these two unresolved StackOverflow questions I posted regarding mmclab (which also uses OpenCL)



my findings were:

1. the NVIDIA driver fails to release almost all of the GPU memory allocated to an OpenCL context; thus, valgrind reports leakage from the function clCreateContextFromType. The same thing happens with mcxlabcl.
2. this memory leakage only happens on NVIDIA GPUs with NVIDIA drivers, but not on Intel or AMD GPUs, based on my previous tests.


in addition, with your MWE script, I found that mcxlab does not have this memory leakage; to see that, simply change USE_OPENCL=1 to USE_OPENCL=0. GPU memory stays the same across repetitions (on a Titan V, driver 418.56, Ubuntu 16.04).
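For context, the repetition test looks roughly like this - a sketch, not the exact MWE; the cfg values below are illustrative placeholders in the usual mcxlab style, and polling nvidia-smi between calls is assumed to be available for watching driver-side memory use:

```matlab
% Hypothetical repetition test: run the same simulation many times and
% watch whether GPU memory use grows between calls.
cfg.nphoton = 1e6;                        % photons per run
cfg.vol     = uint8(ones(60,60,60));      % homogeneous cube
cfg.prop    = [0 0 1 1; 0.005 1 0 1.37];  % [mua mus g n] per medium
cfg.srcpos  = [30 30 1];
cfg.srcdir  = [0 0 1];
cfg.tstart  = 0; cfg.tend = 5e-9; cfg.tstep = 5e-9;

USE_OPENCL = 0;   % 0 = mcxlab (CUDA), 1 = mcxlabcl (OpenCL)

for i = 1:50
    if USE_OPENCL
        flux = mcxlabcl(cfg);
    else
        flux = mcxlab(cfg);
    end
    % poll the driver's view of GPU memory between calls
    system('nvidia-smi --query-gpu=memory.used --format=csv,noheader');
end
```

With USE_OPENCL=0 the reported memory.used should stay flat across iterations; with USE_OPENCL=1 on an NVIDIA driver it creeps upward until the device saturates.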

If you have NVIDIA GPUs, I strongly recommend using mcxlab instead of mcxlabcl - NVIDIA offers poor/outdated OpenCL support compared to CUDA, so mcxcl/mcxlabcl have much poorer performance than the CUDA-based mcx/mcxlab.

Please check out the inset in Fig 2 of our mcxcl paper:


as you can see, on the same NVIDIA GPUs, mcx is about 3x to 5x faster than mcxcl, despite many of our optimizations. This memory leakage issue adds another limitation to using OpenCL on NVIDIA hardware.

hope this is helpful.

Qianqian


On 5/17/21 5:00 AM, Alex Antrobus wrote:
Thanks Dr Fang. I will try to put together an MWE that causes the crash and send it over.
...

Alex Antrobus

May 27, 2021, 5:39:17 AM
to Fang, Qianqian, mcx-...@googlegroups.com
Morning Dr Fang

Thank you very much for the comprehensive and informative response. Very helpful.

Since I am running Fedora (in fact, I'm using the NeuroFedora image), I thought I had read somewhere that the OpenCL version (mcxlabcl) was recommended for this OS, even with NVIDIA cards (maybe because CUDA on Linux used to be a pain?)
Anyway, I cannot find any reference to this now (I can't recall where I read it) - but I CAN say that the plot thickens....

Consider two runs of MCXLAB/MCXLABCL (looped over many times, as in the script I posted before). I find the following:

In Octave (using the mcxlabcl version distributed with NeuroFedora):

    flux = mcxlabcl(cfg)          <-  does NOT result in memory saturation
    [flux, detp] = mcxlabcl(cfg)  <-  DOES result in memory saturation

In MATLAB (using the latest stable release, Furious Fermion 2020, precompiled):

    flux = mcxlabcl(cfg)          <-  DOES result in memory saturation
    [flux, detp] = mcxlabcl(cfg)  <-  also DOES result in memory saturation

but:

    flux = mcxlab(cfg)            <-  does NOT result in memory saturation
    [flux, detp] = mcxlab(cfg)    <-  also does NOT result in memory saturation

... and, as you mentioned, the CUDA-based implementations in MATLAB are significantly faster.

I hope this is a little useful. At least now I can get on with my work!
Many thanks,
Alex

Qianqian Fang

May 27, 2021, 8:56:45 AM
to Alex Antrobus, mcx-...@googlegroups.com
On 5/27/21 5:39 AM, Alex Antrobus wrote:
Morning Dr Fang

Thank you very much for the comprehensive and informative response. Very helpful.

Since I am running Fedora (in fact, I'm using the NeuroFedora image), I thought I had read somewhere that the OpenCL version (mcxlabcl) was recommended for this OS, even with NVIDIA cards (maybe because CUDA on Linux used to be a pain?)
Anyway, I cannot find any reference to this now (I can't recall where I read it) - but I CAN say that the plot thickens....

Consider two runs of MCXLAB/MCXLABCL (looped over many times, as in the script I posted before). I find the following:

In Octave (using the mcxlabcl version distributed with NeuroFedora):

    flux = mcxlabcl(cfg)          <-  does NOT result in memory saturation
    [flux, detp] = mcxlabcl(cfg)  <-  DOES result in memory saturation


hi Alex

the mcxlabcl package that I uploaded to NeuroFedora was an old release - v0.9.5 (v2019.10).

between then and now, the biggest change in terms of OpenCL use is the duplication of read-only GPU buffers for each GPU; without it, multiple GPUs cannot be used simultaneously on NVIDIA cards (this does work on AMD/Intel devices)

https://github.com/fangq/mcxcl/commit/c1e3ebbe995724436a3f627d4826582d2a9a4f5c

aside from that change, I see that the memory leakage caused by the detected-photon buffer Pdet already existed in v0.9.5

https://github.com/fangq/mcxcl/commit/4c1830589e2e4615782e036c994a7e4688357b15#diff-7f653e70f446b9f7d91c5e1cef592e217b124628543bdecf31ff294b10c706baR1058


because of that, I expect there to be some level of memory accumulation - not necessarily saturation - in this version.



In MATLAB (using the latest stable release, Furious Fermion 2020, precompiled):

    flux = mcxlabcl(cfg)          <-  DOES result in memory saturation
    [flux, detp] = mcxlabcl(cfg)  <-  also DOES result in memory saturation

but:

    flux = mcxlab(cfg)            <-  does NOT result in memory saturation
    [flux, detp] = mcxlab(cfg)    <-  also does NOT result in memory saturation


this aligns with the latest observations.

for now, please use mcxlab to take advantage of the higher speed and robust memory deallocation. I will find out whether the OpenCL-related memory leak is connected to the buffer duplication change I mentioned above.

Qianqian
