Out of memory


ww w

May 3, 2024, 9:14:18 AM
to mcx-users
Hi Dr. Fang, 

I'm trying to run a simulation using mcxlab on my university's HPC cluster, but I keep getting an out-of-memory error. I'm running it on all 4 GPUs that are available to me. I'm not sure how much memory this simulation requires, but I would have thought that 4 GPUs would be enough. Any advice would be appreciated.

I've copied the log below: 

Device 1 of 4: NVIDIA A100-PCIE-40GB
Compute Capability: 8.0
Global Memory: 42298834944 B
Constant Memory: 65536 B
Shared Memory: 49152 B
Registers: 65536
Clock Speed: 1.41 GHz
Number of SMs: 108
Number of Cores: 6912
Auto-thread: 442368
Auto-block: 64
=============================   GPU Information  ================================
Device 2 of 4: NVIDIA A100-PCIE-40GB
Compute Capability: 8.0
Global Memory: 42298834944 B
Constant Memory: 65536 B
Shared Memory: 49152 B
Registers: 65536
Clock Speed: 1.41 GHz
Number of SMs: 108
Number of Cores: 6912
Auto-thread: 442368
Auto-block: 64
=============================   GPU Information  ================================
Device 3 of 4: NVIDIA A100-PCIE-40GB
Compute Capability: 8.0
Global Memory: 42298834944 B
Constant Memory: 65536 B
Shared Memory: 49152 B
Registers: 65536
Clock Speed: 1.41 GHz
Number of SMs: 108
Number of Cores: 6912
Auto-thread: 442368
Auto-block: 64
=============================   GPU Information  ================================
Device 4 of 4: NVIDIA A100-PCIE-40GB
Compute Capability: 8.0
Global Memory: 42298834944 B
Constant Memory: 65536 B
Shared Memory: 49152 B
Registers: 65536
Clock Speed: 1.41 GHz
Number of SMs: 108
Number of Cores: 6912
Auto-thread: 442368
Auto-block: 64

ans =

  4x1 struct array with fields:

    name
    id
    devcount
    major
    minor
    globalmem
    constmem
    sharedmem
    regcount
    clock
    sm
    core
    autoblock
    autothread
    maxgate

>> >> >> >> >> >> >> >> >> >> >> >> >> >> Launching MCXLAB - Monte Carlo eXtreme for MATLAB & GNU Octave ...
Running simulations for configuration #1 ...
mcx.respin=1;
mcx.nphoton=1e+09;
mcx.maxdetphoton=1e+08;
mcx.gpuid='1111';
mcx.srctype='pencil';
mcx.unitinmm=1;
mcx.savedetflag=5;
mcx.issaveref=1;
mcx.isspecular=1;
mcx.autopilot=1;
mcx.srcdir=[0 0 1 0];
mcx.srcpos=[99 59 0 1];
mcx.issrcfrom0=1;
mcx.detnum=4;
mcx.tstart=0;
mcx.tend=1e-08;
mcx.tstep=2e-11;
mcx.dim=[200 200 200];
mcx.mediabyte=1;
mcx.medianum=5;
MCXLAB ERROR -2 in unit mcx_core.cu:3006: out of memory
Error from thread (1): out of memory

Qianqian Fang

May 3, 2024, 9:42:40 AM
to mcx-...@googlegroups.com, ww w

the error was raised on line 3006 of mcx_core.cu, which reads

https://github.com/fangq/mcx/blob/v2024.2/src/mcx_core.cu#L3006

this is the line where the main volumetric output array, gfield, on the GPU is allocated.

the needed size of the output array is explained in

https://github.com/fangq/mcx?tab=readme-ov-file#requirement-and-installation

"For simulations with large volumes, sufficient graphics memory is also required to perform the simulation. The minimum amount of graphics memory required for a MC simulation is Nx*Ny*Nz bytes for the input tissue data plus Nx*Ny*Nz*Ng*4*2 bytes for the output flux/fluence data - where Nx,Ny,Nz are the dimensions of the tissue volume, Ng is the number of concurrent time gates, 4 is the size of a single-precision floating-point number, 2 is for the extra memory needed to ensure output accuracy (#41). MCX does not require double-precision support in your hardware."


let's do the math in your case

your domain volume is Nx=Ny=Nz=200, and your number of time gates is Ng=(tend-tstart)/tstep = 1e-8/2e-11 = 500. the total bytes for the volumetric output alone is 200*200*200*500*4*2 = 3.2e10 bytes, about 32 GB (roughly 30 GiB).

each of your GPUs has at most 40GB of global memory, but other programs may also be using the GPU (run nvidia-smi to list them), and mcx needs other, smaller variables as well. in the end, it is not surprising that it raises this error.
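The arithmetic above can be reproduced with a short script. This is only a sketch of the README formula quoted earlier, not code taken from mcx: the function name `mcx_min_gpu_bytes` is hypothetical, and rounding (tend-tstart)/tstep to the nearest integer to get the number of time gates is an assumption.

```python
# Sketch of the README estimate: Nx*Ny*Nz bytes of input volume plus
# Nx*Ny*Nz*Ng*4*2 bytes of output. mcx_min_gpu_bytes is a hypothetical helper.

def mcx_min_gpu_bytes(dim, tstart, tend, tstep, mediabyte=1):
    """Return (ngates, input_bytes, output_bytes) for one GPU."""
    nx, ny, nz = dim
    voxels = nx * ny * nz
    # Assumption: the gate count is the time window divided by tstep, rounded.
    ngates = round((tend - tstart) / tstep)
    input_bytes = voxels * mediabyte        # tissue label volume
    output_bytes = voxels * ngates * 4 * 2  # float32 output, doubled for accuracy
    return ngates, input_bytes, output_bytes


# Values copied from the log in this thread.
ngates, input_bytes, output_bytes = mcx_min_gpu_bytes((200, 200, 200), 0.0, 1e-8, 2e-11)
global_mem = 42298834944  # "Global Memory" reported for each A100

total = input_bytes + output_bytes
print(f"time gates: {ngates}")
print(f"minimum estimate: {total / 1e9:.2f} GB of {global_mem / 1e9:.2f} GB")
print(f"left for everything else: {(global_mem - total) / 1e9:.2f} GB")
```

The estimate is a lower bound only. It leaves out the detected-photon buffer, the random number generator state, and whatever other processes already hold on the same GPU.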


a related question is what spatial resolution and temporal resolution are sufficient for your simulation? can you work with a 100x100x100 volume? do you really need 2e-11 as your tstep?
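To weigh volume size against time resolution, the same README formula can be inverted to ask how many time gates fit in a given memory budget. This is a rough sketch: `max_time_gates` is a hypothetical helper, and the budget of 75% of the reported global memory is an arbitrary assumption meant to leave room for other buffers and other programs.

```python
# Sketch: invert Nx*Ny*Nz + Nx*Ny*Nz*Ng*4*2 <= budget to solve for Ng.
# max_time_gates is a hypothetical helper, not part of mcx or mcxlab.

def max_time_gates(dim, budget_bytes, mediabyte=1):
    """Largest gate count whose README minimum estimate fits in budget_bytes."""
    nx, ny, nz = dim
    voxels = nx * ny * nz
    remaining = budget_bytes - voxels * mediabyte  # reserve the input volume
    per_gate = voxels * 4 * 2                      # output bytes per time gate
    return max(remaining // per_gate, 0)


global_mem = 42298834944      # per-GPU "Global Memory" from the log
budget = global_mem * 3 // 4  # assumed safety margin, adjust as needed

for dim in [(200, 200, 200), (100, 100, 100)]:
    print(dim, "-> up to", max_time_gates(dim, budget), "time gates")
```

Under this assumed budget the 200x200x200 volume falls just short of the 500 gates requested, while a 100x100x100 volume fits them with a wide margin.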


Qianqian


ww w

May 7, 2024, 4:23:33 AM
to mcx-users
I see, that makes sense. The time step is the priority compared to the volume, so halving the volume seems to have done the trick! Thanks for your help.