Linux stuck at 50% simulation


Jack Radford

Apr 11, 2025, 5:52:20 PM
to mcx-users
Hi Prof. Fang,

I am trying to run the simulation below on a Linux system with an A6000 ADA and CUDA 12.2, and it gets stuck halfway through. It runs without problems when I decrease the number of timebins, but with more than about 60 timebins it starts to struggle. I have tried the Linux mcxlab versions from roughly the last 5 years, including the recent February update to Jumbo Jolt, and the issue is consistent across all of them.

Any help with this would be greatly appreciated.

Thanks,
Jack

%%
clear all;
clc;
%%
load("colin27_v3.mat");
clear cfg;
cfg.vol = colin27;
cfg.nphoton=1e9;
cfg.unitinmm = 1;
% define disk source
cfg.srctype = 'disk';
x = 0; y = 88; z = 90;
cfg.srcpos = [x y z];
cfg.srcparam1 = [1 0 0 0];
xangle=deg2rad(0);
yangle=deg2rad(90);
zangle=deg2rad(90);
vector = [cos(xangle) cos(yangle) cos(zangle)];
vector = vector./norm(vector);
cfg.srcdir= vector;
g = 0.89; n = 1.37;
cfg.prop=[0 0 1 1; % mua (1/mm), mus (1/mm), g, n
    0.045 19.818 g n;
    0.011 17.4545 g n;
    0.0026 0.0909 g n;
    0.028 7.3 g n;
    0.092 38 0.87 n;
    0 0 1 1];
% fluence time parameters
cfg.tstart=0;
cfg.tend=7e-9;
cfg.tstep = 0.1e-9;
time_len = 70;
cfg.issaveref=1; % label=0 acts as an intensity detector; collect dref
cfg.debuglevel='P'; % enable the progress bar
%% Redundant or unnecessary params for one GPU
% GPU thread configuration (unnecessary for 1 GPU since default)
cfg.autopilot=1; % automatically set threads and blocks
cfg.gpuid=1; % which GPU to use
% running simulation with boundary reflection enabled
cfg.isreflect=1; % consider refractive index mismatch; default is 1
cfg.isrefint=1; % index mismatch at inner boundaries; param not in MCXLAB github
cfg.issavedet=1; % enable recording partial pathlength of detected photons; if we ask for detpt this is 1 by default
%% main simulation and data, image saving cell
tic
% run the simulation
[fluence, detphot] = mcxlab(cfg);
toc
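
A side note on the time-gate settings: the script hard-codes `time_len = 70`, but the gate count follows from `cfg.tstart`, `cfg.tend` and `cfg.tstep`. The sketch below derives it and gives a rough estimate of the time-resolved output size. The rounding rule and the 4 bytes per value are assumptions, not taken from the MCX source, and `voldim` is a hypothetical placeholder for `size(cfg.vol)`.

```matlab
% Time-gate settings copied from the script above
tstart = 0;
tend   = 7e-9;
tstep  = 0.1e-9;

% Derive the number of time gates instead of hard-coding it.
% round() guards against floating-point error in the division.
ngates = round((tend - tstart) / tstep);

% Hypothetical volume size for illustration; in the real script use
% voldim = size(cfg.vol);
voldim = [100 100 100];

% Assumption: the output holds one single-precision (4-byte) value
% per voxel per time gate.
bytes_out = prod(voldim) * ngates * 4;

fprintf('time gates: %d, estimated output size: %.2f GiB\n', ...
        ngates, bytes_out / 2^30);
```

Since the hang only appears above roughly 60 timebins, printing these two numbers before each run makes it easier to see which settings trigger it.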

jack.ra...@gmail.com

Apr 12, 2025, 10:38:49 AM
to mcx-users
I also noticed that using more than one GPU for the simulations causes MATLAB to crash, and I get the error below.

Best,
Jack

------------------------------------------------
MATLAB Log File
------------------------------------------------


--------------------------------------------------------------------------------
                Assertion detected at 2025-04-11 22:57:57 +0100
--------------------------------------------------------------------------------

Configuration:
  Crash Decoding           : Disabled - No sandbox or build area path
  Crash Mode               : continue (default)
  Default Encoding         : UTF-8
  Deployed                 : false
  Desktop Environment      : ubuntu:GNOME
  GNU C Library            : 2.35 stable
  Graphics Driver          : Uninitialized software
  Graphics card 1          : 0x10de ( 0x10de ) 0x26b1 Version 535.183.1.0 (0-0-0)
  Graphics card 2          : 0x10de ( 0x10de ) 0x26b1 Version 535.183.1.0 (0-0-0)
  Graphics card 3          : 0x10de ( 0x10de ) 0x26b1 Version 535.183.1.0 (0-0-0)
  Graphics card 4          : 0x1a03 ( 0x1a03 ) 0x2000 Version 0.0.0.0 (0-0-0)
  Java Version             : Java 1.8.0_202-b08 with Oracle Corporation Java HotSpot(TM) 64-Bit Server VM mixed mode
  MATLAB Architecture      : glnxa64
  MATLAB Entitlement ID    : 1941656
  MATLAB Root              : /home/jack/MATLAB
  MATLAB Version           : 24.1.0.2837808 (R2024a) Update 7
  OpenGL                   : software
  Operating System         : Ubuntu 22.04.5 LTS
  Process ID               : 20684
  Processor ID             : x86 Family 25 Model 8 Stepping 2, AuthenticAMD
  Session Key              : 24ecf41a-eba3-49bb-ba25-50ff25c81e67
  Window System            : The X.Org Foundation (12101004), display :1

Fault Count: 1


Assertion in find at management.cpp line 792:
find: no active context for type '(anonymous namespace)::AlreadyReportedFailure'

Current Thread: 'MCR 0 interpret' id 126472824026688

Register State (captured):
  RAX = 0000000000000000  RBX = 00007306cf63c8e1
  RCX = 0000000000000000  RDX = 00007306d0de9e60
  RSP = 00007306bfbfc850  RBP = 00007306bfbfcc30
  RSI = 00007306d0dcfa55  RDI = 00007306bfbfc860

   R8 = 0000000000000000   R9 = 00007306bfbfcb40
  R10 = 0000000000000000  R11 = 0000000000000000
  R12 = 00007306d0df0328  R13 = 00007306d0e003a8
  R14 = 00007306cf63c644  R15 = 00007306bfbfd530

  RIP = 00007306d0d7cc0f  EFL = 0000000000000000

   CS = 0000   FS = 0000   GS = 0000

Stack Trace (captured):
[  0] 0x00007306d0d75f53           /home/jack/MATLAB/bin/glnxa64/libmwfl.so+00323411
[  1] 0x00007306d0d7614c           /home/jack/MATLAB/bin/glnxa64/libmwfl.so+00323916 _ZN10foundation4core4diag15stacktrace_base7captureEm+00000028
[  2] 0x00007306d0d7b011           /home/jack/MATLAB/bin/glnxa64/libmwfl.so+00344081
[  3] 0x00007306d0d7b0d0           /home/jack/MATLAB/bin/glnxa64/libmwfl.so+00344272
[  4] 0x00007306cf627bc3 /home/jack/MATLAB/bin/glnxa64/libmwfoundation_usm.so+00080835
[  5] 0x00007306cf63a081 /home/jack/MATLAB/bin/glnxa64/libmwfoundation_usm.so+00155777 _ZN10foundation3usm6DetailINS0_5scope3MvmEE4findEmRKSt9type_info+00000145
[  6] 0x00007306c6a7525b       /home/jack/MATLAB/bin/glnxa64/libmwbridge.so+00471643
[  7] 0x00007306cece728c             /home/jack/MATLAB/bin/glnxa64/libut.so+00377484 utVprintf+00000188
[  8] 0x00007306c41625eb            /home/jack/MATLAB/bin/glnxa64/libmex.so+00845291 mexPrintf+00000139
[  9] 0x000073054342a66c                       /home/jack/mcxlab/mcx.mexa64+00173676
[ 10] 0x0000730543417ab5                       /home/jack/mcxlab/mcx.mexa64+00096949
[ 11] 0x000073054340a868                       /home/jack/mcxlab/mcx.mexa64+00043112
[ 12] 0x000073054340fced                       /home/jack/mcxlab/mcx.mexa64+00064749
[ 13] 0x0000730543427ca3                       /home/jack/mcxlab/mcx.mexa64+00162979
[ 14] 0x00007306a78ae92c /home/jack/MATLAB/bin/glnxa64/../../sys/os/glnxa64/libiomp5.so+00715052
[ 15] 0x00007306a7955893 /home/jack/MATLAB/bin/glnxa64/../../sys/os/glnxa64/libiomp5.so+01398931 __kmp_invoke_microtask+00000147
[ 16] 0x00007306a78c8cb3 /home/jack/MATLAB/bin/glnxa64/../../sys/os/glnxa64/libiomp5.so+00822451
[ 17] 0x00007306a78c7bf2 /home/jack/MATLAB/bin/glnxa64/../../sys/os/glnxa64/libiomp5.so+00818162
[ 18] 0x00007306a7956603 /home/jack/MATLAB/bin/glnxa64/../../sys/os/glnxa64/libiomp5.so+01402371
[ 19] 0x00007306cfa94ac3                    /lib/x86_64-linux-gnu/libc.so.6+00608963
[ 20] 0x00007306cfb26850                    /lib/x86_64-linux-gnu/libc.so.6+01206352
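
The post does not say how the multiple GPUs were selected. According to the `mcxlab` help text, `cfg.gpuid` accepts either a 1-based integer index or a string of 1s and 0s acting as a mask over the available GPUs. Below is a minimal sketch of building such a mask, assuming the three NVIDIA cards listed in the log are the devices MCX enumerates (`ngpu` and `use` are illustrative names, not MCXLAB fields). Separately, frames 8 to 18 of the trace show `mexPrintf` being reached from `mcx.mexa64` inside a `libiomp5` (OpenMP) worker thread, which may be worth mentioning in a bug report.

```matlab
% Hypothetical choice: enable all GPUs of a 3-GPU machine
ngpu = 3;          % assumption: number of CUDA devices MCX can see
use  = [1 2 3];    % 1-based indices of the GPUs to enable

% Build the binary mask string, e.g. '111' or '101'
mask = repmat('0', 1, ngpu);
mask(use) = '1';

cfg.gpuid = mask;            % a string mask selects multiple GPUs
% cfg.workload = [1 1 1];    % optional relative load per selected GPU

disp(cfg.gpuid);
```

Setting `use = 1` gives the mask `'100'`, which selects only the first GPU and is a quick way to check whether the crash is specific to multi-GPU runs.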

Qianqian Fang

Apr 16, 2025, 12:16:02 PM
to mcx-...@googlegroups.com, jack.ra...@gmail.com

hi Jack,

thanks for reporting this issue.

I looked into this and found that it was caused by a GPU memory out-of-bounds error. I created this ticket a few days ago

https://github.com/fangq/mcx/issues/242

and was able to fix it with the following patch

https://github.com/HirviP/mcx/commit/0bf148c1674506d079a59127f624274c1cb85088

please try the latest nightly build, or rebuild mcxlab on your machine. with the patch, the provided simulation passes the 50% mark and completes normally.
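
After updating, one way to confirm that a run really completed is to check the returned array. This is a rough sketch, not part of the patch: it assumes the fluence output is the 4-D array `fluence.data` with time gates along the fourth dimension, and uses a small stand-in array so that it runs without a GPU.

```matlab
% Stand-in for the real output; replace with fluence.data from
% [fluence, detphot] = mcxlab(cfg);
data = ones(4, 4, 4, 70, 'single');   % hypothetical (x, y, z, time gate)

% Expected gate count from the posted time window
ngates = round((7e-9 - 0) / 0.1e-9);

% A completed run should return every time gate, with finite values
has_all_gates = (size(data, 4) == ngates);
all_finite    = all(isfinite(data(:)));

if has_all_gates && all_finite
    disp('output looks complete');
else
    disp('output is truncated or contains NaN/Inf');
end
```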

Qianqian


jack.ra...@gmail.com

Apr 17, 2025, 10:31:21 AM
to mcx-users
Hi Qianqian,

Excellent, I have got it working with this patch. Thank you for fixing this bug so quickly, I appreciate it!

Best,
Jack