Hi Dr. Fang,
Great fan of MCX, thank you for your work. For the most part I've been able to get mcxlab to work well, but I've spent the last 3-4 days trying to get my new simulations to work with the correct number of timesteps without segmentation violation errors (code works fine for fewer timesteps).
My problem: the GPU has 40GB of memory, my simulation uses at most 20GB when tstep=5e-12 (as I expect from calculations provided), RAM is also plentiful, but I get a segmentation violation error. If tstep=1e-11, peak GPU usage is 10GB/40GB and there is no error. I expect MCX needs a few spare GB for various functions, but still GPU memory doesn't seem to be the issue.
What do you think might be causing the segmentation violation?
I am currently running the latest nightly binaries (18 November 2024) from this page:
https://mcx.space/nightly/linux64/.
I provide the full code, output log, error log below:
My script:
disp('Starting MATLAB script execution...');
add_paths();
disp('Hello friends');
mcxlab('gpuinfo')
clear cfg cfgs;
%cfg.gpuid='11111111'; % use 8 GPUs together
cfg.gpuid=1;
cfg.autopilot = 1;
cfg.seed = hex2dec('623F9A9E');
cfg.srctype='gaussian';
cfg.unitinmm = 1;
cfg.bc = 'rrrrrr'; % Reflective on all faces except the top (+z) face
cfg.nphoton=1e8;
cfg.tstart=0;
cfg.tend=1e-9; % Total simulation time
cfg.tstep=5e-12; % Time step (5e-12)
cfg.issrcfrom0 = 1; % Positions are from (0,0,0)
cfg.isspecular = 0; %reflection on air-surface boundary (source outside so 0)
cfg.isreflect = 0;
% cfg.workload = [12.5, 12.5, 12.5, 12.5, 12.5, 12.5, 12.5, 12.5];
% cfg.workload = [10, 10];
disp('base config set, loading data');
cfg.isgpuinfo = 1;
% Read and set volume data
raw = jsondecode(fileread('/u/trialan/Geometric-eigenmodes/templates_jnifti/P03_5tissues.jnii'));
volumeData = raw.struct.NIFTIData.('x_ArrayData_');
volumeData = reshape(volumeData, [192, 256, 256]);
volumeData = volumeData./50; % ensuring indexing is proper (0,1,...)
cfg.vol=volumeData;
cfg.vol=uint8(cfg.vol); % unsigned 8-bit integers for mcxlab
disp('Data has been loaded');
% Set medium properties (NB: ensure indexing is proper)
cfg.prop = [
0 0 1 1; % medium 0: environment
0.0195 14.75 0.92 1.37; % medium 1: grey matter
0.0195 14.75 0.92 1.37; % medium 2: white matter
0.0021 0.125 0.92 1.37; % medium 3: CSF
0.0125 11.625 0.92 1.37; % medium 4: skull
0.0177 9.125 0.92 1.37; % medium 5: skin
];
% Set source positions, source directions, and detectors (NB: flipped x<->z)
cfg.srcpos = [
96.307083, 68.003639, 55.998356;
44.174034, 77.827965, 81.520073;
66.802277, 52.199722, 96.766548;
26.951130, 114.490875, 108.559990;
46.643810, 66.358185, 117.232712;
94.124985, 43.001999, 120.994629;
28.002001, 95.378166, 145.395218;
64.492905, 57.509098, 152.482513;
26.002001, 131.971420, 165.808945;
58.356976, 69.645020, 180.162170;
92.075584, 59.001999, 184.685333;
45.633389, 110.609413, 202.631393;
69.145714, 74.697334, 200.841049;
114.983376, 56.565464, 172.580093;
91.411972, 96.323570, 219.998001;
67.198456, 136.474472, 222.196457;
];
cfg.srcdir = [
-0.000000, 0.707107, 0.707107;
0.999998, 0.002000, -0.000000;
0.707107, 0.707107, -0.000000;
0.577350, 0.577350, 0.577350;
0.707107, 0.707107, -0.000000;
-0.000000, 0.999998, -0.002000;
1.000000, -0.000000, -0.000000;
0.577735, 0.577735, -0.576580;
0.999998, -0.000000, -0.002000;
0.577735, 0.577735, -0.576580;
-0.000000, 0.999998, -0.002000;
0.707107, -0.000000, -0.707107;
0.576580, 0.577735, -0.577735;
-0.577350, 0.577350, -0.577350;
-0.000000, -0.000000, -1.000000;
0.707107, -0.000000, -0.707107;
];
det_radius = 3; % set detector radius
cfg.detpos = [
93.488235, 86.575806, 44.426189, det_radius;
68.718773, 89.161018, 48.001999, det_radius;
58.665642, 77.985718, 59.350636, det_radius;
53.143295, 94.157356, 56.858704, det_radius;
127.412392, 82.119255, 54.295128, det_radius;
92.999161, 46.771332, 88.230667, det_radius;
52.008480, 65.300247, 86.693275, det_radius;
65.663345, 52.001999, 123.162247, det_radius;
32.296879, 87.481064, 112.224052, det_radius;
47.998852, 67.003151, 150.051773, det_radius;
92.949844, 49.001999, 152.478653, det_radius;
124.711327, 51.001999, 121.583382, det_radius;
23.002001, 124.355942, 137.226669, det_radius;
31.779688, 100.710503, 174.777695, det_radius;
65.172417, 65.001999, 180.371765, det_radius;
124.373878, 62.001999, 180.130127, det_radius;
41.946758, 132.513260, 196.944763, det_radius;
62.178677, 77.022934, 198.199615, det_radius;
98.457642, 72.045975, 202.043976, det_radius;
144.441986, 90.919327, 201.475327, det_radius;
60.091969, 133.353287, 217.089966, det_radius;
62.669739, 104.033295, 214.998001, det_radius;
128.814774, 102.993713, 216.998001, det_radius;
97.552841, 132.920670, 229.918671, det_radius;
];
%%% SIMULATING AND MCXPLOTVOL
disp("About to run mcxlab");
[flux, detp, vol, seeds] = mcxlab(cfg);
%mcxplotvol(log10(flux.data));
disp("Simulation has run bby");
% Set up the replay configuration
cfg_replay = cfg;
cfg_replay.seed = seeds.data;
cfg_replay.detphotons = detp.data;
cfg_replay.outputtype = 'jacobian'; % Output absorption Jacobian
cfg_replay.replaydet = 0; % Replay all detected photons
disp("about to run replay");
% Run the replay simulation to get the Jacobian
jacobian = mcxlab(cfg_replay);
disp("replay has run");
size(jacobian.data)
And add_paths is:
function add_paths()
    % Function to add necessary toolbox paths
    %toolbox_root = 'C:\Users\Lenovo\AppData\Local\MathWorks\MATLAB\R2024b\';
    toolbox_root = '/u/trialan';
    addpath(fullfile(toolbox_root, 'mcxlab'));
    addpath(fullfile(toolbox_root, 'mcx'));
    addpath(fullfile(toolbox_root, 'jsonlab'));
    addpath(fullfile(toolbox_root, 'zmat'));
    addpath(fullfile(toolbox_root, 'mcx', 'utils'));
end
If I set cfg.tstep=1e-11 instead of cfg.tstep=5e-12, then there is no error and everything works (in this case, peak GPU memory usage is about 10GB, which seems odd).
My expectation is that the memory usage should be ~ 192*256^2*4*2*200 bytes, or around 20GB (following your calculation here:
https://groups.google.com/g/mcx-users/c/w_YL7M6G-e8/m/NybKPx3jAAAJ). And indeed tracking memory with nvidia-smi in the background indicates peak usage at around 20GB.
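For reference, here is a minimal sketch of that estimate for both time steps, using only the volume size and time window from my script. The variable names are just illustrative, and the factor of 2 is taken from the linked calculation. The last column compares the number of output elements with the signed 32-bit integer limit; whether MCX or the MEX interface actually uses 32-bit indexing anywhere is an assumption on my part that I have not verified, so this is only a possible lead.

```matlab
% Illustrative sanity check; variable names are hypothetical, not mcxlab fields.
dims   = [192 256 256];      % size of cfg.vol in the script above
tstart = 0;                  % cfg.tstart
tend   = 1e-9;               % cfg.tend

for tstep = [1e-11 5e-12]
    ngate  = round((tend - tstart) / tstep);  % number of time gates
    nelem  = prod(dims) * ngate;              % elements in the 4-D output (x,y,z,t)
    nbytes = nelem * 4 * 2;                   % 4-byte floats, factor 2 from the linked estimate
    % Assumption (unverified): a 32-bit signed index would overflow past intmax('int32').
    over32 = nelem > double(intmax('int32'));
    fprintf('tstep=%g: %d gates, %.0f elements, %.1f GB, above int32 limit: %d\n', ...
            tstep, ngate, nelem, nbytes / 1e9, over32);
end
```

With tstep=1e-11 this gives 100 gates and about 10.1 GB, and with tstep=5e-12 it gives 200 gates and about 20.1 GB, which matches what I see in nvidia-smi. The 200-gate output has 2,516,582,400 elements, which is above 2^31-1, while the 100-gate output has 1,258,291,200 and is below it.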
Unfortunately I cannot share the data I am running this on as it is medical data I'm not allowed to share, sorry about that.
The full error is this:
Running "module reset". Resetting modules to system default. The following $MODULEPATH directories have been removed: None
Currently Loaded Modules:
1) gcc/11.4.0 2) openmpi/4.1.6 3) cuda/11.8.0 4) cue-login-env/1.0 5) slurm-env/0.1 6) default-s11 7) matlab/2024a
--------------------------------------------------------------------------------
Segmentation violation detected at 2024-11-18 05:53:07 -0600
--------------------------------------------------------------------------------
Configuration:
Crash Decoding : Disabled - No sandbox or build area path
Crash Mode : continue (default)
Default Encoding : UTF-8
Deployed : false
GNU C Library : 2.28 stable
Graphics Driver : Uninitialized software
Graphics card 1 : 0x10de ( 0x10de ) 0x2235 Version 550.90.7.0 (0-0-0)
Graphics card 2 : 0x10de ( 0x10de ) 0x2235 Version 550.90.7.0 (0-0-0)
Graphics card 3 : 0x10de ( 0x10de ) 0x2235 Version 550.90.7.0 (0-0-0)
Graphics card 4 : 0x102b ( 0x102b ) 0x538 Version 0.0.0.0 (0-0-0)
Graphics card 5 : 0x10de ( 0x10de ) 0x2235 Version 550.90.7.0 (0-0-0)
Java Version : Java 1.8.0_202-b08 with Oracle Corporation Java HotSpot(TM) 64-Bit Server VM mixed mode
MATLAB Architecture : glnxa64
MATLAB Entitlement ID : 7087517
MATLAB Root : /sw/external/matlab/2024a
MATLAB Version : 24.1.0.2653294 (R2024a) Update 5
OpenGL : software
Operating System : "Red Hat Enterprise Linux release 8.8 (Ootpa)"
Process ID : 3009653
Processor ID : x86 Family 25 Model 1 Stepping 1, AuthenticAMD
Session Key : 51d243fc-6507-4a61-afb3-6c43729cab7a
Window System : No active display
Fault Count: 1
Abnormal termination:
Segmentation violation
Current Thread: 'MCR 0 interpret' id 140339235780352
Register State (from fault):
RAX = 00007f9c853fc020 RBX = fffffffe58000000
RCX = 00007f9add3fc000 RDX = fffffffbf9ffe060
RSP = 00007fa3464b7748 RBP = 00007fa3464b8ea0
RSI = 00007fa13b3fefb0 RDI = 00007f9ee33fdfc0
R8 = ffffffffffffffe0 R9 = 00007fa3400008d2
R10 = 00007f9add3fc020 R11 = 00007f9c853fc020
R12 = 00007f9edd3fd010 R13 = 00007fa3464b9190
R14 = 00007fa3464b9710 R15 = 0000000000000000
RIP = 00007fa3abd1111e EFL = 0000000000010282
CS = 0033 FS = 0000 GS = 0000
Stack Trace (from fault):
[ 0] 0x00007fa3abd1111e /lib64/libc.so.6+00848158
[ 1] 0x00007fa25a83c5de /u/trialan/mcxlab/mcx.mexa64+00194014 mexFunction+00004578
[ 2] 0x00007fa38f8880df /sw/external/matlab/2024a/bin/glnxa64/libmex.so+00966879
[ 3] 0x00007fa38f888157 /sw/external/matlab/2024a/bin/glnxa64/libmex.so+00966999
[ 4] 0x00007fa38f8881c7 /sw/external/matlab/2024a/bin/glnxa64/libmex.so+00967111
[ 5] 0x00007fa38f88977a /sw/external/matlab/2024a/bin/glnxa64/libmex.so+00972666
[ 6] 0x00007fa38f875234 /sw/external/matlab/2024a/bin/glnxa64/libmex.so+00889396
[ 7] 0x00007fa38ff71d66 /sw/external/matlab/2024a/bin/glnxa64/libmwm_dispatcher.so+01535334 _ZN8Mfh_file20dispatch_file_commonEMS_FviPP11mxArray_tagiS2_EiS2_iS2_+00000166
[ 8] 0x00007fa38ff7334c /sw/external/matlab/2024a/bin/glnxa64/libmwm_dispatcher.so+01540940
[ 9] 0x00007fa38ff736ee /sw/external/matlab/2024a/bin/glnxa64/libmwm_dispatcher.so+01541870 _ZN8Mfh_file8dispatchEiPSt10unique_ptrI11mxArray_tagN6matrix6detail17mxDestroy_deleterEEiPPS1_+00000030
[ 10] 0x00007fa38f389d02 /sw/external/matlab/2024a/bin/glnxa64/libmwlxemainservices.so+02555138
[ 11] 0x00007fa38f38b264 /sw/external/matlab/2024a/bin/glnxa64/libmwlxemainservices.so+02560612
[ 12] 0x00007fa384c4b72f /sw/external/matlab/2024a/bin/glnxa64/libmwm_lxe.so+11548463
[ 13] 0x00007fa384c5685f /sw/external/matlab/2024a/bin/glnxa64/libmwm_lxe.so+11593823
[ 14] 0x00007fa384bcbf02 /sw/external/matlab/2024a/bin/glnxa64/libmwm_lxe.so+11026178
[ 15] 0x00007fa3848dc000 /sw/external/matlab/2024a/bin/glnxa64/libmwm_lxe.so+07946240
[ 16] 0x00007fa3848de31c /sw/external/matlab/2024a/bin/glnxa64/libmwm_lxe.so+07955228
[ 17] 0x00007fa3848db8bb /sw/external/matlab/2024a/bin/glnxa64/libmwm_lxe.so+07944379
[ 18] 0x00007fa3848ecbbf /sw/external/matlab/2024a/bin/glnxa64/libmwm_lxe.so+08014783
[ 19] 0x00007fa3848ed619 /sw/external/matlab/2024a/bin/glnxa64/libmwm_lxe.so+08017433
[ 20] 0x00007fa3848db6c4 /sw/external/matlab/2024a/bin/glnxa64/libmwm_lxe.so+07943876
[ 21] 0x00007fa3848db7c6 /sw/external/matlab/2024a/bin/glnxa64/libmwm_lxe.so+07944134
[ 22] 0x00007fa384a3677b /sw/external/matlab/2024a/bin/glnxa64/libmwm_lxe.so+09365371
[ 23] 0x00007fa384a3a856 /sw/external/matlab/2024a/bin/glnxa64/libmwm_lxe.so+09381974
[ 24] 0x00007fa38f50fe24 /sw/external/matlab/2024a/bin/glnxa64/libmwlxemainservices.so+04152868
[ 25] 0x00007fa38f377b91 /sw/external/matlab/2024a/bin/glnxa64/libmwlxemainservices.so+02481041
[ 26] 0x00007fa38f37add5 /sw/external/matlab/2024a/bin/glnxa64/libmwlxemainservices.so+02493909
[ 27] 0x00007fa38ff71d66 /sw/external/matlab/2024a/bin/glnxa64/libmwm_dispatcher.so+01535334 _ZN8Mfh_file20dispatch_file_commonEMS_FviPP11mxArray_tagiS2_EiS2_iS2_+00000166
[ 28] 0x00007fa38ff7334c /sw/external/matlab/2024a/bin/glnxa64/libmwm_dispatcher.so+01540940
[ 29] 0x00007fa38ff736ee /sw/external/matlab/2024a/bin/glnxa64/libmwm_dispatcher.so+01541870 _ZN8Mfh_file8dispatchEiPSt10unique_ptrI11mxArray_tagN6matrix6detail17mxDestroy_deleterEEiPPS1_+00000030
[ 30] 0x00007fa38f389d02 /sw/external/matlab/2024a/bin/glnxa64/libmwlxemainservices.so+02555138
[ 31] 0x00007fa38f38b264 /sw/external/matlab/2024a/bin/glnxa64/libmwlxemainservices.so+02560612
[ 32] 0x00007fa384c4b72f /sw/external/matlab/2024a/bin/glnxa64/libmwm_lxe.so+11548463
[ 33] 0x00007fa384c3df6d /sw/external/matlab/2024a/bin/glnxa64/libmwm_lxe.so+11493229
[ 34] 0x00007fa384bcc0e2 /sw/external/matlab/2024a/bin/glnxa64/libmwm_lxe.so+11026658
[ 35] 0x00007fa3848dc000 /sw/external/matlab/2024a/bin/glnxa64/libmwm_lxe.so+07946240
[ 36] 0x00007fa3848de31c /sw/external/matlab/2024a/bin/glnxa64/libmwm_lxe.so+07955228
[ 37] 0x00007fa3848db8bb /sw/external/matlab/2024a/bin/glnxa64/libmwm_lxe.so+07944379
[ 38] 0x00007fa3848ecbbf /sw/external/matlab/2024a/bin/glnxa64/libmwm_lxe.so+08014783
[ 39] 0x00007fa3848ed619 /sw/external/matlab/2024a/bin/glnxa64/libmwm_lxe.so+08017433
[ 40] 0x00007fa3848db6c4 /sw/external/matlab/2024a/bin/glnxa64/libmwm_lxe.so+07943876
[ 41] 0x00007fa3848db7c6 /sw/external/matlab/2024a/bin/glnxa64/libmwm_lxe.so+07944134
[ 42] 0x00007fa384a3677b /sw/external/matlab/2024a/bin/glnxa64/libmwm_lxe.so+09365371
[ 43] 0x00007fa384a3a856 /sw/external/matlab/2024a/bin/glnxa64/libmwm_lxe.so+09381974
[ 44] 0x00007fa38f50fe24 /sw/external/matlab/2024a/bin/glnxa64/libmwlxemainservices.so+04152868
[ 45] 0x00007fa38f3eaebf /sw/external/matlab/2024a/bin/glnxa64/libmwlxemainservices.so+02952895
[ 46] 0x00007fa38f3f2577 /sw/external/matlab/2024a/bin/glnxa64/libmwlxemainservices.so+02983287
[ 47] 0x00007fa38f4b31c5 /sw/external/matlab/2024a/bin/glnxa64/libmwlxemainservices.so+03772869
[ 48] 0x00007fa38f4b340e /sw/external/matlab/2024a/bin/glnxa64/libmwlxemainservices.so+03773454
[ 49] 0x00007fa38fd3329f /sw/external/matlab/2024a/bin/glnxa64/libmwm_interpreter.so+01434271 _Z51inEvalCmdWithLocalReturnInDesiredWSAndPublishEventsRKNSt7__cxx1112basic_stringIDsSt11char_traitsIDsESaIDsEEEPibbP15inWorkSpace_tagN9MathWorks3lxe10EvalSourceE+00000063
[ 50] 0x00007fa3943b313e /sw/external/matlab/2024a/bin/glnxa64/libmwiqm.so+00971070 _ZNK3iqm18InternalEvalPlugin24inEvalCmdWithLocalReturnERKNSt7__cxx1112basic_stringIDsSt11char_traitsIDsESaIDsEEEP15inWorkSpace_tag+00000110
[ 51] 0x00007fa3943b4124 /sw/external/matlab/2024a/bin/glnxa64/libmwiqm.so+00975140 _ZN3iqm18InternalEvalPlugin7executeEP15inWorkSpace_tag+00000420
[ 52] 0x00007fa39439c54f /sw/external/matlab/2024a/bin/glnxa64/libmwiqm.so+00877903
[ 53] 0x00007fa394367a56 /sw/external/matlab/2024a/bin/glnxa64/libmwiqm.so+00662102
[ 54] 0x00007fa394367e82 /sw/external/matlab/2024a/bin/glnxa64/libmwiqm.so+00663170
[ 55] 0x00007fa3a996f0ff /sw/external/matlab/2024a/bin/glnxa64/libmwmlutil.so+09703679 _ZNK14cmddistributor16IIPRunNowMessage7deliverERKN10foundation7msg_svc8exchange7RoutingE+00000031
[ 56] 0x00007fa3aa4f9e7a /sw/external/matlab/2024a/bin/glnxa64/libmwms.so+03362426 _ZN10foundation7msg_svc8exchange12MessageQueue7deliverERKN7mwboost10shared_ptrIKNS1_8EnvelopeEEE+00000250
[ 57] 0x00007fa3aa4fb190 /sw/external/matlab/2024a/bin/glnxa64/libmwms.so+03367312
[ 58] 0x00007fa3aa4e32dd /sw/external/matlab/2024a/bin/glnxa64/libmwms.so+03269341
[ 59] 0x00007fa3aa4e6f8c /sw/external/matlab/2024a/bin/glnxa64/libmwms.so+03284876
[ 60] 0x00007fa3aa4e2d47 /sw/external/matlab/2024a/bin/glnxa64/libmwms.so+03267911
[ 61] 0x00007fa3a98a77f1 /sw/external/matlab/2024a/bin/glnxa64/libmwmlutil.so+08886257
[ 62] 0x00007fa3a98afb55 /sw/external/matlab/2024a/bin/glnxa64/libmwmlutil.so+08919893
[ 63] 0x00007fa3acfaa56e /sw/external/matlab/2024a/bin/glnxa64/libmwrcf_framework.so+00304494 _ZN7mwboost6detail17shared_state_base13wait_internalERNS_11unique_lockINS_5mutexEEEb+00000222
[ 64] 0x00007fa39421fa62 /sw/external/matlab/2024a/bin/glnxa64/libmwmcr.so+00719458 _ZN7mwboost6futureIvE3getEv+00000098
[ 65] 0x00007fa39420c360 /sw/external/matlab/2024a/bin/glnxa64/libmwmcr.so+00639840
[ 66] 0x00007fa3accf3634 /sw/external/matlab/2024a/bin/glnxa64/libmwmvm.so+03384884 _ZN14cmddistributor15PackagedTaskIIP10invokeFuncIN7mwboost8functionIFvvEEEEENS2_10shared_ptrINS2_6futureIDTclfp_EEEEEERKT_+00000068
[ 67] 0x00007fa3accf38e9 /sw/external/matlab/2024a/bin/glnxa64/libmwmvm.so+03385577 _ZNSt17_Function_handlerIFN7mwboost3anyEvEZN14cmddistributor15PackagedTaskIIP10createFuncINS0_8functionIFvvEEEEESt8functionIS2_ET_EUlvE_E9_M_invokeERKSt9_Any_data+00000025
[ 68] 0x00007fa3943bd8cd /sw/external/matlab/2024a/bin/glnxa64/libmwiqm.so+01013965 _ZN3iqm18PackagedTaskPlugin7executeEP15inWorkSpace_tag+00000093
[ 69] 0x00007fa39439c54f /sw/external/matlab/2024a/bin/glnxa64/libmwiqm.so+00877903
[ 70] 0x00007fa3943669b8 /sw/external/matlab/2024a/bin/glnxa64/libmwiqm.so+00657848
[ 71] 0x00007fa38f95dab9 /sw/external/matlab/2024a/bin/glnxa64/libmwbridge.so+00498361
[ 72] 0x00007fa38f95df43 /sw/external/matlab/2024a/bin/glnxa64/libmwbridge.so+00499523
[ 73] 0x00007fa38f979592 /sw/external/matlab/2024a/bin/glnxa64/libmwbridge.so+00611730 _Z22mnGetCommandLineBufferbRbN7mwboost8optionalIKP15inWorkSpace_tagEEbRKNS0_9function2IN6mlutil14cmddistributor17inExecutionStatusERKNSt7__cxx1112basic_stringIDsSt11char_traitsIDsESaIDsEEES4_EE+00000210
[ 74] 0x00007fa38f9798f9 /sw/external/matlab/2024a/bin/glnxa64/libmwbridge.so+00612601 _Z8mnParserv+00000521
[ 75] 0x00007fa394242c9f /sw/external/matlab/2024a/bin/glnxa64/libmwmcr.so+00863391
[ 76] 0x00007fa3accf3634 /sw/external/matlab/2024a/bin/glnxa64/libmwmvm.so+03384884 _ZN14cmddistributor15PackagedTaskIIP10invokeFuncIN7mwboost8functionIFvvEEEEENS2_10shared_ptrINS2_6futureIDTclfp_EEEEEERKT_+00000068
[ 77] 0x00007fa3accf38e9 /sw/external/matlab/2024a/bin/glnxa64/libmwmvm.so+03385577 _ZNSt17_Function_handlerIFN7mwboost3anyEvEZN14cmddistributor15PackagedTaskIIP10createFuncINS0_8functionIFvvEEEEESt8functionIS2_ET_EUlvE_E9_M_invokeERKSt9_Any_data+00000025
[ 78] 0x00007fa3943bd8cd /sw/external/matlab/2024a/bin/glnxa64/libmwiqm.so+01013965 _ZN3iqm18PackagedTaskPlugin7executeEP15inWorkSpace_tag+00000093
[ 79] 0x00007fa39439c54f /sw/external/matlab/2024a/bin/glnxa64/libmwiqm.so+00877903
[ 80] 0x00007fa394365252 /sw/external/matlab/2024a/bin/glnxa64/libmwiqm.so+00651858
[ 81] 0x00007fa394365ba3 /sw/external/matlab/2024a/bin/glnxa64/libmwiqm.so+00654243
[ 82] 0x00007fa394365ea4 /sw/external/matlab/2024a/bin/glnxa64/libmwiqm.so+00655012
[ 83] 0x00007fa39422f17e /sw/external/matlab/2024a/bin/glnxa64/libmwmcr.so+00782718
[ 84] 0x00007fa39422ed75 /sw/external/matlab/2024a/bin/glnxa64/libmwmcr.so+00781685
[ 85] 0x00007fa39422efcd /sw/external/matlab/2024a/bin/glnxa64/libmwmcr.so+00782285
[ 86] 0x00007fa3ab6fc277 /sw/external/matlab/2024a/bin/glnxa64/libmwboost_thread.so.1.78.0+00045687
[ 87] 0x00007fa3ac2011ca /lib64/libpthread.so.0+00033226
[ 88] 0x00007fa3abc7be73 /lib64/libc.so.6+00237171 clone+00000067
This error was detected while a MEX-file was running. If the MEX-file
is not an official MathWorks function, please examine its source code
for errors. Please consult the External Interfaces Guide for information
on debugging MEX-files.
** This crash report has been saved to disk as /u/trialan/matlab_crash_dump.3009653-1 **
MATLAB is exiting because of fatal error
/var/spool/slurmd/job5572661/slurm_script: line 34: 3009653 Killed matlab -batch "addpath('/u/trialan/Geometric-eigenmodes/forward_fNIRS'); plotvol;"
The output log is:
Job started on node: gpub038 at Mon Nov 18 05:52:21 CST 2024
Starting MATLAB script execution...
Hello friends
============================= GPU Information ================================
Device 1 of 1: NVIDIA A40
Compute Capability: 8.6
Global Memory: 47608692736 B
Constant Memory: 65536 B
Shared Memory: 49152 B
Registers: 65536
Clock Speed: 1.74 GHz
Number of SMs: 84
Number of Cores: 5376
Auto-thread: 344064
Auto-block: 64
ans =
struct with fields:
name: 'NVIDIA A40'
id: 1
devcount: 1
major: 8
minor: 6
globalmem: 4.7609e+10
constmem: 65536
sharedmem: 49152
regcount: 65536
clock: 1740000
sm: 84
core: 5376
autoblock: 64
autothread: 344064
maxgate: 0
base config set, loading data
Data has been loaded
About to run mcxlab
Launching MCXLAB - Monte Carlo eXtreme for MATLAB & GNU Octave ...
Running simulations for configuration #1 ...
mcx.gpuid=1;
mcx.autopilot=1;
mcx.seed=1648335518;
mcx.srctype='gaussian';
mcx.unitinmm=1;
mcx.bc='rrrrrr';
mcx.nphoton=1e+08;
mcx.tstart=0;
mcx.tend=1e-09;
mcx.tstep=5e-12;
mcx.issrcfrom0=1;
mcx.isspecular=0;
mcx.isreflect=0;
mcx.isgpuinfo=1;
mcx.dim=[192 256 256];
mcx.mediabyte=1;
mcx.medianum=6;
mcx.srcpos=[96.3071 68.0036 55.9984 1];
mcx.extrasrclen=15;
mcx.srcdir=[-0 0.707107 0.707107 0];
mcx.extrasrclen=15;
mcx.detnum=24;
============================= GPU Information ================================
Device 1 of 1: NVIDIA A40
Compute Capability: 8.6
Global Memory: 47608692736 B
Constant Memory: 65536 B
Shared Memory: 49152 B
Registers: 65536
Clock Speed: 1.74 GHz
Number of SMs: 84
Number of Cores: 5376
Auto-thread: 344064
Auto-block: 64
###############################################################################
# Monte Carlo eXtreme (MCX) -- CUDA #
# Copyright (c) 2009-2024 Qianqian Fang <q.fang at neu.edu> #
# https://mcx.space/ & https://neurojson.io/ #
# #
# Computational Optics & Translational Imaging (COTI) Lab- http://fanglab.org #
# Department of Bioengineering, Northeastern University, Boston, MA, USA #
###############################################################################
# The MCX Project is funded by the NIH/NIGMS under grant R01-GM114365 #
###############################################################################
# Open-source codes and reusable scientific data are essential for research, #
# MCX proudly developed human-readable JSON-based data formats for easy reuse.#
# #
#Please visit our free scientific data sharing portal at https://neurojson.io/#
# and consider sharing your public datasets in standardized JSON/JData format #
###############################################################################
$Rev::188338$v2024.6 $Date::2024-11-13 00:00:36 -05$ by $Author::Qianqian Fang$
###############################################################################
- code name: [Jumbo Jolt] compiled by nvcc [7.5] for CUDA-arch [350] on [Nov 18 2024]
- compiled with: RNG [xorshift128+] with Seed Length [4]
GPU=1 (NVIDIA A40) threadph=290 extra=221440 np=100000000 nthread=344064 maxgate=200 repetition=1
initializing streams ... init complete : 20 ms
requesting 3584 bytes of shared memory
launching MCX simulation for time window [0.00e+00ns 1.00e+00ns] ...
simulation run# 1 ...
kernel complete: 1855 ms
retrieving fields ... detected 229928 photons, total: 229928 transfer complete: 11019 ms
normalizing raw data ... source 1, normalization factor alpha=2000.000000
data normalization complete : 11310 ms
simulated 100000000 photons (100000000) with 344064 threads (repeat x1)
MCX simulation speed: 57971.01 photon/ms
total simulated energy: 100000000.00 absorbed: 23.80213%
(loss due to initial specular reflection is excluded in the total)
If it helps, I have access to large numbers of GPUs through the cluster I use, but I don't get the impression this is the root cause.
Again, great package, and thanks for a super helpful google group!
Regards,
Thomas Rialan