Question about pycuda


fegy .q

Feb 19, 2024, 7:49:21 AM2/19/24
to gprMax-users
Dear gprMax developers,

Hello, I have built a 3D model and ran it on an HPC as follows:
       Host: c0ce5c8fc0a7 | RDO KVM | 2 x Intel Xeon Processor (Skylake, IBRS) (40 cores, 80 cores with Hyper-Threading) | 630GiB RAM | Linux-3.10.0-1160.99.1.el7.x86_64-x86_64-with-glibc2.27
       GPU(s) detected: 0 - NVIDIA A100-PCIE-40GB, 39.4GiB.


The 3D model only required about 30GB of memory:
          Memory (RAM) required: ~27.3GB host + ~27.3GB GPU.

However, once the model started running it reported an error:

Output file: /root/Simu/corner1.out

Running simulation, model 1/74:   1%|█▊                                                                                                                                                  | 64/5398 [00:00<00:31, 171.28it/s]
Traceback (most recent call last):
  File "/root/Simu/BScan.py", line 16, in <module>
    api(filename, n=n_times, geometry_only=False,gpu={0})  #geometry_only:仅几何图形
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/gprMax/lib/python3.11/site-packages/gprMax-3.1.7-py3.11-linux-x86_64.egg/gprMax/gprMax.py", line 108, in api
    run_main(args)
  File "/root/miniconda3/envs/gprMax/lib/python3.11/site-packages/gprMax-3.1.7-py3.11-linux-x86_64.egg/gprMax/gprMax.py", line 191, in run_main
    run_std_sim(args, inputfile, usernamespace)
  File "/root/miniconda3/envs/gprMax/lib/python3.11/site-packages/gprMax-3.1.7-py3.11-linux-x86_64.egg/gprMax/gprMax.py", line 232, in run_std_sim
    run_model(args, currentmodelrun, modelend - 1, numbermodelruns, inputfile, modelusernamespace)
  File "/root/miniconda3/envs/gprMax/lib/python3.11/site-packages/gprMax-3.1.7-py3.11-linux-x86_64.egg/gprMax/model_build_run.py", line 373, in run_model
    tsolve, memsolve = solve_gpu(currentmodelrun, modelend, G)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/gprMax/lib/python3.11/site-packages/gprMax-3.1.7-py3.11-linux-x86_64.egg/gprMax/model_build_run.py", line 640, in solve_gpu
    pml.gpu_update_magnetic(G)
  File "/root/miniconda3/envs/gprMax/lib/python3.11/site-packages/gprMax-3.1.7-py3.11-linux-x86_64.egg/gprMax/pml.py", line 364, in gpu_update_magnetic
    self.update_magnetic_gpu(np.int32(self.xs), np.int32(self.xf), np.int32(self.ys), np.int32(self.yf), np.int32(self.zs), np.int32(self.zf), np.int32(self.HPhi1_gpu.shape[1]), np.int32(self.HPhi1_gpu.shape[2]), np.int32(self.HPhi1_gpu.shape[3]), np.int32(self.HPhi2_gpu.shape[1]), np.int32(self.HPhi2_gpu.shape[2]), np.int32(self.HPhi2_gpu.shape[3]), np.int32(self.thickness), G.ID_gpu.gpudata, G.Ex_gpu.gpudata, G.Ey_gpu.gpudata, G.Ez_gpu.gpudata, G.Hx_gpu.gpudata, G.Hy_gpu.gpudata, G.Hz_gpu.gpudata, self.HPhi1_gpu.gpudata, self.HPhi2_gpu.gpudata, self.HRA_gpu.gpudata, self.HRB_gpu.gpudata, self.HRE_gpu.gpudata, self.HRF_gpu.gpudata, floattype(self.d), block=G.tpb, grid=self.bpg)
  File "/root/miniconda3/envs/gprMax/lib/python3.11/site-packages/pycuda/driver.py", line 481, in function_call
    func._set_block_shape(*block)
pycuda._driver.LogicError: cuFuncSetBlockShape failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
... (dozens of identical cuMemFree/cuModuleUnload/cuEventDestroy failure warnings elided) ...
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
-------------------------------------------------------------------
PyCUDA ERROR: The context stack was not empty upon module cleanup.
-------------------------------------------------------------------
A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.
-------------------------------------------------------------------
Aborted (core dumped)

As a comparison, in the same environment, where I only enlarged the grid spacing to reduce the memory requirement, the model ran correctly:

    Memory (RAM) required: ~6.9GB host + ~6.9GB GPU

Running simulation, model 1/74: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3117/3117 [00:25<00:00, 123.92it/s]
Memory (RAM) used: ~4.51GB host + ~5.38GB GPU
Solving time [HH:MM:SS]: 0:00:25.675926


Is it possible that some setting in pycuda is limiting the maximum amount of memory a model can be run with?

Could you please help me to run larger models using gprMax?

I appreciate all the help I can get on this. Cheers!

Craig Warren

Feb 21, 2024, 5:35:14 AM2/21/24
to gprMax-users
The memory estimate is pretty crude, so it may be that it is not accurate for your model. Can you share your full input file please?

Craig

fegy .q

Feb 26, 2024, 9:02:19 AM2/26/24
to gprMax-users
I am sorry that I have sent too many replies; this is the first time I have used Google Groups.

The input file is as follows. Thank you very much for your reply!

#title: pulse B-scan
#domain: 4.6 3 7
#dx_dy_dz: 0.005 0.01 0.005
#time_window: 60e-9

#waveform: impulse 1 1e9 my_wave

#python:
from gprMax.input_cmd_funcs import *

hertzian_dipole('y',0.44+(current_model_run-1)*0.05, 1.1, 0.2,'my_wave')
rx(0.44+(current_model_run-1)*0.05, 1.3, 0.2)

#end_python:

#material: 6 0.001 1 0 my_wall
#material: 81 0.001 1 0 human
#material: 4 0.001 1 0 wood

#python:
from gprMax.input_cmd_funcs import *

box(0.2, 0, 0.5, 0.44, 3, 6.56, 'my_wall')
box(0.44, 0, 1.5, 4.1, 3, 1.7, 'my_wall')
box(4.1, 0, 0.5, 4.34, 3, 6.56, 'my_wall')
box(1.41, 0, 3.26, 4.1, 3, 3.38, 'my_wall')
box(1.52, 0, 6.36, 4.34, 3, 6.56, 'my_wall')
box(0.44, 0, 6.36, 0.54, 3, 6.56, 'my_wall')

box(0, 0, 0, 4.6, 0.1, 7, 'my_wall')
box(0, 2.9, 0, 4.6, 3, 7, 'my_wall')

#end_python:

##geometry_view: 0 0 0 4.6 3 7 0.01 0.01 0.01 corner n
## n_times = 74
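For reference, the quoted "~27.3GB host + ~27.3GB GPU" estimate can be roughly reproduced from the #domain and #dx_dy_dz lines above. This is only a back-of-the-envelope sketch: the per-cell byte count is my assumption, not gprMax's actual estimator.

```python
# Rough check of the "~27.3GB" memory estimate from the model above.
# NOTE: the ~18 single-precision arrays per cell (E/H fields, material IDs,
# update coefficients, etc.) is an assumption, not a confirmed gprMax figure.
domain = (4.6, 3.0, 7.0)          # metres, from #domain
dx, dy, dz = 0.005, 0.01, 0.005   # metres, from #dx_dy_dz

nx = round(domain[0] / dx)        # 920
ny = round(domain[1] / dy)        # 300
nz = round(domain[2] / dz)        # 1400
cells = nx * ny * nz              # 386,400,000 cells

bytes_per_cell = 18 * 4           # ~18 float32 values per cell (assumed)
gb = cells * bytes_per_cell / 1e9
print(f"{cells:,} cells, ~{gb:.1f} GB")
```

With these assumptions the estimate lands within a gigabyte or so of the figure gprMax reports, so the model genuinely needs most of the A100's 40GB.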

Jakub Vaverka

Jan 31, 2025, 8:41:29 AM1/31/25
to gprMax-users
Dear gprMax developers and community,

first, let me thank you for the amazing software, your big effort, and the growing community.

I would like to ask a similar question, and whether you figured out where the problem was. I obtained an identical error message on an HPC GPU (Linux) when I increased the duration (time window) of the simulation. Our simulation is very large (more than 15 million iterations, but "only" 2.52 million cells), so insufficient memory should not be the problem.
A slightly shorter (about 30%) simulation runs well on an NVIDIA Tesla V100 (it takes only 42% of GPU memory). The slightly longer simulation crashed even on an NVIDIA A100. It seems to me that the problem is with the size of the simulation, not with GPU memory.

The simulation works well on CPU but it is significantly slower.

I have seen several similar questions in this forum, but I have not been able to draw any conclusion from the discussions.

I would be very happy for any suggestion or recommendation.

Thank you very much for your time.

Best regards,

Jakub Vaverka
On Monday, 26 February 2024 at 15:02:19 UTC+1, fegy...@gmail.com wrote:

Antonis Giannopoulos

Jan 31, 2025, 8:54:03 AM1/31/25
to gprMax-users
Hi Jakub,

I must say that I am not the computing expert on the GPU CUDA kernels, but I know that for the GPU solver we store the output points in vectors in memory and do not export them to disk every time step. This obviously increases speed, but if the outputs needed are substantial then that might lead to a problem with available memory that we might be missing. If you have 15 million iterations and a lot of output points, this might explain running out of memory.
If, however, the model does not run even for a few iterations, this points to something else that needs investigation. Sadly, we do not have these GPUs (A100, etc.) to replicate problems with very high memory needs. I believe it will be something really simple, such as a variable getting the wrong type that limits the size of a memory allocation, but a really big output size could also cause a problem.
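A quick back-of-the-envelope check of the output-storage idea (a sketch only: the six float32 field components per output point are my assumption about the solver's storage, not a confirmed figure):

```python
# Rough size of receiver outputs held in GPU memory for a whole run, as
# described above. Field count per output point and float32 storage are
# assumptions about the gprMax GPU solver, not confirmed internals.
iterations = 15_000_000   # "15 million iterations"
fields = 6                # Ex, Ey, Ez, Hx, Hy, Hz (assumed)
bytes_each = 4            # float32 (assumed)

per_rx_gb = iterations * fields * bytes_each / 1e9
print(f"~{per_rx_gb:.2f} GB per output point")
```

So each output point held for the full run would cost on the order of 0.36 GB under these assumptions; a handful of receivers is harmless, but hundreds would add up quickly.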

If Craig gets some time I am sure he will look into this.

Best

Antonis

Jakub Vaverka

Jan 31, 2025, 11:20:20 AM1/31/25
to gprMax-users
Hi Antonis,

Thank you for a very fast response.

The simulation crashes immediately after it starts on the GPU. I saw exactly the same report in a different discussion here.
I include the part of the report where the problem starts. The output file (.out) should be less than 1 MB. We mainly save data in snapshots (approx. 100), but their size is independent of the simulation duration (we keep the same number of them with a bigger time spacing).
I have been wondering if there can be a problem with communication between the GPU and the input data prepared on the CPU. We use 180 Hertzian dipoles, and each of them has more than 15 million time points (I think they are calculated/stored independently even if the waveform is identical).
The preparation of the simulation on a single CPU thus takes a long time (a few hours). There is over 20 GB of data in CPU memory at the moment the simulation starts on the GPU. I will try some tests to figure out more.

Best regards and happy weekend,

Jakub Vaverka


On Friday, 31 January 2025 at 14:54:03 UTC+1, Antonis Giannopoulos wrote:

Antonis Giannopoulos

Jan 31, 2025, 12:47:38 PM1/31/25
to gprMax-users
Hi Jakub,

Does the simulation with 180 dipoles and 15 million time points run on the GPU for smaller models? gprMax builds the model on the CPU and then transfers the arrays that are needed to the GPU. We try not to move data whilst the solving takes place, so the source waveforms are precomputed and stored in an array. Normally this is not a problem, but in your case this can be up to 10.8 gigabytes if you are using 15E6 time points in your waveform. I am not sure why you need so many time steps. If you are using the same waveform for all sources then we should only store one copy, but we need to double-check.
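The 10.8 GB figure follows directly if each source gets its own precomputed single-precision waveform array (the float32 storage is an assumption):

```python
# 180 sources x 15e6 time points x 4 bytes (float32, assumed) per value.
time_points = 15_000_000
sources = 180
bytes_per_value = 4

total_gb = time_points * sources * bytes_per_value / 1e9
print(total_gb)  # 10.8
```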

We will get to the bottom of this; again, it will be something simple that we never tested for, as we do not have the capacity to run very big models beyond the 24GB capacity of our own GPUs.

Best

Antonis

Jakub Vaverka

Feb 3, 2025, 4:24:00 AM2/3/25
to gprMax-users
Hi Antonis,

Yes, it looks like the problem is in the data transfer from CPU to GPU.
I use the same geometry for all simulations and change the time window (number of iterations). It worked with 9.3 million time points on an NVIDIA Tesla V100, even when slightly more (23 GB) CPU memory was used (there were more 3D snapshots).
It also worked with only one dipole (9 GB) for 15 million time points. The CPU memory used decreased from 21 GB to 9 GB when only one dipole was used. I assume that the output vti files are also prepared at the beginning of the simulation on the CPU but not transferred to the GPU.
So the memory usage on the CPU doesn't correspond to the data transferred to the GPU. I will run some tests without snapshots to get a better estimate of the critical data volume. I will let you know when the results come.

20 GB is far below the HPC CPU and GPU (A100) memory, so it should not be a hardware issue. On the other hand, a similar GPU error has been mentioned several times in this forum, and I suppose those models were smaller.
This is a little bit confusing.

Best regards,

Jakub


On Friday, 31 January 2025 at 18:47:38 UTC+1, Antonis Giannopoulos wrote:

Craig Warren

Feb 4, 2025, 12:03:17 PM2/4/25
to gprMax-users
Dear Jakub,

Can you pull the latest code from our 'devel' branch and try your examples again? I have made some changes to the way the source waveforms are pre-calculated which may help.

Kind regards,

Craig

Jakub Vaverka

Feb 10, 2025, 4:19:08 AM2/10/25
to gprMax-users
Dear Craig,

I am very sorry for my slow response; I was sick last week.
I run the GPU simulations on an HPC. I will ask the administrators if it is possible to update the code, and I will try it.

I have tried some simulations without snapshots (15 million time points). The simulation with one dipole takes approx. 0.5 GB of CPU memory, 90 dipoles 10.7 GB, and 180 dipoles 21.3 GB. The last one crashed.
The preparation time on the CPU before the simulation starts on the GPU is approx. 1 minute for 1 dipole, 1.5 hours for 90 dipoles, and more than 3 hours for 180 dipoles. All dipoles have an identical (Gaussian) waveform.
The simulation with one dipole also runs well for 30 million time points, so the problem is not in the number of time steps.

Best regards,

Jakub



On Tuesday, 4 February 2025 at 18:03:17 UTC+1, Craig Warren wrote:

Jakub Vaverka

Feb 17, 2025, 9:35:12 AM2/17/25
to gprMax-users

Dear Craig and Antonis,

I have tried gprMax from the 'devel' branch (548a0a5) on the HPC (H100 and A100). The preparation took some time.
The initial CPU preparation phase is now much shorter (less than 1 minute instead of approx. 80 minutes). The problem with the "bigger" simulations is still the same.
The "shorter" simulations run well as before (only the data in the output vti files now look suspicious).
I am not sure if the output reports can help you. Here are two of them, for 3.1.7 and the 'devel' version.
Let me know if there is something I can try to help identify the problem.

Best regards,

Jakub Vaverka


*************************************************gprMax3.1.7.************************************

Running simulation, model 1/1:   0%|                                            | 0/15577675 [00:00<?, ?it/s]
Running simulation, model 1/1:   0%|                                | 69/15577675 [00:00<5:23:19, 803.00it/s]
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/hpc2n/eb/software/gprMax/3.1.7-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/__main__.py", line 6, in <module>
    gprMax.gprMax.main()
  File "/hpc2n/eb/software/gprMax/3.1.7-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/gprMax.py", line 69, in main
    run_main(args)
  File "/hpc2n/eb/software/gprMax/3.1.7-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/gprMax.py", line 191, in run_main
    run_std_sim(args, inputfile, usernamespace)
  File "/hpc2n/eb/software/gprMax/3.1.7-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/gprMax.py", line 232, in run_std_sim
    run_model(args, currentmodelrun, modelend - 1, numbermodelruns, inputfile, modelusernamespace)
  File "/hpc2n/eb/software/gprMax/3.1.7-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/model_build_run.py", line 373, in run_model
    tsolve, memsolve = solve_gpu(currentmodelrun, modelend, G)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/hpc2n/eb/software/gprMax/3.1.7-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/model_build_run.py", line 640, in solve_gpu
    pml.gpu_update_magnetic(G)
  File "/hpc2n/eb/software/gprMax/3.1.7-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/pml.py", line 364, in gpu_update_magnetic
    self.update_magnetic_gpu(np.int32(self.xs), np.int32(self.xf), np.int32(self.ys), np.int32(self.yf), np.int32(self.zs), np.int32(self.zf), np.int32(self.HPhi1_gpu.shape[1]), np.int32(self.HPhi1_gpu.shape[2]), np.int32(self.HPhi1_gpu.shape[3]), np.int32(self.HPhi2_gpu.shape[1]), np.int32(self.HPhi2_gpu.shape[2]), np.int32(self.HPhi2_gpu.shape[3]), np.int32(self.thickness), G.ID_gpu.gpudata, G.Ex_gpu.gpudata, G.Ey_gpu.gpudata, G.Ez_gpu.gpudata, G.Hx_gpu.gpudata, G.Hy_gpu.gpudata, G.Hz_gpu.gpudata, self.HPhi1_gpu.gpudata, self.HPhi2_gpu.gpudata, self.HRA_gpu.gpudata, self.HRB_gpu.gpudata, self.HRE_gpu.gpudata, self.HRF_gpu.gpudata, floattype(self.d), block=G.tpb, grid=self.bpg)
  File "/hpc2n/eb/software/PyCUDA/2024.1.2-gfbf-2023b-CUDA-12.4.0/lib/python3.11/site-packages/pycuda/driver.py", line 481, in function_call
    func._set_block_shape(*block)
pycuda._driver.LogicError: cuFuncSetBlockShape failed: an illegal memory access was encountered

PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
...
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
-------------------------------------------------------------------
PyCUDA ERROR: The context stack was not empty upon module cleanup.
-------------------------------------------------------------------
A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.


*************************************************gprMax_devel-548a0a5************************************
Model 1/1 solving on b-cn1610.hpc2n.umu.se with CUDA backend using Device 0: NVIDIA A100 80GB PCIe

|--->:   0%|                                      | 0/15577675 [00:00<?, ?it/s]
|--->:   0%|                         | 69/15577675 [00:00<40:46:10, 106.14it/s]
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/proj/nobackup/hpc2n2024-132/easybuild/software/gprMax/devel-548a0a5-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/__main__.py", line 6, in <module>
    gprMax.gprMax.cli()
  File "/proj/nobackup/hpc2n2024-132/easybuild/software/gprMax/devel-548a0a5-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/gprMax.py", line 218, in cli
    results = run_main(args)
              ^^^^^^^^^^^^^^
  File "/proj/nobackup/hpc2n2024-132/easybuild/software/gprMax/devel-548a0a5-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/gprMax.py", line 245, in run_main
    results = context.run()
              ^^^^^^^^^^^^^
  File "/proj/nobackup/hpc2n2024-132/easybuild/software/gprMax/devel-548a0a5-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/contexts.py", line 89, in run
    model.solve(solver)
  File "/proj/nobackup/hpc2n2024-132/easybuild/software/gprMax/devel-548a0a5-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/model_build_run.py", line 392, in solve
    solver.solve(iterator)
  File "/proj/nobackup/hpc2n2024-132/easybuild/software/gprMax/devel-548a0a5-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/solvers.py", line 113, in solve
    self.updates.update_magnetic_pml()
  File "/proj/nobackup/hpc2n2024-132/easybuild/software/gprMax/devel-548a0a5-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/updates.py", line 673, in update_magnetic_pml
    pml.update_magnetic()
  File "/proj/nobackup/hpc2n2024-132/easybuild/software/gprMax/devel-548a0a5-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/pml.py", line 566, in update_magnetic
    self.update_magnetic_dev(
  File "/hpc2n/eb/software/PyCUDA/2024.1.2-gfbf-2023b-CUDA-12.4.0/lib/python3.11/site-packages/pycuda/driver.py", line 481, in function_call
    func._set_block_shape(*block)
pycuda._driver.LogicError: cuFuncSetBlockShape failed: an illegal memory access was encountered

PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
...
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
-------------------------------------------------------------------
PyCUDA ERROR: The context stack was not empty upon module cleanup.
-------------------------------------------------------------------
A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.


On Monday, 10 February 2025 at 10:19:08 UTC+1, Jakub Vaverka wrote:

Craig Warren

Feb 17, 2025, 11:27:09 AM2/17/25
to gprMax-users
Dear Jakub,

See this recently resolved issue on our GitHub issue tracker: https://github.com/gprMax/gprMax/issues/469

For very big models it looks like we are overflowing the index variable used to index our arrays in CUDA. The indices are currently cast as 'int's, so they may need to be re-cast as 'long'.
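For scale, here is a minimal illustration of the suspected overflow. The flattened per-source waveform array layout is my guess, but the arithmetic shows why ~15.6M iterations with 180 sources breaks a 32-bit index:

```python
import numpy as np

# ~15.6M time steps x 180 sources gives more elements than a signed 32-bit
# int can index (2**31 - 1 = 2,147,483,647).
iterations = 15_577_675
sources = 180
n_elements = iterations * sources
print(n_elements > np.iinfo(np.int32).max)  # True

# A 32-bit index computed for such an element wraps to a negative value,
# which the CUDA kernel then dereferences -> "illegal memory access".
wrapped = np.array([n_elements], dtype=np.int64).astype(np.int32)[0]
print(wrapped < 0)  # True
```

This also fits the observation earlier in the thread that one dipole with 30 million time points works while 180 dipoles with 15 million do not: only the product exceeds the 32-bit limit.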

Kind regards,
Craig

Jakub Vaverka

Feb 19, 2025, 4:09:07 AM2/19/25
to gprMax-users
Dear Craig,

yes, this could be it. Thank you.
It's great that you've identified the problem.

Best regards,

Jakub

On Monday, 17 February 2025 at 17:27:09 UTC+1, Craig Warren wrote:

Jakub Vaverka

Feb 24, 2025, 9:36:24 AM2/24/25
to gprMax-users
Dear Craig,

I can confirm that modifying knl_common_base.tmpl worked!

I have one more question, related to snapshots. Is it possible that there is a similar problem there? There is no error message and the vti files are successfully created, but they seem to be empty. The data in the output file (rx) seems to be OK.

Best regards,

Jakub Vaverka

On Wednesday, 19 February 2025 at 10:09:07 UTC+1, Jakub Vaverka wrote: