Dear Craig and Antonis,
I have tried gprMax from the 'devel' branch (548a0a5) on our HPC systems (H100 and A100). The preparation took some time.
The initial CPU preparation phase is now much shorter (less than 1 minute instead of approximately 80 minutes). The problem with the "bigger" simulations is still the same.
The "shorter" simulations run well as before (only the data in the output .vti files now look suspicious).
I am not sure whether the output reports can help you; here are two of them, for version 3.1.7 and for the 'devel' version.
Let me know if there is anything I can try to help identify the problem.
Best regards,
Jakub Vaverka
************************************************* gprMax 3.1.7 ************************************
Running simulation, model 1/1: 0%| | 0/15577675 [00:00<?, ?it/s]
Running simulation, model 1/1: 0%| | 69/15577675 [00:00<5:23:19, 803.00it/s]
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/hpc2n/eb/software/gprMax/3.1.7-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/__main__.py", line 6, in <module>
gprMax.gprMax.main()
File "/hpc2n/eb/software/gprMax/3.1.7-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/gprMax.py", line 69, in main
run_main(args)
File "/hpc2n/eb/software/gprMax/3.1.7-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/gprMax.py", line 191, in run_main
run_std_sim(args, inputfile, usernamespace)
File "/hpc2n/eb/software/gprMax/3.1.7-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/gprMax.py", line 232, in run_std_sim
run_model(args, currentmodelrun, modelend - 1, numbermodelruns, inputfile, modelusernamespace)
File "/hpc2n/eb/software/gprMax/3.1.7-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/model_build_run.py", line 373, in run_model
tsolve, memsolve = solve_gpu(currentmodelrun, modelend, G)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/hpc2n/eb/software/gprMax/3.1.7-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/model_build_run.py", line 640, in solve_gpu
pml.gpu_update_magnetic(G)
File "/hpc2n/eb/software/gprMax/3.1.7-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/pml.py", line 364, in gpu_update_magnetic
self.update_magnetic_gpu(np.int32(self.xs), np.int32(self.xf), np.int32(self.ys), np.int32(self.yf), np.int32(self.zs), np.int32(self.zf), np.int32(self.HPhi1_gpu.shape[1]), np.int32(self.HPhi1_gpu.shape[2]), np.int32(self.HPhi1_gpu.shape[3]), np.int32(self.HPhi2_gpu.shape[1]), np.int32(self.HPhi2_gpu.shape[2]), np.int32(self.HPhi2_gpu.shape[3]), np.int32(self.thickness), G.ID_gpu.gpudata, G.Ex_gpu.gpudata, G.Ey_gpu.gpudata, G.Ez_gpu.gpudata, G.Hx_gpu.gpudata, G.Hy_gpu.gpudata, G.Hz_gpu.gpudata, self.HPhi1_gpu.gpudata, self.HPhi2_gpu.gpudata, self.HRA_gpu.gpudata, self.HRB_gpu.gpudata, self.HRE_gpu.gpudata, self.HRF_gpu.gpudata, floattype(self.d), block=G.tpb, grid=self.bpg)
File "/hpc2n/eb/software/PyCUDA/2024.1.2-gfbf-2023b-CUDA-12.4.0/lib/python3.11/site-packages/pycuda/driver.py", line 481, in function_call
func._set_block_shape(*block)
pycuda._driver.LogicError: cuFuncSetBlockShape failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
...
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
-------------------------------------------------------------------
PyCUDA ERROR: The context stack was not empty upon module cleanup.
-------------------------------------------------------------------
A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.
************************************************* gprMax devel-548a0a5 ************************************
Model 1/1 solving on
b-cn1610.hpc2n.umu.se with CUDA backend using Device 0: NVIDIA A100 80GB PCIe
|--->: 0%| | 0/15577675 [00:00<?, ?it/s]
|--->: 0%| | 69/15577675 [00:00<40:46:10, 106.14it/s]
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/proj/nobackup/hpc2n2024-132/easybuild/software/gprMax/devel-548a0a5-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/__main__.py", line 6, in <module>
gprMax.gprMax.cli()
File "/proj/nobackup/hpc2n2024-132/easybuild/software/gprMax/devel-548a0a5-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/gprMax.py", line 218, in cli
results = run_main(args)
^^^^^^^^^^^^^^
File "/proj/nobackup/hpc2n2024-132/easybuild/software/gprMax/devel-548a0a5-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/gprMax.py", line 245, in run_main
results = context.run()
^^^^^^^^^^^^^
File "/proj/nobackup/hpc2n2024-132/easybuild/software/gprMax/devel-548a0a5-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/contexts.py", line 89, in run
model.solve(solver)
File "/proj/nobackup/hpc2n2024-132/easybuild/software/gprMax/devel-548a0a5-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/model_build_run.py", line 392, in solve
solver.solve(iterator)
File "/proj/nobackup/hpc2n2024-132/easybuild/software/gprMax/devel-548a0a5-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/solvers.py", line 113, in solve
self.updates.update_magnetic_pml()
File "/proj/nobackup/hpc2n2024-132/easybuild/software/gprMax/devel-548a0a5-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/updates.py", line 673, in update_magnetic_pml
pml.update_magnetic()
File "/proj/nobackup/hpc2n2024-132/easybuild/software/gprMax/devel-548a0a5-foss-2023b-CUDA-12.4.0/lib/python3.11/site-packages/gprMax/pml.py", line 566, in update_magnetic
self.update_magnetic_dev(
File "/hpc2n/eb/software/PyCUDA/2024.1.2-gfbf-2023b-CUDA-12.4.0/lib/python3.11/site-packages/pycuda/driver.py", line 481, in function_call
func._set_block_shape(*block)
pycuda._driver.LogicError: cuFuncSetBlockShape failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
-------------------------------------------------------------------
PyCUDA ERROR: The context stack was not empty upon module cleanup.
-------------------------------------------------------------------
A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.
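In both versions the illegal memory access is only reported at the next driver call (cuFuncSetBlockShape), so the traceback does not point at the kernel that actually faulted. A minimal diagnostic sketch, assuming a CUDA 12.x toolchain and using a placeholder input file name (the exact gprMax command-line flags may differ between 3.1.7 and 'devel'):

```shell
# CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so the
# error is raised at the faulting kernel's own launch site instead of
# at a later, unrelated driver call.
CUDA_LAUNCH_BLOCKING=1 python -m gprMax model.in -gpu

# compute-sanitizer (shipped with the CUDA 12.x toolkit) reports the
# exact out-of-bounds address and the offending kernel and thread.
compute-sanitizer --tool memcheck python -m gprMax model.in -gpu
```

Running either under a batch job on the same node that reproduces the failure should narrow the fault down to a specific PML kernel.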
On Monday, 10 February 2025 at 10:19:08 UTC+1, Jakub Vaverka wrote: