After two weeks of effort, I finally succeeded in running gprMax on a GPU on a Windows 10 system.
First, I would like to share my experience of installing the software needed for gprMax GPU computing. Installing the GPU software seems to be much easier on Linux than on Windows because of its gcc compiler, but I prefer GPU computing on Windows because I am not familiar with Linux.
My first step was to install Visual Studio. I tried both Visual Studio 2017 and 2015, and I recommend Visual Studio 2015 to make sure that the CUDA Toolkit recognizes VS. I used vs2015.ent_enu.iso and chose to install all components. This costs time and disk space, but it is the safe choice for a programming beginner.
Then I installed the CUDA Toolkit as described in the gprMax documentation. I chose cuda_9.0.176_win10.exe and kept the defaults in every installation step. The environment variable settings are very important, but they will be fine if you choose the default installation.
Afterwards, I installed Miniconda and gprMax and ran a test of gprMax, following the gprMax manual. I had some problems in this step because my internet connection is very unstable, which may not be a problem for people in other countries. The next step was to install pycuda according to the guide.
With everything installed, I began to run gprMax on the GPU. As discussed on GitHub, I also ran into the error “The context stack was not empty upon module cleanup”. In my case the main cause was cl.exe (the Visual Studio C++ compiler), whose directory should have been included in the PATH environment variable. I manually added the cl.exe directory to the user PATH, and after that everything worked. I strongly suggest checking the environment variable settings if you run into problems. With so many software packages installed, they can easily fail to find each other.
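The PATH problem described above can be checked before running gprMax. The following is only a minimal sketch, not part of gprMax: it uses Python's standard `shutil.which` to look up the compilers on the current PATH. The helper name `find_tools` is made up for this example, and I am assuming that `cl` (Visual Studio) and `nvcc` (CUDA Toolkit) are the two tools worth checking.

```python
import shutil


def find_tools(names=("cl", "nvcc")):
    """Return {tool: full path, or None if the tool is not on PATH}.

    find_tools is a hypothetical helper for illustration only.
    On Windows, shutil.which also tries the extensions in PATHEXT,
    so "cl" matches cl.exe.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for tool, location in find_tools().items():
        print(f"{tool}: {location or 'NOT FOUND on PATH'}")
```

If `cl` is reported as not found, adding the directory that contains cl.exe to the user PATH and opening a new command prompt is the fix that worked for me.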
Then I ran a series of tests on my workstation, which is configured as follows:
Dell T7810, one Intel Xeon E5-2643 v4 CPU @ 3.4 GHz (6 cores, 12 threads), 32 GB memory, and one NVIDIA Quadro K1200 with 4 GB memory and 512 CUDA cores. My test results are as follows:
(1) cylinder_Ascan_2D.in, cell number: 120*120*1
CPU: 41.9 MB memory, solver time 0.58 s, simulation time 1.45 s
GPU: 109 MB memory, solver time 0.36 s, simulation time 3.57 s
(2) cylinder_Ascan_2D.in, increased cell number: 5000*5000*1 = 2500e4
CPU: 2.59 GB memory, solver time 1 m 22 s, simulation time 1 m 27 s
GPU: 2.84 GB memory, solver time 28 s, simulation time 48 s
(3) heterogeneous_soil.in, cell number: 150*150*100 = 225e4
CPU: 307 MB memory, solver time 2 m 47 s, simulation time 3 m 3 s
GPU: 375 MB memory, solver time 35.1 s, simulation time 50.55 s
(4) heterogeneous_soil.in, increased cell number: 300*300*500 = 4500e4
CPU: the model can be computed
GPU: warning that the required memory exceeds the 4 GB of GPU memory
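To compare the timings above more directly, the speedup can be computed as CPU time divided by GPU time. This is just a small sketch using the numbers from tests (1) to (3); the names `TIMINGS`, `speedup` and `summarize` are my own and have nothing to do with gprMax itself.

```python
# Timings in seconds, copied from tests (1) to (3) above.
# "1 m 22 s" is entered as 82 s, "2 m 47 s" as 167 s, and so on.
# Each entry holds (solver time, simulation time).
TIMINGS = {
    "(1) cylinder_Ascan_2D, 120*120*1": {"cpu": (0.58, 1.45), "gpu": (0.36, 3.57)},
    "(2) cylinder_Ascan_2D, 5000*5000*1": {"cpu": (82.0, 87.0), "gpu": (28.0, 48.0)},
    "(3) heterogeneous_soil, 150*150*100": {"cpu": (167.0, 183.0), "gpu": (35.1, 50.55)},
}


def speedup(cpu_seconds, gpu_seconds):
    """CPU time divided by GPU time; a value above 1 means the GPU was faster."""
    return cpu_seconds / gpu_seconds


def summarize(timings):
    """Return a list of (test name, solver speedup, simulation speedup)."""
    rows = []
    for name, t in timings.items():
        cpu_solver, cpu_total = t["cpu"]
        gpu_solver, gpu_total = t["gpu"]
        rows.append((name, speedup(cpu_solver, gpu_solver), speedup(cpu_total, gpu_total)))
    return rows


for name, solver_x, total_x in summarize(TIMINGS):
    print(f"{name}: solver speedup {solver_x:.2f}x, simulation speedup {total_x:.2f}x")
```

With these numbers the solver speedup grows from roughly 1.6x for the smallest model to roughly 4.8x for the 3D model, while the total simulation time of the smallest model is actually longer on the GPU.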
I think the GPU mainly reduces the solver time, but it cannot compute a large-scale model whose memory requirement exceeds the GPU memory. In general, large-scale models are the ones that most need the GPU to accelerate the FDTD time-stepping, yet their memory consumption is exactly what limits the use of the GPU.
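The memory limit in test (4) can be anticipated with a rough back-of-the-envelope estimate. The sketch below assumes that GPU memory grows linearly with the number of cells and takes the bytes-per-cell figure from test (3), reading "375 M" as 375e6 bytes. Both are my assumptions, not something gprMax documents, and fixed overhead is ignored, so treat the result only as an order-of-magnitude guide.

```python
def bytes_per_cell(memory_bytes, cells):
    """Average memory per cell observed in a measured run."""
    return memory_bytes / cells


def estimate_memory_gb(cells, per_cell_bytes):
    """Linear extrapolation of memory use (assumption: no fixed overhead)."""
    return cells * per_cell_bytes / 1e9


GPU_MEMORY_GB = 4.0                 # Quadro K1200 in my workstation

small_cells = 150 * 150 * 100       # test (3)
large_cells = 300 * 300 * 500       # test (4)

# Test (3) used about 375 MB on the GPU (assumed here to mean 375e6 bytes).
per_cell = bytes_per_cell(375e6, small_cells)
needed_gb = estimate_memory_gb(large_cells, per_cell)

print(f"about {per_cell:.0f} bytes per cell, "
      f"estimated {needed_gb:.1f} GB needed, {GPU_MEMORY_GB:.0f} GB available")
if needed_gb > GPU_MEMORY_GB:
    print("The model is unlikely to fit in GPU memory.")
```

Under these assumptions the large model would need clearly more than the 4 GB available, which is consistent with the warning I saw.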