Hi,
I am seeing a performance problem with QMCPACK v4.1.0 built with GPU support and run on the Booster partition of the Leonardo cluster at CINECA (https://docs.hpc.cineca.it/hpc/leonardo.html#system-architecture).
I am trying to understand whether the problem comes from my compilation, the way I run the code, or something else.
Let me explain the problem. I was running the GPU version and noticed that it was slower than I expected. So I tested a small system (a water-methane complex) with both the GPU build and the CPU-only build of the code, and found that the CPU-only build is much faster, even though it runs on the same resources and does not use the GPUs.
I am attaching the outputs of the two calculations.
Both were run on two nodes of the Booster partition, each with 4 GPUs and 32 CPU cores, so I used 4 MPI tasks per node (8 in total) and 8 OpenMP threads per MPI task.
The timings are:
GPU:
Timer       Inclusive_time  Exclusive_time  Calls  Time_per_call
Total             517.1148          2.5066      1  517.114793363
DMCBatched        190.5414        190.5414      1  190.541434383
Startup             0.1267          0.1267      1    0.126677281
VMCBatched        323.9401        323.9401      1  323.940080429
CPU-only:
Timer       Inclusive_time  Exclusive_time  Calls  Time_per_call
Total             147.9119          0.0558      1  147.911921762
DMCBatched         82.2949         82.2949      1   82.294926232
Startup             0.1188          0.1188      1    0.118750077
VMCBatched         65.4425         65.4425      1   65.442487005
As you can see, the CPU-only run is about 3.5 times faster overall (total time 147.9 versus 517.1).
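To break the difference down by stage, the ratio of GPU time to CPU-only time can be computed per timer from the inclusive times in the two tables. This is only a small sketch: the helper name `slowdown` is made up for illustration, and the numbers are copied by hand from the tables above.

```shell
#!/bin/sh
# slowdown GPU_TIME CPU_TIME: print GPU_TIME / CPU_TIME with two decimals.
# The helper name is illustrative; the inputs are the inclusive times
# reported in the two timer tables.
slowdown() {
    awk -v gpu="$1" -v cpu="$2" 'BEGIN { printf "%.2f\n", gpu / cpu }'
}

echo "Total      $(slowdown 517.1148 147.9119)"
echo "VMCBatched $(slowdown 323.9401 65.4425)"
echo "DMCBatched $(slowdown 190.5414 82.2949)"
```

With these inputs the VMC stage is hit hardest (the GPU run takes roughly 5 times longer), while DMC is roughly 2.3 times slower.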
This is how I compiled the GPU version:
module load cmake #/4.1.2
module load ninja
module load gcc/12.2.0
module load cuda/12.2
module load openmpi/4.1.6--gcc--12.2.0-cuda-12.2
module load fftw/3.3.10--openmpi--4.1.6--gcc--12.2.0-spack0.22
module load hdf5/1.14.3--openmpi--4.1.6--gcc--12.2.0-spack0.22
module load boost/1.85.0--openmpi--4.1.6--gcc--12.2.0
module load openblas/0.3.26--gcc--12.2.0
cmake -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx \
-DQMC_COMPLEX=OFF -DQMC_MIXED_PRECISION=OFF \
-DQMC_GPU="cuda" -DQMC_GPU_ARCHS=sm_80 \
../qmcpack-4.1.0
make -j 32
For the CPU-only version I used:
cmake -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx \
-DQMC_COMPLEX=OFF -DQMC_MIXED_PRECISION=OFF \
../qmcpack-4.1.0
make -j 32
Could somebody help me understand what is going wrong?
Best,
Andrea Zen