Running cp2k on GPU


Hemanth Haridas

Jun 6, 2024, 3:10:38 PM
to cp2k
I am trying to run CP2K on a Linux cluster with GPU support. I have successfully compiled the code with CUDA support, but GPU utilization is zero even though the program is running, meaning that the code is running on the CPU cores only.

This is the script that I am using to run cp2k:

#!/bin/bash
#SBATCH --job-name=LiCl                 ### Job Name
#SBATCH --output=cp2k.out               ### File in which to store job output
#SBATCH --error=cp2k.err                ### File in which to store job error messages
#SBATCH --time=3-00:00:00               ### Wall clock time limit in Days-HH:MM:SS
#SBATCH --ntasks=64
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=1

module load gcc cuda/11.8.0 openmpi/4.1.6-gpu intel-oneapi-mkl/2022.0.2
source /cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/setup
export OMP_NUM_THREADS=$SLURM_NTASKS
/cp2k_plumed_gpu/cp2k-2024.1/exe/local_cuda/cp2k.psmp -i colvars.inp -o colvars.out


Are there any additional flags that I need to use to run the code on GPUs?

Johann Pototschnig

Jun 7, 2024, 3:51:34 AM
to cp2k
Can you provide the local_cuda.psmp file, which you will find in the arch folder?

Hemanth Haridas

Jun 7, 2024, 11:18:47 AM
to cp2k

CC          = /gcc-8.5.0/openmpi-4.1.6-chhfokbbf3fbb2t4uo7ns4ukaripskzj/bin/mpicc

CXX         = /gcc-8.5.0/openmpi-4.1.6-chhfokbbf3fbb2t4uo7ns4ukaripskzj/bin/mpic++

AR          = ar -r

FC          = /gcc-8.5.0/openmpi-4.1.6-chhfokbbf3fbb2t4uo7ns4ukaripskzj/bin/mpifort

LD          = /gcc-8.5.0/openmpi-4.1.6-chhfokbbf3fbb2t4uo7ns4ukaripskzj/bin/mpifort

#

DFLAGS      = -D__OFFLOAD_CUDA -D__DBCSR_ACC  -D__LIBXSMM  -D__parallel -D__MKL -D__FFTW3  -D__SCALAPACK -D__LIBINT -D__LIBXC -D__LIBGRPP -D__GSL -D__PLUMED2 -D__SPGLIB  -D__OFFLOAD_GEMM  -D__SPLA

#

WFLAGS      = -Werror=aliasing -Werror=ampersand -Werror=c-binding-type -Werror=intrinsic-shadow -Werror=intrinsics-std -Werror=line-truncation -Werror=tabs -Werror=target-lifetime -Werror=underflow -Werror=unused-but-set-variable -Werror=unused-variable -Werror=unused-dummy-argument -Werror=unused-parameter -Werror=unused-label -Werror=conversion -Werror=zerotrip -Wno-maybe-uninitialized -Wuninitialized -Wuse-without-only

#

FCDEBFLAGS  = -fbacktrace -ffree-form -fimplicit-none -std=f2008

CFLAGS      = -fno-omit-frame-pointer -fopenmp -g -mtune=native  -O3 -funroll-loops $(PROFOPT)  -I/gcc-8.5.0/openmpi-4.1.6-chhfokbbf3fbb2t4uo7ns4ukaripskzj/include -pthread  -m64 -I/gcc-8.5.0/intel-oneapi-mkl-2023.2.0-etwucm5d3s2qu7eiuaaxastbiukj2ori/mkl/2023.2.0/include -I/gcc-8.5.0/intel-oneapi-mkl-2023.2.0-etwucm5d3s2qu7eiuaaxastbiukj2ori/mkl/2023.2.0/include/fftw -I'/cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/libint-v2.6.0-cp2k-lmax-5/include' -I'/cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/libxc-6.2.2/include' -I'/cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/libgrpp-main-20231215/include' -I'/cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/libxsmm-1.17/include' -I'/cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/gsl-2.7/include' -I/cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/spglib-1.16.2/include -I'/cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/SpLA-1.5.5/include/spla' -std=c11 -Wall -Wextra -Werror -Wno-vla-parameter -Wno-deprecated-declarations $(DFLAGS) -I/gcc-8.5.0/cuda-11.8.0-3wlxktsbgw2ui4wvdnsy7w7xyxlkkwju/include

FCFLAGS     = -fno-omit-frame-pointer -fopenmp -g -mtune=native  -O3 -funroll-loops $(PROFOPT)  -I/gcc-8.5.0/openmpi-4.1.6-chhfokbbf3fbb2t4uo7ns4ukaripskzj/include -pthread  -m64 -I/gcc-8.5.0/intel-oneapi-mkl-2023.2.0-etwucm5d3s2qu7eiuaaxastbiukj2ori/mkl/2023.2.0/include -I/gcc-8.5.0/intel-oneapi-mkl-2023.2.0-etwucm5d3s2qu7eiuaaxastbiukj2ori/mkl/2023.2.0/include/fftw -I'/cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/libint-v2.6.0-cp2k-lmax-5/include' -I'/cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/libxc-6.2.2/include' -I'/cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/libgrpp-main-20231215/include' -I'/cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/libxsmm-1.17/include' -I'/cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/gsl-2.7/include' -I/cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/spglib-1.16.2/include -I'/cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/SpLA-1.5.5/include/spla' $(FCDEBFLAGS) $(WFLAGS) $(DFLAGS)

CXXFLAGS    = -O2 -fPIC -fno-omit-frame-pointer -fopenmp -g -march=native -mtune=native --std=c++14 $(DFLAGS) -Wno-deprecated-declarations -I/gcc-8.5.0/cuda-11.8.0-3wlxktsbgw2ui4wvdnsy7w7xyxlkkwju/include

#

LDFLAGS     =  $(FCFLAGS) -Wl,--enable-new-dtags -pthread -L/gcc-8.5.0/openmpi-4.1.6-chhfokbbf3fbb2t4uo7ns4ukaripskzj/lib -L/gcc-8.5.0/ucx-1.14.0-cupo7hrn2exqqwfyatdigeuiiqijaulw/lib -L/uufs/chpc.utah.edu/sys/spack/v019/linux-rocky8-x86_64/gcc-8.5.0/zlib-1.2.13-dcpzngybj4fisn6ojapnels3yfwcxqgk/lib -Wl,-rpath -Wl,/gcc-8.5.0/openmpi-4.1.6-chhfokbbf3fbb2t4uo7ns4ukaripskzj/lib -Wl,-rpath -Wl,/gcc-8.5.0/ucx-1.14.0-cupo7hrn2exqqwfyatdigeuiiqijaulw/lib -Wl,-rpath -Wl,/gcc-8.5.0/zlib-1.2.13-dcpzngybj4fisn6ojapnels3yfwcxqgk/lib    -L'/cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/libint-v2.6.0-cp2k-lmax-5/lib' -L'/cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/libxc-6.2.2/lib' -Wl,-rpath,'/cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/libxc-6.2.2/lib' -L'/cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/libgrpp-main-20231215/lib' -Wl,-rpath,'/cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/libgrpp-main-20231215/lib' -L'/cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/libxsmm-1.17/lib' -Wl,-rpath,'/cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/libxsmm-1.17/lib' -L'/cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/gsl-2.7/lib' -Wl,-rpath,'/cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/gsl-2.7/lib' -L'/cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/plumed-2.9.0/lib' -Wl,-rpath,'/cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/plumed-2.9.0/lib' -L'/cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/spglib-1.16.2/lib' -Wl,-rpath,'/cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/spglib-1.16.2/lib' -L'/cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/SpLA-1.5.5/lib/cuda' -Wl,-rpath,'/cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/SpLA-1.5.5/lib/cuda' -L'/gcc-8.5.0/cuda-11.8.0-3wlxktsbgw2ui4wvdnsy7w7xyxlkkwju/targets/x86_64-linux/lib' -Wl,-rpath,'/gcc-8.5.0/cuda-11.8.0-3wlxktsbgw2ui4wvdnsy7w7xyxlkkwju/targets/x86_64-linux/lib' -L'/usr/lib64' -Wl,-rpath,'/usr/lib64'

LDFLAGS_C   =

LIBS        = -lspla -lsymspg -l:libplumed.a -ldl -lstdc++ -lz -ldl -lgsl -lxsmmf -lxsmm -ldl -lpthread -llibgrpp -lxcf03 -lxc -lint2   -lmpi_cxx -lmpi  -L/gcc-8.5.0/intel-oneapi-mkl-2023.2.0-etwucm5d3s2qu7eiuaaxastbiukj2ori/mkl/2023.2.0/lib/intel64 -Wl,-rpath,/gcc-8.5.0/intel-oneapi-mkl-2023.2.0-etwucm5d3s2qu7eiuaaxastbiukj2ori/mkl/2023.2.0/lib/intel64 -lmkl_scalapack_lp64 -Wl,--start-group -lmkl_gf_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_openmpi_lp64 -Wl,--end-group -lpthread -lm -ldl -lstdc++ -lcudart -lnvrtc -lcuda -lcufft -lcublas -lrt

#

GPUVER        = A100

OFFLOAD_CC    = nvcc

OFFLOAD_FLAGS = -g -arch sm_80 -O3 -allow-unsupported-compiler -Xcompiler='-fopenmp -Wall -Wextra -Werror' --std=c++11 $(DFLAGS)

OFFLOAD_TARGET = cuda

#

FYPPFLAGS   = -n --line-marker-format=gfortran5

Johann Pototschnig

Jun 7, 2024, 12:52:33 PM
to cp2k
It links to CUDA, so there should be no problem on that side, but you are missing the mpirun / srun launcher:

mpirun -n 1 -x OMP_NUM_THREADS=$...  /cp2k_plumed_gpu/cp2k-2024.1/exe/local_cuda/cp2k.psmp -i colvars.inp -o colvars.out


Depending on your system, additional options for mpirun/srun may be necessary.
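Putting that together, a corrected submission script might look like the sketch below. The rank/thread counts and the use of SLURM_CPUS_PER_TASK are assumptions that depend on your cluster; note that setting OMP_NUM_THREADS to $SLURM_NTASKS (64) would heavily oversubscribe a single node.

```shell
#!/bin/bash
#SBATCH --job-name=LiCl
#SBATCH --output=cp2k.out
#SBATCH --error=cp2k.err
#SBATCH --time=3-00:00:00
#SBATCH --ntasks=1                # one MPI rank driving the single GPU
#SBATCH --cpus-per-task=8         # OpenMP threads per rank (cluster-dependent guess)
#SBATCH --gres=gpu:1

module load gcc cuda/11.8.0 openmpi/4.1.6-gpu intel-oneapi-mkl/2022.0.2
source /cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/setup

# Threads per rank, not the total task count:
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# -x exports OMP_NUM_THREADS from the current environment to all ranks.
mpirun -n $SLURM_NTASKS -x OMP_NUM_THREADS \
    /cp2k_plumed_gpu/cp2k-2024.1/exe/local_cuda/cp2k.psmp -i colvars.inp -o colvars.out
```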

The following program can help to figure out bindings:
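In the meantime, Open MPI can report its own bindings (a sketch, not necessarily the program referred to above; --report-bindings is a standard Open MPI mpirun option):

```shell
# Print how Open MPI binds each rank to sockets/cores, without running CP2K:
# "true" is a no-op payload, so only the binding report is produced.
mpirun -n 1 --report-bindings true
```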

Johann Pototschnig

Jun 7, 2024, 1:02:42 PM
to cp2k

Hemanth Haridas

Jun 10, 2024, 12:31:45 PM
to cp2k
I tried running CP2K as described in the previous email, but the code still does not run on the GPU, and GPU usage is still zero.

Sincerely,
Hemanth

Johann Pototschnig

Jun 11, 2024, 7:07:07 AM
to cp2k
Which GPU bindings did you end up using?

Did you compare with submission scripts on your cluster that do use the GPU?

Hemanth Haridas

Jun 11, 2024, 10:37:29 AM
to cp2k
The script that I used for submitting is reproduced below

#!/bin/bash
#SBATCH --job-name=LiCl                 ### Job Name
#SBATCH --output=cp2k.out               ### File in which to store job output
#SBATCH --error=cp2k.err                ### File in which to store job error messages
#SBATCH --time=3-00:00:00               ### Wall clock time limit in Days-HH:MM:SS
#SBATCH --ntasks=64
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=1

module load gcc cuda/11.8.0 openmpi/4.1.6-gpu intel-oneapi-mkl/2022.0.2
source /cp2k_plumed_gpu/cp2k-2024.1/tools/toolchain/install/setup
export OMP_NUM_THREADS=$SLURM_NTASKS
mpirun -np 1 /cp2k_plumed_gpu/cp2k-2024.1/exe/local_cuda/cp2k.psmp -i colvars.inp -o colvars.out



We do not have a version of cp2k installed cluster-wide, meaning that I do not have a script to compare with.

Johann Pototschnig

Jun 11, 2024, 11:35:24 AM
to cp2k
It doesn't have to be cp2k; any program that uses GPUs will do, in order to figure out the script you need.

srun / mpirun might need additional options:

Otherwise, there are some options to get additional information:

- Since you have OpenMPI, you can get more information by putting "ompi_info" in your script.

- You can also put "nvidia-smi" in the script to get GPU information.

- "echo $CUDA_VISIBLE_DEVICES" should show the GPUs that are visible.

Hemanth Haridas

Jun 11, 2024, 11:45:47 AM
to cp2k
Gromacs requires many Gromacs-specific shell variables to run the code on GPUs. NAMD (another MD code), on the other hand, just requires that you load the required bindings and run the binary. The first case I reported is similar to the NAMD case (which, as I showed, does not work here). I checked GPU usage with "nvidia-smi" and observed that GPU utilization was 0% and the process was not registered on the GPU, even though the program was running.

Best,
Hemanth

Nikhil Maroli

Jun 11, 2024, 12:54:35 PM
to cp...@googlegroups.com
What is the status of running it on a terminal without a script? Log in to the GPU node, execute it, and check whether the GPU is used or not.
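An interactive check along those lines might look like the following sketch (the salloc options and thread count are assumptions for your cluster):

```shell
# Get an interactive shell on a GPU node:
salloc --gres=gpu:1 --ntasks=1 --cpus-per-task=8 --time=1:00:00
srun --pty bash

# On the node, confirm the GPU is visible, then run CP2K by hand and watch utilization:
nvidia-smi
export OMP_NUM_THREADS=8
/cp2k_plumed_gpu/cp2k-2024.1/exe/local_cuda/cp2k.psmp -i colvars.inp -o colvars.out &
watch -n 2 nvidia-smi    # utilization should rise above 0% if the offload works
```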
