GPU vs CPU performance on consumer workstation

rafa...@gmail.com

Feb 7, 2026, 1:31:35 PM
to cp2k
Hello, I'm testing CP2K performance on an older workstation PC and I'm finding that the CPU version of CP2K 2025.2 is faster than the GPU version. My understanding is that many consumer GPUs do not have great double-precision performance, but I can't tell whether the slower GPU timing is normal for my system or whether there is anything I can improve. For example, a CPU-only H2O-32.inp benchmark is twice as fast as a GPU run. The timings show that "grid_collocate_task_list" and "grid_integrate_task_list" are the most time-consuming steps.
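To put numbers on which routines dominate, the timing reports at the end of the two output files can be compared routine by routine. The following is a minimal sketch, not a CP2K tool: it assumes each timing row has the form "name, calls, ASD, self average, self maximum, total average, total maximum", and the helper names `parse_timings` and `compare` are hypothetical.

```python
import re

# One row of the timing report. The assumed column layout is:
# routine name, calls, ASD, self avg, self max, total avg, total max.
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<calls>\d+)\s+(?P<asd>\d+\.\d+)"
    r"\s+(?P<self_avg>\d+\.\d+)\s+(?P<self_max>\d+\.\d+)"
    r"\s+(?P<total_avg>\d+\.\d+)\s+(?P<total_max>\d+\.\d+)\s*$"
)


def parse_timings(text):
    """Return {routine name: maximum total time in seconds}."""
    timings = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            timings[match["name"]] = float(match["total_max"])
    return timings


def compare(cpu_text, gpu_text, routines):
    """Return {routine: GPU time / CPU time} for routines found in both reports."""
    cpu = parse_timings(cpu_text)
    gpu = parse_timings(gpu_text)
    return {
        name: gpu[name] / cpu[name]
        for name in routines
        if name in cpu and name in gpu and cpu[name] > 0
    }
```

Reading the contents of H2O-32-cpu-n4-o2.out and H2O-32-gpu-n1-o2.out and passing them to `compare` together with `["grid_collocate_task_list", "grid_integrate_task_list"]` would give one ratio per routine; a value above 1 means the GPU run spent longer in that routine.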

I came across a similar thread from 2018 (issue73), but I wonder how well those comments hold up for CP2K 2025.2. Should I expect any performance gains from a GPU on small systems (<250 atoms)? I attached the ARCH files I used to build the CPU and GPU versions of CP2K, along with the output files from the H2O-32.inp benchmarks.

My system has a hyperthreaded 4-core AMD Ryzen 5 2400G CPU, an NVIDIA RTX 3050 6 GB GPU, and 16 GB of RAM.

For CPU runs I use 4 MPI ranks with 2 OMP threads each to get full CPU utilization. For GPU runs I use 1 MPI rank with 2 OMP threads. Increasing OMP_NUM_THREADS to 4, 6, or 8 does not increase CPU utilization during a GPU run.
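The two rank/thread layouts can be written down as a small sketch. It only builds the launch command and environment and does not run anything. The executable name `cp2k.psmp`, the `mpirun -np` launcher, and the helper names `launch_plan` and `oversubscribed` are assumptions for illustration; the output naming follows the attached files.

```python
import os


def launch_plan(ranks, threads, label, exe="cp2k.psmp", inp="H2O-32.inp"):
    """Build (command, environment) for one benchmark run without executing it."""
    stem = os.path.splitext(inp)[0]
    out = f"{stem}-{label}-n{ranks}-o{threads}.out"
    # Assumed invocation: MPI launcher, then CP2K with input (-i) and output (-o) files.
    cmd = ["mpirun", "-np", str(ranks), exe, "-i", inp, "-o", out]
    env = {"OMP_NUM_THREADS": str(threads)}
    return cmd, env


def oversubscribed(ranks, threads, logical_cpus=None):
    """True if ranks * threads exceeds the number of logical CPUs."""
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 1
    return ranks * threads > logical_cpus


# The two layouts described above: 4 ranks x 2 threads (CPU), 1 rank x 2 threads (GPU).
cpu_cmd, cpu_env = launch_plan(4, 2, "cpu")
gpu_cmd, gpu_env = launch_plan(1, 2, "gpu")
```

To actually launch a run, the pair could be handed to `subprocess.run(cmd, env={**os.environ, **env})`. On a 4-core CPU with hyperthreading (8 logical CPUs), `oversubscribed(4, 2, 8)` is false, so the CPU layout fills the machine without exceeding it.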

(I am unable to run H2O-64.inp on the GPU because of a CUDA OOM error: ERROR: "cudaErrorLaunchOutOfResources" at /home/raf/cp2k-home/cp2k-colordiffusion/cp2k-2025.2/src/grid/gpu/grid_gpu_collocate.cu:387 )

Thanks,
Rafal
local_cuda.psmp
local.psmp
H2O-32-cpu-n4-o2.out
H2O-32-gpu-n1-o2.out

Frederick Stein

Feb 7, 2026, 3:20:19 PM
to cp2k
Dear Rafal,
Consumer GPUs will not accelerate CP2K, no matter the workload, because CP2K relies on double-precision floating-point numbers for accuracy, and these are not well supported by consumer cards such as the NVIDIA RTX series.
GPU performance has improved since that thread (grid library, PDGEMM in RPA, DGEMM in MP2, ...), so some of the comments in the linked thread are no longer correct.
I can't tell how much memory (CPU or GPU) you need for this test.
If you are interested in using the latest version of CP2K, be aware that you need to switch to the CMake-based build system (or to Spack or EasyBuild).
Best,
Frederick