Improving the speed of an HSE06 calculation

Lorenzo Lagasco

Apr 28, 2026, 9:43:45 AM
to cp...@googlegroups.com

Good afternoon everyone,

I am writing to ask for advice on improving the computational efficiency of an HSE06 calculation used to evaluate inter-state couplings between diabatic states in a slab–organic dye system (Kondov diabatization scheme; the system contains 395 atoms). I have attached the input file for reference. In a preliminary test on a CPU-only machine (a single node with 52 cores), a single SCF iteration takes approximately 30 minutes.

Any suggestions would be greatly appreciated.

Best regards

Lorenzo Lagasco



SLAB+DYE.inp

Frederick Stein

Apr 28, 2026, 9:59:56 AM
to cp2k
Dear Lorenzo,
Are you sure that you need these tight cutoffs (EPS_DEFAULT and especially EPS_PGF_ORB)? Can you check how many integrals are recalculated (see the output file)? If they are, can you increase MAX_MEMORY? How are you running CP2K (number of MPI ranks, number of OpenMP threads)?
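
For orientation, these knobs live in the following input sections; the values below are placeholders rather than recommendations, and MAX_MEMORY is given in MiB per MPI rank:

&FORCE_EVAL
  &DFT
    &QS
      EPS_DEFAULT 1.0E-10   ! overall accuracy target; derived thresholds scale with it
      EPS_PGF_ORB 1.0E-5    ! screening of primitive Gaussians
    &END QS
    &XC
      &HF
        &MEMORY
          MAX_MEMORY 3000   ! MiB per rank available for in-core ERI storage
        &END MEMORY
        &SCREENING
          EPS_SCHWARZ 1.0E-6  ! Schwarz screening of four-center integrals
        &END SCREENING
      &END HF
    &END XC
  &END DFT
&END FORCE_EVAL

To see whether integrals are being recalculated, the HFX memory statistics in the output are a good starting point, e.g.:

grep "HFX_MEM_INFO" DYE+SLAB.out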
Best,
Frederick

Lorenzo Lagasco

Apr 28, 2026, 10:30:01 AM
to cp...@googlegroups.com
OK, maybe I can test EPS_DEFAULT and EPS_PGF_ORB a bit more carefully (I have already launched several tests to check convergence with respect to NGRIDS, CUTOFF and REL_CUTOFF). Meanwhile, this is my Slurm submission script:
#!/bin/bash
#SBATCH --job-name=HSE06
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=52
#SBATCH --partition=taras2-6230r

module purge
module load cp2k/may2025-gnu14.2.0-openmpi4.1.6-psm211.2.230
export OMP_NUM_THREADS=1

mpirun cp2k.popt -i DYE+SLAB.inp -o DYE+SLAB.out


I tried increasing OMP_NUM_THREADS from 1 to 2 and then 3, and each SCF step became progressively slower. Moreover, I set MAX_MEMORY according to the free memory of the node.


Frederick Stein

Apr 28, 2026, 11:18:45 AM
to cp2k
Just in case, be aware that MAX_MEMORY is per rank, i.e. you need to increase it if you use more OpenMP threads in favor of MPI ranks. Do you also have a timing report? That would also help the investigation. Usually, CUTOFF and REL_CUTOFF matter less for the Hartree-Fock kernel; it is commonly more about EPS_DEFAULT, EPS_PGF_ORB or EPS_SCHWARZ.
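
As a purely illustrative example: if roughly 190 GB of the node's memory is free and all 52 ranks are used, MAX_MEMORY should not exceed roughly 3700 MiB per rank; halving the rank count roughly doubles that per-rank budget. The final timing report can be pulled out of the output (assuming the run finished and wrote the standard T I M I N G footer; the 40-line window is arbitrary):

grep -A 40 "T I M I N G" DYE+SLAB.out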

Johann Pototschnig

Apr 28, 2026, 11:20:22 AM
to cp2k
You need to check your bindings / number of processes.

You request 52 processes. Does MPI actually use them? To make sure, run with:
 mpirun -n 52 ...

If you don't have the CPUs for them and you increase the number of OpenMP threads, they run on the same processor, which slows down the calculation, as you have seen.

You can also split between MPI and OpenMP:
mpirun -n 26 -c 2 --bind-to none ...
(The options may have different names depending on the MPI distribution.)


Which splitting works best depends on your system and on the type of computation.
In general, more MPI processes require more memory, since each rank keeps its own copies of data, while OpenMP does not work across nodes.
There is more MPI parallelization in CP2K, so pure MPI should work well for most types of calculation.
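
As a concrete starting point, a hybrid submission could look like the sketch below. It assumes the loaded module also provides the hybrid cp2k.psmp binary and that srun is available; the 26x2 split mirrors the mpirun example above and is not a recommendation:

#!/bin/bash
#SBATCH --job-name=HSE06
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=26
#SBATCH --cpus-per-task=2
#SBATCH --partition=taras2-6230r

module purge
module load cp2k/may2025-gnu14.2.0-openmpi4.1.6-psm211.2.230

# One OpenMP thread per core allocated to each rank
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

# cp2k.psmp supports MPI+OpenMP; with the MPI-only cp2k.popt,
# extra threads typically just oversubscribe the cores.
srun --cpu-bind=cores cp2k.psmp -i DYE+SLAB.inp -o DYE+SLAB.out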