Dear CP2K developers and users,
I am testing a stress tensor calculation example with my newly compiled CP2K-2025.1 on Platinum 9282 CPUs and have run into a problem:
With the binary compiled with the options "--with-gcc=install --with-intel=no --with-mpich=install --with-mkl=system", MPI throws an error and the calculation cannot proceed when multiple k-points (say, 2*1*1) are sampled, but it terminates normally when only the Gamma point is used. The version compiled with "--with-gcc=install --with-intel=no --with-mpich=install --with-openblas=install" runs the same multi-k-point calculation without error (though much slower than the apptainer image), so I suspect that something is wrong in the version compiled with MKL. Given the high performance of MKL, I wonder whether there is a workaround.
The input file looks like this:
'''
&GLOBAL
PROJECT StressTensor
PRINT_LEVEL LOW
RUN_TYPE GEO_OPT
&END GLOBAL
&FORCE_EVAL
METHOD Quickstep
STRESS_TENSOR ANALYTICAL
&SUBSYS
&CELL
A
B
C
PERIODIC XYZ #Direction(s) of applied PBC (geometry aspect)
&END CELL
&COORD
&END COORD
&KIND Cu
ELEMENT Cu
BASIS_SET DZVP-MOLOPT-SR-GTH-q11
POTENTIAL GTH-PBE
&END KIND
&KIND H
ELEMENT H
BASIS_SET TZVP-MOLOPT-GTH-q1
POTENTIAL GTH-PBE
&END KIND
&KIND O
ELEMENT O
BASIS_SET TZVP-MOLOPT-GTH-q6
POTENTIAL GTH-PBE
&END KIND
&END SUBSYS
&DFT
BASIS_SET_FILE_NAME BASIS_MOLOPT
POTENTIAL_FILE_NAME POTENTIAL
CHARGE 0 #Net charge
MULTIPLICITY 1 #Spin multiplicity
&KPOINTS
SCHEME MONKHORST-PACK 2 1 1
&END KPOINTS
&QS
EPS_DEFAULT 1.0E-12 #Set all EPS_xxx to values such that the energy will be correct up to this value
&END QS
&POISSON
PERIODIC XYZ #Direction(s) of PBC for calculating electrostatics
PSOLVER PERIODIC #The way to solve Poisson equation
&END POISSON
&XC
&XC_FUNCTIONAL PBE
&END XC_FUNCTIONAL
&VDW_POTENTIAL
POTENTIAL_TYPE PAIR_POTENTIAL
&PAIR_POTENTIAL
PARAMETER_FILE_NAME dftd3.dat
TYPE DFTD3(BJ)
REFERENCE_FUNCTIONAL PBE
CALCULATE_C9_TERM T #Calculate C9-related three-body term, more accurate for large system
&END PAIR_POTENTIAL
&END VDW_POTENTIAL
&END XC
&MGRID
CUTOFF 480
REL_CUTOFF 60
NGRIDS 5 #The number of multigrids to use. 5 is optimal for MOLOPT-GTH basis sets
&END MGRID
&SCF
MAX_SCF 256
EPS_SCF 1.0E-06 #Convergence threshold of density matrix of inner SCF
&DIAGONALIZATION
ALGORITHM STANDARD #Algorithm for diagonalization
&END DIAGONALIZATION
&MIXING #How to mix old and new density matrices
METHOD BROYDEN_MIXING #PULAY_MIXING is also a good alternative
ALPHA 0.4 #Default. Mixing 40% of new density matrix with the old one
NBROYDEN 8 #Default is 4. Number of previous steps stored for the actual mixing scheme
&END MIXING
&SMEAR
METHOD FERMI_DIRAC
ELECTRONIC_TEMPERATURE 30 #Electronic temperature of Fermi-Dirac smearing in K
&END SMEAR
ADDED_MOS 504 #Number of virtual MOs to solve
&PRINT
&RESTART #Note: Use "&RESTART OFF" can prevent generating .wfn file
BACKUP_COPIES 0 #Maximum number of backup copies of wfn file. 0 means never
&END RESTART
&END PRINT
&END SCF
&END DFT
&PRINT
&STRESS_TENSOR
&END STRESS_TENSOR
&END PRINT
&END FORCE_EVAL
&MOTION
&GEO_OPT
TYPE MINIMIZATION #Search for minimum
KEEP_SPACE_GROUP F #If T, then space group will be detected and preserved
OPTIMIZER CG #Can also be BFGS or LBFGS
&CG
&LINE_SEARCH
TYPE 2PNT #Two-point extrapolation, cheap while acceptable. Can also be FIT, GOLD
&END LINE_SEARCH
&END CG
MAX_ITER 0 #Maximum number of geometry optimization
MAX_DR 3E-3 #Maximum geometry change
RMS_DR 1.5E-3 #RMS geometry change
MAX_FORCE 4.5E-4 #Maximum force
RMS_FORCE 3E-4 #RMS force
&END GEO_OPT
&CONSTRAINT
&FIXED_ATOMS #Set atoms to be fixed
COMPONENTS_TO_FIX XYZ #Which fractional components will be fixed, can be X, Y, Z, XY, XZ, YZ, XYZ
LIST 5 6 11 12 17 18 23 24 29 30 35 36 41 42 47 48 53 54 59 60 65 66 71 72 77 78 83 84 89 90 95 96 101 102 107 108 113 114 119 120 125 126 131 132 137 138 143 144 149 150 155 156 161 162 167 168 173 174 179 180 185 186 191 192 197 198 203 204 209 210 215 216 221 222 227 228 233 234 239 240 245 246 251 252 257 258 263 264 269 270 275 276 281 282 287 288 293 294 299 300 305 306 311 312 317 318 323 324 329 330 335 336 341 342 347 348 353 354 359 360 365 366 371 372 377 378 383 384 389 390 395 396 401 402 407 408 413 414 419 420 425 426 431 432
&END FIXED_ATOMS
&FIXED_ATOMS #Set atoms to be fixed
COMPONENTS_TO_FIX Z #Which fractional components will be fixed, can be X, Y, Z, XY, XZ, YZ, XYZ
LIST 721..1008
&END FIXED_ATOMS
&END CONSTRAINT
&PRINT
&TRAJECTORY
FORMAT xyz
&END TRAJECTORY
&RESTART
BACKUP_COPIES 0 #Maximum number of backing up restart file, 0 means never
&END RESTART
&END PRINT
&END MOTION
'''
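For comparison, the Gamma-only run that terminates normally differs from the input above only in the &KPOINTS section. A minimal sketch of that variant, assuming the Gamma point is requested explicitly with SCHEME GAMMA (an illustration of the comparison, not necessarily the exact lines used):
'''
&KPOINTS
SCHEME GAMMA #Gamma point only; the failing run uses MONKHORST-PACK 2 1 1 here
&END KPOINTS
'''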
By the way, on our cluster the binary compiled with openmpi-5.0.6 shows severe performance problems compared to those built with openmpi-4.1.5 or mpich-4.0.3, and so does the apptainer image, but this is less urgent :)
Any suggestions are greatly appreciated.