Why simulation with PBE functional is too slow?

huan...@mail.huji.ac.il

unread,

Oct 7, 2013, 8:56:34 AM10/7/13

to cp...@googlegroups.com

Dear All,

I am now using CP2K to run MD simulations on the benzen-Zundel cation with BLYP and PBE functional, respectively. I use the same number of CPU, but for PBE functional, the speed is 3 times slower than BLYP. In addition, the calculation with PBE functional takes nearly whole the memory, shown below.

For BLYP functional-- Mem: 64380M total, 16753M used, 47626M free

For PBE functional-- Mem: 64380M total, 63976M used, 403M free

Is that common for PBE functional?

Actually, I just copied the BLYP input file, and then changed the functional into PBE and added long range correction, C9 term. The rest of options, parameters, coordinates and basis sets are the same with BLYP input file.

I pasted my input file below. Can anyone give me some suggestion? I appreciate it very much.

Best wishes,
Huan

======== INPUT FILE ===========

&GLOBAL
PREFERRED_FFT_LIBRARY FFTW #FFTSG # FFTESSL
# FFT_POOL_SCRATCH_LIMIT 10
# PROGRAM Quickstep
PROJECT nve_50K # #
RUN_TYPE MD # # MD ## GEO_OPT ## ENERGY ----------
PRINT_LEVEL LOW
# SEED -6
&END GLOBAL
&FORCE_EVAL
&DFT
   CHARGE 1
   # UKS # ROKS with self-interaction correction (SIC)
   UKS
   &SIC
     SIC_METHOD MAURI_SPZ
     SIC_SCALING_A 0.2
     SIC_SCALING_B 0.0
   &END
   # non-periodic Poisson (cluster boundary condition)
   &POISSON
     POISSON_SOLVER MT
     &MT
       ALPHA 7.0
       REL_CUTOFF 2.0
     &END MT
     PERIODIC NONE
   &END POISSON
   BASIS_SET_FILE_NAME GTH_BASIS_SETS
   POTENTIAL_FILE_NAME GTH_POTENTIALS
   &MGRID
     CUTOFF 300
   &END MGRID
   &PRINT
     &MOMENTS ON
         COMMON_ITERATION_LEVELS 20000
         FILENAME ./MOMENTS
         ADD_LAST NUMERIC
         PERIODIC FALSE
         REFERENCE COAC
          &EACH 1
          &END
      &END MOMENTS
      &LOCALIZATION ON
           &TOTAL_DIPOLE
                COMMON_ITERATION_LEVELS 20000
                FILENAME ./TOTAL_DIPOLE.dat
                ADD_LAST NUMERIC
                PERIODIC   FALSE
                REFERENCE COAC
                &EACH 1
                &END
           &END TOTAL_DIPOLE
#           &MOLECULAR_DIPOLES
#                COMMON_ITERATION_LEVELS 20000
#                FILENAME ./MOLECULAR_DIPOLES.dat
#                ADD_LAST NUMERIC
#                PERIODIC   FALSE
#                REFERENCE COAC
#                &EACH 1
#                &END
#           &END MOLECULAR_DIPOLES
       &END LOCALIZATION
#      &MO_CUBES
#       NLUMO=1
#       NHOMO=1
#      &END MO_CUBES
#      &MO
#       EIGENVALUES T
#       OCCUPATION_NUMBERS T
#       COMMON_ITERATION_LEVELS 0
#       EACH 1
#      &END MO
#     &E_DENSITY_CUBE
#     &END E_DENSITY_CUBE
     &MULLIKEN
     &END MULLIKEN
     &LOWDIN
     &END LOWDIN
   &END PRINT
   &QS
     EPS_DEFAULT 1.0E-12
     EXTRAPOLATION PS
     EXTRAPOLATION_ORDER 3
   &END QS
   &LOCALIZE T
     EPS_LOCALIZATION 1.0E-4
     EPS_OCCUPATION 1.E-6
     OPERATOR BERRY
     METHOD    JACOBI
     MAX_ITER 2000
     MAX_CRAZY_ANGLE 0.05
   &END LOCALIZE
   &SCF
     MAX_SCF 150
     SCF_GUESS ATOMIC # ATOMIC # RESTART ---cont use RESTART
     &MIXING
     &END MIXING
   &OT
   MINIMIZER DIIS
   #ROTATION TRUE #needed for ROKS+SIC!!
   &END OT
   &OUTER_SCF
     MAX_SCF 20
     EPS_SCF 1.0E-5
   &END OUTER_SCF
   &END SCF
   &XC
     &XC_FUNCTIONAL PBE #specified the functional
     &END XC_FUNCTIONAL
     &vdw_POTENTIAL
      DISPERSION_FUNCTIONAL PAIR_POTENTIAL
      &PAIR_POTENTIAL
       TYPE DFTD3
       REFERENCE_FUNCTIONAL PBE
       CALCULATE_C9_TERM .TRUE.
       REFERENCE_C9_TERM .TRUE.
       LONG_RANGE_CORRECTION .TRUE.
       PARAMETER_FILE_NAME dftd3.dat
       R_CUTOFF 15.0
      &END PAIR_POTENTIAL
     &END vdW_POTENTIAL
     &XC_GRID
       XC_DERIV SPLINE2_SMOOTH
       XC_SMOOTH_RHO NN10
     &END XC_GRID
   &END XC
&END DFT
&SUBSYS
   &CELL
     PERIODIC NONE
     ABC 20.000 20.000 20.000
   &END CELL
   &COORD
C
C
C
C
C
C
H
H
H
H
H
H
H
H
O
H
O
H
H
Ar
&END COORD
   &KIND H
     BASIS_SET aug-TZVP-GTH
     POTENTIAL GTH-PBE-q1
   &END KIND
   &KIND C
     BASIS_SET aug-TZVP-GTH
     POTENTIAL GTH-PBE-q4
   &END KIND
   &KIND O
     BASIS_SET aug-TZVP-GTH
     POTENTIAL GTH-PBE-q6
   &END KIND
   &KIND Ar
     BASIS_SET DZVP-GTH
     POTENTIAL GTH-PBE-q8
   &END KIND
&END SUBSYS
&END FORCE_EVAL
&MOTION
&MD
   ENSEMBLE NVE
   STEPS 50000 #10ps total
   TIMESTEP 0.2 # 0.2fs
   TEMPERATURE 50.0
   #TEMP_TOL 40.0
#   &THERMOSTAT
#       REGION MOLECULE
#       &NOSE
#                LENGTH 3
#                YOSHIDA 3
#                TIMECON 50.0
#                MTS 2
#        &END NOSE
#   &END THERMOSTAT
&END MD
&PRINT
        &RESTART
          LOG_PRINT_KEY T
          &EACH
           MD 100
          &END EACH
          ADD_LAST NUMERIC
        &END RESTART
        &TRAJECTORY
          LOG_PRINT_KEY T
          FORMAT XYZ
          UNIT angstrom
          &EACH
             MD 1
          &END EACH
          ADD_LAST NUMERIC
        &END TRAJECTORY
        &VELOCITIES
          LOG_PRINT_KEY T
          FORMAT XYZ
          UNIT angstrom
          &EACH
            MD 1
          &END EACH
          ADD_LAST NUMERIC
        &END VELOCITIES
    &END PRINT
&END MOTION
================================

Matt W

unread,

Oct 7, 2013, 1:43:31 PM10/7/13

to cp...@googlegroups.com

Hi Huan,

it is not normal for one GGA functional to be massively more expensive than another - typically PBE is one of the most robust in fact.

You have a few unusual keywords in your input, so it is hard to comment without seeing the output:

Is each SCF cycle slower with PBE, or do you need many more SCF cycles?

How are you estimating memory usage? If you are monitoring it with, say, "top" linux command, is the memory usage consistent or does it peak in some section that you can identify (localization maybe?)?

Also, and maybe most importantly, if you strip your input down to a very simple calculation do you see these differences? You need to debug your results like you would some computer code your wrote. And without a sample of the output it is hard to help.

Matt

Huan Wang

unread,

Oct 7, 2013, 2:14:26 PM10/7/13

to cp2k

Hi Matt,

I received a message that I can not attach files larger than 8 Mb. As a result, my output file may not be send to you. Sorry about that.

Huan

--
You received this message because you are subscribed to a topic in the Google Groups "cp2k" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/cp2k/ivi2a2ScZwY/unsubscribe.
To unsubscribe from this group and all its topics, send an email to cp2k+uns...@googlegroups.com.
To post to this group, send email to cp...@googlegroups.com.
Visit this group at http://groups.google.com/group/cp2k.
For more options, visit https://groups.google.com/groups/opt_out.

Matt W

unread,

Oct 7, 2013, 2:27:40 PM10/7/13

to cp...@googlegroups.com

Sorry, please don't try to post everything!!!!

But what happens during a couple of SCF cycles, during localization, of a single MD step.

Matt

Huan Wang

unread,

Oct 7, 2013, 4:52:07 PM10/7/13

to cp2k

Dear Matt,

I am sorry for late response.

Is the information below matched what you said?

Huan

--------------------

SCF PARAMETERS Density guess: ATOMIC

--------------------------------------------------------

max_scf: 150

max_scf_history: 0

max_diis: 4

--------------------------------------------------------

eps_scf: 1.00E-05

eps_scf_history: 0.00E+00

eps_diis: 1.00E-01

eps_eigval: 1.00E-05

--------------------------------------------------------

level_shift [a.u.]: 0.00

--------------------------------------------------------

Outer loop SCF in use

No variables optimised in outer loop

eps_scf 1.00E-05

max_scf 20

No outer loop optimization

step_size 5.00E-01

SCF WAVEFUNCTION OPTIMIZATION

----------------------------------- OT ---------------------------------------

Allowing for rotations: F

Optimizing orbital energies: F

Minimizer : DIIS : direct inversion

in the iterative subspace

using : - 7 DIIS vectors

- safer DIIS on

Preconditioner : FULL_KINETIC : cholesky inversion of T + eS

Precond_solver : DEFAULT

stepsize : 0.15000000

energy_gap : 0.20000000

eps_taylor : 0.10000E-15

max_taylor : 4

mixed_precision : F

*** SCF run converged in 54 steps ***

Electronic density on regular grids: -54.0000000100 -0.0000000100

Core density on regular grids: 54.9999999999 -0.0000000001

Total charge density on r-space grids: 0.9999999898

Total charge density g-space grids: 0.9999999898

Overlap energy of the core charge distribution: 0.00000266573281

Self energy of the core charge distribution: -224.19565117090505

Core Hamiltonian energy: 64.65904182780211

Hartree energy: 90.68411689043555

Exchange-correlation energy: -24.59932151750673

Dispersion energy: -0.01041102029456

Total energy: -93.46222232473586

outer SCF iter = 1 RMS gradient = 0.96E-05 energy = -93.4622223247

outer SCF loop converged in 1 iterations or 54 steps

Integrated absolute spin density : 0.0000000001

Ideal and single determinant S**2 : 0.000000 -0.000000

Huan Wang

unread,

Dec 10, 2013, 8:25:13 AM12/10/13

to cp2k

Dear Matt,

Sorry for taking too long time for response. I tried to submit my job to a new node with Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz. And for PBE functional, the CPUTime is still 4 times longer than that of using BLYP. In addition, the calculation with PBE functional always became slower after several steps, no matter which step-points I restarted it, while for BLYP, the speed is stable.

Attached please see my input and output files. In output file with PBE functional, in the first 27-step, convergence only takes less than 10 steps, however, after 27 step, the OT section shows method of SD not DIIS, and returns *** SCF run NOT converged *** for several times. After that, calculation takes about 50 steps for convergence.

I don't know what happened when calculation comes to the step 27. Last time you mentioned that there are some usual keywords in my input file. What are they? Would you please help me to fix this problem? Thank you very much.

Best wishes,
Huan

On Mon, Oct 7, 2013 at 8:27 PM, Matt W <MattWa...@gmail.com> wrote:

nve.inp

nve_7.out

Matt W

unread,

Dec 11, 2013, 2:31:59 PM12/11/13

to cp...@googlegroups.com

Hi,

I can't comment on exactly what is happening, but you are having problems with SCF convergence. This is not the same as one functional running much slower than another, as such - i.e. it is a more or less physical problem not that the code doesn't / can't run the two funcitonals with a similar speed.

It appears that you get huge charge oscillations between the two centres in your system - try

> grep ' 5 C 1'

PBE must place HOMO/SOMO, or similar, of both systems at very similar energies. The slow down occurs when your system spin polarizes and you get a fluctuating biradical solution.

At this point you need to wonder why this is happening. Maybe hybrid functionals are needed?, maybe it is true that charge transfer between the two things can occur, but this is a difficult problem for DFT.

I don't know what happened when calculation comes to the step 27. Last time you mentioned that there are some usual keywords in my input file. What are they? Would you please help me to fix this problem? Thank you very much.

I seem to remember (this is going back some time so likely I am wrong) that you were using self interaction corrections - I have no experience with them. Perhaps what works for BLYP doesn't do good things to PBE.

Again, it is the SCF convergence and possible problems with your system/hamiltonian that gives the speed differential, not PBE being slower.

Matt

Huan Wang

unread,

Dec 12, 2013, 3:37:44 AM12/12/13

to cp2k

Hi Matt,

Thanks very much for your reply. Maybe I should try hybrid functional for my system in CP2K.

Please forgive my ignorance, but what's the meaning of > grep ' 5 C 1' ?

The reason I choose PBE functional is that I first optimized the geometry in Gaussian09 with different functionals, and the PBE functional gave a reasonable structure and spectrum compared to experimental spectrum, while BLYP, B3LYP, PW91 etc. could not get the same structure. However, In CP2K, my system run too slow with PBE functional, so I just try to use BLYP functional to check if it also low speed with different functionals. That comes to this question.

It's really a big challenge that a proton shared with two water molecules and have two pi-hydrogen bonds in one system. Anyway, thanks very much for discussion.