AIMD simulation: increase in the number of SCF cycles per MD step after some MD steps

Max

Sep 10, 2014, 9:44:48 AM

to cp...@googlegroups.com

Dear CP2K users and developers,

I would like to run an MD simulation of a transition metal complex in water,
the complex being in an open-shell state. However, I am facing an issue that I
cannot resolve: the time taken by an MD step increases dramatically after some
steps because the number of SCF cycles increases (see Figure_1.eps).

I first thought that this was due to the order chosen for the wavefunction extrapolation
(ASPC 3), but the situation does not change when the order is varied (1 to 4).

I usually use rather conservative values for EPS_DEFAULT and CUTOFF
(1e-14 and 1000 Ry), and I was wondering whether this could lead to a numerical instability
in the present case. However, increasing EPS_DEFAULT to 1e-12 does not improve
the situation, and neither does using a smaller CUTOFF value of 500 Ry.
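
For reference, the keywords discussed above live in the following input sections.
This is a trimmed sketch using only the values quoted in this message, not a copy
of the attached aimd.inp; every other keyword is omitted.

```
&FORCE_EVAL
  &DFT
    &QS
      EPS_DEFAULT 1.0E-14        ! also tried: 1.0E-12
      EXTRAPOLATION ASPC
      EXTRAPOLATION_ORDER 3      ! orders 1 to 4 were tried
    &END QS
    &MGRID
      CUTOFF 1000                ! in Ry (the default unit); 500 was also tried
    &END MGRID
  &END DFT
&END FORCE_EVAL
```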

Please note that, when the simulation is restarted, the same phenomenon is observed.
For instance, in Figure_2.eps, the simulation was restarted at ~1070 fs.

I made several attempts to find which mistake(s) I could have made (CG vs DIIS
minimizer; EPS_SCF value of 1e-6 or 5e-7; CSVR versus NHC thermostatting; TIMESTEP
of 0.25 or 0.50 fs; PRECONDITIONER and ENERGY_GAP ...), but nothing came out of
this.

Perhaps there is a trivial error that I am missing; if so, please let me know. I have
attached the input file to this message.

Thanks in advance for your help.

Best regards,
Max

P.S. For Figure_1.eps, the simulation was performed using the attached input file. For
Figure_2.eps, the following changes were made: "EPS_DEFAULT 1.0E-12",
"EXTRAPOLATION_ORDER 1", "PRECONDITIONER FULL_SINGLE_INVERSE".

Figure_1.eps

Figure_2.eps

aimd.inp

Max

Sep 26, 2014, 10:13:42 AM

to cp...@googlegroups.com

Hi,

I've run additional tests to try to figure out what may be going wrong. It seems that,
for the open-shell systems I'm interested in, the performance of the chosen preconditioner
is critical. Here is what I understand: the system is in a region of its phase space where,
after some MD steps, the guess wavefunction obtained by the ASPC method starts to
depart from the BO surface, and the computationally cheap preconditioners
(full_kinetic, full_single_inverse) do not help speed up the SCF convergence. Regularly
resetting the extrapolation by restarting the simulation is a workaround. But,
besides being rather tedious, I don't think there is an ideal fixed number of
steps after which to restart the simulation: the issue always tends to show up again later.

With the full_all preconditioner, the simulation runs rather smoothly! As expected, the
conserved quantity tends to be well conserved over 1.5+ ps, with a
timestep of 0.5 fs or even 1 fs. The big problem now is that an MD step takes
80 s on average, with ~80% of that time spent on preconditioning. Is there a way
to decrease the time spent on preconditioning? I am currently using cp2k-2.5.1, and the
mixed_precision algorithm cannot be combined with the full_all preconditioner.
Is there another way to do this? Have there been changes to the preconditioning in the
development version?

Best regards,
Max

Florian Schiffmann

Sep 29, 2014, 3:16:06 AM

to cp...@googlegroups.com

Hi Max,

I am currently looking a bit into the performance of the preconditioners. Unfortunately, the better preconditioners are expensive as they require diagonalization, and I don't see a way around it at the moment. There are some ideas around on how to improve FULL_SINGLE_INVERSE, but that will take a little while and nothing is promised.
FULL_KINETIC is the best choice in terms of balancing cost and efficiency for large systems. The performance of this preconditioner strongly depends on the ENERGY_GAP value in your input. Have you tried playing around with it? Quite often a value somewhat higher than the default seems to work best.
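
To be concrete, assuming the OT method is used (the minimizer and preconditioner keywords mentioned earlier in the thread suggest so), the gap is set next to the preconditioner in the &OT section. The value below is only a placeholder to start scanning from, not a recommendation; all other keywords are omitted.

```
&FORCE_EVAL
  &DFT
    &SCF
      &OT
        PRECONDITIONER FULL_KINETIC
        ENERGY_GAP 0.5             ! in Hartree; illustrative placeholder, tune per system
      &END OT
    &END SCF
  &END DFT
&END FORCE_EVAL
```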

Cheers
Flo

Max

Sep 29, 2014, 10:47:22 AM

to cp...@googlegroups.com

On Monday, September 29, 2014 at 09:16:06 UTC+2, Florian Schiffmann wrote:

> Hi Max,

Hi Flo,

Thanks a lot for your kind reply!

> I am currently looking a bit into the performance of the preconditioners.

This is great news :)

> Unfortunately, the better preconditioners are expensive as they require diagonalization, and I don't see a way around it at the moment.

OK.

> There are some ideas around on how to improve FULL_SINGLE_INVERSE, but that will take a little while and nothing is promised.

OK, this is fine with me. :-) Please do not hesitate to tell me if you would like me to run some tests.

> FULL_KINETIC is the best choice in terms of balancing cost and efficiency for large systems. The performance of this preconditioner strongly depends on the ENERGY_GAP value in your input. Have you tried playing around with it? Quite often a value somewhat higher than the default seems to work best.

I haven't yet played with the value of ENERGY_GAP when using FULL_KINETIC. I'll give it a try and
keep you informed.

Thanks again for your reply and your help.

Cheers,
Max

> Cheers
> Flo

Max

Oct 30, 2014, 6:24:47 AM

to cp...@googlegroups.com

Hi Flo,

I've run dozens of additional tests with the FULL_KINETIC preconditioner,
progressively increasing the value of ENERGY_GAP up to very large
values (such as 5 Ha). In the best case, this only delays the occurrence
of the convergence issue.

So, for the open-shell system studied, the FULL_ALL preconditioner remains
the best choice despite its computational cost. I was using an MPI+OMP
build of CP2K-2.5.1; switching to the pure MPI build cuts the time
taken by the first SCF cycle by a factor of 3 to 4, which more than compensates
for the increase in the time taken by the subsequent cycles.

Linking the code to the ELPA2-2013.11.008 library reduces the time taken by
the first SCF cycle by a further, significant 10-14%! Unfortunately,
the MD simulation gets killed by the OOM killer after a few hundred cycles.
Granting more memory to each process by decreasing the number of processes
per node delays the occurrence of the problem. This suggests a memory
leak but, not being familiar with the code, I cannot track it down. I read the
cp_fm_elpa routine in the file src/cp_fm_diag.F, and it seems to me that the
allocated memory is released at the end of the call, but I may
be wrong.

If you have time, could you please have a look at this and point me to some
directions to explore?

Please note that I compiled the ELPA2-2013.11.008 library with only the generic
Fortran kernels.

Cheers,
Max
