CP2K 7.1-Cuda Bandgap and HF energies different from previous versions


Andres Ortega

Apr 29, 2020, 10:47:18 AM
to cp...@googlegroups.com
Dear CP2K developers,

I was wondering if you could help me with something.

I have used CP2K 5.1 and 6.1 for PBE0-TC-LRC calculations in the past, and I have found that my band-gap values are consistent between versions 5.1 and 6.1 on our local cluster.

I then ran calculations with versions 6.1 and 7.1 with CUDA on Piz Daint, and found that my band gaps differ by around -0.25 eV and that the HF energies (and total energies) are now different.
It makes no difference whether I restart from a wfn file or converge the SCF cycle from scratch.

5.1 and 6.1 (1)

  Overlap energy of the core charge distribution:               0.00007052589464
  Self energy of the core charge distribution:              -4822.08795661199383
  Core Hamiltonian energy:                                   1456.10019696034419
  Hartree energy:                                            1951.78336227053728
  Exchange-correlation energy:                               -433.84963055662922
  Hartree-Fock Exchange energy:                              -127.87395151550244
  Dispersion energy:                                           -0.46932014114032

  ENERGY| Total FORCE_EVAL ( QS ) energy (a.u.):            -1976.397228950271256
  HOMO - LUMO gap [eV]:                                         2.910486

6.1 and 7.1 with CUDA (2)

  Overlap energy of the core charge distribution:               0.00007052589464
  Self energy of the core charge distribution:              -4822.08795661199383
  Core Hamiltonian energy:                                   1456.47181019598679
  Hartree energy:                                            1951.50352600623592
  Exchange-correlation energy:                               -438.90435624956706
  Hartree-Fock Exchange energy:                              -121.46009108451653
  Dispersion energy:                                           -0.46932014114033

  ENERGY| Total FORCE_EVAL ( QS ) energy (a.u.):            -1974.946317359100703
  HOMO - LUMO gap [eV]:                                         2.747721
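For reference, the reported components can be compared term by term; a small Python sketch (values copied from the two outputs above, the dictionary layout is mine, not CP2K output) shows that the discrepancy is concentrated in the XC and HFX terms:

```python
# Energy components (a.u.) copied from the two runs above.
cpu = {  # 5.1 / 6.1, no CUDA
    "overlap":       0.00007052589464,
    "core_self":  -4822.08795661199383,
    "core_ham":    1456.10019696034419,
    "hartree":     1951.78336227053728,
    "xc":           -433.84963055662922,
    "hfx":          -127.87395151550244,
    "dispersion":     -0.46932014114032,
}
cuda = {  # 6.1 / 7.1 with CUDA
    "overlap":       0.00007052589464,
    "core_self":  -4822.08795661199383,
    "core_ham":    1456.47181019598679,
    "hartree":     1951.50352600623592,
    "xc":           -438.90435624956706,
    "hfx":          -121.46009108451653,
    "dispersion":     -0.46932014114033,
}

# Print every component that differs between the two runs.
for term in cpu:
    diff = cuda[term] - cpu[term]
    if abs(diff) > 1e-9:
        print(f"{term:12s} differs by {diff:+.8f} Ha")

# The XC (~-5.05 Ha) and HFX (~+6.41 Ha) terms carry almost all of the
# ~1.45 Ha total-energy difference between the two runs.
```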


best, 

Andres Ortega
LSMO EPFL
pbe0.inp
test.subsys-pbe0
pbe0_5.1_6.1_values.out
pbe0_6.1_7.1_cuda.out
GEO_OPT.xyz

Andres Ortega

May 3, 2020, 6:36:31 AM
to cp2k
Dear CP2K developers, 

As a follow-up, I ran the calculations on Piz Daint without the GPU, using the CP2K version built without CUDA.
I was wondering if you could have a look at it.

My results are now consistent with my previous calculations.

CP2K 6.1 NO CUDA results

  Overlap energy of the core charge distribution:               0.00007052589464
  Self energy of the core charge distribution:              -4822.08795661199383
  Core Hamiltonian energy:                                   1456.09966160679664
  Hartree energy:                                            1951.78379257353936
  Exchange-correlation energy:                               -433.84955420146537
  Hartree-Fock Exchange energy:                              -127.87392272412202
  Dispersion energy:                                           -0.46932014114032

  Total energy:                                             -1976.39722897249089




I was wondering how I can resolve the discrepancies I found when using CP2K with GPU/CUDA.



best, 

Andres 

Fabian Ducry

May 4, 2020, 11:35:08 AM
to cp...@googlegroups.com
Dear Andres,

I can confirm and reproduce the issue. It appears when combining CUDA + OMP in hybrid calculations; in that case the energy becomes a function of the number of OMP threads per rank. For your input I got (CP2K 8.0, revision 3e7b916, run on Piz Daint):
                                     no CUDA                OMP_NUM_THREADS = 1    OMP_NUM_THREADS = 3    OMP_NUM_THREADS = 6
  Exchange-correlation energy:     -433.84964308969535    -433.84964308969302    -435.33426106395467    -435.96513615032325
  Hartree-Fock Exchange energy:    -127.87395928499694    -127.87395928499325    -125.97109874333140    -125.24809389970088
  Total energy:                   -1976.39722899739672   -1976.39722899739013   -1975.95046919253809   -1975.87080541858177

Without OMP parallelization the energies agree with the calculation without CUDA acceleration. Increasing OMP_NUM_THREADS beyond 1 increases the Hartree-Fock exchange energy.
Apparently you have to disable OMP to obtain correct results. This is obviously not very satisfying, and I hope it gets fixed. I see that you used 1 MPI rank / 12 OMP threads per node. Try increasing the number of MPI ranks per node; to do so you have to set
export CRAY_CUDA_MPS=1 in the submission script.
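A minimal sketch of such a submission script (a SLURM batch file; the job name, node count, and executable name are illustrative assumptions, only CRAY_CUDA_MPS=1 and the many-ranks/few-threads layout come from the thread):

```shell
#!/bin/bash -l
#SBATCH --job-name=pbe0        # illustrative name
#SBATCH --nodes=4              # illustrative node count
#SBATCH --ntasks-per-node=12   # many MPI ranks per node instead of OMP threads
#SBATCH --cpus-per-task=1      # OMP effectively disabled
#SBATCH --constraint=gpu

# Let the MPI ranks on a node share its single GPU via the CUDA MPS daemon.
export CRAY_CUDA_MPS=1
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun cp2k.psmp -i pbe0.inp -o pbe0.out
```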

I hope this helps.

Best,
Fabian

Leopold Talirz

May 4, 2020, 12:12:06 PM
to cp2k
Dear Fabian,

thanks a lot for checking and for pinning down the issue.

Since this is a rather serious issue, my first instinct was to check the performance page of CP2K to see whether CUDA + OMP was ever used in the benchmark studies.

Unfortunately, it is not clear to me from the page, something I now remember having run into before:
e.g. for some systems it says explicitly "no GPU", but for others that can have a GPU (like the Cray XC40) it does not say so, and it is not clear whether this means the GPU was used or not.
May I suggest that the maintainer of this page make this information explicit?

And if it turns out that there are currently no tests including the CUDA version on the list, perhaps it would make sense to include some?

Best wishes from Bern,
Leopold

Fabian Ducry

May 4, 2020, 12:23:50 PM
to cp2k
Dear Leopold,

This page gives details about the configuration of the HPC clusters used in the tests: https://www.cp2k.org/performance:systems
But these tests only compare the performance and do not check the accuracy, as far as I know. The dashboard (https://dashboard.cp2k.org/) states that the current revision runs correctly on Piz Daint (psmp), at least according to the regtests. It's possible that this aspect is not covered by any regtest, though.

Best,
Fabian

Ole Schütt

May 4, 2020, 12:26:40 PM
to cp...@googlegroups.com
Hi Leopold,

I agree that this is a serious issue that should have been caught by our testing.
AFAIK, the performance tests on the dashboard do not check the computed results.

However, we do have a daily CUDA regtest:
https://dashboard.cp2k.org/archive/cuda-pascal/index.html
It uses two OpenMP threads, which might not be enough to exceed the tolerance thresholds?

-Ole

Thomas Kühne

May 4, 2020, 12:27:47 PM
to cp...@googlegroups.com
Dear Leopold, 

to the best of my knowledge, all other machines except for Piz Daint, where the "no GPU" comment is present, are not equipped with GPUs, so everything is consistent.

Cheers, 
Thomas


Leopold Talirz

May 4, 2020, 3:09:50 PM
to cp2k
Thanks, Fabian and Thomas for the clarifications!

> to the best of my knowledge all other machines except for Piz Daint, where the "no
> GPU" comment is present, are not equipped with GPUs, so everything is consistent.

OK, hearing you say this it seems obvious, also since only CPU core counts are reported.
I still feel it would be useful to mention at the beginning that the tests are run on CPU cores only, since this may not be clear to everyone.

And as I mentioned, given that there is the CUDA version with significant speedups for hybrids & post-HF, perhaps it would make sense to include in the list a few benchmark results with the GPU as well?

Best,
Leo
 


Fabian Ducry

May 4, 2020, 3:24:03 PM
to cp2k
So far I could not reproduce the behaviour with any other system than the one provided by Andres. I tried
 - QS/regtest-admm-1/CH3-BP-NONE.inp
 - QS/regtest-hybrid-4/CH4-PBE0_TC_LRC.inp
 - the Si-8 example with 2 Si replaced by Al
I took the subsys sections from these examples and replaced the one in the input from Andres. All of them behave as they should with CUDA + OMP.

So it seems the problem is difficult to trigger and maybe does not occur very often?
That is probably why it does not show up in the regtests.

Best,
Fabian

Vladimir Rybkin

May 10, 2020, 4:46:09 PM
to cp2k
Dear all,

I have recently encountered a similar problem on Piz Daint. A hybrid ADMM calculation with GPU acceleration converged to a different value with OMP than without, and with OMP it often does not converge at all. And this happens only with the CG SCF optimiser; DIIS is fine.

Yours,

Vladimir

On Wednesday, April 29, 2020 at 4:47:18 PM UTC+2, Andres Ortega wrote:

Fabian Ducry

May 11, 2020, 2:16:31 AM
to cp2k
Dear Vladimir,

Depending on the ADMM settings it could still work:
  • METHOD BASIS_PROJECTION + ADMM_PURIFICATION_METHOD CAUCHY
  • METHOD BASIS_PROJECTION + ADMM_PURIFICATION_METHOD NONE_DM
As a quick fix you could try to use these.
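In CP2K input terms, the first combination would look something like this (a sketch of the relevant section only; where it sits in your &DFT section and the rest of the input are taken from your existing setup, not from this thread):

```
&DFT
  ! ... existing settings ...
  &AUXILIARY_DENSITY_MATRIX_METHOD
    ! quick-fix combination reported to still work with CUDA + OMP
    METHOD BASIS_PROJECTION
    ADMM_PURIFICATION_METHOD CAUCHY   ! or: NONE_DM
  &END AUXILIARY_DENSITY_MATRIX_METHOD
  ! ... existing settings ...
&END DFT
```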

Best,
Fabian

Vladimir Rybkin

May 11, 2020, 3:38:07 PM
to cp2k
Dear Fabian,

thanks. I'll try it out.

Yours,

Vladimir

On Monday, May 11, 2020 at 8:16:31 AM UTC+2, Fabian Ducry wrote: