CP2K 7.1-Cuda Bandgap and HF energies different from previous versions


Andres Ortega

Apr 29, 2020, 10:47:18 AM
to cp...@googlegroups.com
Dear CP2K developers,

I was wondering if you could help me with something.

I have used CP2K 5.1 and 6.1 for PBE0-TC-LRC calculations in the past, and I have found that my band-gap values are consistent between versions 5.1 and 6.1 on our local cluster.

I then ran calculations with versions 6.1 and 7.1 with CUDA on Piz Daint, and found that my band gaps differ by around -0.25 eV and that the HF energies (and total energies) are now different.
It makes no difference whether I restart from a wfn file or converge the SCF cycle from scratch.

5.1 and 6.1 (1)

  Overlap energy of the core charge distribution:               0.00007052589464
  Self energy of the core charge distribution:              -4822.08795661199383
  Core Hamiltonian energy:                                   1456.10019696034419
  Hartree energy:                                            1951.78336227053728
  Exchange-correlation energy:                               -433.84963055662922
  Hartree-Fock Exchange energy:                              -127.87395151550244
  Dispersion energy:                                           -0.46932014114032

  ENERGY| Total FORCE_EVAL ( QS ) energy (a.u.):            -1976.397228950271256
  HOMO - LUMO gap [eV]:                                         2.910486

6.1 and 7.1 with CUDA (2)

  Overlap energy of the core charge distribution:               0.00007052589464
  Self energy of the core charge distribution:              -4822.08795661199383
  Core Hamiltonian energy:                                   1456.47181019598679
  Hartree energy:                                            1951.50352600623592
  Exchange-correlation energy:                               -438.90435624956706
  Hartree-Fock Exchange energy:                              -121.46009108451653
  Dispersion energy:                                           -0.46932014114033

  ENERGY| Total FORCE_EVAL ( QS ) energy (a.u.):            -1974.946317359100703
  HOMO - LUMO gap [eV]:                                         2.747721
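For reference, the reported components can be compared term by term; a small Python sketch (values copied from the two outputs above, the dictionary layout is mine, not CP2K output) shows that the discrepancy is concentrated in the XC and HFX terms:

```python
# Energy components (a.u.) copied from the two runs above.
cpu = {  # 5.1 / 6.1, no CUDA
    "overlap":       0.00007052589464,
    "core_self":  -4822.08795661199383,
    "core_ham":    1456.10019696034419,
    "hartree":     1951.78336227053728,
    "xc":           -433.84963055662922,
    "hfx":          -127.87395151550244,
    "dispersion":     -0.46932014114032,
}
cuda = {  # 6.1 / 7.1 with CUDA
    "overlap":       0.00007052589464,
    "core_self":  -4822.08795661199383,
    "core_ham":    1456.47181019598679,
    "hartree":     1951.50352600623592,
    "xc":           -438.90435624956706,
    "hfx":          -121.46009108451653,
    "dispersion":     -0.46932014114033,
}

# Print every component that differs between the two runs.
for term in cpu:
    diff = cuda[term] - cpu[term]
    if abs(diff) > 1e-9:
        print(f"{term:12s} differs by {diff:+.8f} Ha")

# The XC (~-5.05 Ha) and HFX (~+6.41 Ha) terms carry almost all of the
# ~1.45 Ha total-energy difference between the two runs.
```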


best, 

Andres Ortega
LSMO EPFL
pbe0.inp
test.subsys-pbe0
pbe0_5.1_6.1_values.out
pbe0_6.1_7.1_cuda.out
GEO_OPT.xyz

Andres Ortega

May 3, 2020, 6:36:31 AM
to cp2k
Dear CP2K developers, 

As a follow-up, I ran the calculations on Piz Daint without the GPU, using the CP2K version built without CUDA.
I was wondering if you could have a look at it.

My results are now consistent with my previous calculations.

CP2K 6.1 NO CUDA results

  Overlap energy of the core charge distribution:               0.00007052589464
  Self energy of the core charge distribution:              -4822.08795661199383
  Core Hamiltonian energy:                                   1456.09966160679664
  Hartree energy:                                            1951.78379257353936
  Exchange-correlation energy:                               -433.84955420146537
  Hartree-Fock Exchange energy:                              -127.87392272412202
  Dispersion energy:                                           -0.46932014114032

  Total energy:                                             -1976.39722897249089




I was wondering how I can resolve the discrepancies I found when using CP2K with GPU/CUDA.



best, 

Andres 

Fabian Ducry

May 4, 2020, 11:35:08 AM
to cp...@googlegroups.com
Dear Andres,

I can confirm and reproduce the issue. It appears when combining CUDA + OMP in hybrid calculations; in that case the energy becomes a function of the number of OMP threads per rank. For your input I got (CP2K 8.0, revision 3e7b916, run on Piz Daint):
                                     no CUDA                OMP_NUM_THREADS = 1    OMP_NUM_THREADS = 3    OMP_NUM_THREADS = 6
  Exchange-correlation energy:     -433.84964308969535    -433.84964308969302    -435.33426106395467    -435.96513615032325
  Hartree-Fock Exchange energy:    -127.87395928499694    -127.87395928499325    -125.97109874333140    -125.24809389970088
  Total energy:                   -1976.39722899739672   -1976.39722899739013   -1975.95046919253809   -1975.87080541858177

Without OMP parallelization the energies agree with the calculation without CUDA acceleration. Increasing OMP_NUM_THREADS beyond 1 increases the Hartree-Fock exchange energy.
Apparently you have to disable OMP to obtain correct results. This is obviously not very satisfying, and I hope it gets fixed. I see that you used 1 MPI rank / 12 OMP threads per node. Try increasing the number of MPI ranks per node; to do so you have to set
export CRAY_CUDA_MPS=1 in the submission script.
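A minimal sketch of such a submission script (a SLURM batch file; the job name, node count, and executable name are illustrative assumptions, only CRAY_CUDA_MPS=1 and the many-ranks/few-threads layout come from the thread):

```shell
#!/bin/bash -l
#SBATCH --job-name=pbe0        # illustrative name
#SBATCH --nodes=4              # illustrative node count
#SBATCH --ntasks-per-node=12   # many MPI ranks per node instead of OMP threads
#SBATCH --cpus-per-task=1      # OMP effectively disabled
#SBATCH --constraint=gpu

# Let the MPI ranks on a node share its single GPU via the CUDA MPS daemon.
export CRAY_CUDA_MPS=1
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun cp2k.psmp -i pbe0.inp -o pbe0.out
```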

I hope this helps.

Best,
Fabian

Leopold Talirz

May 4, 2020, 12:12:06 PM
to cp2k
Dear Fabian,

thanks a lot for checking and for pinning down the issue.

Since this is a rather serious issue, my first instinct was to check the performance page of CP2K to see whether CUDA + OMP was ever used in the benchmark studies.

Unfortunately, it is not clear to me from the page, something I now remember having run into before:
e.g. for some systems it says explicitly "no GPU", but for others that can have a GPU (like the Cray XC40) it does not say so, and it is not clear whether this means the GPU was used or not.
May I suggest that the maintainer of this page make this information explicit?

And if it turns out that there are currently no tests including the CUDA version on the list, perhaps it would make sense to include some?

Best wishes from Bern,
Leopold

Fabian Ducry

May 4, 2020, 12:23:50 PM
to cp2k
Dear Leopold,

This page gives details about the configuration of the HPC clusters used in the tests: https://www.cp2k.org/performance:systems
But these tests only compare the performance and do not check the accuracy, as far as I know. The dashboard (https://dashboard.cp2k.org/) states that the current revision runs correctly on Piz Daint (psmp), at least according to the regtests. It's possible that this aspect is not covered by any regtest, though.

Best,
Fabian

Ole Schütt

May 4, 2020, 12:26:40 PM
to cp...@googlegroups.com
Hi Leopold,

I agree that this is a serious issue that should have been caught by our testing.
AFAIK, the performance tests on the dashboard do not check the computed results.

However, we do have a daily CUDA regtest:
https://dashboard.cp2k.org/archive/cuda-pascal/index.html
It uses two OpenMP threads, which might not be enough to exceed the tolerance thresholds?

-Ole

Thomas Kühne

May 4, 2020, 12:27:47 PM
to cp...@googlegroups.com
Dear Leopold, 

to the best of my knowledge, all other machines except for Piz Daint, where the "no GPU" comment is present, are not equipped with GPUs, so everything is consistent.

Cheers, 
Thomas


Leopold Talirz

May 4, 2020, 3:09:50 PM
to cp2k
Thanks, Fabian and Thomas for the clarifications!

> to the best of my knowledge all other machines except for Piz Daint, where the "no
> GPU" comment is present, are not equipped with GPUs, so everything is consistent.

OK, hearing you say this it seems obvious, also since only CPU core counts are reported.
I still feel it would be useful to mention at the beginning that the tests are run on CPU cores only, since this may not be clear to everyone.

And as I mentioned, given that there is the CUDA version with significant speedups for hybrids & post-HF, perhaps it would make sense to include in the list a few benchmark results with the GPU as well?

Best,
Leo
 


Fabian Ducry

May 4, 2020, 3:24:03 PM
to cp2k
So far I could not reproduce the behaviour with any other system than the one provided by Andres. I tried
 - QS/regtest-admm-1/CH3-BP-NONE.inp
 - QS/regtest-hybrid-4/CH4-PBE0_TC_LRC.inp
 - the Si-8 example with 2 Si replaced by Al
I took the subsys sections from these examples and replaced the one in the input from Andres. All of them behave as they should with CUDA + OMP.

So it seems the problem is difficult to trigger and maybe does not occur very often?
That is probably why it does not show up in the regtests.

Best,
Fabian

Vladimir Rybkin

May 10, 2020, 4:46:09 PM
to cp2k
Dear all,

I have recently encountered a similar problem on Piz Daint. A hybrid ADMM calculation with GPU acceleration converged to a different value with OMP than without, and with OMP it often does not converge at all. And this happens only with the CG SCF optimiser; DIIS is fine.

Yours,

Vladimir

On Wednesday, April 29, 2020 at 4:47:18 PM UTC+2, Andres Ortega wrote:

Fabian Ducry

May 11, 2020, 2:16:31 AM
to cp2k
Dear Vladimir,

Depending on the ADMM settings it could still work:
  • METHOD BASIS_PROJECTION + ADMM_PURIFICATION_METHOD CAUCHY
  • METHOD BASIS_PROJECTION + ADMM_PURIFICATION_METHOD NONE_DM
As a quick fix you could try to use these.
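In CP2K input terms, the first combination would look something like this (a sketch of the relevant section only; where it sits in your &DFT section and the rest of the input are taken from your existing setup, not from this thread):

```
&DFT
  ! ... existing settings ...
  &AUXILIARY_DENSITY_MATRIX_METHOD
    ! quick-fix combination reported to still work with CUDA + OMP
    METHOD BASIS_PROJECTION
    ADMM_PURIFICATION_METHOD CAUCHY   ! or: NONE_DM
  &END AUXILIARY_DENSITY_MATRIX_METHOD
  ! ... existing settings ...
&END DFT
```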

Best,
Fabian

Vladimir Rybkin

May 11, 2020, 3:38:07 PM
to cp2k
Dear Fabian,

thanks. I'll try it out.

Yours,

Vladimir

On Monday, May 11, 2020 at 8:16:31 AM UTC+2, Fabian Ducry wrote: