Scaling with xTB


Mauro Sgroi

Feb 22, 2021, 4:59:47 AM
to cp2k
Dear all,
I'm testing the xTB code on a liquid electrolyte containing a Li ion.
I'm running MD on a cell containing 728 atoms.
The scaling with the number of cores is disappointing. These are the wall times for 100 MD steps:

Cores   Time (s)
   28   7818
   56   7364
   84   6529
  112   7493
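
For reference, the timings in the table above can be converted into time per MD step, speedup, and parallel efficiency relative to the 28-core run. This is only a minimal sketch: the function name `scaling_report` is made up for illustration, and the 28-core run is assumed to be a fair baseline.

```python
# Wall times for 100 MD steps, copied from the table above (cores -> seconds).
timings = {28: 7818, 56: 7364, 84: 6529, 112: 7493}


def scaling_report(timings, n_steps=100):
    """Return (cores, seconds per step, speedup, efficiency) per run.

    Speedup and efficiency are measured against the run with the fewest
    cores; this baseline choice is an assumption of the sketch.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        t = timings[cores]
        speedup = base_time / t
        # Ideal speedup equals the ratio of core counts.
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, t / n_steps, speedup, efficiency))
    return rows


print(f"{'cores':>5} {'s/step':>8} {'speedup':>8} {'effic.':>8}")
for cores, per_step, speedup, eff in scaling_report(timings):
    print(f"{cores:5d} {per_step:8.2f} {speedup:8.2f} {eff:8.2f}")
```

An efficiency well below 1 at 56 cores and above means the extra cores are mostly idle or waiting on communication.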

My input file can be downloaded here: 


Is this the expected behaviour, or is there something wrong with my compilation of the code or with the input file?
The HPC facility has an InfiniBand network and a fast shared filesystem.

Thanks a lot in advance and best regards,
Mauro Sgroi.



Mauro Sgroi

Feb 22, 2021, 5:04:05 AM
to cp2k
Dear all,
I forgot to mention that I'm using the toolchain:
cmake-3.18.5 cosma-2.2.0 elpa-2020.05.001 fftw-3.3.8 gsl-2.6 hdf5-1.12.0 libint-v2.6.0-cp2k-lmax-5 libvdwxc-0.4.0 libvori-201229 libxc-4.3.4 libxsmm-1.16.1 openblas-0.3.10 scalapack-2.1.0 sirius-7.0.0 SpFFT-0.9.13 spglib-1.16.0 

and gcc-8.3.0 + openmpi 4.1.0.

Best regards,
Mauro.

Manjusha Chugh

Feb 22, 2021, 5:10:11 AM
to cp...@googlegroups.com
Hi Mauro,

It would be easier to look into your scaling issue if you reported the time for each MD step. I am also running MD with xTB.

Regards
Manjusha



fabia...@gmail.com

Feb 22, 2021, 5:44:24 AM
to cp2k
Dear Mauro,

This is consistent with my own observations of the scaling of xTB. Because the cost of communication grows with the number of ranks, more CPUs don't necessarily speed up the simulation. I don't use more than 25 CPUs with xTB unless I have well above 1000 atoms.

Please note that the number of MPI ranks should be a square number. 25 CPUs are probably faster than 28 unless you are using k-points.
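
As a small illustration of the square-number advice, the sketch below picks the largest perfect square that fits into a given core count. The helper name `largest_square_rank_count` is hypothetical, and the result is only a starting point for benchmarking, not a guaranteed optimum.

```python
import math


def largest_square_rank_count(max_cores):
    """Return the largest perfect square that does not exceed max_cores."""
    if max_cores < 1:
        raise ValueError("max_cores must be at least 1")
    root = math.isqrt(max_cores)  # integer square root, rounded down
    return root * root


# Core counts from the original post (one to four 28-core nodes).
for cores in (28, 56, 84, 112):
    print(cores, "cores ->", largest_square_rank_count(cores), "MPI ranks")
```

With 28-core nodes this leaves a few cores unused per job (for example 25 ranks on one node).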

Cheers,
Fabian

Mauro Sgroi

Feb 22, 2021, 5:51:07 AM
to cp2k
Dear Manjusha,
If I understand correctly, the time per MD step is simply the time I reported divided by 100.

Dear Fabian,
Thanks a lot for the information. I will check the use of 25 cores instead of 28. I'm using 28 since our nodes have 28 cores each.

Is this scaling inherent to the xTB method, or is it linked to its current implementation in CP2K? Could the linear-scaling approach improve the situation?

Thanks a lot and best regards,
Mauro.

fabia...@gmail.com

Feb 22, 2021, 5:59:25 AM
to cp2k
Strong scaling always hits a ceiling. Since xTB is a relatively cheap method (compared to e.g. DFT) the number of CPUs one can effectively utilize is low. I suspect that this is inherent to xTB and has nothing to do with CP2K.
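
One generic way to picture this ceiling is Amdahl's law, which is brought in here purely as an illustration and was not cited in the thread. The sketch assumes a fixed non-parallel fraction of the run time; the value 0.05 is hypothetical and not measured for xTB or CP2K.

```python
def amdahl_speedup(n_cores, serial_fraction):
    """Ideal speedup on n_cores when serial_fraction of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)


serial_fraction = 0.05  # hypothetical value, chosen only for illustration

for n in (1, 25, 100, 400):
    print(f"{n:4d} cores: speedup {amdahl_speedup(n, serial_fraction):6.2f}")

# The speedup can never exceed 1 / serial_fraction, however many cores are used.
print("upper bound:", 1.0 / serial_fraction)
```

Amdahl's law ignores communication overhead that grows with the number of ranks, so real timings can become worse with more cores, as in the 112-core run reported above.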

You can try different SCF methods to reduce the cost, but I doubt that the scaling substantially changes.

Cheers,
Fabian

hut...@chem.uzh.ch

Feb 22, 2021, 6:00:40 AM
to cp...@googlegroups.com
Hi

without further information it is impossible to give good advice.
Most likely, the calculation of the Hamiltonian (xTB) is so fast
that the lack of scaling is due to other parts, e.g. the MD
extrapolation or the preconditioner setup.

Without inputs and outputs this is all guesswork.

regards

Juerg Hutter
--------------------------------------------------------------
Juerg Hutter Phone : ++41 44 635 4491
Institut für Chemie C FAX : ++41 44 635 6838
Universität Zürich E-mail: hut...@chem.uzh.ch
Winterthurerstrasse 190
CH-8057 Zürich, Switzerland
---------------------------------------------------------------


Manjusha Chugh

Feb 22, 2021, 6:04:06 AM
to cp...@googlegroups.com
Dear Mauro,

Yes, but I just meant that as the usual way of reporting timings in computations.

Anyway, as Fabian suggested, using a square number of cores helps.
For my system of more than 1000 atoms, the scaling with xTB is as follows:

Cores   Time per MD step
  100   4
  144   3

Regards
Manjusha

