cp2k hangs with large system


Fernan Saiz

Mar 19, 2018, 2:35:35 PM
to cp2k
Dear All,
I have been experiencing a problem with CP2K for large systems of around 1,000 atoms: the code prints no data at the beginning of the SCF cycle, as shown in the attached file log.out. I am using a CP2K version compiled with the Intel compilers 2017 on my institution's HPC systems. Strangely, CP2K sometimes runs fine with a different set of nuclear positions when all other parameters are left untouched. However, I need to restart my run (see ape-water.inp) to continue the NVT simulation for several ps. It also strikes me that I have never faced this problem with a version built with the Cray compilers on the UK's ARCHER supercomputer.

As an alternative, I have compiled CP2K with the gcc and openmpi modules (I received your help on this forum a few days ago for this compilation), but that build is much slower than the one built with the Intel compilers, which makes it unsuitable for my purposes. I have also played with the OpenMP vs. MPI load in the PBS file, but had no luck. I would therefore really appreciate some advice on how to correct this problem, if possible.

Best regards,
 - Fernan Saiz, PhD
Department of Chemistry
Imperial College London
ape-water.inp
log.out

Yingchun Zhang

Mar 20, 2018, 8:21:34 AM
to cp2k
Hi Fernan,

    1. How long have you been waiting for the output?
    2. The MAX_SCF values in the SCF and outer SCF sections are very large; you can change them to smaller values. This will lead to non-converged steps at the beginning, which is OK: it will converge later. With such a large MAX_SCF, the program keeps iterating until it either converges or reaches MAX_SCF iterations.
    3. You used 48 cores for such a large system, so it should be slow.
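
A minimal sketch of where these limits sit in a CP2K input, in case it helps. The nesting follows the usual FORCE_EVAL / DFT / SCF layout; the numbers are illustrative assumptions, not values taken from ape-water.inp, and should be adapted to the system:

```
&FORCE_EVAL
  &DFT
    &SCF
      MAX_SCF 30          ! inner SCF iterations per outer cycle (assumed value)
      &OUTER_SCF
        MAX_SCF 10        ! outer SCF cycles (assumed value)
      &END OUTER_SCF
    &END SCF
  &END DFT
&END FORCE_EVAL
```

With these example values, the SCF work per MD step is capped at roughly 30 x 10 iterations instead of running until convergence.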

Fernan Saiz

Mar 20, 2018, 9:59:15 AM
to cp...@googlegroups.com
Hi,
The problem is that, no matter how many processors I request or what MAX_SCF I set, no output is generated even if I wait for several hours.

- Fernan

--
You received this message because you are subscribed to a topic in the Google Groups "cp2k" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/cp2k/XzwfDfJtW08/unsubscribe.
To unsubscribe from this group and all its topics, send an email to cp2k+unsubscribe@googlegroups.com.
To post to this group, send email to cp...@googlegroups.com.
Visit this group at https://groups.google.com/group/cp2k.
For more options, visit https://groups.google.com/d/optout.

Nico Holmberg

Mar 20, 2018, 4:02:29 PM
to cp2k
Hi,

I don't have access to a working Intel binary at the moment but, as you said, there is nothing obviously wrong with the input file: it starts running without issues with my GNU binary. Has your Intel binary worked with other input files, and does it pass the regression test suite distributed with CP2K? Could you post the arch file that was used to compile the binary, including the full versions of all libraries and compilers?

To find where the program hangs, you could try activating the TRACE keyword, which prints the subroutines the calculation enters, in sequential order. The output file will be quite large.
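
For reference, a minimal sketch of switching this on. TRACE is a logical keyword of the &GLOBAL section; the other entries of &GLOBAL are assumed to stay exactly as they already are in the input:

```
&GLOBAL
  ! ... existing keywords (PROJECT, RUN_TYPE, ...) unchanged ...
  TRACE .TRUE.            ! log routine entry/exit as the run proceeds
&END GLOBAL
```

The last routine listed before the output stops is a reasonable first hint of where the run is stuck.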


- Nico




Wei Lai

Mar 20, 2018, 10:31:29 PM
to cp2k
Were you using Intel compilers with OpenMPI or MVAPICH?  I had the same issue.  Switching to Intel MPI solved the problem.

Wei

Alfio Lazzaro

Mar 21, 2018, 6:57:17 AM
to cp2k
Dear Fernan,
we are working on a list of supported compilers for CP2K (see https://www.cp2k.org/dev:compiler_support ).
Now, I'm confused about your findings:

1) "CRAY compilers on the UK's ARCHER supercomputer" ==> Do they really compile CP2K with the Cray CCE compiler? This is somewhat surprising to me, since we found that CCE is broken...

2) "I have compiled cp2k with gcc and openmpi modules, but my code is much slower than that built with Intel Compilers" ==> This is strange too. I see that you are not using libxsmm, i.e.

DBCSR| Multiplication driver                                               BLAS

Libxsmm will give much better performance. Where does BLAS come from? Is it MKL for both the GCC and Intel builds? Could you elaborate a bit more on this comparison?

3) Before any production run, it is a good idea to test the CP2K installation. Have you run the regtests? (https://www.cp2k.org/dev:regtesting )

Best regards,

Alfio

iain.b...@stfc.ac.uk

Mar 21, 2018, 7:43:59 AM
to cp...@googlegroups.com
1) "CRAY compilers on the UK's ARCHER supercomputer" ==> Do they really compile CP2K with the Cray CCE compiler? This is somehow surprising to me since we found that CCE is broken…

The ARCHER builds of CP2K use gfortran - compilation options are here: http://www.archer.ac.uk/documentation/software/cp2k/compiling_phase2.php

Cheers

- Iain

--

Iain Bethune
Technical Programme Manager, STFC Hartree Centre

Email: iain.b...@stfc.ac.uk
Twitter: @IainBethune @PrimeGrid @CP2Kproject

Tel: +44 (0)1925 603735
Mob: +44 (0)7598317015
Addr: Hartree Centre, Sci-Tech Daresbury, Keckwick, Warrington, WA4 4AD

Fernan Saiz

Mar 21, 2018, 9:22:54 AM
to cp...@googlegroups.com
Dear Alfio,
2) The log.out file was written by the build made with the Intel compilers (MPI and MKL). Please find attached a new file, log.out_openmpi, from the build with libxsmm, which shows ridiculously long times. When the Intel-compiled CP2K runs fine, these times are between 6 and 10 seconds, whereas with openmpi they are between 28 and 76 seconds, even if I increase the number of cores.

I have also attached the arch file I used for openmpi.

Best regards,

- Fernan

log.out_openmpi
Linux-x86-64-openmpi.popt

Alfio Lazzaro

Mar 21, 2018, 6:10:48 PM
to cp2k
Dear Fernan,
Thanks for the new files.
OK, let me summarize:
1) The GNU+OpenMPI version of the code is terribly slow
2) The Intel (MPI + MKL) version hangs

Now, let's start with the first problem. I took your input file and ran it on my system with 24 MPI ranks and 1 OpenMP thread, using GNU 5.3 + OpenMPI without any special optimizations or libraries. The first thing I noticed is that the initial values differ from those in log.out_openmpi:

Yours:
TOTAL NUMBERS AND MAXIMUM NUMBERS

  Total number of            - Atomic kinds:                                   4
                             - Atoms:                                        950
                             - Shell sets:                                  1900
                             - Shells:                                      4094
                             - Primitive Cartesian functions:               4750
                             - Cartesian basis functions:                  10348
                             - Spherical basis functions:                   9726

  Maximum angular momentum of- Orbital basis functions:                        2
                             - Local part of the GTH pseudopotential:          2
                             - Non-local part of the GTH pseudopotential:      2

Mine:
 TOTAL NUMBERS AND MAXIMUM NUMBERS

  Total number of            - Atomic kinds:                                   3
                             - Atoms:                                        938
                             - Shell sets:                                  1876
                             - Shells:                                      3434
                             - Primitive Cartesian functions:               4690
                             - Cartesian basis functions:                   7480
                             - Spherical basis functions:                   7170

  Maximum angular momentum of- Orbital basis functions:                        2
                             - Local part of the GTH pseudopotential:          2
                             - Non-local part of the GTH pseudopotential:      0

Checking your log.out, I see that you are using CP2K 4.1 and the output format is different, but some values are equal to mine, for instance:


Number of electrons:                                                       2168
 Number of occupied orbitals:                                               1084
 Number of molecular orbitals:                                              1084

 Number of orbital functions:                                               7170
 Number of independent orbital functions:                                   7170

These values are very important for performance. For the GNU+OpenMPI run they are larger, which means you can expect slower performance there.

At this point, I strongly suggest running the regtests to check your installation. Make sure
Then, I suggest you run a smaller test (you can take a test such as tests/QS/benchmark/H2O-32.inp) with a single rank, so that you can do a quick comparison without MPI. If the result is reasonable, you can move on to more ranks.
Another suggestion is to check how many cores are actually in use during the execution (you can use htop).

Alfio



