Dear Fernan,
Thanks for the new files.
OK, let me summarize:
1) The GNU+OpenMPI version of the code is terribly slow
2) The Intel (MPI + MKL) version hangs
Now, let's start with the first problem. I took your input file and ran it on my system with 24 MPI ranks and 1 OpenMP thread. I'm using GNU 5.3 + OpenMPI without any special optimizations or libraries. The first thing I noticed is that the initial values differ from those in your log.out_openmpi:
Yours:
TOTAL NUMBERS AND MAXIMUM NUMBERS
  Total number of            - Atomic kinds: 4
                             - Atoms: 950
                             - Shell sets: 1900
                             - Shells: 4094
                             - Primitive Cartesian functions: 4750
                             - Cartesian basis functions: 10348
                             - Spherical basis functions: 9726
  Maximum angular momentum of- Orbital basis functions: 2
                             - Local part of the GTH pseudopotential: 2
                             - Non-local part of the GTH pseudopotential: 2
Mine:
TOTAL NUMBERS AND MAXIMUM NUMBERS
  Total number of            - Atomic kinds: 3
                             - Atoms: 938
                             - Shell sets: 1876
                             - Shells: 3434
                             - Primitive Cartesian functions: 4690
                             - Cartesian basis functions: 7480
                             - Spherical basis functions: 7170
  Maximum angular momentum of- Orbital basis functions: 2
                             - Local part of the GTH pseudopotential: 2
                             - Non-local part of the GTH pseudopotential: 0
Looking at your log.out, I see that you are using CP2K 4.1 and the output looks different, but some values match mine, for instance:
Number of electrons: 2168
Number of occupied orbitals: 1084
Number of molecular orbitals: 1084
Number of orbital functions: 7170
Number of independent orbital functions: 7170
These values are very important for performance. In your GNU+OpenMPI run they are larger (for example, 9726 spherical basis functions against 7170 in mine), which means you should expect slower performance there.
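To compare two logs without reading them side by side, something like the following Python sketch could help. It assumes the counts appear on lines of the form "- label: number", as in the blocks quoted above; the function names parse_counts and diff_counts are just illustrative, and the sample strings are a subset of the values from this mail.

```python
import re

# Matches lines such as "- Atoms: 950" or "Total number of - Atomic kinds: 4".
# Assumption: every count of interest is written as "- <label>: <integer>".
COUNT_LINE = re.compile(r"-\s*([^:]+):\s*(\d+)\s*$")


def parse_counts(log_text):
    """Return {label: value} for every '- label: number' line in the text."""
    counts = {}
    for line in log_text.splitlines():
        match = COUNT_LINE.search(line)
        if match:
            counts[match.group(1).strip()] = int(match.group(2))
    return counts


def diff_counts(a, b):
    """Return {label: (value_in_a, value_in_b)} for labels whose values differ."""
    return {
        label: (a.get(label), b.get(label))
        for label in sorted(set(a) | set(b))
        if a.get(label) != b.get(label)
    }


# A few lines taken from the two outputs quoted in this mail.
YOURS = """Total number of - Atomic kinds: 4
- Atoms: 950
- Spherical basis functions: 9726
Maximum angular momentum of- Orbital basis functions: 2
- Non-local part of the GTH pseudopotential: 2"""

MINE = """Total number of - Atomic kinds: 3
- Atoms: 938
- Spherical basis functions: 7170
Maximum angular momentum of- Orbital basis functions: 2
- Non-local part of the GTH pseudopotential: 0"""

for label, (yours, mine) in diff_counts(parse_counts(YOURS), parse_counts(MINE)).items():
    print(f"{label}: yours={yours} mine={mine}")
```

For real logs, you would pass the file contents instead of the sample strings, e.g. parse_counts(open("log.out_openmpi").read()).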
At this point, I strongly suggest running the regtests to check your installation. Make sure
Then I suggest running a smaller test (for example tests/QS/benchmark/H2O-32.inp) with a single rank, so that you can do a quick comparison without MPI. If the result is reasonable, you can move to more ranks.
Another suggestion is to check how many cores are really involved during the execution (you can use htop).
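If you prefer a scripted check over watching htop, here is a rough Python sketch. It is Linux-only and reads /proc to see which CPU each thread of a given program last ran on. The helper names are illustrative, and the executable name passed to cpus_in_use (here "cp2k.popt") is an assumption, so use whatever your binary is called. It is a single snapshot, so run it a few times during the job.

```python
import os


def last_cpu(stat_line):
    """Return the CPU a task last ran on, given the text of a /proc stat file."""
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so split only what follows the last ')'. That remainder starts at field 3.
    rest = stat_line.rsplit(")", 1)[1].split()
    return int(rest[36])  # field 39 ("processor") is at index 39 - 3


def cpus_in_use(name):
    """Return the set of CPUs last used by threads of processes named `name`."""
    cpus = set()
    if not os.path.isdir("/proc"):
        return cpus  # not a Linux-style system, nothing to inspect
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/comm") as f:
                if f.read().strip() != name:
                    continue
            for tid in os.listdir(f"/proc/{pid}/task"):
                with open(f"/proc/{pid}/task/{tid}/stat") as f:
                    cpus.add(last_cpu(f.read()))
        except (OSError, IndexError, ValueError):
            continue  # the process or thread exited while being read
    return cpus


if __name__ == "__main__":
    # "cp2k.popt" is an assumed executable name; adjust it to your build.
    used = cpus_in_use("cp2k.popt")
    print(f"{len(used)} distinct CPUs: {sorted(used)}")
```

With 24 ranks and 1 thread each, you would hope to see close to 24 distinct CPUs; a much smaller number would suggest that the ranks are being pinned to the same cores.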
Alfio