comparison of psmp and popt (with and without openmp)

Ronald Cohen

Mar 23, 2016, 4:28:19 PM

to cp2k

So I finally got decent performance with gfortran, Open MPI, and OpenBLAS across InfiniBand. Now I find that using OpenMP with half the number of MPI processes seems to give better performance for the 64-molecule H2O test case. Is that reasonable? To make the popt version I recompiled everything, including BLAS, ScaLAPACK, etc., without -fopenmp.

I find, in seconds:

Nodes  MPI procs  Build  OMP_NUM_THREADS  Time (s)
1      16         psmp   1                834
1      16         popt   1                836
2      16         psmp   2                266
2      32         popt   1                430
4      64         popt   1                331
4      32         psmp   2                189
4      64         psmp   4                166

So you see there is no overhead from using psmp built with OpenMP when the thread count is set to 1.

Using OpenMP threads greatly improves performance compared with just increasing the number of MPI processes.

This may be because this machine has only 1 GB of memory per core, but even 4 threads is better than 2, so it seems OpenMP is more efficient than MPI here.
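To put numbers on that comparison, the speedups implied by the timings can be tabulated with a few lines of Python. This is only a sketch: the rows are copied from the timings above exactly as posted, while taking the 1-node psmp run (834 s) as the baseline and the helper names `speedup` and `best_per_node_count` are assumptions made for illustration.

```python
# Timings as posted: (nodes, MPI procs, build, OMP_NUM_THREADS, seconds)
RUNS = [
    (1, 16, "psmp", 1, 834),
    (1, 16, "popt", 1, 836),
    (2, 16, "psmp", 2, 266),
    (2, 32, "popt", 1, 430),
    (4, 64, "popt", 1, 331),
    (4, 32, "psmp", 2, 189),
    (4, 64, "psmp", 4, 166),
]

# Assumption: use the 1-node, 1-thread psmp run as the reference time.
BASELINE_S = 834


def speedup(seconds, baseline=BASELINE_S):
    """Speedup of a run relative to the baseline time."""
    return baseline / seconds


def best_per_node_count(runs):
    """Return {nodes: {build: fastest time in seconds}}."""
    best = {}
    for nodes, _procs, build, _threads, seconds in runs:
        per_build = best.setdefault(nodes, {})
        per_build[build] = min(seconds, per_build.get(build, seconds))
    return best


if __name__ == "__main__":
    for nodes, procs, build, threads, seconds in RUNS:
        print(f"{nodes} node(s) {procs:3d} procs {build} x{threads}: "
              f"{seconds:4d} s, speedup {speedup(seconds):.2f}")
    # Compare the fastest popt and psmp runs at each node count.
    for nodes, per_build in sorted(best_per_node_count(RUNS).items()):
        ratio = per_build["popt"] / per_build["psmp"]
        print(f"{nodes} node(s): popt/psmp time ratio {ratio:.2f}")
```

With these numbers the 2-node psmp run comes out roughly 3.1 times faster than the 1-node run, i.e. better than linear in the node count, which would be consistent with memory per core being a limit on a single node.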

There is still room for improvement, though. Any ideas on how to squeeze out better performance?

Ron

Cohen, Ronald

Mar 25, 2016, 1:01:40 PM

to cp2k

I am finding a very strange dependence of the benchmark timing on how I run under Open MPI. Does anyone have any insight?

cp2k 3.0

If I simply use:

mpirun -n 16 cp2k.psmp H2O-64.inp >> H2O-64_REC.log

with

#PBS -l nodes=n013.cluster.com:ppn=4+n014.cluster.com:ppn=4+n015.cluster.com:ppn=4+n016.cluster.com:ppn=4
for example.

The timing is 165 seconds, and for

#PBS -l nodes=4:ppn=16,pmem=1gb

mpirun --map-by ppr:4:node -n 16 cp2k.psmp H2O-64.inp >> H2O-64_REC.log

it is 368 seconds!
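Neither launch line shows how OMP_NUM_THREADS reaches the ranks or where the ranks end up bound, so one way to make such timings easier to compare is to build the command explicitly and log it next to the result. The following is a hedged sketch, not a recommended setup: the helper name `hybrid_mpirun` is hypothetical, the thread count of 4 is an assumption taken from the earlier 4-node psmp run, and `-x` and `--report-bindings` are standard Open MPI `mpirun` options for exporting an environment variable and printing the binding of each rank.

```python
import shlex


def hybrid_mpirun(n_procs, procs_per_node, omp_threads,
                  exe="cp2k.psmp", inp="H2O-64.inp"):
    """Build an Open MPI launch line for a hybrid MPI/OpenMP run.

    The executable and input names are the ones used in this thread;
    everything else is passed in by the caller.
    """
    return [
        "mpirun",
        "--map-by", f"ppr:{procs_per_node}:node",  # ranks per node, as in the post
        "-x", f"OMP_NUM_THREADS={omp_threads}",    # export the thread count to all ranks
        "--report-bindings",                       # print where each rank is bound
        "-n", str(n_procs),
        exe, inp,
    ]


if __name__ == "__main__":
    # Assumed layout: 16 ranks, 4 per node, 4 OpenMP threads per rank.
    print(shlex.join(hybrid_mpirun(16, 4, 4)))
```

Keeping the printed command and the binding report in the same log as the timing makes it possible to check afterwards whether two runs really used the same layout.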

Ron

---
Ronald Cohen
Geophysical Laboratory
Carnegie Institution
5251 Broad Branch Rd., N.W.
Washington, D.C. 20015
rco...@carnegiescience.edu
office: 202-478-8937
skype: ronaldcohen
https://twitter.com/recohen3
https://www.linkedin.com/profile/view?id=163327727

Cohen, Ronald

Mar 25, 2016, 1:14:44 PM

to cp2k

It seems our cluster is slower today than yesterday: when I ran the -n 16 benchmark again I got the slower time of 371 seconds rather than 165. Very strange. I can reproduce today's number, but not yesterday's. I have attached the log files.

Ron

H2O-64_REC_slow.log

H2O-64_REC_fast.log

Samuel Andermatt

Apr 18, 2016, 3:55:23 AM

to cp2k

The OpenMP/MPI scaling of CP2K is regularly tested at https://dashboard.cp2k.org/archive/scaling/index.html.
I guess the large speedup from OpenMP in your case is likely due to the limited memory per core that you mentioned.
