comparison of psmp and popt (with and without openmp)


Ronald Cohen

Mar 23, 2016, 4:28:19 PM
to cp2k
So I finally got decent performance with gfortran, Open MPI, and OpenBLAS across InfiniBand. Now I find that using OpenMP with half the number of MPI processes seems to give better performance for the 64-molecule H2O test case. Is that reasonable? To make the popt version, I recompiled everything, including BLAS and ScaLAPACK, without -fopenmp.
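For reference, the only difference between the two builds should be the OpenMP flag in the arch files, roughly like this (the optimization flags shown are only illustrative, not copied from my actual arch files):

# popt arch file
FCFLAGS = -O2 -ffast-math -funroll-loops
LDFLAGS = $(FCFLAGS)

# psmp arch file: same, plus OpenMP
FCFLAGS = -O2 -ffast-math -funroll-loops -fopenmp
LDFLAGS = $(FCFLAGS)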

I find in seconds:

1 node   16 MPI procs   psmp   OMP_NUM_THREADS=1   834
1 node   16 MPI procs   popt   OMP_NUM_THREADS=1   836
2 nodes  16 MPI procs   psmp   OMP_NUM_THREADS=2   266
2 nodes  32 MPI procs   popt   OMP_NUM_THREADS=1   430
4 nodes  64 MPI procs   popt   OMP_NUM_THREADS=1   331
4 nodes  32 MPI procs   psmp   OMP_NUM_THREADS=2   189
4 nodes  64 MPI procs   psmp   OMP_NUM_THREADS=4   166

So you can see there is no overhead in using a psmp binary built with OpenMP and setting the thread count to 1.
Increasing OMP_NUM_THREADS improves performance much more than simply adding MPI processes.
This may be because this machine has only 1 GB of memory per core, but even 4 threads beat 2, so OpenMP seems to be more efficient than MPI here.
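
For reference, a run like the 4-node, 32-rank, 2-thread case above is launched roughly like this (the --map-by option is illustrative; I am not claiming it is the optimal mapping):

export OMP_NUM_THREADS=2
mpirun -n 32 --map-by ppr:8:node cp2k.psmp H2O-64.inp >> H2O-64_REC.log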

Still room for improvement, though. Any ideas on how to squeeze out better performance?


Ron

 

Cohen, Ronald

Mar 25, 2016, 1:01:40 PM
to cp2k
I am finding a very strange dependence of the benchmark timings on how I run under Open MPI. Does anyone have any insight?

cp2k 3.0

If I simply use:

mpirun  -n 16 cp2k.psmp H2O-64.inp >> H2O-64_REC.log

with

#PBS -l nodes=n013.cluster.com:ppn=4+n014.cluster.com:ppn=4+n015.cluster.com:ppn=4+n016.cluster.com:ppn=4
for example, the timing is 165 seconds, whereas with

#PBS -l nodes=4:ppn=16,pmem=1gb
mpirun  --map-by ppr:4:node -n 16  cp2k.psmp H2O-64.inp >> H2O-64_REC.log 
it is 368 seconds!
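
One thing I still need to check is where the ranks actually land in the two cases; Open MPI can report this, e.g. with something like (illustrative, not taken from my job scripts):

mpirun --report-bindings --map-by ppr:4:node -n 16 cp2k.psmp H2O-64.inp >> H2O-64_REC.log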

Ron


---
Ronald Cohen
Geophysical Laboratory
Carnegie Institution
5251 Broad Branch Rd., N.W.
Washington, D.C. 20015
rco...@carnegiescience.edu
office: 202-478-8937
skype: ronaldcohen
https://twitter.com/recohen3
https://www.linkedin.com/profile/view?id=163327727

Cohen, Ronald

Mar 25, 2016, 1:14:44 PM
to cp2k
It seems our cluster is slower today than yesterday: when I ran the -n 16 benchmark again, I got 371 seconds rather than 165. Very strange. I can reproduce today's number, but not yesterday's. I have attached the log files.

Ron



H2O-64_REC_slow.log
H2O-64_REC_fast.log

Samuel Andermatt

Apr 18, 2016, 3:55:23 AM
to cp2k
The OMP/MPI scaling of CP2K is regularly tested at https://dashboard.cp2k.org/archive/scaling/index.html.
I guess the reason for the large speedup from OpenMP in your case is likely the limited memory per core that you mentioned.