Hi all,
I'm running REMD with dftb+ as the driver. My dftb+ client is compiled with MPI support, so it can run in parallel. The problem is that when I launch N instances of dftb+ (each with mpirun -np 2 dftb+), where N equals the number of replicas, %CPU drops to about 50% for every process, so the overall parallel efficiency is poor.
I have set OMP_NUM_THREADS=1, as suggested by the DFTB+ community for MPI runs.
Can anyone point out what I am doing wrong?
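For reference, this is roughly how I start the run (a minimal sketch; the replica_$i directories and the loop bound are placeholders for my actual setup):

export OMP_NUM_THREADS=1                 # one OpenMP thread per MPI rank, as recommended
for i in 1 2 3 4; do                     # one dftb+ instance per replica
    cd replica_$i                        # placeholder: each instance runs in its own directory
    nohup mpirun.openmpi -np 2 dftb+ &   # two MPI ranks per instance
    cd ..
done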
This is the output of 'top' for the 4-replica system launched this way, i.e. 8 dftb+ processes in total, so all 8 cores should be at work:
========================================================
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
35426 user 20 0 423384 40404 18248 R 50.5 0.0 0:37.69 dftb+
35434 user 20 0 423080 39720 18248 R 50.2 0.0 0:05.70 dftb+
35396 user 20 0 422044 38660 17644 R 49.8 0.0 1:21.35 dftb+
35402 user 20 0 423384 40460 18304 R 49.8 0.0 1:09.54 dftb+
35403 user 20 0 422044 38872 17848 R 49.8 0.0 1:16.15 dftb+
35427 user 20 0 422044 39048 18032 R 49.8 0.0 0:32.93 dftb+
35435 user 20 0 421732 38164 17716 R 49.8 0.0 0:09.07 dftb+
35395 user 20 0 423384 40360 18204 R 49.5 0.0 1:13.18 dftb+
=========================================================
Running the same simulation but with a single client for all replicas (nohup mpirun.openmpi -np 8 dftb+ &) yields:
=========================================================
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
35837 user 20 0 441868 35196 19112 R 100.3 0.0 0:04.90 dftb+
35828 user 20 0 442580 35676 19120 R 100.0 0.0 0:04.83 dftb+
35829 user 20 0 442544 36024 19432 R 100.0 0.0 0:04.91 dftb+
35830 user 20 0 441912 35356 19360 R 100.0 0.0 0:04.91 dftb+
35831 user 20 0 441688 35048 19304 R 100.0 0.0 0:04.90 dftb+
35833 user 20 0 441904 35656 19500 R 100.0 0.0 0:04.91 dftb+
35841 user 20 0 441516 34824 19076 R 100.0 0.0 0:04.89 dftb+
35843 user 20 0 441224 34456 18868 R 99.7 0.0 0:04.89 dftb+
=========================================================
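My best guess so far is that the four separate mpirun invocations all bind their ranks to the same cores (Open MPI binds processes by default), so the instances end up time-sharing a few CPUs while the rest sit idle. If that sounds plausible, this is what I would try next, reusing my launch command from above with Open MPI's --report-bindings and --bind-to flags:

# print where each rank is bound, to check for overlap between the instances
nohup mpirun.openmpi --report-bindings -np 2 dftb+ &

# or disable binding and let the kernel scheduler spread the ranks out
nohup mpirun.openmpi --bind-to none -np 2 dftb+ &

If binding is indeed the culprit, I would expect each process to get back to ~100% with --bind-to none.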