Parallelization question

Peter Gierth

Jul 9, 2025, 6:34:44 PM
to NWChem Forum
Hi,

I'm running some small organic molecule calculations on a desktop system.
CPU: Intel Core i9-14900K, 8 performance cores (with hyperthreading) and 16 efficiency cores (no hyperthreading)
OS: Ubuntu 24.04 LTS
NWChem version: nwchem-openmpi/noble 7.2.2-1build3 amd64, installed from the standard Ubuntu repository with apt

I get strange results with different numbers of processors, and am curious whether this is expected. Typical molecules are a few tens of second-row atoms.

With strychnine (C21H22N2O2) specifically, using input as follows:

memory stack 1500 mb heap 1500 mb global 3000 mb
echo

start molecule

title "strichnine_nmr.nw"
echo

start

charge 0

<coordinates>


basis
  * library 6-311g*
end

cosmo
  do_cosmo_smd true
  solvent chcl3
end

dft
  xc mpw91 0.75 HFexch 0.25 perdew91
  mult 1
end

task dft energy

property
  shielding
end
task dft property

If I run with nwchem.openmpi <inputfile>, the timings are:
Total times  cpu:    16333.7s     wall:     6454.8s

(You can see during the calculation that there is one process active, but the CPU usage goes above 100%, i.e. it's really using more than one core.)

If I use /usr/bin/mpirun.openmpi -np 2 nwchem.openmpi <inputfile>, then there are two processes, again sometimes using >100%. Timings are:
 Total times  cpu:     4034.3s     wall:     3866.3s

With -np 4:
 Total times  cpu:     6220.2s     wall:     3602.7s
With -np 8:
 Total times  cpu:     3407.0s     wall:     2888.1s

So the time is substantially reduced with two processes but not so much after that.
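(Editor's note: to put numbers on the scaling, parallel speedup is just T_wall(1) / T_wall(p) and parallel efficiency is speedup divided by the process count. A quick sketch using the strychnine wall times reported above; this is plain arithmetic, nothing NWChem-specific:)

```python
# Speedup S(p) = T_wall(1) / T_wall(p) and efficiency S(p)/p, using the
# strychnine wall times (in seconds) reported in the post above.
wall = {1: 6454.8, 2: 3866.3, 4: 3602.7, 8: 2888.1}

for p in sorted(wall):
    speedup = wall[1] / wall[p]
    efficiency = speedup / p  # 1.0 would be ideal linear scaling
    print(f"np={p}: speedup={speedup:.2f}x, efficiency={efficiency:.0%}")
# np=8 gives speedup 2.23x, i.e. about 28% efficiency
```

So even the best case here (8 processes) is well short of linear scaling, which is consistent with the observation that going beyond 2 processes gains little.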

However, with smaller molecules, more processes can take longer. With coumarin (C9H6O2) and the same level of theory as above, I get:

-np 1: Total times  cpu:     3407.0s     wall:     2888.1s
-np 2: Total times  cpu:      231.0s     wall:      205.2s
-np 4: Total times  cpu:     1347.3s     wall:      593.8s
-np 8: Total times  cpu:      346.1s     wall:      244.2s

So 2 processes is optimal and more may be substantially worse.
I get similar results with energy calculations.

Is this expected behaviour for systems of this sort of size? Is there anything I'm doing wrong with running the calculations? Any thoughts gratefully received!

Output files are attached for reference

Pete
 
coumarin_nmr_np2.nwo
strychnine_nmr_np1.nwo
coumarin_nmr_np8.nwo
strychnine_nmr_np2.nwo
strychnine_nmr_np4.nwo
strychnine_nmr_np8.nwo
coumarin_nmr_np1.nwo
coumarin_nmr_np4.nwo

Edoardo Aprà

Jul 11, 2025, 6:36:33 PM
to NWChem Forum
I think that the input file you have posted is fine and that you should get better performance than what you reported.

Instead of running the Debian/Ubuntu-provided NWChem package, could you try the NWChem Docker image described in the link below?
The only requirement is that you need Docker to be installed. I am curious to see if you see any difference in performance. This Docker image is built with what I believe are optimal settings for an NWChem installation.

Peter Gierth

Aug 15, 2025, 1:27:38 PM
to NWChem Forum
Dear Edoardo,

Thanks for the response, and apologies for the delay in my reply. So, with the Docker version, for the coumarin example where the previous wall times were:
-np 1: 367s
-np 2: 205s
-np 4: 593s
-np 8: 244s

With the Docker version I get:
MYNPROC=2: 310s
MYNPROC=4: 124s
MYNPROC=8: 60s

So a factor of about 3 speedup compared to the previous best case.

With strychnine I get:

MYNPROC=2: 5850s
MYNPROC=4: 2358s
MYNPROC=8: 1229s

So 2 processors is surprisingly slow, but with 8 processors I get at least a factor 2 speedup compared to the best case with the operating-system version of NWChem, and going from 4 to 8 gives essentially exactly a factor 2, as expected.


However, I'm slightly confused about running the Docker version. It seems that you have to put a compose file in whatever directory you want to work in, start the container, run the job (at this point, if you have multiple jobs in the same directory, I guess you can run them sequentially), and then shut down the container before moving to another directory and repeating the process. Is this right? I might want to incorporate this into a wider workflow.

Also, as a note for others: the default compose file had a small value for the shared memory (shmem) size (512 MB), and with nontrivial molecules it was crashing after running out of shared memory. Increasing this setting solved the problem, e.g. changing to:

shm_size: 8gb
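(Editor's note: for context, a compose file along these lines is what is meant; the image name and mount path below are illustrative placeholders, not the exact values from the official NWChem Docker instructions, so check those before copying:)

```yaml
# Hypothetical sketch of a compose file for the NWChem Docker image.
# The image tag and volume mount are placeholders -- consult the
# official NWChem Docker documentation for the actual values.
services:
  nwchem:
    image: ghcr.io/nwchemgit/nwchem-dev   # placeholder image name
    shm_size: 8gb                          # raised from the 512mb default
    volumes:
      - .:/data                            # expose the job directory
    working_dir: /data
```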

Thanks for your assistance with this, it's much appreciated!



Edoardo Aprà

Aug 15, 2025, 9:20:09 PM
to NWChem Forum
I have modified the content of compose.yaml so that you should be able to run in any directory you want, without the previous constraint of remaining in the same directory where compose.yaml was initially located.
Please let me know if this addresses your problem.