Parallelization question

Peter Gierth

Jul 9, 2025, 6:34:44 PM
to NWChem Forum
Hi,

I'm running some small organic molecule calculations on a desktop system.
CPU: Intel Core i9-14900K, 8 performance cores (with hyperthreading) and 16 efficiency cores (no hyperthreading)
OS: Ubuntu 24.04 LTS
NWChem version: nwchem-openmpi/noble 7.2.2-1build3 amd64, installed from the standard Ubuntu repository with apt

I get strange results with different numbers of processors, and am curious whether this is expected. Typical molecules are a few tens of second-row atoms.

With strychnine (C21H22N2O2) specifically, using input as follows:

memory stack 1500 mb heap 1500 mb global 3000 mb
echo

start molecule

title "strichnine_nmr.nw"
echo

start

charge 0

<coordinates>


basis
  * library 6-311g*
end

cosmo
  do_cosmo_smd true
  solvent chcl3
end

dft
  xc mpw91 0.75 HFexch 0.25 perdew91
  mult 1
end

task dft energy

property
  shielding
end
task dft property

If I run with nwchem.openmpi <inputfile>, the timings are:
Total times  cpu:    16333.7s     wall:     6454.8s

(You can see during the calculation that there is one process active, but the CPU usage goes above 100%, i.e. it's really using more than one core.)

If I use /usr/bin/mpirun.openmpi -np 2 nwchem.openmpi <inputfile>, then there are two processes, again sometimes using >100%. Timings are:
 Total times  cpu:     4034.3s     wall:     3866.3s

With -np 4:
 Total times  cpu:     6220.2s     wall:     3602.7s
With -np 8:
 Total times  cpu:     3407.0s     wall:     2888.1s

So the time is substantially reduced with two processes but not so much after that.
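(Editor's note: to put numbers on the scaling, parallel speedup is just T_wall(1) / T_wall(p) and parallel efficiency is speedup divided by the process count. A quick sketch using the strychnine wall times reported above; this is plain arithmetic, nothing NWChem-specific:)

```python
# Speedup S(p) = T_wall(1) / T_wall(p) and efficiency S(p)/p, using the
# strychnine wall times (in seconds) reported in the post above.
wall = {1: 6454.8, 2: 3866.3, 4: 3602.7, 8: 2888.1}

for p in sorted(wall):
    speedup = wall[1] / wall[p]
    efficiency = speedup / p  # 1.0 would be ideal linear scaling
    print(f"np={p}: speedup={speedup:.2f}x, efficiency={efficiency:.0%}")
# np=8 gives speedup 2.23x, i.e. about 28% efficiency
```

So even the best case here (8 processes) is well short of linear scaling, which is consistent with the observation that going beyond 2 processes gains little.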

However, with smaller molecules, more processes can take longer. With coumarin (C9H6O2) and the same level of theory as above, I get:

-np 1: Total times  cpu:     3407.0s     wall:     2888.1s
-np 2: Total times  cpu:      231.0s     wall:      205.2s
-np 4: Total times  cpu:     1347.3s     wall:      593.8s
-np 8: Total times  cpu:      346.1s     wall:      244.2s

So 2 processes is optimal and more may be substantially worse.
I get similar results with energy calculations.

Is this expected behaviour for systems of this sort of size? Is there anything I'm doing wrong with running the calculations? Any thoughts gratefully received!

Output files are attached for reference

Pete
 
coumarin_nmr_np2.nwo
strychnine_nmr_np1.nwo
coumarin_nmr_np8.nwo
strychnine_nmr_np2.nwo
strychnine_nmr_np4.nwo
strychnine_nmr_np8.nwo
coumarin_nmr_np1.nwo
coumarin_nmr_np4.nwo

Edoardo Aprà

Jul 11, 2025, 6:36:33 PM
to NWChem Forum
I think that the input file you have posted is fine and that you should get better performance than what you reported.

Instead of running the Debian/Ubuntu-provided NWChem package, could you try the NWChem Docker image described in the link below?
The only requirement is that you need Docker to be installed. I am curious to see if you see any difference in performance. This Docker image is built with what I believe are optimal settings for an NWChem installation.

Peter Gierth

Aug 15, 2025, 1:27:38 PM
to NWChem Forum
Dear Edoardo,

Thanks for the response, and apologies for the delay in my reply. So, with the Docker version, for the coumarin example where the previous wall times were:
-np 1: 367s
-np 2: 205s
-np 4: 593s
-np 8: 244s

With the Docker version I get:
MYNPROC=2: 310s
MYNPROC=4: 124s
MYNPROC=8: 60s

So a factor of about 3 speedup compared to the previous best case.

With strychnine I get:

MYNPROC=2: 5850s
MYNPROC=4: 2358s
MYNPROC=8: 1229s

So 2 processors is surprisingly slow, but with 8 processors I get at least a factor 2 speedup compared to the best case with the operating-system version of NWChem, and going from 4 to 8 gives essentially exactly a factor 2, as expected.


However, I'm slightly confused about running the Docker version. It seems that you have to put a compose file in whatever directory you want to work in, start the container, run the job (at this point, if you have multiple jobs in the same directory, I guess you can run them sequentially), and then shut down the container before moving to another directory and repeating the process. Is this right? I might want to incorporate this into a wider workflow.

Also, as a note for others: the default compose file had a small value for the shared memory (shmem) size (512 MB), and with nontrivial molecules it was crashing after running out of shared memory. Increasing this setting solved the problem, e.g. changing to:

shm_size: 8gb
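(Editor's note: for context, a compose file along these lines is what is meant; the image name and mount path below are illustrative placeholders, not the exact values from the official NWChem Docker instructions, so check those before copying:)

```yaml
# Hypothetical sketch of a compose file for the NWChem Docker image.
# The image tag and volume mount are placeholders -- consult the
# official NWChem Docker documentation for the actual values.
services:
  nwchem:
    image: ghcr.io/nwchemgit/nwchem-dev   # placeholder image name
    shm_size: 8gb                          # raised from the 512mb default
    volumes:
      - .:/data                            # expose the job directory
    working_dir: /data
```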

Thanks for your assistance with this, it's much appreciated!



Edoardo Aprà

Aug 15, 2025, 9:20:09 PM
to NWChem Forum
I have modified the content of compose.yaml so that you should be able to run in any directory you want, without the previous constraint of remaining in the same directory where compose.yaml was initially located.
Please let me know if this addresses your problem.