How to ensure good scaling of NWChem for correlated methods?


Stephen Weitzner

Feb 12, 2022, 12:36:46 AM
to NWChem Forum
Dear all,

I am somewhat new to NWChem and have yet to find a good answer to this question in the documentation. Are there any flags or directives that can be provided to NWChem to improve parallelization and optimize scaling? I come from a materials science background and often use Quantum ESPRESSO, which has a very straightforward way to adjust the code's parallelization. I am interested in testing double hybrids and correlated wave function methods (e.g., MP2 and CCSD(T)), but I struggle to get good parallel performance when increasing the number of cores in my simulations. I understand that these are expensive methods, but I suspect I may be overlooking ways to improve the scaling of my calculations.

Thanks,
Stephen

Edoardo Aprà

Feb 14, 2022, 1:55:51 PM
to NWChem Forum
An important prerequisite for good parallel scaling in NWChem is a correct installation. Do you happen to have any details of your installation?
For example, the correct choice of the variable ARMCI_NETWORK during the installation process is crucial for obtaining parallel scaling.
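
For illustration, ARMCI_NETWORK is an environment variable set before compiling; a minimal sketch follows (the value shown is only one of several interconnect-dependent options, e.g. MPI-PR, MPI-TS, OPENIB, OFI):

# Minimal sketch: ARMCI_NETWORK must be chosen for the interconnect at hand
# before compiling NWChem; MPI-PR is shown here only as an example.
export ARMCI_NETWORK=MPI-PR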

Stephen Weitzner

Feb 16, 2022, 1:18:29 AM
to NWChem Forum
Thanks Edoardo. Would you happen to have any suggestions for how to set ARMCI_NETWORK (and related variables) for building with Omni-Path interconnects?

Thanks,
Steve

Edoardo Aprà

Feb 16, 2022, 1:01:37 PM
to NWChem Forum
I would use MPI-PR.

On Intel True Scale and Omni-Path systems, MPI-PR is more reliable than OPENIB or MPI-SPAWN.
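
For reference, a minimal launch sketch (node and core counts are purely illustrative): with ARMCI_NETWORK=MPI-PR, one MPI rank per node is reserved as a communication progress rank, so the remaining ranks per node do the compute work.

# Illustrative launch on two 36-core Omni-Path nodes with an MPI-PR build:
# one rank per node acts as the progress rank, leaving 35 compute ranks/node.
mpirun -np 72 -ppn 36 nwchem input.nw > input.out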

Stephen Weitzner

Feb 16, 2022, 9:16:03 PM
to NWChem Forum
Thanks. Below is our current build script. It seems like this is giving some improved performance. Based on these settings, do you have any other advice / suggested changes?


#!/bin/bash

module load mkl/2020.0
module load intel/19.0.4
module load impi/2019.8

export NWCHEM_TOP=$(pwd)
export NWCHEM_TARGET=LINUX64
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y

export ARMCI_NETWORK=MPI-PR
export PSM2_MEMORY=large

export HAS_BLAS=y
export BLAS_SIZE=8
export BLASOPT="-L${MKLROOT}/lib/intel64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl"

export HAS_LAPACK=y
export USE_LAPACK=y
export LAPACK_SIZE=8
export LAPACK_LIB="$BLASOPT"
export LAPACK_LIBS="$BLASOPT"
export LAPACKOPT="$BLASOPT"

export HAS_SCALAPACK=y
export USE_SCALAPACK=y
export SCALAPACK_SIZE=8
export SCALAPACK="-L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm -ldl"
export SCALAPACK_LIB="$SCALAPACK"
export SCALAPACK_LIBS="$SCALAPACK"

export NWCHEM_MODULES=all

export NWCHEM_LONG_PATHS=y
export USE_NOFSCHECK=y
export LARGE_FILES=y

export USE_OPENMP=y

#export PYTHONHOME=/usr
#export PYTHONVERSION=2.7
#export PYTHONLIBTYPE=so
#export USE_PYTHON64=y

#export MPI_LOC="/usr/tce/packages/impi/impi-2019.8-intel-19.0.4/"
#export MPI_INCLUDE="$MPI_LOC/include/gfortran/9.1.0 -I$MPI_LOC/include"
#export MPI_LIB="$MPI_LOC/lib/release_mt -L$MPI_LOC/lib"
#export LIBMPI="-lmpifort -lmpi -lmpigi -ldl -lrt -lpthread"

#export CC=icc
export FC=ifort
export USE_64TO32=y

cd $NWCHEM_TOP/src
#make -j realclean
make -j nwchem_config
make -j 64_to_32
make -j FC=ifort

cd $NWCHEM_TOP/src/tools
make -j FC=ifort version
make -j FC=ifort

cd $NWCHEM_TOP/src
make -j FC=ifort link
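
For what it's worth, after a successful build the binary should appear under the target directory (the path below assumes the NWCHEM_TARGET set above):

# Quick sanity check after the build (assumes NWCHEM_TARGET=LINUX64 as above)
ls -l $NWCHEM_TOP/bin/LINUX64/nwchem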

Edoardo Aprà

Feb 16, 2022, 9:21:55 PM
to NWChem Forum
These settings look OK to me.
PSM2_MEMORY=large was suggested by another NWChem user in the following GitHub issue: https://github.com/nwchemgit/nwchem/issues/284#issuecomment-743984830
I would suggest OpenMPI over Intel MPI for most networks, but you probably want to use Intel software on Intel network hardware.
The 64_to_32 conversion is redundant when BLAS_SIZE=8 and SCALAPACK_SIZE=8.
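
To make that last point concrete, here is a sketch of the simplified build steps under the ILP64 settings in the script above (same paths and compiler; illustrative, not tested here):

# With 8-byte-integer (ILP64) BLAS/ScaLAPACK, the 64_to_32 conversion is
# unnecessary: drop USE_64TO32 and the "make 64_to_32" step.
unset USE_64TO32
cd $NWCHEM_TOP/src
make -j nwchem_config
make -j FC=ifort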

jeff.science

Feb 17, 2022, 3:36:41 AM
to NWChem Forum
PSM2_MEMORY=large is the only known workaround for MPI flow control issues that were never root-caused while I was at Intel. It isn't always necessary, but I'm not aware of any reason not to use it, as it doesn't appear to add significant memory overhead relative to what NWChem needs already.
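
In case it helps, a sketch of setting it for a run (my understanding is that PSM2_MEMORY is read by the Omni-Path PSM2 library at run time, so exporting it in the job script should be enough; the launch line is illustrative):

# PSM2_MEMORY is picked up from the environment by the PSM2 library at run time
export PSM2_MEMORY=large
mpirun -np 72 -ppn 36 nwchem input.nw > input.out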