Segmentation Fault when running on multiple nodes

celemmon

Aug 10, 2020, 6:40:45 PM
to NWChem Forum
Hi all,

I would be grateful for any insight into an error I keep running into. Almost any time I try to run on multiple nodes, NWChem immediately crashes with a segmentation fault. Running on 2 nodes sometimes works, but often fails as well. Attached are my input, output, batch submission file (run.pbs), and the system error log (run.err).

Thanks in advance for any insight and help.
Segmentation_Errors_in_NWchem_on_multiple_nodes.zip

Edoardo Aprà

Aug 10, 2020, 8:03:44 PM
to NWChem Forum
It's hard to tell what is going on without knowing which version of NWChem is used and how it was compiled (for example, which environment variables were set).

celemmon

Aug 10, 2020, 8:48:26 PM
to NWChem Forum
Hi Edoardo,

It is version 6.8. It was not compiled by me but is a module installed by the administrators of the cluster I am using. I asked them and they said it is best to ask here.

One more note - like I said I can sometimes run on 2 nodes successfully, and I have found this only works when I have "setenv ARMCI_DEFAULT_SHMMAX=2048" in my .pbs submission script.

Edoardo Aprà

Aug 10, 2020, 9:31:35 PM
to NWChem Forum
Carl,
How much memory does your computer have on each node?
The input file shows that you are asking for 128 GB for each process, and you are going to run 16 processes on each node. The result is a request for 2048 GB of memory on each node (128*16).
Do you have two terabytes of memory on each node?
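
The arithmetic above can be sketched as a quick shell check. This is only an illustration: the function name per_node_gb is hypothetical, and 128 and 16 are the per-process request and the number of processes per node mentioned above.

```bash
#!/bin/bash
# Hypothetical helper: memory requested on one node, in GB,
# given the per-process request and the number of processes per node.
per_node_gb() {
    local per_process_gb=$1
    local procs_per_node=$2
    echo $(( per_process_gb * procs_per_node ))
}

# 128 GB per process, 16 processes per node (values from the post above).
echo "Requested per node: $(per_node_gb 128 16) GB"   # prints: Requested per node: 2048 GB
```

Comparing that number with the physical memory of a node shows at a glance whether the request can be satisfied.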

celemmon

Aug 12, 2020, 4:44:40 PM
to NWChem Forum
Hi Edoardo,

Thanks for the reply. I did not realize I was requesting that much per process; for some reason I assumed it was the total. I have adjusted my memory requests to 7GB total per process in the batch file and 6GB total in the NWChem input. There is 128GB per node. That said, I still receive the same segmentation fault; see the attached files once again.

lindqvist_64.tar.gz

Edoardo Aprà

Aug 12, 2020, 5:38:36 PM
to NWChem Forum
Carl,
Have you tried input files with more modest memory requirements? Were they successful?
For example, I would try the following memory line:

memory 1000 mb

If this still fails, I would try 500 mb

celemmon

Aug 12, 2020, 7:22:58 PM
to NWChem Forum
Hi Edoardo,

Just tried both, and with the same result.

Edoardo Aprà

Aug 12, 2020, 7:42:45 PM
to NWChem Forum
Have you tried reducing the number of processes per node?
If that still fails, I would suggest installing the latest NWChem release.

celemmon

Aug 13, 2020, 8:39:21 PM
to NWChem Forum
Still getting the same error, so I am guessing a fresh install is necessary.

Sorry for the following newbie questions, but I have little experience installing or compiling things on Linux.

I am currently running NWChem 6.8, which is a preinstalled module on the cluster I am using. I have no root or sudo permissions. The cluster runs CentOS 7.8. I am aware that NWChem 7.0.0 could be installed using EPEL, but I don't think I can do that without root permission.

I am not sure exactly how to compile from source but tried the following script:

#!/bin/bash
#PBS -q condo
#PBS -N nwchem_compile
#PBS -l nodes=1:ppn=1
#PBS -l walltime=1:00:00
#PBS -o nwchem_compile.out
#PBS -e nwchem_compile.err
#PBS -V
#PBS -m abe

module load scalapack
export NWCHEM_TOP=local/src/nwchem-7.0.0
export NWCHEM_TARGET=LINUX64
export ARMCI_NETWORK=MPI-PR
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPI_LOC=$MPIHOME
export MPI_LIB=$MPIHOME/lib
export MPI_INCLUDE=$MPIHOME/include
export LIBMPI="-lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi"
export BLASOPT="-lopenblas -lpthread -lrt"
export LAPACK_LIB="-lopenblas -lpthread -lrt"
export USE_SCALAPACK=y
export SCALAPACK=$SCALAPACKHOME
export SCALAPACK_SIZE=4
export USE_64TO32=y
export BLAS_SIZE=4


cd $NWCHEM_TOP/src
make nwchem_config NWCHEM_MODULES="all"
make 64_to_32
make

And am met with the following errors:

config/makefile.h:227: local/src/nwchem-7.0.0/src/config/nwchem_config.h: No such file or directory
config/makefile.h:227: local/src/nwchem-7.0.0/src/config/nwchem_config.h: No such file or directory
config/makefile.h:227: local/src/nwchem-7.0.0/src/config/nwchem_config.h: No such file or directory
fatal: Not a git repository (or any parent up to mount point /home/celemmon)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
../config/makefile.h:227: local/src/nwchem-7.0.0/src/config/nwchem_config.h: No such file or directory
make[1]: *** No rule to make target `local/src/nwchem-7.0.0/src/config/nwchem_config.h'.  Stop.
make: *** [libraries] Error 1

Edoardo Aprà

Aug 13, 2020, 9:16:36 PM
to NWChem Forum
  1. Please do not set the MPI variables; just have mpif90 in your path: https://nwchemgit.github.io/Compiling-NWChem.html#automatic-detection-of-mpi-variables-with-mpif90
  2. You must have the wrong value for NWCHEM_TOP. Please unset it and let NWChem guess it for you (this is an undocumented feature of version 7.0.0).
  3. Please unset USE_SCALAPACK.
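
The three steps above can be sketched as the following shell fragment, to be run before the make commands. It is a sketch under assumptions: the variable names are the ones exported in the build script earlier in this thread, and clearing SCALAPACK and SCALAPACK_SIZE as well goes slightly beyond the advice above.

```bash
#!/bin/bash
# 1. Drop the manual MPI settings so the build can detect them through mpif90.
unset MPI_LOC MPI_LIB MPI_INCLUDE LIBMPI

# 2. Drop NWCHEM_TOP so that NWChem 7.0.0 can guess it.
unset NWCHEM_TOP

# 3. Build without ScaLAPACK.
#    Assumption: the related SCALAPACK settings are cleared too.
unset USE_SCALAPACK SCALAPACK SCALAPACK_SIZE

# The automatic detection needs mpif90 to be reachable through PATH.
if command -v mpif90 >/dev/null 2>&1; then
    echo "mpif90 found at: $(command -v mpif90)"
else
    echo "mpif90 not found in PATH; load an MPI module first" >&2
fi
```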

celemmon

Aug 13, 2020, 9:50:23 PM
to NWChem Forum
Similar result this time around:

config/makefile.h:227: /home/celemmon/local/src/config/nwchem_config.h: No such file or directory
make: /home/celemmon/local/src/tools/guess-mpidefs: Command not found
config/makefile.h:227: /home/celemmon/local/src/config/nwchem_config.h: No such file or directory
make: /home/celemmon/local/src/tools/guess-mpidefs: Command not found
config/makefile.h:227: /home/celemmon/local/src/config/nwchem_config.h: No such file or directory
make: /home/celemmon/local/src/tools/guess-mpidefs: Command not found
fatal: Not a git repository (or any parent up to mount point /home/celemmon)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
../config/makefile.h:227: /home/celemmon/local/src/config/nwchem_config.h: No such file or directory
make[1]: /home/celemmon/local/src/tools/guess-mpidefs: Command not found
make[1]: /home/celemmon/local/src/tools/guess-mpidefs: Command not found
make[1]: /home/celemmon/local/src/tools/guess-mpidefs: Command not found
make[1]: /home/celemmon/local/src/tools/guess-mpidefs: Command not found
make[1]: *** No rule to make target `/home/celemmon/local/src/config/nwchem_config.h'.  Stop.
make: *** [libraries] Error 1

This is very confusing to me because nwchem_config.h and guess-mpidefs are definitely there.

Edoardo Aprà

Aug 13, 2020, 10:10:13 PM
to NWChem Forum
Please post the output of the command

find /home/celemmon -name makefile.h

celemmon

Aug 13, 2020, 10:38:41 PM
to NWChem Forum

/home/celemmon/local/src/nwchem-7.0.0/src/nwpw/paw/makefile.h
/home/celemmon/local/src/nwchem-7.0.0/src/config/makefile.h
/home/celemmon/local/src/nwchem-7.0.0/src/tools/ga-5.7.1/pario/makefile.h

Yavuz Ceylan

Aug 13, 2020, 11:09:31 PM
to NWChem Forum
Hi,
Try the following:

module swap gnu7 share_modules/INTEL/2015_openmpi3
module load share_modules/INTEL/2015_openmpi3
unset SCALAPACK
unset SCALAPACK_LIB
unset SCALAPACK_LIBS
unset USE_SCALAPACK
export LARGE_FILES=TRUE
export NWCHEM_TOP=path
export NWCHEM_TARGET="LINUX64"
export USE_MPI="y"
export NWCHEM_MODULES="all"
export FC="mpifort"
export CC="mpicc"
export MKLLIB="${MKLROOT}/lib/intel64"
export MKLINC="${MKLROOT}/include"
export HAS_BLAS=y
export BLAS_SIZE=8
export BLASOPT="-L${MKLROOT}/lib/intel64 -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl"
export LAPACK_SIZE=8
export LAPACK_LIB="-L${MKLROOT}/lib/intel64 -lmkl_lapack95_lp64 -lmkl_lapack95_ilp64"
make realclean
make clean
make nwchem_config
make -j 8 > make.log &

Edoardo Aprà

Aug 14, 2020, 12:00:52 AM
to NWChem Forum
Thanks.
The value for NWCHEM_TOP is

/home/celemmon/local/src/nwchem-7.0.0

Please use the command

export NWCHEM_TOP=/home/celemmon/local/src/nwchem-7.0.0
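
As an illustration of where that value comes from, the sketch below derives the top-level directory from the src/config/makefile.h path in the find output posted earlier, and checks that the result is an absolute path. The helper names top_from_makefile and is_absolute are hypothetical and not part of NWChem.

```bash
#!/bin/bash
# Hypothetical helper: derive the top-level directory from the
# full path of src/config/makefile.h by stripping that suffix.
top_from_makefile() {
    local makefile_path=$1
    echo "${makefile_path%/src/config/makefile.h}"
}

# Hypothetical helper: succeed only if the path starts with "/".
is_absolute() {
    case $1 in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}

top=$(top_from_makefile /home/celemmon/local/src/nwchem-7.0.0/src/config/makefile.h)
echo "$top"    # prints: /home/celemmon/local/src/nwchem-7.0.0

is_absolute "$top" && echo "absolute path"
# A relative value is resolved against the current directory instead.
is_absolute local/src/nwchem-7.0.0 || echo "relative path"
```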


celemmon

Aug 15, 2020, 5:12:04 PM
to NWChem Forum
Hi all,

Thanks for all the help and patience; I really appreciate it. Unfortunately, I am still getting:

config/makefile.h:227: /home/celemmon/local/src/nwchem-7.0.0/src/config/nwchem_config.h: No such file or directory

Edoardo Aprà

Aug 16, 2020, 10:00:56 PM
to NWChem Forum
The line you reported is not a problem; the execution should continue. Is this the case for you?
If it does not continue, please report the full output generated by the following command (a single command, no Slurm needed):

cd /home/celemmon/local/src/nwchem-7.0.0/src; \
make NWCHEM_TOP=/home/celemmon/local/src/nwchem-7.0.0 nwchem_config NWCHEM_MODULES=all
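
One possible way to capture the full output, including error messages, is sketched below. The function name run_and_log and the log file name are hypothetical; the function merges stderr into stdout, shows the output on screen, and saves a copy to a file.

```bash
#!/bin/bash
# Hypothetical helper: run a command, show its output, and save
# both stdout and stderr to a log file.
run_and_log() {
    local logfile=$1
    shift
    "$@" 2>&1 | tee "$logfile"
    # Return the exit status of the command itself, not of tee.
    return "${PIPESTATUS[0]}"
}

# Example use from the src directory (not executed here):
#   run_and_log nwchem_config.log \
#       make NWCHEM_TOP=/home/celemmon/local/src/nwchem-7.0.0 nwchem_config NWCHEM_MODULES=all
```

The resulting log file can then be attached to a reply in full.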

celemmon

Aug 17, 2020, 5:17:16 PM
to NWChem Forum
Hi Yavuz and Edoardo,

Thank you for all of the help. It actually compiled this time (a test case with H2O ran fine). I will run some more tests and see whether my multi-node issue is now fixed.

Thanks again for all the help and patience.

celemmon

Aug 17, 2020, 5:40:00 PM
to NWChem Forum
I am now successfully running across multiple nodes (up to 8 works so far). I am not sure whether this is a result of the upgrade to 7.0.0 or of a mistake in the sysadmins' initial compile, but I'll take it. The help is appreciated!