Segmentation Violation Error for NWChem 7.2.0


Angelo Raymond Rossi

Jun 20, 2023, 4:28:53 PM
to NWChem Forum
Hello,

I am hoping that you can help me with the following error, although I understand it may be system dependent.

Here is a snippet of a log file (for any simple input file):

                             NWChem Input Module
                                -------------------
 Scaling coordinates for geometry "geometry" by  1.889725989
 (inverse scale =  0.529177249)
0:Segmentation Violation error, status=: 11
(rank:0 hostname:cn503 pid:1911346):ARMCI DASSERT fail. ../../ga-5.8.2/armci/src/common/signaltrap.c:SigSegvHandler():315 cond:0
The above error message refers to line 315 in the file:
/shared/chem5326/nwchem-7.2.0/src/tools/ga-5.8.2/armci/src/common/signaltrap.c
I don't understand this.  For the earlier version, nwchem-7.0.2, compilation and execution work perfectly.

The error file contains the following:
Last System Error Message from Task 0:: No such file or directory
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 4 DUP FROM 0
with errorcode 11.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.


Although the UConn HPC Linux cluster contains older Intel nodes, the error above occurs on the latest AMD EPYC CPU nodes.

I hope that you can point me in a direction for how to fix this.

Kind regards,

A. R. Rossi

Edoardo Aprà

Jun 20, 2023, 7:33:23 PM
to NWChem Forum
Could you share the setup of your compilation (for example: environment variables that were set, compilers used, MPI libraries used, etc.)?
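
For example, something along these lines would capture most of it (just a sketch; the grep pattern is an assumption, not an official checklist):

# Compiler and MPI versions (the Open MPI wrapper reports the underlying Fortran compiler)
gfortran --version
mpif90 --version
mpirun --version

# Modules currently loaded and any NWChem-related build variables in the environment
module list
env | grep -E 'NWCHEM|ARMCI|BLAS|MPI|USE_'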

Angelo Rossi

Jun 21, 2023, 8:14:32 AM
to nwchem...@googlegroups.com
Good Morning.
 
Thank you for looking into this for me.

Info for Login Node on Linux Cluster
============================
Linux version 4.18.0-372.9.1.el8.x86_64 (mock...@x86-vm-09.build.eng.bos.redhat.com)


 nwchem-7.2.0-configure.x
=====================
export NWCHEM_TOP=/shared/chem5326/nwchem-7.2.0
export LARGE_FILES=TRUE
export NWCHEM_TARGET=LINUX64
#export NWCHEM_MODULES=all
export NWCHEM_MODULES="all python"
export USE_PYTHONCONFIG=y
#Note the third number in the python version should not be kept: 3.10.5 should be set as 3.10
export PYTHONVERSION=3.10
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export USE_NOIO=TRUE
export USE_NOFSCHECK=TRUE
export MRCC_METHODS=TRUE
export CCSDTQ=TRUE
export ARMCI_NETWORK=OPENIB
export USE_INTERNALBLAS=y

module load cmake/3.23.2
module load gcc/11.3.0
module load zlib/1.2.12
module load openmpi/4.1.4
module load tcl/8.6.12
module load sqlite3/3.39.0
module load python/3.10.5
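
One way to use this script, assuming it is meant to prepare the shell for the compile step below, is to source it so that the exported variables persist in the current shell:

source ./nwchem-7.2.0-configure.x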

nwchem-7.0.2-compile.x*
===================
cd $NWCHEM_TOP/src
make nwchem_config
make >& make.log
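
As a quick sanity check after the build (a sketch, not an official procedure), one can look at the end of make.log and confirm the binary was produced:

# assumption: a successful build ends with the nwchem binary being linked
tail -n 5 make.log
ls -l $NWCHEM_TOP/bin/LINUX64/nwchem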

Final Load Module
==============
[anr11010@login6 LINUX64]$pwd
/shared/chem5326/nwchem-7.2.0/bin/LINUX64

[anr11010@login6 LINUX64]$ldd ./nwchem
linux-vdso.so.1 (0x0000155555551000)
libmpi_usempif08.so.40 => /gpfs/sharedfs1/admin/hpc2.0/apps/openmpi/4.1.4/lib/libmpi_usempif08.so.40 (0x00001555550e5000)
libmpi_usempi_ignore_tkr.so.40 => /gpfs/sharedfs1/admin/hpc2.0/apps/openmpi/4.1.4/lib/libmpi_usempi_ignore_tkr.so.40 (0x0000155554ed6000)
libmpi_mpifh.so.40 => /gpfs/sharedfs1/admin/hpc2.0/apps/openmpi/4.1.4/lib/libmpi_mpifh.so.40 (0x0000155554c68000)
libmpi.so.40 => /gpfs/sharedfs1/admin/hpc2.0/apps/openmpi/4.1.4/lib/libmpi.so.40 (0x0000155554750000)
libibverbs.so.1 => /lib64/libibverbs.so.1 (0x0000155554530000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000155554310000)
libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00001555540e7000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000155553ee3000)
libutil.so.1 => /lib64/libutil.so.1 (0x0000155553cdf000)
libgfortran.so.5 => /gpfs/sharedfs1/admin/hpc2.0/apps/gcc/11.3.0/lib64/libgfortran.so.5 (0x0000155553832000)
libm.so.6 => /lib64/libm.so.6 (0x00001555534b0000)
libmvec.so.1 => /lib64/libmvec.so.1 (0x0000155553285000)
libgcc_s.so.1 => /gpfs/sharedfs1/admin/hpc2.0/apps/gcc/11.3.0/lib64/libgcc_s.so.1 (0x000015555306c000)
libquadmath.so.0 => /gpfs/sharedfs1/admin/hpc2.0/apps/gcc/11.3.0/lib64/libquadmath.so.0 (0x0000155552e25000)
libc.so.6 => /lib64/libc.so.6 (0x0000155552a5f000)
libopen-rte.so.40 => /gpfs/sharedfs1/admin/hpc2.0/apps/openmpi/4.1.4/lib/libopen-rte.so.40 (0x000015555273b000)
libopen-pal.so.40 => /gpfs/sharedfs1/admin/hpc2.0/apps/openmpi/4.1.4/lib/libopen-pal.so.40 (0x000015555228b000)
libucp.so.0 => /gpfs/sharedfs1/admin/hpc2.0/apps/ucx/1.13.1/lib/libucp.so.0 (0x0000155551fc5000)
libuct.so.0 => /gpfs/sharedfs1/admin/hpc2.0/apps/ucx/1.13.1/lib/libuct.so.0 (0x0000155551d8d000)
libucs.so.0 => /gpfs/sharedfs1/admin/hpc2.0/apps/ucx/1.13.1/lib/libucs.so.0 (0x0000155551b30000)
libucm.so.0 => /gpfs/sharedfs1/admin/hpc2.0/apps/ucx/1.13.1/lib/libucm.so.0 (0x0000155551916000)
librdmacm.so.1 => /lib64/librdmacm.so.1 (0x00001555516fb000)
libpmi2.so.0 => /cm/shared/apps/slurm/current/lib64/libpmi2.so.0 (0x00001555514e3000)
libpmi.so.0 => /cm/shared/apps/slurm/current/lib64/libpmi.so.0 (0x00001555512dd000)
libnl-3.so.200 => /lib64/libnl-3.so.200 (0x00001555510ba000)
libnl-route-3.so.200 => /lib64/libnl-route-3.so.200 (0x0000155550e28000)
librt.so.1 => /lib64/librt.so.1 (0x0000155550c20000)
libz.so.1 => /gpfs/sharedfs1/admin/hpc2.0/apps/zlib/1.2.12/lib/libz.so.1 (0x0000155550a04000)
libhwloc.so.5 => /cm/shared/apps/hwloc/1.11.11/lib/libhwloc.so.5 (0x00001555507c6000)
libnuma.so.1 => /lib64/libnuma.so.1 (0x00001555505ba000)
libudev.so.1 => /lib64/libudev.so.1 (0x0000155550323000)
libxml2.so.2 => /lib64/libxml2.so.2 (0x000015554ffbb000)
libevent_core-2.1.so.6 => /lib64/libevent_core-2.1.so.6 (0x000015554fd82000)
libevent_pthreads-2.1.so.6 => /lib64/libevent_pthreads-2.1.so.6 (0x000015554fb7f000)
/lib64/ld-linux-x86-64.so.2 (0x0000155555326000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x000015554f967000)
libslurm_pmi.so => /cm/shared/apps/slurm/22.05.9/lib64/slurm/libslurm_pmi.so (0x000015554f58b000)
libmount.so.1 => /lib64/libmount.so.1 (0x000015554f331000)
liblzma.so.5 => /lib64/liblzma.so.5 (0x000015554f10a000)
libcrypto.so.1.1 => /lib64/libcrypto.so.1.1 (0x000015554ec21000)
libblkid.so.1 => /lib64/libblkid.so.1 (0x000015554e9ce000)
libuuid.so.1 => /lib64/libuuid.so.1 (0x000015554e7c6000)
libselinux.so.1 => /lib64/libselinux.so.1 (0x000015554e59b000)
libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x000015554e317000)
[anr11010@login6 LINUX64]$




Edoardo Aprà

Jun 21, 2023, 9:38:58 AM
to NWChem Forum
Which Fortran compiler was used? Was it gfortran?
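
With the Open MPI module shown above, the wrapper itself will tell you (a sketch, assuming Open MPI's wrapper options are available):

mpif90 --version          # prints the version banner of the wrapped Fortran compiler
mpif90 --showme:command   # Open MPI: shows the compiler command the wrapper invokes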

Edoardo Aprà

Jun 21, 2023, 10:12:27 AM
to NWChem Forum
The following should fix your problem (bash syntax):

export BLAS_SIZE=8
cd $NWCHEM_TOP/src/64to32blas
make clean
make
cd ..
make link

However, we do not recommend the use of USE_INTERNALBLAS.
A much better (and, as your experience shows, safer) approach is to use BUILD_OPENBLAS and BUILD_SCALAPACK.

Angelo Rossi

Jun 21, 2023, 1:41:11 PM
to nwchem...@googlegroups.com
It worked!!!  Hooray!!  Thanks so much for your help.  I think that I will eventually recompile so that I do not use the internal BLAS and ScaLAPACK routines.  On the cluster, all the apps (i.e., openmpi, gcc, gfortran, python, openblas, scalapack) have different prerequisites, which sometimes conflict with one another when loaded with the module command.  For example, module load openmpi may have different (or the same) prerequisites as, say, module load scalapack.  I will go back and do this more carefully.
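
For what it's worth, those prerequisites and conflicts can be inspected before loading anything (a sketch, using the module names from my configure script above):

module show openmpi/4.1.4
module show gcc/11.3.0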

Many thanks again, Angelo

Edoardo Aprà

Jun 21, 2023, 1:45:10 PM
to NWChem Forum
If you set the following environment variables, you don't have to worry about or rely on the OpenBLAS/ScaLAPACK installed on your system, since NWChem will build the appropriate OpenBLAS and ScaLAPACK libraries itself (and therefore will not use the system OpenBLAS and ScaLAPACK libraries). In other words, you would simply need a working compiler and a working MPI implementation; a rough rebuild sketch follows the list below.

 export BUILD_OPENBLAS=1
 export BUILD_SCALAPACK=1
 export BLAS_SIZE=8
 export SCALAPACK_SIZE=8
 unset USE_INTERNALBLAS
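
Put together with the configure and compile scripts from earlier in the thread, a rebuild would then look roughly like this (a sketch only; whether a full make clean is required when switching BLAS settings is an assumption):

# after setting the variables above (plus the ones in the original configure script)
cd $NWCHEM_TOP/src
make clean          # assumption: start from a clean tree when changing BLAS settings
make nwchem_config
make >& make.log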