Error in Global Array: ga_pdsyevd/ga_llt_i: descinit A failed:Received an Error in Communication

Tong Xiro

May 9, 2022, 10:04:46 AM
to NWChem Forum
I am trying to run NWChem on a cluster with NVIDIA GPUs. The installation completes successfully; however, when I run a test case with

mpirun -np 4 --mca orte_base_help_aggregate 0 $NWCHEM_TOP/bin/LINUX64/nwchem $NWCHEM_TOP/web/benchmarks/dft/siosi3.nw

the program fails with the following error:

0: ga_llt_i: descinit A failed:Received an Error in Communication
1: ga_llt_i: descinit A failed:Received an Error in Communication
2: ga_llt_i: descinit A failed:Received an Error in Communication
3: ga_llt_i: descinit A failed:Received an Error in Communication
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI COMMUNICATOR 3 DUP FROM 0
with errorcode 0.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 3 in communicator MPI COMMUNICATOR 3 DUP FROM 0
with errorcode 0.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 3 DUP FROM 0
with errorcode 0.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI COMMUNICATOR 3 DUP FROM 0
with errorcode 0.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------

Has anyone encountered this error before?

Edoardo Aprà

May 9, 2022, 1:23:47 PM
to NWChem Forum

Could you describe in some detail the settings (e.g. environment variables) used in the NWChem installation you are running?

Tong Xiro

May 9, 2022, 11:36:34 PM
to NWChem Forum
Hi, here is the .bashrc file I am using:

# gpu
module load cuda/11.1.1
module load nvhpc/22.1
module load openmpi/4.0.5-nvhpc22.1
module load aocl/3.1.0
# module load openblas/0.3.12-gcc10.2.0


# top directory of the NWChem source tree
export NWCHEM_TOP=/ocean/projects/cis210088p/sst/NWChem/gpu_ompi_ARMCI_MPI/nwchem

# your target platform
export NWCHEM_TARGET=LINUX64

export ARMCI_NETWORK=ARMCI-MPI

# General configurations
export USE_MPI=y
export USE_OPENMP=1
export NWCHEM_MODULES=qm
# export BLASOPT="-mkl"
# export BLASOPT="-lcublas"
export BLASOPT="-lblis"
export BLAS_SIZE=8
# export LAPACK_LIB="-lcublas"
export LAPACK_LIB="-lblis -llapack"
export USE_SCALAPACK=y

#! Remember to change
# export SCALAPACK="-lcublas"
export SCALAPACK="-lblis -lscalapack"
# export SCALAPACK="-mkl -lmkl_scalapack_ilp64 -lmkl_blacs_intelmpi_ilp64"
export SCALAPACK_SIZE=8

# For GPU
export FC=nvfortran
export USE_F90_ALLOCATABLE=1
export USE_OPENACC_TRPDRV=1
export NWCHEM_LINK_CUDA=1
# export DEV_GA=1 # optional

export MA_USE_CUDA_MEM=1
export TCE_CUDA=Y
export CUDA=nvcc
export CUDA_LIBS="-L/jet/packages/spack/opt/spack/linux-centos8-zen/gcc-8.3.1/cuda-10.2.89-kz7u4ix6ed53nioz4ycqin3kujcim3bs/lib64/ -lcudart"
export CUDA_FLAGS="-arch sm_70 "
export CUDA_ARCH="-arch sm70"
export CUDA_INCLUDE="-I/jet/packages/spack/opt/spack/linux-centos8-zen/gcc-8.3.1/cuda-10.2.89-kz7u4ix6ed53nioz4ycqin3kujcim3bs/include"

# export OMP_NUM_THREADS=

module load nvhpc/22.1



Tong Xiro

May 10, 2022, 2:23:45 AM
to NWChem Forum
Sorry, I have figured out the problem. I may have forgotten to change the CUDA version in the options I posted: CUDA_LIBS and CUDA_INCLUDE still point at a CUDA 10.2.89 install while the cuda/11.1.1 module is loaded. I will try again.
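
For reference, a minimal sketch of deriving those paths from the loaded module rather than hard-coding them. This assumes the cuda module exports CUDA_HOME, which is site-specific; check with `module show cuda/11.1.1` which variable your site actually sets.

# Sketch (not from the original thread): keep CUDA_LIBS/CUDA_INCLUDE in sync
# with the CUDA module that is loaded, instead of a hard-coded CUDA 10.2 path.
# Assumes the cuda/11.1.1 module sets CUDA_HOME.
module load cuda/11.1.1
export CUDA_LIBS="-L${CUDA_HOME}/lib64 -lcudart"
export CUDA_INCLUDE="-I${CUDA_HOME}/include"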

Edoardo Aprà

May 10, 2022, 2:26:29 AM
to NWChem Forum
I believe your BLAS_SIZE and SCALAPACK_SIZE settings are likely to be incorrect.
My suggestion is to do the following:
export BLAS_SIZE=4
export SCALAPACK_SIZE=4
make 64_to_32
make clean
make
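
After rebuilding, a quick sanity check is to confirm which BLIS/ScaLAPACK libraries the binary actually links. This is a sketch, assuming the libraries are shared objects; the library names on your system may differ.

# Sketch: list the BLAS/LAPACK/ScaLAPACK shared libraries linked into nwchem.
ldd $NWCHEM_TOP/bin/LINUX64/nwchem | grep -Ei 'blis|lapack|scalapack'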
