Error in Global Array: ga_pdsyevd/ga_llt_i: descinit A failed:Received an Error in Communication

Tong Xiro

May 9, 2022, 10:04:46 AM
to NWChem Forum
I am trying to run NWChem on a cluster with NVIDIA GPUs. The installation completes successfully; however, when I run a test case with

mpirun -np 4 --mca orte_base_help_aggregate 0 $NWCHEM_TOP/bin/LINUX64/nwchem $NWCHEM_TOP/web/benchmarks/dft/siosi3.nw

the program fails with the following error:

0: ga_llt_i: descinit A failed:Received an Error in Communication
1: ga_llt_i: descinit A failed:Received an Error in Communication
2: ga_llt_i: descinit A failed:Received an Error in Communication
3: ga_llt_i: descinit A failed:Received an Error in Communication
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI COMMUNICATOR 3 DUP FROM 0
with errorcode 0.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 3 in communicator MPI COMMUNICATOR 3 DUP FROM 0
with errorcode 0.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 3 DUP FROM 0
with errorcode 0.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI COMMUNICATOR 3 DUP FROM 0
with errorcode 0.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------

Has anyone encountered this error before?

Edoardo Aprà

May 9, 2022, 1:23:47 PM
to NWChem Forum

Could you describe in some detail the settings (e.g. environment variables) used in the NWChem installation you are running?

Tong Xiro

May 9, 2022, 11:36:34 PM
to NWChem Forum
Hi, here is the .bashrc file I am using:

# gpu
module load cuda/11.1.1
module load nvhpc/22.1
module load openmpi/4.0.5-nvhpc22.1
module load aocl/3.1.0
# module load openblas/0.3.12-gcc10.2.0


# top directory of the NWChem source tree
export NWCHEM_TOP=/ocean/projects/cis210088p/sst/NWChem/gpu_ompi_ARMCI_MPI/nwchem

# your target platform
export NWCHEM_TARGET=LINUX64

export ARMCI_NETWORK=ARMCI-MPI

# General configurations
export USE_MPI=y
export USE_OPENMP=1
export NWCHEM_MODULES=qm
# export BLASOPT="-mkl"
# export BLASOPT="-lcublas"
export BLASOPT="-lblis"
export BLAS_SIZE=8
# export LAPACK_LIB="-lcublas"
export LAPACK_LIB="-lblis -llapack"
export USE_SCALAPACK=y

#! Remember to change
# export SCALAPACK="-lcublas"
export SCALAPACK="-lblis -lscalapack"
# export SCALAPACK="-mkl -lmkl_scalapack_ilp64 -lmkl_blacs_intelmpi_ilp64"
export SCALAPACK_SIZE=8

# For GPU
export FC=nvfortran
export USE_F90_ALLOCATABLE=1
export USE_OPENACC_TRPDRV=1
export NWCHEM_LINK_CUDA=1
# export DEV_GA=1 # optional

export MA_USE_CUDA_MEM=1
export TCE_CUDA=Y
export CUDA=nvcc
export CUDA_LIBS="-L/jet/packages/spack/opt/spack/linux-centos8-zen/gcc-8.3.1/cuda-10.2.89-kz7u4ix6ed53nioz4ycqin3kujcim3bs/lib64/ -lcudart"
export CUDA_FLAGS="-arch sm_70 "
export CUDA_ARCH="-arch sm70"
export CUDA_INCLUDE="-I/jet/packages/spack/opt/spack/linux-centos8-zen/gcc-8.3.1/cuda-10.2.89-kz7u4ix6ed53nioz4ycqin3kujcim3bs/include"

# export OMP_NUM_THREADS=

module load nvhpc/22.1



Tong Xiro

May 10, 2022, 2:23:45 AM
to NWChem Forum
Sorry, I have figured out the problem. I may have forgotten to change the CUDA version in the options I posted: CUDA_LIBS and CUDA_INCLUDE still point at a CUDA 10.2.89 install while the cuda/11.1.1 module is loaded. I will try again.
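
For reference, a minimal sketch of deriving those paths from the loaded module rather than hard-coding them. This assumes the cuda module exports CUDA_HOME, which is site-specific; check with `module show cuda/11.1.1` which variable your site actually sets.

# Sketch (not from the original thread): keep CUDA_LIBS/CUDA_INCLUDE in sync
# with the CUDA module that is loaded, instead of a hard-coded CUDA 10.2 path.
# Assumes the cuda/11.1.1 module sets CUDA_HOME.
module load cuda/11.1.1
export CUDA_LIBS="-L${CUDA_HOME}/lib64 -lcudart"
export CUDA_INCLUDE="-I${CUDA_HOME}/include"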

Edoardo Aprà

May 10, 2022, 2:26:29 AM
to NWChem Forum
I believe your BLAS_SIZE and SCALAPACK_SIZE settings are likely to be incorrect.
My suggestion is to do the following:
export BLAS_SIZE=4
export SCALAPACK_SIZE=4
make 64_to_32
make clean
make
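
After rebuilding, a quick sanity check is to confirm which BLIS/ScaLAPACK libraries the binary actually links. This is a sketch, assuming the libraries are shared objects; the library names on your system may differ.

# Sketch: list the BLAS/LAPACK/ScaLAPACK shared libraries linked into nwchem.
ldd $NWCHEM_TOP/bin/LINUX64/nwchem | grep -Ei 'blis|lapack|scalapack'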
