Shiyuan Gao
Jun 15, 2022, 3:00:05 PM
to BerkeleyGW Help, STANLEY, Mauro Del Ben, BerkeleyGW Help, elham oleiki
If I may follow up on this: I'm having a problem where the absorption calculation (specifically the diagonalization step) uses much more memory than expected.
For example, I'm running a 2D system with a 120*16*1 fine grid, 2 conduction bands, and 2 valence bands on 2 nodes with 96 MPI tasks, and the memory stats reported by BGW are:
Memory available: 3628.9 MB per PE
Memory required for vcoul: 114.4 MB per PE
Memory needed to store the effective Ham. and intkernel arrays: 10.8 MB per PE
Additional memory needed for evecs and diagonalization: 41.6 MB per PE
The 41.6 MB value is consistent with the formula posted by Mauro Del Ben.
However, the program dies at the diagonalization step with the error message "Program received signal SIGSEGV: Segmentation fault - invalid memory reference."
It runs successfully with 144 MPI tasks instead. I tried multiple Nk, Nv, Nc configurations, and the threshold for this error seems to be about 100 times the expected memory requirement for diagonalization.
Has anyone encountered a similar problem, or does anyone know what may be the cause?
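For scale, here is a rough back-of-the-envelope estimate of the size of the matrix being diagonalized in the example above. It is only a sketch under my own assumptions, not the formula BGW uses internally: I assume a full N x N matrix with N = Nk*Nv*Nc, complex double-precision elements (16 bytes each), and an even split across MPI tasks. The function names are made up for illustration.

```python
# Back-of-the-envelope size of the BSE Hamiltonian for the run described above.
# Assumptions (mine, not taken from the BerkeleyGW source):
#   - full N x N matrix with N = Nk * Nv * Nc
#   - complex double-precision elements (16 bytes each)
#   - matrix distributed evenly over all MPI tasks

def bse_matrix_dim(nk, nv, nc):
    """Dimension of the BSE Hamiltonian for nk k-points, nv valence, nc conduction bands."""
    return nk * nv * nc

def per_task_mib(n, ntasks, bytes_per_element=16):
    """Memory in MiB per MPI task for one evenly distributed n x n matrix."""
    return n * n * bytes_per_element / ntasks / 2**20

nk = 120 * 16 * 1          # fine k-grid from the example
n = bse_matrix_dim(nk, nv=2, nc=2)
print(f"matrix dimension N = {n}")

for ntasks in (96, 144):   # the failing and the working task counts
    print(f"{ntasks} tasks: {per_task_mib(n, ntasks):.3f} MiB per task for one copy")
```

Under these assumptions, one copy of the matrix comes to a few MiB per task at either task count, which is the same order of magnitude as the figures BGW reports and far below the 3628.9 MB available per PE.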
My arch.mk is the following, and all the other steps of BGW seem to work perfectly fine.
COMPFLAG = -DGNU
PARAFLAG = -DMPI
MATHFLAG = -DUSESCALAPACK -DUNPACKED -DUSEFFTW3 -DHDF5
FCPP = cpp -C -nostdinc
F90free = mpif90 -ffree-form -ffree-line-length-none -fno-second-underscore
LINK = mpif90 -ldl
FOPTS = -O3
FNOOPTS = $(FOPTS)
MOD_OPT = -J
INCFLAG = -I
C_PARAFLAG = -DPARA -DMPICH_IGNORE_CXX_SEEK
CC_COMP = mpicxx
C_COMP = mpicc
C_LINK = mpicxx
C_OPTS = -O3
C_DEBUGFLAG =
REMOVE = /bin/rm -f
# Math Libraries
MKLDIR = /cm/shared/apps/Intel/2020/compilers_and_libraries_2020.2.254/linux/mkl
FFTWLIB = $(MKLDIR)/lib/intel64/libmkl_scalapack_lp64.a \
-Wl,--start-group \
$(MKLDIR)/lib/intel64/libmkl_gf_lp64.a \
$(MKLDIR)/lib/intel64/libmkl_core.a \
$(MKLDIR)/lib/intel64/libmkl_sequential.a \
$(MKLDIR)/lib/intel64/libmkl_blacs_openmpi_lp64.a \
-Wl,--end-group -lpthread -lm -ldl
FFTWINCLUDE = $(MKLDIR)/include/fftw
LAPACKLIB = $(FFTWLIB)
HDF5_LDIR = /data/apps/linux-centos8-cascadelake/gcc-9.3.0/hdf5-1.10.7-moicnskm5ddwfkxskropvpedzkegilkk/lib
HDF5LIB = $(HDF5_LDIR)/libhdf5hl_fortran.a \
$(HDF5_LDIR)/libhdf5_hl.a \
$(HDF5_LDIR)/libhdf5_fortran.a \
$(HDF5_LDIR)/libhdf5.a -lz -ldl
HDF5INCLUDE = /data/apps/linux-centos8-cascadelake/gcc-9.3.0/hdf5-1.10.7-moicnskm5ddwfkxskropvpedzkegilkk/include
TESTSCRIPT = sbatch rockfish.scr
Best regards,
Shiyuan Gao