absorption calculation memory problem


elham oleiki

May 11, 2022, 10:08:00 PM
to BerkeleyGW Help
Hello everybody!

I have been running an absorption calculation on a 22×22×22 fine grid for a bulk system with 20 atoms in the unit cell, but the calculation fails at the "interpolating BSE kernel" stage because of a memory problem. I have tried different numbers of cores and memory amounts (based on what is printed in absorption.out), but the job failed every time. Could someone please tell me how to estimate the amount of memory needed to finish the job?


Best Regards,
Elham

Mauro Del Ben

May 12, 2022, 4:01:50 PM
to elham oleiki, BerkeleyGW Help
Hi Elham,

A rough estimate of the memory (in MB) that you need per MPI task in absorption is:

4 * (Nk_fi * Nv_fi * Nc_fi)^2 * 16 / N_MPI / 1000 / 1000 

It's usually best to choose the number of MPI tasks (N_MPI) such that Nk_fi * Nv_fi * Nc_fi is a multiple of N_MPI.
If you run into a memory problem, it can also help to increase the ratio of OpenMP threads to MPI tasks, which gives each task more available memory.
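As a quick sanity check, the estimate above can be turned into a few lines of Python (a sketch; the function name and example numbers are mine, not part of BerkeleyGW):

```python
def absorption_mem_mb(nk_fi, nv_fi, nc_fi, n_mpi):
    """Rough per-MPI-task memory estimate (MB) for the BerkeleyGW
    absorption step: 4 copies of the complex double-precision BSE
    Hamiltonian of dimension Nk_fi * Nv_fi * Nc_fi (16 bytes/element),
    distributed over N_MPI tasks."""
    n = nk_fi * nv_fi * nc_fi  # BSE Hamiltonian dimension
    return 4 * n**2 * 16 / n_mpi / 1000 / 1000

# Illustrative example: a 1920-k-point fine grid, 2 valence and
# 2 conduction bands, distributed over 96 MPI tasks.
print(round(absorption_mem_mb(1920, 2, 2, 96), 1))  # ~39.3 MB per task
```

With the estimate in hand, you can pick N_MPI so that the per-task figure fits comfortably under the memory available per PE reported in absorption.out.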

Best

-M



STANLEY

May 13, 2022, 12:42:47 AM
to BerkeleyGW Help, Mauro Del Ben, BerkeleyGW Help, elham oleiki
elham oleiki, please do what Mauro Del Ben, who is an expert in this area, has advised, and your system will run.

Shiyuan Gao

Jun 15, 2022, 3:00:05 PM
to BerkeleyGW Help, STANLEY, Mauro Del Ben, BerkeleyGW Help, elham oleiki
If I may follow up on this, I'm having a problem where the absorption calculation (specifically the diagonalization) uses much more memory than expected.
For example, I'm running a 2D system with a 120×16×1 fine grid and 2 conduction and 2 valence bands on 2 nodes with 96 MPI tasks, and the memory stats reported by BGW are:

 Memory available: 3628.9 MB per PE
 Memory required for vcoul: 114.4 MB per PE
 Memory needed to store the effective Ham. and intkernel arrays: 10.8 MB per PE
 Additional memory needed for evecs and diagonalization: 41.6 MB per PE

The 41.6 MB value is consistent with the formula posted by Mauro Del Ben.
However, the program dies at the diagonalization step with the error message "Program received signal SIGSEGV: Segmentation fault - invalid memory reference."
It runs successfully with 144 MPI tasks instead. I have tried multiple Nk, Nv, Nc configurations, and the threshold for this error seems to be ~100 times the expected memory requirement for diagonalization.

Has anyone encountered a similar problem, or does anyone know what the cause may be?

My arch.mk is the following; all the other steps of BGW seem perfectly fine.

COMPFLAG  = -DGNU
PARAFLAG  = -DMPI
MATHFLAG  = -DUSESCALAPACK -DUNPACKED -DUSEFFTW3 -DHDF5

FCPP    = cpp -C -nostdinc
F90free = mpif90 -ffree-form -ffree-line-length-none -fno-second-underscore
LINK    = mpif90 -ldl
FOPTS   = -O3
FNOOPTS = $(FOPTS)
MOD_OPT = -J
INCFLAG = -I

C_PARAFLAG = -DPARA -DMPICH_IGNORE_CXX_SEEK
CC_COMP = mpicxx
C_COMP  = mpicc
C_LINK  = mpicxx
C_OPTS  = -O3
C_DEBUGFLAG =

REMOVE  = /bin/rm -f

# Math Libraries
MKLDIR = /cm/shared/apps/Intel/2020/compilers_and_libraries_2020.2.254/linux/mkl
FFTWLIB = $(MKLDIR)/lib/intel64/libmkl_scalapack_lp64.a \
               -Wl,--start-group \
               $(MKLDIR)/lib/intel64/libmkl_gf_lp64.a \
               $(MKLDIR)/lib/intel64/libmkl_core.a \
               $(MKLDIR)/lib/intel64/libmkl_sequential.a \
               $(MKLDIR)/lib/intel64/libmkl_blacs_openmpi_lp64.a \
               -Wl,--end-group -lpthread -lm -ldl
FFTWINCLUDE  = $(MKLDIR)/include/fftw

LAPACKLIB = $(FFTWLIB)

HDF5_LDIR    =  /data/apps/linux-centos8-cascadelake/gcc-9.3.0/hdf5-1.10.7-moicnskm5ddwfkxskropvpedzkegilkk/lib
HDF5LIB      =  $(HDF5_LDIR)/libhdf5hl_fortran.a \
                $(HDF5_LDIR)/libhdf5_hl.a \
                $(HDF5_LDIR)/libhdf5_fortran.a \
                $(HDF5_LDIR)/libhdf5.a -lz -ldl
HDF5INCLUDE  = /data/apps/linux-centos8-cascadelake/gcc-9.3.0/hdf5-1.10.7-moicnskm5ddwfkxskropvpedzkegilkk/include

TESTSCRIPT = sbatch rockfish.scr

Best regards,

Shiyuan Gao
