Compiled BerkeleyGW code fails to execute

Fujie Tang

Jan 26, 2022, 5:31:05 PM
to BerkeleyGW Help
Dear BerkeleyGW community,
I am trying to compile BerkeleyGW (BGW) on Cori from the publicly downloadable versions 2.1 and 3.0.1. I want to compile the code myself instead of using the Cori modules because I want to add some features to the Plotxct module.
I used config/cori2.nersc.gov_intel.sharedlib.mk as my arch.mk, and the compilation finished successfully without any error messages. Before compiling, I loaded the modules for the shared-library build listed at the beginning of "cori2.nersc.gov_intel.sharedlib.mk":
module swap craype-haswell craype-mic-knl && module unload darshan && module load cray-hdf5-parallel && module swap intel intel/19.0.0.117
However, when I run a calculation with the compiled binaries, e.g. absorption.cplx.x, the job dies immediately with the following error messages:
---------

srun: error: nid07105: task 44: Segmentation fault

srun: launch/slurm: _step_signal: Terminating StepId=53852305.0

srun: error: nid02310: task 6: Segmentation fault

srun: error: nid07106: task 45: Segmentation fault

srun: error: nid02318: task 14: Segmentation fault

srun: error: nid02325: task 17: Segmentation fault

srun: error: nid02312: task 8: Segmentation fault

....

_____
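When a job dies like this, it can help to check whether every rank crashed or only the ranks on a subset of nodes. The shell function below is a hypothetical helper (not part of BerkeleyGW or Slurm); it assumes the messages have exactly the `srun: error: <node>: task <N>: Segmentation fault` form shown above and simply counts matching lines and distinct node names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize "Segmentation fault" reports in a Slurm log.
# Assumes lines shaped like "srun: error: <node>: task <N>: Segmentation fault".
summarize_segfaults() {
  local log="$1"
  awk -F': ' '
    /^srun: error: .*Segmentation fault/ {
      n++                                        # one crashed task per line
      if (!($3 in seen)) { seen[$3] = 1; m++ }   # field 3 is the node name
    }
    END { printf "%d task(s) segfaulted on %d node(s)\n", n + 0, m + 0 }
  ' "$log"
}
```

For example, `summarize_segfaults absorption.out` if srun's output was redirected to that file.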

I suspect this comes from changes to the Cori environment. Has anyone compiled BGW on Cori recently? If so, could you please share some suggestions?

Best wishes,

Fujie

Mauro Del Ben

Jan 27, 2022, 12:09:44 AM
to Fujie Tang, BerkeleyGW Help, Phillip Thomas
Hi Fujie,

It seems you also opened a ticket at NERSC for this issue; if so, great, that will make it easier to fix the problem. I compiled 3.0.1 on KNL and ran a couple of simple tests, including absorption, and everything seems fine. So we need to figure out what triggers the problem in your calculation. Can you share the input files?

Best

-M



Fujie Tang

Jan 27, 2022, 12:36:14 AM
to BerkeleyGW Help, Mauro Del Ben, Phillip Thomas, Fujie Tang
Hi Mauro,
Yes, I opened a ticket at NERSC for this issue and Phillip is kindly helping to figure it out. I also posted it here to reach a broader audience.
My absorption input file is:
------

number_val_bands_fine 128
number_val_bands_coarse 128
number_cond_bands_fine 192
number_cond_bands_coarse 192
coarse_grid_points 1
screening_semiconductor
#cell_box_truncation
spin_singlet
use_velocity
eqp_co_corrections
gaussian_broadening
energy_resolution 0.4
#dont_use_hdf5_output
write_eigenvectors -1
#avgpot 0.717915

----------

and my batch script is:

#!/bin/bash -l
#SBATCH -J npt
#SBATCH -A m3538
#SBATCH --mail-user=fujie...@temple.edu
#SBATCH -q debug
#SBATCH --time-min=00:30:00
#SBATCH -N 64
#SBATCH -t 00:30:00
#SBATCH -C knl
#SBATCH --mail-type=begin,end,fail

export OMP_NUM_THREADS=64
AB=/global/homes/f/fujie/software/BerkeleyGW-3.0.1/bin/absorption.cplx.x
srun -N 64 -n 64 -c 64 $AB &> absorption.out

----

Thanks!

Fujie


Mauro Del Ben

Jan 27, 2022, 12:54:07 AM
to Fujie Tang, BerkeleyGW Help, Phillip Thomas
Hi Fujie,

On Cori you can use the command "give -u mdelben" followed by a list of files to send them to me. If you can share all the input files, that would help.
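A small wrapper around `give` can avoid a failed transfer caused by a mistyped path. This is only a sketch: `share_inputs` is a hypothetical name, and it assumes `give -u <user> <file>...` accepts a list of files as described above. It passes through only the paths that exist and reports the rest.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around NERSC's `give`: send only the paths that exist.
share_inputs() {
  local recipient="$1"; shift
  local f
  local files=()
  for f in "$@"; do
    if [ -e "$f" ]; then
      files+=("$f")
    else
      echo "skipping missing file: $f" >&2
    fi
  done
  if [ "${#files[@]}" -eq 0 ]; then
    echo "nothing to send" >&2
    return 1
  fi
  give -u "$recipient" "${files[@]}"
}
```

For example, `share_inputs mdelben absorption.inp 02-calculate_absorption.run`.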

Best

-M

Fujie Tang

Jan 27, 2022, 1:13:43 AM
to BerkeleyGW Help, Mauro Del Ben, Phillip Thomas, Fujie Tang
Hi Mauro,
Thank you very much for your reply!!!
I sent the files to you via Cori. My working folder is select1/2-bgw/G0W0/4-absorption-192-intergal-test.

I used 02-calculate_absorption.run to submit the job.

Best wishes,

Fujie


Mauro Del Ben

Jan 27, 2022, 6:58:45 PM
to Fujie Tang, BerkeleyGW Help, Phillip Thomas
Hi Fujie,

I think your absorption calculation crashes with v3.0.1 but runs fine with 2.1 because you generated bsemat.h5 with version 2.1 and then ran absorption from version 3.0.1. Some fields in the file have changed, so you need to upgrade your old bsemat.h5 to the new format with the Python script as follows:

module load python 
BerkeleyGW-3.0.1/bin/update_bse.py bsemat.h5 
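Since update_bse.py is given only the file name, it presumably rewrites bsemat.h5 in place, so keeping a copy of the 2.1-format kernel before converting is cheap insurance. The function below is a hypothetical sketch of that, not something shipped with BerkeleyGW; the backup suffix `.v2.1.bak` and the `UPDATE_BSE` variable are my own choices, and `module load python` should still be run first.

```bash
#!/usr/bin/env bash
# Converter shipped with BerkeleyGW 3.0.1; adjust the path to your install.
UPDATE_BSE=${UPDATE_BSE:-BerkeleyGW-3.0.1/bin/update_bse.py}

# Hypothetical helper: back up a 2.1-format kernel, then upgrade it.
# Assumption: update_bse.py rewrites the file it is given in place.
upgrade_bsemat() {
  local kernel="${1:-bsemat.h5}"
  if [ ! -f "$kernel" ]; then
    echo "error: $kernel not found" >&2
    return 1
  fi
  # Never overwrite an existing backup, so re-running keeps the original copy.
  if [ ! -e "$kernel.v2.1.bak" ]; then
    cp -p "$kernel" "$kernel.v2.1.bak" || return 1
  fi
  "$UPDATE_BSE" "$kernel"
}
```

Usage: `upgrade_bsemat bsemat.h5` from the directory holding the kernel.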

Also, version 3.0.1 supports WFN files in HDF5 format, which brings the I/O time down to a couple of seconds per WFN read. To use it, add the keyword "use_wfn_hdf5" to absorption.inp and convert each WFN from binary to HDF5 format with "BerkeleyGW-3.0.1/bin/wfn2hdf5.x BIN WFN WFN.h5".
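The two steps above (converting each WFN and adding the keyword) can be scripted as below. This is a hedged sketch: `convert_wfns`, `ensure_keyword` and the `WFN2HDF5` variable are hypothetical names, the `<name>.h5` output naming simply follows the `wfn2hdf5.x BIN WFN WFN.h5` example, and which WFN files absorption reads (for instance WFN_co and WFN_fi) depends on your input.

```bash
#!/usr/bin/env bash
# wfn2hdf5.x from the BerkeleyGW 3.0.1 build; adjust the path to your install.
WFN2HDF5=${WFN2HDF5:-BerkeleyGW-3.0.1/bin/wfn2hdf5.x}

# Convert each binary WFN file to "<name>.h5", skipping files already converted.
convert_wfns() {
  local wfn
  for wfn in "$@"; do
    if [ ! -f "$wfn" ]; then
      echo "skip: $wfn not found" >&2
    elif [ -e "$wfn.h5" ]; then
      echo "skip: $wfn.h5 already exists" >&2
    else
      "$WFN2HDF5" BIN "$wfn" "$wfn.h5" || return 1
    fi
  done
  return 0
}

# Add a keyword line to an input file unless an uncommented copy is present.
ensure_keyword() {
  local inp="$1" key="$2"
  if grep -qE "^[[:space:]]*${key}([[:space:]]|\$)" "$inp" 2>/dev/null; then
    return 0
  fi
  # Make sure the new keyword starts on its own line.
  if [ -s "$inp" ] && [ -n "$(tail -c 1 "$inp")" ]; then
    printf '\n' >> "$inp"
  fi
  printf '%s\n' "$key" >> "$inp"
}
```

For example, `convert_wfns WFN_co WFN_fi` followed by `ensure_keyword absorption.inp use_wfn_hdf5`. A commented-out `#use_wfn_hdf5` line is deliberately not treated as present.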

Attached are the input, output, and submission script I used on Cori KNL.

Let me know if this works 

Best

-M

absorption.out
02-calculate_absorption.run
absorption.inp

Fujie Tang

Jan 27, 2022, 11:40:49 PM
to BerkeleyGW Help, Mauro Del Ben, Phillip Thomas, Fujie Tang
Hi Mauro,
Thank you very much for your reply! I am sorry for misleading you: I cannot run v2.1 with my own binary either :(
If possible, could you please share your arch.mk file and the modules you loaded before compiling? Either 2.1 or 3.0.1 is fine for me. Thanks,
Fujie

Mauro Del Ben

Jan 27, 2022, 11:42:49 PM
to Fujie Tang, BerkeleyGW Help, Phillip Thomas
Here you go:

# arch.mk for BerkeleyGW codes
#
# suitable for Cori (KNL) at NERSC
#
# MDB
# 2021, NERSC
#
# Run the following command before compiling:
# module swap craype-haswell craype-mic-knl && module unload darshan && module load cray-hdf5-parallel && export CRAYPE_LINK_TYPE=static
#
# Precompiler options

COMPFLAG  = -DINTEL
PARAFLAG  = -DMPI -DOMP
MATHFLAG  = -DUSESCALAPACK -DUNPACKED -DUSEFFTW3 -DHDF5 -DUSEMR3 # -DUSEELPA # -DUSEPRIMME
# Only uncomment DEBUGFLAG if you need to develop/debug BerkeleyGW.
# The output will be much more verbose, and the code will slow down by ~20%.
#DEBUGFLAG = -DDEBUG

FCPP    = /usr/bin/cpp -C -nostdinc
F90free = ftn -free -qopenmp
LINK    = ftn -qopenmp  
FOPTS   = -fast -no-ip -no-ipo -align array64byte
#FOPTS   = -fast -no-ip -no-ipo -align array64byte -traceback
# FOPTS   = -fast -no-ip -no-ipo -align array64byte -g -debug inline-debug-info -traceback -check all -ftrapuv -init=snan
FNOOPTS = $(FOPTS)
MOD_OPT = -module
INCFLAG = -I

C_PARAFLAG  = -DPARA -DMPICH_IGNORE_CXX_SEEK
CC_COMP = CC -qopenmp
C_COMP  = cc -qopenmp
C_LINK  = CC -qopenmp
C_OPTS  = -fast -no-ip -no-ipo -align #-g -traceback
C_DEBUGFLAG =

REMOVE  = /bin/rm -f

# Math Libraries
#
FFTWPATH     =
FFTWLIB      = $(MKLROOT)/lib/intel64/libmkl_scalapack_lp64.a -Wl,--start-group $(MKLROOT)/lib/intel64/libmkl_intel_lp64.a $(MKLROOT)/lib/intel64/libmkl_core.a \
               $(MKLROOT)/lib/intel64/libmkl_intel_thread.a $(MKLROOT)/lib/intel64/libmkl_blacs_intelmpi_lp64.a -Wl,--end-group -lpthread -lm -ldl

#FFTWLIB      =
FFTWINCLUDE  = $(MKLROOT)/include/fftw/

HDF5_LDIR    =  $(HDF5_DIR)/lib
# HDF5LIB      =  $(HDF5_LDIR)/libhdf5hl_fortran.a \
#                 $(HDF5_LDIR)/libhdf5_hl.a \
#                 $(HDF5_LDIR)/libhdf5_fortran.a \
#                 $(HDF5_LDIR)/libhdf5.a -lz -ldl  \
# -L/global/common/software/m1759/ipm/install/cori_intel_cray-mpich/lib -lipmf -lipm
HDF5LIB      = -L$(HDF5_LDIR)/ -lhdf5hl_fortran  -lhdf5_hl  -lhdf5_fortran  -lhdf5  -lz  -ldl #  -L/global/common/software/m1759/ipm/install/cori_intel_cray-mpich/lib -lipmf -lipm
HDF5INCLUDE  = $(HDF5_DIR)/include

PERFORMANCE  =

LAPACKLIB = $(FFTWLIB)

# ELPA_DIR = /project/projectdirs/nesap/BerkeleyGW/ELPA/CORI_ELPA2018_KNL_intel/
# ELPA_DIR = /global/cscratch1/sd/mdelben/N10_benchmarks/WORK/elpa-2018.11.001/
# ELPAINCLUDE = ${ELPA_DIR}/include/elpa_openmp-2018.11.001/modules/
# ELPALIB = ${ELPA_DIR}/lib/libelpa_openmp.a

# PRIMMELIB = /global/homes/m/mdelben/cori/PRIMME/primme-3.1.1/lib/libprimme.a

TESTSCRIPT = sbatch cori2.scr

Fujie Tang

Jan 28, 2022, 12:04:47 AM
to BerkeleyGW Help, Mauro Del Ben, Phillip Thomas, Fujie Tang
Hi Mauro,
Thanks! Is this for BGW 3.0.1 or 2.1? The modules to preload are "module swap craype-haswell craype-mic-knl && module unload darshan && module load cray-hdf5-parallel && export CRAYPE_LINK_TYPE=static", which I copied from the arch.mk you gave me; is that right? And is it for cplx or real?
I just tried with 3.0.1 and 2.1, and neither compiled successfully...
I am sorry for asking so many questions.
Best 
Fujie

Mauro Del Ben

Jan 28, 2022, 12:22:54 AM
to Fujie Tang, BerkeleyGW Help, Phillip Thomas
Hi Fujie,

I tested it for 3.0.1 and yes, before compiling you have to load the modules: 
module swap craype-haswell craype-mic-knl && module unload darshan && module load cray-hdf5-parallel && export CRAYPE_LINK_TYPE=static
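Before running make, a quick check that these modules actually took effect can save a confusing link failure. This is a hypothetical sketch, not part of BerkeleyGW: it only tests the variables the arch.mk above depends on, assuming MKLROOT comes from the Intel compiler module and HDF5_DIR from cray-hdf5-parallel, plus the CRAYPE_LINK_TYPE=static setting from the command above.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check before running make with the Cori KNL arch.mk.
check_build_env() {
  local status=0 var
  # arch.mk links MKL through $(MKLROOT) and HDF5 through $(HDF5_DIR).
  for var in MKLROOT HDF5_DIR; do
    if [ -z "${!var}" ]; then
      echo "error: $var is not set (is the right module loaded?)" >&2
      status=1
    fi
  done
  # The recipe above exports CRAYPE_LINK_TYPE=static before compiling.
  if [ "${CRAYPE_LINK_TYPE:-}" != static ]; then
    echo "warning: CRAYPE_LINK_TYPE is '${CRAYPE_LINK_TYPE:-unset}', expected 'static'" >&2
  fi
  return "$status"
}
```

Usage: run the module command, then `check_build_env && make` (with whatever make target you normally use).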

It should work for both cplx and real.
I just rebuilt from scratch and it compiles fine for me. I have attached the arch.mk.

If it still fails, send me the compilation error.

Best

-M

arch.mk

Fujie Tang

Jan 28, 2022, 9:13:09 PM
to BerkeleyGW Help, Mauro Del Ben, Phillip Thomas, Fujie Tang
Hi Mauro,
Thanks! I finally compiled both 2.1 and 3.0.1 with your arch.mk file. I also learned that bsemat.h5 has a different format in versions 2.1 and 3.0.1, which caused another crash.
Best wishes,
Fujie

On Thursday, January 27, 2022 at 23:42:49 UTC-5, Mauro Del Ben wrote: