BerkeleyGW on Piz Daint


Andres Ortega

Jan 23, 2024, 2:38:25 PM
to BerkeleyGW Help
Dear Developers and community, 

I am trying to compile BerkeleyGW on Piz Daint (https://www.cscs.ch/computers/piz-daint) like this:

#! /bin/sh

module load daint-gpu
module load libxc/5.1.7-CrayNvidia-21.09
module load cray-hdf5/1.12.0.4
module load cray-netcdf/4.7.4.4
module load nvhpc/22.2
module load cray-fftw/3.3.8.10

make
make all-flavors

I am using this arch.mk:

# arch.mk for BerkeleyGW codes

# suitable for piz-daint

#
COMPFLAG  = -DGNU
PARAFLAG  = -DMPI  -DOMP
MATHFLAG  = -DUSESCALAPACK -DUNPACKED -DUSEFFTW3 -DHDF5 -DOPENACC
#
FCPP    = cpp -C -nostdinc
F90free = ftn -Mfree -acc -mp -gpu=cc70 -Mcuda=cuda11.0 -Mcudalib=cublas,cufft -Mcuda=lineinfo -traceback -Minfo=mp
LINK    = ftn        -acc -mp -gpu=cc70 -Mcuda=cuda11.0 -Mcudalib=cublas,cufft -Minfo=mp
FOPTS   = -O3 -funroll-loops -funsafe-math-optimizations
FNOOPTS = -O3 -funroll-loops -funsafe-math-optimizations
MOD_OPT = -J
INCFLAG = -I

#

C_PARAFLAG  = -DPARA -DMPICH_IGNORE_CXX_SEEK
CC_COMP = CC
C_COMP  = cc
C_LINK  = CC
C_OPTS  = -fast -mp
C_DEBUGFLAG =

FFTW_DIR = ${FFTW_ROOT}
FFTWLIB = -L${FFTW_DIR} -lfftw3_omp -lfftw3
FFTWINCLUDE = ${FFTW_INC}


#
LAPACKLIB    = -L${CRAY_LIBSCI_DIR}/GNU/8.1/x86_64/lib/ -lsci_gnu_mp -lsci_gnu -lm
SCALAPACKLIB = -L${CRAY_LIBSCI_DIR}/GNU/8.1/x86_64/lib/ -lsci_gnu_mpi_mp -lsci_gnu -lm


HDF5LIB   = -L${CRAY_HDF5_PARALLEL_DIR}/GNU/8.2/lib/ -lhdf5_hl_gnu -lhdf5hl_fortran_gnu -lhdf5_fortran_gnu -lhdf5_gnu -lz
HDF5INCLUDE  = ${CRAY_HDF5_PARALLEL_DIR}/GNU/8.2/include

with no success.

I was wondering if someone could give me advice or a suggestion.

best

Andres Ortega-Guerrero

Mauro Del Ben

Jan 23, 2024, 6:11:21 PM
to Andres Ortega, BerkeleyGW Help
Hi Andres,

The arch.mk looks about right, but you want to use -DPGI if this is BerkeleyGW-3.x (the legacy flag for the PGI compiler becomes -DNVHPC in version 4.0).
nvhpc/22.2 should provide ScaLAPACK, LAPACK, and BLAS libraries, so in principle you don't need to link libsci.
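For reference, linking the NVHPC-provided math libraries would look something like this (just a sketch, the exact path depends on how the nvhpc/22.2 module is installed on Piz Daint; NVHPC_ROOT here is a placeholder for the SDK location):

```makefile
# Placeholder NVHPC_ROOT: point it at the nvhpc/22.2 installation prefix.
# The NVHPC SDK ships ScaLAPACK/LAPACK/BLAS under its compilers/lib directory.
LAPACKLIB    = -L${NVHPC_ROOT}/compilers/lib -llapack -lblas
SCALAPACKLIB = -L${NVHPC_ROOT}/compilers/lib -lscalapack
```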
It would be useful if you can share the error message too.

Best

-M



Andres Ortega

Jan 24, 2024, 1:58:39 AM
to BerkeleyGW Help, mde...@lbl.gov, BerkeleyGW Help

Dear Mauro, 

Thank you for your email, 

I changed -DGNU to -DPGI, and also removed libsci from the bash script.
Please find attached the output I get.


Best 
Andres 
output.txt

Andres Ortega

Jan 24, 2024, 2:02:58 AM
to BerkeleyGW Help, mde...@lbl.gov, BerkeleyGW Help
Dear Mauro , 

This was the error when using -DGNU:

*************************     Building REAL flavor    **************************

make[4]: Leaving directory '/project/s1141/aortega/BerkeleyGW-3.1.0'
make[4]: Entering directory '/project/s1141/aortega/BerkeleyGW-3.1.0'
make[4]: Common/nrtype_m.mod: Command not found
make[4]: *** [Common/common-rules.mk:280: Common/nrtype_m.mod] Error 127
make[4]: Leaving directory '/project/s1141/aortega/BerkeleyGW-3.1.0'
make[3]: *** [Makefile:8: pre] Error 2
make[3]: Leaving directory '/project/s1141/aortega/BerkeleyGW-3.1.0'
make[2]: *** [Makefile:115: all] Error 2
make[2]: Leaving directory '/project/s1141/aortega/BerkeleyGW-3.1.0'
make[1]: *** [Makefile:126: real] Error 2
make[1]: Leaving directory '/project/s1141/aortega/BerkeleyGW-3.1.0'
make: *** [Makefile:119: all-flavors] Error 2

Mauro Del Ben

Jan 25, 2024, 6:46:57 PM
to Andres Ortega, BerkeleyGW Help
Hi Andres,

Get rid of the "-funroll-loops -funsafe-math-optimizations" flags (they are GNU-specific), and use MOD_OPT = -module instead of MOD_OPT = -J.
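In the arch.mk the corrected block would look roughly like this (untested sketch for nvfortran):

```makefile
# -funroll-loops / -funsafe-math-optimizations are GCC options that
# nvfortran rejects; -module is nvfortran's module-directory flag.
FOPTS   = -O3
FNOOPTS = -O3
MOD_OPT = -module
INCFLAG = -I
```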

Best

-M

Beatriz Bueno Mouriño

May 6, 2024, 4:48:41 AM
to BerkeleyGW Help, Mauro Del Ben, BerkeleyGW Help, Andres Ortega
Hello,

I have been trying to compile on Piz Daint/CSCS (GPU). 
Although I have managed to compile without errors, the tests in the testsuite seem to fail whenever the epsilon code is called.
I opened a ticket with the CSCS service desk and was asked to inquire here about the suggested way to compile BerkeleyGW on Pascal/sm60.
Whenever I try to set -mp=gpu, the code complains that it needs an architecture >= cc70. Could there be an incompatibility?

Here are the latest configurations I have tried:

loading modules:
"
#!/usr/bin/bash
module load daint-gpu craype-accel-nvidia60 && \
    module switch PrgEnv-cray PrgEnv-nvidia && \
    module load cray-fftw \
    cray-hdf5-parallel \
    cray-python && \
    module unload cray-libsci cray-libsci_acc # these are incompatible with PrgEnv-nvidia
module load intel-oneapi/2022.1.0

export CRAYPE_LINK_TYPE=dynamic
#export CRAY_NVIDIA_PREFIX=/opt/nvidia/hpc_sdk/Linux_x86_64/22.2
#export LD_LIBRARY_PATH=${CRAY_NVIDIA_PREFIX}/comm_libs/mpi/lib:${LD_LIBRARY_PATH}
#export /opt/nvidia/hpc_sdk/Linux_x86_64/21.3/compilers/lib

echo "Target accelerator: ${CRAY_ACCEL_TARGET}"

printenv | grep -E '(CRAY|CUDA|NVIDIA)'
"

# arch.mk for Piz Daint GPU
#
# Source the companion 'modules-gpu.sh' file before compiling!
#
# WARNING: this file has NOT been tested on Piz Daint
#          it's likely that the compilation flags need to be adjusted
#          in particular the GPU architecture (-gpu option)
#          and some other flags (-mp option)
#
COMPFLAG  = -DPGI
PARAFLAG  = -DMPI -DOMP
#!
# Not sure what -DOMP_TARGET does, it's undocumented
# On P100 GPUs, we cannot offload OpenMP directives, so I'm removing that flag

MATHFLAG  = -DUSESCALAPACK -DUNPACKED -DUSEFFTW3 -DHDF5 -DOPENACC

NVCC=nvcc
NVCCOPT= -O3 -use_fast_math
CUDALIB= -lcufft -lcublasLt -lcublas -lcudart -lcuda

FCPP    = /usr/bin/cpp  -C   -nostdinc
#!
# The following two lines must be adjusted
F90free = ftn -Mfree -acc -mp=multicore -gpu=cc60 -Mcudalib=cublas,cufft -traceback -gopt
LINK    = ftn -acc -mp=multicore -gpu=cc60 -Mcudalib=cublas,cufft ${CRAY_CUDATOOLKIT_POST_LINK_OPTS} # whenever I try -mp=gpu I get an error that it is incompatible with cc60 and I should use cc70 or higher
#!
FOPTS   = -fast -Mfree -Mlarge_arrays
FNOOPTS = $(FOPTS)
MOD_OPT = -module
INCFLAG = -I


C_PARAFLAG  = -DPARA -DMPICH_IGNORE_CXX_SEEK
CC_COMP = CC
C_COMP  = cc
C_LINK  = cc $(CUDALIB) -lstdc++

C_OPTS  = -fast -mp
C_DEBUGFLAG =

REMOVE  = /bin/rm -f

FFTWLIB      = $(FFTW_DIR)/libfftw3.so \
               $(FFTW_DIR)/libfftw3_threads.so \
               $(FFTW_DIR)/libfftw3_omp.so \
               $(CUDALIB) -lstdc++
FFTWINCLUDE  = $(FFTW_INC)
PERFORMANCE  =

SCALAPACKLIB =   -L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread -lm -ldl
LAPACKLIB    =   -L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread -lm -ldl
HDF5_LDIR    =  ${HDF5_DIR}/lib/
HDF5LIB      =  $(HDF5_LDIR)/libhdf5hl_fortran.so \
                $(HDF5_LDIR)/libhdf5_hl.so \
                $(HDF5_LDIR)/libhdf5_fortran.so \
                $(HDF5_LDIR)/libhdf5.so -lz -ldl
HDF5INCLUDE  = ${HDF5_DIR}/include/
"

testsuite output (only for .real flavor):

"
...

Using input file : ./Benzene-SAPO/epsilon.inp

Starting test run ...
Executing: cd /scratch/snx3000/simonpi/tmp/BGW.iqMo9T;  /usr/bin/srun -n 4   /scratch/snx3000/simonpi/berkeleygw/BerkeleyGW-4.0/testsuite/../bin/epsilon.real.x  > eps.out
srun: error: nid02072: task 3: Floating point exception (core dumped)
srun: launch/slurm: _step_signal: Terminating StepId=53114648.4
srun: error: nid02071: task 2: Floating point exception (core dumped)
srun: error: nid02069: task 0: Floating point exception (core dumped)
srun: error: nid02070: task 1: Floating point exception (core dumped)
Elapsed time:      5.6 s



Test run failed with exit code 34816.

 Execution                              :  [  FAIL  ]


Skipping subsequent steps due to nonzero exit code.

...

    Passed:  4 / 31
    Skipped: 21 / 31
    Failed:  6 / 31

    testfile                                          # failed testcases
    --------------------------------------------------------------------
    Benzene-SAPO/Benzene.test                         1
    Graphene/Graphene.test                            1
    Graphene/Graphene_3D.test                         1
    Si-EPM/Si.test                                    1
    Si-EPM/Si_hdf5.test                               1
    Si-EPM_subspace/Si_subspace.test                  1


Total run-time of the testsuite: 00:02:13

make: *** [Makefile:38: check-parallel] Error 6
"

Thank you in advance for your help.

Best,
Beatriz

Mauro Del Ben

May 6, 2024, 4:39:45 PM
to Beatriz Bueno Mouriño, BerkeleyGW Help, Andres Ortega
Hi Beatriz,

The "Floating point exception" usually happens when you compile with GPU support (as you are doing, at least with OpenACC) and then run the testsuite on a non-GPU node. Make sure that you run the testsuite in an interactive session and that the correct launch command (srun, mpirun, mpiexec, etc.) is invoked (check the testsuite folder for script examples: *.scr).
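For example, on Piz Daint something along these lines should do it (the account name is a placeholder, adjust it to your allocation and time limits):

```shell
# Grab an interactive GPU node (placeholder account), then run the
# testsuite through make so each test is launched with srun on that node.
salloc --nodes=1 --constraint=gpu --account=<your_project> --time=01:00:00
cd BerkeleyGW-4.0/testsuite
make check-parallel
```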

Note that I have never tested on Pascal/sm60 GPUs, but as long as the latest NVHPC is available and tested, all should be fine.

You can also try building the CPU version and checking that it runs fine.
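For the CPU-only build you would essentially drop the GPU flags from the arch.mk, roughly like this (untested sketch):

```makefile
# CPU-only variant: remove -DOPENACC and all -acc/-gpu/-Mcudalib options.
MATHFLAG = -DUSESCALAPACK -DUNPACKED -DUSEFFTW3 -DHDF5
F90free  = ftn -Mfree -mp=multicore -traceback
LINK     = ftn -mp=multicore
```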

Best

-M
