Hello,
I have been trying to compile BerkeleyGW on Piz Daint (CSCS), on the GPU nodes.
Although the build finishes without errors, the tests in the testsuite fail whenever the epsilon code is called.
I opened a ticket with the CSCS service desk, and they asked me to post here to find out the recommended way to compile BerkeleyGW for Pascal GPUs (sm60).
Whenever I set -mp=gpu, the compiler complains that it needs an architecture >= cc70. Could there be an incompatibility?
Here are the latest configurations I have tried.
Loading modules:
"
#!/usr/bin/bash
module load daint-gpu craype-accel-nvidia60 && \
module switch PrgEnv-cray PrgEnv-nvidia && \
module load cray-fftw \
cray-hdf5-parallel \
cray-python && \
module unload cray-libsci cray-libsci_acc # these are incompatible with PrgEnv-nvidia
module load intel-oneapi/2022.1.0
export CRAYPE_LINK_TYPE=dynamic
#export CRAY_NVIDIA_PREFIX=/opt/nvidia/hpc_sdk/Linux_x86_64/22.2
#export LD_LIBRARY_PATH=${CRAY_NVIDIA_PREFIX}/comm_libs/mpi/lib:${LD_LIBRARY_PATH}
#export /opt/nvidia/hpc_sdk/Linux_x86_64/21.3/compilers/lib
echo "Target accelerator: ${CRAY_ACCEL_TARGET}"
printenv | grep -E '(CRAY|CUDA|NVIDIA)'
"
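Before running make, a quick sanity check can confirm that the modules above actually exported the variables the arch.mk below relies on (FFTW_DIR, FFTW_INC, HDF5_DIR, MKLROOT) and the expected accelerator target. This is a minimal sketch: the function name check_build_env is hypothetical, and it assumes that loading craype-accel-nvidia60 should leave CRAY_ACCEL_TARGET set to nvidia60.

```shell
#!/bin/sh
# Hypothetical helper: verify the build environment after sourcing the module script.
# Assumption: craype-accel-nvidia60 sets CRAY_ACCEL_TARGET=nvidia60.
check_build_env() {
  local missing=0 v val
  # Variables referenced by arch.mk (FFTWLIB, FFTWINCLUDE, HDF5LIB, SCALAPACKLIB)
  for v in FFTW_DIR FFTW_INC HDF5_DIR MKLROOT; do
    eval "val=\${$v:-}"   # portable indirect expansion of the variable named in $v
    if [ -z "$val" ]; then
      echo "missing: $v" >&2
      missing=1
    fi
  done
  if [ "${CRAY_ACCEL_TARGET:-}" != "nvidia60" ]; then
    echo "unexpected CRAY_ACCEL_TARGET: '${CRAY_ACCEL_TARGET:-}'" >&2
    missing=1
  fi
  return "$missing"
}

if check_build_env; then
  echo "environment looks consistent"
else
  echo "fix the module setup before running make"
fi
```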
arch.mk:
"
#
# arch.mk for Piz Daint GPU
#
# Source the companion 'modules-gpu.sh' file before compiling!
#
# WARNING: this file has NOT been tested on Piz Daint
# it's likely that the compilation flags need to be adjusted
# in particular the GPU architecture (-gpu option)
# and some other flags (-mp option)
#
COMPFLAG = -DPGI
PARAFLAG = -DMPI -DOMP
#!
# Not sure what -DOMP_TARGET does; it's undocumented.
# On P100 GPUs, OpenMP directives cannot be offloaded (-mp=gpu needs cc70+), so I removed that flag
MATHFLAG = -DUSESCALAPACK -DUNPACKED -DUSEFFTW3 -DHDF5 -DOPENACC
NVCC=nvcc
NVCCOPT= -O3 -use_fast_math
CUDALIB= -lcufft -lcublasLt -lcublas -lcudart -lcuda
FCPP = /usr/bin/cpp -C -nostdinc
#!
# The following two lines must be adjusted
F90free = ftn -Mfree -acc -mp=multicore -gpu=cc60 -Mcudalib=cublas,cufft -traceback -gopt
LINK = ftn -acc -mp=multicore -gpu=cc60 -Mcudalib=cublas,cufft ${CRAY_CUDATOOLKIT_POST_LINK_OPTS} # whenever I try -mp=gpu instead, I get an error that it is incompatible with cc60 and requires cc70 or higher
#!
FOPTS = -fast -Mfree -Mlarge_arrays
FNOOPTS = $(FOPTS)
MOD_OPT = -module
INCFLAG = -I
C_PARAFLAG = -DPARA -DMPICH_IGNORE_CXX_SEEK
CC_COMP = CC
C_COMP = cc
C_LINK = cc $(CUDALIB) -lstdc++
C_OPTS = -fast -mp
C_DEBUGFLAG =
REMOVE = /bin/rm -f
FFTWLIB = $(FFTW_DIR)/libfftw3.so \
$(FFTW_DIR)/libfftw3_threads.so \
$(FFTW_DIR)/libfftw3_omp.so \
$(CUDALIB) -lstdc++
FFTWINCLUDE = $(FFTW_INC)
PERFORMANCE =
SCALAPACKLIB = -L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread -lm -ldl
LAPACKLIB = -L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread -lm -ldl
HDF5_LDIR = ${HDF5_DIR}/lib/
HDF5LIB = $(HDF5_LDIR)/libhdf5hl_fortran.so \
$(HDF5_LDIR)/libhdf5_hl.so \
$(HDF5_LDIR)/libhdf5_fortran.so \
$(HDF5_LDIR)/libhdf5.so -lz -ldl
HDF5INCLUDE = ${HDF5_DIR}/include/
"
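The flag choice in F90free/LINK above boils down to one rule taken from the compiler error: OpenMP offload (-mp=gpu) is only accepted for cc70 or newer, so on P100 (cc60) only OpenACC can target the GPU while OpenMP stays on the host. A minimal sketch of that rule, written as a hypothetical helper that prints the NVHPC offload flags for a given compute capability; it encodes only what the error message says and is not an officially recommended BerkeleyGW setting.

```shell
#!/bin/sh
# Hypothetical helper: print NVHPC offload flags for a compute capability (e.g. 60, 70).
# Rule taken from the compiler error quoted above: -mp=gpu requires cc70 or higher.
offload_flags() {
  local cc=$1
  if [ "$cc" -ge 70 ]; then
    # Volta or newer: OpenACC and OpenMP can both target the GPU
    echo "-acc -mp=gpu -gpu=cc${cc}"
  else
    # Pascal (cc60): OpenACC on the GPU, OpenMP threads on the host only
    echo "-acc -mp=multicore -gpu=cc${cc}"
  fi
}

offload_flags 60   # same offload flags as F90free/LINK above on P100
```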
Testsuite output (only for the .real flavor):
"
...
Using input file : ./Benzene-SAPO/epsilon.inp
Starting test run ...
Executing: cd /scratch/snx3000/simonpi/tmp/BGW.iqMo9T; /usr/bin/srun -n 4 /scratch/snx3000/simonpi/berkeleygw/BerkeleyGW-4.0/testsuite/../bin/epsilon.real.x > eps.out
srun: error: nid02072: task 3: Floating point exception (core dumped)
srun: launch/slurm: _step_signal: Terminating StepId=53114648.4
srun: error: nid02071: task 2: Floating point exception (core dumped)
srun: error: nid02069: task 0: Floating point exception (core dumped)
srun: error: nid02070: task 1: Floating point exception (core dumped)
Elapsed time: 5.6 s
Test run failed with exit code 34816.
Execution : [ FAIL ]
Skipping subsequent steps due to nonzero exit code.
...
Passed: 4 / 31
Skipped: 21 / 31
Failed: 6 / 31
testfile # failed testcases
--------------------------------------------------------------------
Benzene-SAPO/Benzene.test 1
Graphene/Graphene.test 1
Graphene/Graphene_3D.test 1
Si-EPM/Si.test 1
Si-EPM/Si_hdf5.test 1
Si-EPM_subspace/Si_subspace.test 1
Total run-time of the testsuite: 00:02:13
make: *** [Makefile:38: check-parallel] Error 6
"
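For reference, the exit code 34816 reported by the testsuite is consistent with the floating point exceptions in the srun lines, assuming it is a raw wait status (as Perl's $? would be): its high byte is srun's exit code, 34816 >> 8 = 136, and srun reports a task killed by signal N as 128 + N, so 136 - 128 = 8, which is SIGFPE on Linux. A small sketch of that decoding (the function name decode_wait_status is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: decode a raw wait status into the child's exit code and,
# when that exit code follows the 128 + N convention, the signal N.
decode_wait_status() {
  local status=$1
  local code=$(( (status >> 8) & 255 ))   # bits 8-15: exit code of srun
  if [ "$code" -gt 128 ]; then
    local sig=$(( code - 128 ))           # srun/shell convention: 128 + signal number
    echo "exit=${code} signal=${sig} (SIG$(kill -l "$sig"))"
  else
    echo "exit=${code}"
  fi
}

decode_wait_status 34816   # status reported for the epsilon run above
```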
Thank you in advance for your help.
Best,
Beatriz