Install issues with IBM Power9 processors with Nvidia V100 GPU

Nathan Keilbart

Mar 29, 2023, 9:12:26 PM
to cp2k
Hello everyone,

I've been working on installing CP2K on a system with IBM Power9 processors and Nvidia V100 GPUs. I'm using the toolchain with these options:

./install_cp2k_toolchain.sh -j --with-cmake=system --mpi-mode=openmpi --enable-cuda --gpu-ver=V100

It installs all the dependencies without any errors. I then copy the generated files over to the arch folder, source the setup file, and run

make -j ARCH=local_cuda VERSION=psmp

These are some of the last lines of output:

/usr/bin/env python3 /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/exts/dbcsr/tools/build_utils/fypp/bin/fypp -n --line-marker-format=gfortran5 /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/exts/dbcsr/src/tensors/dbcsr_tensor_test.F dbcsr_tensor_test.F90
c -fno-omit-frame-pointer -fopenmp -g -mtune=native  -O3 -funroll-loops    -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/openblas-0.3.21/include' -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/fftw-3.3.10/include' -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/libint-v2.6.0-cp2k-lmax-5/include' -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/libxc-6.0.0/include' -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/COSMA-2.6.2/include' -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/elpa-2022.11.001/nvidia/include/elpa_openmp-2022.11.001/modules' -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/elpa-2022.11.001/nvidia/include/elpa_openmp-2022.11.001/elpa' -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/gsl-2.7/include' -I/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/hdf5-1.12.0/include -I/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/libvdwxc-0.4.0/include -I/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/spglib-1.16.2/include -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/SpFFT-1.0.6/include' -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/SpLA-1.5.4/include/spla' -I/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/sirius-7.3.2/include/cuda -fbacktrace -ffree-form -fimplicit-none -std=f2008  -Werror=aliasing -Werror=ampersand -Werror=c-binding-type -Werror=intrinsic-shadow -Werror=intrinsics-std -Werror=line-truncation -Werror=tabs -Werror=target-lifetime -Werror=underflow -Werror=unused-but-set-variable -Werror=unused-variable -Werror=unused-dummy-argument -Werror=conversion -Werror=zerotrip -Wno-maybe-uninitialized -Wuninitialized -Wuse-without-only  -D__OFFLOAD_CUDA -D__DBCSR_ACC   -D__FFTW3  -D__LIBINT -D__LIBXC -D__SCALAPACK -D__COSMA -D__ELPA -D__ELPA_NVIDIA_GPU -D__GSL -D__HDF5 -D__LIBVDWXC 
-D__SPGLIB -D__LIBVORI -D__SPFFT  -D__OFFLOAD_GEMM  -D__SPLA -D__SIRIUS    -D__CUDA -D__SHORT_FILE__="\"dbcsr_tensor_test.F\"" -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/exts/dbcsr/src/tensors/' -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/exts/dbcsr/src' dbcsr_tensor_test.F90
/bin/sh: c: command not found
make[4]: [/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/exts/build_dbcsr//Makefile:258: dbcsr_tensor_test.o] Error 127 (ignored)
/usr/bin/env python3 /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/exts/dbcsr/tools/build_utils/fypp/bin/fypp -n --line-marker-format=gfortran5 /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/exts/dbcsr/src/tensors/dbcsr_tensor_api.F dbcsr_tensor_api.F90
c -fno-omit-frame-pointer -fopenmp -g -mtune=native  -O3 -funroll-loops    -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/openblas-0.3.21/include' -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/fftw-3.3.10/include' -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/libint-v2.6.0-cp2k-lmax-5/include' -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/libxc-6.0.0/include' -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/COSMA-2.6.2/include' -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/elpa-2022.11.001/nvidia/include/elpa_openmp-2022.11.001/modules' -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/elpa-2022.11.001/nvidia/include/elpa_openmp-2022.11.001/elpa' -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/gsl-2.7/include' -I/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/hdf5-1.12.0/include -I/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/libvdwxc-0.4.0/include -I/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/spglib-1.16.2/include -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/SpFFT-1.0.6/include' -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/SpLA-1.5.4/include/spla' -I/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/install/sirius-7.3.2/include/cuda -fbacktrace -ffree-form -fimplicit-none -std=f2008  -Werror=aliasing -Werror=ampersand -Werror=c-binding-type -Werror=intrinsic-shadow -Werror=intrinsics-std -Werror=line-truncation -Werror=tabs -Werror=target-lifetime -Werror=underflow -Werror=unused-but-set-variable -Werror=unused-variable -Werror=unused-dummy-argument -Werror=conversion -Werror=zerotrip -Wno-maybe-uninitialized -Wuninitialized -Wuse-without-only  -D__OFFLOAD_CUDA -D__DBCSR_ACC   -D__FFTW3  -D__LIBINT -D__LIBXC -D__SCALAPACK -D__COSMA -D__ELPA -D__ELPA_NVIDIA_GPU -D__GSL -D__HDF5 -D__LIBVDWXC 
-D__SPGLIB -D__LIBVORI -D__SPFFT  -D__OFFLOAD_GEMM  -D__SPLA -D__SIRIUS    -D__CUDA -D__SHORT_FILE__="\"dbcsr_tensor_api.F\"" -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/exts/dbcsr/src/tensors/' -I'/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/exts/dbcsr/src' dbcsr_tensor_api.F90
/bin/sh: c: command not found
make[4]: [/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/exts/build_dbcsr//Makefile:258: dbcsr_tensor_api.o] Error 127 (ignored)
Updating archive /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/lib/local_cuda/psmp/exts/dbcsr/libdbcsr.a
ar: creating /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/lib/local_cuda/psmp/exts/dbcsr/libdbcsr.a
ar: dbcsr_cuda_profiling.o: No such file or directory
make[4]: *** [/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/exts/build_dbcsr//Makefile:330: /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/lib/local_cuda/psmp/exts/dbcsr/libdbcsr.a] Error 1
make[3]: *** [/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/exts/build_dbcsr/Makefile:179: libdbcsr] Error 2
make[2]: *** [/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/exts/Makefile.inc:38: dbcsr] Error 2
make[1]: *** [/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/Makefile:128: psmp] Error 2
make: *** [Makefile:123: all] Error 2

It seems to be having issues with the DBCSR module. I initially had a problem here because I had left off the --recursive option when cloning; after making sure my git clone included it, I could at least build most of the serial version. That gave me the cp2k.sopt binary, which does accept inputs, though I haven't had a chance to test it much yet. To get this binary I had run

make -j ARCH=local_cuda VERSION="ssmp sdbg psmp pdbg"

as suggested.
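For anyone hitting the same missing --recursive problem, here is a minimal sketch of a check that the DBCSR submodule directory is actually populated. The exts/dbcsr path is taken from the build output above; the helper name dir_is_populated and the CP2K_ROOT variable are hypothetical and only used for illustration.

```shell
#!/usr/bin/env bash
# Sketch: detect an empty submodule directory left by a non-recursive clone.
# dir_is_populated and CP2K_ROOT are illustrative names, not part of CP2K.

# Returns 0 if the directory exists and contains at least one entry.
dir_is_populated() {
  [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

cp2k_root=${CP2K_ROOT:-$PWD}
if dir_is_populated "$cp2k_root/exts/dbcsr"; then
  echo "exts/dbcsr is populated"
else
  echo "exts/dbcsr is empty or missing; inside the clone, run:"
  echo "  git submodule update --init --recursive"
fi
```

The check only looks at whether the directory has content; it does not verify that the submodule is at the revision the CP2K release expects.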

Also, I've attempted to install with spack by using

spack install cp2k@2023.1+cosma+cuda+elpa+libint+libxc+mpi+openmp+pexsi+plumed+sirius+spglib smm=blas cuda_arch=70

These are some of the last lines of output

 >> 4028    collect2: error: ld returned 1 exit status
  >> 4029    collect2: error: ld returned 1 exit status
  >> 4030    make[3]: *** [/tmp/keilbart/spack-stage/spack-stage-cp2k-2023.1-24dhoyt24tbnn4d423glgoeqqquibmb6/spack-src/obj/linux-rhel7-power9le-gcc/psmp/
             all.dep:178: /tmp/keilbart/spack-stage/spack-stage-cp2k-2023.1-24dhoyt24tbnn4d423glgoeqqquibmb6/spack-src/exe/linux-rhel7-power9le-gcc/cp2k.p
             smp] Error 1
     4031    make[3]: *** Waiting for unfinished jobs....
  >> 4032    make[3]: *** [/tmp/keilbart/spack-stage/spack-stage-cp2k-2023.1-24dhoyt24tbnn4d423glgoeqqquibmb6/spack-src/obj/linux-rhel7-power9le-gcc/psmp/
             all.dep:194: /tmp/keilbart/spack-stage/spack-stage-cp2k-2023.1-24dhoyt24tbnn4d423glgoeqqquibmb6/spack-src/exe/linux-rhel7-power9le-gcc/libcp2
             k_unittest.psmp] Error 1
  >> 4033    make[2]: *** [/tmp/keilbart/spack-stage/spack-stage-cp2k-2023.1-24dhoyt24tbnn4d423glgoeqqquibmb6/spack-src/Makefile:146: all] Error 2
  >> 4034    make[1]: *** [/tmp/keilbart/spack-stage/spack-stage-cp2k-2023.1-24dhoyt24tbnn4d423glgoeqqquibmb6/spack-src/Makefile:128: psmp] Error 2
  >> 4035    make: *** [Makefile:123: all] Error 2

Finally, I also have some Intel machines that I'm trying to build on, and I'm having issues there as well, but we can start with the IBM machine since we're hoping to accelerate the simulations with the GPUs.

Please let me know what other information I can provide. Thank you.

Nathan

Alfio Lazzaro

Mar 30, 2023, 3:22:43 AM
to cp2k
This is not related to the DBCSR compilation itself; you see a problem in DBCSR simply because it is the first thing to be compiled in CP2K.
The error message is:

/bin/sh: c: command not found

and indeed you are using the command

c -fno-omit-frame-pointer -fopenmp -g -mtune=native  -O3 -funroll-loops    ...

for compiling: the command name is just `c`, so there is something wrong in the compiler call.
I think the problem is that the compiler definitions in the local_cuda.psmp file are wrong, namely the lines

CC             := mpicc
FC             := mpif90
LD             := mpif90
AR             := ar -r

Could you check that they point to the right commands?
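The check suggested above can be sketched as a small script. Exit status 127 in the make output is what the shell reports when a command cannot be found, so the idea is to read the CC, FC and LD assignments from the arch file and test whether the first word of each resolves to an executable. The helper name check_arch_compilers and the demo file contents are hypothetical; a real arch file may define these variables differently (for example through other make variables), in which case this simple parser will not resolve them.

```shell
#!/usr/bin/env bash
# Sketch: report whether the compiler variables in a CP2K arch file resolve to
# real executables. The variable names CC, FC and LD come from the thread;
# check_arch_compilers and the demo file are illustrative only.

check_arch_compilers() {
  local arch_file=$1 var value cmd status=0
  for var in CC FC LD; do
    # Take the last "VAR = ..." or "VAR := ..." assignment in the file.
    value=$(sed -n -E "s/^[[:space:]]*${var}[[:space:]]*:?=[[:space:]]*(.*)/\1/p" "$arch_file" | tail -n 1)
    # The first word is the command; anything after it is treated as flags.
    cmd=${value%% *}
    if [ -z "$cmd" ]; then
      echo "$var: empty or undefined"
      status=1
    elif command -v "$cmd" >/dev/null 2>&1; then
      echo "$var: $cmd found"
    else
      echo "$var: $cmd not found in PATH"
      status=1
    fi
  done
  return "$status"
}

# Demo on a throwaway file with one good, one empty and one bogus entry.
demo=$(mktemp)
printf 'CC := sh\nFC := \nLD := no_such_compiler_xyz -fopenmp\n' > "$demo"
check_arch_compilers "$demo" || echo "at least one compiler definition needs attention"
rm -f "$demo"
```

On a real system one would source the toolchain setup file first and then point the function at arch/local_cuda.psmp, so that PATH matches the build environment.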

Nathan Keilbart

Mar 30, 2023, 6:08:29 PM
to cp2k
Thanks, Alfio. I wasn't sure which file controlled that. I updated the file to use those compilers and then ran make realclean. Now I am getting this error:

/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/src/fm/cp_blacs_env.F:192:19:

             gcd_max = -1
                   1
Error: Symbol 'gcd_max' at (1) has no IMPLICIT type
/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/src/fm/cp_blacs_env.F:193:18:

             DO ipe = 1, CEILING(SQRT(REAL(npe, dp)))
                  1
Error: Symbol 'ipe' at (1) has no IMPLICIT type
/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/src/fm/cp_blacs_env.F:194:18:

                jpe = npe/ipe
                  1
Error: Symbol 'jpe' at (1) has no IMPLICIT type
/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/src/fm/cp_blacs_env.F:185:29:

          my_blacs_grid_layout = BLACS_GRID_SQUARE
                             1
Error: Symbol 'my_blacs_grid_layout' at (1) has no IMPLICIT type; did you mean 'blacs_grid_layout'?
/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/src/fm/cp_blacs_env.F:221:25:

       my_blacs_repeatable = .FALSE.
                         1
Error: Symbol 'my_blacs_repeatable' at (1) has no IMPLICIT type; did you mean 'blacs_repeatable'?
/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/src/fm/cp_blacs_env.F:213:18:

       my_row_major = .TRUE.
                  1
Error: Symbol 'my_row_major' at (1) has no IMPLICIT type; did you mean 'row_major'?
/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/src/fm/cp_blacs_env.F:174:11:

       npcol = 1
           1
Error: Symbol 'npcol' at (1) has no IMPLICIT type; did you mean 'ipcol'?
/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/src/fm/cp_blacs_env.F:175:9:

       npe = blacs_env%n_pid
         1
Error: Symbol 'npe' at (1) has no IMPLICIT type
/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/src/fm/cp_blacs_env.F:173:11:

       nprow = 1
           1
Error: Symbol 'nprow' at (1) has no IMPLICIT type; did you mean 'iprow'?
/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/src/fm/cp_blacs_env.F:188:22:

          SELECT CASE (my_blacs_grid_layout)
                      1
Error: Argument of SELECT statement at (1) cannot be UNKNOWN
make[3]: *** [/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/Makefile:519: cp_blacs_env.o] Error 1
make[2]: *** [/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/Makefile:146: all] Error 2

make[1]: *** [/usr/gapps/qsg/codes/cp2k/lassen/v2023.1/Makefile:128: psmp] Error 2
make: *** [Makefile:123: all] Error 2

Alfio Lazzaro

Mar 31, 2023, 12:35:52 AM
to cp2k
There is still something wrong in your local_cuda.psmp file.
In your output above I cannot find the flag `-D__parallel`. I see only the following:

-D__OFFLOAD_CUDA -D__DBCSR_ACC   -D__FFTW3  -D__LIBINT -D__LIBXC -D__SCALAPACK -D__COSMA -D__ELPA -D__ELPA_NVIDIA_GPU -D__GSL -D__HDF5 -D__LIBVDWXC -D__SPGLIB -D__LIBVORI -D__SPFFT  -D__OFFLOAD_GEMM  -D__SPLA -D__SIRIUS    -D__CUDA

So my guess is that the toolchain was not able to recognize MPI (no idea why). Could you add -D__parallel on top of those flags?
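A quick way to confirm whether the flag is present is to search the arch file for it. The sketch below assumes the defines appear literally as -D<symbol> tokens somewhere in the file; the helper name has_define and the demo line are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: test whether an arch file contains a given -D define as a whole token.
# has_define is an illustrative helper, not part of the CP2K toolchain.

has_define() {
  # $1: arch file, $2: preprocessor symbol without the -D prefix
  grep -Eq -- "(^|[[:space:]])-D$2([[:space:]=]|\$)" "$1"
}

# Demo on a throwaway file that mimics a few of the flags quoted above.
demo=$(mktemp)
printf 'DFLAGS = -D__OFFLOAD_CUDA -D__DBCSR_ACC -D__FFTW3\n' > "$demo"
if has_define "$demo" __parallel; then
  echo "-D__parallel is present"
else
  echo "-D__parallel is missing"
fi
rm -f "$demo"
```

Matching on token boundaries avoids a false positive from a longer symbol that merely starts with the same characters.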

Nathan Keilbart

Apr 7, 2023, 7:26:22 PM
to cp2k
Thanks, Alfio. Sorry for my late reply. It seems something in my environment was keeping MPI from being detected correctly. My scripts now detect everything correctly, and after identifying the libraries that wouldn't build I was finally able to get a working binary. One strange issue is that the -ldl flag was needed when building the parallel binary. I'm not sure if this is normally detected, but on my system and with the options I was providing it wasn't, so I simply added it to the arch files.
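One possible way to add the flag without editing the arch file by hand is sketched below. It assumes the arch file is plain GNU make syntax with a LIBS variable used at link time, so that appending a "LIBS += -ldl" line is enough; that is an assumption about the file layout, not something confirmed in this thread. The helper name add_ldl is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: append -ldl to an arch file once, assuming a make-style LIBS variable.
# add_ldl is an illustrative helper; check your own arch file before using it.

add_ldl() {
  local arch_file=$1
  # Do nothing if -ldl already appears as a separate token.
  grep -Eq '(^|[[:space:]])-ldl([[:space:]]|$)' "$arch_file" \
    || printf 'LIBS += -ldl\n' >> "$arch_file"
}

# Demo on a throwaway file; the second call leaves the file unchanged.
demo=$(mktemp)
printf 'LIBS = -lfftw3 -lstdc++\n' > "$demo"
add_ldl "$demo"
add_ldl "$demo"
cat "$demo"
rm -f "$demo"
```

Because the toolchain regenerates the arch files, a change like this would need to be reapplied after rerunning install_cp2k_toolchain.sh.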

Initially, I was getting a CUDA memory error when running my 300-atom test system on one node with four GPUs, but I have since resubmitted the job several times and it appears to be working. I'm not sure if I just got a bad node.

As I mentioned, I had to disable quite a few libraries. They install fine according to the terminal output, but when they are compiled into the binaries, the binaries misbehave and crash before even starting the initial SCF loop. Here are the flags I used:

./install_cp2k_toolchain.sh --install-all --with-cmake=system --with-openmpi=system --with-gcc=system --with-quip=no --with-libtorch=no --with-plumed=no --with-cosma=no --with-sirius=no --enable-cuda --gpu-ver=V100

In your opinion, would I get a meaningful speedup by debugging this issue? I'm primarily concerned with the COSMA and SIRIUS libraries. Once again, thank you for your help. I'm also working on an Intel system and have a working binary there, but I might have some questions, as I'm seeing very poor scaling when I use multiple nodes.

Alfio Lazzaro

Apr 8, 2023, 1:42:29 PM
to cp2k
I'm not sure what could be wrong...
I suggest compiling COSMA outside the toolchain in two steps: first build it for CPU only and test it; then, if it works, move on to the GPU build.
What's the error you get with COSMA?

I'm surprised you get an error with SIRIUS; unless you specifically use it, it should not give any error...

Nathan Keilbart

Apr 11, 2023, 2:13:47 PM
to cp2k
It seems my last post didn't go through. To clarify: I had to disable SIRIUS because it seems to hard-code a dependency on COSMA, which re-enabled COSMA every time I ran the install. At that point it just seemed easier to get a working binary first.

I have recompiled with the SIRIUS and COSMA libraries enabled. Here is the output when I run the input:

error: GPU API call : unspecified launch failure
terminate called after throwing an instance of 'std::runtime_error'
  what():  GPU ERROR

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
error: GPU API call : unspecified launch failure
terminate called after throwing an instance of 'std::runtime_error'
  what():  GPU ERROR

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0  0x20002885b34f in ???
#1  0x200028859c17 in ???
#2  0x2000000504d7 in ???
#0  0x20002885b34f in ???
#1  0x200028859c17 in ???
#2  0x2000000504d7 in ???
#3  0x200028cafcb0 in __GI_raise
        at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
#3  0x200028cafcb0 in __GI_raise
        at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
#4  0x200028cb200b in __GI_abort
        at /usr/src/debug/glibc-2.17-c758a686/stdlib/abort.c:90
#5  0x200011e3eda3 in ???
#6  0x200011e3b5d3 in ???
#7  0x200011e3b623 in ???
#8  0x200011e3baa7 in ???
#4  0x200028cb200b in __GI_abort
        at /usr/src/debug/glibc-2.17-c758a686/stdlib/abort.c:90
#5  0x200011e3eda3 in ???
#6  0x200011e3b5d3 in ???
#7  0x200011e3b623 in ???
#8  0x200011e3baa7 in ???
#9  0x13a41fdb in check_runtime_status
        at /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/libs/Tiled-MM/src/Tiled-MM/util.hpp:17
#9  0x13a41fdb in check_runtime_status
        at /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/libs/Tiled-MM/src/Tiled-MM/util.hpp:17
#10  0x13a45c6f in _ZN3gpu4gemmIdEEvRNS_9mm_handleIT_EEPS2_S5_S5_iiiS2_S2_bb
        at /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/libs/Tiled-MM/src/Tiled-MM/tiled_mm.cpp:480
#10  0x13a45c6f in _ZN3gpu4gemmIdEEvRNS_9mm_handleIT_EEPS2_S5_S5_iiiS2_S2_bb
        at /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/libs/Tiled-MM/src/Tiled-MM/tiled_mm.cpp:480
#11  0x13a01ccf in _ZN5cosma14local_multiplyIdEEvPN3gpu9mm_handleIT_EEPS3_S6_S6_iiiS3_S3_bb
        at /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/local_multiply.cpp:98
#12  0x13a01dab in _ZN5cosma14local_multiplyIdEEvPNS_13cosma_contextIT_EEPS2_S5_S5_iiiS2_S2_b
        at /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/local_multiply.cpp:168
#11  0x13a01ccf in _ZN5cosma14local_multiplyIdEEvPN3gpu9mm_handleIT_EEPS3_S6_S6_iiiS3_S3_bb
        at /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/local_multiply.cpp:98
#12  0x13a01dab in _ZN5cosma14local_multiplyIdEEvPNS_13cosma_contextIT_EEPS2_S5_S5_iiiS2_S2_b
        at /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/local_multiply.cpp:168
#13  0x139e4cd7 in _ZN5cosma8multiplyIdEEvPNS_13cosma_contextIT_EERNS_11CosmaMatrixIS2_EES7_S7_RNS_8IntervalES9_S9_S9_mRKNS_8StrategyEPNS_12communicatorES2_S2_
        at /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/multiply.cpp:382
#14  0x139e468b in _ZN5cosma8parallelIdEEvPNS_13cosma_contextIT_EERNS_11CosmaMatrixIS2_EES7_S7_RNS_8IntervalES9_S9_S9_mRKNS_8StrategyEPNS_12communicatorES2_S2_
        at /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/multiply.cpp:868
#15  0x139e4ef3 in _ZN5cosma8multiplyIdEEvPNS_13cosma_contextIT_EERNS_11CosmaMatrixIS2_EES7_S7_RNS_8IntervalES9_S9_S9_mRKNS_8StrategyEPNS_12communicatorES2_S2_
        at /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/multiply.cpp:409
#16  0x139e5197 in _ZN5cosma8multiplyIdEEvPNS_13cosma_contextIT_EERNS_11CosmaMatrixIS2_EES7_S7_RKNS_8StrategyEP19ompi_communicator_tS2_S2_
        at /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/multiply.cpp:285
#17  0x139e5393 in _ZN5cosma8multiplyIdEEvRNS_11CosmaMatrixIT_EES4_S4_RKNS_8StrategyEP19ompi_communicator_tS2_S2_
        at /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/multiply.cpp:228
#13  0x139e4cd7 in _ZN5cosma8multiplyIdEEvPNS_13cosma_contextIT_EERNS_11CosmaMatrixIS2_EES7_S7_RNS_8IntervalES9_S9_S9_mRKNS_8StrategyEPNS_12communicatorES2_S2_
        at /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/multiply.cpp:382
#14  0x139e468b in _ZN5cosma8parallelIdEEvPNS_13cosma_contextIT_EERNS_11CosmaMatrixIS2_EES7_S7_RNS_8IntervalES9_S9_S9_mRKNS_8StrategyEPNS_12communicatorES2_S2_
#15  0x139e4ef3 in _ZN5cosma8multiplyIdEEvPNS_13cosma_contextIT_EERNS_11CosmaMatrixIS2_EES7_S7_RNS_8IntervalES9_S9_S9_mRKNS_8StrategyEPNS_12communicatorES2_S2_
        at /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/multiply.cpp:409
#16  0x139e5197 in _ZN5cosma8multiplyIdEEvPNS_13cosma_contextIT_EERNS_11CosmaMatrixIS2_EES7_S7_RKNS_8StrategyEP19ompi_communicator_tS2_S2_
        at /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/multiply.cpp:285
#17  0x139e5393 in _ZN5cosma8multiplyIdEEvRNS_11CosmaMatrixIT_EES4_S4_RKNS_8StrategyEP19ompi_communicator_tS2_S2_
        at /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/multiply.cpp:228
#18  0x139b6613 in _ZN5cosma6pxgemmIdEEvcciiiT_PKS1_iiPKiS3_iiS5_S1_PS1_iiS5_
        at /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/cosma_pxgemm.cpp:350
#18  0x139b6613 in _ZN5cosma6pxgemmIdEEvcciiiT_PKS1_iiPKiS3_iiS5_S1_PS1_iiS5_
        at /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/cosma_pxgemm.cpp:350
#19  0x139aadd7 in cosma_pdgemm_
        at /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/prefixed_pxgemm.cpp:51
#19  0x139aadd7 in cosma_pdgemm_
        at /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/prefixed_pxgemm.cpp:51
#20  0x139ab62b in cosma_pdgemm
        at /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/prefixed_pxgemm.cpp:225
#20  0x139ab62b in cosma_pdgemm
        at /usr/gapps/qsg/codes/cp2k/lassen/v2023.1/tools/toolchain/build/cosma/src/cosma/prefixed_pxgemm.cpp:225
#21  0x10a5e92f in cosma_pdgemm
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/parallel_gemm_api.F:287
#22  0x10a5e92f in __parallel_gemm_api_MOD_parallel_gemm_fm
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/parallel_gemm_api.F:106
#21  0x10a5e92f in cosma_pdgemm
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/parallel_gemm_api.F:287
#22  0x10a5e92f in __parallel_gemm_api_MOD_parallel_gemm_fm
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/parallel_gemm_api.F:106
#23  0x10cc23c7 in __qs_mo_methods_MOD_make_basis_sm
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_mo_methods.F:116
#23  0x10cc23c7 in __qs_mo_methods_MOD_make_basis_sm
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_mo_methods.F:116
#24  0x11a72b37 in __qs_initial_guess_MOD_calculate_first_density_matrix
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_initial_guess.F:669
#24  0x11a72b37 in __qs_initial_guess_MOD_calculate_first_density_matrix
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_initial_guess.F:669
#25  0x10db7b5b in scf_env_initial_rho_setup
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_scf_initialization.F:1107
#26  0x10db7b5b in init_scf_run
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_scf_initialization.F:1003
#27  0x10dbac9b in __qs_scf_initialization_MOD_qs_scf_env_initialize
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_scf_initialization.F:181
#25  0x10db7b5b in scf_env_initial_rho_setup
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_scf_initialization.F:1107
#26  0x10db7b5b in init_scf_run
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_scf_initialization.F:1003
#27  0x10dbac9b in __qs_scf_initialization_MOD_qs_scf_env_initialize
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_scf_initialization.F:181
#28  0x10daf233 in __qs_scf_MOD_scf
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_scf.F:232
#28  0x10daf233 in __qs_scf_MOD_scf
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_scf.F:232
#29  0x10b283c3 in __qs_energy_MOD_qs_energies
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_energy.F:111
#29  0x10b283c3 in __qs_energy_MOD_qs_energies
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_energy.F:111
#30  0x10b5fa43 in qs_forces
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_force.F:200
#31  0x10b602ff in __qs_force_MOD_qs_calc_energy_force
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_force.F:110
#30  0x10b5fa43 in qs_forces
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_force.F:200
#31  0x10b602ff in __qs_force_MOD_qs_calc_energy_force
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/qs_force.F:110
#32  0x1079c84b in __force_env_methods_MOD_force_env_calc_energy_force
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/force_env_methods.F:259
#32  0x1079c84b in __force_env_methods_MOD_force_env_calc_energy_force
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/force_env_methods.F:259
#33  0x102f5323 in qs_mol_dyn_low
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/motion/md_run.F:371
#34  0x102f648b in __md_run_MOD_qs_mol_dyn
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/motion/md_run.F:149
#33  0x102f5323 in qs_mol_dyn_low
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/motion/md_run.F:371
#34  0x102f648b in __md_run_MOD_qs_mol_dyn
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/motion/md_run.F:149
#35  0x101e73d3 in cp2k_run
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/start/cp2k_runs.F:364
#36  0x101e91af in __cp2k_runs_MOD_run_input
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/start/cp2k_runs.F:997
#35  0x101e73d3 in cp2k_run
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/start/cp2k_runs.F:364
#36  0x101e91af in __cp2k_runs_MOD_run_input
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/start/cp2k_runs.F:997
#37  0x101e24f7 in cp2k
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/start/cp2k.F:379
#38  0x101e3ca7 in main
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/start/cp2k.F:44
#37  0x101e24f7 in cp2k
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/start/cp2k.F:379
#38  0x101e3ca7 in main
        at /usr/gapps/qsg/codes/cp2k/lassen/Debug/src/start/cp2k.F:44
ERROR:  One or more process (first noticed rank 1) terminated with signal 6

Alfio Lazzaro

Apr 12, 2023, 4:03:56 AM
to cp2k
I'm sorry, I realize I was not clear in my previous message: the error you see is not related to CP2K; it is a COSMA error. In conclusion, you cannot use the toolchain to install COSMA here; you have to do your own installation of COSMA and investigate where the problem is. You can find the COSMA installation instructions at https://github.com/eth-cscs/COSMA.
I do the following:

Run the toolchain without COSMA.
Source the install/setup.

cosma_ver=2.6.5
wget https://github.com/eth-cscs/COSMA/releases/download/v${cosma_ver}/COSMA-v${cosma_ver}.tar.gz
tar xf COSMA-v${cosma_ver}.tar.gz && rm COSMA-v${cosma_ver}.tar.gz
cd COSMA-v${cosma_ver}
mkdir build && cd build
mkdir install
cmake -DCMAKE_INSTALL_PREFIX=${PWD}/install -DCOSMA_BLAS=CUDA -DCOSMA_SCALAPACK=OPENBLAS -DCOSMA_WITH_TESTS=NO -DCOSMA_WITH_BENCHMARKS=NO -DCMAKE_CXX_COMPILER=mpic++ -DCOSMA_WITH_APPS=NO -DCOSMA_WITH_PROFILING=NO -DBUILD_SHARED_LIBS=NO ..
make && make install

You can reuse the toolchain ScaLAPACK installation.
Note that I'm building with CUDA here; my initial suggestion is to try CPU only first (i.e. -DCOSMA_BLAS=OPENBLAS).
Then you can run the toolchain again with

./install_cp2k_toolchain.sh --install-all --with-cmake=system --with-openmpi=system --with-gcc=system --with-quip=no --with-libtorch=no --with-plumed=no --with-cosma=<path to your COSMA installation> --with-sirius=no --enable-cuda --gpu-ver=V100
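Before pointing the toolchain at a hand-built COSMA, it may help to confirm that the install prefix actually contains the libraries. The sketch below assumes the installed library files have names starting with libcosma, which is an assumption about COSMA's build output rather than something stated in this thread. The helper name find_cosma_libs and the COSMA_PREFIX variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: look for COSMA library files under an install prefix.
# find_cosma_libs and COSMA_PREFIX are illustrative names; the libcosma*
# file-name pattern is an assumption about what "make install" produces.

find_cosma_libs() {
  local prefix=$1
  find "$prefix" -type f -name 'libcosma*' 2>/dev/null
}

if [ -z "${COSMA_PREFIX:-}" ]; then
  echo "set COSMA_PREFIX to the CMAKE_INSTALL_PREFIX used for COSMA"
elif [ -n "$(find_cosma_libs "$COSMA_PREFIX")" ]; then
  echo "found COSMA libraries; pass --with-cosma=$COSMA_PREFIX to the toolchain"
else
  echo "no libcosma* files under $COSMA_PREFIX"
fi
```

With the commands above, COSMA_PREFIX would be the build/install directory created inside the COSMA source tree.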

Nathan Keilbart

Apr 12, 2023, 5:24:56 PM
to cp2k
OK, thank you. I understand now what you mean. I'll work on this and get back to you. Thanks.

Nathan Keilbart

Apr 19, 2023, 5:40:16 PM
to cp2k
Thanks for the help, Alfio. By building COSMA by hand I was able to get CP2K to build with COSMA and SIRIUS.