CP2K 7.1 memory leak


Paul Schwarz

Aug 11, 2020, 7:54:20 AM
to cp2k

Dear CP2K developers,

 

I downloaded the latest 7.1 version and used toolchain to install the libraries libint, libxc, and libxsmm with the following command:

 

$ ./install_cp2k_toolchain.sh --enable-omp --with-libxc=install --with-libint=install --with-fftw=system --with-libxsmm=install --with-mkl=system  --with-elpa=no --with-cmake=system --with-openblas=no --with-hdf5=system

 

After the successful installation, I used the resulting local.popt to build an executable (without any further modifications).

The compilers used were gcc, g++, and mpif90; in my case mpif90 comes from Intel version 2019.5.281, not OpenMPI.

 

When running relaxation simulations of molecules on slab surfaces, I see the memory usage increase steadily.

Sometimes the simulation relaxes fast enough, so that I don’t run out of memory.

Sometimes I run out of memory before the relaxation finishes.

I call cp2k_shell.popt, since I run my relaxations via the ASE interface (I guess this doesn't matter, since there is no separate shell binary; cp2k_shell.popt is just a link to cp2k.popt).

 

Does anyone else experience memory issues? I have the same issue at an HPC center, where a system admin installed the 7.1 version for me; my jobs there also crash because they run out of memory.

Does someone have a suggestion as to what the problem is or how to avoid it?

Is there maybe a compiler flag/option I should avoid or put in?

 

 

Best regards,

Paul Schwarz

Ole Schütt

Aug 11, 2020, 8:16:53 AM
to cp...@googlegroups.com
Hi Paul,

you can run the pdbg binary which has LeakSanitizer enabled. When CP2K
exits it will print a detailed report of code locations that leaked
memory.
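
One way to keep each MPI rank's report in its own file, sketched here in Python since the runs in this thread are driven from ASE, is to set LSAN_OPTIONS before launching the binary. This assumes that the sanitizer runtime in use honors the common `log_path` flag (which writes one file per process, suffixed with the PID) and that the MPI launcher forwards the environment to the ranks; both are worth checking for your setup. The helper names, the `lsan_report` prefix, and the command line in the comment are illustrative and not part of CP2K.

```python
import os
import subprocess


def lsan_env(log_prefix, base_env=None):
    """Return a copy of the environment with LSAN_OPTIONS extended by log_path.

    Assumption: the sanitizer runtime accepts the common `log_path` flag and
    writes its report to "<log_prefix>.<pid>" instead of stderr.
    """
    env = dict(os.environ if base_env is None else base_env)
    option = f"log_path={log_prefix}"
    existing = env.get("LSAN_OPTIONS", "")
    # Sanitizer options are separated by colons; keep whatever was already set.
    env["LSAN_OPTIONS"] = f"{existing}:{option}" if existing else option
    return env


def run_with_lsan(command, log_prefix):
    """Run `command` (a list of arguments) with the modified environment."""
    return subprocess.run(command, env=lsan_env(log_prefix), check=False)


# Hypothetical usage; adjust the launcher, rank count, and file names:
# run_with_lsan(["mpirun", "-np", "8", "cp2k.pdbg", "-i", "geo_opt.inp"], "lsan_report")
```

Because the report is only printed when CP2K exits, a run that is killed for running out of memory may not produce any report at all.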

Note that there are already a few known leaks which we usually ignore:


https://github.com/cp2k/cp2k/blob/master/tools/toolchain/scripts/install_gcc.sh#L165

Generally, those leaks are considered benign, but if nothing else shows
up then you might want to check them as well.

Cheers,
Ole

Paul Schwarz

Aug 11, 2020, 9:06:06 AM
to cp2k
Hi Ole,

thanks for your reply.
I will try out the pdbg binary and compare its output with the known leaks.


Paul

Paul Schwarz

Aug 12, 2020, 4:43:20 AM
to cp2k
Hi Ole,

I tried building the pdbg binary, but I came across "undefined reference" errors for libxsmm during linking:

/opt/chem/CP2K/cp2k-7.1.0/tools/build_utils/fypp -n --line-marker-format=gfortran5 /opt/chem/CP2K/cp2k-7.1.0/src/eri_mme/eri_mme_lattice_summation.F eri_mme_lattice_summation.F90
/opt/chem/CP2K/cp2k-7.1.0/tools/toolchain/install/libxsmm-1.14/lib/libxsmm.a(libxsmm_gemm.o): In function `__real_dgemm_':
libxsmm_gemm.c:(.text.__real_dgemm_+0x1): undefined reference to `dgemm_'
/opt/chem/CP2K/cp2k-7.1.0/tools/toolchain/install/libxsmm-1.14/lib/libxsmm.a(libxsmm_gemm.o): In function `__real_sgemm_':
libxsmm_gemm.c:(.text.__real_sgemm_+0x1): undefined reference to `sgemm_'
/opt/chem/CP2K/cp2k-7.1.0/tools/toolchain/install/libxsmm-1.14/lib/libxsmm.a(libxsmm_gemm.o): In function `__real_dgemv_':
libxsmm_gemm.c:(.text.__real_dgemv_+0x1): undefined reference to `dgemv_'
/opt/chem/CP2K/cp2k-7.1.0/tools/toolchain/install/libxsmm-1.14/lib/libxsmm.a(libxsmm_gemm.o): In function `__real_sgemv_':
libxsmm_gemm.c:(.text.__real_sgemv_+0x1): undefined reference to `sgemv_'
collect2: error: ld returned 1 exit status
/opt/chem/CP2K/cp2k-7.1.0/obj/local/pdbg/all.dep:108: recipe for target '/opt/chem/CP2K/cp2k-7.1.0/exe/local/graph.pdbg' failed
make[3]: *** [/opt/chem/CP2K/cp2k-7.1.0/exe/local/graph.pdbg] Error 1

I used the standard local.pdbg that was created by toolchain.
Is this some sort of linking error? Any idea how to resolve that?


Best wishes,
Paul

Ole Schütt

Aug 13, 2020, 4:29:21 AM
to cp...@googlegroups.com
Hi Paul,

that looks like https://github.com/cp2k/cp2k/issues/695.

You can either try to patch the arch file manually or simply rebuild the
toolchain using the latest version from git.

-Ole

Paul Schwarz

Aug 18, 2020, 7:09:26 AM
to cp2k
Hi Ole,

Regarding toolchain:
Yes, that was the right issue. I added the lines of code from the updated toolchain script to my own and compiled a pdbg binary.

Regarding the memory leaks:
I ran a geometry optimization for two different systems with the pdbg binary: one that does not run out of memory before finishing, and one that does run out of memory before finishing.

In the first case, I got messages from the leak sanitizer when the run finished, and I attach the output at the end of this post.
Since I run my calculations with 'nohup ... &', the sanitizer messages were written to nohup.out.
The leaks seem to come from
    /usr/lib/x86_64-linux-gnu/liblsan.so
    /opt/intel_.../linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so
    /lib/x86_64-linux-gnu/libc.so

However, I'm not familiar with debugging code, and it seems to me that 1280 bytes of leaked memory in 8 objects is not much.

In the second case, though, the leak sanitizer did not write anything to nohup.out, and the simulation just crashed as before because it ran out of memory.
I thought there might be a problem because I'm running this calculation via the ASE interface and maybe some information got lost; I don't know.
So I restarted the simulation, this time with the standard cp2k.pdbg instead of cp2k_shell.pdbg.
For this run, I'm currently tracking the memory usage of my machine, and I see it increase from ~24 GB to 35 GB over roughly 170 minutes.
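
A minimal sketch of how such tracking could be done per process on Linux, assuming the PID of one CP2K rank is known: it reads the VmRSS field from /proc/<pid>/status at a fixed interval. The function names and the sampling interval are illustrative.

```python
import os
import time


def read_rss_kib(pid=None):
    """Return the resident set size of `pid` in KiB (Linux only).

    Defaults to the current process. Returns None if no VmRSS line is found.
    """
    if pid is None:
        pid = os.getpid()
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                # The line looks like "VmRSS:     123456 kB".
                return int(line.split()[1])
    return None


def sample_rss(pid, interval_s, n_samples):
    """Collect (unix_time, rss_kib) pairs, one every `interval_s` seconds."""
    samples = []
    for _ in range(n_samples):
        samples.append((time.time(), read_rss_kib(pid)))
        time.sleep(interval_s)
    return samples


# Hypothetical usage: one sample per minute for three hours.
# for t, rss in sample_rss(12345, 60, 180):
#     print(t, rss)
```

Sampling each rank separately would show whether the growth is spread evenly over the processes or concentrated in one of them.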

Even though I did not track the memory usage for the first case, I think that the memory usage must have grown in the same way.
And I don't really understand how the reported 1280 bytes of leaked memory could lead to such a large increase of memory usage.
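
To put the two numbers side by side, here is a rough back-of-the-envelope calculation. It assumes decimal units (1 GB = 10^9 bytes), takes the approximate figures quoted above at face value, and uses the eight per-process reports of 1307 bytes each from the first run.

```python
GB = 1e9  # assumption: decimal gigabytes

# Observed growth in the second run: ~24 GB to 35 GB in roughly 170 minutes.
growth_bytes = (35 - 24) * GB
minutes = 170
rate_mb_per_min = growth_bytes / minutes / 1e6

# Leaks reported in the first run: 1307 bytes in each of 8 process reports.
reported_bytes = 8 * 1307

# How many times larger the observed growth is than the reported leaks.
ratio = growth_bytes / reported_bytes

print(f"growth rate: about {rate_mb_per_min:.0f} MB per minute")
print(f"reported leaks: {reported_bytes} bytes in total")
print(f"growth / reported leaks: about {ratio:.1e}")
```

The observed growth is about a million times larger than the reported leaks, so the leaks listed in the report cannot account for it.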


Best regards
Paul


========================================================
Leak sanitizer output:

==42842==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1280 byte(s) in 8 object(s) allocated from:
    #0 0x7f09c65047e3  (/usr/lib/x86_64-linux-gnu/liblsan.so.0+0xf7e3)
    #1 0x7f09318f36b7 in rxm_cmap_alloc_handle (/opt/devel/intel/compilers_and_libraries_2019.5.281/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so+0x86b7)

Direct leak of 27 byte(s) in 5 object(s) allocated from:
    #0 0x7f09c6503acb in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/liblsan.so.0+0xeacb)
    #1 0x7f09c34a8a29 in __strdup (/lib/x86_64-linux-gnu/libc.so.6+0x9da29)

SUMMARY: LeakSanitizer: 1307 byte(s) leaked in 13 allocation(s).

=================================================================
==42843==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1280 byte(s) in 8 object(s) allocated from:
    #0 0x7fe1887c47e3  (/usr/lib/x86_64-linux-gnu/liblsan.so.0+0xf7e3)
    #1 0x7fe0f3bf36b7 in rxm_cmap_alloc_handle (/opt/devel/intel/compilers_and_libraries_2019.5.281/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so+0x86b7)

Direct leak of 27 byte(s) in 5 object(s) allocated from:
    #0 0x7fe1887c3acb in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/liblsan.so.0+0xeacb)
    #1 0x7fe185768a29 in __strdup (/lib/x86_64-linux-gnu/libc.so.6+0x9da29)

SUMMARY: LeakSanitizer: 1307 byte(s) leaked in 13 allocation(s).

=================================================================
==42844==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1280 byte(s) in 8 object(s) allocated from:
    #0 0x7f4564a917e3  (/usr/lib/x86_64-linux-gnu/liblsan.so.0+0xf7e3)
    #1 0x7f44cfef36b7 in rxm_cmap_alloc_handle (/opt/devel/intel/compilers_and_libraries_2019.5.281/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so+0x86b7)

Direct leak of 27 byte(s) in 5 object(s) allocated from:
    #0 0x7f4564a90acb in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/liblsan.so.0+0xeacb)
    #1 0x7f4561a35a29 in __strdup (/lib/x86_64-linux-gnu/libc.so.6+0x9da29)

SUMMARY: LeakSanitizer: 1307 byte(s) leaked in 13 allocation(s).

=================================================================
==42845==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1280 byte(s) in 8 object(s) allocated from:
    #0 0x7faa81dbf7e3  (/usr/lib/x86_64-linux-gnu/liblsan.so.0+0xf7e3)
    #1 0x7fa9ed1f36b7 in rxm_cmap_alloc_handle (/opt/devel/intel/compilers_and_libraries_2019.5.281/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so+0x86b7)

Direct leak of 27 byte(s) in 5 object(s) allocated from:
    #0 0x7faa81dbeacb in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/liblsan.so.0+0xeacb)
    #1 0x7faa7ed63a29 in __strdup (/lib/x86_64-linux-gnu/libc.so.6+0x9da29)

SUMMARY: LeakSanitizer: 1307 byte(s) leaked in 13 allocation(s).

=================================================================
==42846==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1280 byte(s) in 8 object(s) allocated from:
    #0 0x7f54b0dcb7e3  (/usr/lib/x86_64-linux-gnu/liblsan.so.0+0xf7e3)
    #1 0x7f541c1f36b7 in rxm_cmap_alloc_handle (/opt/devel/intel/compilers_and_libraries_2019.5.281/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so+0x86b7)

Direct leak of 27 byte(s) in 5 object(s) allocated from:
    #0 0x7f54b0dcaacb in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/liblsan.so.0+0xeacb)
    #1 0x7f54add6fa29 in __strdup (/lib/x86_64-linux-gnu/libc.so.6+0x9da29)

SUMMARY: LeakSanitizer: 1307 byte(s) leaked in 13 allocation(s).

=================================================================
==42848==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1280 byte(s) in 8 object(s) allocated from:
    #0 0x7ff87ac327e3  (/usr/lib/x86_64-linux-gnu/liblsan.so.0+0xf7e3)
    #1 0x7ff7e5ff36b7 in rxm_cmap_alloc_handle (/opt/devel/intel/compilers_and_libraries_2019.5.281/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so+0x86b7)

Direct leak of 27 byte(s) in 5 object(s) allocated from:
    #0 0x7ff87ac31acb in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/liblsan.so.0+0xeacb)
    #1 0x7ff877bd6a29 in __strdup (/lib/x86_64-linux-gnu/libc.so.6+0x9da29)

SUMMARY: LeakSanitizer: 1307 byte(s) leaked in 13 allocation(s).

=================================================================
==42847==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1280 byte(s) in 8 object(s) allocated from:
    #0 0x7f1b63ff97e3  (/usr/lib/x86_64-linux-gnu/liblsan.so.0+0xf7e3)
    #1 0x7f1acf3f36b7 in rxm_cmap_alloc_handle (/opt/devel/intel/compilers_and_libraries_2019.5.281/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so+0x86b7)

Direct leak of 27 byte(s) in 5 object(s) allocated from:
    #0 0x7f1b63ff8acb in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/liblsan.so.0+0xeacb)
    #1 0x7f1b60f9da29 in __strdup (/lib/x86_64-linux-gnu/libc.so.6+0x9da29)

SUMMARY: LeakSanitizer: 1307 byte(s) leaked in 13 allocation(s).
=================================================================
==42841==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1280 byte(s) in 8 object(s) allocated from:
    #0 0x7f50707517e3  (/usr/lib/x86_64-linux-gnu/liblsan.so.0+0xf7e3)
    #1 0x7f4fdbaf36b7 in rxm_cmap_alloc_handle (/opt/devel/intel/compilers_and_libraries_2019.5.281/linux/mpi/intel64/libfabric/lib/prov/librxm-fi.so+0x86b7)

Direct leak of 27 byte(s) in 5 object(s) allocated from:
    #0 0x7f5070750acb in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/liblsan.so.0+0xeacb)
    #1 0x7f506d6f5a29 in __strdup (/lib/x86_64-linux-gnu/libc.so.6+0x9da29)

SUMMARY: LeakSanitizer: 1307 byte(s) leaked in 13 allocation(s).
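
For longer reports, a small script can total the per-process summaries instead of reading them by eye. The following is a sketch that assumes the report format shown above; the regular expressions and the function name are illustrative, and only symbolized frames (those containing `in <function>`) are counted.

```python
import re
from collections import Counter

# Matches e.g. "SUMMARY: LeakSanitizer: 1307 byte(s) leaked in 13 allocation(s)."
SUMMARY_RE = re.compile(
    r"SUMMARY: LeakSanitizer: (\d+) byte\(s\) leaked in (\d+) allocation\(s\)"
)
# Matches e.g. "#1 0x7f09c34a8a29 in __strdup (/lib/.../libc.so.6+0x9da29)"
FRAME_RE = re.compile(r"#\d+ 0x[0-9a-f]+ in (\S+) \((\S+)\+0x[0-9a-f]+\)")


def summarize_lsan(text):
    """Total the SUMMARY lines and count the functions named in stack frames."""
    reports = 0
    total_bytes = 0
    total_allocations = 0
    functions = Counter()
    for line in text.splitlines():
        summary = SUMMARY_RE.search(line)
        if summary:
            reports += 1
            total_bytes += int(summary.group(1))
            total_allocations += int(summary.group(2))
            continue
        frame = FRAME_RE.search(line)
        if frame:
            functions[frame.group(1)] += 1
    return {
        "reports": reports,
        "bytes": total_bytes,
        "allocations": total_allocations,
        "functions": functions,
    }


# Hypothetical usage on the file that received the sanitizer output:
# with open("nohup.out") as f:
#     print(summarize_lsan(f.read()))
```

Because it works line by line, the script does not depend on the reports of different processes being written one after another. For the eight reports above, the totals come to 8 × 1307 = 10456 bytes in 104 allocations.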


Adams ke

Feb 13, 2021, 4:42:50 AM
to cp2k
Hi,
Have you solved this problem?
