Memory Leak in CP2K 9.1


Quentin Pessemesse

Oct 5, 2022, 6:38:44 AM
to cp2k
Dear all,
Our group is encountering a memory leak issue that makes running DFT-MD impossible with large systems (~100 atoms) on one of the clusters we have access to, even though the same calculations run correctly on other machines.
The cluster support sent me the following valgrind output and asked me to find suggestions on how to proceed. Does anyone have input on how to deal with such memory leaks?
Best,
Quentin P.

==62== Invalid write of size 4
==62==    at 0x1EA9887: grid_ref_create_task_list (in /ccc/products2/cp2k-9.1/Rhel_8__x86_64/gcc--8.3.0__openmpi--4.0.1/plumed/bin/cp2k.psmp)
==62==    by 0x1E7A772: grid_create_task_list (in /ccc/products2/cp2k-9.1/Rhel_8__x86_64/gcc--8.3.0__openmpi--4.0.1/plumed/bin/cp2k.psmp)
==62==    by 0x1E790B3: __grid_api_MOD_grid_create_task_list (grid_api.F:938)
==62==    by 0x104AA67: __task_list_methods_MOD_generate_qs_task_list (task_list_methods.F:623)
==62==    by 0xF58353: __qs_update_s_mstruct_MOD_qs_env_update_s_mstruct (qs_update_s_mstruct.F:187)
==62==    by 0xCC03AB: __qs_energy_init_MOD_qs_energies_init (qs_energy_init.F:311)
==62==    by 0xCBF0A1: __qs_energy_MOD_qs_energies (qs_energy.F:84)
==62==    by 0xCE087E: __qs_force_MOD_qs_forces (qs_force.F:212)
==62==    by 0xCE4349: __qs_force_MOD_qs_calc_energy_force (qs_force.F:117)
==62==    by 0x9AE2C0: __force_env_methods_MOD_force_env_calc_energy_force (force_env_methods.F:271)
==62==    by 0x50CD0C: __md_run_MOD_qs_mol_dyn_low (md_run.F:372)
==62==    by 0x50DCF2: __md_run_MOD_qs_mol_dyn (md_run.F:153)
==62==  Address 0x26d18670 is 16 bytes before a block of size 10 free'd
==62==    at 0x4C35FAC: free (vg_replace_malloc.c:538)
==62==    by 0x2B73E68: __offload_api_MOD_offload_timeset (offload_api.F:137)
==62==    by 0x2B60EDA: __timings_MOD_timeset_handler (timings.F:278)
==62==    by 0x2BE2C6D: __message_passing_MOD_mp_waitany (message_passing.F:4597)
==62==    by 0x2963EA5: __realspace_grid_types_MOD_rs_pw_transfer_distributed (realspace_grid_types.F:1439)
==62==    by 0x2966559: __realspace_grid_types_MOD_rs_pw_transfer (realspace_grid_types.F:711)
==62==    by 0xC9310B: __qs_collocate_density_MOD_calculate_rho_core (qs_collocate_density.F:966)
==62==    by 0xF57698: __qs_update_s_mstruct_MOD_qs_env_update_s_mstruct (qs_update_s_mstruct.F:109)
==62==    by 0xCC03AB: __qs_energy_init_MOD_qs_energies_init (qs_energy_init.F:311)
==62==    by 0xCBF0A1: __qs_energy_MOD_qs_energies (qs_energy.F:84)
==62==    by 0xCE087E: __qs_force_MOD_qs_forces (qs_force.F:212)
==62==    by 0xCE4349: __qs_force_MOD_qs_calc_energy_force (qs_force.F:117)
==62==  Block was alloc'd at
==62==    at 0x4C34DFF: malloc (vg_replace_malloc.c:307)
==62==    by 0x2F21116: _gfortrani_xmallocarray (memory.c:66)
==62==    by 0x2F1C271: _gfortran_string_trim (string_intrinsics_inc.c:167)
==62==    by 0x2B73E1C: __offload_api_MOD_offload_timeset (offload_api.F:137)
==62==    by 0x2B60EDA: __timings_MOD_timeset_handler (timings.F:278)
==62==    by 0x2BE2C6D: __message_passing_MOD_mp_waitany (message_passing.F:4597)
==62==    by 0x2963EA5: __realspace_grid_types_MOD_rs_pw_transfer_distributed (realspace_grid_types.F:1439)
==62==    by 0x2966559: __realspace_grid_types_MOD_rs_pw_transfer (realspace_grid_types.F:711)
==62==    by 0xC9310B: __qs_collocate_density_MOD_calculate_rho_core (qs_collocate_density.F:966)
==62==    by 0xF57698: __qs_update_s_mstruct_MOD_qs_env_update_s_mstruct (qs_update_s_mstruct.F:109)
==62==    by 0xCC03AB: __qs_energy_init_MOD_qs_energies_init (qs_energy_init.F:311)
==62==    by 0xCBF0A1: __qs_energy_MOD_qs_energies (qs_energy.F:84)
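For reference, a trace like the one above can be collected by running the binary under valgrind. A minimal sketch; the MPI launcher, rank count, and file names are placeholders, not the actual command used by the cluster support:

    # Run every MPI rank under valgrind; expect a large (10-50x) slowdown
    mpirun -np 4 valgrind --leak-check=full --track-origins=yes \
        cp2k.psmp -i md.inp -o md.out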

Krack Matthias (PSI)

Oct 5, 2022, 7:19:26 AM
to cp...@googlegroups.com

Hi Quentin

It seems that you are using OpenMPI, which is known to have memory leaks in some versions. Check this issue and this discussion on this forum for further information.
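To find out which OpenMPI version your build is running against, something like this usually works (assuming the same environment the binary was built with is loaded):

    # Report the OpenMPI version in the active environment
    mpirun --version
    ompi_info | grep "Open MPI:"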

HTH

Matthias


Quentin Pessemesse

Oct 6, 2023, 4:47:21 AM
to cp2k
Dear all,
The cluster staff have moved to using a Docker image of CP2K, version 2023.1 (https://hub.docker.com/r/cp2k/cp2k/tags). The program still experiences serious memory leaks: an out-of-memory crash after less than 24 hours of AIMD on a system of fewer than 100 atoms, on a node with 256 GB of RAM. The cluster cannot use IntelMPI versions older than IntelMPI 20. Is there a more recent version of CP2K that is stable and does not experience this type of large memory leak?
We've tried to compile our own versions of CP2K with multiple versions of OpenMPI, to no avail. The only stable CP2K version we have is CP2K 6.1, which runs with IntelMPI 18, but it lives on a legacy container where no new software can be installed.
Has anyone managed to use this docker image successfully, and if so, which MPI package/version did you use? If necessary, we can downgrade to CP2K 9.1.
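For anyone comparing setups: the MPI flavour baked into a given image tag can be checked without running a full job, along these lines (the tag is only an example, and the entrypoint override may not be needed for every image):

    # Ask the MPI launcher inside the image for its version
    docker run --rm --entrypoint mpirun cp2k/cp2k:2023.1 --version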
Best,
Quentin

Krack Matthias

Oct 6, 2023, 5:27:26 AM
to cp...@googlegroups.com

Hi Quentin

There are some more CP2K 2023.2 docker containers for production available (built with MPICH or OpenMPI), which can also be pulled with apptainer (see the README for details). Maybe you will have more luck with one of these.
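For example, pulling and running one of them with apptainer could look like this; tag, rank count, and file names are placeholders, and the binary may be cp2k or cp2k.psmp depending on the build:

    # Convert the docker image into a local SIF file
    apptainer pull cp2k-2023.2.sif docker://cp2k/cp2k:2023.2
    # Start the run with the host MPI launching one container per rank
    # (binding may need cluster-specific flags)
    mpirun -np 4 apptainer exec cp2k-2023.2.sif cp2k -i md.inp -o md.out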

Best

Matthias

Quentin Pessemesse

Oct 6, 2023, 8:59:02 AM
to cp2k
Dear Matthias,
Thank you kindly for your advice; I will try these different versions as soon as possible.
I've built the docker image for an OpenMPI version of CP2K on the cluster. With version 2023.1, I used to source the environment variables using "source /opt/cp2k-toolchain/install/setup", but this does not work anymore. Is it a problem on the image's end or on the cluster's end?
Best,
Quentin

Krack Matthias

Oct 6, 2023, 9:16:09 AM
to cp...@googlegroups.com

Dear Quentin

These containers are built differently, following the usual CP2K toolchain installation process. There is no /opt/cp2k-toolchain/ folder, but there is the folder /opt/cp2k/tools/toolchain/install/. There is, however, no need to source that setup file, because the entrypoint.sh script already takes care of that. You should be able to run the container as described in the README.
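A typical invocation might then look like the following; the tag, flags, and file names here are illustrative, and the README has the authoritative command:

    # Run CP2K through the container entrypoint, mounting the work directory
    docker run -it --rm --shm-size=1g -v "$PWD":/mnt -w /mnt \
        cp2k/cp2k:2023.2 mpiexec -n 4 cp2k -i md.inp -o md.out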

Best

Matthias

Quentin Pessemesse

Nov 9, 2023, 9:33:05 AM
to cp2k
Dear Matthias,
Thank you very much for your help. We were able to solve the memory leak issue by using the MPI that is compiled with the image, with some modifications to the command you provided to account for the specificities of the cluster.
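In case it helps others, the working pattern was essentially to call the MPI launcher inside the container rather than the host one; a sketch, with image name, rank count, and file names as placeholders:

    # Launch with the container's own MPI instead of the host MPI
    # (single-node example)
    apptainer exec cp2k-2023.2.sif mpiexec -n 8 cp2k -i md.inp -o md.out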
Best,
Quentin

Krack Matthias

Nov 9, 2023, 9:58:06 AM
to cp...@googlegroups.com

Dear Quentin


I am glad to read that you could solve the problems by using the provided container images.

For future reference, I would like to mention that the README and the docker files for the CP2K production containers have recently been moved to a separate repository on GitHub called cp2k-containers. Therefore, the links in my previous mail are no longer valid. The CP2K production containers can be downloaded from DockerHub here.

Best

Matthias

Quentin Pessemesse

Nov 9, 2023, 10:19:01 AM
to cp...@googlegroups.com

Dear Matthias,

Do you know if there is an easy way to use PLUMED modules beyond the default ones with these docker images? Some modules I need are deactivated by default. I have no experience in editing docker images, and this may rather be a question for the PLUMED forum...

Thank you again,

Q.


Krack Matthias

Nov 9, 2023, 11:04:29 AM
to cp...@googlegroups.com

Dear Quentin

This can’t be accomplished just by a change of the docker file; the configure command for the PLUMED build in the CP2K toolchain has to be changed too. I just applied such a change in my personal fork of the current CP2K master version (see https://github.com/mkrack/cp2k):

https://github.com/cp2k/cp2k/commit/cbe099e098ced65b0aeda353db8eb4c5e391f27f

You can do the same in your personal fork of CP2K and then use the URL of that GitHub repository in one of the master_* docker files for the “git clone” command.
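The gist, as a sketch (the linked commit shows the actual diff; the script location, variable name, and the <your-user> placeholder are illustrative):

    # 1) In the toolchain script that builds PLUMED, extend the configure
    #    call, e.g. to enable all optional modules (flag per the PLUMED
    #    manual):
    ./configure --prefix="${pkg_install_dir}" --enable-modules=all

    # 2) In one of the master_* docker files, clone your fork instead of
    #    the upstream repository:
    git clone https://github.com/<your-user>/cp2k.git cp2k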

HTH

Matthias
