Dear all,
I've been running some AIMD calculations and am trying to speed them up a bit by changing the FFTW_PLAN_TYPE option. Unfortunately, only MEASURE and the default ESTIMATE work. If I set it to PATIENT (as recommended for long AIMD runs) or EXHAUSTIVE, the calculation crashes almost immediately with the following error messages (see also attached files):
[PATIENT]
...
corrupted double-linked list
corrupted double-linked list (not small)
cp2k.popt: malloc.c:4106: _int_malloc: Assertion `(unsigned long) (size) >= (unsigned long) (nb)' failed.
Program received signal SIGABRT: Process abort signal.
...
[EXHAUSTIVE]
...
malloc_consolidate(): unaligned fastbin chunk detected
malloc_consolidate(): unaligned fastbin chunk detected
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
corrupted double-linked list
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
malloc_consolidate(): unaligned fastbin chunk detected
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
corrupted double-linked list
cp2k.popt: malloc.c:4106: _int_malloc: Assertion `(unsigned long) (size) >= (unsigned long) (nb)' failed.
...
I'm running CP2K/2023.2-foss-2022a as compiled with EasyBuild by our HPC centre, and the same problems appear when I try the CP2K/2022.1-foss-2022a version. However, when I run it with the CP2K/7.1-intel-2020a version, which is also available, both EXHAUSTIVE and PATIENT seem to work properly. Can this be solved in some way, or will it require a different compilation of CP2K, possibly with the intel toolchain instead of the foss toolchain?
Kind regards,
Léon
Dear Leon,
I do not know the root cause of the error and cannot suggest a solution. The options themselves are tested in our regtest suite and we do not find any issues there. So it seems to be a more complicated problem, either on the FFTW side or on the CP2K side.
I have two questions:
1. Did you check whether the FFTs are actually a bottleneck in your runs? Check the runtime of the routine pw_transfer and of those starting with fft_wrap_pw1pw2 (or similar).
2. Do you have a pdbg version of CP2K available? If yes, can you run one of the failing tests with it? It might also help to turn on the keywords TRACE and TRACE_MASTER in the GLOBAL section of your input files to identify the routine on the CP2K side where the error occurs.
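As a rough illustration of point 2, the GLOBAL section could look like the sketch below. The PROJECT name and RUN_TYPE are placeholders rather than values from your input, and the exact keyword values should be checked against the manual for your CP2K version.

```
&GLOBAL
  PROJECT aimd_fftw_test   ! placeholder name, not from the original input
  RUN_TYPE MD              ! assumed here because the run is AIMD
  FFTW_PLAN_TYPE PATIENT   ! one of the failing settings; EXHAUSTIVE is the other
  TRACE .TRUE.             ! log routine entry/exit to locate the failing call
  TRACE_MASTER .TRUE.      ! restrict the trace output to the master rank
&END GLOBAL
```

The trace output is verbose, but since the crash occurs almost immediately, a very short run should be enough to capture the last routine entered before the abort.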
Regards,
Frederick


