compilation problems - LHS and RHS of an assignment statement have incompatible types

bartosz mazur

Oct 8, 2024, 7:17:08 AM
to cp2k
Hi all, 

I recently managed to compile cp2k on our cluster, but regtests showed several errors. Most of the failures are due to the error `forrtl: severe (189): LHS and RHS of an assignment statement have incompatible types` or `forrtl: severe (153): allocatable array or pointer is not allocated`. After looking at the output from `make` I noticed that there are quite a few similar warnings there:

```
/lustre/pd01/hpc-kuchta-1716987452/software/cp2k/exts/dbcsr/src/mpi/dbcsr_mpiwrap.F(1930): warning #8100: The actual argument is an array section or assumed-shape array, corresponding dummy argument that has either the VOLATILE or ASYNCHRONOUS attribute shall be an assumed-shape array.   [MSGIN]
         CALL mpi_isend(msgin, msglen, MPI_LOGICAL, dest, my_tag, &
------------------------^
```

For compilation I used GCC 12.2.0 and Intel 2022.2.1. My toolchain command was `./install_cp2k_toolchain.sh --mpi-mode=intelmpi --with-intel --with-gcc=system --with-plumed --with-quip --with-pexsi --with-ptscotch --with-superlu --with-fftw=no --with-hdf5`. In the attachment I provide all outputs from the toolchain, make, and regtests.

I'm not sure what went wrong or how I should proceed, so any help will be much appreciated!

Best
Bartosz
bem_compilation_warnings.zip

Frederick Stein

Oct 8, 2024, 8:07:15 AM
to cp2k
Dear Bartosz,
If you want to compile with Intel, then drop the "--with-gcc" flag. Regarding Intel, we do not test Intel 2022.2 anymore. You should try the IntelOneAPI containing more recent compilers instead.  We are currently testing version 2024.2.
The warnings can be ignored for now, but we are aware of that issue and will make adjustments later after dropping some older compilers.
Regarding the runtime errors: the error "LHS and RHS of an assignment statement have incompatible types" could be a compiler bug (see https://community.intel.com/t5/Intel-Fortran-Compiler/Segmentation-fault-due-to-assignment-of-derived-type-variable/td-p/1489823). The allocation error may also be a compiler bug, as the respective array is always allocated and the routine is left directly after deallocating the array earlier in the routine.
Best,
Frederick

bartosz mazur

Oct 8, 2024, 8:43:19 AM
to cp2k
Hi Frederick, 

Thank you for your quick response! Just to be sure, if I compile the latest version of cp2k using Intel 2021 (https://www.cp2k.org/dev:compiler_support), I should no longer have the problems described? I ask because I don't see a module with Intel OneAPI 2024 on our HPC, so I am considering using either an older module or asking the admins to provide a newer one.
 
Best
Bartosz

Frederick Stein

Oct 8, 2024, 9:46:14 AM
to cp2k
Hi Bartosz,
No, Intel 2021 will probably not work; it is older than Intel 2022. I meant something like Intel OneAPI 2023 or 2024.
Best,
Frederick

bartosz mazur

Oct 11, 2024, 7:46:08 AM
to cp2k
Hi Frederick,

I've used Intel OneAPI 2024.2 and it helped with the error we discussed. Thanks a lot for that!

However, some tests still failed (correct: 4091 / 4227; failed: 136). Now most of the failed tests are killed without additional information with:

```
===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 172367 RUNNING AT r23c03b11
=   KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 172368 RUNNING AT r23c03b11
=   KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================
```

and sometimes this message is also printed:

```
LIBXSMM_VERSION: develop-1.17-3834 (25693946)
LIBXSMM_TARGET: clx [Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz]
Registry and code: 13 MB
Command (PID=172367): /lustre/pd01/hpc-kuchta-1716987452/software/cp2k/exe/local/cp2k.psmp 2H2O_t01.inp
Uptime: 0.932725 s
```

or 

```
LIBXSMM_VERSION: develop-1.17-3834 (25693946)
CLX/DP      TRY    JIT    STA    COL
   0..13     22     22      0      0 
  14..23     15     15      0      0 
  24..64      0      0      0      0 
Registry and code: 13 MB + 320 KB (gemm=37)
Command (PID=132831): /lustre/pd01/hpc-kuchta-1716987452/software/cp2k/exe/local/cp2k.psmp admm_dbcsr_thread_dist.inp
Uptime: 1.272898 s
```

I was able to find a similar issue (here), but I am not sure how I could fix it. I ran the regtests twice; in some cases a test finished without error the first time but failed the second time, and vice versa. For example, `QS/regtest-hfx/H2-ADMM-full.inp` was `OK` in the first run but finished with `RUNTIME FAIL` in the second run, and `QS/regtest-as-1/h2_gapw_pp_2-4.inp` finished with `OK` in the first run (as the only one in this set) but with `RUNTIME FAIL` in the second run. In the attachment I provide outputs from the toolchain, make, and the first and second regtest runs.
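One way to see which tests are unstable is to compare the per-test status of both runs. The following is only a minimal sketch, assuming the statuses have already been collected into dictionaries by hand or by some other script; the helper name `flaky_tests` and the two dictionaries are hypothetical, and only the two test names and their statuses are taken from the runs described above.

```python
def flaky_tests(first_run, second_run):
    """Return tests whose status differs between two regtest runs.

    Both arguments map a test path to its status string, for example
    "OK" or "RUNTIME FAIL". Tests present in only one run are ignored.
    """
    common = first_run.keys() & second_run.keys()
    return {
        name: (first_run[name], second_run[name])
        for name in sorted(common)
        if first_run[name] != second_run[name]
    }


# Statuses reported for two tests in the first and the second run.
run1 = {
    "QS/regtest-hfx/H2-ADMM-full.inp": "OK",
    "QS/regtest-as-1/h2_gapw_pp_2-4.inp": "OK",
}
run2 = {
    "QS/regtest-hfx/H2-ADMM-full.inp": "RUNTIME FAIL",
    "QS/regtest-as-1/h2_gapw_pp_2-4.inp": "RUNTIME FAIL",
}

for name, (before, after) in flaky_tests(run1, run2).items():
    print(f"{name}: {before} -> {after}")
```

Tests that change status between otherwise identical runs point towards a race condition or uninitialized memory rather than a deterministic bug.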

One thing I've noticed is that the toolchain is using ifort, which is an older version: `ifort (IFORT) 2021.13.0 20240602`. Do you think using ifx would be better and could maybe help solve this issue? If yes, how can I force the toolchain to use ifx instead of ifort?

Another question - none of the regtests ended in `WRONG`. Does this mean that I can assume that cp2k is safe to use and if an error occurs, the job will be killed instead of getting an erroneous result? 

Best
Bartosz

bartosz mazur

Oct 11, 2024, 7:48:42 AM
to cp2k
Sorry, forgot attachments.

intel2024.zip

Frederick Stein

Oct 11, 2024, 8:30:25 AM
to cp2k
Dear Bartosz,
If I am not mistaken, you used 8 OpenMP threads. The tests do not run that efficiently with such a large number of threads. 2 should be sufficient.
The test result suggests that most of the functionality may work but due to a missing backtrace (or similar information), it is hard to tell why they fail. You could also try to run some of the single-node tests to assess the stability of CP2K.
Best,
Frederick

bartosz mazur

Oct 18, 2024, 9:37:43 AM
to cp2k
Hi Frederick,

Thanks again for your help. I have tested different simulation variants, and I now know that the problem occurs when using OMP. For MPI calculations without OMP all tests pass. I have also tested the effect of the `OMP_PROC_BIND` and `OMP_PLACES` parameters; apart from the effect on simulation time, they have no significant effect on the presence of errors. Below are the results for ssmp:

```
OMP_PROC_BIND, OMP_PLACES, correct, total, wrong, failed, time
spread, threads, 3850, 4144, 4, 290, 186min
spread, cores, 3831, 4144, 3, 310, 183min
spread, sockets, 3864, 4144, 3, 277, 104min
close, threads, 3879, 4144, 3, 262, 171min
close, cores, 3854, 4144, 0, 290, 168min
close, sockets, 3865, 4144, 3, 276, 104min
master, threads, 4121, 4144, 0, 23, 1002min
master, cores, 4121, 4144, 0, 23, 986min
master, sockets, 3942, 4144, 3, 199, 219min
false, threads, 3918, 4144, 0, 226, 178min
false, cores, 3919, 4144, 3, 222, 176min
false, sockets, 3856, 4144, 4, 284, 104min
```
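To compare the ssmp settings more easily, the table can be loaded and ranked by the number of failed tests. This is just a sketch using Python's standard `csv` module on the figures copied from the table above; the names `SSMP_RESULTS` and `load_results` are made up for this illustration.

```python
import csv
import io

# ssmp results copied verbatim from the table above.
SSMP_RESULTS = """\
OMP_PROC_BIND, OMP_PLACES, correct, total, wrong, failed, time
spread, threads, 3850, 4144, 4, 290, 186min
spread, cores, 3831, 4144, 3, 310, 183min
spread, sockets, 3864, 4144, 3, 277, 104min
close, threads, 3879, 4144, 3, 262, 171min
close, cores, 3854, 4144, 0, 290, 168min
close, sockets, 3865, 4144, 3, 276, 104min
master, threads, 4121, 4144, 0, 23, 1002min
master, cores, 4121, 4144, 0, 23, 986min
master, sockets, 3942, 4144, 3, 199, 219min
false, threads, 3918, 4144, 0, 226, 178min
false, cores, 3919, 4144, 3, 222, 176min
false, sockets, 3856, 4144, 4, 284, 104min
"""


def load_results(text):
    """Parse the comma-separated table into a list of dictionaries."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for row in reader:
        rows.append({
            "bind": row["OMP_PROC_BIND"],
            "places": row["OMP_PLACES"],
            "correct": int(row["correct"]),
            "total": int(row["total"]),
            "wrong": int(row["wrong"]),
            "failed": int(row["failed"]),
            "minutes": int(row["time"].replace("min", "")),
        })
    return rows


rows = load_results(SSMP_RESULTS)

# Rank the settings from fewest to most failed tests.
for row in sorted(rows, key=lambda r: r["failed"]):
    rate = 100.0 * row["failed"] / row["total"]
    print(f'{row["bind"]:>7} {row["places"]:<8} failed={row["failed"]:>3} '
          f'({rate:.1f}%) wrong={row["wrong"]} time={row["minutes"]}min')
```

Sorting by the `failed` column puts the two `master` rows with 23 failures first; those are also the slowest runs in the table.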

and psmp:

```
OMP_PROC_BIND, OMP_PLACES, results
spread, threads, Summary: correct: 4097 / 4227; failed: 130; 495min
spread, cores, 26 / 362
spread, cores, 26 / 362
close, threads, Summary: correct: 4133 / 4227; failed: 94; 484min
close, cores, 60 / 362
close, sockets, 13 / 362
master, threads, 13 / 362
master, cores, 79 / 362
master, sockets, Summary: correct: 4153 / 4227; failed: 74; 563min
false, threads, Summary: correct: 4153 / 4227; failed: 74; 556min
false, cores, Summary: correct: 4106 / 4227; failed: 121; 511min
false, sockets, 96 / 362
not specified, not specified, Summary: correct: 4129 / 4227; failed: 98; 263min
```

Any ideas what I could do next to have more information about the source of the problem or maybe you see a potential solution at this stage? I would appreciate any further help. 

Best
Bartosz

Frederick Stein

Oct 18, 2024, 10:24:16 AM
to cp2k
Dear Bartosz,
What happens if you set the number of OpenMP threads to 1 (add '--ompthreads 1' to TESTOPTS)? What errors do you observe in case of the ssmp?
Best,
Frederick

bartosz mazur

Oct 18, 2024, 11:09:40 AM
to cp2k
I'm using the do_regtests.py script, not make regtesting, but I assume it makes no difference. As I mentioned in my previous message, with `--ompthreads 1` all tests passed for both ssmp and psmp. For ssmp with `--ompthreads 2` I observe errors similar to those for psmp with the same setting; I provide example output as an attachment.

Thanks
Bartosz

regtests_ssmp.out

Frederick Stein

Oct 18, 2024, 11:18:39 AM
to cp2k
Please pick one of the failing tests. Then, add the TRACE keyword to the &GLOBAL section and then run the test manually. This increases the size of the output file dramatically (to some million lines). Can you send me the last ~20 lines of the output?
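For anyone repeating this on several failing inputs, the keyword can be inserted with a small script. The sketch below is hypothetical: the helper name `enable_trace` and the sample input are invented for illustration (the sample is not the real content of any regtest file), and it assumes the `&GLOBAL` header sits on its own line.

```python
def enable_trace(input_text):
    """Insert a TRACE line right after the &GLOBAL section header.

    The text is returned unchanged if there is no &GLOBAL line or if the
    section already contains a TRACE keyword (whatever its value).
    """
    lines = input_text.splitlines()
    for start, line in enumerate(lines):
        if line.strip().upper() == "&GLOBAL":
            break
    else:
        return input_text  # no &GLOBAL section found

    for body_line in lines[start + 1:]:
        tokens = body_line.split()
        if tokens and tokens[0].upper().startswith("&END"):
            break  # end of the &GLOBAL section
        if tokens and tokens[0].upper() == "TRACE":
            return input_text  # keyword already present

    # Indent the new keyword two spaces deeper than the section header.
    indent = line[: len(line) - len(line.lstrip())]
    lines.insert(start + 1, f"{indent}  TRACE")
    return "\n".join(lines) + "\n"


# Made-up input fragment, used only to demonstrate the helper.
sample = """\
&GLOBAL
  PROJECT example
  RUN_TYPE ENERGY
&END GLOBAL
"""

print(enable_trace(sample))
```

Because the resulting output grows to millions of lines, it is easiest to keep only the tail, for example the last 20 to 100 lines as requested here.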

bartosz mazur

Oct 20, 2024, 10:47:15 AM
to cp2k
The error is:

```
LIBXSMM_VERSION: develop-1.17-3834 (25693946)
CLX/DP      TRY    JIT    STA    COL
   0..13      2      2      0      0
  14..23      0      0      0      0
  24..64      0      0      0      0
Registry and code: 13 MB + 16 KB (gemm=2)
Command (PID=2607388): /lustre/pd01/hpc-kuchta-1716987452/software/cp2k/exe/local/cp2k.psmp -i H2O-9.inp -o H2O-9.out
Uptime: 5.288243 s


===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 2607388 RUNNING AT r21c01b10
=   KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 2607389 RUNNING AT r21c01b10
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================
```

and the last 20 lines:

```
 000000:000002<<                                  13     76 pw_copy       0.001
 Hostmem: 693 MB GPUmem: 0 MB
 000000:000002>>                                  13     19 pw_derive       star
 t Hostmem: 693 MB GPUmem: 0 MB
 000000:000002<<                                  13     19 pw_derive       0.00
 2 Hostmem: 693 MB GPUmem: 0 MB
 000000:000002>>                                  13    168 pw_pool_create_pw
     start Hostmem: 693 MB GPUmem: 0 MB
 000000:000002>>                                     14     97 pw_create_c1d
    start Hostmem: 693 MB GPUmem: 0 MB
 000000:000002<<                                     14     97 pw_create_c1d
    0.000 Hostmem: 693 MB GPUmem: 0 MB
 000000:000002<<                                  13    168 pw_pool_create_pw
     0.000 Hostmem: 693 MB GPUmem: 0 MB
 000000:000002>>                                  13     77 pw_copy       start
 Hostmem: 693 MB GPUmem: 0 MB
 000000:000002<<                                  13     77 pw_copy       0.001
 Hostmem: 693 MB GPUmem: 0 MB
 000000:000002>>                                  13     20 pw_derive       star
 t Hostmem: 693 MB GPUmem: 0 MB
```
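A trace tail like this can be scanned for routines that were entered (`>>`) but never left (`<<`), which narrows down where the process died. The following is a rough sketch, not a CP2K tool: the function name `open_routines` is hypothetical, and reading the two integers as nesting depth and per-routine call counter is an interpretation of the excerpt, not something taken from the CP2K documentation. Wrapped continuation lines are simply skipped.

```python
import re

# Start of a trace record: "<digits>:<digits>" followed by ">>" (enter) or
# "<<" (leave), two integers, and the routine name. The wrapped remainder
# of a record on the following line does not match and is ignored.
RECORD = re.compile(r"^\s*\d+:\d+(>>|<<)\s+(\d+)\s+(\d+)\s+(\S+)")


def open_routines(trace_tail):
    """Return (depth, name) for routines entered but not left in the tail."""
    stack = []
    for line in trace_tail.splitlines():
        match = RECORD.match(line)
        if match is None:
            continue  # continuation line or unrelated output
        marker, depth, _count, name = match.groups()
        if marker == ">>":
            stack.append((int(depth), name))
        elif stack and stack[-1][1] == name:
            stack.pop()
        # A "<<" without a matching ">>" belongs to a routine entered
        # before the excerpt started, so it is ignored.
    return stack


# Last records of the psmp trace shown above.
TAIL = """\
 000000:000002>>                                  13     77 pw_copy       start
 Hostmem: 693 MB GPUmem: 0 MB
 000000:000002<<                                  13     77 pw_copy       0.001
 Hostmem: 693 MB GPUmem: 0 MB
 000000:000002>>                                  13     20 pw_derive       star
 t Hostmem: 693 MB GPUmem: 0 MB
"""

for depth, name in open_routines(TAIL):
    print(f"still open at depth {depth}: {name}")
```

On a short tail only the innermost open routine shows up; with more lines of context the enclosing callers would appear on the stack as well.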

Thanks!

Frederick Stein

Oct 21, 2024, 2:58:34 AM
to cp2k
Dear Bartosz,
I have no idea about the issue with LibXSMM.
Regarding the trace, I do not know either, as there is not much that could break in pw_derive (it just performs multiplications) and the sequence of operations is too unspecific. It may be that the code actually breaks somewhere else. Can you do the same with the ssmp and post the last 100 lines? This way, we remove the asynchronicity issues for backtraces with the psmp version.
Best,
Frederick

bartosz mazur

Oct 21, 2024, 10:33:45 AM
to cp2k
The error for ssmp is:

```
LIBXSMM_VERSION: develop-1.17-3834 (25693946)
CLX/DP      TRY    JIT    STA    COL
   0..13      4      4      0      0
  14..23      0      0      0      0
  24..64      0      0      0      0
Registry and code: 13 MB + 32 KB (gemm=4)
Command (PID=54845): /lustre/pd01/hpc-kuchta-1716987452/software/cp2k/exe/local/cp2k.ssmp -i H2O-9.inp -o H2O-9.out
Uptime: 2.861583 s
/var/spool/slurmd/r30c01b15/job3120330/slurm_script: line 36: 54845 Segmentation fault      (core dumped) /lustre/pd01/hpc-kuchta-1716987452/software/cp2k/exe/local/cp2k.ssmp -i H2O-9.inp -o H2O-9.out
```

and the last 100 lines of output:

```
 000000:000001>>                               12     20 mp_sum_d       start Ho
 stmem: 380 MB GPUmem: 0 MB
 000000:000001<<                               12     20 mp_sum_d       0.000 Ho
 stmem: 380 MB GPUmem: 0 MB
 000000:000001<<                            11     13 dbcsr_dot_sd       0.000 H
 ostmem: 380 MB GPUmem: 0 MB
 000000:000001<<                         10     12 calculate_ptrace_kp       0.0
 00 Hostmem: 380 MB GPUmem: 0 MB
 000000:000001<<                       9      6 evaluate_core_matrix_traces    
   0.000 Hostmem: 380 MB GPUmem: 0 MB
 000000:000001>>                       9      6 rebuild_ks_matrix       start Ho
 stmem: 380 MB GPUmem: 0 MB
 000000:000001>>                         10      6 qs_ks_build_kohn_sham_matrix
       start Hostmem: 380 MB GPUmem: 0 MB
 000000:000001>>                            11    140 pw_pool_create_pw       st
 art Hostmem: 380 MB GPUmem: 0 MB
 000000:000001>>                               12     79 pw_create_c1d       sta
 rt Hostmem: 380 MB GPUmem: 0 MB
 000000:000001<<                               12     79 pw_create_c1d       0.0
 00 Hostmem: 380 MB GPUmem: 0 MB
 000000:000001<<                            11    140 pw_pool_create_pw       0.
 000 Hostmem: 380 MB GPUmem: 0 MB
 000000:000001>>                            11    141 pw_pool_create_pw       st
 art Hostmem: 380 MB GPUmem: 0 MB
 000000:000001>>                               12     80 pw_create_c1d       sta
 rt Hostmem: 380 MB GPUmem: 0 MB
 000000:000001<<                               12     80 pw_create_c1d       0.0
 00 Hostmem: 380 MB GPUmem: 0 MB
 000000:000001<<                            11    141 pw_pool_create_pw       0.
 000 Hostmem: 380 MB GPUmem: 0 MB
 000000:000001>>                            11     61 pw_copy       start Hostme
 m: 380 MB GPUmem: 0 MB
 000000:000001<<                            11     61 pw_copy       0.004 Hostme
 m: 380 MB GPUmem: 0 MB
 000000:000001>>                            11     35 pw_axpy       start Hostme
 m: 380 MB GPUmem: 0 MB
 000000:000001<<                            11     35 pw_axpy       0.002 Hostme
 m: 380 MB GPUmem: 0 MB
 000000:000001>>                            11      6 pw_poisson_solve       sta
 rt Hostmem: 380 MB GPUmem: 0 MB
 000000:000001>>                               12      6 pw_poisson_rebuild    
   start Hostmem: 380 MB GPUmem: 0 MB
 000000:000001<<                               12      6 pw_poisson_rebuild    
   0.000 Hostmem: 380 MB GPUmem: 0 MB
 000000:000001>>                               12    142 pw_pool_create_pw      
  start Hostmem: 380 MB GPUmem: 0 MB
 000000:000001>>                                  13     81 pw_create_c1d      
 start Hostmem: 380 MB GPUmem: 0 MB
 000000:000001<<                                  13     81 pw_create_c1d      
 0.000 Hostmem: 380 MB GPUmem: 0 MB
 000000:000001<<                               12    142 pw_pool_create_pw      
  0.000 Hostmem: 380 MB GPUmem: 0 MB
 000000:000001>>                               12     62 pw_copy       start Hos
 tmem: 380 MB GPUmem: 0 MB
 000000:000001<<                               12     62 pw_copy       0.003 Hos
 tmem: 380 MB GPUmem: 0 MB
 000000:000001>>                               12      6 pw_multiply_with      
 start Hostmem: 380 MB GPUmem: 0 MB
 000000:000001<<                               12      6 pw_multiply_with      
 0.002 Hostmem: 380 MB GPUmem: 0 MB
 000000:000001>>                               12     63 pw_copy       start Hos
 tmem: 380 MB GPUmem: 0 MB
 000000:000001<<                               12     63 pw_copy       0.003 Hos
 tmem: 380 MB GPUmem: 0 MB
 000000:000001>>                               12      6 pw_integral_ab       st
 art Hostmem: 380 MB GPUmem: 0 MB
 000000:000001<<                               12      6 pw_integral_ab       0.
 005 Hostmem: 380 MB GPUmem: 0 MB
 000000:000001>>                               12      7 pw_poisson_set       st
 art Hostmem: 380 MB GPUmem: 0 MB
 000000:000001>>                                  13    143 pw_pool_create_pw  
     start Hostmem: 380 MB GPUmem: 0 MB
 000000:000001>>                                     14     82 pw_create_c1d    
    start Hostmem: 380 MB GPUmem: 0 MB
 000000:000001<<                                     14     82 pw_create_c1d    
    0.000 Hostmem: 380 MB GPUmem: 0 MB
 000000:000001<<                                  13    143 pw_pool_create_pw  
     0.000 Hostmem: 380 MB GPUmem: 0 MB
 000000:000001>>                                  13     64 pw_copy       start
 Hostmem: 380 MB GPUmem: 0 MB
 000000:000001<<                                  13     64 pw_copy       0.003
 Hostmem: 380 MB GPUmem: 0 MB
 000000:000001>>                                  13     16 pw_derive       star
 t Hostmem: 380 MB GPUmem: 0 MB
 000000:000001<<                                  13     16 pw_derive       0.00
 6 Hostmem: 380 MB GPUmem: 0 MB
 000000:000001>>                                  13    144 pw_pool_create_pw  
     start Hostmem: 380 MB GPUmem: 0 MB
 000000:000001>>                                     14     83 pw_create_c1d    
    start Hostmem: 380 MB GPUmem: 0 MB
 000000:000001<<                                     14     83 pw_create_c1d    
    0.000 Hostmem: 380 MB GPUmem: 0 MB
 000000:000001<<                                  13    144 pw_pool_create_pw  
     0.000 Hostmem: 380 MB GPUmem: 0 MB
 000000:000001>>                                  13     65 pw_copy       start
 Hostmem: 380 MB GPUmem: 0 MB
 000000:000001<<                                  13     65 pw_copy       0.004
 Hostmem: 380 MB GPUmem: 0 MB
 000000:000001>>                                  13     17 pw_derive       star
 t Hostmem: 380 MB GPUmem: 0 MB
```

For psmp, the last 100 lines are:

```
 000000:000002<<                       9      7 evaluate_core_matrix_traces    
   0.000 Hostmem: 693 MB GPUmem: 0 MB
 000000:000002>>                       9      7 rebuild_ks_matrix       start Ho

 stmem: 693 MB GPUmem: 0 MB
 000000:000002>>                         10      7 qs_ks_build_kohn_sham_matrix
       start Hostmem: 693 MB GPUmem: 0 MB
 000000:000002>>                            11    164 pw_pool_create_pw       st
 art Hostmem: 693 MB GPUmem: 0 MB
 000000:000002>>                               12     93 pw_create_c1d       sta
 rt Hostmem: 693 MB GPUmem: 0 MB
 000000:000002<<                               12     93 pw_create_c1d       0.0
 00 Hostmem: 693 MB GPUmem: 0 MB
 000000:000002<<                            11    164 pw_pool_create_pw       0.
 000 Hostmem: 693 MB GPUmem: 0 MB
 000000:000002>>                            11    165 pw_pool_create_pw       st
 art Hostmem: 693 MB GPUmem: 0 MB
 000000:000002>>                               12     94 pw_create_c1d       sta
 rt Hostmem: 693 MB GPUmem: 0 MB
 000000:000002<<                               12     94 pw_create_c1d       0.0
 00 Hostmem: 693 MB GPUmem: 0 MB
 000000:000002<<                            11    165 pw_pool_create_pw       0.
 000 Hostmem: 693 MB GPUmem: 0 MB
 000000:000002>>                            11     73 pw_copy       start Hostme

 m: 693 MB GPUmem: 0 MB
 000000:000002<<                            11     73 pw_copy       0.001 Hostme

 m: 693 MB GPUmem: 0 MB
 000000:000002>>                            11     41 pw_axpy       start Hostme

 m: 693 MB GPUmem: 0 MB
 000000:000002<<                            11     41 pw_axpy       0.001 Hostme

 m: 693 MB GPUmem: 0 MB
 000000:000002>>                            11     52 mp_sum_d       start Hostm

 em: 693 MB GPUmem: 0 MB
 000000:000002<<                            11     52 mp_sum_d       0.000 Hostm

 em: 693 MB GPUmem: 0 MB
 000000:000002>>                            11      7 pw_poisson_solve       sta
 rt Hostmem: 693 MB GPUmem: 0 MB
 000000:000002>>                               12      7 pw_poisson_rebuild    
   start Hostmem: 693 MB GPUmem: 0 MB
 000000:000002<<                               12      7 pw_poisson_rebuild    
   0.000 Hostmem: 693 MB GPUmem: 0 MB
 000000:000002>>                               12    166 pw_pool_create_pw      

  start Hostmem: 693 MB GPUmem: 0 MB
 000000:000002>>                                  13     95 pw_create_c1d      
 start Hostmem: 693 MB GPUmem: 0 MB
 000000:000002<<                                  13     95 pw_create_c1d      
 0.000 Hostmem: 693 MB GPUmem: 0 MB
 000000:000002<<                               12    166 pw_pool_create_pw      

  0.000 Hostmem: 693 MB GPUmem: 0 MB
 000000:000002>>                               12     74 pw_copy       start Hos

 tmem: 693 MB GPUmem: 0 MB
 000000:000002<<                               12     74 pw_copy       0.001 Hos

 tmem: 693 MB GPUmem: 0 MB
 000000:000002>>                               12      7 pw_multiply_with      
 start Hostmem: 693 MB GPUmem: 0 MB
 000000:000002<<                               12      7 pw_multiply_with      
 0.001 Hostmem: 693 MB GPUmem: 0 MB
 000000:000002>>                               12     75 pw_copy       start Hos

 tmem: 693 MB GPUmem: 0 MB
 000000:000002<<                               12     75 pw_copy       0.001 Hos

 tmem: 693 MB GPUmem: 0 MB
 000000:000002>>                               12      7 pw_integral_ab       st
 art Hostmem: 693 MB GPUmem: 0 MB
 000000:000002>>                                  13     53 mp_sum_d       start

  Hostmem: 693 MB GPUmem: 0 MB
 000000:000002<<                                  13     53 mp_sum_d       0.000

  Hostmem: 693 MB GPUmem: 0 MB
 000000:000002<<                               12      7 pw_integral_ab       0.
 003 Hostmem: 693 MB GPUmem: 0 MB
 000000:000002>>                               12      8 pw_poisson_set       st
 art Hostmem: 693 MB GPUmem: 0 MB
 000000:000002>>                                  13    167 pw_pool_create_pw  
     start Hostmem: 693 MB GPUmem: 0 MB
 000000:000002>>                                     14     96 pw_create_c1d    

    start Hostmem: 693 MB GPUmem: 0 MB
 000000:000002<<                                     14     96 pw_create_c1d    

    0.000 Hostmem: 693 MB GPUmem: 0 MB
 000000:000002<<                                  13    167 pw_pool_create_pw  
     0.000 Hostmem: 693 MB GPUmem: 0 MB
 000000:000002>>                                  13     76 pw_copy       start
 Hostmem: 693 MB GPUmem: 0 MB
 000000:000002<<                                  13     76 pw_copy       0.001
 Hostmem: 693 MB GPUmem: 0 MB
 000000:000002>>                                  13     19 pw_derive       star
 t Hostmem: 693 MB GPUmem: 0 MB
 000000:000002<<                                  13     19 pw_derive       0.00
 2 Hostmem: 693 MB GPUmem: 0 MB
 000000:000002>>                                  13    168 pw_pool_create_pw  
     start Hostmem: 693 MB GPUmem: 0 MB
 000000:000002>>                                     14     97 pw_create_c1d    
    start Hostmem: 693 MB GPUmem: 0 MB
 000000:000002<<                                     14     97 pw_create_c1d    
    0.000 Hostmem: 693 MB GPUmem: 0 MB
 000000:000002<<                                  13    168 pw_pool_create_pw  
     0.000 Hostmem: 693 MB GPUmem: 0 MB
 000000:000002>>                                  13     77 pw_copy       start
 Hostmem: 693 MB GPUmem: 0 MB
 000000:000002<<                                  13     77 pw_copy       0.001
 Hostmem: 693 MB GPUmem: 0 MB
 000000:000002>>                                  13     20 pw_derive       star
 t Hostmem: 693 MB GPUmem: 0 MB
```

Thanks
Bartosz

Frederick Stein

Oct 22, 2024, 5:12:57 AM
to cp2k
Dear Bartosz,
I am currently running some tests with the latest Intel compiler myself. What bothers me about your setup is the module GCC13/13.3.0. Why is it loaded? Can you unload it? This would at least reduce potential interference between the Intel and the GCC compilers.
Best,
Frederick

bartosz mazur

Oct 22, 2024, 5:58:57 AM
to cp2k
I was loading it as it was needed for compilation. I have unloaded the module, but the error still occurs: 

```
LIBXSMM_VERSION: develop-1.17-3834 (25693946)
CLX/DP      TRY    JIT    STA    COL
   0..13      2      2      0      0
  14..23      0      0      0      0
  24..64      0      0      0      0
Registry and code: 13 MB + 16 KB (gemm=2)
Command (PID=15485): /lustre/pd01/hpc-kuchta-1716987452/software/cp2k/exe/local/cp2k.psmp -i H2O-9.inp -o H2O-9.out
Uptime: 1.757102 s


===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 15485 RUNNING AT r30c01b01
=   KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 15486 RUNNING AT r30c01b01
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================
```


and the last 100 lines:

```
 000000:000002>>                            11     37 pw_create_c1d       start
 Hostmem: 697 MB GPUmem: 0 MB
 000000:000002<<                            11     37 pw_create_c1d       0.000
 Hostmem: 697 MB GPUmem: 0 MB
 000000:000002<<                         10     64 pw_pool_create_pw       0.000
  Hostmem: 697 MB GPUmem: 0 MB
 000000:000002>>                         10     25 pw_copy       start Hostmem:
 697 MB GPUmem: 0 MB
 000000:000002<<                         10     25 pw_copy       0.001 Hostmem:
 697 MB GPUmem: 0 MB
 000000:000002>>                         10     17 pw_axpy       start Hostmem:
 697 MB GPUmem: 0 MB
 000000:000002<<                         10     17 pw_axpy       0.001 Hostmem:
 697 MB GPUmem: 0 MB
 000000:000002>>                         10     19 mp_sum_d       start Hostmem:
  697 MB GPUmem: 0 MB
 000000:000002<<                         10     19 mp_sum_d       0.000 Hostmem:
  697 MB GPUmem: 0 MB
 000000:000002>>                         10      3 pw_poisson_solve       start
 Hostmem: 697 MB GPUmem: 0 MB
 000000:000002>>                            11      3 pw_poisson_rebuild       s
 tart Hostmem: 697 MB GPUmem: 0 MB
 000000:000002<<                            11      3 pw_poisson_rebuild       0
 .000 Hostmem: 697 MB GPUmem: 0 MB
 000000:000002>>                            11     65 pw_pool_create_pw       st
 art Hostmem: 697 MB GPUmem: 0 MB
 000000:000002>>                               12     38 pw_create_c1d       sta
 rt Hostmem: 697 MB GPUmem: 0 MB
 000000:000002<<                               12     38 pw_create_c1d       0.0
 00 Hostmem: 697 MB GPUmem: 0 MB
 000000:000002<<                            11     65 pw_pool_create_pw       0.
 000 Hostmem: 697 MB GPUmem: 0 MB
 000000:000002>>                            11     26 pw_copy       start Hostme
 m: 697 MB GPUmem: 0 MB
 000000:000002<<                            11     26 pw_copy       0.001 Hostme
 m: 697 MB GPUmem: 0 MB
 000000:000002>>                            11      3 pw_multiply_with       sta
 rt Hostmem: 697 MB GPUmem: 0 MB
 000000:000002<<                            11      3 pw_multiply_with       0.0
 01 Hostmem: 697 MB GPUmem: 0 MB
 000000:000002>>                            11     27 pw_copy       start Hostme
 m: 697 MB GPUmem: 0 MB
 000000:000002<<                            11     27 pw_copy       0.001 Hostme
 m: 697 MB GPUmem: 0 MB
 000000:000002>>                            11      3 pw_integral_ab       start
  Hostmem: 697 MB GPUmem: 0 MB
 000000:000002>>                               12     20 mp_sum_d       start Ho
 stmem: 697 MB GPUmem: 0 MB
 000000:000002<<                               12     20 mp_sum_d       0.001 Ho
 stmem: 697 MB GPUmem: 0 MB
 000000:000002<<                            11      3 pw_integral_ab       0.004
  Hostmem: 697 MB GPUmem: 0 MB
 000000:000002>>                            11      4 pw_poisson_set       start
  Hostmem: 697 MB GPUmem: 0 MB
 000000:000002>>                               12     66 pw_pool_create_pw      
  start Hostmem: 697 MB GPUmem: 0 MB
 000000:000002>>                                  13     39 pw_create_c1d      
 start Hostmem: 697 MB GPUmem: 0 MB
 000000:000002<<                                  13     39 pw_create_c1d      
 0.000 Hostmem: 697 MB GPUmem: 0 MB
 000000:000002<<                               12     66 pw_pool_create_pw      
  0.000 Hostmem: 697 MB GPUmem: 0 MB
 000000:000002>>                               12     28 pw_copy       start Hos
 tmem: 697 MB GPUmem: 0 MB
 000000:000002<<                               12     28 pw_copy       0.001 Hos
 tmem: 697 MB GPUmem: 0 MB
 000000:000002>>                               12      7 pw_derive       start H
 ostmem: 697 MB GPUmem: 0 MB
 000000:000002<<                               12      7 pw_derive       0.002 H
 ostmem: 697 MB GPUmem: 0 MB
 000000:000002>>                               12     67 pw_pool_create_pw      
  start Hostmem: 697 MB GPUmem: 0 MB
 000000:000002>>                                  13     40 pw_create_c1d      
 start Hostmem: 697 MB GPUmem: 0 MB
 000000:000002<<                                  13     40 pw_create_c1d      
 0.000 Hostmem: 697 MB GPUmem: 0 MB
 000000:000002<<                               12     67 pw_pool_create_pw      
  0.000 Hostmem: 697 MB GPUmem: 0 MB
 000000:000002>>                               12     29 pw_copy       start Hos
 tmem: 697 MB GPUmem: 0 MB
 000000:000002<<                               12     29 pw_copy       0.001 Hos
 tmem: 697 MB GPUmem: 0 MB
 000000:000002>>                               12      8 pw_derive       start H
 ostmem: 697 MB GPUmem: 0 MB
 000000:000002<<                               12      8 pw_derive       0.002 H
 ostmem: 697 MB GPUmem: 0 MB
 000000:000002>>                               12     68 pw_pool_create_pw      
  start Hostmem: 697 MB GPUmem: 0 MB
 000000:000002>>                                  13     41 pw_create_c1d      
 start Hostmem: 697 MB GPUmem: 0 MB
 000000:000002<<                                  13     41 pw_create_c1d      
 0.000 Hostmem: 697 MB GPUmem: 0 MB
 000000:000002<<                               12     68 pw_pool_create_pw      
  0.000 Hostmem: 697 MB GPUmem: 0 MB
 000000:000002>>                               12     30 pw_copy       start Hos
 tmem: 697 MB GPUmem: 0 MB
 000000:000002<<                               12     30 pw_copy       0.001 Hos
 tmem: 697 MB GPUmem: 0 MB
 000000:000002>>                               12      9 pw_derive       start H
 ostmem: 697 MB GPUmem: 0 MB
```


This is the list of currently loaded modules (all come with intel):

```
Currently Loaded Modulefiles:
 1) GCCcore/13.3.0                  7) impi/2021.13.0-intel-compilers-2024.2.0  
 2) zlib/1.3.1-GCCcore-13.3.0       8) imkl/2024.2.0                            
 3) binutils/2.42-GCCcore-13.3.0    9) iimpi/2024a                              
 4) intel-compilers/2024.2.0       10) imkl-FFTW/2024.2.0-iimpi-2024a          
 5) numactl/2.0.18-GCCcore-13.3.0  11) intel/2024a                              
 6) UCX/1.16.0-GCCcore-13.3.0    
```

Frederick Stein

Oct 22, 2024, 7:12:49 AM
to cp2k
I can reproduce the error locally. I am investigating it now.

Frederick Stein

Oct 22, 2024, 9:24:04 AM
to cp2k
I have a fix for it. Contrary to my first guess, it is a case of invalid type conversion from real to complex numbers (yes, Fortran is rather strict about it) in pw_derive. The same pattern may also be present in a few other spots. I am currently running more tests and I will open a pull request within the next few days.
Best,
Frederick

bartosz mazur

Oct 22, 2024, 11:45:21 AM
to cp2k
Great! Thank you for your help. 

Best
Bartosz

Frederick Stein

Oct 23, 2024, 3:15:33 AM
to cp2k
Dear Bartosz,
My fix is merged. Can you switch to the CP2K master and try again? We are still working on a few issues with the Intel compilers so that we can eventually migrate from ifort to ifx.
Best,
Frederick

bartosz mazur

Oct 25, 2024, 3:50:47 AM
to cp2k
Hi Frederick, 

It helped with most of the tests! Now only 13 fail. In the attachment you will find the full output from the regtests, and here is the output from a single job with TRACE enabled:

```
Loading intel/2024a
  Loading requirement: GCCcore/13.3.0 zlib/1.3.1-GCCcore-13.3.0
    binutils/2.42-GCCcore-13.3.0 intel-compilers/2024.2.0
    numactl/2.0.18-GCCcore-13.3.0 UCX/1.16.0-GCCcore-13.3.0
    impi/2021.13.0-intel-compilers-2024.2.0 imkl/2024.2.0 iimpi/2024a
    imkl-FFTW/2024.2.0-iimpi-2024a

Currently Loaded Modulefiles:
 1) GCCcore/13.3.0                  7) impi/2021.13.0-intel-compilers-2024.2.0  
 2) zlib/1.3.1-GCCcore-13.3.0       8) imkl/2024.2.0                            
 3) binutils/2.42-GCCcore-13.3.0    9) iimpi/2024a                              
 4) intel-compilers/2024.2.0       10) imkl-FFTW/2024.2.0-iimpi-2024a          
 5) numactl/2.0.18-GCCcore-13.3.0  11) intel/2024a                              
 6) UCX/1.16.0-GCCcore-13.3.0      
2 MPI processes with 2 OpenMP threads each
started at Fri Oct 25 09:34:34 CEST 2024 in /lustre/tmp/slurm/3127182
SIRIUS 7.6.1, git hash: https://api.github.com/repos/electronic-structure/SIRIUS/git/ref/tags/v7.6.1
Warning! Compiled in 'debug' mode with assert statements enabled!


LIBXSMM_VERSION: develop-1.17-3834 (25693946)
CLX/DP      TRY    JIT    STA    COL
   0..13      8      8      0      0
  14..23      0      0      0      0
  24..64      0      0      0      0
Registry and code: 13 MB + 64 KB (gemm=8)
Command (PID=423503): /lustre/pd01/hpc-kuchta-1716987452/software/cp2k/exe/local/cp2k.psmp -i dftd3src1.inp -o dftd3src1.out
Uptime: 2.752513 s


===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 423503 RUNNING AT r21c01b03

=   KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 423504 RUNNING AT r21c01b03

=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================
finished at Fri Oct 25 09:34:39 CEST 2024
```


and these are the last lines of the trace:

```
 000000:000002<<                                  13      3 mp_sendrecv_dm2    
   0.000 Hostmem: 955 MB GPUmem: 0 MB
 000000:000002>>                                  13      4 mp_sendrecv_dm2    
   start Hostmem: 955 MB GPUmem: 0 MB
 000000:000002<<                                  13      4 mp_sendrecv_dm2    
   0.000 Hostmem: 955 MB GPUmem: 0 MB
 000000:000002<<                               12      2 pw_nn_compose_r       0
 .003 Hostmem: 955 MB GPUmem: 0 MB
 000000:000002<<                            11      1 xc_pw_derive       0.003 H
 ostmem: 955 MB GPUmem: 0 MB
 000000:000002>>                            11      5 pw_zero       start Hostme
 m: 955 MB GPUmem: 0 MB
 000000:000002<<                            11      5 pw_zero       0.000 Hostme
 m: 955 MB GPUmem: 0 MB
 000000:000002>>                            11      2 xc_pw_derive       start H
 ostmem: 955 MB GPUmem: 0 MB
 000000:000002>>                               12      3 pw_nn_compose_r       s
 tart Hostmem: 955 MB GPUmem: 0 MB
 000000:000002>>                                  13      5 mp_sendrecv_dm2    
   start Hostmem: 955 MB GPUmem: 0 MB
 000000:000002<<                                  13      5 mp_sendrecv_dm2    
   0.000 Hostmem: 955 MB GPUmem: 0 MB
 000000:000002>>                                  13      6 mp_sendrecv_dm2    
   start Hostmem: 955 MB GPUmem: 0 MB
 000000:000002<<                                  13      6 mp_sendrecv_dm2    
   0.000 Hostmem: 955 MB GPUmem: 0 MB
 000000:000002<<                               12      3 pw_nn_compose_r       0
 .002 Hostmem: 955 MB GPUmem: 0 MB
 000000:000002<<                            11      2 xc_pw_derive       0.002 H
 ostmem: 955 MB GPUmem: 0 MB
 000000:000002>>                            11      6 pw_zero       start Hostme
 m: 955 MB GPUmem: 0 MB
 000000:000002<<                            11      6 pw_zero       0.001 Hostme
 m: 960 MB GPUmem: 0 MB
 000000:000002>>                            11      3 xc_pw_derive       start H
 ostmem: 960 MB GPUmem: 0 MB
 000000:000002>>                               12      4 pw_nn_compose_r       s
 tart Hostmem: 960 MB GPUmem: 0 MB
 000000:000002>>                                  13      7 mp_sendrecv_dm2    
   start Hostmem: 960 MB GPUmem: 0 MB
 000000:000002<<                                  13      7 mp_sendrecv_dm2    
   0.000 Hostmem: 960 MB GPUmem: 0 MB
 000000:000002>>                                  13      8 mp_sendrecv_dm2    
   start Hostmem: 960 MB GPUmem: 0 MB
 000000:000002<<                                  13      8 mp_sendrecv_dm2    
   0.000 Hostmem: 960 MB GPUmem: 0 MB
 000000:000002<<                               12      4 pw_nn_compose_r       0
 .002 Hostmem: 960 MB GPUmem: 0 MB
 000000:000002<<                            11      3 xc_pw_derive       0.002 H
 ostmem: 960 MB GPUmem: 0 MB
 000000:000002>>                            11      1 pw_spline_scale_deriv    
   start Hostmem: 960 MB GPUmem: 0 MB
 000000:000002<<                            11      1 pw_spline_scale_deriv    
   0.001 Hostmem: 960 MB GPUmem: 0 MB
 000000:000002>>                            11     20 pw_pool_give_back_pw      
  start Hostmem: 965 MB GPUmem: 0 MB
 000000:000002<<                            11     20 pw_pool_give_back_pw      
  0.000 Hostmem: 965 MB GPUmem: 0 MB
 000000:000002>>                            11     21 pw_pool_give_back_pw      
  start Hostmem: 965 MB GPUmem: 0 MB
 000000:000002<<                            11     21 pw_pool_give_back_pw      
  0.000 Hostmem: 965 MB GPUmem: 0 MB
 000000:000002>>                            11     22 pw_pool_give_back_pw      
  start Hostmem: 965 MB GPUmem: 0 MB
 000000:000002<<                            11     22 pw_pool_give_back_pw      
  0.000 Hostmem: 965 MB GPUmem: 0 MB
 000000:000002>>                            11     23 pw_pool_give_back_pw      
  start Hostmem: 965 MB GPUmem: 0 MB
 000000:000002<<                            11     23 pw_pool_give_back_pw      
  0.000 Hostmem: 965 MB GPUmem: 0 MB
 000000:000002>>                            11      1 xc_functional_eval       s
 tart Hostmem: 965 MB GPUmem: 0 MB
 000000:000002>>                               12      1 b97_lda_eval       star
 t Hostmem: 965 MB GPUmem: 0 MB
 000000:000002<<                               12      1 b97_lda_eval       0.10
 3 Hostmem: 979 MB GPUmem: 0 MB
 000000:000002<<                            11      1 xc_functional_eval       0
 .103 Hostmem: 979 MB GPUmem: 0 MB
 000000:000002<<                         10      1 xc_rho_set_and_dset_create  
     0.120 Hostmem: 979 MB GPUmem: 0 MB
 000000:000002>>                         10      1 check_for_derivatives       s
 tart Hostmem: 979 MB GPUmem: 0 MB
 000000:000002<<                         10      1 check_for_derivatives       0
 .000 Hostmem: 979 MB GPUmem: 0 MB
 000000:000002>>                         10     14 pw_create_r3d       start Hos
 tmem: 979 MB GPUmem: 0 MB
 000000:000002<<                         10     14 pw_create_r3d       0.000 Hos
 tmem: 979 MB GPUmem: 0 MB
 000000:000002>>                         10     15 pw_create_r3d       start Hos
 tmem: 979 MB GPUmem: 0 MB
 000000:000002<<                         10     15 pw_create_r3d       0.000 Hos
 tmem: 979 MB GPUmem: 0 MB
 000000:000002>>                         10     16 pw_create_r3d       start Hos
 tmem: 979 MB GPUmem: 0 MB
 000000:000002<<                         10     16 pw_create_r3d       0.000 Hos
 tmem: 979 MB GPUmem: 0 MB
 000000:000002>>                         10     17 pw_create_r3d       start Hos
 tmem: 979 MB GPUmem: 0 MB
 000000:000002<<                         10     17 pw_create_r3d       0.000 Hos
 tmem: 979 MB GPUmem: 0 MB
```


Best
Bartosz
regtests.out.zip

bartosz mazur

Oct 25, 2024, 4:15:19 AM
to cp2k
I just got another error with LibXSMM, now in my regular simulation and without using OpenMP. This is the error:

```
[1729843139.920274] [r23c01b04:2913 :0]           ib_md.c:295  UCX  ERROR ibv_reg_mr(address=0x14f0b46fc080, length=7424, access=0xf) failed: Cannot allocate memory
[1729843139.920290] [r23c01b04:2913 :0]          ucp_mm.c:70   UCX  ERROR failed to register address 0x14f0b46fc080 (host) length 7424 on md[4]=mlx5_0: Input/output error (md supports: host)

LIBXSMM_VERSION: develop-1.17-3834 (25693946)[1729843139.932647] [r23c01b04:2945 :0]           ib_md.c:295  UCX  ERROR ibv_reg_mr(address=0x1491f069e040, length=8128, access=0xf) failed: Cannot allocate memory
[1729843139.932660] [r23c01b04:2945 :0]          ucp_mm.c:70   UCX  ERROR failed to register address 0x1491f069e040 (host) length 8128 on md[4]=mlx5_0: Input/output error (md supports: host)


CLX/DP      TRY    JIT    STA    COL
   0..13      4      4      0      0
  14..23      4      4      0      0

  24..64      0      0      0      0
Registry and code: 13 MB + 80 KB (gemm=8)
Command (PID=2913): /lustre/pd01/hpc-kuchta-1716987452/software/cp2k/exe/local/cp2k.psmp -i cp2k.inp -o cp2k.out
Uptime: 407633.177169 s
```

and this is the simulation input I'm using:

```
&GLOBAL
  PROJECT uam1o_npt_rms
  RUN_TYPE MD
  PRINT_LEVEL LOW
  PREFERRED_DIAG_LIBRARY SCALAPACK
&END GLOBAL

&FORCE_EVAL
  METHOD QUICKSTEP
  STRESS_TENSOR ANALYTICAL
  &DFT
    BASIS_SET_FILE_NAME BASIS_MOLOPT_UZH
    POTENTIAL_FILE_NAME POTENTIAL_UZH
    &MGRID
      CUTOFF 500
    &END MGRID
    &XC
      &XC_FUNCTIONAL PBE
      &END XC_FUNCTIONAL
      &VDW_POTENTIAL
        POTENTIAL_TYPE PAIR_POTENTIAL
        &PAIR_POTENTIAL
          TYPE  DFTD3(BJ)
          PARAMETER_FILE_NAME  dftd3.dat
          REFERENCE_FUNCTIONAL PBE
          R_CUTOFF  25.0
        &END PAIR_POTENTIAL
      &END VDW_POTENTIAL
    &END XC
  &END DFT

  &SUBSYS
    &CELL
      A      12.2807999       0.0000000       0.0000000
      B       7.6258602       9.6257200       0.0000000
      C      -2.1557724      -1.0420258      18.0042801
    &END CELL
    &COORD
      Zn      11.37811      4.60286      0.24515
      Zn       8.15435      3.05288      8.74518
      Zn       6.37590      3.97311     17.74650
      Zn       9.59842      5.54014      9.24747
      S       11.79344      6.72692     17.10850
      S        4.06825      3.00573      9.90358
      S        5.95830      1.84422      0.90027
      S       13.67407      5.58944      8.10767
      O       10.72408      3.58291      1.89315
      O        8.51986      4.01962      1.53085
      O        6.60135      3.91587      7.68572
      O        7.74637      5.79259      8.21600
      O       15.32810      8.58246      5.10041
      O        9.35608      2.93551      7.09500
      O       10.38999      4.93007      7.45977
      O       11.66491      6.35111      1.31266
      O        9.48582      6.62478      0.77364
      O        2.59062      2.40094      3.91496
      O        7.03031      4.99173     16.09885
      O        9.23544      4.56122     16.46252
      O       11.14602      4.67776     10.31440
      O       10.00982      2.79915      9.77218
      O        2.41388      0.01898     12.91899
      O        8.39375      5.66143     10.89628
      O        7.36998      3.66087     10.53589
      O        6.08863      2.22161     16.68336
      O        8.26988      1.95313     17.21650
      O       15.16937      6.16381     14.09906
      N       13.25907      3.80728      0.04001
      N        2.36335     -0.74130     17.33402
      N        7.60676      1.08576      8.95623
      N       15.77729      5.75974      9.67861
      N        4.49430      4.76652     17.95756
      N       15.38873      9.31230      0.67467
      N       10.14308      7.50848      9.04236
      N        1.96529      2.83557      8.33233
      C        6.76554      5.18292      7.68414
      C       14.28210      4.11624      0.86006
      C        9.47998      3.39622      2.09658
      C        3.20112      3.42080      0.84626
      C        9.91466      1.18589      3.17244
      C        9.08210      2.29987      3.02657
      C        5.74710      6.04945      7.01821
      C        7.83265      2.30920      3.66005
      C        3.35793      2.34328     -0.04029
      C        4.51663      1.46385     -0.02755
      C       16.24194      7.75266      5.73606
      C        4.78940      5.52817      6.14198
      C        7.40810      1.21174      4.39947
      C       16.18016      6.38244      5.49010
      C        9.48869      0.06986      3.88005
      C       11.27238      1.77457     17.14330
      C        5.77166      7.43009      7.27236
      C       11.14819      8.24901     17.58588
      C        8.22170      0.08058      4.47135
      C        0.15087      1.02286     17.07544
      C       17.16180      8.28565      6.64351
      C       10.57067      7.01060      1.31282
      C        6.72654      0.47459      8.14002
      C       10.27972      3.79035      6.89470
      C       14.15006      8.72843      8.15880
      C       11.73751      2.06868      5.82537
      C       11.38838      3.41515      5.96966
      C       10.52304      8.34339      1.98566
      C       12.16584      4.39562      5.33967
      C       14.89762      7.93801      9.04648
      C       14.86698      6.48365      9.03575
      C        2.67167      1.17044      3.27681
      C       11.52468      8.76552      2.86608
      C       13.29140      4.04007      4.60622
      C        3.78230      0.36534      3.52266
      C       12.87823      1.70260      5.12344
      C        8.27761      0.34001      9.85941
      C        9.42677      9.18364      1.73295
      C        3.27553      4.45658      9.42657
      C       13.66559      2.69775      4.53650
      C       15.77023      8.59069      9.93240
      C        1.68356      0.78491      2.36643
      C       10.98451      3.41041     10.31327
      C        3.46873      4.45681     17.14097
      C        8.27403      5.18373     15.89814
      C       14.54907      5.15099     17.15930
      C        7.83119      7.39584     14.82858
      C        8.66916      6.28563     14.97331
      C       11.99928      2.54577     10.98702
      C        9.92072      6.28547     14.34388
      C       16.54982      7.26986      0.04271
      C       15.39103      8.14919      0.03189
      C        1.50023      0.84646     12.27989
      C       12.95126      3.06908     11.86817
      C       10.34198      7.38826     13.61070
      C        1.55836      2.21699     12.52561
      C        8.25354      8.51697     14.12666
      C        6.48249      6.79770      0.85630
      C       11.97760      1.16465     10.73446
      C        6.60385      0.32218      0.42301
      C        9.52282      8.51550     13.54043
      C       17.60321      7.54791      0.92891
      C        0.58530      0.31102     11.36884
      C        7.18362      1.56332     16.68291
      C       11.01926      8.11905      9.86341
      C        7.47582      4.80132     11.10039
      C        3.59282     -0.13430      9.84955
      C        6.01179      6.51430     12.17471
      C        6.36853      5.17005     12.02942
      C        7.23131      0.22715     16.01652
      C        5.59963      4.18477     12.66234
      C        2.84614      0.65728      8.96213
      C        2.87561      2.11161      8.97508
      C       15.08536      7.39548     14.73440
      C        6.23001     -0.19920     15.13769
      C        4.47482      4.53325     13.40042
      C       13.97400      8.19851     14.48576
      C        4.87173      6.87322     12.88120
      C        9.47231      8.25578      8.14046
      C        8.32790     -0.61137     16.27301
      C       14.46698      4.13864      8.58475
      C        4.09294      5.87331     13.47165
      C        1.97640      0.00563      8.07267
      C       16.07240      7.78504     15.64417
      H       14.10215      4.93465      1.55678
      H        3.98110      3.68721      1.55899
      H       10.89072      1.19647      2.69205
      H        7.19958      3.19021      3.56839
      H        4.75923      4.45384      5.96230
      H        6.45299      1.21835      4.92062
      H       15.44211      6.00062      4.78824
      H       17.75043      8.81610      3.97156
      H       10.41563      1.57993     16.49923
      H        6.49332      7.81303      7.99143
      H        0.24800      0.19739     16.37425
      H        9.53586     -0.26872      6.84508
      H        6.19685      1.12218      7.44173
      H       13.45550      8.28133      7.44815
      H       11.11633      1.31384      6.30260
      H       11.87413      5.44074      5.42962
      H       12.38442      8.12016      3.04474
      H       13.88694      4.78876      4.08791
      H        4.53915      0.70283      4.22717
      H        0.88557      0.65625      5.03328
      H        8.96418      0.89159     10.50060
      H        8.67994      8.85961      1.01083
      H       16.35704      8.00331     10.63471
      H       13.12606      1.45212      2.16563
      H        3.64702      3.63930     16.44281
      H       13.76743      4.88477     16.44833
      H        6.85355      7.37827     15.30535
      H       10.55820      5.40745     14.43410
      H       12.97886      4.14375     12.04672
      H       11.29905      7.38966     13.09313
      H        2.29216      2.60091     13.23073
      H       -0.01303     -0.23279     14.03603
      H        7.34113      6.99275      1.49776
      H       11.26049      0.78023     10.01184
      H       17.50743      8.37258      1.63130
      H        8.21398      8.86531     11.16822
      H       11.54834      7.47018     10.56097
      H        4.28503      0.31205     10.56295
      H        6.62643      7.27289     11.69479
      H        5.89748      3.14154     12.57118
      H        5.36986      0.44461     14.95599
      H        3.88656      3.78035     13.92095
      H       13.21826      7.85764     13.78163
      H       16.85773      7.91771     12.97237
      H        8.78884      7.70469      7.49554
      H        9.07452     -0.28399     16.99402
      H        1.39009      0.59398      7.37083
      H        4.63062      7.11938     15.84758
    &END COORD
    &KIND Zn
      BASIS_SET TZVP-MOLOPT-PBE-GTH-q12
      POTENTIAL GTH-PBE-q12
    &END KIND
    &KIND S
      BASIS_SET TZVP-MOLOPT-PBE-GTH-q6
      POTENTIAL GTH-PBE-q6
    &END KIND
    &KIND O
      BASIS_SET TZVP-MOLOPT-PBE-GTH-q6
      POTENTIAL GTH-PBE-q6
    &END KIND
    &KIND N
      BASIS_SET TZVP-MOLOPT-PBE-GTH-q5
      POTENTIAL GTH-PBE-q5
    &END KIND
    &KIND C
      BASIS_SET TZVP-MOLOPT-PBE-GTH-q4
      POTENTIAL GTH-PBE-q4
    &END KIND
    &KIND H
      BASIS_SET TZVP-MOLOPT-PBE-GTH-q1
      POTENTIAL GTH-PBE-q1
    &END KIND
  &END SUBSYS
&END FORCE_EVAL

&MOTION
  &MD
    ENSEMBLE NPT_I
    TEMPERATURE 298
    TIMESTEP 1.0
    STEPS 50000
    &THERMOSTAT
      TYPE NOSE
      &NOSE
        LENGTH 3
        YOSHIDA 3
        TIMECON 1000
      &END NOSE
    &END THERMOSTAT
    &BAROSTAT
      PRESSURE 1.0
      TIMECON 4000
    &END BAROSTAT
  &END MD
  &FREE_ENERGY
    METHOD METADYN
    &METADYN
      USE_PLUMED .TRUE.
      PLUMED_INPUT_FILE plumed.dat
    &END METADYN
  &END FREE_ENERGY
  &PRINT
    &TRAJECTORY
      &EACH
        MD 5
      &END EACH
    &END TRAJECTORY
    &FORCES
      UNIT eV*angstrom^-1
      &EACH
        MD 5
      &END EACH
    &END FORCES
    &CELL
      &EACH
        MD 5
      &END EACH
    &END CELL
  &END PRINT
&END MOTION
```

This simulation was performed with the previous version of CP2K (so without your fix).

Frederick Stein

Oct 25, 2024, 5:46:00 AM
to cp2k
Dear Bartosz,
I will check the other issues with your regtests.
Regarding your latest issue, please provide more information such as an output file or a hint on the context. If I am supposed to retry the calculation on my local machine, I need all additional input files such as your plumed file. I can run your input file up to the point that CP2K needs plumed.
Best,
Frederick

Frederick Stein

Oct 25, 2024, 8:27:36 AM
to cp2k
Regarding the other issues:
I can confirm them but cannot provide fixes for all of them because they probably trigger bugs in ifort. Because ifort is already deprecated, these bugs will probably not be fixed. Furthermore, we do not see any of these issues on our Intel CI. I will fix what I can, but some of them will remain as we focus our efforts on supporting the new ifx compiler.

bartosz mazur

Oct 28, 2024, 4:34:45 AM
to cp2k
Many thanks Frederick for your help! 

bartosz mazur

Nov 20, 2024, 10:01:01 AM
to cp2k
Hi Frederick,

I am writing this as a follow-up to the previous discussion. I am currently seeing a recurring problem with CP2K, where tasks are killed after about 10 days with errors as in the attached outputs. This is not particularly annoying, as a restart is sufficient and the simulation can continue. Unfortunately, I don't think you will be able to reproduce this error, given the very long simulation time. However, if there is anything else I can provide to help understand the source of these problems, let me know. 

Best
Bartosz

slurm-3144902.out
slurm-3127239.out
slurm-3164366.out
slurm-3117616.out
slurm-3098731.out

Frederick Stein

Nov 20, 2024, 10:28:02 AM
to cp2k
Dear Bartosz,
Without actual CP2K input or output files, I can only guess. The first Slurm output states "No space left on device", the others "Cannot allocate memory". The first suggests that there is not enough space available on the hard drive (do you have any additional CP2K output files from each respective rank?), the others that there is not enough RAM available. You can try to run CP2K with fewer MPI ranks and more OpenMP threads. This reduces the number of additional temporary output files and the memory footprint in RAM but increases the runtime.
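
The trade-off between MPI ranks and OpenMP threads could be sketched as follows. This is only an illustration: the core count of 48, the thread count of 4, and the use of `mpirun` as launcher are assumptions, not values taken from the job script in this thread. The binary and file names mirror the command line shown in the earlier log.

```shell
# Hypothetical core count; replace with the cores actually allocated to the job.
TOTAL_CORES=48
# More OpenMP threads per rank means fewer MPI ranks for the same core count.
OMP_THREADS=4
MPI_RANKS=$((TOTAL_CORES / OMP_THREADS))

# OMP_NUM_THREADS is the standard OpenMP environment variable.
export OMP_NUM_THREADS="$OMP_THREADS"

# Build the launch line; it is only printed here, not executed.
CMD="mpirun -np $MPI_RANKS cp2k.psmp -i cp2k.inp -o cp2k.out"
echo "$CMD"
```

With these assumed numbers the job would use 12 ranks with 4 threads each instead of 48 single-threaded ranks. Under Slurm the same split is usually expressed through the task and CPUs-per-task settings of the job script.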
Best,
Frederick

bartosz mazur

Nov 20, 2024, 11:10:13 AM
to cp2k
As for the available disk space, I checked it and at the time there was enough (about 10 GB of free space), so I do not understand where the error came from. As for RAM, for the last task the maximum usage was about 50 GB, and there was 2x180 GB allocated. 

```
> sacct -j 3164366 --format=JobID,MaxRSS,AveRSS,MaxVMSize,AveVMSize --units=GB
JobID            MaxRSS     AveRSS  MaxVMSize  AveVMSize
------------ ---------- ---------- ---------- ----------
3164366                                                  
3164366.bat+      0.01G      0.01G      0.01G      0.01G
3164366.ext+      0.00G      0.00G      0.00G      0.00G
3164366.0        47.23G     47.06G     48.05G     47.71G
```

Here you can see how memory usage was changing with time: https://hpc-info.kdm.wcss.pl/goto/dz19rInNR?orgId=1

In the attachment I provide all input files and output is under link (because of size limits): https://we.tl/t-g7ObcwaNXn.

I'm not sure about the output files for each rank. Are they created by default? 

Best 
Bartosz 
run_cp2k.job
plumed.dat
cp2k.inp

Frederick Stein

Nov 20, 2024, 11:51:11 AM
to cp2k
I am afraid I cannot help with the given method as I am absolutely not familiar with Plumed.

Regarding the output files: In many setups, each rank creates a separate output file to simplify debugging; these files are removed later. Maybe others have an idea of what may be going on.
Best,
Frederick