cellopt calculation on EIGER aborted


Miriam Jasmin Pougin

Sep 8, 2022, 5:36:53 AM
to cp2k
Hello all,

I am trying to run a cell optimization for a metal-organic framework using the SCAN functional with the rVV10 vdW functional. Because I had problems with SCF convergence, I increased the cutoff, used the NN50_SMOOTH method for calculating the XC derivatives, and applied NN50 density smoothing for the XC calculations, as suggested in another conversation here.
The single-point calculation converged with these settings, but when I tried to run the cellopt on Piz Daint (32 nodes, 64 GB RAM per node) I got an out-of-memory error:
"ERROR: Not enough shared memory in grid_gpu_integrate.
cab_len: 4704, alpha_len: 1512, cxyz_len: 364, total smem_per_block: 51.406250 kb"


So I tried running the calculation on Alps (Eiger) instead (256 GB RAM per node). There, as soon as the SCF calculation starts, I get an error in the CP2K output file that I don't understand:
"libfabric:187819:1662628695:cxi:core:cxip_ux_onload_cb():2259<warn> nid001534: RXC (0x2300:32:0): PtlTE 105LE resources not recovered during flow control. FI_CXI_RX_MATCH_MODE=[hybrid|software] is required.

Program received signal SIGABRT: Process abort signal."

Does anyone have an idea what went wrong?
I am using cp2k-9.1. I have attached my input file and the output file with the complete error message.
Thank you!
cellopt.inp
cellopt.out

Krack Matthias (PSI)

Sep 8, 2022, 7:28:12 AM
to cp...@googlegroups.com

Hello

 

There is a hard limit (48*1024) coded into the GPU grid routines of CP2K because of the limited shared memory available per block on the GPU. Your calculation requests 51.406250 kb per block, which exceeds that limit. Using more nodes does not help here, because this won’t increase the shared memory available per GPU. A workaround is to use the CPU implementation of grid_integrate instead of the GPU implementation by selecting the grid BACKEND CPU explicitly (the default is AUTO, which selects GPU automatically on Piz Daint). Alternatively, you can try to change the code and increase that limit, e.g. to 51*1024, at the risk, however, of triggering other problems.
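
For readers who want to apply the workaround, here is a minimal sketch of how the backend selection might look in the input file. It assumes that the BACKEND keyword lives in a GRID subsection of GLOBAL, as in the CP2K input reference for recent versions; the project name is purely illustrative and not taken from the attached cellopt.inp.

```
&GLOBAL
  PROJECT cellopt          ! illustrative name, not from the original input
  RUN_TYPE CELL_OPT
  &GRID
    BACKEND CPU            ! assumed location of the keyword; the default is AUTO
  &END GRID
&END GLOBAL
```

As a side note on the numbers in the error message: (4704 + 1512 + 364) * 8 = 52640 bytes, which matches the reported 51.406250 kb if each element takes 8 bytes (an inference from the figures, not something stated in the thread). That is above 48*1024 = 49152 and also slightly above 51*1024 = 52224, so anyone who prefers to raise the limit in the code would probably need a value of at least 52640.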

 

I don’t know what causes the error on Eiger.

 

HTH

 

Matthias


Miriam Jasmin Pougin

Sep 8, 2022, 9:24:07 AM
to cp2k
Hello Matthias,

Thanks a lot for your fast reply and explanations. As you suggested, I switched to the CPU implementation, and that solved the memory problem on Daint. It is working fine now; thank you again for your help.

Best regards,
Miriam