[CP2K] Process Killed (Signal 9) during SCF initialization — No error in output.


Hanaa Sari

Jul 28, 2025, 3:31:01 AM
to cp2k

Dear CP2K community,

I'm encountering an issue when running a CP2K calculation on my local workstation (8 cores, using MPI with cp2k.popt). The calculation stops at the beginning of the SCF loop, and the output file (Ru331.out) shows no specific error: it ends right after the SCF initialization.

However, the terminal displays the following messages:

WARNING: No preset parameters were found for the device that Open MPI
detected:

  Local host:            HP-Z8
  Device name:           irdma0
  Device vendor ID:      0x8086
  Device vendor part ID: 14289

Default device parameters will be used, which may result in lower
performance.  You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.

NOTE: You can turn off this warning by setting the MCA parameter
      btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There is at least non-excluded one OpenFabrics device found,
but there are no active ports detected (or Open MPI was unable to use
them).  This is most certainly not what you wanted.  Check your
cables, subnet manager configuration, etc.  The openib BTL will be
ignored for this job.

  Local host: HP-Z8
--------------------------------------------------------------------------
[HP-Z8:178878] 7 more processes have sent help message help-mpi-btl-openib.txt / no device params found
[HP-Z8:178878] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[HP-Z8:178878] 7 more processes have sent help message help-mpi-btl-openib.txt / no active ports found
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 0 on node HP-Z8 exited on signal 9 (Killed).



It seems that one of the MPI processes is being killed unexpectedly. I suspect it might be a memory-related issue, but I’m not entirely sure.
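Signal 9 is SIGKILL, which is what the Linux out-of-memory (OOM) killer sends, so one way to test the memory hypothesis is to look for an OOM entry in the kernel log (for example with `dmesg` or `journalctl -k`). The Python sketch below scans that log for OOM kills of processes whose name contains "cp2k". It matches by name rather than PID because mpirun reported "PID 0". The regular expression assumes the usual kernel wording ("Out of memory: Killed process <pid> (<name>)", "Kill process" on older kernels); the function names are illustrative only and not part of CP2K or Open MPI.

```python
import re
import subprocess

# Kernel OOM-killer wording: "Out of memory: Killed process <pid> (<name>)".
# Older kernels print "Kill process"; cgroup limits print "Memory cgroup out of memory: ...".
OOM_RE = re.compile(r"out of memory: kill(?:ed)? process (\d+) \(([^)]+)\)", re.IGNORECASE)


def find_oom_kills(log_text, name_filter="cp2k"):
    """Return (pid, name) pairs for OOM kills of processes whose name contains name_filter."""
    kills = []
    for line in log_text.splitlines():
        match = OOM_RE.search(line)
        if match and name_filter in match.group(2):
            kills.append((int(match.group(1)), match.group(2)))
    return kills


def read_kernel_log():
    """Return the kernel ring buffer via `dmesg`, or "" if it cannot be run.

    Some distributions restrict dmesg to root (kernel.dmesg_restrict=1),
    in which case stdout may be empty and sudo is needed.
    """
    try:
        result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    except OSError:
        return ""
    return result.stdout


if __name__ == "__main__":
    for pid, name in find_oom_kills(read_kernel_log()):
        print(f"OOM killer terminated {name} (PID {pid})")
```

If this prints one or more cp2k.popt entries timed around the crash, memory exhaustion is the likely cause; an empty result does not rule it out if dmesg could not be read.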

Could anyone help me with the following?

* Confirm whether this is likely due to memory overload.

* Suggest how to reduce memory consumption in the input.

* Explain how to properly disable OpenMPI's use of InfiniBand on a local workstation.

I am attaching the input and output files.


Best regards,
Hanaa

Ru331.inp
Ru331.out

Frederick Stein

Jul 28, 2025, 4:27:10 AM
to cp2k
Dear Hanaa,
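I do not know your system too well, but you may have run into an out-of-memory (OOM) event: when the machine runs out of memory, the Linux kernel's OOM killer terminates a process with signal 9, which matches the message from mpirun. You can check the memory consumption by adding the TRACE and TRACE_MASTER keywords to the GLOBAL section of your input file. CP2K will then trace all routines, including their memory consumption. Beware that the output file will then be huge.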
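TRACE reports memory from inside CP2K. A lighter, external complement is to watch the resident memory (RSS) of the ranks from a second terminal while the job enters the SCF loop. The sketch below is Linux-only and assumes the executable appears in /proc under the name cp2k.popt. It sums VmRSS over all matching processes, because what runs out is the machine's total memory, shared by all 8 ranks. The helper names are hypothetical.

```python
import os
import time


def parse_vmrss_kb(status_text):
    """Return VmRSS in kB from the contents of /proc/<pid>/status (0 if absent)."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])  # line looks like "VmRSS:\t  2048 kB"
    return 0


def total_rss_gib(name="cp2k.popt"):
    """Sum resident memory (GiB) of all processes whose /proc comm equals name.

    comm is truncated to 15 characters by the kernel. Shared pages (e.g. MPI
    shared-memory buffers) are counted once per rank, so this overestimates slightly.
    """
    if not os.path.isdir("/proc"):
        return 0.0  # not Linux
    total_kb = 0
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open(f"/proc/{pid}/comm") as f:
                if f.read().strip() != name:
                    continue
            with open(f"/proc/{pid}/status") as f:
                total_kb += parse_vmrss_kb(f.read())
        except OSError:
            continue  # process exited meanwhile or is not readable
    return total_kb / 1024**2


def watch(name="cp2k.popt", interval_s=5.0, samples=120):
    """Print the summed RSS every interval_s seconds, samples times."""
    for _ in range(samples):
        print(f"{time.strftime('%H:%M:%S')}  {total_rss_gib(name):.2f} GiB")
        time.sleep(interval_s)
```

Calling watch() while mpirun starts the job, then comparing the last value printed before the crash with the machine's total memory (e.g. from `free -g`), should show whether the ranks approach the limit during SCF initialization.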
Regarding ways to reduce the memory consumption:
- You seem to be simulating a 2D-periodic system. Could you try setting PERIODIC to XY (in both the CELL and POISSON sections) instead?
- You could try reducing NBROYDEN, since larger values increase the internal buffer sizes, or switch to an entirely different mixing scheme.
Regarding your issues with OpenMPI, you can list the available options with the ompi_info executable and override the defaults with environment variables (see, for instance, the Open MPI documentation on InfiniBand and RoCE: https://docs.open-mpi.org/en/v5.0.x/tuning-apps/networking/ib-and-roce.html).
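The terminal output already states that "the openib BTL will be ignored for this job", so these warnings are most likely harmless and unrelated to the signal 9. To silence them, the openib component can be excluded through Open MPI's OMPI_MCA_<parameter> environment-variable convention, equivalent to `mpirun --mca btl ^openib ...` or `export OMPI_MCA_btl=^openib` in the shell. Below is a minimal Python launcher sketch doing this; the helper names are hypothetical, and `-i`/`-o` are CP2K's input/output command-line options.

```python
import os
import subprocess


def env_without_openib(base_env=None):
    """Return a copy of the environment that tells Open MPI to skip the openib BTL."""
    env = dict(os.environ if base_env is None else base_env)
    # "^openib" means: use every available BTL except openib (same as --mca btl ^openib).
    env["OMPI_MCA_btl"] = "^openib"
    return env


def cp2k_command(nprocs, inp, out, exe="cp2k.popt"):
    """Build the mpirun command line for an MPI-only CP2K run."""
    return ["mpirun", "-np", str(nprocs), exe, "-i", inp, "-o", out]


def run_cp2k(nprocs, inp, out):
    """Launch CP2K with openib disabled; raises CalledProcessError on a non-zero exit."""
    subprocess.run(cp2k_command(nprocs, inp, out), env=env_without_openib(), check=True)
```

For example, run_cp2k(8, "Ru331.inp", "Ru331.out") reproduces the original 8-rank run without the openib warnings; it will not prevent an OOM kill.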
Best,
Frederick