Dear ULFM team,
I’m currently working with ULFM together with my colleagues in the field of parallel computing. However, we have run into a technical problem. Every time we run one of the classic tutorials that kill a process with SIGKILL (for example, the attached noft2.c tutorial example), the application either hangs or terminates with the following error, even though an MPI error handler is set:
mpirun noticed that process rank 1 with PID 0 on node master exited on signal 9 (Killed).
We have tried two different versions of Open MPI, 4.0.2rc3 and openmpi-5.0.0rc2:
> mpirun -np 2 ./noft
Rank 0 / 2: Before sigkill . !
Rank 1 / 2: Before sigkill . !
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 0 on node master exited on signal 9 (Killed).
--------------------------------------------------------------------------
> mpirun -np 2 --enable-recovery ./noft
Rank 1 / 2: Before sigkill . !
Rank 0 / 2: Before sigkill . !
[master.ipoib:128008] PMIX ERROR: BAD-PARAM in file event/pmix_event_notification.c at line 923
> mpirun -np 2 --enable-recovery ./noft
Rank 1 / 2: Before sigkill . !
Rank 0 / 2: Before sigkill . !
[master.ipoib:130374] PMIX ERROR: BAD-PARAM in file event/pmix_event_notification.c at line 923
Rank 0 / 2: Notified of error . Stayin' alive!
Do you know what else we could try to get all of this working together? Do we have to pass some parameters to Open MPI at runtime (i.e., when calling mpirun), or change anything else?
Thanks a lot in advance.
Best,
Vahid
--
You received this message because you are subscribed to the Google Groups "User Level Fault Mitigation" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/ulfm/4faa4d22-61b8-4005-8b00-22dc7c45bca3n%40googlegroups.com.