MPI_Finalize hangs

Polykarpos Thomadakis

Nov 19, 2020, 5:09:05 PM
to User Level Fault Mitigation
Hello,

First of all I would like to thank you for your great work on the ULFM library.

I found out about it at the SC tutorial. I'm trying to run the examples given to us there
under SLURM but I notice the following issues:
1) When running with 2 ranks, after a failure is detected the application terminates without
returning from MPI_Finalize().
2) When running with more than 2 ranks, MPI_Finalize() hangs. In this case I also sometimes
get incorrect failure detections.

Could it be my configuration? It happens every single time. I followed the build instructions
on your website and ran with the following flags:
-mca btl tcp,self -mca pml ob1 -mca mpi_ft_detector true
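
For readers trying to reproduce this, here is a rough sketch of how those flags might fit into a complete launch line. Only the three -mca options come from the post; the rank count and the executable name ./ulfm_example are placeholders, and the snippet only assembles and prints the command rather than running it.

```shell
#!/bin/sh
# MCA flags exactly as given in the post.
MCA_FLAGS="-mca btl tcp,self -mca pml ob1 -mca mpi_ft_detector true"

# Assumptions (not from the post): rank count and binary name are placeholders.
NP=4
APP="./ulfm_example"

# Assemble the launch line; inside a SLURM allocation one would run it
# instead of printing it.
CMD="mpirun -np $NP $MCA_FLAGS $APP"
echo "$CMD"
```

Changing NP to 2 or to a larger value would correspond to cases 1) and 2) above, respectively.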

Thank you,
Polykarpos


Polykarpos Thomadakis

Nov 19, 2020, 6:48:04 PM
to User Level Fault Mitigation
Also, when run outside of SLURM on a single node with 40 cores, it works correctly.