Hello,
First of all, I would like to thank you for your great work on the ULFM library.
I found out about it at the SC tutorial. I'm trying to run the examples given to us there
under SLURM, but I am seeing the following issues:
1) When running with 2 ranks, after a failure is detected the application terminates without
returning from MPI_Finalize().
2) When running with more than 2 ranks, MPI_Finalize() hangs. In this case I also sometimes
get false failure detections.
Could it be my configuration? It happens every single time. I followed the build instructions
on your website and am running with the following flags:
-mca btl tcp,self -mca pml ob1 -mca mpi_ft_detector true
Thank you,
Polykarpos