This is indeed a known problem on systems without asynchronous communication progress. Keep in mind that the MPI standard does not mandate communication progress outside MPI calls, which means that when you block the process outside any MPI call there is no guarantee that any communication will be answered, including the fault detection messages. That being said, there are a few solutions to this issue:
1. increase the fault detector timeout to a small multiple of the longest time in which your application will be unresponsive.
2. use a communication thread or ask OMPI/ULFM to provide asynchronous progress (you will need to dedicate some resources and to accept the potential performance impact of this)
3. use a more recent version of ULFM, where the fault detector is located outside the MPI library (in the PRRTE/ORTE daemon).
4. don't do "while (42) ;" ;)
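To make point 4 a bit more concrete (it is also a poor man's variant of point 2), here is a minimal sketch that splits a long computation into steps and gives the MPI library a chance to progress every few steps. The names run_with_progress, add_index, count_poll and mpi_poll, and the USE_MPI macro, are made up for this illustration and are not part of MPI or ULFM. The only real MPI call is MPI_Iprobe, and I am assuming that entering it is enough to let your implementation progress its pending traffic, which is common but not guaranteed by the standard.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook types for this sketch; not part of MPI or ULFM. */
typedef void (*step_fn)(size_t index, void *ctx);
typedef void (*poll_fn)(void *ctx);

/* Run n_steps chunks of work and call poll() after every poll_every chunks,
 * so the process never stays outside the MPI library for too long.
 * Returns how many times poll() was called. poll_every == 0 disables polling. */
static size_t run_with_progress(size_t n_steps, size_t poll_every,
                                step_fn step, void *step_ctx,
                                poll_fn poll, void *poll_ctx)
{
    size_t polls = 0;
    assert(step != NULL && poll != NULL);
    for (size_t i = 0; i < n_steps; ++i) {
        step(i, step_ctx);
        if (poll_every != 0 && (i + 1) % poll_every == 0) {
            poll(poll_ctx);
            ++polls;
        }
    }
    return polls;
}

/* Stand-in work chunk: adds the step index to a size_t accumulator. */
static void add_index(size_t index, void *ctx) { *(size_t *)ctx += index; }

/* Stand-in poll hook without MPI: counts how often it was invoked. */
static void count_poll(void *ctx) { *(size_t *)ctx += 1; }

#ifdef USE_MPI
#include <mpi.h>
/* Poll hook with MPI: a cheap non-blocking call that enters the library.
 * ctx must point to a valid MPI_Comm. Assumption: entering MPI_Iprobe
 * lets the implementation progress outstanding communication. */
static void mpi_poll(void *ctx)
{
    int flag = 0;
    MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, *(MPI_Comm *)ctx, &flag,
               MPI_STATUS_IGNORE);
}
#endif
```

With MPI you would build with -DUSE_MPI and pass mpi_poll together with a pointer to your communicator. Pick poll_every so that the time between two polls stays well below the fault detector timeout from point 1.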
Hope this helps,
George.