openmpi-5.0.0rc2 does not finalize

Edson Camargo

Oct 22, 2021, 8:59:37 PM
to User Level Fault Mitigation
Hey guys!
Nice to see the progress on ULFM; it's been a while since I was last here. I am working with ULFM with my Master's students in a parallel processing course. I used to work with the older versions of ULFM (2013 to 2017), but now I'm having problems with the version of ULFM built into Open MPI. I'm using openmpi-5.0.0rc2, which I configured with ./configure --with-ft, but when I run the SC20 tutorial examples the applications don't finish. For example:

mpiexec -np 4 --hostfile my-hostfile --enable-recovery ./02.err_handler
Rank 2/4: bye bye!
Rank 3/4: bye bye!

The same happens on my machine and on the Santos Dumont supercomputer (Brazil). Could you please tell me if I missed anything?

Thanks in advance!

Edson

Aurelien Bouteiller

Oct 25, 2021, 11:03:28 AM
to User Level Fault Mitigation
Edson,

Thanks for reporting. This is unfortunately a known issue, and it is being worked on.

If you look carefully, you should see that the MPI programs actually terminate normally; it is mpiexec that remains stuck.
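
One rough way to check this on a single node is sketched below. It is only an illustration, not part of ULFM or Open MPI: the function name report_state and the 30-second wait are made up for the example, and it assumes pgrep is available and that all ranks run on the local machine (on a cluster the ranks on other nodes would not be seen).

```shell
#!/bin/sh
# Sketch: distinguish "ranks still running" from "only the launcher is stuck".
# Assumptions (not from this thread): single node, pgrep installed,
# report_state is a hypothetical helper name.

# report_state LAUNCHER_PID APP_NAME
# Prints one of: ranks-running | launcher-stuck | finished
report_state() {
    launcher_pid=$1
    app_name=$2
    if pgrep -x "$app_name" >/dev/null 2>&1; then
        # At least one local process with this exact name is still alive.
        echo "ranks-running"
    elif kill -0 "$launcher_pid" 2>/dev/null; then
        # No ranks left, but the launcher PID still exists.
        echo "launcher-stuck"
    else
        echo "finished"
    fi
}

# Possible usage (the wait time is arbitrary):
#   mpiexec -np 4 --hostfile my-hostfile --enable-recovery ./02.err_handler &
#   pid=$!
#   sleep 30
#   report_state "$pid" 02.err_handler
```

If this prints launcher-stuck, the application processes have gone and only the launcher is left, which matches the behavior described above.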

Best regards,
Aurelien

Edson Tavares de Camargo

Oct 25, 2021, 1:11:14 PM
to User Level Fault Mitigation
Hi Aurelien. Thanks for your reply. Could you please tell me what the latest stable ULFM version is?

Edson

Aurelien Bouteiller

Nov 4, 2021, 9:46:31 AM
to User Level Fault Mitigation