Shrink for inter-communicators in Open MPI and other implementations


Sarthak Joshi

Jan 7, 2022, 10:21:29 AM
to User Level Fault Mitigation
Hello,
I noticed that the current Open MPI implementation does not seem to properly support MPIX_Comm_shrink for inter-communicators. I wanted to know if this is still a work in progress, or if there are fundamental issues with the library that prevent it from working. Is this support present in any other implementation? Lastly, is there any implementation of ULFM (in development or otherwise) that exists independently of underlying libraries like Open MPI and MPICH?
Thanks for your attention
Sarthak

Aurelien Bouteiller

Jan 10, 2022, 4:04:38 PM
to User Level Fault Mitigation
Hi Sarthak,

You are correct, Open MPI does not implement the SHRINK operation on inter-communicators. There is no fundamental issue that would prevent implementing it; we have simply never encountered a case where it was useful, since most people use the inter-communicator only as an intermediate step after COMM_SPAWN and immediately flatten it back into an intra-communicator.
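
For context, that common pattern looks roughly like the following (a minimal sketch; the "worker" command name and the process count are placeholders):

/* Typical use of an inter-communicator: spawn workers, then immediately
 * flatten the parent/child intercomm into a single intra-communicator. */
MPI_Comm children, everyone;
MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
               0 /* root */, MPI_COMM_SELF, &children,
               MPI_ERRCODES_IGNORE);
/* high = 0: the parent side takes the low ranks in the merged comm */
MPI_Intercomm_merge(children, 0, &everyone);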

One way you can work around the missing feature is to shrink both sides of the intercomm, then create a new intercomm from the two local comms, and validate that it all went as expected: roughly, MPIX_Comm_shrink on each side's local comm, MPI_Intercomm_create to rebuild the intercomm from the two shrunken comms, then MPIX_Comm_agree to confirm that every survivor succeeded.

There is a bit more to it than just this quick sketch though; I'll refer you to a tutorial we made a couple of years ago with a full-bodied example code:
https://fault-tolerance.org/2014/07/10/uniform-intercomm-creation/
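
For concreteness, a minimal sketch of that recipe could look like the following. Everything here is illustrative, not from Open MPI: the helper name, the choice of rank 0 as local leader, and the tag are assumptions, and a robust version (as in the tutorial) must also cope with leaders failing mid-operation.

#include <mpi.h>
#include <mpi-ext.h>  /* ULFM extensions: MPIX_Comm_shrink, MPIX_Comm_agree */

/* Shrink an inter-communicator by shrinking this side's local comm,
 * rebuilding the intercomm, and agreeing on the outcome. Each side
 * calls this with its own local intra-communicator; peer_comm is any
 * surviving communicator that bridges the two leaders. */
int shrink_intercomm(MPI_Comm local_comm, MPI_Comm peer_comm,
                     int remote_leader, MPI_Comm *new_icomm)
{
    MPI_Comm shrunk_local = MPI_COMM_NULL;
    int rc, flag;

    /* 1. Exclude failed processes from this side's local group. */
    MPIX_Comm_shrink(local_comm, &shrunk_local);

    /* 2. Rebuild the intercomm from the two shrunken local comms
     *    (rank 0 as local leader and tag 42 are arbitrary choices). */
    rc = MPI_Intercomm_create(shrunk_local, 0, peer_comm, remote_leader,
                              42, new_icomm);

    /* 3. Agree on the outcome: flag stays nonzero only if every
     *    survivor on this side saw rc == MPI_SUCCESS. The uniform
     *    cross-side validation is covered in the tutorial above. */
    flag = (MPI_SUCCESS == rc);
    MPIX_Comm_agree(shrunk_local, &flag);
    if (!flag) {
        if (MPI_SUCCESS == rc) MPI_Comm_free(new_icomm);
        MPI_Comm_free(&shrunk_local);
        return MPIX_ERR_PROC_FAILED;
    }
    MPI_Comm_free(&shrunk_local);
    return MPI_SUCCESS;
}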


I do not know what the support status for intercomm shrink is in MPICH. I am not aware of any portable ULFM implementation that would be cross-compatible between MPI implementations, and in my opinion that would be very difficult to achieve: some operations (e.g., agree) could be written in MPI-neutral, portable code, but the core of error reporting is deeply integrated with the communication engine.
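
To illustrate the first point: the semantics of agreement alone can be written portably, since agreement on a flag is just a bitwise AND across the group. What cannot be written portably is the fault tolerance of the operation itself (the naive_agree helper below is purely illustrative):

/* Naive, NON-fault-tolerant illustration of the agree semantics in
 * plain MPI: agreement on a flag is a bitwise AND across the group.
 * A real ULFM agreement must complete consistently even when
 * processes fail mid-operation, which MPI_Allreduce cannot do. */
static int naive_agree(MPI_Comm comm, int *flag)
{
    return MPI_Allreduce(MPI_IN_PLACE, flag, 1, MPI_INT, MPI_BAND, comm);
}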

Best,
Aurelien

Sarthak Joshi

Jan 11, 2022, 12:10:57 AM
to User Level Fault Mitigation
Thank you for the help; that was very useful to learn. There is one other point I recently noticed in the Open MPI implementation that has me a bit confused. Near the end of the ompi_comm_failure_get_acked_internal function there is this section of code:
if( OMPI_COMM_IS_INTER(comm) ) {
    ret = ompi_group_intersection(tmp_sub_group,
                                  comm->c_local_group,
                                  group);
} else {
    ret = ompi_group_intersection(tmp_sub_group,
                                  comm->c_remote_group,
                                  group);
}

Here the local group of the inter-communicator is intersected with the failed processes to produce the group of failed processes in that communicator, while the remote group is used for an intra-communicator. This still works for intra-communicators, since Open MPI sets the local and remote groups of an intra-communicator to the same value. For an inter-communicator, however, it seems strange: I would have expected either the failed processes from both groups to be returned, or the failed processes from the remote group, since a failure detected while communicating through an inter-communicator would be a process in the remote group. Is there any reason why it has been implemented like this? Thank you.
Sarthak

George Bosilca

Jan 11, 2022, 1:27:49 AM
to ul...@googlegroups.com
Good catch. You are right: the returned group should always relate to the group of processes you can communicate with in the communicator, which means the remote group for an inter-communicator and the local group for an intra-communicator.
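
Concretely, the fix would presumably just swap the two group arguments (a sketch based on the description above, not an actual upstream patch):

if( OMPI_COMM_IS_INTER(comm) ) {
    /* inter-comm: the processes we communicate with form the remote group */
    ret = ompi_group_intersection(tmp_sub_group,
                                  comm->c_remote_group,
                                  group);
} else {
    /* intra-comm: the local group is the communication group */
    ret = ompi_group_intersection(tmp_sub_group,
                                  comm->c_local_group,
                                  group);
}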

  George.
