Using MPI and GASNet in a single program


Moraru, Maxim

Nov 15, 2023, 4:37:46 PM
to gasnet...@lbl.gov
Hello,
I would like to use MPI and GASNet in a single program, ideally with synchronization between the two communication systems. The documentation at https://gasnet.lbl.gov/dist-ex/README states that "arbitrarily interleaving blocking MPI calls with GASNet calls (e.g. UPC shared accesses) can easily lead to deadlock". Do you have any examples that reproduce the deadlock? Also, does the problem arise only for certain conduits or certain types of GASNet routines (e.g. active messages)?

Thank you,
Maxim MORARU

Dan Bonachea

Nov 29, 2023, 6:53:31 PM
to Moraru, Maxim, gasnet...@lbl.gov, Sean Treichler

Hi Maxim - Apologies for the delayed response, things have been very busy here.


We don't have any complete examples to offer, but can define the rules at a high level. However, without an understanding of what the GASNet client code (e.g. the Legion/librealm implementation) is doing, it might be hard to apply them.


Basically, any pattern in which a blocking call to one library prevents necessary progress by the other can be a problem.  For instance:


if (rank % 2) {
   gex_Event_Wait(gex_AD_OpNB(...));  // gex_AD_OpNB stands in for any operation that needs an active message round-trip
}
MPI_Barrier(MPI_COMM_WORLD); // ranks blocked here do not progress incoming GASNet Active Messages


Note that with multiple threads, where any given thread makes either GASNet calls or MPI calls but never both, it becomes possible to reliably service both progress engines. That might be the most reliable workaround.
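
To make the progress rule concrete, here is a toy model of the pattern above, written as a minimal sketch in plain C. It does not call MPI or GASNet at all. The names `run_model`, `rank_state`, `NRANKS`, and `am_serviced_in_barrier` are hypothetical, and the model assumes that each odd rank sends its active message to its lower (even) neighbor, which the original snippet leaves unspecified. The flag stands in for a dedicated thread that keeps servicing GASNet while another thread is blocked in MPI.

```c
#include <assert.h>
#include <stdbool.h>

#define NRANKS 4  /* hypothetical job size for this toy model */

/* Hypothetical per-rank states; none of this is MPI or GASNet API. */
typedef enum { WAIT_AM_REPLY, IN_BARRIER, DONE } rank_state;

/* Returns true if every rank gets past the barrier, false if the job hangs.
 * am_serviced_in_barrier == true models a dedicated thread that keeps
 * servicing GASNet while another thread sits in the blocking MPI call. */
bool run_model(bool am_serviced_in_barrier) {
    rank_state st[NRANKS];

    /* Odd ranks first block on an AM round-trip; even ranks go straight
     * into the barrier, as in the snippet above. */
    for (int r = 0; r < NRANKS; r++)
        st[r] = (r % 2) ? WAIT_AM_REPLY : IN_BARRIER;

    bool progressed = true;
    while (progressed) {
        progressed = false;

        /* Assumption: each odd rank targets its lower (even) neighbor.
         * The round-trip completes only if the target runs AM handlers. */
        for (int r = 0; r < NRANKS; r++) {
            if (st[r] != WAIT_AM_REPLY) continue;
            int target = r - 1;
            assert(target >= 0 && target < NRANKS);
            bool target_services_am =
                st[target] != IN_BARRIER || am_serviced_in_barrier;
            if (target_services_am) {
                st[r] = IN_BARRIER;
                progressed = true;
            }
        }

        /* The barrier releases only once every rank has entered it. */
        int arrived = 0;
        for (int r = 0; r < NRANKS; r++)
            if (st[r] == IN_BARRIER) arrived++;
        if (arrived == NRANKS) {
            for (int r = 0; r < NRANKS; r++) st[r] = DONE;
            progressed = true;
        }
    }

    for (int r = 0; r < NRANKS; r++)
        if (st[r] != DONE) return false;
    return true;
}
```

In this model, `run_model(false)` returns false: the odd ranks wait on peers that are stuck in the barrier, and the barrier waits on the odd ranks. `run_model(true)` returns true, because the round-trips complete and every rank then reaches the barrier. Real applications have many more ways to stall, so this only illustrates the circular wait, not a general deadlock detector.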


The progress issue is conduit-independent.

There are also conduit-specific (and MPI-implementation-specific) resource contention/starvation risks. If you're concerned about a particular combination, we can discuss further.


Hope this helps.

-D



Moraru, Maxim

Nov 29, 2023, 8:29:41 PM
to Dan Bonachea, gasnet...@lbl.gov, Sean Treichler
Hello,
Thank you for this clear response. 

I would be interested in a particular use case with the ucx and ibv conduits. Suppose one group of processes (e.g. processes 0 and 1) communicates exclusively through MPI (via a separate communicator), while another group (e.g. processes 2, 3, and 4) exchanges data exclusively through GASNet at the same time, and the first group never communicates with the second. Is this case guaranteed to be free of deadlock?

Another question about ucx: for all the fabrics supported by ucx, is it guaranteed that accessing the hardware resources (e.g. the network card) simultaneously from MPI and GASNet will never trigger a deadlock?

Thank you again for your time,
Maxim


Dan Bonachea

Nov 30, 2023, 1:30:37 PM
to Moraru, Maxim, gasnet...@lbl.gov, Sean Treichler
On Wed, Nov 29, 2023 at 8:29 PM Moraru, Maxim <mor...@lanl.gov> wrote:
> I would be interested in a particular use case with the ucx and ibv conduits. Suppose one group of processes (e.g. processes 0 and 1) communicates exclusively through MPI (via a separate communicator), while another group (e.g. processes 2, 3, and 4) exchanges data exclusively through GASNet at the same time, and the first group never communicates with the second. Is this case guaranteed to be free of deadlock?

The scenario you describe sounds like it should be safe.

It's of course always possible to construct deadlocks via incorrect code patterns using either library in isolation, but the scenario you describe should not create a problem due to the use of both libraries.
 
> Another question about ucx: for all the fabrics supported by ucx, is it guaranteed that accessing the hardware resources (e.g. the network card) simultaneously from MPI and GASNet will never trigger a deadlock?

As mentioned in my previous email, there are cases where using (or sometimes just initializing) both MPI and GASNet in the same process can cause interference, because both libraries independently request network resources and the sum of the requests exceeds the available physical resources. In many cases such problems can be solved by adjusting software settings for one or both libraries (e.g. via environment variables). This problem is probably less likely to arise if both libraries are using UCX underneath, but I won't say it can never happen. However, the cases of such problems I've personally seen most often result in a loud failure at startup, rather than a silent deadlock.

Hope this helps.
-D