mpi4py hangs on disconnect call for topology communicator


Maarten Braakhekke

Aug 21, 2020, 4:48:11 AM
to mpi4py

I have a Python script that spawns several Fortran processes. The Python and Fortran processes communicate with each other. For this, an intracommunicator is created (using Merge()) and a graph topology is created with Create_dist_graph(). So I have three communicators: 1) the intercommunicator, 2) the intracommunicator, and 3) the topocommunicator.
After the Fortran processes have finished, MPI_Comm_disconnect() is called for all communicators. Next, a new set of Fortran processes is started, and so on. See below for an overview of the Python and Fortran calls.
The first iteration works fine, but in the second iteration the Python process hangs on the disconnect call for the topocommunicator. Am I doing something wrong?

I'm using MSMPI on Windows 10.

cheers,
Maarten


Python
interComm = MPI.COMM_SELF.Spawn_multiple(...)
intraComm = interComm.Merge()
topoComm = intraComm.Create_dist_graph()

# do stuff

topoComm.Disconnect() # hangs here on second iteration
interComm.Disconnect()
intraComm.Disconnect()


Fortran
call MPI_Init()
call MPI_Comm_get_parent(interComm)
call MPI_Intercomm_merge(interComm, ..., intraComm)
call MPI_Dist_graph_create(intraComm, ..., topoComm)

! do stuff

if (topoComm /= MPI_COMM_NULL) call MPI_Comm_disconnect(topoComm)
if (interComm /= MPI_COMM_NULL) call MPI_Comm_disconnect(interComm)
if (intraComm /= MPI_COMM_NULL) call MPI_Comm_disconnect(intraComm)
call MPI_Finalize()

Lisandro Dalcin

Aug 21, 2020, 9:36:36 AM
to mpi...@googlegroups.com
I do not see any obvious issue in your code structure. 

* Can you swap the lines disconnecting {inter|intra}Comm on both sides? Does that make any difference?
* Does using comm.Free() on both sides make any difference?

One reason this could be happening is that your code does not complete all communication calls: Disconnect() blocks until all ongoing communication completes. Using comm.Free() instead is a way to figure out whether that is the case.
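
A minimal sketch of how the two experiments above could be toggled on the Python side. The helper name release_comms and its two flags are made up for illustration; it only relies on the communicators having Free() and Disconnect() methods, as mpi4py communicators do. The Fortran side would need the matching change (same order, MPI_Comm_free instead of MPI_Comm_disconnect).

```python
def release_comms(topo, inter, intra, use_free=False, inter_first=True):
    """Release the three communicators (hypothetical helper, not part of mpi4py).

    use_free=True    -> call Free() instead of Disconnect() on each communicator
    inter_first=False -> release intraComm before interComm (the swapped order)
    """
    def release(comm):
        if use_free:
            comm.Free()
        else:
            comm.Disconnect()

    # The topology communicator is always released first, as in the original post.
    release(topo)
    ordered = (inter, intra) if inter_first else (intra, inter)
    for comm in ordered:
        release(comm)
```

If the hang disappears with use_free=True, that would point towards communication that was still pending when Disconnect() was called.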

Any chance you could test this on Linux or macOS? Or maybe with Intel MPI on Windows? That way you could confirm whether the problem is related to MSMPI or not. If it is, I would suggest contacting the MSMPI dev team; they have been quite responsive to my queries and bug reports in the past. Of course, it is much better if you can set up a short, self-contained reproducer. It is also probably better if you can reproduce the issue with non-Fortran code, let's say pure C, or maybe even pure Python using mpi4py on both sides.
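
A pure-Python reproducer along those lines might look like the sketch below. It is only an assumption about what the real code does: it uses Spawn() rather than Spawn_multiple(), a star-shaped graph (rank 0 connected to every worker) chosen purely for illustration, and a Barrier() as a stand-in for the actual communication. The file name reproducer.py, the function names, and the "parent"/"child" arguments are made up.

```python
import sys


def star_graph_edges(rank, size):
    """Edge lists (sources, degrees, destinations) for Create_dist_graph.

    Illustrative topology: rank 0 declares one edge to every other rank,
    all other ranks declare no edges.
    """
    if rank == 0:
        return [0], [size - 1], list(range(1, size))
    return [], [], []


def exercise(inter, high):
    """Merge, build the topology, synchronize, then disconnect as in the post."""
    intra = inter.Merge(high=high)
    edges = star_graph_edges(intra.Get_rank(), intra.Get_size())
    topo = intra.Create_dist_graph(*edges)
    topo.Barrier()  # stand-in for "do stuff"
    topo.Disconnect()
    inter.Disconnect()
    intra.Disconnect()


def run_parent(iterations=3, nworkers=2):
    # Imported here so the helpers above can be loaded without an MPI runtime.
    from mpi4py import MPI
    for i in range(iterations):
        inter = MPI.COMM_SELF.Spawn(
            sys.executable, args=[__file__, "child"], maxprocs=nworkers
        )
        exercise(inter, high=False)  # high=False places the parent at rank 0
        print(f"iteration {i} finished", flush=True)


def run_child():
    from mpi4py import MPI
    exercise(MPI.Comm.Get_parent(), high=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    if sys.argv[1] == "parent":
        run_parent()
    elif sys.argv[1] == "child":
        run_child()
```

It would be started with something like `mpiexec -n 1 python reproducer.py parent`. If this also hangs in a later iteration under MSMPI but not under another MPI implementation, it would make a compact bug report.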






--
Lisandro Dalcin
============
Research Scientist
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/