mpi4py enters deadlock after using spawn


Pavankumar Koratikere

Dec 26, 2022, 10:22:38 AM
to mpi4py

I have the following arrangement of code:

parent.py:

import sys
from mpi4py import MPI
# ... some code ...
for i in range(10):
    # ... some code ...
    child_comm = MPI.COMM_SELF.Spawn(sys.executable, args=["runscript_airfoil.py"], maxprocs=9)
    child_comm.Barrier()
    child_comm.Disconnect()
    # ... some code ...

child.py:

from mpi4py import MPI
# ... some code ...
comm = MPI.COMM_WORLD
comm.Barrier()

My primary objective is to run child.py with multiple processes again and again. I used the Barrier() method because I wanted the parent to wait until child.py has finished executing.

However, the program gets stuck after the first iteration; I think it is deadlocking. Also, all the processors used by child.py should be freed so that I can reuse them in the next iteration.

I am new to MPI and mpi4py, so I don't know which functions to use where. Any help implementing this would be very useful.

Lisandro Dalcin

Dec 29, 2022, 3:46:52 PM
to mpi4py
In child.py, you are missing the following two lines:

parent_comm = MPI.Comm.Get_parent()
parent_comm.Barrier()

That barrier call matches the child_comm.Barrier() call in parent.py.

Pavankumar Koratikere

Dec 29, 2022, 8:09:37 PM
to mpi4py
Hello Lisandro,

Thanks a lot for your reply! Based on your suggestion and some more searching online, I found that the following solves the deadlock:

parent.py:
import sys
from mpi4py import MPI
# ... some code ...
for i in range(10):
    # ... some code ...
    child_comm = MPI.COMM_SELF.Spawn(sys.executable, args=["runscript_airfoil.py"], maxprocs=9)
    child_comm.Barrier()
    # ... some code ...

child.py:
from mpi4py import MPI
# ... some code ...
comm = MPI.COMM_WORLD
parent_comm = comm.Get_parent()
parent_comm.Barrier()

However, now I am facing another issue.

I know that my laptop has 10 cores (verified with the lscpu command, and I have run scripts with 10 processes using mpirun). When I set maxprocs to 4 or less, everything works fine. But when I set maxprocs between 5 and 9, the first iteration of the for loop spawns maxprocs processes as expected, and the second iteration fails with a "not enough slots" error.

I am running parent.py with 'python parent.py'. I also tried 'mpirun -n 1 python parent.py', but I get the same error. I am not sure what the problem is.

OpenMPI: 4.0.7
mpi4py: 3.1.3

Pavankumar Koratikere

Dec 31, 2022, 11:13:53 PM
to mpi4py
I found a solution to this problem and have posted it on Stack Overflow: https://stackoverflow.com/questions/74917169/mpi4py-enters-deadlock-after-using-spawn