The Send function in mpi4py gets stuck for large numpy arrays when used in a loop


ankitn...@gmail.com

Jun 13, 2020, 10:16:50 AM
to mpi4py

Below is code that works when a numpy array of smaller size (stored as Data) is communicated to the neighbour workers (defined by NbrList). For a large array (e.g. N=50000), the Send call gets stuck. Any idea how to solve it?

from mpi4py import MPI
import numpy as np

Comm = MPI.COMM_WORLD
N_Workers = Comm.Get_size()
Rank = Comm.Get_rank()

if N_Workers != 2: raise Exception('Only 2 workers allowed')

NbrList = [1, 0]

N = int(5e4)
Data = np.random.rand(N)

Dest = NbrList[Rank]

print('Sending: ' + str(Rank) + '->' + str(Dest))
Comm.Send([Data, N, MPI.DOUBLE], dest=Dest, tag=Rank)
print('Data Sent: ' + str(Rank) + '->' + str(Dest))


RecvData = np.empty(N, dtype=np.float64)
Src = NbrList[Rank]
Comm.Recv([RecvData, N, MPI.DOUBLE], source=Src, tag=Src)
print('Data Recvd: ' + str(Rank) + '<-' + str(Src) + ': ' + str(len(RecvData)))

Output for N = 30000:

Sending: 0->1
Sending: 1->0
Data Sent: 1->0
Data Sent: 0->1
Data Recvd: 0<-1: 30000
Data Recvd: 1<-0: 30000

Output for N=50000:

Sending: 0->1
Sending: 1->0

Lisandro Dalcin

Jun 14, 2020, 3:57:54 AM
to mpi...@googlegroups.com
Your code is wrong. Both processes send first and then receive; you need an `if` branch so that one process sends while the other receives. Please read one of the many MPI tutorials out there.



--
Lisandro Dalcin
============
Research Scientist
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/

Ankit Ankit

Jun 14, 2020, 4:27:41 AM
to mpi...@googlegroups.com
Hi, thanks for the reply. I need to 'exchange' information between the workers, so even if I use an 'if' statement, I guess the code will remain the same within the 'if' block. Further, I have scaled this code to thousands of workers (e.g. N_Workers=1024), where each worker simultaneously exchanges data with its neighbours as defined by 'NbrList'. The code works when N < 2^15 (approx.) and hangs for higher values. One workaround I have implemented is to split the data into smaller chunks whenever N > 2^15. However, this adds some overhead. I was wondering whether being able to send large data directly could improve my speed-up.

Lisandro Dalcin

Jun 14, 2020, 8:08:16 AM
to mpi...@googlegroups.com
Your code, as written, is a classic, kindergarten-level example of deadlock. You can scale it to as many processes as you want, but in most MPI implementations out there the behavior will be the same: the code works just fine for small arrays, but as you increase the array size, at some point it will suddenly deadlock. I've been in this business since 2003, I've read every version of the MPI standard page by page without omissions, and I've read all the important MPI books. So, while I do not expect you to trust me blindly, you should at least grant me a chance of being right :-)

You are using what the MPI standard calls a send in "standard mode". In this mode, the implementation may buffer messages up to some implementation-defined size, so that the sender can make progress even if no matching Recv() call has been issued. However, if the message is large, Send() will BLOCK until a matching Recv() has been posted on the receiving end, so that the communication can progress. Look again at your code: the first MPI call every process makes is Send(). All of them try to send at the same time, and if the message is large, all of them block waiting for a Recv() that will never be posted, because all of them are stuck in the send. This is the most basic example of deadlock; it is discussed in books and tutorials, and I've been explaining it for years.

How to "break" this deadlock hell? Well, there are many ways, all of them (except the last below) should be easy to understand to anyone with a basic knowledge of MPI:

1) Break the deadlock by switching the send/recv order in one of the processes:

if rank == 0:
    Recv(..., source)  # Recv first
    Send(..., dest)    # then Send
else:
    Send(..., dest)    # Send first
    Recv(..., source)  # then Recv

2) Use Sendrecv(), which is a guaranteed way to perform exchanges without deadlock (a runnable sketch follows after this list):

Sendrecv(sendbuf=sbuf, dest=dst, recvbuf=rbuf, source=src)

3) Use nonblocking communication:

req = Isend(..., dest)  # nonblocking send, returns immediately
Recv(..., source)       # blocking receive can now match the peer's Isend
req.Wait()              # complete the send before reusing the buffer

4) Create a graph communicator and use neighborhood collectives (probably overkill if you exchange with just one process rather than many; a sketch of this also follows below).
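For concreteness, below is a minimal runnable sketch of option 2 applied to the script from the original post (the variable names Data, RecvData, NbrList are reused from there; this is one way to spell the call, not the only one):

from mpi4py import MPI
import numpy as np

Comm = MPI.COMM_WORLD
Rank = Comm.Get_rank()

NbrList = [1, 0]            # each rank exchanges with the other one
Dest = Src = NbrList[Rank]

N = int(5e4)                # large enough to deadlock with plain Send/Recv
Data = np.random.rand(N)
RecvData = np.empty(N, dtype=np.float64)

# Sendrecv posts the send and the receive together, so the library can
# match them in either order and no deadlock is possible.
Comm.Sendrecv(sendbuf=[Data, MPI.DOUBLE], dest=Dest, sendtag=Rank,
              recvbuf=[RecvData, MPI.DOUBLE], source=Src, recvtag=Src)

print('Rank ' + str(Rank) + ' exchanged ' + str(N) + ' doubles with ' + str(Src))

Run it on two processes, e.g. `mpiexec -n 2 python exchange.py` (the script name is arbitrary).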
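And a sketch of option 4 for the same two-rank exchange, assuming equal-size blocks per neighbour (with many neighbours per rank, sources and destinations would hold the whole neighbour list, and unequal block sizes would call for Neighbor_alltoallv instead):

from mpi4py import MPI
import numpy as np

Comm = MPI.COMM_WORLD
Rank = Comm.Get_rank()

NbrList = [1, 0]
N = int(5e4)
Data = np.random.rand(N)
RecvData = np.empty(N, dtype=np.float64)

# Declare the communication graph: the ranks this process receives
# from (sources) and sends to (destinations).
Graph = Comm.Create_dist_graph_adjacent(sources=[NbrList[Rank]],
                                        destinations=[NbrList[Rank]])

# A single collective call exchanges one equal-size block with every
# neighbour; the MPI library schedules the transfers, so no manual
# send/recv ordering is needed.
Graph.Neighbor_alltoall([Data, MPI.DOUBLE], [RecvData, MPI.DOUBLE])
Graph.Free()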

Ankit Ankit

Jun 14, 2020, 8:23:49 AM
to mpi...@googlegroups.com
Wow! Thanks for the detailed response, mate. I will implement the code as per your suggestions and let you know shortly. Thanks again :)

Ankit Ankit

Jun 15, 2020, 2:37:25 AM
to mpi4py
Hi mate, just to let you know that the code now works well, without deadlock. Thank you very much :)