Hi nice people,
I'm sending a number of large Python dictionaries (~1.5 million keys each) from the MPI workers back to root. Each worker holds its dictionaries in a list, as in
list_of_dictionaries = [{'68eef0f0-9385-11e3-baa8-0800200c9a66' : 1287612, '533c5680-938a-11e3-baa8-0800200c9a66': 98234, . . . }, {'66339ecc-db8d-440a-b015-897db84ffb6d': 188892, '7ed7847d-63fa-4cc1-9ddd-46e07978888a': 234324, . . . }, . . .]
I'm gathering everything back in root by using:
d = comm.gather(list_of_dictionaries, root=0)
This approach works well for smaller subsets of the dictionaries; for the whole set, however, I'm getting the error:
Traceback (most recent call last):
  File "4_SCRIPTS/MPI/11_ranking_artists_album_tracks_DUP.py", line 138, in <module>
    dictionary_list_gathered = comm.gather(dict_this_rank, root=0)
  File "Comm.pyx", line 869, in mpi4py.MPI.Comm.gather (src/mpi4py.MPI.c:73266)
  File "pickled.pxi", line 614, in mpi4py.MPI.PyMPI_gather (src/mpi4py.MPI.c:33592)
  File "pickled.pxi", line 146, in mpi4py.MPI._p_Pickle.allocv (src/mpi4py.MPI.c:28517)
  File "pickled.pxi", line 95, in mpi4py.MPI._p_Pickle.alloc (src/mpi4py.MPI.c:27832)
SystemError: Negative size passed to PyString_FromStringAndSize
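My guess (not something the traceback confirms) is that the negative size comes from the total pickled payload no longer fitting in a signed 32-bit byte count, i.e. exceeding 2**31 - 1 bytes. A rough sketch of how that could be checked by extrapolating from a small sample dictionary is below. The values of dicts_per_rank and n_ranks are hypothetical placeholders, not the numbers from my real job, and the helper estimate_pickled_bytes is made up for this illustration.

```python
import pickle
import uuid

INT32_MAX = 2**31 - 1  # largest byte count a signed 32-bit int can hold


def estimate_pickled_bytes(n_keys, sample_size=10000):
    """Extrapolate the pickled size of a {uuid-string: int} dict with n_keys entries."""
    sample = {str(uuid.uuid4()): i for i in range(sample_size)}
    sample_bytes = len(pickle.dumps(sample, pickle.HIGHEST_PROTOCOL))
    bytes_per_key = sample_bytes / float(sample_size)
    return int(bytes_per_key * n_keys)


# Hypothetical workload: only keys_per_dict comes from the post,
# the other two counts are placeholders.
keys_per_dict = 1500000
dicts_per_rank = 10
n_ranks = 8

per_dict = estimate_pickled_bytes(keys_per_dict)
total_at_root = per_dict * dicts_per_rank * n_ranks

print("approx. bytes per dictionary: %d" % per_dict)
print("approx. bytes gathered at root: %d" % total_at_root)
print("exceeds signed 32-bit limit: %s" % (total_at_root > INT32_MAX))
```

This is only an estimate: the real pickle size depends on the pickle protocol and on the actual values stored, so it indicates the order of magnitude rather than an exact figure.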
I searched this forum and found a post by Lisandro Dalcin recommending Gather() instead of gather() for large messages. So I wrote a toy example that uses a NumPy structured array, so that each rank sends a fixed-size record rather than an actual Python dictionary:
import mpi4py.MPI as mpi
import numpy as np

comm = mpi.COMM_WORLD
rank = comm.Get_rank()

# One record per rank: a 36-character key and a 32-bit integer value
x = np.array(('ba78f30e-9339-4a98-9851-abee0ef60c36', 102809), dtype=('a36,i4'))

if rank == 0:
    # Receive buffer on root; sized for a run with 4 processes
    x_all = np.zeros((4,), dtype = x.dtype)
else:
    x_all = None

comm.Gather(x, x_all, root=0)

if rank == 0:
    print x_all
But I'm getting the error:
Traceback (most recent call last):
  File "mpi_Gather_test.py", line 19, in <module>
    comm.Gather(x, x_all, root=0)
  File "Comm.pyx", line 415, in mpi4py.MPI.Comm.Gather (src/mpi4py.MPI.c:66916)
  File "message.pxi", line 429, in mpi4py.MPI._p_msg_cco.for_gather (src/mpi4py.MPI.c:23582)
  File "message.pxi", line 369, in mpi4py.MPI._p_msg_cco.for_cco_recv (src/mpi4py.MPI.c:23059)
  File "message.pxi", line 355, in mpi4py.MPI._p_msg_cco.for_cco_send (src/mpi4py.MPI.c:22959)
  File "message.pxi", line 111, in mpi4py.MPI.message_simple (src/mpi4py.MPI.c:20516)
  File "message.pxi", line 58, in mpi4py.MPI.message_basic (src/mpi4py.MPI.c:19723)
KeyError: 'T{36s:f0:i:f1:}'
mpi4py raises this error when the NumPy structured array has mixed datatypes (in this case: [('f0', 'S36'), ('f1', '<i4')]).
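One idea that might sidestep the mixed-datatype lookup is to expose each structured array as a flat run of raw bytes and reinterpret the bytes as records on the receiving side. The sketch below uses NumPy only and simulates the gather by copying into one buffer, so it shows the reinterpretation step and nothing more. I assume, without having run it, that the byte views could then be handed to Gather() with an explicit byte datatype, e.g. as [buf, MPI.BYTE]. The names record, per_rank, send_bytes and recv_bytes are made up for this illustration.

```python
import numpy as np

# Same record layout as the toy example: a 36-byte string plus a 32-bit int.
record = np.dtype([('f0', 'S36'), ('f1', '<i4')])
n_ranks = 4  # stand-in for the communicator size

# Stand-ins for the single-record array that each rank would hold.
per_rank = [
    np.array([(b'ba78f30e-9339-4a98-9851-abee0ef60c36', 102809 + r)], dtype=record)
    for r in range(n_ranks)
]

# Sender side: reinterpret each record array as plain bytes (no copy).
send_bytes = [a.view(np.uint8) for a in per_rank]

# Receiver side: one flat byte buffer large enough for all ranks.
recv_bytes = np.empty(n_ranks * record.itemsize, dtype=np.uint8)

# Simulated gather: place each rank's bytes at its own offset.
# (Assumption: a real run would let Gather() fill recv_bytes instead.)
for r, buf in enumerate(send_bytes):
    start = r * record.itemsize
    recv_bytes[start:start + record.itemsize] = buf

# Reinterpret the gathered bytes as structured records again.
x_all = recv_bytes.view(record)
print(x_all)
```

This only works if every rank uses exactly the same dtype (same field order, sizes and byte order) and, as far as I understand Gather(), sends the same number of records.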
My questions are: is there any way to overcome the size limit on pickled objects? If, as I suspect, the answer is no: is there any way to send a structured array with mixed datatypes using mpi4py?
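For the first question, one fallback would be to split each dictionary into smaller pieces so that no single pickled message gets large, and then gather piece by piece. The sketch below shows only the splitting and re-merging in plain Python. The helper iter_chunks and the chunk size are hypothetical, and I am assuming that each piece would go through its own comm.gather() call, with all ranks agreeing on the number of rounds beforehand.

```python
from itertools import islice


def iter_chunks(d, chunk_size):
    """Yield successive sub-dictionaries of d with at most chunk_size items each."""
    it = iter(d.items())
    while True:
        chunk = dict(islice(it, chunk_size))
        if not chunk:
            break
        yield chunk


# Small stand-in for one of the large dictionaries.
big = {str(i): i for i in range(10)}

# Each chunk would be sent in its own round (assumption: one gather per chunk).
chunks = list(iter_chunks(big, 4))

# On the receiving side the pieces are merged back into a single dictionary.
merged = {}
for chunk in chunks:
    merged.update(chunk)

print([len(c) for c in chunks])
print(merged == big)
```

Since gather() is a collective call, ranks with fewer chunks would presumably have to send empty dictionaries in the remaining rounds.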
Thank you very much,
G.