Problem with MPI Gather and dense matrices


Tiago Freitas Pereira

Oct 9, 2013, 1:11:31 PM
to mpi...@googlegroups.com
Dear members,

I'm using mpi4py to speed up some computations on my cluster. In these computations I have to deal with very dense matrices.

I have a script that, at some point, tries to gather some matrices using MPI_GATHER.
When I run this script with more than 5 processes, it throws the following error message:

Traceback (most recent call last):
  File "example.py", line 10, in <module>
    gather_m1 = comm.gather(x,root=0)
  File "Comm.pyx", line 869, in mpi4py.MPI.Comm.gather (src/mpi4py.MPI.c:67905)
  File "pickled.pxi", line 612, in mpi4py.MPI.PyMPI_gather (src/mpi4py.MPI.c:31966)
  File "pickled.pxi", line 144, in mpi4py.MPI._p_Pickle.allocv (src/mpi4py.MPI.c:27005)
  File "pickled.pxi", line 93, in mpi4py.MPI._p_Pickle.alloc (src/mpi4py.MPI.c:26327)
SystemError: Negative size passed to PyString_FromStringAndSize


#####

My cluster has two hosts running Ubuntu Server 12.04 LTS (64-bit), each with 24 processors and 64 GB of RAM.
Below is an example that reproduces the error:

import numpy
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

x = numpy.random.rand(400, 400, 400) * 1000   # 400**3 doubles: ~512 MB per process
x = numpy.array(x, dtype='double')

gather_m1 = comm.gather(x, root=0)

if rank == 0:
  print(len(gather_m1))

########
What is wrong?
Is it some kind of overflow? If so, is the problem in the MPI bindings (mpi4py) or in MPI itself?

Thanks in advance

Tiago


paul.ant...@gmail.com

Nov 9, 2013, 1:27:47 AM
to mpi...@googlegroups.com
On Wednesday, October 9, 2013 1:11:31 PM UTC-4, Tiago Freitas Pereira wrote:
Traceback (most recent call last):
  File "example.py", line 10, in <module>
    gather_m1 = comm.gather(x,root=0)
  File "Comm.pyx", line 869, in mpi4py.MPI.Comm.gather (src/mpi4py.MPI.c:67905)
  File "pickled.pxi", line 612, in mpi4py.MPI.PyMPI_gather (src/mpi4py.MPI.c:31966)
  File "pickled.pxi", line 144, in mpi4py.MPI._p_Pickle.allocv (src/mpi4py.MPI.c:27005)
  File "pickled.pxi", line 93, in mpi4py.MPI._p_Pickle.alloc (src/mpi4py.MPI.c:26327)
SystemError: Negative size passed to PyString_FromStringAndSize


This looks like a problem in pickling your data, rather than a problem in MPI land.  mpi4py's pickling is hidden away, so try this:

import numpy
x = numpy.random.rand(400, 400, 400) * 1000
x = numpy.array(x, dtype='double')
import pickle
pickle.dumps(x)

If that raises an error, you know the problem is in pickling your data, and you've got a more helpful backtrace for figuring out why.

Cheers,

Paul.

paul.ant...@gmail.com

Nov 9, 2013, 2:24:00 AM
to mpi...@googlegroups.com
Oh, and I just saw some other posts worrying about memory, which made me think a bit more.

Bear in mind that pickle is not a very efficient representation, memory-wise.  Consider the following:

>>> import numpy
>>> x = numpy.random.rand(1000)
>>> import pickle
>>> p = pickle.dumps(x)
>>> len(p)
22231

The vector in memory is stored efficiently, at around 1000 x 8 bytes = 8000 bytes (plus some overhead), but the pickled representation is nearly 3 times larger. Because you need to hold both in memory at the same time, you actually need almost four times the memory you think you do (and that's assuming no additional copies are made along the way...).
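
For scale, here is a quick back-of-the-envelope calculation (my own arithmetic, not a measurement from this thread) applied to the original 400x400x400 example; the rank count of 6 is an assumption based on "more than 5 processes":

elements_per_rank = 400 * 400 * 400        # 64,000,000 doubles per process
bytes_per_rank = elements_per_rank * 8     # 512,000,000 bytes, roughly 488 MiB
n_ranks = 6                                # assumed; "more than 5 processes"
total_at_root = bytes_per_rank * n_ranks   # about 3 GB of raw data gathered at the root
print(total_at_root > 2**31 - 1)           # True

A gathered payload that exceeds the range of a signed 32-bit int would be consistent with the "negative size" in the traceback, if any size along the way ends up stored in a C int.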

P.

Lisandro Dalcin

Nov 10, 2013, 3:19:08 AM
to mpi4py
Please note that mpi4py internally uses the binary pickle protocol,
and for large numpy arrays the size overhead is minimal (133 bytes in
my test below):

In [1]: import numpy

In [2]: x = numpy.random.rand(1000)

In [3]: import cPickle as pickle

In [4]: p = pickle.dumps(x, -1) # -1 means highest protocol

In [5]: len(p)
Out[5]: 8133

In [6]: y = numpy.random.rand(10000)

In [7]: q = pickle.dumps(y, -1)

In [8]: len(q)
Out[8]: 80133


Anyway, large numpy arrays should be communicated with e.g. comm.Gather()
rather than comm.gather(). The upper-case methods do not rely on pickle
serialization and should be much faster for large arrays.
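
For completeness, a minimal sketch of what that could look like for the original example (an adaptation, not code posted in the thread): the root preallocates a contiguous receive buffer and the raw array buffers are handed straight to MPI_Gather.

import numpy
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each rank contributes one contiguous array of doubles.
x = numpy.random.rand(400, 400, 400)

# Only the root needs a receive buffer: one slot per rank.
recvbuf = None
if rank == 0:
    recvbuf = numpy.empty((size, 400, 400, 400), dtype='d')

# Upper-case Gather uses the buffer interface directly; no pickle step.
comm.Gather(x, recvbuf, root=0)

if rank == 0:
    print(recvbuf.shape)

The root still has to hold size x ~512 MB at once, so memory on the root node remains the limiting factor, but the pickling and its temporary copies disappear.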


--
Lisandro Dalcin
---------------
CIMEC (UNL/CONICET)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
3000 Santa Fe, Argentina
Tel: +54-342-4511594 (ext 1016)
Tel/Fax: +54-342-4511169