Gatherv seg fault?


c...@stanford.edu

May 6, 2016, 3:32:30 AM
to mpi4py
Hi,

We're seeing segfaults with Gatherv, but only when the data volumes get large, running on RHEL7 with openmpi 1.8.8 and mpi4py 1.3.1.  The script and stack trace are below.  This happens with both the openib btl and the tcp btl.  For us, the script below fails on 48 cores (4 nodes) but works on 36 cores.  If you have any advice or experience with such a problem we would be interested.  Thank you for the wonderful mpi4py package...

chris

import numpy as np

from mpi4py import MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

intensities = []   # use a list instead of array because faster to append
for nevent in range(1000):
    intensities.append(np.zeros((60000))+rank)

lengths = np.array(comm.gather(len(intensities)*intensities[0].shape[0])) # get list of lengths
tmp = np.array(intensities)
mysend = np.ascontiguousarray(tmp)
myrecv = None
if rank==0:
    myrecv = np.empty((sum(lengths)),mysend.dtype) # allocate receive buffer
    print '***',myrecv.shape,myrecv.dtype

print 'Rank',rank,'sending',mysend.shape

comm.Gatherv(sendbuf=mysend, recvbuf=[myrecv, lengths])

if rank==0:
    start = 0
    # look in the receive buffer for the contribution from each rank
    for r,mylen in enumerate(lengths):
        print 'Rank 0 received',mylen,'from rank',r
        start += mylen



[psana1101:08529] *** Process received signal ***
[psana1101:08529] Signal: Segmentation fault (11)
[psana1101:08529] Signal code: Address not mapped (1)
[psana1101:08529] Failing at address: 0x2b9c1a942020
[psana1101:08529] [ 0] /lib64/libpthread.so.0(+0xf100)[0x2b9fc8aed100]
[psana1101:08529] [ 1] /lib64/libc.so.6(+0x147dc4)[0x2b9fc954adc4]
[psana1101:08529] [ 2] /reg/g/psdm/sw/external/openmpi/1.8.8/x86_64-rhel7-gcc48-opt/lib/libopen-pal.so.6(opal_convertor_unpack+0xb0)[0x2b9fd682f9c0]
[psana1101:08529] [ 3] /reg/g/psdm/sw/external/openmpi/1.8.8/x86_64-rhel7-gcc48-opt/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_request_progress_rndv+0x154)[0x2b9fdaed5c24]
[psana1101:08529] [ 4] /reg/g/psdm/sw/external/openmpi/1.8.8/x86_64-rhel7-gcc48-opt/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_req_start+0x4d7)[0x2b9fdaed64b7]
[psana1101:08529] [ 5] /reg/g/psdm/sw/external/openmpi/1.8.8/x86_64-rhel7-gcc48-opt/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv+0xb6)[0x2b9fdaece2f6]
[psana1101:08529] [ 6] /reg/g/psdm/sw/external/openmpi/1.8.8/x86_64-rhel7-gcc48-opt/lib/openmpi/mca_coll_basic.so(mca_coll_basic_gatherv_intra+0x18f)[0x2b9fdb2ed4ff]
[psana1101:08529] [ 7] /reg/g/psdm/sw/releases/ana-current/arch/x86_64-rhel7-gcc48-opt/lib/libmpi.so.1(MPI_Gatherv+0x1c8)[0x2b9fd63078a8]
[psana1101:08529] [ 8] /reg/g/psdm/sw/releases/ana-current/arch/x86_64-rhel7-gcc48-opt/python/mpi4py/MPI.so(+0x4af1f)[0x2b9fd5ff9f1f]
[psana1101:08529] [ 9] /reg/g/psdm/sw/external/python/2.7.10/x86_64-rhel7-gcc48-opt/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4c8c)[0x2b9fc87dce0c]
[psana1101:08529] [10] /reg/g/psdm/sw/external/python/2.7.10/x86_64-rhel7-gcc48-opt/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x80d)[0x2b9fc87de25d]
[psana1101:08529] [11] /reg/g/psdm/sw/external/python/2.7.10/x86_64-rhel7-gcc48-opt/bin/../lib/libpython2.7.so.1.0(PyEval_EvalCode+0x32)[0x2b9fc87de392]
[psana1101:08529] [12] /reg/g/psdm/sw/external/python/2.7.10/x86_64-rhel7-gcc48-opt/bin/../lib/libpython2.7.so.1.0(PyRun_FileExFlags+0x92)[0x2b9fc88090e2]
[psana1101:08529] [13] /reg/g/psdm/sw/external/python/2.7.10/x86_64-rhel7-gcc48-opt/bin/../lib/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0xd9)[0x2b9fc880a619]
[psana1101:08529] [14] /reg/g/psdm/sw/external/python/2.7.10/x86_64-rhel7-gcc48-opt/bin/../lib/libpython2.7.so.1.0(Py_Main+0xc4d)[0x2b9fc882021d]
[psana1101:08529] [15] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2b9fc9424b15]
[psana1101:08529] [16] python[0x400731]
[psana1101:08529] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 8529 on node psana1101 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

Lisandro Dalcin

May 6, 2016, 4:31:57 AM
to mpi4py
On 6 May 2016 at 05:17, <c...@stanford.edu> wrote:
> Hi,
>
> We're seeing segfaults only when the data volumes get large with Gatherv,
> running on RHEL7 with openmpi 1.8.8 and mpi4py 1.3.1. The script/stacktrace
> is below. This happens both with the openib btl and the tcp btl. For us,
> the script below fails on 48 cores (4 nodes) but works on 36 cores. If you
> had any advice/experience with such a problem we would be interested. Thank
> you for the wonderful mpi4py package...
>

So, the number of entries in your receive buffer is:

48 * 1,000 * 60,000 = 2,880,000,000

and that exceeds the limit of the signed 32-bit integers MPI uses for
message counts and displacements (2**31 - 1 = 2,147,483,647).
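Just to see it in plain Python (nothing mpi4py-specific here, and the
numbers are simply the ones from your script):

nranks, nevents, eventlen = 48, 1000, 60000   # 48 ranks, 1000 events of 60000 doubles each
total = nranks * nevents * eventlen           # total element count gathered at the root
print total                                   # 2880000000
print 2**31 - 1                               # 2147483647, the largest signed 32-bit int
print total > 2**31 - 1                       # True -> the receive count overflows a C int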

In your particular example, your messages seem to be formed by chunks
of 60,000 entries, so if you describe the data in units of whole chunks
the counts MPI sees become 60,000 times smaller and fit comfortably in
a 32-bit integer.
I would try the usual trick for large messages:

chunksize = 60000
chunklens = [n//chunksize for n in lengths] if rank==0 else None
chunktype = MPI.DOUBLE.Create_contiguous(chunksize).Commit()
comm.Gatherv(sendbuf=[mysend, chunktype], recvbuf=[myrecv, chunklens, chunktype])
chunktype.Free()

Disclaimer: I typed the code above directly in my browser, I have not
tested it, it may contain a trivial mistake.

--
Lisandro Dalcin
============
Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 0109
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459

chri...@gmail.com

May 6, 2016, 1:52:13 PM
to mpi4py
That makes perfect sense.  I was unaware of that limitation.  Thanks for the extremely useful response and the great mpi4py package.

chris

chri...@gmail.com

Jun 19, 2016, 4:16:23 PM
to mpi4py
Hi Lisandro,

I tried following your suggestion, creating the large-chunk datatype, but I still see a segfault running on 2 cores (on different nodes) when I cross the 2GB send limit.  I fear I am doing something incorrectly, but can't spot it.  Can you see if I'm doing something wrong in the 27-line script below?  We're using mpi4py 1.3.1 on top of openmpi 1.8.8.  Thank you for any advice,

chris

import numpy as np

from mpi4py import MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

mylen = 400000000

mysend=np.ascontiguousarray(np.arange(mylen,dtype=np.float64))
myrecv = None

lengths=size*[mylen]

if rank==0:
    myrecv = np.ascontiguousarray(np.empty(sum(lengths),mysend.dtype)) # allocate receive buffer
    print '*** recv buf',myrecv.nbytes/float(1<<30),'GB'

print 'rank',rank,'sending',mysend.nbytes/float(1<<30),'GB'

chunksize = mylen/10
chunklens = [n//chunksize for n in lengths] if rank==0 else None
chunktype = MPI.DOUBLE.Create_contiguous(chunksize).Commit()
comm.Gatherv(sendbuf=[mysend, chunktype], recvbuf=[myrecv, chunklens, chunktype])
chunktype.Free()

MPI.Finalize()

Lisandro Dalcin

Jun 20, 2016, 3:05:01 PM
to mpi4py
On 19 June 2016 at 23:16, <chri...@gmail.com> wrote:
> I tried following your suggestion, creating the large-chunk datatype, but I
> still see a segfault running on 2 cores (on different nodes) when I cross
> the 2GB send limit. I fear I am doing something incorrectly, but can't spot
> it. Can you see if I'm doing something wrong in the 27-line script below?

I'll give it a try tomorrow on my desktop computer at the office.

> We're using mpi4py 1.3.1 on top of openmpi 1.8.8. Thank you for any advice,

Well, I would say chances are high that your MPI is too old. Supporting
large messages does require some internal care in the MPI
implementation (that is, performing the integer arithmetic with 64-bit
integers so that it does not overflow).

I would recommend upgrading to mpi4py 2.0.0 and Open MPI 1.10.3
(or perhaps MPICH 3.2). If you use Anaconda Python, just create a
scratch environment and "conda install --channel mpi4py mpich mpi4py".
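After you upgrade, a quick way to double-check which versions a given
environment actually picks up is something like the code below. Again,
I typed this directly in my browser and have not run it; note that
MPI.Get_library_version() needs an MPI-3 library, which both Open MPI
1.10.3 and MPICH 3.2 are.

import mpi4py
from mpi4py import MPI
print 'mpi4py:', mpi4py.__version__          # should report 2.0.0
print 'MPI standard:', MPI.Get_version()     # (major, minor) reported by the library
print MPI.Get_library_version()              # implementation name and version string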

Lisandro Dalcin

Jun 21, 2016, 3:19:52 AM
to mpi4py
On 20 June 2016 at 22:04, Lisandro Dalcin <dal...@gmail.com> wrote:
>> I tried following your suggestion, creating the large-chunk datatype, but I
>> still see a segfault running on 2 cores (on different nodes) when I cross
>> the 2GB send limit. I fear I am doing something incorrectly, but can't spot
>> it. Can you see if I'm doing something wrong in the 27-line script below?
>
> I'll give it a try tomorrow in my desktop computer at office.

Your code worked just fine with up to 8 processes on my desktop, using
mpi4py/master and MPICH 3.1 (from Fedora 23).

$ mpiexec -n 8 python test-bigtype.py
rank 3 sending 2.98023223877 GB
rank 2 sending 2.98023223877 GB
*** recv buf 23.8418579102 GB
rank 0 sending 2.98023223877 GB
rank 1 sending 2.98023223877 GB
rank 4 sending 2.98023223877 GB
rank 7 sending 2.98023223877 GB
rank 6 sending 2.98023223877 GB
rank 5 sending 2.98023223877 GB

PS: You don't really need MPI.Finalize() at the end; I removed it for
my own tests.

Christopher O'Grady

Jun 21, 2016, 10:40:42 AM
to mpi...@googlegroups.com

On Jun 21, 2016, at 12:19 AM, Lisandro Dalcin <dal...@gmail.com> wrote:

> Your code worked just fine with up to 8 processes on my desktop, using
> mpi4py/master and MPICH 3.1 (from Fedora 23).

Thank you very much for taking the time to look at that, Lisandro.  One thought crosses my mind: it only fails for me if I run it on 2 distinct nodes (in my case, 1 core per node).  This might make sense because it would use a different BTL.  If it were quick for you to try on two different nodes, I would be grateful.

But it already starts to feel like a bug in the versions I am running, or perhaps in how I’ve built the code.  I will start working on the upgrade.  Thanks!

chris
