Segmentation fault (11) with (signal code) Address not mapped (1)


renyu...@gmail.com

Jun 19, 2017, 3:50:05 AM
to mpi4py
Hi,

I am encountering some strange behavior when using mpi4py and am not sure where to ask, so I am trying here first.

I'm writing a program with quite a lot of code (probably too long to post). I added some `print()` calls and figured out that the problem is triggered at one of the `comm.recv()` calls and/or the `status.Get_source()` and `status.Get_tag()` calls that follow it on the same line (sometimes `recv()` fails, sometimes the following calls). The error only appears once I remove most or all of the debugging statements (so the program runs a little faster), and even then it is triggered only rarely (roughly once every several tens of runs).

On each process, this `comm.recv()` call (i.e. on this communicator), as well as the following `status.Get_source()` and `status.Get_tag()` calls, runs on a single thread (though calls on other communicators are executed in parallel on other threads), and there may be concurrent `comm.send()` calls on the same communicator (although, from what I have observed, no `comm.send()` is in progress when the error happens).
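
For context, the receive side of the code looks roughly like the sketch below; this is a simplified, hypothetical reconstruction for illustration, not my actual code (`handle()` is a made-up placeholder):

```python
import threading
from mpi4py import MPI

comm = MPI.COMM_WORLD.Dup()   # the communicator used by this receive loop

def handle(msg, source, tag):
    pass   # placeholder for the real message handling

def receive_loop():
    status = MPI.Status()
    while True:
        # The crash shows up around here: sometimes inside recv(),
        # sometimes in the status accessors on the same line.
        msg = comm.recv(source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=status)
        source, tag = status.Get_source(), status.Get_tag()
        handle(msg, source, tag)

# Receives on `comm` happen only on this thread; other communicators are
# driven from other threads, and sends on `comm` may come from elsewhere.
threading.Thread(target=receive_loop, daemon=True).start()
```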

I run the program locally (x86_64 linux), with mpi4py 2.0.0, python 3.6.1, openmpi 1.10.6.

The error message is this:

[gh-xps:04520] *** Process received signal ***
[gh-xps:04520] Signal: Segmentation fault (11)
[gh-xps:04520] Signal code: Address not mapped (1)
[gh-xps:04520] Failing at address: 0x12f6af0

[gh-xps:04520] [ 0] /usr/lib/libpthread.so.0(+0x11940)[0x7fb2e9fc1940]
[gh-xps:04520] [ 1] /usr/lib/openmpi/openmpi/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x14c)[0x7fb2d7dfbaac]
[gh-xps:04520] [ 2] /usr/lib/openmpi/openmpi/mca_btl_vader.so(+0x3c9e)[0x7fb2d7dfbc9e]
[gh-xps:04520] [ 3] /usr/lib/openmpi/libopen-pal.so.13(opal_progress+0x4a)[0x7fb2e2fc014a]
[gh-xps:04520] [ 4] /usr/lib/openmpi/openmpi/mca_pml_ob1.so(mca_pml_ob1_iprobe+0x292)[0x7fb2d75471a2]
[gh-xps:04520] [ 5] /usr/lib/openmpi/libmpi.so.12(PMPI_Iprobe+0xce)[0x7fb2e351e6de]
[gh-xps:04520] [ 6] /home/ryey/self/venv/lib/python3.6/site-packages/mpi4py/MPI.cpython-36m-x86_64-linux-gnu.so(+0x50ae6)[0x7fb2e37edae6]
[gh-xps:04520] [ 7] /usr/lib/libpython3.6m.so.1.0(_PyCFunction_FastCallDict+0x12c)[0x7fb2e982c94c]
[gh-xps:04520] [ 8] /usr/lib/libpython3.6m.so.1.0(_PyCFunction_FastCallKeywords+0x4d)[0x7fb2e982cb2d]
[gh-xps:04520] [ 9] /usr/lib/libpython3.6m.so.1.0(+0x1549df)[0x7fb2e97ff9df]
[gh-xps:04520] [10] /usr/lib/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x1144)[0x7fb2e97b51f4]
[gh-xps:04520] [11] /usr/lib/libpython3.6m.so.1.0(_PyFunction_FastCallDict+0x12a)[0x7fb2e97fef6a]
[gh-xps:04520] [12] /usr/lib/libpython3.6m.so.1.0(_PyObject_FastCallDict+0x28e)[0x7fb2e986734e]
[gh-xps:04520] [13] /usr/lib/libpython3.6m.so.1.0(_PyObject_Call_Prepend+0x52)[0x7fb2e98680b2]
[gh-xps:04520] [14] /usr/lib/libpython3.6m.so.1.0(PyObject_Call+0x4b)[0x7fb2e986817b]
[gh-xps:04520] [15] /usr/lib/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x1b33)[0x7fb2e97b5be3]
[gh-xps:04520] [16] /usr/lib/libpython3.6m.so.1.0(+0x15450a)[0x7fb2e97ff50a]
[gh-xps:04520] [17] /usr/lib/libpython3.6m.so.1.0(+0x154ac3)[0x7fb2e97ffac3]
[gh-xps:04520] [18] /usr/lib/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x317)[0x7fb2e97b43c7]
[gh-xps:04520] [19] /usr/lib/libpython3.6m.so.1.0(+0x15450a)[0x7fb2e97ff50a]
[gh-xps:04520] [20] /usr/lib/libpython3.6m.so.1.0(+0x154ac3)[0x7fb2e97ffac3]
[gh-xps:04520] [21] /usr/lib/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x317)[0x7fb2e97b43c7]
[gh-xps:04520] [22] /usr/lib/libpython3.6m.so.1.0(_PyFunction_FastCallDict+0x12a)[0x7fb2e97fef6a]
[gh-xps:04520] [23] /usr/lib/libpython3.6m.so.1.0(_PyObject_FastCallDict+0x28e)[0x7fb2e986734e]
[gh-xps:04520] [24] /usr/lib/libpython3.6m.so.1.0(_PyObject_Call_Prepend+0x52)[0x7fb2e98680b2]
[gh-xps:04520] [25] /usr/lib/libpython3.6m.so.1.0(PyObject_Call+0x4b)[0x7fb2e986817b]
[gh-xps:04520] [26] /usr/lib/libpython3.6m.so.1.0(+0x1e2b92)[0x7fb2e988db92]
[gh-xps:04520] [27] /usr/lib/libpthread.so.0(+0x7297)[0x7fb2e9fb7297]
[gh-xps:04520] [28] /usr/lib/libc.so.6(clone+0x3f)[0x7fb2e9cf825f]
[gh-xps:04520] *** End of error message ***

--------------------------------------------------------------------------
mpiexec noticed that process rank 15 with PID 4520 on node gh-xps exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------


Any suggestions are appreciated. 
 

Lisandro Dalcin

Jun 19, 2017, 3:59:43 AM
to mpi4py
On 19 June 2017 at 02:57, <renyu...@gmail.com> wrote:
> I run the program locally (x86_64 linux), with mpi4py 2.0.0, python 3.6.1,
> openmpi 1.10.6.

Does your Open MPI build support MPI_THREAD_MULTIPLE? What's the output
of MPI.Query_thread()? If you are calling MPI routines from threads
without the proper thread support, bad things can happen. Have you
tried your code with an alternative MPI implementation, say MPICH?
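
For example, a quick check could look something like this (just a minimal sketch; the script name is made up):

```python
# check_threads.py -- minimal sketch to inspect the granted MPI thread level
from mpi4py import MPI

provided = MPI.Query_thread()
levels = {
    MPI.THREAD_SINGLE: "THREAD_SINGLE",
    MPI.THREAD_FUNNELED: "THREAD_FUNNELED",
    MPI.THREAD_SERIALIZED: "THREAD_SERIALIZED",
    MPI.THREAD_MULTIPLE: "THREAD_MULTIPLE",
}
print(f"Provided thread level: {provided} ({levels.get(provided, 'unknown')})")

if provided < MPI.THREAD_MULTIPLE:
    print("Warning: this MPI build does not allow concurrent MPI calls "
          "from multiple threads.")
```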

Finally, I recommend trying mpi4py from a git checkout.
There are some important fixes related to thread safety, through the
use of fine-grained locking in the pickle-based communication calls.
Maybe this new stuff fixes your issues.



--
Lisandro Dalcin
============
Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 0109
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459

趙睿

Jun 19, 2017, 6:07:52 AM
to mpi4py
Hi Lisandro, thanks for your reply.

On 2017-6-19 Monday UTC+1 8:59:43, Lisandro Dalcin wrote:
> On 19 June 2017 at 02:57, <renyu...@gmail.com> wrote:
>> I run the program locally (x86_64 linux), with mpi4py 2.0.0, python 3.6.1,
>> openmpi 1.10.6.
>
> Does your Open MPI build support MPI_THREAD_MULTIPLE? What's the output
> of MPI.Query_thread()? If you are calling MPI routines from threads
> without the proper thread support, bad things can happen. Have you
> tried your code with an alternative MPI implementation, say MPICH?

The output of MPI.Query_thread() is `2`.

Not really. Actually I have tried: when using MPICH (with the same set of arguments I use with Open MPI), the size of COMM_WORLD is always `1` for each process, so the program won't run properly.
I'm not sure where it went wrong, but apparently it is related to MPICH's core functionality... I just did a fresh install (without any modification) through the AUR (I'm using Arch Linux on my laptop).
 

> Finally, I recommend trying mpi4py from a git checkout.
> There are some important fixes related to thread safety, through the
> use of fine-grained locking in the pickle-based communication calls.
> Maybe this new stuff fixes your issues.

I just tried that, but with no luck. The problem is the same, apart from the memory address being different.

(This may not be related, but I think more information will help:) I have actually encountered a similar problem before, in a different situation. At that time, it failed at the rank 0 process (which acts as a coordinator), followed by a syntax error (observed manually, not thrown by the interpreter/runtime).

I have read most of the segmentation-fault posts on this forum, but they don't seem to be similar to my issue. In my web search, the only similar question is https://stackoverflow.com/questions/30275341/c-signal-code-address-not-mapped-1-mpirecv , though if that is the actual cause, it would be a lower-level issue (maybe in mpi4py or Open MPI itself).

Lisandro Dalcin

Jun 19, 2017, 6:49:34 AM
to mpi4py
On 19 June 2017 at 13:07, 趙睿 <renyu...@gmail.com> wrote:
> Hi Lisandro, thanks for your reply.
>
> On 2017-6-19 Monday UTC+1 8:59:43, Lisandro Dalcin wrote:
>>
>> On 19 June 2017 at 02:57, <renyu...@gmail.com> wrote:
>> > I run the program locally (x86_64 linux), with mpi4py 2.0.0, python
>> > 3.6.1,
>> > openmpi 1.10.6.
>>
>> Does your Open MPI build support MPI_THREAD_MULTIPLE? What's the output
>> of MPI.Query_thread()? If you are calling MPI routines from threads
>> without the proper thread support, bad things can happen. Have you
>> tried your code with an alternative MPI implementation, say MPICH?
>
>
> The output of MPI.Query_thread() is `2`.
>

Well, this means your MPI build does not support multiple threads.
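
With only a serialized thread level available, one possible stopgap (a minimal sketch, not tested against your code; the helper names are made up) is to funnel every MPI call through a single lock so that no two threads are inside the MPI library at the same time:

```python
import threading
import time

from mpi4py import MPI

# One global lock guarding every MPI call. With THREAD_SERIALIZED this
# ensures that only one thread is inside the MPI library at any time.
mpi_lock = threading.Lock()

def locked_send(comm, obj, dest, tag=0):
    with mpi_lock:
        comm.send(obj, dest=dest, tag=tag)

def locked_recv(comm, status):
    # Probe under the lock and only receive once a message is available,
    # so the lock is not held while blocking on a message that has not
    # arrived yet.
    while True:
        with mpi_lock:
            if comm.Iprobe(source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG):
                return comm.recv(source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG,
                                 status=status)
        time.sleep(0.001)  # yield briefly so senders can grab the lock
```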

> Not really. Actually I have tried: when using MPICH (with the same set
> of arguments I use with Open MPI), the size of COMM_WORLD is always `1`
> for each process, so the program won't run properly.
> I'm not sure where it went wrong, but apparently it is related to
> MPICH's core functionality... I just did a fresh install (without any
> modification) through the AUR (I'm using Arch Linux on my laptop).
>

Then your system is misconfigured. This happens sometimes on Ubuntu,
where the alternatives system gets mixed up. The problem is that you
are building against MPICH, but your "mpiexec" command likely
corresponds to Open MPI. Maybe you have a "mpiexec.mpich" command? You
should use that one to execute Python and run MPICH+mpi4py scripts.
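
A quick way to confirm which library your mpi4py extension is actually linked against, and whether the launcher started all the ranks, could be a sketch like this (the script name is made up):

```python
# which_mpi.py -- sketch: report which MPI library mpi4py is linked against
# and how many ranks the launcher actually started.
from mpi4py import MPI

comm = MPI.COMM_WORLD
if comm.Get_rank() == 0:
    vendor, version = MPI.get_vendor()   # e.g. ('Open MPI', (1, 10, 6))
    print("MPI vendor:", vendor, version)
    print("COMM_WORLD size:", comm.Get_size())
```

If it reports Open MPI, or the world size stays at 1 even though you passed -n <N> to the launcher, then the mpiexec you are using does not match the MPI library mpi4py was built against; in that case launch it with the MPICH launcher instead, e.g. `mpiexec.mpich -n 4 python which_mpi.py`, if that command exists on your system.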

> (This may not be related, but I think more information will help:)
> I have actually encountered a similar problem before, in a different
> situation. At that time, it failed at the rank 0 process (which acts
> as a coordinator), followed by a syntax error (observed manually, not
> thrown by the interpreter/runtime).
>
> I have read most of the segmentation-fault posts on this forum, but
> they don't seem to be similar to my issue. In my web search, the only
> similar question is
> https://stackoverflow.com/questions/30275341/c-signal-code-address-not-mapped-1-mpirecv
> , though if that is the actual cause, it would be a lower-level issue
> (maybe in mpi4py or Open MPI itself).
>

Could you paste the exact line with the "comm.recv()" call that
produces the segfault?