mpi4py hang in test. openmpi seems fine.

Johnny Lu

May 13, 2016, 6:30:19 AM
to mpi4py
Hi.

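I can compile and run the hello_c.c and ring_c.c examples of openmpi 1.8.8 without problem.
But when I run a similar ring.py
(https://github.com/gjbex/training-material/blob/master/Python/Mpi4py/ring.py),
I get errors. The python is python2.7.11 with GCC 4.9.3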

What are the possible causes?
I would appreciate any pointer to the cause of this problem.

Thank you.

command:
mpiexec python2.7 ring.py

Error:
sending hello from rank 7 to 0 to 0
Traceback (most recent call last):
 File "ring.py", line 21, in <module>
Traceback (most recent call last):
 File "ring.py", line 21, in <module>
incoming_msg = comm.recv(source=source, tag=11)
 File "MPI/Comm.pyx", line 1192, in mpi4py.MPI.Comm.recv (src/mpi4py.MPI.c:106889)
sending hello from rank 5 to 6 to 6
incoming_msg = comm.recv(source=source, tag=11)
 File "MPI/Comm.pyx", line 1192, in mpi4py.MPI.Comm.recv (src/mpi4py.MPI.c:106889)
 File "MPI/msgpickle.pxi", line 264, in mpi4py.MPI.PyMPI_recv (src/mpi4py.MPI.c:42691)
 File "MPI/msgpickle.pxi", line 264, in mpi4py.MPI.PyMPI_recv (src/mpi4py.MPI.c:42691)
mpi4py.MPI.Exception: MPI_ERR_INTERN: internal error
mpi4py.MPI.Exception: MPI_ERR_INTERN: internal error
sending hello from rank 2 to 3 to 3
sending hello from rank 0 to 1 to 1
sending hello from rank 3 to 4 to 4
sending hello from rank 6 to 7 to 7
Traceback (most recent call last):
 File "ring.py", line 21, in <module>
incoming_msg = comm.recv(source=source, tag=11)
 File "MPI/Comm.pyx", line 1192, in mpi4py.MPI.Comm.recv (src/mpi4py.MPI.c:106889)
Traceback (most recent call last):
 File "ring.py", line 19, in <module>
incoming_msg = comm.recv(source=source, tag=11)
 File "MPI/Comm.pyx", line 1192, in mpi4py.MPI.Comm.recv (src/mpi4py.MPI.c:106889)
Traceback (most recent call last):
 File "ring.py", line 19, in <module>
incoming_msg = comm.recv(source=source, tag=11)
 File "MPI/Comm.pyx", line 1192, in mpi4py.MPI.Comm.recv (src/mpi4py.MPI.c:106889)
Traceback (most recent call last):
 File "ring.py", line 19, in <module>
incoming_msg = comm.recv(source=source, tag=11)
 File "MPI/Comm.pyx", line 1192, in mpi4py.MPI.Comm.recv (src/mpi4py.MPI.c:106889)
 File "MPI/msgpickle.pxi", line 264, in mpi4py.MPI.PyMPI_recv (src/mpi4py.MPI.c:42691)
 File "MPI/msgpickle.pxi", line 264, in mpi4py.MPI.PyMPI_recv (src/mpi4py.MPI.c:42691)
 File "MPI/msgpickle.pxi", line 264, in mpi4py.MPI.PyMPI_recv (src/mpi4py.MPI.c:42691)
 File "MPI/msgpickle.pxi", line 264, in mpi4py.MPI.PyMPI_recv (src/mpi4py.MPI.c:42691)
mpi4py.MPI.Exception: MPI_ERR_INTERN: internal error
mpi4py.MPI.Exception: MPI_ERR_INTERN: internal error
sending hello from rank 4 to 5 to 5
mpi4py.MPI.Exception: MPI_ERR_INTERN: internal error
mpi4py.MPI.Exception: MPI_ERR_INTERN: internal error
Traceback (most recent call last):
 File "ring.py", line 19, in <module>
incoming_msg = comm.recv(source=source, tag=11)
 File "MPI/Comm.pyx", line 1192, in mpi4py.MPI.Comm.recv (src/mpi4py.MPI.c:106889)
 File "MPI/msgpickle.pxi", line 264, in mpi4py.MPI.PyMPI_recv (src/mpi4py.MPI.c:42691)
mpi4py.MPI.Exception: MPI_ERR_INTERN: internal error
sending hello from rank 1 to 2 to 2
Traceback (most recent call last):
 File "ring.py", line 21, in <module>
incoming_msg = comm.recv(source=source, tag=11)
 File "MPI/Comm.pyx", line 1192, in mpi4py.MPI.Comm.recv (src/mpi4py.MPI.c:106889)
 File "MPI/msgpickle.pxi", line 264, in mpi4py.MPI.PyMPI_recv (src/mpi4py.MPI.c:42691)
mpi4py.MPI.Exception: MPI_ERR_INTERN: internal error
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

 Process name: [[20017,1],0]
 Exit code:    1
--------------------------------------------------------------------------

mpiexec python2.7 setup.py test
running test
running test
running test
running test
running test
running test
running test
running test
running test
running test
running test
running test
[1...@compute-1-6.local] Python 2.7 (/home/jlu/local/bin/python2.7)
[1...@compute-1-6.local] MPI 3.0 (Open MPI 1.8.8)
[1...@compute-1-6.local] Python 2.7 (/home/jlu/local/bin/python2.7)
[1...@compute-1-6.local] MPI 3.0 (Open MPI 1.8.8)
[1...@compute-1-6.local] mpi4py 2.0.0 (build/lib.linux-x86_64-2.7/mpi4py)
[1...@compute-1-6.local] Python 2.7 (/home/jlu/local/bin/python2.7)
[1...@compute-1-6.local] MPI 3.0 (Open MPI 1.8.8)
[1...@compute-1-6.local] mpi4py 2.0.0 (build/lib.linux-x86_64-2.7/mpi4py)
[1...@compute-1-6.local] mpi4py 2.0.0 (build/lib.linux-x86_64-2.7/mpi4py)
[5...@compute-1-6.local] Python 2.7 (/home/jlu/local/bin/python2.7)
[5...@compute-1-6.local] MPI 3.0 (Open MPI 1.8.8)
[5...@compute-1-6.local] mpi4py 2.0.0 (build/lib.linux-x86_64-2.7/mpi4py)
[8...@compute-1-6.local] Python 2.7 (/home/jlu/local/bin/python2.7)
[8...@compute-1-6.local] MPI 3.0 (Open MPI 1.8.8)
[8...@compute-1-6.local] mpi4py 2.0.0 (build/lib.linux-x86_64-2.7/mpi4py)
[3...@compute-1-6.local] Python 2.7 (/home/jlu/local/bin/python2.7)
[3...@compute-1-6.local] MPI 3.0 (Open MPI 1.8.8)
[3...@compute-1-6.local] mpi4py 2.0.0 (build/lib.linux-x86_64-2.7/mpi4py)
[0...@compute-1-6.local] Python 2.7 (/home/jlu/local/bin/python2.7)
[0...@compute-1-6.local] MPI 3.0 (Open MPI 1.8.8)
[0...@compute-1-6.local] mpi4py 2.0.0 (build/lib.linux-x86_64-2.7/mpi4py)
[6...@compute-1-6.local] Python 2.7 (/home/jlu/local/bin/python2.7)
[6...@compute-1-6.local] MPI 3.0 (Open MPI 1.8.8)
[6...@compute-1-6.local] mpi4py 2.0.0 (build/lib.linux-x86_64-2.7/mpi4py)
[7...@compute-1-6.local] Python 2.7 (/home/jlu/local/bin/python2.7)
[7...@compute-1-6.local] MPI 3.0 (Open MPI 1.8.8)
[7...@compute-1-6.local] mpi4py 2.0.0 (build/lib.linux-x86_64-2.7/mpi4py)
[2...@compute-1-6.local] Python 2.7 (/home/jlu/local/bin/python2.7)
[2...@compute-1-6.local] MPI 3.0 (Open MPI 1.8.8)
[2...@compute-1-6.local] mpi4py 2.0.0 (build/lib.linux-x86_64-2.7/mpi4py)
[4...@compute-1-6.local] Python 2.7 (/home/jlu/local/bin/python2.7)
[4...@compute-1-6.local] MPI 3.0 (Open MPI 1.8.8)
[4...@compute-1-6.local] mpi4py 2.0.0 (build/lib.linux-x86_64-2.7/mpi4py)
[9...@compute-1-6.local] Python 2.7 (/home/jlu/local/bin/python2.7)
[9...@compute-1-6.local] MPI 3.0 (Open MPI 1.8.8)
[9...@compute-1-6.local] mpi4py 2.0.0 (build/lib.linux-x86_64-2.7/mpi4py)
................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE..........EEEEEEEEEEEEEE......................................................................EE.EEE..E..EEEE.E.............E....EEEEE.EE..EEEEEEE.EE.EE.EEEEEEE.E.EEEE.E.EE..EEEEE...E.E..EEE.E.....E.E.E..EEE..EE.E...EE.E.E.EE.E...EE.E.E....E.EE..E.E......E...E.E....................................E.E.EE.E......EEEEEE.E.E.EE.E.E..E...E.EEEEE.E...EEE.E..EEEEEE.EEE.EEE...E.E...EEE.....E.E...E..E..E.E.EEEE.EE....E......E..............................................EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE.EE.EEEE.EE..E.E.EEE.E.E.EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE

Lisandro Dalcin

May 13, 2016, 8:42:23 AM
to mpi4py
On 13 May 2016 at 00:05, Johnny Lu <johnny...@gmail.com> wrote:
> Hi.
>
> I can run and compile the hello_c.c and ring_c.c examples of openmpi 1.8.8
> without problem.

Where are hello_c.c and ring_c.c? The way mpi4py implements
send()/recv() is likely different from the way you coded your C examples,
but I would like to take a look to confirm.

> But, when I run a similar ring.py
> (https://github.com/gjbex/training-material/blob/master/Python/Mpi4py/ring.py).
> I get errors. The python is python2.7.11 with GCC 4.9.3
>

Don't use socket to get the hostname; the MPI way is to use
MPI.Get_processor_name().

> What are the possible causes?

Maybe you are using a thread-multiple enabled build of Open MPI, or
some other non-default configure option? Look at this output:

$ ompi_info | grep Thread
Thread support: posix (MPI_THREAD_MULTIPLE: no, OPAL support:
yes, OMPI progress: no, ORTE progress: yes, Event lib: yes)

Does it match yours, particularly the MPI_THREAD_MULTIPLE option?
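
As a rough aid for that comparison, the sketch below pulls the MPI_THREAD_MULTIPLE flag out of ompi_info output. The helper names (thread_multiple_enabled, query_ompi_info) are made up for illustration, and the regular expression assumes the "Thread support" line has the same shape as the sample above; other Open MPI versions may print it differently.

```python
import re
import subprocess


def thread_multiple_enabled(ompi_info_text):
    """Return True/False for the MPI_THREAD_MULTIPLE flag, or None if not found."""
    match = re.search(r"MPI_THREAD_MULTIPLE:\s*(yes|no)", ompi_info_text)
    if match is None:
        return None
    return match.group(1) == "yes"


def query_ompi_info():
    """Run ompi_info and return its output as text (needs Open MPI on PATH)."""
    return subprocess.check_output(["ompi_info"]).decode()


# Sample text copied from the output quoted above.
sample = (
    "Thread support: posix (MPI_THREAD_MULTIPLE: no, OPAL support:\n"
    "yes, OMPI progress: no, ORTE progress: yes, Event lib: yes)\n"
)
print(thread_multiple_enabled(sample))  # prints False
```

On a machine with Open MPI installed, thread_multiple_enabled(query_ompi_info()) would report the setting of the local build.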


Try adding the following lines to the VERY BEGINNING of your ring.py
file and run it again:

import mpi4py
mpi4py.rc.threads = False
mpi4py.rc.recv_mprobe = False

If that works, then try commenting out the second or third line, to
see which one fixes the behavior, then come back to me.
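
For illustration only, here is a minimal sketch of a ring exchange with those settings applied. It is not the ring.py from the linked repository: ring_neighbors and main are hypothetical names, and the message text is invented. What matters is that the rc options are set before "from mpi4py import MPI" runs, since that import is what initializes MPI. The comment on rc.threads reflects my understanding of the option and should be checked against the mpi4py documentation.

```python
def ring_neighbors(rank, size):
    """Return (source, dest) ranks for a one-directional ring."""
    return (rank - 1) % size, (rank + 1) % size


def main():
    import mpi4py
    # Both options must be set before the MPI module is imported.
    mpi4py.rc.threads = False      # assumed: initialize MPI without thread support
    mpi4py.rc.recv_mprobe = False  # make recv() avoid matched probes
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    source, dest = ring_neighbors(rank, size)
    host = MPI.Get_processor_name()  # MPI way of getting the host name

    # Non-blocking send, so every rank can post its recv() without deadlock.
    request = comm.isend("hello from rank %d on %s" % (rank, host),
                         dest=dest, tag=11)
    incoming_msg = comm.recv(source=source, tag=11)
    request.wait()
    print("rank %d received: %s" % (rank, incoming_msg))


if __name__ == "__main__":
    try:
        main()
    except ImportError:
        print("mpi4py is not available; skipping the MPI run")
```

Saved under a file name of your choice, it would be launched the same way as before, with mpiexec in front of the python2.7 command. Commenting out one rc line at a time then narrows down which option matters.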

> I would appreciate any pointer to the cause of this problem.
>

Well, Open MPI has a tradition of breaking mpi4py. I would suggest that
you upgrade to Open MPI 1.10.2 and try again.


--
Lisandro Dalcin
============
Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/

4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 0109
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa

Office Phone: +966 12 808-0459