On 27 October 2017 at 15:16, Johnnie Gray <johnni...@gmail.com> wrote:
> I've been using the mpi4py.futures module with great success, and want to
> scale to a multi-node infiniband setting.
>
Good to know. Any complaints or feedback?
> 'Normal' mpi4py seems to work fine, and I can solve large problems using
> slepc4py etc, however, everything hangs indefinitely when
> using either MPICommExecutor or MPIPoolExecutor - no errors appear.
>
Frustrating...
> Just wondering if there are any suggestions for troubleshooting what the
> problem could be?
>
Please try first to use just MPICommExecutor. It requires the least
advanced MPI features; actually, it should work even with ancient
MPI-1.x implementations. Now, some tips to try to make things work.
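For reference, this is a minimal sketch of the MPICommExecutor usage
pattern I have in mind (file name and the pow() workload are just
placeholders):

# comm_executor_test.py -- run with: mpiexec -n 4 python comm_executor_test.py
from mpi4py import MPI
from mpi4py.futures import MPICommExecutor

with MPICommExecutor(MPI.COMM_WORLD, root=0) as executor:
    if executor is not None:  # only the root rank gets a non-None executor
        results = list(executor.map(pow, [2] * 8, range(8)))
        print(results)        # the other ranks just serve tasks inside the block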
Maybe this is related to the lack of threading support in the backend MPI.
Could you please edit the file `src/mpi4py/__init__.py` and change the
rc.thread_level = 'multiple' line to 'serialized' rather than
'multiple'? Or maybe even 'single' (you may get a warning later, but
things may still work).
diff --git a/src/mpi4py/__init__.py b/src/mpi4py/__init__.py
index 59f9c34..2ee6c3e 100644
--- a/src/mpi4py/__init__.py
+++ b/src/mpi4py/__init__.py
@@ -89,7 +89,7 @@ def rc(**kargs): # pylint: disable=invalid-name
rc.initialize = True
rc.threads = True
-rc.thread_level = 'multiple'
+rc.thread_level = 'serialized'
rc.finalize = None
rc.fast_reduce = True
rc.recv_mprobe = True
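If you would rather not patch the installed sources, the same override
can usually be done at the top of your own script, before mpi4py.MPI is
imported for the first time (a sketch; note this only helps in the plain
`mpiexec python script.py` mode, since the `-m mpi4py.futures` launcher
imports MPI before your script runs):

import mpi4py
mpi4py.rc.thread_level = 'serialized'  # must be set before the next import
from mpi4py import MPI
from mpi4py.futures import MPICommExecutor
# ... rest of the script as before ...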
Another thing to try is the following patch:
diff --git a/src/mpi4py/futures/_lib.py b/src/mpi4py/futures/_lib.py
index db0e01a..bef5b2e 100644
--- a/src/mpi4py/futures/_lib.py
+++ b/src/mpi4py/futures/_lib.py
@@ -245,7 +245,7 @@ def comm_split(comm, root=0):
assert 0 <= root < comm.Get_size()
rank = comm.Get_rank()
- if MPI.Get_version() >= (2, 2):
+ if 0: # MPI.Get_version() >= (2, 2):
allgroup = comm.Get_group()
if rank == root:
group = allgroup.Incl([root])
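To see which branch of comm_split() your build actually takes, you can
also just print the version your MPI library reports (plain MPI
introspection, nothing mpi4py-specific; the second call needs an MPI-3
library):

from mpi4py import MPI
print(MPI.Get_version())          # standard version tuple, e.g. (3, 0)
print(MPI.Get_library_version())  # vendor string, handy for bug reports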
Another source of problems may be a broken MPI_Ibarrier implementation;
you can apply this patch:
diff --git a/src/mpi4py/futures/_lib.py b/src/mpi4py/futures/_lib.py
index db0e01a..97fd8ca 100644
--- a/src/mpi4py/futures/_lib.py
+++ b/src/mpi4py/futures/_lib.py
@@ -373,6 +373,7 @@ class SharedPoolCtx(object):
def barrier(comm):
assert comm.Is_inter()
try:
+ raise NotImplementedError
request = comm.Ibarrier()
backoff = Backoff()
while not request.Test():
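To check whether MPI_Ibarrier itself is the culprit, you could also run
a tiny standalone test outside of mpi4py.futures (a sketch; if this
hangs, the patch above is the right workaround):

# ibarrier_test.py -- run with: mpiexec -n 2 python ibarrier_test.py
from mpi4py import MPI
comm = MPI.COMM_WORLD
request = comm.Ibarrier()      # nonblocking barrier
while not request.Test():      # busy-wait until all ranks arrive
    pass
print("rank", comm.Get_rank(), "passed the nonblocking barrier")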
>
> Some extra details:
> - using OpenMPI 1.10.1
> - most recent mpi4py from bitbucket
> - spawning processes using mpi does not seem to work on this system
> (neither openmpi or intel),
Well, that's usually the situation on many systems. 2017 is almost
over and we still cannot use MPI features that were added to the
standard in 1998.
> * I've thus either been using ``mpiexec python -m mpi4py.futures
> ...`` with the Pool executor
> * or ``mpiexec python ...`` with the Comm executor. Both hang.
>
That's the reason I had to add the `mpiexec python -m
mpi4py.futures` mode. This way, at least you have a chance to execute
your neat script that runs just fine on a Raspberry Pi but fails to
execute on a multi-million-dollar system.
--
Lisandro Dalcin
============
Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/
4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 0109
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa
Office Phone:
+966 12 808-0459