MPI, threading, and the GIL

Matthew Emmett

Sep 9, 2011, 3:41:46 PM
to mpi4py, SciPy Users List
Hi everyone,

I am having trouble with MPI send/recv calls, and am wondering if I
have come up against the Python GIL. I am using mpi4py with MVAPICH2
and the threading Python module.

More specifically, our iterative algorithm needs to send data from
rank N to rank N+1, but the rank N+1 processor doesn't need this data
immediately - it has to do a few other things before it needs it. For
each MPI process, I have three threads: one thread for computations,
one thread for doing MPI sends, and one thread for doing MPI receives.

I have set this up in a similar manner to the sendrecv.py example here:

http://code.google.com/p/mpi4py/source/browse/trunk/demo/threads/sendrecv.py
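In outline, the setup looks something like this (a simplified sketch of
the pattern rather than our actual code; the buffer size and tag are
just illustrative, and it assumes the MPI library was initialized with
full thread support):

    # Sketch only: illustrative buffer size and tag; assumes MPI_THREAD_MULTIPLE.
    from threading import Thread
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    send_buf = np.zeros(1000, dtype='d')
    recv_buf = np.empty(1000, dtype='d')

    def send_worker():
        # blocking send to rank N+1, running in its own thread
        if rank + 1 < size:
            comm.Send(send_buf, dest=rank + 1, tag=7)

    def recv_worker():
        # blocking receive from rank N-1, running in its own thread
        if rank > 0:
            comm.Recv(recv_buf, source=rank - 1, tag=7)

    def compute_worker():
        # the iterative computation runs here, concurrently with the MPI threads
        pass

    threads = [Thread(target=f) for f in (send_worker, recv_worker, compute_worker)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()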

The behavior that I have come across is the following: the time taken
for each iteration of the computational part varies quite a bit. It
should remain roughly constant, which I have confirmed in other tests.
After all, the amount of work done in the computational part remains
the same during each iteration. It seems like the threads are not
running as smoothly as I expect, and I wonder if this is due to the
GIL and my use of threads.

Has anyone else dealt with a similar problem?

I have a slightly outdated F90 implementation of the algorithm that
isn't too far behind its Python cousin. I will try to bring it up to
date and try the new communication pattern, but it would be nice to
stay in Python land if possible.

Any suggestions would be appreciated. Thanks,
Matthew

Aron Ahmadia

Sep 9, 2011, 3:45:32 PM
to mpi...@googlegroups.com, SciPy Users List
Hey Matt,

> More specifically, our iterative algorithm needs to send data from
> rank N to rank N+1, but the rank N+1 processor doesn't need this data
> immediately - it has to do a few other things before it needs it.  For
> each MPI process, I have three threads: one thread for computations,
> one thread for doing MPI sends, and one thread for doing MPI receives.

This is not idiomatic MPI. You can do the same thing with a single thread (and avoid GIL issues): post non-blocking sends and receives (MPI_Isend/MPI_Irecv) as soon as the data is available, and then issue a wait on the receiving end only when you actually need the data to proceed.
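Roughly, a minimal single-threaded sketch of that pattern with mpi4py
(the buffer sizes and tags below are illustrative, not taken from your
code):

    # Sketch only: illustrative buffer sizes and tags.
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    send_buf = np.zeros(1000, dtype='d')
    recv_buf = np.empty(1000, dtype='d')

    requests = []
    if rank > 0:
        # post the receive as early as possible
        requests.append(comm.Irecv(recv_buf, source=rank - 1, tag=0))
    if rank + 1 < size:
        # post the send as soon as the data is ready
        requests.append(comm.Isend(send_buf, dest=rank + 1, tag=0))

    # ... do the other work here; with a progressing MPI the transfers can overlap ...

    # block only at the point where the data is actually needed
    MPI.Request.Waitall(requests)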

Aron


Lisandro Dalcin

Sep 9, 2011, 3:52:55 PM
to mpi...@googlegroups.com
On 9 September 2011 16:45, Aron Ahmadia <ar...@ahmadia.net> wrote:
> Hey Matt,
> More specifically, our iterative algorithm needs to send data from
> rank N to rank N+1, but the rank N+1 processor doesn't need this data
> immediately - it has to do a few other things before it needs it.  For
> each MPI process, I have three threads: one thread for computations,
> one thread for doing MPI sends, and one thread for doing MPI receives.
>
> This is not idiomatic MPI.  You can do the same thing with a single thread
> (and avoid GIL issues) by posting non-blocking sends and receives
> (MPI_Isend/MPI_Irecv) when you have the data to send and then issuing a
> 'wait' when you need the data to proceed on the receiving end.

However, you need an MPI implementation that truly supports overlap of
communication and computation. But this should be the case with
MVAPICH2, right?


--
Lisandro Dalcin
---------------
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
3000 Santa Fe, Argentina
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169

Matthew Emmett

Sep 9, 2011, 4:14:30 PM
to mpi...@googlegroups.com
Hi Aron, Lisandro,

On Fri, Sep 9, 2011 at 3:52 PM, Lisandro Dalcin <dal...@gmail.com> wrote:
> On 9 September 2011 16:45, Aron Ahmadia <ar...@ahmadia.net> wrote:
>> Hey Matt,
>> More specifically, our iterative algorithm needs to send data from
>> rank N to rank N+1, but the rank N+1 processor doesn't need this data
>> immediately - it has to do a few other things before it needs it.  For
>> each MPI process, I have three threads: one thread for computations,
>> one thread for doing MPI sends, and one thread for doing MPI receives.
>>
>> This is not idiomatic MPI.  You can do the same thing with a single thread
>> (and avoid GIL issues) by posting non-blocking sends and receives
>> (MPI_Isend/MPI_Irecv) when you have the data to send and then issuing a
>> 'wait' when you need the data to proceed on the receiving end.
>
> However, your need an MPI implementation truly supporting overlap of
> communication and computation. But this should be the case with
> MVAPICH2, right?

I had originally tried using Isend and Recv, but that didn't seem to
work out as I had hoped. I seemed to experience the same thing as
noted here:

http://groups.google.com/group/mpi4py/browse_thread/thread/ec58a37db1c5e109/9c2f44e9c039caf6

However, I have since switched to MVAPICH2 (from Open MPI), so I will
try again. Also, I haven't tried using any of the one-sided calls,
which upon further thought seem to be a natural fit for our algorithm.
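For the record, here is a rough sketch of what the one-sided route
might look like with mpi4py's Win interface; I haven't run this, and
the window size, datatype, and fence-based synchronization are just
illustrative guesses:

    # Untested sketch: illustrative window size, datatype, and fence synchronization.
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # memory on each rank that the neighbour is allowed to Put into
    local = np.zeros(1000, dtype='d')
    win = MPI.Win.Create(local, comm=comm)

    win.Fence()                       # open an exposure/access epoch on all ranks
    if rank + 1 < size:
        data = np.arange(1000, dtype='d')
        win.Put(data, rank + 1)       # write into rank N+1's window
    win.Fence()                       # close the epoch; rank N+1 can now read 'local'

    win.Free()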

Once again, thanks for the quick replies. I'll let you know what I end up with.

Matt

Lisandro Dalcin

Sep 9, 2011, 8:46:23 PM
to mpi...@googlegroups.com

FYI, Open MPI can be built with a progress thread (it's a configure
option), so the behavior of Open MPI ultimately depends on how it was
built.

To repeat: I would expect MVAPICH2 not to suffer from the progress issue.

BTW, please report back your findings!

Matthew Emmett

Sep 12, 2011, 11:44:20 AM
to mpi...@googlegroups.com
Hi Aron, Lisandro,

I just responded to this thread over in scipy-users... just wanted to
report back here as well.

Aron's suggestion was a good one: I posted receive requests early in
each iteration with Irecv, and then waited for the data when I really
needed it. No more threading, just basic MPI calls. This worked well
when the send and receive calls weren't lined up; when they were lined
up, it turned out that using just plain Send and Recv was best.
Anyway, we're seeing great parallel efficiency and I'm very happy.

As Lisandro pointed out, this probably worked out well since I am
using MVAPICH2. I haven't tried compiling OpenMPI with the progress
thread enabled. (I'm not sure if I'll get around to this, as I should
be doing runs for David and Aaron...)

Thanks again for mpi4py, Lisandro!

Take care,
Matt
