Fwd: [Numpy-discussion] Improving Python+MPI import performance


Dag Sverre Seljebotn

Jan 13, 2012, 3:51:58 AM
to mpi...@googlegroups.com, Chris Kees
This looks very interesting,

Dag

-------- Original Message --------
Subject: [Numpy-discussion] Improving Python+MPI import performance
Date: Thu, 12 Jan 2012 17:13:41 -0800
From: Asher Langton <lang...@llnl.gov>
Reply-To: Discussion of Numerical Python <numpy-di...@scipy.org>
To: numpy-di...@scipy.org

Hi all,

(I originally posted this to the BayPIGgies list, where Fernando Perez
suggested I send it to the NumPy list as well. My apologies if you're
receiving this email twice.)

I work on a Python/C++ scientific code that runs as a number of
independent Python processes communicating via MPI. Unfortunately, as
some of you may have experienced, module importing does not scale well
in Python/MPI applications. For 32k processes on BlueGene/P, importing
100 trivial C-extension modules takes 5.5 hours, compared to 35
minutes for all other interpreter loading and initialization. We
developed a simple pure-Python module (based on knee.py, a
hierarchical import example) that cuts the import time from 5.5 hours
to 6 minutes.

The code is available here:

https://github.com/langton/MPI_Import

Usage, implementation details, and limitations are described in a
docstring at the beginning of the file (just after the mandatory
legalese).

I've talked with a few people who've faced the same problem and heard
about a variety of approaches, which range from putting all necessary
files in one directory to hacking the interpreter itself so it
distributes the module-loading over MPI. Last summer, I had a student
intern try a few of these approaches. It turned out that the problem
wasn't so much the simultaneous module loads, but rather the huge
number of failed open() calls (ENOENT) as the interpreter tries to
find the module files. In the MPI_Import module, we have rank 0
perform the module lookups and then broadcast the locations to the
rest of the processes. For our real-world scientific applications
written in Python and C++, this has meant that we can start a problem
and actually make computational progress before the batch allocation
ends.
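
The rank-0 lookup described above can be sketched roughly as follows. This is not the MPI_Import code itself (see the repository for that); it is a minimal illustration that assumes Python 3 and its `importlib.util.find_spec`, and the helper names `locate_modules` and `broadcast_locations` are made up for this example. The `comm` argument is assumed to be an mpi4py communicator such as `MPI.COMM_WORLD`; when it is omitted, the sketch falls back to a plain serial lookup.

```python
import importlib.util


def locate_modules(names):
    """Search the import path once and map each module name to its location.

    Modules that cannot be found map to None.
    """
    locations = {}
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        locations[name] = spec.origin if spec is not None else None
    return locations


def broadcast_locations(names, comm=None):
    """Let rank 0 do the filesystem search and share the result.

    comm is assumed to be an mpi4py communicator (hypothetical usage:
    broadcast_locations(names, MPI.COMM_WORLD)). With comm=None the
    lookup simply runs in the current process.
    """
    if comm is None:
        return locate_modules(names)
    # Only rank 0 touches the filesystem; the other ranks pass a placeholder.
    locations = locate_modules(names) if comm.Get_rank() == 0 else None
    # Every rank receives rank 0's dictionary.
    return comm.bcast(locations, root=0)
```

With this shape, the failed open() calls happen once on rank 0 instead of once per process; each rank can then load modules directly from the broadcast paths.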

If you try out the code, I'd appreciate any feedback you have:
performance results, bugfixes/feature-additions, or alternate
approaches to solving this problem. Thanks!

-Asher

Aron Ahmadia

Jan 13, 2012, 4:04:59 AM
to mpi...@googlegroups.com, Chris Kees
Dag, thanks for forwarding this on.

A


Matthew Turk

Jan 13, 2012, 11:30:56 AM
to mpi...@googlegroups.com

Dag, I'd like to also thank you. I'm testing it now and actually
running into some funny errors with unittest on Python 2.7 causing
infinite recursion. If anyone is able to get this going, and could
report successes back to the group, that would be very helpful. (And
hopefully Asher may frequent this group as well.)

Asher Langton

Jan 13, 2012, 3:35:54 PM
to mpi4py
On Jan 13, 8:30 am, Matthew Turk <matthewt...@gmail.com> wrote:
> Dag, I'd like to also thank you.  I'm testing it now and actually
> running into some funny errors with unittest on Python 2.7 causing
> infinite recursion.  If anyone is able to get this going, and could
> report successes back to the group, that would be very helpful.  (And
> hopefully Asher may frequent this group as well.)

If you have a chance, could you send me an example that reproduces the
errors? I'll try to figure out what the problem is.

Thanks,
Asher

Asher Langton

Jan 15, 2012, 1:43:34 AM
to mpi4py
On Jan 13, 8:30 am, Matthew Turk <matthewt...@gmail.com> wrote:
> Dag, I'd like to also thank you.  I'm testing it now and actually
> running into some funny errors with unittest on Python 2.7 causing
> infinite recursion.  If anyone is able to get this going, and could
> report successes back to the group, that would be very helpful.  (And
> hopefully Asher may frequent this group as well.)

Could you send me an example that reproduces the error? I'll see if I
can track down the problem.

Thanks,
Asher