Fwd: pypar install with multiple MPIs

Ole Nielsen

Aug 11, 2012, 10:45:33 PM
to pypar-...@googlegroups.com
Can someone here help with this issue?

Many thanks
Ole

---------- Forwarded message ----------
From: Steve Plimpton <sjp...@sandia.gov>
Date: Sun, Aug 12, 2012 at 3:42 AM
Subject: pypar install with multiple MPIs
To: Ole.Molle...@gmail.com
Cc: sjp...@sandia.gov


Hi Ole - I've used your pypar package extensively in the
past.  It's what we advocate for people to use
with the Python wrapper we provide for our molecular
simulation tool LAMMPS, see http://lammps.sandia.gov.

However, I'm having trouble building and installing the current
Pypar version on a new desktop box.

I believe the issue is that I have multiple MPI versions on
my box:

a) a system version, which is an old OpenMPI, under /usr/lib
b) an MPICH 2 version, I installed under /usr/local/lib
c) a current OpenMPI, I installed under /usr/local/openmpi

The library for LAMMPS, which we want to call from Python
is built with a specific MPI, typically MPICH, though
sometimes the new OpenMPI.

So when I build/install pypar, I need to be able to choose a specific
MPI to build it against, i.e. for the mpi.h and *.so file.  Otherwise
when I run a Python script using both pypar and the LAMMPS lib, it
crashes due to using 2 different MPIs.

Ideally, I'd like there to be a setup.py option that
tells the pypar install which MPI to use, but I don't
see how to do it.  I can't seem to get it to use anything
but OpenMPI.

I tried moving the new OpenMPI to another dir, so it now seems to find the
MPICH mpicc.  And the setup.py build and install now seem to work OK
(no output lines with openmpi in them).

But when I run pypar, I still get this line
being invoked from pypar.py
    mpi = CDLL('libmpi.so.0', RTLD_GLOBAL)
which fails.

The MPICH lib is named libmpich.so, so that can't work.

So am I doing something wrong?  This must be a common problem, that
people have multiple MPI versions installed, especially since Linux
distributions now often include an (old) installation.

A separate issue.  Is there functionality in the current Pypar to
allow you to create different sub-communicators of MPI_COMM_WORLD and
pass them to a user library as an argument from Python?  LAMMPS is
setup to take an MPI_Comm as an argument when it is instantiated, so
that a driving program (or Python script) can instantiate multiple
instances of LAMMPS on different sets of procs, but I don't know if
that is possible portably with Pypar, especially with different MPIs,
since the data type for MPI_Comm is different in different MPIs.

I think I asked you this Q a couple years ago and you said it
wasn't possible then.

Thanks,
Steve





Ole Nielsen

Aug 13, 2012, 6:35:47 AM
to Steve Plimpton, Stephen Roberts, pypar-...@googlegroups.com
Hi Steve

Thanks for your mail and the question. I know that locating the shared objects of both OpenMPI and MPICH has been a sticking point in recent years.

See e.g.
where we had to put in the following hack to get OpenMPI running again: 

# Work around bug in OpenMPI:
from ctypes import *
mpi = CDLL('libmpi.so.0', RTLD_GLOBAL)
# End work around

and 

where I changed
libmpi.so.0 to the more generic libmpi.so

I also know that Stephen Roberts (CC'd) has been bitten by this, but I have not been able to understand it, let alone solve it.
We run it on a cluster that has only OpenMPI installed.


I would be grateful for any help or insights.

Cheers and thanks
Ole

PS - Pypar uses MPI_COMM_WORLD by default. But it would not be a huge deal to enable other communicators.

Ole Nielsen

Aug 14, 2012, 12:13:20 AM
to Steve Plimpton, stephen...@anu.edu.au, pypar-...@googlegroups.com
Thanks for working around it. Two comments:

  1. You might want to work with the latest version of pypar where (although still a workaround) libmpi.so.0 has been replaced by libmpi.so
  2. I really would love to get rid of this ctypes code. It was never necessary before 2009 and I can't see why it should be now. So if you come across any hints from the MPICH or OpenMPI communities about this, please let me know.


Cheers
Ole



On Mon, Aug 13, 2012 at 11:03 PM, Steve Plimpton <sjp...@sandia.gov> wrote:
Hi Ole - I got it to work, but only by editing your pypar.py,
at the bottom to look like this:

    # Work around bug in OpenMPI (December 2009):
    # SJP NOTE: added code to have it look for MPICH first, then OpenMPI MPI
    try:
        mpi = CDLL('libmpich.so')
    except OSError:
        mpi = CDLL('libmpi.so.0', RTLD_GLOBAL)
    # End work around

As I see it, the problem is that this line

        mpi = CDLL('libmpi.so.0', RTLD_GLOBAL)
will never work if you are wanting MPICH MPI, since
its lib file is libmpich.so.

Again, what I am wanting to do is write a Python script
that wraps both MPI (via Pypar) and our code LAMMPS, which
itself builds with MPI and makes MPI calls.  So it is required
that both Pypar and LAMMPS use the same MPI, else the script
crashes immediately.  LAMMPS allows you to build with any MPI
that is on your box (since you specify the name and path
of the mpi.h and MPI lib file in your LAMMPS Makefile).

The way I see it, Pypar also needs to give you some control
over which MPI on the system that it builds against.  The odd thing
is that it seems to do this OK when you do setup.py build
and install, because it invokes mpicc.  So as long as it finds
the correct mpicc, then everything should be consistent.  In my
case, the mpicc is for MPICH, so I get this output
from "python setup.py build", as the last line

gcc -pthread -shared build/temp.linux-x86_64-2.7/mpiext.o -L/usr/local/lib -lmpich -lopa -lmpl -lrt -lpthread -o build/lib.linux-x86_64-2.7/pypar/mpiext.so

Note that it knows to use -lmpich and other libs that MPICH requires.
So all is good.
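
The library name is already visible in a link line like the one above, so it could in principle be recovered at build time. This is only a rough sketch of that idea, not something Pypar does: the helper name mpi_library_from_link_line is hypothetical, and it assumes the MPI library is the first -l flag whose name starts with "mpi".

```python
import shlex


def mpi_library_from_link_line(link_line):
    """Return the .so file name for the first MPI-looking -l flag, or None."""
    for token in shlex.split(link_line):
        if token.startswith("-l"):
            name = token[2:]
            # Assumption: MPI libraries are named libmpi* (libmpich, libmpi).
            # Helper libraries such as -lmpl or -lopa do not match.
            if name.startswith("mpi"):
                return "lib" + name + ".so"
    return None


link_line = (
    "gcc -pthread -shared build/temp.linux-x86_64-2.7/mpiext.o "
    "-L/usr/local/lib -lmpich -lopa -lmpl -lrt -lpthread "
    "-o build/lib.linux-x86_64-2.7/pypar/mpiext.so"
)
print(mpi_library_from_link_line(link_line))  # prints libmpich.so
```

The name found this way would still have to be written somewhere the installed package can read it at import time; how best to do that is left open here.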

But when you "import pypar" into Python, it ends up running this line:

        mpi = CDLL('libmpi.so.0', RTLD_GLOBAL)
which loads the OpenMPI libmpi.so.0, which is fatal.  If I didn't
have OpenMPI on my box, it would just give the CDLL fail-to-load
error.

Since my code addition is a hack for my box, I suggest one of the following:

a) since the build knows it needs -lmpich and not -lmpi, keep that
   info around for the CDLL call, if possible
b) add a setup.py option that lets the user specify the correct name
   of the MPI *.so that is needed
c) you might also think about a setup.py option to give a path
   to mpicc, since you can have multiple of those on your box as
   well, if you have 2 or more MPIs installed


> PS - Pypar uses MPI_COMM_WORLD by default. But it would not be a huge deal
> to enable other communicators.

There are 2 issues I think.

a) First, Pypar would need to provide a wrap on MPI_Comm_split(), so
the Python script can create new communicators on subsets of procs.
Maybe it already has this?

b) Second, I need to be able to pass that new communicator (or
MPI_COMM_WORLD) to our LAMMPS lib, as an argument that ctypes
recognizes as being compatible with MPI_Comm.

Note that MPI_Comm is a datatype defined by MPI and is different
for different MPIs.  In MPICH it is just an int.  But in OpenMPI
it is an opaque data structure.  I don't know how Pypar
represents the MPI communicator.

Can you show me a couple lines of Pypar code that would
pass an MPI communicator (e.g. MPI_COMM_WORLD) to a simple C
function that ctypes wraps?  E.g. so that the C code can make
an MPI call with it?
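
For reference, this is roughly what passing MPI_COMM_WORLD through ctypes can look like without any help from Pypar. It is a sketch only, and it leans on implementation details rather than anything the MPI standard guarantees: MPICH's mpi.h defines MPI_COMM_WORLD as the integer handle 0x44000000, and Open MPI defines it as the address of the exported symbol ompi_mpi_comm_world. The function name comm_world_argument and the "flavor" argument are hypothetical.

```python
import ctypes

# Integer handle that MPICH's mpi.h uses for MPI_COMM_WORLD.
MPICH_COMM_WORLD = 0x44000000


def comm_world_argument(mpi_lib, flavor):
    """Return a ctypes value usable as an MPI_Comm argument for MPI_COMM_WORLD.

    mpi_lib is the CDLL handle of the MPI library already loaded
    (only needed for Open MPI); flavor is "mpich" or "openmpi".
    """
    if flavor == "mpich":
        # MPI_Comm is a plain int in MPICH.
        return ctypes.c_int(MPICH_COMM_WORLD)
    if flavor == "openmpi":
        # MPI_Comm is a pointer in Open MPI; take the address of the
        # exported communicator object.
        symbol = ctypes.c_char.in_dll(mpi_lib, "ompi_mpi_comm_world")
        return ctypes.c_void_p(ctypes.addressof(symbol))
    raise ValueError("unknown MPI flavor: %r" % (flavor,))


# Hypothetical usage, assuming a C function  void my_func(MPI_Comm comm)
# in a library built against the same MPI:
#   user_lib = ctypes.CDLL("libmylib.so")
#   user_lib.my_func(comm_world_argument(mpi, "mpich"))
```

This only covers the predefined MPI_COMM_WORLD. A communicator created by MPI_Comm_split() would have to come back from whatever wrapper made the call, in the same int-or-pointer form.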

Thanks for developing Pypar.  I like that it has a simple API and is
supported.

Steve

> Date: Mon, 13 Aug 2012 17:35:47 +0700
> From: Ole Nielsen <ole.molle...@gmail.com>
> CC: Stephen Roberts <stephen...@anu.edu.au>,
>         <pypar-...@googlegroups.com>

