Problems building and running mpi4py on a Cray XT5 (Kraken)

anujm...@gmail.com

Sep 6, 2013, 3:50:17 PM
to mpi...@googlegroups.com
Hello all!

Let me preface this by saying that I have successfully used mpi4py on another cluster.

I am trying to get all my packages working properly, but mpi4py seems to hate me.  The C compiler is invoked through the cc wrapper and the C++ compiler through CC.  The system provides an MPI implementation called MPT, and I can successfully build and run MPI programs in C using the cc wrapper.

When I install mpi4py from source, I set mpicc = cc and mpicxx = CC.  I then do:

python setup.py build
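
For reference, the change amounts to pointing mpi4py's build configuration at the Cray compiler wrappers. A rough sketch of the whole sequence, assuming the mpi.cfg file shipped with the mpi4py sources and its --mpi section-selection mechanism (the [cray] section name and the --user install are illustrative, not exactly what I typed):

# Append a Cray-specific section to mpi.cfg and select it at build time.
cat >> mpi.cfg <<'EOF'

[cray]
mpicc  = cc
mpicxx = CC
EOF
python setup.py build --mpi=cray
python setup.py install --user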

The build itself seems to go fine, but there are a number of failures, mostly from not finding the mpe and vt-* libraries.  I think this is OK, however, since most of the other configuration checks succeed.  I have attached a log of the build.

After installing, when I try to do

from mpi4py import MPI

I get this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: /opt/cray/lib64/libsci_gnu_mp.so.1: undefined symbol: _gfortran_transfer_complex_write

Additionally, when I try to run an MPI program as a job on the compute nodes, I get this:

aprun -n 12 python testmpi.py
Traceback (most recent call last):
  File "testmpi.py", line 1, in <module>
    from mpi4py import MPI
ImportError: libgfortran.so.3: cannot open shared object file: No such file or directory
(the same traceback is printed by each of the 12 ranks)

I am positive that the directory containing libgfortran is in LD_LIBRARY_PATH, since I added it to that variable explicitly, and I have also copied and symlinked the library into a directory that I know is picked up.

I have been tearing out my hair all day trying to figure this out.  Can anyone help me?
buildlog.txt

Lisandro Dalcin

Sep 8, 2013, 5:29:57 AM
to mpi4py
Can you run "ldd /path/to/mpi4py/MPI.so" and show us the output?

Also, try the following: add these lines to setup.cfg, where /path/to/gfortran/lib is the directory containing libgfortran.so:

[build_ext]
rpath = /path/to/gfortran/lib
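
After editing setup.cfg, you will need a forced rebuild so that MPI.so gets relinked with the new rpath; roughly (a sketch; the --user install is just one option):

# Relink the extension with the rpath from setup.cfg, then reinstall.
python setup.py build --force
python setup.py install --user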


--
Lisandro Dalcin
---------------
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
3000 Santa Fe, Argentina
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169

anujm...@gmail.com

Sep 9, 2013, 1:02:09 PM
to mpi...@googlegroups.com
Thanks for the reply!

Here is the output of ldd on MPI.so (I have replaced my username with 'user'):

ldd scratch/.local/lib/python2.7/site-packages/mpi4py/MPI.so
linux-vdso.so.1 =>  (0x00007fffd4dff000)
libpython2.7.so.1.0 => /lustre/scratch/user/.local/lib/libpython2.7.so.1.0 (0x00007fbdb1058000)
libgfortran.so.3 => /opt/gcc/default/snos/lib64/libgfortran.so.3 (0x00007fbdb0e6d000)
libscicpp_gnu.so.1 => /opt/cray/lib64/libscicpp_gnu.so.1 (0x00007fbdb0c3c000)
libsci_gnu_mp.so.1 => /opt/cray/lib64/libsci_gnu_mp.so.1 (0x00007fbdaf79c000)
libstdc++.so.6 => /opt/gcc/default/snos/lib64/libstdc++.so.6 (0x00007fbdaf58c000)
libfftw3.so.3 => /opt/fftw/default/lib/libfftw3.so.3 (0x00007fbdaf30c000)
libfftw3f.so.3 => /opt/fftw/default/lib/libfftw3f.so.3 (0x00007fbdaf095000)
libmpich_gnu.so.gnu-46-1 => /opt/cray/lib64/libmpich_gnu.so.gnu-46-1 (0x00007fbdaed72000)
libmpl.so.0 => /opt/cray/lib64/libmpl.so.0 (0x00007fbdaec6e000)
librt.so.1 => /lib64/librt.so.1 (0x00007fbdaea64000)
libpmi.so.0 => /opt/cray/pmi/default/lib64/libpmi.so.0 (0x00007fbdae94a000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fbdae72e000)
libm.so.6 => /lib64/libm.so.6 (0x00007fbdae4d7000)
libgomp.so.1 => /opt/gcc/default/snos/lib64/libgomp.so.1 (0x00007fbdae3ca000)
libc.so.6 => /lib64/libc.so.6 (0x00007fbdae071000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fbdade6c000)
libutil.so.1 => /lib64/libutil.so.1 (0x00007fbdadc69000)
libmpich.so.1 => /opt/cray/lib64/libmpich.so.1 (0x00007fbdad946000)
/lib64/ld-linux-x86-64.so.2 (0x00007fbdb173e000)
libgcc_s.so.1 => /opt/gcc/default/snos/lib64/libgcc_s.so.1 (0x00007fbdad82f000)
libportals.so.1 => /opt/cray/portals/2.2.0-1.0301.26633.6.9.ss/lib64/libportals.so.1 (0x00007fbdad626000)

So, it seems as though the library I am interested in is in /opt/gcc/default/snos/lib64.

To confirm that the problem is still there, I run the command in an interactive job:

user@krakenpf4:~/scratch> aprun -n 12 python testmpi.py
Application 6874935 exit codes: 1
Application 6874935 resources: utime ~0s, stime ~1s

Clearly the problem still exists.

In order for Python to find the library, I believe I need to add the directory to LD_LIBRARY_PATH:

user@krakenpf4:~/scratch> echo $LD_LIBRARY_PATH
/lustre/scratch/user/.local/lib:/opt/gcc/default/snos/lib64:/sw/xt/globus/5.0.4/binary/lib:/opt/torque/2.5.11/lib:/opt/pgi/11.9.0/linux86-64/11.9/libso:/opt/pgi/11.9.0/linux86-64/11.9/lib:/opt/cray/MySQL/5.0.64-1.0301.2899.20.1.ss/lib64/mysql:/opt/cray/MySQL/5.0.64-1.0301.2899.20.1.ss/lib64:/opt/cray/portals/2.2.0-1.0301.26633.6.9.ss/lib64:/opt/cray/atp/1.4.1/lib
user@krakenpf4:~/scratch> export LD_LIBRARY_PATH=/opt/gcc/default/snos/lib:$LD_LIBRARY_PATH
user@krakenpf4:~/scratch> echo $LD_LIBRARY_PATH
/opt/gcc/default/snos/lib:/lustre/scratch/user/.local/lib:/opt/gcc/default/snos/lib64:/sw/xt/globus/5.0.4/binary/lib:/opt/torque/2.5.11/lib:/opt/pgi/11.9.0/linux86-64/11.9/libso:/opt/pgi/11.9.0/linux86-64/11.9/lib:/opt/cray/MySQL/5.0.64-1.0301.2899.20.1.ss/lib64/mysql:/opt/cray/MySQL/5.0.64-1.0301.2899.20.1.ss/lib64:/opt/cray/portals/2.2.0-1.0301.26633.6.9.ss/lib64:/opt/cray/atp/1.4.1/lib

Again, I try to run the script:

user@krakenpf4:~/scratch> aprun -n 12 python testmpi.py
Application 6874936 exit codes: 1
Application 6874936 resources: utime ~0s, stime ~1s

Just to make sure the library is indeed in that folder, I check.

user@krakenpf4:~/scratch> ls /opt/gcc/default/snos/lib64
libgcc_s.so     libgfortran.so.3      libgomp.so.1      libmudflap.la        libmudflapth.la        libssp.la            libssp.so.0.0.0  libstdc++.so.6.0.13
libgcc_s.so.1   libgfortran.so.3.0.0  libgomp.so.1.0.0  libmudflap.so        libmudflapth.so        libssp_nonshared.a   libstdc++.a      libsupc++.a
libgfortran.a   libgomp.a             libgomp.spec      libmudflap.so.0      libmudflapth.so.0      libssp_nonshared.la  libstdc++.la     libsupc++.la
libgfortran.la  libgomp.la            libiberty.a       libmudflap.so.0.0.0  libmudflapth.so.0.0.0  libssp.so            libstdc++.so
libgfortran.so  libgomp.so            libmudflap.a      libmudflapth.a       libssp.a               libssp.so.0          libstdc++.so.6

I am very confused!  Thanks for the help. 

anujm...@gmail.com

Sep 9, 2013, 1:14:43 PM
to mpi...@googlegroups.com, anujm...@gmail.com
Just FYI, I have also added /opt/gcc/default/snos/lib to LD_LIBRARY_PATH, because I found a libgfortran.so.3 there as well.  To be clear, BOTH of these paths are in LD_LIBRARY_PATH (though I believe the relevant copy is the one in lib64), and Python still cannot find the library.

anujm...@gmail.com

Sep 9, 2013, 1:27:28 PM
to mpi...@googlegroups.com
I have also tried the setup.cfg fix you suggested.  Alas, it did not resolve the problem.

Lisandro Dalcin

Sep 10, 2013, 8:18:10 AM
to mpi4py
On 9 September 2013 20:27, <anujm...@gmail.com> wrote:
> I have also tried the setup.cfg fix you suggested. Alas, it did not
> resolve the problem.
>

I have no clue what's going on... Can you download helloworld.{c,cxx,f90}
from https://bitbucket.org/mpi4py/mpi4py/src/master/demo, compile them with
the C, C++, and Fortran 90 compilers respectively, run "ldd" on the resulting
executables, and show me the output?
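
Something along these lines should do it on a Cray (a sketch; I'm assuming the usual cc/CC/ftn wrappers, and the -dynamic flag is only needed if the wrappers link statically by default, in which case ldd would have nothing to report):

# Build the three demos with the Cray compiler wrappers and inspect
# their dynamic dependencies.
cc  -dynamic helloworld.c   -o helloworld_c
CC  -dynamic helloworld.cxx -o helloworld_cxx
ftn -dynamic helloworld.f90 -o helloworld_f90
ldd helloworld_c helloworld_cxx helloworld_f90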

anujm...@gmail.com

Sep 11, 2013, 1:00:02 PM
to mpi...@googlegroups.com
So it turns out the cluster is set up in a rather complicated way, and access to certain directories changes whenever a job is launched.  I had been compiling mpi4py correctly, but against the environment of the login/service nodes rather than that of the compute nodes.  A simple module load fixed all my problems.
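
In concrete terms, the fix was just a matter of loading the compute-node (CNL) Python module before building and running; a rough sketch (module names are site-specific, so python2.7-cnl below is only illustrative of what my site provides):

# Switch to the compute-node (CNL) Python environment before building.
module avail python        # list the site's Python modules
module load python2.7-cnl  # illustrative; the CNL Python module on my cluster
which python               # confirm the CNL build is now first in PATH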

Sorry to bother you with my inexperience! And thanks so much for the help.

As an aside, is there any chance for me to contribute to the project?  I love this module so much and it seems like the community is stellar.  Please let me know how I can help!

Once again, thanks.

Anuj

Lisandro Dalcin

Sep 12, 2013, 4:40:13 AM
to mpi4py
On 11 September 2013 20:00, <anujm...@gmail.com> wrote:
> So it turns out the cluster is set up in a rather complicated way, and
> access to certain directories changes whenever a job is launched. I had
> been compiling mpi4py correctly, but against the environment of the
> login/service nodes rather than that of the compute nodes. A simple module
> load fixed all my problems.
>
> Sorry to bother you with my inexperience! And thanks so much for the help.
>
> As an aside, is there any chance for me to contribute to the project? I
> love this module so much and it seems like the community is stellar. Please
> let me know how I can help!
>

Can you try to build mpi4py from the development copy at Bitbucket
(https://bitbucket.org/mpi4py/mpi4py)? I've made some changes to the build
code, and I simply cannot test on every platform, especially supercomputers
I do not have access to... You need to git clone the repository and install
Cython (mpi4py will invoke Cython to generate the C wrappers).
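
Roughly, the steps would be (a sketch; the --user installs are just one option):

# Get the development sources, make sure Cython is available, then
# build and install into the user site-packages.
pip install --user cython
git clone https://bitbucket.org/mpi4py/mpi4py.git
cd mpi4py
python setup.py build
python setup.py install --user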

mpi4py is really lacking in documentation. Although there are a bunch of
third-party docs out there, I would love to have better official docs with
more examples, but I really do not have time for this task. Any contribution
on this front is very welcome.

And of course, any code review or suggestion to improve performance
will be very welcome.

evalan...@gmail.com

Dec 23, 2014, 9:52:01 AM
to mpi...@googlegroups.com, anujm...@gmail.com
Hello Anuj,

I seem to be running into the same problem (the build log is the same as yours) when installing mpi4py on a Cray XC supercomputer. I installed mpi4py on another cluster using pip and it worked fine. I saw that you figured out a solution to the problem. Can you please share more details on how you solved it? I would be very grateful. Thank you.

Eva

Anuj Girdhar

Dec 23, 2014, 11:40:58 AM
to evalan...@gmail.com, mpi...@googlegroups.com
Eva,

Here is a solution I gave to another user:

"It turned out to be a bit more complicated than that.

Firstly, on my cluster I had the python2.7-cnl module.  In addition, I needed to manually locate the libraries on the login nodes, transfer them to my scratch directory (which was readable by the compute nodes), and link against them there by adding that directory to my LD_LIBRARY_PATH.  It was quite a pain, but if you run ldd on your mpi4py MPI.so you can find out which libraries you need and track them down accordingly."
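
In shell terms, that workaround amounts to roughly the following (a sketch; the scratch path and the particular library are illustrative):

# Find out which shared libraries MPI.so needs, copy the ones that live in
# paths the compute nodes cannot see into a scratch directory, and put that
# directory on LD_LIBRARY_PATH.
ldd ~/.local/lib/python2.7/site-packages/mpi4py/MPI.so
mkdir -p /lustre/scratch/$USER/libs
cp /opt/gcc/default/snos/lib64/libgfortran.so.3 /lustre/scratch/$USER/libs/
export LD_LIBRARY_PATH=/lustre/scratch/$USER/libs:$LD_LIBRARY_PATH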

Anuj Girdhar
Ph.D. Candidate
Department of Physics
University of Illinois at Urbana-Champaign

Aron Ahmadia

Dec 23, 2014, 11:52:52 AM
to mpi...@googlegroups.com, evalan...@gmail.com, hash...@googlegroups.com
Folks,

We have put together HashDist profiles for several large Cray machines.  Sometimes these profiles can be helpful in figuring out how to build Python and mpi4py on the specific machines, and if you can contribute back, I would really appreciate it.

Even if you just have a shell script that reproduces your mpi4py build, I'm happy to contribute it to the main HashDist repository.

Here's an example of a much more complicated build stack that includes mpi4py on a Cray XE6: https://github.com/hashdist/hashstack/blob/master/examples/proteus.garnet.gnu.yaml

The HashDist developers may be a little less responsive this week, but I can answer any basic questions you have.

Regards,
Aron
