Problem with x10rt MPI


John Gallagher

Dec 11, 2010, 4:39:25 PM
to coms49...@googlegroups.com
Hi all, I'm having trouble getting the compiler or runtime to find
libmpichcxx.so.1.1
libmpich.so.1.1

even when I use the instructions for -post "# # -lmpi #"

I know where they are:
[jmg2016@athos projectlite]$ rpm -qal mpich2 |grep libmpi
/usr/lib64/mpich2/lib/libmpich.so.1
/usr/lib64/mpich2/lib/libmpich.so.1.2
/usr/lib64/mpich2/lib/libmpichcxx.so.1
/usr/lib64/mpich2/lib/libmpichcxx.so.1.2
/usr/lib64/mpich2/lib/libmpichf90.so.1
/usr/lib64/mpich2/lib/libmpichf90.so.1.2

but... this isn't on the ldconfig path
/sbin/ldconfig -v| grep ":"
/usr/local/lib:
/usr/lib/qt-3.3/lib:
/usr/lib64/qt-3.3/lib:
/lib:
/lib64:
/usr/lib:
/usr/lib64:
/sbin/ldconfig: /lib/i686: (hwcap: 0x0008000000000000)
/usr/lib64/tls: (hwcap: 0x8000000000000000)
/usr/lib64/sse2: (hwcap: 0x0000000004000000)


even if I add the correct LIBRARY_PATH and LD_LIBRARY_PATH to my
bashrc, running causes an error:

./ANNLite.mpi: error while loading shared libraries: libmpichcxx.so.1.1: cannot open shared object file: No such file or directory
srun --cpus-per-task=8 --ntasks=1 ./ANNLite.mpi 8 tests/OurPatterns.data 1000 5 10000


Has anyone else been able to get this to run properly?

Maybe some kind of rpath hack?

john
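
One thing worth checking in the listings above: the loader asks for libmpichcxx.so.1.1 and libmpich.so.1.1, while the rpm output shows only the .so.1 and .so.1.2 names. That suggests the problem may be a version mismatch rather than a search-path issue. A minimal shell sketch of that check follows. The function name check_sonames is hypothetical, introduced only for illustration.

```shell
# Hypothetical helper (not part of x10 or MPICH): for each library name
# given after the directory, report whether a file with exactly that
# name exists in the directory.
check_sonames() {
    dir=$1
    shift
    for name in "$@"; do
        if [ -e "$dir/$name" ]; then
            echo "found $name"
        else
            echo "missing $name"
        fi
    done
}
```

For example, `check_sonames /usr/lib64/mpich2/lib libmpichcxx.so.1.1 libmpich.so.1.1` prints one found/missing line per name. If the .1.1 names are missing, adding the directory to LD_LIBRARY_PATH alone would not be expected to help.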

Mashooq Muhaimen

Dec 11, 2010, 4:47:48 PM
to coms49...@googlegroups.com
Are you using a makefile? I had this problem when I was trying to use
a makefile, probably due to some env variable issue (path variable?).
Try the command line, and it will find the path of the shared
libraries by itself (or at least it did for me).
If you get it to work, could you post whether you are seeing MPI spawn
multiple copies of your program?
Thanks,
Mashooq

John Gallagher

Dec 11, 2010, 4:57:27 PM
to coms49...@googlegroups.com
Yeah, I always get this error, makefile or not. It is very
surprising that you were able to get it to work when not using a
makefile. I wonder what the reason is there... Anyone else having
luck? I can get pgas_sockets to work, but it is slow. Nothing else
works for me.

john

Vijay Saraswat

Dec 11, 2010, 5:16:52 PM
to coms49...@googlegroups.com, David Cunningham
I can't figure out from this what could be wrong.

Can you send your makefile? How do you create the binary?

John Gallagher

Dec 11, 2010, 5:38:54 PM
to coms49...@googlegroups.com
Makefile:

APP=ANNLite
X10_PATH=/opt/x10-2.1.0/bin
SRC_PACKAGE=src

$(APP).mpi:
	$(X10_PATH)/x10c++ -x10rt mpi -O -report postcompile=5 -post "# # -lpmi #" -o $@ $(SRC_PACKAGE)/*.x10


Trying to make

make ANNLite.mpi
/opt/x10-2.1.0/bin/x10c++ -x10rt mpi -O -report postcompile=5 -post "# # -lpmi #" -o ANNLite.mpi src/*.x10
Output files: [Network.cc, ANNDriver.h, GlobalToPlaceData.inc,
ANNMath.cc, GlobalData.h, Network.h, GlobalToPlaceData.h,
AsyncData.inc, ANNMath.inc, GlobalToPlaceData.cc, AsyncData.h,
PlaceData.inc, GlobalData.cc, ANNMath.h, ANNDriver.cc, AsyncData.cc,
PlaceData.h, BatchTrainer.inc, DataManager.cc, BatchTrainer.cc,
DataManager.inc, BatchTrainer.h, ANNDriver.inc, PlaceData.cc,
DataManager.h, GlobalData.inc, Network.inc]

Executing post-compiler mpicxx -g -I/opt/x10-2.1.0/include -I/home/jmg2016/projectlite -I. -O2 -DNDEBUG -DNO_PLACE_CHECKS -finline-functions -Wno-long-long -Wno-unused-parameter -pthread -msse2 -mfpmath=sse -o /home/jmg2016/projectlite/ANNLite.mpi Network.cc GlobalData.cc ANNDriver.cc ANNMath.cc AsyncData.cc DataManager.cc BatchTrainer.cc GlobalToPlaceData.cc PlaceData.cc -lpmi -L/opt/x10-2.1.0/lib -lx10 -DX10_USE_BDWGC -lgc -lx10rt_mpi -lpmi -ldl -lm -lpthread -Wl,--rpath -Wl,/opt/x10-2.1.0/lib -Wl,-export-dynamic -lrt
x10c++: /usr/bin/ld: warning: libmpichcxx.so.1.1, needed by /opt/x10-2.1.0/lib/libx10rt_mpi.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libmpich.so.1.1, needed by /opt/x10-2.1.0/lib/libx10rt_mpi.so, not found (try using -rpath or -rpath-link)


(note that -lpmi appears twice, so there is really no need for the -post
anymore, but I left it in anyway)

ldd ANNLite.mpi
libmpichcxx.so.1.1 => not found
libmpich.so.1.1 => not found

(which makes sense because these libraries aren't in ldconfig, as my
ldconfig output implied before)

I also tried adding -Wl,--rpath -Wl,/usr/lib64/mpich2/lib to the post,
but I think x10c++ is actually running the final linker step itself
instead of in the compiler. In any case, I get the same error.


John
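
The ldd check in the post above can be narrowed to just the unresolved entries. The sketch below is only an illustration: missing_libs is a hypothetical name, and it assumes the usual `name => not found` layout that ldd prints for libraries the loader cannot resolve.

```shell
# Hypothetical helper: read `ldd` output on stdin and print only the
# names of libraries that the dynamic loader could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}
```

A possible use is `ldd ./ANNLite.mpi | missing_libs`, which for the output quoted above would list the two MPICH libraries.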

John Gallagher

Dec 12, 2010, 1:30:58 PM
to coms49...@googlegroups.com
Can anyone give me an ldd dump from their working mpi code? I've
exhausted all the hacks I can think of. Also a `which mpicxx` and
your x10 version would be very helpful.

john

Mashooq Muhaimen

Dec 12, 2010, 1:57:19 PM
to coms49...@googlegroups.com
[mm3858@athos ClrParallel1]$ x10c++ -t -v -report postcompile=1 -o Clr.mpi -x10rt mpi -optimize *.x10
"/opt/ibm/java-x86_64-60/bin/java" -Xmx512m -classpath "/opt/x10/lib/x10c.jar:/opt/x10/classes:/opt/x10/lib/x10c.jar:/opt/x10/lib/x10.jar:/opt/x10/lib/:/opt/x10/lib/:/opt/x10/lib/:/opt/x10/lib/lpg.jar:/opt/x10/lib/wala.jar:/opt/x10/lib/com.ibm.wala.cast.x10.jar:/opt/x10/lib/org.eclipse.equinox.common_3.6.0.v20100503.jar" polyglot.main.Main -extclass x10cuda.ExtensionInfo -sourcepath "/opt/x10/lib/x10.jar" -OPTIMIZE=true '-report' 'postcompile=1' '-o' 'Clr.mpi' 'Clr.x10' 'EntropyCalculator.x10' 'FileHandler.x10' 'ScoreInfo.x10' 'TimeProfiler.x10'
Executing post-compiler /opt/openmpi-1.4/bin/mpicxx -g -I/opt/x10/include -I/home/mm3858/ClrParallel1 -I. -O2 -DNDEBUG -DNO_PLACE_CHECKS -finline-functions -Wno-long-long -Wno-unused-parameter -pthread -msse2 -mfpmath=sse -o /home/mm3858/ClrParallel1/Clr.mpi FileHandler.cc Clr.cc TimeProfiler.cc EntropyCalculator.cc ScoreInfo.cc -L/opt/x10/lib -lx10 -DX10_USE_BDWGC -lgc -lx10rt_mpi -ldl -lm -lpthread -Wl,--rpath -Wl,/opt/x10/lib -Wl,-export-dynamic -lrt

real 0m57.424s
user 1m15.891s
sys 0m7.550s
[mm3858@athos ClrParallel1]$ ldd Clr.mpi
libx10.so => /opt/x10/lib/libx10.so (0x00002ba7b5b77000)
libgc.so.1 => /opt/x10/lib/libgc.so.1 (0x00002ba7b61ef000)
libx10rt_mpi.so => /opt/x10/lib/libx10rt_mpi.so (0x00002ba7b6445000)
libdl.so.2 => /lib64/libdl.so.2 (0x000000365fc00000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003660000000)
librt.so.1 => /lib64/librt.so.1 (0x0000003662400000)
libmpi_cxx.so.0 => /opt/openmpi-1.4/lib/libmpi_cxx.so.0 (0x00002ba7b6694000)
libmpi.so.0 => /opt/openmpi-1.4/lib/libmpi.so.0 (0x00002ba7b68ae000)
libopen-rte.so.0 => /opt/openmpi-1.4/lib/libopen-rte.so.0 (0x00002ba7b6b57000)
libopen-pal.so.0 => /opt/openmpi-1.4/lib/libopen-pal.so.0 (0x00002ba7b6da2000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003666c00000)
libutil.so.1 => /lib64/libutil.so.1 (0x000000366e200000)
libm.so.6 => /lib64/libm.so.6 (0x000000365f800000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x0000003665800000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003664400000)
libc.so.6 => /lib64/libc.so.6 (0x000000365f400000)
/lib64/ld-linux-x86-64.so.2 (0x000000365f000000)

John Gallagher

Dec 12, 2010, 2:22:36 PM
to coms49...@googlegroups.com
Thanks! Looks like it compiles if your x10 is at /opt/x10. I was
using /opt/x10-2.1.0, which tries to pull in the mpich2 libraries. I
still can't get it to recognize more than one place, but at least it
runs.

john
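
The difference described above (one x10 install linking against Open MPI, the other pulling in the mpich2 libraries) can be spotted from ldd output. The sketch below is a rough heuristic rather than a definitive test: mpi_flavor is a hypothetical name, and the classification relies only on the library names that appear in this thread (libmpich*, libmpi.so, libmpi_cxx.so, libopen-rte). Library naming varies between MPI releases, so treat the result as a hint.

```shell
# Hypothetical helper: read `ldd` output on stdin and guess which MPI
# implementation the binary links against, based on library names only.
mpi_flavor() {
    awk '
        $1 ~ /^libmpich/                                  { mpich = 1 }
        $1 ~ /^libmpi(_cxx)?\.so/ || $1 ~ /^libopen-rte/  { openmpi = 1 }
        END {
            if (mpich && openmpi) print "mixed"
            else if (mpich)       print "mpich"
            else if (openmpi)     print "openmpi"
            else                  print "unknown"
        }'
}
```

Running `ldd Clr.mpi | mpi_flavor` and `ldd ANNLite.mpi | mpi_flavor` side by side is one way to confirm which MPI each x10 install was built against.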

Martha Kim

Dec 12, 2010, 2:28:16 PM
to coms49...@googlegroups.com
Hi John,

I was just noodling around in parallel and was able to recreate your error with /opt/x10-2.1.0/.  I got it working for /opt/x10-2.1.0_ompi/ though.  The compile flags are a tad different from what you were using as I was sticking as closely to Shreedar's documentation as possible, but the differences are mostly in optimization settings.

It seems to run fine on multiple places.

Feel free to poach whatever is useful out of the attached files.

Martha
john-help.tar.gz