
Programs compiled on Mandriva 2007.1 won't start on 2007.0


David Mathog

Jul 9, 2007, 1:49:32 PM
Well, this one is fun. On our Beowulf the master node was Mandriva 2006.0
and the slaves 2007.0. A while back I installed 2007.1 on the master. All
seemed fine. Then I rebuilt pvm and it wouldn't run on the slaves.
After about a day and a half I've finally traced it down to a totally
bizarre failure: ANY program built on the master node will not run
on the slave.

Example:
(on master, Dual Opterons)
# uname -a
Linux safserver.bio.caltech.edu 2.6.17-14mdv #1 SMP Wed May 9 21:11:43
MDT 2007 i686 AMD Opteron(tm) Processor 246 HE GNU/Linux
# gcc --version
gcc (GCC) 4.1.2 20070302 (prerelease) (4.1.2-1mdv2007.1)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
# cat >hello.c <<EOD
#include <stdio.h>
int main(void){
(void) fprintf(stdout,"HELLO\n");
}
EOD
# gcc -o hello hello.c
# ./hello
HELLO
# cp hello /usr/common/tmp #NFS mounted on all slaves


(on slave, single Athlon MP)
# /usr/common/tmp/hello
Floating point exception (core dumped)

Bizarre! So I rebuilt it with debugging symbols:
(on master)
# gcc -g -o hello hello.c
# cp -f hello /usr/common/tmp
# cp -f hello.c /usr/common/tmp

(on slave)
# uname -a
Linux monkey01.cluster 2.6.19.3 #1 SMP Wed Feb 7 11:17:15 PST 2007 i686
AMD Athlon(tm) MP 2200+ GNU/Linux
# gcc --version
gcc (GCC) 4.1.1 20060724 (prerelease) (4.1.1-3mdk)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
# gdb /usr/common/tmp/hello
(gdb) run
Starting program: /usr/common/tmp/hello
Failed to read a valid object file image from memory.

Program received signal SIGFPE, Arithmetic exception.
0xb7f8b96f in do_lookup_x (undef_name=0xb7e598d3 "_res", hash=420035,
ref=0xb7e52834, result=0xbffef1f0, scope=0xb7f9c838, i=0,
version=0xb7f78328, flags=0, skip=0x0, type_class=Variable
"type_class" is not available.
) at do-lookup.h:72
72 do-lookup.h: No such file or directory.
in do-lookup.h
(gdb) bt
#0 0xb7f8b96f in do_lookup_x (undef_name=0xb7e598d3 "_res",
hash=420035, ref=0xb7e52834, result=0xbffef1f0, scope=0xb7f9c838,
i=0, version=0xb7f78328, flags=0, skip=0x0, type_class=Variable
"type_class" is not available.
) at do-lookup.h:72
#1 0xb7f8bc87 in _dl_lookup_symbol_x (undef_name=0xb7e598d3 "_res",
undef_map=0xb7f78000, ref=0xbffef310,
symbol_scope=0xb7f781a8, version=0xb7f78328, type_class=0, flags=0,
skip_map=0x0) at dl-lookup.c:233
#2 0xb7f8d263 in _dl_relocate_object (l=Variable "l" is not available.
) at ../sysdeps/i386/dl-machine.h:354
#3 0xb7f8631f in dl_main (phdr=0x8048034, phnum=224,
user_entry=0xbffef700) at rtld.c:2235
#4 0xb7f9540e in _dl_sysdep_start (start_argptr=0xbffef760,
dl_main=0xb7f85050 <dl_main>) at ../elf/dl-sysdep.c:239
#5 0xb7f84709 in _dl_start (arg=0xbffef760) at rtld.c:333
#6 0xb7f83847 in _start () at rtld.c:788
(gdb) exit
# gcc -g -o hello /usr/common/tmp/hello.c
# hello
HELLO
# ldd -v ./hello
linux-gate.so.1 => (0xffffe000)
libc.so.6 => /lib/i686/libc.so.6 (0xb7e64000)
/lib/ld-linux.so.2 (0xb7f9c000)

Version information:
./hello:
libc.so.6 (GLIBC_2.0) => /lib/i686/libc.so.6
/lib/i686/libc.so.6:
ld-linux.so.2 (GLIBC_2.1) => /lib/ld-linux.so.2
ld-linux.so.2 (GLIBC_2.3) => /lib/ld-linux.so.2
ld-linux.so.2 (GLIBC_PRIVATE) => /lib/ld-linux.so.2
# ldd -v /usr/common/tmp/hello
linux-gate.so.1 => (0xffffe000)
libc.so.6 => /lib/i686/libc.so.6 (0xb7db6000)
/lib/ld-linux.so.2 (0xb7eee000)

Version information:
/usr/common/tmp/hello:
libc.so.6 (GLIBC_2.0) => /lib/i686/libc.so.6
/lib/i686/libc.so.6:
ld-linux.so.2 (GLIBC_2.1) => /lib/ld-linux.so.2
ld-linux.so.2 (GLIBC_2.3) => /lib/ld-linux.so.2
ld-linux.so.2 (GLIBC_PRIVATE) => /lib/ld-linux.so.2


Something goes wrong in the dynamic linker (ld.so) startup of any such
program; it never even gets into the user code. Note that programs
compiled previously, when the master was still Mandriva 2006.0, still
run fine on Mandriva 2007.1 and 2007.0. Conversely, programs built on
the slaves all run just fine on the master.

Anybody know what is going on here? It's one of the strangest,
and most massively inconvenient, glitches I've ever seen. The kernel
on 2007.0 was one I built from a kernel.org distribution, the one on
2007.1 is whatever came with the distro.

Thanks,

David Mathog


David Mathog

Jul 9, 2007, 5:35:14 PM
David Mathog wrote:
> ANY program built on the master node will not run
> on the slave.

To rule out the possibility this had something to do with the
Opterons on the master, I built hello.c on another 2007.1 machine
with an old 850 MHz Athlon (not even an XP or MP). As before, the
binary created on the 2007.1 system would not run when transferred to
the 2007.0 system, failing again somewhere in the "dl" section
of the startup.

If anybody wants to verify this, here are the binaries:

ftp://saf.bio.caltech.edu/pub/pickup/hello.tar.gz

contents:

hello_2007_0 #built on a 2007.0 system
hello_2007_1 #built on a 2007.1 system
hello.c #compile with: gcc -g -o hello hello.c


Thanks,

David Mathog

David Mathog

Jul 9, 2007, 6:18:40 PM
Found it by Googling terms from the backtrace.

This issue also affects FC6 to FC5 (generally FC<6) and various other
new-to-old distribution combinations. In brief, "ld" now has
a --hash-style argument, and on the newer distributions the default was
changed from sysv to gnu. Binaries linked with the gnu style carry a
symbol hash table that the older systems cannot read, which causes the
compatibility problems described earlier in this thread. For more info
see for instance:


http://newsgroups.linuxbroker.com/index.php?tab=com&newsgroup=comp.os.linux.development&article=145
http://lkml.org/lkml/2006/7/26/262

To build an application on 2007.1 that will run on earlier Mandriva
releases use either:

gcc -g -o hello -Wl,--hash-style=sysv hello.c

or

gcc -g -o hello -Wl,--hash-style=both hello.c

(That is, pass the required option to the linker).

Unfortunately the older runtime loader (ld-linux.so.2, part of glibc) on
2007.0 has no clue what hash-style=gnu is, so there's no way to configure
a 2007.0 system to do anything reasonable with hash-style=gnu binaries
from 2007.1 systems. The binaries have to be built backwards compatible
if they are to run on older Linux distros.

Regards,

David Mathog
