fatal: relocation errors on Solaris x86 and OpenSolaris 64-bit.


Dr. David Kirkby

Aug 29, 2010, 7:48:36 PM
to ecls...@lists.sourceforge.net, sage-s...@googlegroups.com
Hi,
Several months ago I discovered that Maxima would not build in Sage on 64-bit
OpenSolaris. ECL built, but Maxima would not.

The error was:

make[3]: Entering directory
`/export/home/drkirkby/sage-4.4.2/spkg/build/maxima-5.20.1.p0/src/src'
test -d binary-ecl || mkdir binary-ecl
ecl -norc -eval '(progn (load "../lisp-utils/defsystem.lisp") (funcall (intern
(symbol-name :operate-on-system) :mk) "maxima" :compile :verbose t)
(build-maxima-lib))' -eval '(ext:quit)'
ld.so.1: ecl: fatal: relocation error: R_AMD64_PC32: file
/export/home/drkirkby/sage-4.4.2/local/lib//libecl.so: symbol main: value
0x22800097de04 does not fit
make[3]: *** [binary-ecl/maxima] Killed


More info at

http://trac.sagemath.org/sage_trac/ticket/9099

At the time, I did not have a clue whether this was a Maxima, ECL or Sage issue,
but now I believe it is an ECL issue, as the link-editor thinks the shared
object contains text relocations.

There's a command given on this Sun blog which will show libraries with this
problem:

http://blogs.sun.com/rie/entry/my_relocations_don_t_fit

After downloading the latest sources from your git repository about an hour ago and
building ECL with


export CC="gcc -m64"
./configure
make

I see:

drkirkby@hawk:/tmp/ecl$ elfdump -d ./build/libecl.so | fgrep TEXTREL
[23] TEXTREL 0
[31] FLAGS 0x4 [ TEXTREL ]


which indicates a problem.
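The `elfdump -d ... | fgrep TEXTREL` check above looks for two things in the dynamic section: a `DT_TEXTREL` entry, and a `DT_FLAGS` entry with the `DF_TEXTREL` bit (0x4, matching the `FLAGS 0x4 [ TEXTREL ]` line). Below is a minimal C sketch of the same test, not elfdump's implementation. It assumes a 64-bit ELF file in the host's byte order and an `<elf.h>` header providing the standard ELF definitions (glibc ships one; Solaris provides the same constants). The function names are made up for illustration.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if the dynamic entries contain DT_TEXTREL, or DT_FLAGS with the
 * DF_TEXTREL bit set; 0 otherwise. Stops at DT_NULL or after `count`. */
int dyn_has_textrel(const Elf64_Dyn *dyn, size_t count)
{
    for (size_t i = 0; i < count && dyn[i].d_tag != DT_NULL; i++) {
        if (dyn[i].d_tag == DT_TEXTREL)
            return 1;
        if (dyn[i].d_tag == DT_FLAGS && (dyn[i].d_un.d_val & DF_TEXTREL))
            return 1;
    }
    return 0;
}

/* Check a 64-bit ELF file (assumed to be in the host's byte order).
 * Returns 1 if it has text relocations, 0 if not (or if it has no
 * PT_DYNAMIC segment), and -1 on I/O or format errors. */
int file_has_textrel(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    int result = -1;
    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        goto done;

    result = 0;
    for (Elf64_Half i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
        if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
            result = -1;
            break;
        }
        if (ph.p_type != PT_DYNAMIC)
            continue;

        /* The dynamic segment is an array of Elf64_Dyn entries. */
        size_t count = ph.p_filesz / sizeof(Elf64_Dyn);
        if (count == 0)
            break;
        Elf64_Dyn *dyn = malloc(count * sizeof *dyn);
        if (dyn == NULL || fseek(f, (long)ph.p_offset, SEEK_SET) != 0 ||
            fread(dyn, sizeof *dyn, count, f) != count)
            result = -1;
        else
            result = dyn_has_textrel(dyn, count);
        free(dyn);
        break;
    }
done:
    fclose(f);
    return result;
}
```

Given the elfdump output above, `file_has_textrel("./build/libecl.so")` should return 1 for the gcc-built library, and 0 for a clean one such as `libmpir.so`.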

One obvious cause is code that was not compiled as position-independent code
(PIC). Another is assembly code that is not position independent, but I think
I disabled that when I tried again with

$ ./configure --with-dffi=no

I also tried setting

CC="gcc -m64 -fPIC"

but that did not help either.

Have you any clue how I might resolve this?

Dave

Dr. David Kirkby

Aug 30, 2010, 7:52:35 AM
to Juan Jose Garcia-Ripoll, ecls...@lists.sourceforge.net, sage-s...@googlegroups.com
On 08/30/10 09:10 AM, Juan Jose Garcia-Ripoll wrote:
> I really know nothing about Opensolaris 64 bit model and appropriate
> compiler flags. I would rather have an operating system that gave me just
> one sane build model by default, with not so much to tweak.
>
> However, this seems to be relevant:
> http://forums.sun.com/thread.jspa?threadID=5071225 The appropriate flags
> would be "-xarch=amd64 -KPIC" and all libraries should be built with the
> same flags. If you inspect the library for the object files that have those
> offending relocations it may well be that some components (gc, gmp, some
> files in ECL or in Maxima) are not compiled with the same flags.

Those are compiler flags for the Sun compiler. Not a lot of help with gcc.

> BTW, what is the status of Opensolaris support in Sage.

Sage now builds on OpenSolaris 32-bit, and passes all the Sage test suite.

> This project
> (Opensolaris) apparently has been totally discontinued by their creators.
>
> Juanjo

That looks to be true.

But Solaris 10 has not died, on either x86 or SPARC. The issues are seen on
Solaris 10 SPARC too, so it is not just an x86 problem. Maxima is not complaining
on SPARC, but the elfdump command clearly shows the problem is in the ECL
library file. That problem exists on 64-bit SPARC too.

kirkby@t2:64 ~/t2/64/sage-4.5.3.alpha2/local/lib$ uname -a
SunOS t2 5.10 Generic_141414-02 sun4v sparc SUNW,T5240
kirkby@t2:64 ~/t2/64/sage-4.5.3.alpha2/local/lib$ elfdump -d libecl.so | fgrep TEXTREL
[18] TEXTREL 0
[30] FLAGS 0x4 [ TEXTREL ]
kirkby@t2:64 ~/t2/64/sage-4.5.3.alpha2/local/lib$

Two of the parts of Sage that are showing problems on 64-bit Solaris/OpenSolaris
builds (ECL and Cliquer) give output from the command above. So if the
link-editor thinks a file contains non-PIC code, that will present a problem.

The problem can't be Maxima, as I don't need to even build Maxima to show the
ECL code has this problem.

The MPIR library does not exhibit this issue.

kirkby@t2:64 ~/t2/64/sage-4.5.3.alpha2/local/lib$ elfdump -d libmpir.so | fgrep TEXTREL
kirkby@t2:64 ~/t2/64/sage-4.5.3.alpha2/local/lib$

Neither do most others. The only offenders in Sage are ECL, PolyBoRi and Cliquer.

Sorry, the title is a bit confusing. The problem is seen on:

* 64-bit SPARC
* 64-bit Solaris 10 on x86
* 64-bit OpenSolaris on x86

Note also that I built your latest sources outside of Sage, and the issue is seen there too.

Note also that there is this compiler warning when I build on Solaris (of any
sort):

/export/home/drkirkby/sage-4.5.3.alpha2/spkg/build/ecl-10.2.1.p2/src/src/c/dpp.c:680:13:
warning: too many arguments for format

When I use the debugging technique mentioned at

http://blogs.sun.com/rie/entry/my_relocations_don_t_fit

$ export LD_OPTIONS=-Dreloc
$ make

and look at the resulting output, it suggests to me that the problem is with dpp.c,
which, coincidence or not, is the same file that gives the compiler
warning.


if test -f ../CROSS-DPP; then touch dpp; else \
gcc
-I/rootpool2/local/kirkby/t2/64/sage-4.5.3.alpha2/spkg/build/ecl-10.2.1.p2/src/src/c
-I/rootpool2/local/kirkby/t2/64/sage-4.5.3.alpha2/spkg/build/ecl-10.2.1.p2/src/build
-I./
/rootpool2/local/kirkby/t2/64/sage-4.5.3.alpha2/spkg/build/ecl-10.2.1.p2/src/src/c/dpp.c
-I/rootpool2/local/kirkby/t2/64/sage-4.5.3.alpha2/local/include -O2 -m64 -g
-Wall -fPIC -Dsun4sol2 -o dpp ; \
fi
/rootpool2/local/kirkby/t2/64/sage-4.5.3.alpha2/spkg/build/ecl-10.2.1.p2/src/src/c/dpp.c:
In function ‘put_declaration’:
/rootpool2/local/kirkby/t2/64/sage-4.5.3.alpha2/spkg/build/ecl-10.2.1.p2/src/src/c/dpp.c:678:
warning: too few arguments for format
/rootpool2/local/kirkby/t2/64/sage-4.5.3.alpha2/spkg/build/ecl-10.2.1.p2/src/src/c/dpp.c:680:
warning: too many arguments for format
debug:
debug: collecting input relocations: section=.text,
file=/usr/local/gcc-4.4.1-sun-linker/bin/../lib/gcc/sparc-sun-solaris2.10/4.4.1/sparcv9/crt1.o
debug: type offset addend
section symbol
debug: in R_SPARC_WDISP30 0x1c 0
.rela.text atexit
debug: out R_SPARC_JMP_SLOT 0x1c 0
.plt atexit
debug: act R_SPARC_WDISP30 0x1c
.text atexit
<Then *thousands* more similar notices>

debug: collecting input relocations: section=.text,
file=/usr/local/gcc-4.4.1-sun-linker/bin/../lib/gcc/sparc-sun-solaris2.10/4.4.1/sparcv9/crtbegin.o
debug: type offset addend
section symbol
debug: in R_SPARC_GOT22 0xc 0
.rela.text completed.4129
debug: act R_SPARC_GOT22 0xc
.got completed.4129
debug: act R_SPARC_GOT22 0xc
.text completed.4129


Dave

Dr. David Kirkby

Aug 30, 2010, 8:59:08 AM
to Juan Jose Garcia-Ripoll, ecls...@lists.sourceforge.net, sage-s...@googlegroups.com
On 08/30/10 01:18 PM, Juan Jose Garcia-Ripoll wrote:
> On Mon, Aug 30, 2010 at 1:52 PM, Dr. David Kirkby
> <david....@onetel.net>wrote:

>
>> But Solaris 10 has not died, neither or x86 or SPARC. The issues are seen
>> on Solaris 10 SPARC too, so it not just an x86 problem. Maxima is not
>> complaining on SPARC, but the elfdump command clearly shows the problem is
>> in the ECL library file. That problem exits on 64-bit SPARC too.
>>
>> kirkby@t2:64 ~/t2/64/sage-4.5.3.alpha2/local/lib$ uname -a
>> SunOS t2 5.10 Generic_141414-02 sun4v sparc SUNW,T5240
>> kirkby@t2:64 ~/t2/64/sage-4.5.3.alpha2/local/lib$ elfdump -d libecl.so |
>> fgrep TEXTREL
>> [18] TEXTREL 0
>> [30] FLAGS 0x4 [ TEXTREL ]
>>
>
> The output of fgrep is not at all informative. If TEXTREL is a signature of
> non-PIC code then elfdump must also provide the names of the symbols that
> have this problem. This will in turn provide a clue as to which object files
> were not compiled with PIC. Could you please provide this information, as I
> suggested in the previous email.


Yes, I will provide a log.

>> The problem can't be Maxima, as I don't need to even build Maxima to show
>> the ECL code has this problem.
>>
>

> Please understand my statements in the appropriate context. Maxima is
> compiled with ECL and while ECL runs just fine, the Maxima executable does
> not. The problem might be in the compilation statements that build the
> resulting code, since ECL seems to work just fine -- even if as you suggest
> there are non-PIC sections.


>
>
>> Note also, that there's this compiler warnings when I build on Solaris (or
>> any sort)
>>
>> /export/home/drkirkby/sage-4.5.3.alpha2/spkg/build/ecl-10.2.1.p2/src/src/c/dpp.c:680:13:
>> warning: too many arguments for format
>>
>

> The compiler is probably wrong. gcc here does not detect any problem and I
> do not see any obvious one in dpp.c In any case this file is not related to
> the relocation issues since its only responsibility is to preprocess C files
> when bootstrapping ECL and it seems to be doing it just fine.
>
> Juanjo
>
If I build using the Sun compiler, with the options you suggest, it fails to
build at all, since the Sun C compiler will not accept C99 code by default. So I
added the option -xc99=all.

drkirkby@hawk:/tmp/ecl$ echo $CC
cc -xarch=amd64 -KPIC -xc99=all


Unfortunately, ECL fails to build properly with that either - see below.
However, the library did build, and does not appear to show the text relocation
problems I got with gcc.

drkirkby@hawk:/tmp/ecl$ elfdump -d build/libecl.so | grep TEXTREL
drkirkby@hawk:/tmp/ecl$


;;; Note:
;;; Invoking external command:
;;; cc -xarch=amd64 -KPIC -xc99=all "-I/tmp/ecl/build/" -g -fPIC -Dsun4sol2
-I"/tmp/ecl/src/c" -w -c "/tmp/eclinitQpaGsg.c" -o "/tmp/eclinitQpaGsg.o"
cc: Warning: -xarch=amd64 is deprecated, use -m64 to create 64-bit programs
;;;
;;; Note:
;;; Invoking external command:
;;; cc -xarch=amd64 -KPIC -xc99=all -o "/tmp/ecl/build/rt.fas"
-L"/tmp/ecl/build/" "/tmp/eclinitQpaGsg.o" "/tmp/ecl/build/ext/rt.o" -dy -G
libecl.so -ldl -lm -lsocket -lnsl -lintl -lgmp
cc: Warning: -xarch=amd64 is deprecated, use -m64 to create 64-bit programs
;;;
;;; Note:
;;; Invoking external command:
;;; cc -xarch=amd64 -KPIC -xc99=all "-I/tmp/ecl/build/" -g -fPIC -Dsun4sol2
-I"/tmp/ecl/src/c" -w -c "/tmp/eclinitRpaGsg.c" -o "/tmp/eclinitRpaGsg.o"
cc: Warning: -xarch=amd64 is deprecated, use -m64 to create 64-bit programs
;;;
;;; Note:
;;; Invoking external command:
;;; cc -xarch=amd64 -KPIC -xc99=all -o "/tmp/ecl/build/bin/ecl"
-L"/tmp/ecl/build/" "/tmp/eclinitRpaGsg.o" "-L./" libecl.so -ldl -lm
-lsocket -lnsl -lintl
cc: Warning: -xarch=amd64 is deprecated, use -m64 to create 64-bit programs
Undefined first referenced
symbol in file
__gmpn_perfect_square_p /tmp/eclinitRpaGsg.o (symbol belongs to
implicit dependency /usr/lib/64/libgmp.so.3)
__gmpz_tdiv_q /tmp/eclinitRpaGsg.o (symbol belongs to
implicit dependency /usr/lib/64/libgmp.so.3)
__gmpq_set /tmp/eclinitRpaGsg.o (symbol belongs to
implicit dependency /usr/lib/64/libgmp.so.3)
__gmpz_set /tmp/eclinitRpaGsg.o (symbol belongs to
implicit dependency /usr/lib/64/libgmp.so.3)
__gmpn_add_n /tmp/eclinitRpaGsg.o (symbol belongs to
implicit dependency /usr/lib/64/libgmp.so.3)
__gmpn_sub_n /tmp/eclinitRpaGsg.o (symbol belongs to
implicit dependency /usr/lib/64/libgmp.so.3)
__gmpn_popcount /tmp/eclinitRpaGsg.o (symbol belongs to
implicit dependency /usr/lib/64/libgmp.so.3)
ld: fatal: symbol referencing errors. No output written to /tmp/ecl/build/bin/ecl
;;;
(SYSTEM "cc -xarch=amd64 -KPIC -xc99=all -o \"/tmp/ecl/build/bin/ecl\"
-L\"/tmp/ecl/build/\" \"/tmp/eclinitRpaGsg.o\" \"-L./\" libecl.so -ldl -lm
-lsocket -lnsl -lintl") returned non-zero value 1

Available restarts:

1. (CONTINUE) Continues anyway.

Broken at TPL.
File: #P"/tmp/ecl/src/lsp/top.lsp" (Position #20393)
SI>>

Dr. David Kirkby

Nov 5, 2010, 7:23:59 PM
to Juan Jose Garcia-Ripoll, ecls...@lists.sourceforge.net, sage-s...@googlegroups.com
On 11/ 5/10 11:16 PM, Juan Jose Garcia-Ripoll wrote:
> On Mon, Aug 30, 2010 at 1:48 AM, Dr. David Kirkby
> <david....@onetel.net>wrote:
>

>> Several months ago I discovered that Maxima would not build in Sage on
>> 64-bit
>> OpenSolaris. ECL built, but Maxima would not. [...]

>> At the time, I did not have a clue whether this was a Maxima, ECL or Sage
>> issue,
>> but now I believe it is an ECL issue, as the link-editor thinks the shared
>> object contains text relocations.[...]
>
>
> Seems that the problem was caused by ECL's threaded interpreter
> optimization, which uses a computed goto. This computed goto is just working
> fine in all other versions of GCC in all other platforms. The obvious
> solution was to deactivate that optimization (a pity, because it is a
> powerful one) for Sun's platform. I have just finished a 64-bit build of ECL
> on Solaris using GCC without problems. Sorry for the delay.
>
> Juanjo
>

Thank you Juanjo.


How did you track the problem down? I tried, but found the issue very hard to
debug. In theory, I believe, one should be able to narrow it down to a single
function, but I was unable to.

Is it possible you could show me the code changes that were made? If possible,
it would be nice to produce a small test case which exhibits this problem, then
submit that as a gcc bug.
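A sketch of what such a reduced test case might start from, assuming (untested) that the trigger is a computed goto through a table of label addresses, as in a threaded interpreter. It uses GCC's labels-as-values extension (`&&label` and `goto *ptr`); `run_threaded` and its opcodes are hypothetical and are not ECL's code. Built as a shared object with `gcc -m64 -fPIC -O2 -shared -o libtest.so test.c`, running `elfdump -d libtest.so | fgrep TEXTREL` would show whether this alone reproduces the problem; I have not checked that it does.

```c
/* Hypothetical opcodes for a tiny accumulator machine. */
enum { OP_INC, OP_DOUBLE, OP_HALT };

/* Threaded dispatch: each handler jumps straight to the next one through
 * a table of absolute label addresses. Under -fPIC those addresses are
 * not known until load time, so every table entry needs a relocation. */
long run_threaded(const unsigned char *code)
{
    static void *const dispatch[] = { &&do_inc, &&do_double, &&do_halt };
    long acc = 0;

#define NEXT() goto *dispatch[*code++]
    NEXT();
do_inc:
    acc += 1;
    NEXT();
do_double:
    acc *= 2;
    NEXT();
do_halt:
#undef NEXT
    return acc;
}
```

With a working toolchain those relocations normally land in a (possibly RELRO) data section rather than in `.text`, so a TEXTREL flag on such a library would point at how the compiler or assembler placed the table.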

Dave

Dr. David Kirkby

Nov 5, 2010, 7:26:04 PM
to Juan Jose Garcia-Ripoll, ecls...@lists.sourceforge.net, sage-s...@googlegroups.com
On 11/ 5/10 11:16 PM, Juan Jose Garcia-Ripoll wrote:
> Seems that the problem was caused by ECL's threaded interpreter
> optimization, which uses a computed goto. This computed goto is just working
> fine in all other versions of GCC in all other platforms. The obvious
> solution was to deactivate that optimization (a pity, because it is a
> powerful one) for Sun's platform. I have just finished a 64-bit build of ECL
> on Solaris using GCC without problems. Sorry for the delay.
>
> Juanjo
>


BTW, could you make the activation/deactivation an option? Since it causes no
problems on 32-bit builds to my knowledge, it would be nice to be able to enable
the optimization. Otherwise it will cause a performance regression on 32-bit
builds.
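One way the optimization could be made switchable, sketched under assumptions: `USE_COMPUTED_GOTO` is a hypothetical macro that a configure option would define, and `run_vm` with its opcodes is illustrative, not ECL's interpreter. The threaded path stores offsets between labels instead of absolute addresses; the GCC manual suggests this form for code in shared libraries, since the table then needs no dynamic relocations and can stay read-only. Whether that variant avoids the Solaris failure is untested.

```c
/* Hypothetical switch; a configure option could pass -DUSE_COMPUTED_GOTO=0. */
#ifndef USE_COMPUTED_GOTO
# if defined(__GNUC__)
#  define USE_COMPUTED_GOTO 1
# else
#  define USE_COMPUTED_GOTO 0
# endif
#endif

enum { VM_INC, VM_DOUBLE, VM_HALT };

/* Runs a bytecode program; no bounds checking, this is only a sketch. */
long run_vm(const unsigned char *code)
{
    long acc = 0;
#if USE_COMPUTED_GOTO
    /* Label offsets relative to do_inc, not absolute addresses. */
    static const int offsets[] = {
        &&do_inc - &&do_inc, &&do_double - &&do_inc, &&do_halt - &&do_inc
    };
# define DISPATCH() goto *(&&do_inc + offsets[*code++])
    DISPATCH();
do_inc:
    acc += 1;
    DISPATCH();
do_double:
    acc *= 2;
    DISPATCH();
do_halt:
    return acc;
# undef DISPATCH
#else
    /* Portable fallback: an ordinary switch in a loop. */
    for (;;) {
        switch (*code++) {
        case VM_INC:    acc += 1; break;
        case VM_DOUBLE: acc *= 2; break;
        case VM_HALT:   return acc;
        default:        return -1; /* unknown opcode */
        }
    }
#endif
}
```

Both paths compute the same result, so a platform where the threaded version misbehaves can fall back to the `switch` loop while 32-bit builds keep the faster dispatch.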

Dave
