ELF linker woes with GraalVM


Waldek Kozaczuk

Oct 19, 2018, 1:58:29 PM
to OSv Development
Recently I have been playing with GraalVM (https://github.com/oracle/graal) to see if it is possible to run it on OSv. To that end I created a new OSv app: https://github.com/cloudius-systems/osv-apps/tree/master/graalvm-example. As you can see, it has a simple bootstrap main.so that loads a shared library, libhello.so, generated by GraalVM (similar to the golang example). The checked-in line that builds main.so is actually incorrect and should be:
$(CC) -pie -o $@ $(CFLAGS) -I. main.c -L. -lhello -ldl

In any case, the app crashes like this:
page fault outside application, addr: 0x0000100000d3f000
[registers]
RIP: 0x00000000002fcb0c <elf::object::arch_relocate_rela(unsigned int, unsigned int, void*, long)+348>
RFL: 0x0000000000000206  CS:  0x0000000000000008  SS:  0x0000000000000010
RAX: 0x0000000000000001  RBX: 0xffffa00001841800  RCX: 0x0000100000d3f030  RDX: 0x0000000000000000
RSI: 0x0000000000000000  RDI: 0xffffa00001841800  RBP: 0x00002000000ff090  R8:  0x0000100000ea9bb8
R9:  0x0000000000000008  R10: 0x0000000000000050  R11: 0x0000000000000000  R12: 0x00000000006a9bb8
R13: 0x0000000000000008  R14: 0x0000000000000050  R15: 0x0000000000000000  RSP: 0x00002000000ff050
Aborted

[backtrace]
0x0000000000298ea8 <???+2723496>
0x0000000000299f06 <mmu::vm_fault(unsigned long, exception_frame*)+294>
0x00000000002f9238 <page_fault+136>
0x00000000002f8086 <???+3113094>
0x00000000002ab5d7 <elf::object::relocate_rela()+263>
0x00000000002ae3b7 <elf::object::relocate()+199>
0x00000000002b1e32 <elf::program::load_object(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<std::shared_ptr<elf::object>, std::allocator<std::shared_ptr<elf::object> > >&)+1602>
0x00000000002b1088 <elf::object::load_needed(std::vector<std::shared_ptr<elf::object>, std::allocator<std::shared_ptr<elf::object> > >&)+520>
0x00000000002b1e26 <elf::program::load_object(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::vector<std::shared_ptr<elf::object>, std::allocator<std::shared_ptr<elf::object> > >&)+1590>
0x00000000002b267a <elf::program::get_library(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, bool)+330>
0x0000000000385de1 <osv::application::application(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<
0x0000000000386527 <osv::application::run(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, bool, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()>
0x000000000038678a <osv::application::run(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)+90>
0x00000000002135f9 <do_main_thread(void*)+2601>
0x00000000003b7ce5 <???+3898597>

After adding some debug statements, I discovered that the page fault is caused by the linker trying to write to an area of a memory-mapped file segment while relocating symbols (arch_relocate_rela), and that page is read-only per the ELF segment permissions.
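For context: on x86-64, a RELA relocation that references no symbol (sym 0) is typically R_X86_64_RELATIVE, and applying it means storing base + r_addend into the 8-byte slot at base + r_offset. The minimal sketch below assumes the glibc <elf.h> definitions and uses a plain buffer in place of the mapped library; apply_relative_relocs is a hypothetical helper, not OSv's arch_relocate_rela(). It only illustrates that each relocation is an ordinary store, so the target page has to be writable at that moment.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: apply R_X86_64_RELATIVE entries to an image loaded
 * at `base`. Each one is a plain 8-byte store into the image, so the page
 * containing base + r_offset must be writable when this runs. */
size_t apply_relative_relocs(unsigned char *base, const Elf64_Rela *rela,
                             size_t count)
{
    size_t applied = 0;
    for (size_t i = 0; i < count; i++) {
        if (ELF64_R_TYPE(rela[i].r_info) != R_X86_64_RELATIVE)
            continue; /* symbol-based relocation types are out of scope here */
        uint64_t value = (uint64_t)(uintptr_t)base + (uint64_t)rela[i].r_addend;
        /* memcpy avoids alignment assumptions; this is the kind of write that faulted */
        memcpy(base + rela[i].r_offset, &value, sizeof value);
        applied++;
    }
    return applied;
}
```

If the page holding base + r_offset were mapped without write permission, the memcpy above is where the fault would occur.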

When I simply disable enforcement of the ELF write permission, so that every segment is always writable, like so:
@@ -328,18 +328,18 @@ void file::load_segment(const Elf64_Phdr& phdr)
     unsigned perm = 0;
     if (phdr.p_flags & PF_X)
         perm |= mmu::perm_exec;
-    if (phdr.p_flags & PF_W)
+    //if (phdr.p_flags & PF_W)
         perm |= mmu::perm_write;
     if (phdr.p_flags & PF_R)
         perm |= mmu::perm_read;
the program executes just fine.

What is also interesting: when I build main as non-PIE to run it on Linux, it works fine as well:
$(CC) -o $@ $(CFLAGS) -I. main.c -L. -lhello -ldl

So it seems that either the Linux ELF loader is more lenient than OSv's, or OSv has a bug. Possibly something about the GraalVM-generated libhello.so is abnormal, and relocations should not be happening in segments marked read-only. Or maybe the OSv linker enforces the ELF permissions (especially write) too early, and enforcement should be delayed until the relocation process has finished, by calling mprotect().

BTW, I have added some debug statements that print useful details when OSv tries to link:
[/main.so]:Loading segment with base:0x100000400000 at:0x100000400000 of size:4096 from file at offset:0 which is locked:0
[/main.so]:Mmapped file at: 0x100000400000 with permissions - exec:1, write:0, read:4
[/main.so]:Loading segment with base:0x100000400000 at:0x100000600000 of size:8192 from file at offset:0 which is locked:0
[/main.so]:Mmapped file at: 0x100000600000 with permissions - exec:0, write:2, read:4
[/libhello.so]:Loading segment with base:0x100000800000 at:0x100000800000 of size:7806976 from file at offset:0 which is locked:0
[/libhello.so]:Mmapped file at: 0x100000800000 with permissions - exec:1, write:0, read:4
[/libhello.so]:Loading segment with base:0x100000800000 at:0x100001171000 of size:884736 from file at offset:7802880 which is locked:0
[/libhello.so]:Mmapped file at: 0x100001171000 with permissions - exec:0, write:2, read:4
[/libhello.so]:object::relocate_rela
object::relocate_rela: base:0x100000800000, sym:0, addr:0x100000d3f030, offset:53f030
--> Was vma found:1, access fault:1
page fault outside application, addr: 0x0000100000d3f000
[registers]
RIP: 0x00000000002fcb0c <elf::object::arch_relocate_rela(unsigned int, unsigned int, void*, long)+348>
RFL: 0x0000000000000206  CS:  0x0000000000000008  SS:  0x0000000000000010
RAX: 0x0000000000000001  RBX: 0xffffa00001841800  RCX: 0x0000100000d3f030  RDX: 0x0000000000000000
RSI: 0x0000000000000000  RDI: 0xffffa00001841800  RBP: 0x00002000000ff090  R8:  0x0000100000ea9bb8
R9:  0x0000000000000008  R10: 0x0000000000000050  R11: 0x0000000000000000  R12: 0x00000000006a9bb8
R13: 0x0000000000000008  R14: 0x0000000000000050  R15: 0x0000000000000000  RSP: 0x00002000000ff050
Aborted

And some readelf info:
readelf -e libhello.so
....
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x00000000007717dc 0x00000000007717dc  R E    200000
  LOAD           0x0000000000771dd0 0x0000000000971dd0 0x0000000000971dd0
                 0x00000000000d6730 0x00000000000d6738  RW     200000
  DYNAMIC        0x0000000000771de8 0x0000000000971de8 0x0000000000971de8
                 0x00000000000001f0 0x00000000000001f0  RW     8
  NOTE           0x00000000000001c8 0x00000000000001c8 0x00000000000001c8
                 0x0000000000000024 0x0000000000000024  R      4
  GNU_EH_FRAME   0x0000000000771740 0x0000000000771740 0x0000000000771740
                 0x0000000000000024 0x0000000000000024  R      4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RWE    10
  GNU_RELRO      0x0000000000771dd0 0x0000000000971dd0 0x0000000000971dd0
                 0x0000000000000230 0x0000000000000230  R      1

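The faulting address 0x100000d3f030 (offset 0x53f030 from base 0x100000800000) falls within the first LOAD segment (R E, offsets 0 to 0x7717dc), which is not writable; 0x100001171000 is where the second, writable segment is mapped.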
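One way to confirm which segment a faulting address belongs to is to subtract the load base and compare the result against each PT_LOAD's [p_vaddr, p_vaddr + p_memsz) range. The sketch below hard-codes the two LOAD entries from the libhello.so readelf output above; find_load_segment and libhello_loads are hypothetical names used only for illustration.

```c
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: return the index of the PT_LOAD header whose
 * [p_vaddr, p_vaddr + p_memsz) range contains `vaddr`, or -1 if none does.
 * For a shared object loaded at `base`, vaddr = fault address - base. */
int find_load_segment(const Elf64_Phdr *phdrs, size_t n, Elf64_Addr vaddr)
{
    for (size_t i = 0; i < n; i++) {
        if (phdrs[i].p_type != PT_LOAD)
            continue;
        if (vaddr >= phdrs[i].p_vaddr &&
            vaddr < phdrs[i].p_vaddr + phdrs[i].p_memsz)
            return (int)i;
    }
    return -1;
}

/* The two LOAD entries of libhello.so, copied from the readelf output. */
static const Elf64_Phdr libhello_loads[] = {
    { .p_type = PT_LOAD, .p_flags = PF_R | PF_X,
      .p_vaddr = 0x0, .p_memsz = 0x7717dc },
    { .p_type = PT_LOAD, .p_flags = PF_R | PF_W,
      .p_vaddr = 0x971dd0, .p_memsz = 0xd6738 },
};
```

With the fault address 0x100000d3f030 and base 0x100000800000, the lookup gets offset 0x53f030 and returns index 0, the R E segment without PF_W. By contrast, the first Go relocation offset, 0x2fc300, is the start of that library's RW segment.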

Lastly, I compared it with the golang example, which does work; the equivalent output is:
[/hello.so]:Loading segment with base:0x100000c00000 at:0x100000c00000 of size:1032192 from file at offset:0 which is locked:0
[/hello.so]:Mmapped file at: 0x100000c00000 with permissions - exec:1, write:0, read:4
[/hello.so]:Loading segment with base:0x100000c00000 at:0x100000efc000 of size:675840 from file at offset:1032192 which is locked:0
[/hello.so]:Mmapped file at: 0x100000efc000 with permissions - exec:0, write:2, read:4
[/hello.so]:object::relocate_rela
object::relocate_rela: base:0x100000c00000, sym:0, addr:0x100000efc300, offset:2fc300
object::relocate_rela: base:0x100000c00000, sym:0, addr:0x100000efc308, offset:2fc308
object::relocate_rela: base:0x100000c00000, sym:0, addr:0x100000efc310, offset:2fc310
object::relocate_rela: base:0x100000c00000, sym:0, addr:0x100000f0eeb0, offset:30eeb0
....

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x00000000000fbc58 0x00000000000fbc58  R E    200000
  LOAD           0x00000000000fc300 0x00000000002fc300 0x00000000002fc300
                 0x00000000000a4184 0x00000000000c4ef8  RW     200000
  DYNAMIC        0x000000000018ddb8 0x000000000038ddb8 0x000000000038ddb8
                 0x0000000000000200 0x0000000000000200  RW     8
  NOTE           0x0000000000000238 0x0000000000000238 0x0000000000000238
                 0x0000000000000024 0x0000000000000024  R      4
  NOTE           0x00000000000fbc20 0x00000000000fbc20 0x00000000000fbc20
                 0x0000000000000038 0x0000000000000038  R      20
  TLS            0x00000000000fc300 0x00000000002fc300 0x00000000002fc300
                 0x0000000000000000 0x0000000000000008  R      8
  GNU_EH_FRAME   0x00000000000fb818 0x00000000000fb818 0x00000000000fb818
                 0x00000000000000ac 0x00000000000000ac  R      4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     10
  GNU_RELRO      0x00000000000fc300 0x00000000002fc300 0x00000000002fc300
                 0x0000000000091d00 0x0000000000091d00  R      1


Any ideas as to what is wrong are greatly appreciated.

Regards,
Waldek

Rick Payne

Oct 19, 2018, 4:53:52 PM
to Waldek Kozaczuk, OSv Development

This looks very similar to the issue I had with GNU_RELRO sections. In
my case the getenv symbol was the cause of failure. I tried a few of
Nadav's suggestions but got no closer to solving it - then the problem
went away for me. Sorry I can't be much help.

(See my thread 'Page fault outside of application').

Rick


Waldek Kozaczuk

Oct 20, 2018, 12:42:10 AM
to OSv Development
Thanks, but I think my issue is different: it is very repeatable and does not depend on the GCC version.

After all, based on this test, I think the shared library libhello.so was compiled without -fPIC (even though I pass the -H:+GeneratePIC option to GraalVM native-image):
readelf -a libhello.so | grep -i textrel
 0x0000000000000016 (TEXTREL)            0x0

which is weird because, as I understand it, gcc requires -fPIC when building a 64-bit shared library (as opposed to a PIE executable). So something does not smell right with native-image.

Anyhow, it would be nice if OSv detected TEXTREL and logged that instead of crashing with a page fault.
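Detecting this case is cheap. The "(TEXTREL)" line printed by readelf corresponds to a DT_TEXTREL entry (tag 0x16) in the dynamic section, and the DF_TEXTREL bit in DT_FLAGS carries the same meaning (some linkers emit both). A loader could scan the PT_DYNAMIC entries and log a clear message before relocating. has_textrel below is a hypothetical helper sketched against the glibc <elf.h> types, not existing OSv code.

```c
#include <elf.h>
#include <stdbool.h>

/* Hypothetical helper: scan a DT_NULL-terminated dynamic array for text
 * relocations. DT_TEXTREL (0x16) is what readelf prints as "(TEXTREL)";
 * DF_TEXTREL in DT_FLAGS signals the same thing. */
bool has_textrel(const Elf64_Dyn *dyn)
{
    for (; dyn->d_tag != DT_NULL; dyn++) {
        if (dyn->d_tag == DT_TEXTREL)
            return true;
        if (dyn->d_tag == DT_FLAGS && (dyn->d_un.d_val & DF_TEXTREL))
            return true;
    }
    return false;
}
```

A loader could call this once after locating the dynamic section and either print a warning or switch to a "map writable, relocate, then restore permissions" path.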

Waldek

Waldek Kozaczuk

Oct 20, 2018, 10:30:17 AM
to OSv Development
I have pushed the latest graalvm-example so anyone can reproduce the problem.

So the mystery is why, even though libhello.so was compiled without -fPIC (or maybe it was), it works just fine on Linux but not on OSv. How is that possible?

If it is a valid ELF, then it looks like we could simply make the segment writable when TEXTREL is present and fix the permissions later, after relocation.
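That idea can be sketched as a permission computation: derive the mapping protection from p_flags the way the load_segment() diff above does, but add write access while the object has TEXTREL, and remember the protection the ELF file actually asks for so it can be restored after relocation. segment_prot is a hypothetical helper, and it uses POSIX PROT_* constants as a stand-in for OSv's mmu::perm_* flags.

```c
#include <elf.h>
#include <stdbool.h>
#include <sys/mman.h>

/* Hypothetical helper: map ELF p_flags to POSIX protection bits. When the
 * object has DT_TEXTREL, write access is added temporarily so relocations
 * can patch otherwise read-only segments; the caller restores `final_prot`
 * after relocation. */
int segment_prot(const Elf64_Phdr *phdr, bool has_textrel, int *final_prot)
{
    int prot = PROT_NONE;
    if (phdr->p_flags & PF_X)
        prot |= PROT_EXEC;
    if (phdr->p_flags & PF_W)
        prot |= PROT_WRITE;
    if (phdr->p_flags & PF_R)
        prot |= PROT_READ;
    if (final_prot)
        *final_prot = prot;        /* permissions the ELF file asks for */
    return has_textrel ? (prot | PROT_WRITE) : prot;
}
```

Unlike the blanket workaround in the diff, this only widens permissions for objects that declare text relocations.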

Waldek

Nadav Har'El

Oct 21, 2018, 2:20:36 AM
to Waldek Kozaczuk, Osv Dev
On Sat, Oct 20, 2018 at 7:42 AM Waldek Kozaczuk <jwkoz...@gmail.com> wrote:
> Thanks but I think my issue is different - very repeatable and does not depend on GCC version.
>
> After all I think that the shared library libhello.so was compiled without -fPIC (even though I pass -H:+GeneratePIC option to GraalVM native-image) based on this test:
> readelf -a libhello.so | grep -i textrel
>  0x0000000000000016 (TEXTREL)            0x0
>
> which is weird because as I understand for 64-bit shared library (not pie) gcc requires passing -fPIC I think. So something seems smells not right with native-image.
>
> Anyhow it would be nice OSv detected TEXTREL and logged that instead of crashing with page fault.

It's amazing how every time I think I understand ELF, yet another detail I never noticed comes up.

I'm curious where this TEXTREL entry comes from. The linker has an option, --warn-shared-textrel (and also -z text), which you can use to warn about, or forbid, adding DT_TEXTREL; maybe this way we can understand where it comes from, and why.

Beyond understanding where this TEXTREL comes from: if we conclude that it is necessary to support it, then when the DT_TEXTREL marker exists we could map all the segments with write permissions (as you did), and then, after relocation, in fix_permissions(), mprotect *all* the segments with their desired final permissions, not just the relro ones.
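The sequence Nadav describes (map writable, relocate, then mprotect to the final permissions) can be sketched in user space. This assumes Linux/POSIX mmap() and mprotect() rather than OSv's internal mmu API, and uses an anonymous page in place of a file-backed segment; map_patch_protect is a hypothetical name. In OSv, the final mprotect step would correspond to what fix_permissions() does.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS with strict -std= settings */
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map one page read/write, store `value` at `offset`
 * (standing in for a text relocation), then apply the final protection
 * `final_prot` (for example PROT_READ | PROT_EXEC for a text segment).
 * Returns the mapping, or NULL on failure. */
void *map_patch_protect(size_t offset, uint64_t value, int final_prot)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    if (offset + sizeof value > page)
        return NULL;
    void *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    memcpy((unsigned char *)p + offset, &value, sizeof value); /* relocation write */
    if (mprotect(p, page, final_prot) != 0) {                   /* restore ELF permissions */
        munmap(p, page);
        return NULL;
    }
    return p;
}
```

After the call the patched value is still readable, but any further write to the page would fault, which matches what the ELF flags request once relocation is done.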



Nadav Har'El

Oct 21, 2018, 2:27:15 AM
to Waldek Kozaczuk, Osv Dev
On Sat, Oct 20, 2018 at 5:30 PM Waldek Kozaczuk <jwkoz...@gmail.com> wrote:
> I have pushed the latest graalvm-example example so anyone can reproduce the problem.

For me, I get a different problem:

$ scripts/build image=graalvm-example
$ scripts/run.py
OSv v0.52.0
eth0: 192.168.122.15
bad executable type (only shared-object or PIE supported). Powering off.

Is it possible that you did multiple builds and ended up with some sort of mix between different compilations? Can you try "scripts/build clean" or something more manual to remove old compiled stuff in graalvm-example?
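The "bad executable type" message is consistent with a loader that accepts only ELF objects of type ET_DYN, which covers both shared libraries and PIE executables; a main.so built without -pie (as in the older checked-in Makefile line) is ET_EXEC. A minimal sketch of such a check, assuming the ELF header has already been read from the start of the file; is_loadable_type is hypothetical and not OSv's actual code.

```c
#include <elf.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical check: accept only ELF64 objects of type ET_DYN. Shared
 * libraries and PIE executables are both ET_DYN; a non-PIE executable is
 * ET_EXEC and is rejected. */
bool is_loadable_type(const Elf64_Ehdr *ehdr)
{
    if (memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0)
        return false;                  /* not an ELF file at all */
    if (ehdr->e_ident[EI_CLASS] != ELFCLASS64)
        return false;                  /* only 64-bit objects handled here */
    return ehdr->e_type == ET_DYN;
}
```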

Nadav Har'El

Oct 21, 2018, 3:09:34 AM
to Waldek Kozaczuk, Osv Dev

I opened https://github.com/cloudius-systems/osv/issues/1004 with my understanding of this issue.


Waldek Kozaczuk

Oct 21, 2018, 8:05:21 AM
to Nadav Har'El, Osv Dev
I did not update the osv project to point to the latest osv-apps change, so you have to run git pull separately in the apps subdirectory.
