trying to execute null pointer
[backtrace]
0x000000004039e2de <page_fault+302>
0x000000004039d0a6 <???+1077530790>
0x0000100000dcf492 <???+14480530>
0x0000100000dcf67a <???+14481018>
0x0000100000dcf024 <???+14479396>
0x0000100000dcee8d <???+14478989>
0x0000100000d1a991 <???+13740433>
0x0000100000cf375c <???+13580124>
0x0000100000a0ad8e <???+10530190>
0xffffa000009035df <???+9450975>
0xffff006f732e7468 <???+1932424296>
36 (0xffff8000015e1040) /HelloApp cpu0 status::running sched::thread::switch_to() at arch/x64/arch-switch.hh:108 vruntime 1.4495e-20
37 (0xffff800001c93040) >/HelloApp cpu0 status::waiting do_poll(std::vector<poll_file, std::allocator<poll_file> >&, boost::optional<std::chrono::time_point<osv::clock::uptime, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > >) at core/poll.cc:274 vruntime 1.68081e-21
(gdb) bt
#0 0x00000000403a4522 in processor::cli_hlt () at arch/x64/processor.hh:247
#1 arch::halt_no_interrupts () at arch/x64/arch.hh:48
#2 osv::halt () at arch/x64/power.cc:26
#3 0x00000000402381a4 in abort (fmt=fmt@entry=0x4061c1a8 "trying to execute null pointer") at runtime.cc:132
#4 0x000000004039e2df in page_fault (ef=0xffff8000015e6068) at arch/x64/mmu.cc:30
#5 <signal handler called>
#6 0x0000000000000000 in ?? ()
#7 0x0000100000cf49da in ?? ()
#8 0x0000200000200780 in ?? ()
#9 0x0000000000000000 in ?? ()
(gdb) osv thread 37
(gdb) bt
#0 sched::thread::switch_to (this=0x230, this@entry=0xffff80000005b040) at arch/x64/arch-switch.hh:108
#1 0x00000000403f7184 in sched::cpu::reschedule_from_interrupt (this=0xffff80000001e040, called_from_yield=called_from_yield@entry=false, preempt_after=..., preempt_after@entry=...) at core/sched.cc:339
#2 0x00000000403f767c in sched::cpu::schedule () at include/osv/sched.hh:1310
#3 0x00000000403f7d62 in sched::thread::wait (this=this@entry=0xffff800001c93040) at core/sched.cc:1214
#4 0x0000000040415a08 in sched::thread::do_wait_until<sched::noninterruptible, sched::thread::dummy_lock, do_poll(std::vector<poll_file>&, file::timeout_t)::<lambda()> > (mtx=<synthetic pointer>..., pred=...) at /usr/include/c++/8/bits/atomic_base.h:390
#5 sched::thread::wait_until<do_poll(std::vector<poll_file>&, file::timeout_t)::<lambda()> > (pred=...) at include/osv/sched.hh:1077
#6 do_poll (pfd=std::vector of length 0, capacity 0, _timeout=...) at core/poll.cc:274
#7 0x0000000040415da2 in file::poll_many (_pfd=0x200000300e68, _nfds=1, timeout=...) at /usr/include/c++/8/new:169
#8 0x0000000040416041 in file::poll_sync (timeout=..., pfd=..., this=<optimized out>) at /usr/include/c++/8/new:169
#9 poll_one (timeout=..., pfd=...) at core/poll.cc:334
#10 poll (_pfd=0x200000300e68, _nfds=<optimized out>, _timeout=<optimized out>) at core/poll.cc:351
#11 0x00001000010c970e in StgIO::ReadFromDisk(void*, unsigned int, unsigned int*) ()
#12 0x00001000010c92e8 in StgIO::GetPtrForMem(unsigned int, unsigned int, void*&) ()
#13 0x00001000010c8f64 in StgIO::FreePageMap() ()
#14 0x00001000010d186d in MDInternalRW::FindTypeDef(char const*, char const*, unsigned int, unsigned int*) ()
#15 0x000000004045b7e6 in pthread_private::pthread::<lambda()>::operator() (__closure=0xffff800000021798) at libc/pthread.cc:114
#16 std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/8/bits/std_function.h:297
#17 0x00000000403f8b07 in sched::thread_main_c (t=0xffff800001c93040) at arch/x64/arch-switch.hh:321
#18 0x000000004039e023 in thread_main () at arch/x64/entry.S:113
The second issue was related to some old versioned symbols from the standard C++ library (yes, like Java, .NET Core is implemented in C++). More specifically, OSv would crash because it failed to find a symbol:
/libhostfxr.so: failed looking up symbol _ZSt15system_categoryv (std::system_category())
[backtrace]
0x00000000403562b5 <elf::object::symbol(unsigned int, bool)+1013>
0x000000004035637f <elf::object::resolve_pltgot(unsigned int)+127>
0x0000000040356559 <elf_resolve_pltgot+57>
0x000000004039ce2f <???+1077530159>
0x0000000000000001 <???+1>

The symbol actually exists in OSv, but apparently the shared-library version of libstdc++.so.6 carries many versions of this symbol, and those are not present in the copy that is statically linked into the OSv kernel.
Is it because, during static linking, the linker only uses the latest version of the symbol? In any case, the solution was to hide libstdc++.so.6 from the OSv dynamic linker and add libstdc++.so.6 from the host to the image. But I wonder if it is NOT as simple as that: I may be missing something, and that something may lead to my biggest problem, which I describe at the end.
@@ -1193,7 +1217,7 @@ program::program(void* addr)
           "libpthread.so.0",
           "libdl.so.2",
           "librt.so.1",
-          "libstdc++.so.6",
+          //"libstdc++.so.6",
           "libaio.so.1",
           "libxenstore.so.3.0",
           "libcrypt.so.1",

After I fixed that, I came across another weird problem, most likely caused by the linker, which Linux somehow deals with but OSv does not. In essence, one of the symbols from the .NET Core library libcoreclr.so, gCurrentThreadInfo, is not found by OSv even though it is there. But readelf, for example, complains about it:
readelf -s libcoreclr.so | grep gCurrentThreadInfo
readelf: Warning: local symbol 31 found at index >= .dynsym's sh_info value of 1
    31: 0000000000000000    24 TLS     LOCAL  HIDDEN    19 gCurrentThreadInfo
  9799: 0000000000000000    24 TLS     LOCAL  HIDDEN    19 gCurrentThreadInfo

So the first occurrence is in the '.dynsym' table, which is weird; the second one is in '.symtab'. Somehow I am able to run the same app just fine on Linux.
Here is what I found about it in this example - https://github.com/dynup/kpatch/issues/854#issuecomment-390330525:

"the local symbols after the globals in this section"

versus the ELF spec:

"The global symbols immediately follow the local symbols in the symbol table. The first global symbol is identified by the symbol table sh_info value. Local and global symbols are always kept separate in this manner, and cannot be mixed together."
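To make the quoted ELF rule concrete, here is a minimal sketch of the check readelf appears to be performing when it prints that warning. MiniSym, kBindLocal, binding_of and misplaced_locals are hypothetical names made up for illustration; they are not OSv or binutils code, and only the st_info layout (binding in the high 4 bits, STB_LOCAL equal to 0) is taken from the ELF spec.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for Elf64_Sym; only st_info is needed for this check.
struct MiniSym {
    std::uint8_t st_info;  // binding in the high 4 bits, type in the low 4 bits
};

constexpr std::uint8_t kBindLocal = 0;  // same value as STB_LOCAL in <elf.h>

// Extracts the binding, like the ELF64_ST_BIND macro does.
inline std::uint8_t binding_of(const MiniSym& s) { return s.st_info >> 4; }

// Returns the indices of LOCAL symbols that sit at or after first_global
// (the section's sh_info), i.e. the entries that violate the ordering rule.
std::vector<std::size_t> misplaced_locals(const std::vector<MiniSym>& table,
                                          std::size_t first_global) {
    assert(first_global <= table.size());
    std::vector<std::size_t> bad;
    for (std::size_t i = first_global; i < table.size(); ++i) {
        if (binding_of(table[i]) == kBindLocal) {
            bad.push_back(i);
        }
    }
    return bad;
}
```

Run against the .dynsym of libcoreclr.so with first_global set to 1, a check like this would presumably flag index 31, which matches the readelf warning.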
I have also found this issue in coreclr (one of the .NET Core components) - https://github.com/dotnet/coreclr/issues/23621 - where they report and fix an almost identical 'symbol not found' problem for ARM musl by tweaking their build chain (switching away from the gold linker).
In that example, they deal with it by sorting the table. Not sure I really understand this problem. In either case, I came up with a terrible hack to deal with it myself that maybe also leads to my next and final issue which is the real blocker.
@@ -688,6 +691,10 @@ void object::relocate_rela()
         void *addr = _base + p->r_offset;
         auto addend = p->r_addend;
+        if (sym == 31) {
+            continue;
+        }
+
--
You received this message because you are subscribed to the Google Groups "OSv Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to osv-dev+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/osv-dev/e80111ab-b213-4571-87c0-898ab90636ef%40googlegroups.com.
Before I go into details of what I have found out and what the remedy might be, let me say that I have managed to run a simple 'hello world' C# app on dotnet core on OSv. I have also made good progress on running the httpserver example. You can find some of that here - https://github.com/dotnet/coreclr/issues/27847.

Here is the list of the issues I have identified:

1. The 'failed to find gCurrentThreadInfo' TLS symbol. I still do not understand the issue. It is most likely caused by a bug in the gold linker, which is used with the clang toolchain that coreclr builds with. I am not clear how the OSv dynamic linker should handle it. This musl issue might be related - https://www.openwall.com/lists/musl/2019/05/24/1. Meanwhile this super-hacky change fixes it:
@@ -857,6 +890,10 @@ Elf64_Sym* object::lookup_symbol_gnu(const char* name)
     if (idx == 0) {
         return nullptr;
     }
+    if (_pathname == "/libcoreclr.so" && strcmp(name,"gCurrentThreadInfo") == 0 ) {
+        return &symtab[31]; //31 is the index of gCurrentThreadInfo as reported by readelf -s
+    }
     auto version_symtab = dynamic_exists(DT_VERSYM) ? dynamic_ptr<Elf64_Versym>(DT_VERSYM) : nullptr;
     do {
         if ((chains[idx] & ~1) != (hashval & ~1)) {
2. A problem with libstdc++.a where, as Nadav explains, some old C++ ABI symbols are missing. One way to deal with this might be to add a loader option that hides the libstdc++.so provided by the OSv kernel, so that a specific libstdc++.so added to the image is used instead.
3. Small issue with fcntl that caused the "trying to execute null pointer" crash (took me a while to figure that out). Here is how it can be fixed:
diff --git a/fs/vfs/main.cc b/fs/vfs/main.cc
index 7e028ffb..4e6223d6 100644
--- a/fs/vfs/main.cc
+++ b/fs/vfs/main.cc
@@ -1454,6 +1454,7 @@ int fcntl(int fd, int cmd, int arg)
     // ignored in OSv anyway, as it doesn't support exec().
     switch (cmd) {
     case F_DUPFD:
+    case F_DUPFD_CLOEXEC:
         error = _fdalloc(fp, &ret, arg);
         if (error)
             goto out_errno;
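For context, this is roughly what a caller expects from F_DUPFD_CLOEXEC on Linux: a duplicate descriptor at or above the requested minimum, with the close-on-exec flag set. The sketch below is only an illustration of that contract using the standard fcntl() commands; dup_cloexec and has_cloexec are hypothetical helper names, and I am not claiming this is how dotnet itself issues the call.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Duplicates fd onto the lowest free descriptor >= min_fd and sets FD_CLOEXEC
// on the copy. Returns the new descriptor, or -1 on error (errno is set).
int dup_cloexec(int fd, int min_fd) {
    assert(min_fd >= 0);
    return fcntl(fd, F_DUPFD_CLOEXEC, min_fd);
}

// Returns true if the descriptor carries the close-on-exec flag.
bool has_cloexec(int fd) {
    int flags = fcntl(fd, F_GETFD);
    return flags != -1 && (flags & FD_CLOEXEC) != 0;
}
```

Since OSv does not support exec(), the close-on-exec part has no observable effect there, which is presumably why treating F_DUPFD_CLOEXEC exactly like F_DUPFD is enough.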
4. Dotnet insists that the app PID, as returned by getpid() and procfs, cannot be zero, which is the case with OSv. I am actually in the process of convincing the dotnet developers to lift this restriction (on Windows pid 0 is not valid) and they are leaning in favor of that. But meanwhile, how "married" is OSv really to the PID being 0? Could it be 1? I saw this comment in libc/signal.cc:
    } else if(!is_sig_ign(signal_actions[sig])) {
        if ((pid == 0) || (pid == -1)) {
            // That semantically means signalling everybody (or that, or the
            // user did getpid() and got 0, all the same. So we will signal
            // every thread that is waiting for this.
            //
            // The thread does not expect the signal handler to still be delivered,
            // so if we wake up some folks (usually just the one waiter), we should
            // not continue processing.
            //
            // FIXME: Maybe it could be a good idea for our getpid() to start
            // returning 1 so we can differentiate between those cases?
            if (wake_up_signal_waiters(sig)) {
                return 0;
            }
        }
To make dotnet work on OSv I changed getpid() and procfs to return 1.
5. dlsym() has a bug where it does not properly locate symbols found in the dependencies.
It seems this could be a fix:
diff --git a/core/elf.cc b/core/elf.cc
index 349e3515..d2f58aad 100644
--- a/core/elf.cc
+++ b/core/elf.cc
@@ -889,6 +926,21 @@ Elf64_Sym* object::lookup_symbol(const char* name)
     return sym;
 }

+symbol_module object::lookup_symbol_with_dependencies(const char* name) {
+    auto sym = lookup_symbol(name);
+    if (!sym) {
+        for (auto dep: _needed) {
+            auto sm = dep->lookup_symbol_with_dependencies(name);
+            if (sm.sym) {
+                return sm;
+            }
+        }
+    } else {
+        return { sym, this};
+    }
+    return {nullptr, nullptr};
+}
+
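To show the idea of the patch in isolation, here is a self-contained toy model of the same lookup order: the object itself first, then each dependency depth-first. ToyObject, Found and lookup_with_dependencies are hypothetical names for this sketch only and are not OSv's elf::object API. It also assumes the dependency graph has no cycles, which a real fix would probably need to guard against.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy model of a loaded ELF object; not OSv's elf::object.
struct ToyObject {
    std::string name;
    std::unordered_map<std::string, int> symbols;  // symbol name -> fake address
    std::vector<const ToyObject*> needed;          // direct dependencies
};

// Result of a lookup: where the symbol lives and which object defines it.
struct Found {
    const int* addr;
    const ToyObject* owner;
};

// Depth-first search: the object itself first, then each dependency in order.
Found lookup_with_dependencies(const ToyObject& obj, const std::string& sym) {
    auto it = obj.symbols.find(sym);
    if (it != obj.symbols.end()) {
        return {&it->second, &obj};
    }
    for (const ToyObject* dep : obj.needed) {
        assert(dep != nullptr);
        Found f = lookup_with_dependencies(*dep, sym);
        if (f.owner) {
            return f;
        }
    }
    return {nullptr, nullptr};
}
```

With an app that needs libfoo, which in turn needs libc, a lookup of a libc symbol starting from the app handle succeeds in this model, which is the behaviour dotnet seems to expect from dlsym().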
Finally, dotnet seems to rely on inotify to monitor configuration changes, which OSv does not implement (just stubs). To make the httpserver app work I managed to disable that dotnet feature, but I wonder what it would take to implement inotify. It does not seem a trivial exercise.