Trying to run .NET Core hello world on OSv - help needed!

Waldek Kozaczuk

Nov 12, 2019, 12:10:58 AM
to OSv Development
Hi,

A couple of months ago I made some tiny enhancements - https://github.com/cloudius-systems/osv/commit/d39860d112c33959970405006469604ad543a415 - to make a simple mono example run on OSv. For those unfamiliar, mono is an open-source .NET platform (https://www.mono-project.com/) originally initiated outside of Microsoft. Later I also tried to run a simple HTTP server mono app and even got it to boot and serve an initial number of requests, only to learn in the end that mono heavily uses signal-related functions that OSv simply does not support. So to me, that was the dead end of trying to better support mono on OSv.

Now, not that long ago Microsoft announced .NET Core, which is their new official open-source implementation of .NET that can run on Windows, Linux, and Mac. You can read more about it here - https://docs.microsoft.com/en-us/dotnet/core/about. So, in essence, .NET Core provides an alternative to Java: similarly, it is a managed runtime based on bytecode with both JIT and AOT capabilities. And given that it is newer and fully supported by Microsoft, it might be better to focus on it rather than mono. So here are my findings from trying to run a .NET Core 'hello world' on OSv.

The first issue I encountered was that the .NET Core runtime needs '/proc/self/exe', support for which I have already implemented and sent a patch. That was the easy part.
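
For readers unfamiliar with '/proc/self/exe': a host process typically resolves its own location by reading that symlink. The following is only a minimal sketch of that pattern, assuming a Linux-style procfs; the helper name self_exe_path is made up for illustration and is not taken from the .NET Core sources.

```cpp
#include <limits.h>
#include <string>
#include <unistd.h>

// Hypothetical helper (not from .NET Core): returns the absolute path of the
// running executable, or an empty string if /proc/self/exe cannot be read.
std::string self_exe_path()
{
    char buf[PATH_MAX];
    // readlink() does not NUL-terminate, so build the string from the
    // returned length instead of treating buf as a C string.
    ssize_t len = readlink("/proc/self/exe", buf, sizeof(buf));
    if (len <= 0) {
        return std::string();
    }
    return std::string(buf, static_cast<size_t>(len));
}
```

If the symlink is missing, a caller like this gets an empty path, which is presumably why the runtime cannot start without procfs support.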

The second issue was related to some old version symbols from the standard C++ library (yes like Java, .NET Core is implemented in C++). More specifically OSv would crash due to failing to find a symbol:

/libhostfxr.so: failed looking up symbol _ZSt15system_categoryv (std::system_category())

[backtrace]
0x00000000403562b5 <elf::object::symbol(unsigned int, bool)+1013>
0x000000004035637f <elf::object::resolve_pltgot(unsigned int)+127>
0x0000000040356559 <elf_resolve_pltgot+57>
0x000000004039ce2f <???+1077530159>
0x0000000000000001 <???+1>

The symbol actually exists in OSv, but apparently there are many versions of this symbol in the shared library libstdc++.so.6 which are not present in the version of it statically linked into the OSv kernel. Is it because during static linking the linker only uses the latest version of the symbol? In any case, the solution was to hide libstdc++.so.6 from the OSv dynamic linker and add libstdc++.so.6 from the host to the image. But I wonder if it is really as simple as that, because I may be missing something, and that something may lead to my biggest problem, which I describe at the end.

@@ -1193,7 +1217,7 @@ program::program(void* addr)
           "libpthread.so.0",
           "libdl.so.2",
           "librt.so.1",
-          "libstdc++.so.6",
+          //"libstdc++.so.6",
           "libaio.so.1",
           "libxenstore.so.3.0",
           "libcrypt.so.1",
 
So after I fixed that I came across another weird problem, most likely caused by a linker, which Linux somehow deals with but OSv does not. In essence, one of the symbols from the .NET Core library libcoreclr.so - gCurrentThreadInfo - is not found by OSv even though it is there. But readelf, for example, complains about it:

readelf -s libcoreclr.so | grep gCurrentThreadInfo
readelf: Warning: local symbol 31 found at index >= .dynsym's sh_info value of 1
    31: 0000000000000000    24 TLS     LOCAL  HIDDEN    19 gCurrentThreadInfo
  9799: 0000000000000000    24 TLS     LOCAL  HIDDEN    19 gCurrentThreadInfo

So the first occurrence is in the '.dynsym' table which is weird, the second one is in the .symtab. Somehow I am able to run the same app just fine on Linux. 
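
The readelf warning above boils down to a simple check: every symbol at index >= sh_info is expected to be non-local. A minimal sketch of that check follows, assuming the symbol table has already been read into memory; the function name misplaced_locals is made up for illustration, and this is neither readelf's nor OSv's actual code.

```cpp
#include <elf.h>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the indices of STB_LOCAL symbols that sit at
// or beyond sh_info, i.e. the kind of entry readelf warns about.
std::vector<std::size_t> misplaced_locals(const std::vector<Elf64_Sym>& symtab,
                                          Elf64_Word sh_info)
{
    std::vector<std::size_t> bad;
    for (std::size_t i = sh_info; i < symtab.size(); ++i) {
        // The binding lives in the upper four bits of st_info.
        if (ELF64_ST_BIND(symtab[i].st_info) == STB_LOCAL) {
            bad.push_back(i);
        }
    }
    return bad;
}
```

For the libcoreclr.so output shown above, such a check would flag index 31, since sh_info is 1 and symbol 31 is LOCAL.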

Here is what I found about it in this issue - https://github.com/dynup/kpatch/issues/854#issuecomment-390330525:
"the local symbols after the globals in this section"

versus ELF spec:

"The global symbols immediately follow the local symbols in the symbol table. The first global symbol is identified by the symbol table sh_info value. Local and global symbols are always kept separate in this manner, and cannot be mixed together."

I have also found this issue in coreclr (one of the .NET Core components) - https://github.com/dotnet/coreclr/issues/23621 - where they report and fix an almost identical 'symbol not found' error for ARM musl by tweaking their build chain (a switch from the gold linker).

In that example, they deal with it by sorting the table. Not sure I really understand this problem. In either case, I came up with a terrible hack to deal with it myself that maybe also leads to my next and final issue which is the real blocker.

@@ -688,6 +691,10 @@ void object::relocate_rela()
         void *addr = _base + p->r_offset;
         auto addend = p->r_addend;
 
+        if (sym == 31) {
+            continue;
+        }
+

So with all that the app boots but crashes like so:

trying to execute null pointer
[backtrace]
0x000000004039e2de <page_fault+302>
0x000000004039d0a6 <???+1077530790>
0x0000100000dcf492 <???+14480530>
0x0000100000dcf67a <???+14481018>
0x0000100000dcf024 <???+14479396>
0x0000100000dcee8d <???+14478989>
0x0000100000d1a991 <???+13740433>
0x0000100000cf375c <???+13580124>
0x0000100000a0ad8e <???+10530190>
0xffffa000009035df <???+9450975>
0xffff006f732e7468 <???+1932424296>

When I connect with gdb I get this stack trace (I believe for thread 36), which seems to indicate the stack is corrupt:
36 (0xffff8000015e1040) /HelloApp       cpu0 status::running sched::thread::switch_to() at arch/x64/arch-switch.hh:108 vruntime   1.4495e-20
37 (0xffff800001c93040) >/HelloApp      cpu0 status::waiting do_poll(std::vector<poll_file, std::allocator<poll_file> >&, boost::optional<std::chrono::time_point<osv::clock::uptime, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > >) at core/poll.cc:274 vruntime  1.68081e-21

(gdb) bt
#0  0x00000000403a4522 in processor::cli_hlt () at arch/x64/processor.hh:247
#1  arch::halt_no_interrupts () at arch/x64/arch.hh:48
#2  osv::halt () at arch/x64/power.cc:26
#3  0x00000000402381a4 in abort (fmt=fmt@entry=0x4061c1a8 "trying to execute null pointer") at runtime.cc:132
#4  0x000000004039e2df in page_fault (ef=0xffff8000015e6068) at arch/x64/mmu.cc:30
#5  <signal handler called>
#6  0x0000000000000000 in ?? ()
#7  0x0000100000cf49da in ?? ()
#8  0x0000200000200780 in ?? ()
#9  0x0000000000000000 in ?? ()

When I switch to the other, and only, child thread the stack looks much better:
(gdb) osv thread 37
(gdb) bt
#0  sched::thread::switch_to (this=0x230, this@entry=0xffff80000005b040) at arch/x64/arch-switch.hh:108
#1  0x00000000403f7184 in sched::cpu::reschedule_from_interrupt (this=0xffff80000001e040, called_from_yield=called_from_yield@entry=false, 
    preempt_after=..., preempt_after@entry=...) at core/sched.cc:339
#2  0x00000000403f767c in sched::cpu::schedule () at include/osv/sched.hh:1310
#3  0x00000000403f7d62 in sched::thread::wait (this=this@entry=0xffff800001c93040) at core/sched.cc:1214
#4  0x0000000040415a08 in sched::thread::do_wait_until<sched::noninterruptible, sched::thread::dummy_lock, do_poll(std::vector<poll_file>&, file::timeout_t)::<lambda()> > (mtx=<synthetic pointer>..., pred=...) at /usr/include/c++/8/bits/atomic_base.h:390
#5  sched::thread::wait_until<do_poll(std::vector<poll_file>&, file::timeout_t)::<lambda()> > (pred=...) at include/osv/sched.hh:1077
#6  do_poll (pfd=std::vector of length 0, capacity 0, _timeout=...) at core/poll.cc:274
#7  0x0000000040415da2 in file::poll_many (_pfd=0x200000300e68, _nfds=1, timeout=...) at /usr/include/c++/8/new:169
#8  0x0000000040416041 in file::poll_sync (timeout=..., pfd=..., this=<optimized out>) at /usr/include/c++/8/new:169
#9  poll_one (timeout=..., pfd=...) at core/poll.cc:334
#10 poll (_pfd=0x200000300e68, _nfds=<optimized out>, _timeout=<optimized out>) at core/poll.cc:351
#11 0x00001000010c970e in StgIO::ReadFromDisk(void*, unsigned int, unsigned int*) ()
#12 0x00001000010c92e8 in StgIO::GetPtrForMem(unsigned int, unsigned int, void*&) ()
#13 0x00001000010c8f64 in StgIO::FreePageMap() ()
#14 0x00001000010d186d in MDInternalRW::FindTypeDef(char const*, char const*, unsigned int, unsigned int*) ()
#15 0x000000004045b7e6 in pthread_private::pthread::<lambda()>::operator() (__closure=0xffff800000021798) at libc/pthread.cc:114
#16 std::_Function_handler<void(), pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, const pthread_private::thread_attr*)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/8/bits/std_function.h:297
#17 0x00000000403f8b07 in sched::thread_main_c (t=0xffff800001c93040) at arch/x64/arch-switch.hh:321
#18 0x000000004039e023 in thread_main () at arch/x64/entry.S:113

I have a feeling this somehow has to do with TLS. I think .NET Core uses dynamic TLS, which OSv supports well, minus any bugs we are not aware of.

Any suggestions on how to debug/fix it?

Thanks in advance,
Waldek

Nadav Har'El

Nov 14, 2019, 5:09:34 AM
to Waldek Kozaczuk, OSv Development
On Tue, Nov 12, 2019 at 7:11 AM Waldek Kozaczuk <jwkoz...@gmail.com> wrote:

The second issue was related to some old version symbols from the standard C++ library (yes like Java, .NET Core is implemented in C++). More specifically OSv would crash due to failing to find a symbol:

/libhostfxr.so: failed looking up symbol _ZSt15system_categoryv (std::system_category())

[backtrace]
0x00000000403562b5 <elf::object::symbol(unsigned int, bool)+1013>
0x000000004035637f <elf::object::resolve_pltgot(unsigned int)+127>
0x0000000040356559 <elf_resolve_pltgot+57>
0x000000004039ce2f <???+1077530159>
0x0000000000000001 <???+1>

The symbol actually exists in OSv but apparently there are many versions of this symbol in the shared library version of libstdc++.so.6 which apparently are not present in the statically linked version of it in OSv kernel.
 
On my host I see:

$ nm -CD /usr/lib64/libstdc++.so.6.0.27
00000000000d7660 T std::_V2::system_category()
00000000000a8820 T std::system_category()

$ nm -C /usr/lib/gcc/x86_64-redhat-linux/9/libstdc++.a
0000000000000000 T std::_V2::system_category()

This "_V2" thing is not a symbol version, it's a namespace in the C++ code.
It seems to me like a bug in the static library which misses the one in the std namespace. Maybe it should be reported to gcc or Fedora, or you want to investigate it yourself, but I don't think this is an OSv bug. Apparently other people have noticed this too: https://lists.llvm.org/pipermail/llvm-dev/2018-May/123745.html

My guess is that what is happening is that new code compiles to use this _V2 ABI, so it doesn't have problems. But old code which uses the older ABI (without _V2) doesn't work with the static library.
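
To illustrate the namespace point with a toy example: assuming _V2 is declared as an inline namespace (which is how the libstdc++ headers appear to do it), source code keeps calling the unqualified name while the emitted symbol carries the extra namespace. The names toy, v2 and category below are made up for illustration.

```cpp
// Toy analogue of std::_V2::system_category(); not libstdc++ code.
namespace toy {
inline namespace v2 {
// With the Itanium C++ ABI this function is emitted as
// _ZN3toy2v28categoryEv, not as _ZN3toy8categoryEv.
int category() { return 2; }
} // namespace v2
} // namespace toy

// Callers do not need to spell out the inline namespace.
int call_unqualified() { return toy::category(); }
```

A binary built against headers without the inline namespace would instead reference the shorter symbol, which is the situation libhostfxr.so seems to be in with _ZSt15system_categoryv.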

 
Is it because during static linking the linker only uses the latest version of the symbol? In any case, the solution was to hide libstdc++.so.6 from the OSv dynamic linker and add libstdc++.so.6 from the host to the image. But I wonder if it is really as simple as that, because I may be missing something, and that something may lead to my biggest problem, which I describe at the end.

@@ -1193,7 +1217,7 @@ program::program(void* addr)
           "libpthread.so.0",
           "libdl.so.2",
           "librt.so.1",
-          "libstdc++.so.6",
+          //"libstdc++.so.6",
           "libaio.so.1",
           "libxenstore.so.3.0",
           "libcrypt.so.1",
 
So after I fixed that I came across another weird problem, most likely caused by a linker, which Linux somehow deals with but OSv does not. In essence, one of the symbols from the .NET Core library libcoreclr.so - gCurrentThreadInfo - is not found by OSv even though it is there. But readelf, for example, complains about it:

readelf -s libcoreclr.so | grep gCurrentThreadInfo
readelf: Warning: local symbol 31 found at index >= .dynsym's sh_info value of 1
    31: 0000000000000000    24 TLS     LOCAL  HIDDEN    19 gCurrentThreadInfo
  9799: 0000000000000000    24 TLS     LOCAL  HIDDEN    19 gCurrentThreadInfo

So the first occurrence is in the '.dynsym' table which is weird, the second one is in the .symtab. Somehow I am able to run the same app just fine on Linux. 

Maybe there's a problem that this is a STB_LOCAL and not STB_GLOBAL?

STB_LOCAL symbols should be visible inside the same object, but not outside, maybe we didn't implement this correctly. I guess you'll need to add printouts and see what goes on when the code tries to look for this symbol.
 

Here is what I found about it in this issue - https://github.com/dynup/kpatch/issues/854#issuecomment-390330525:
"the local symbols after the globals in this section"

versus ELF spec:

"The global symbols immediately follow the local symbols in the symbol table. The first global symbol is identified by the symbol table sh_info value. Local and global symbols are always kept separate in this manner, and cannot be mixed together."

I wonder why we even need to care about this, though... Since each symbol is marked global or local, who cares about their order?

I have also found this issue in coreclr (one of the .NET Core components) - https://github.com/dotnet/coreclr/issues/23621 - where they report and fix an almost identical 'symbol not found' error for ARM musl by tweaking their build chain (a switch from the gold linker).

I don't understand the details there, but we already avoid gold linker by using in Makefile "LD=ld.bfd" (https://github.com/cloudius-systems/osv/commit/d21e39fee2fa0b9a90873340509b8f4031e44bf4)
 

In that example, they deal with it by sorting the table. Not sure I really understand this problem. In either case, I came up with a terrible hack to deal with it myself that maybe also leads to my next and final issue which is the real blocker.

@@ -688,6 +691,10 @@ void object::relocate_rela()
         void *addr = _base + p->r_offset;
         auto addend = p->r_addend;
 
+        if (sym == 31) {
+            continue;
+        }
+


But won't this cause this symbol not to work correctly? I assume the code needs it to work correctly? :-)
 

Waldek Kozaczuk

Nov 14, 2019, 7:20:37 AM
to OSv Development
Thanks for your response. 

Meanwhile, I have opened an issue in the coreclr project - https://github.com/dotnet/coreclr/issues/27847. I write there that I also discovered that libcoreclr.so is built with -fstack-protector-strong, which OSv does not support - https://github.com/cloudius-systems/osv/issues/589. I do not believe that has anything to do with "the missing symbol problem" nor with the "trying to execute null pointer" one. I even implemented -fstack-protector-strong support in OSv, which will eventually matter.
That is what I am trying to do. I think we might have a limitation in our dynamic linker: when we temporarily mark all symbols as not visible using the _visble flag, they are not visible to anyone while they are being processed, not even to the library itself. Could that be the issue? I even hardcoded it to force visibility and it still does not help.

Waldek Kozaczuk

Nov 20, 2019, 11:01:00 PM
to OSv Development
Before I go into the details of what I have found out and what the remedy might be, let me say that I have managed to run a simple 'hello world' C# app on dotnet core on OSv. I have also made good progress on running the httpserver example. You can find some of that here - https://github.com/dotnet/coreclr/issues/27847.

Here is the list of the issues I have identified:

1. The 'failed to find gCurrentThreadInfo' TLS symbol. I still do not understand the issue. It is most likely caused by a bug in the gold linker, used with the clang toolchain that coreclr uses. I am not clear on how the OSv dynamic linker should handle it. This musl issue might be related - https://www.openwall.com/lists/musl/2019/05/24/1. Meanwhile this super-hacky change fixes it:

@@ -857,6 +890,10 @@ Elf64_Sym* object::lookup_symbol_gnu(const char* name)
     if (idx == 0) {
         return nullptr;
     }
+    if (_pathname == "/libcoreclr.so" && strcmp(name,"gCurrentThreadInfo") == 0 ) {
+        return &symtab[31]; //31 is the index of gCurrentThreadInfo as reported by readelf -s 
+    }
     auto version_symtab = dynamic_exists(DT_VERSYM) ? dynamic_ptr<Elf64_Versym>(DT_VERSYM) : nullptr;
     do {
         if ((chains[idx] & ~1) != (hashval & ~1)) {

2. Problem with libstdc++.a where, as Nadav explains, some old C++ ABI symbols are missing. It seems a way to deal with this would be to add a loader option to hide the libstdc++.so built into the OSv kernel, so that a specific libstdc++.so can be added to the image instead.

3. Small issue with fcntl that caused the "trying to execute null pointer" crash (took me a while to figure that out). Here is how it can be fixed:
diff --git a/fs/vfs/main.cc b/fs/vfs/main.cc
index 7e028ffb..4e6223d6 100644
--- a/fs/vfs/main.cc
+++ b/fs/vfs/main.cc
@@ -1454,6 +1454,7 @@ int fcntl(int fd, int cmd, int arg)
     // ignored in OSv anyway, as it doesn't support exec().
     switch (cmd) {
     case F_DUPFD:
+    case F_DUPFD_CLOEXEC:
         error = _fdalloc(fp, &ret, arg);
         if (error)
             goto out_errno;

4. Dotnet insists that the app PID, as returned by getpid() and procfs, cannot be zero, which is the case with OSv. I am actually in the process of convincing the dotnet developers to lift this restriction (on Windows pid 0 is not valid) and they are leaning in favor of that. But meanwhile, how "married" is OSv really to the PID being 0? Could it be 1? I saw this comment in libc/signal.cc:
    } else if(!is_sig_ign(signal_actions[sig])) {
        if ((pid == 0) || (pid == -1)) {
            // That semantically means signalling everybody (or that, or the
            // user did getpid() and got 0, all the same. So we will signal
            // every thread that is waiting for this.
            //
            // The thread does not expect the signal handler to still be delivered,
            // so if we wake up some folks (usually just the one waiter), we should
            // not continue processing.
            //
            // FIXME: Maybe it could be a good idea for our getpid() to start
            // returning 1 so we can differentiate between those cases?
            if (wake_up_signal_waiters(sig)) {
                return 0;
            }
        }

To make dotnet work on OSv I changed getpid() and procfs to return 1.

5. The dlsym() has a bug where it does properly locate symbols found in the dependencies.

It seems this could be a fix:
diff --git a/core/elf.cc b/core/elf.cc
index 349e3515..d2f58aad 100644
--- a/core/elf.cc
+++ b/core/elf.cc
@@ -889,6 +926,21 @@ Elf64_Sym* object::lookup_symbol(const char* name)
     return sym;
 }
 
+symbol_module object::lookup_symbol_with_dependencies(const char* name) {
+    auto sym = lookup_symbol(name);
+    if (!sym) {
+        for (auto dep: _needed) {
+            auto sm = dep->lookup_symbol_with_dependencies(name); 
+            if (sm.sym) {
+                return sm;
+            }
+        }
+    } else {
+        return { sym, this};
+    }
+    return {nullptr, nullptr};
+}
+
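
Regarding the fcntl fix in item 3: on Linux, F_DUPFD_CLOEXEC behaves like F_DUPFD but additionally sets the close-on-exec flag on the new descriptor, so a caller that gets an error back where it expected a valid descriptor can easily go wrong later. Below is a minimal sketch of the expected behavior; the helper names dup_cloexec and is_cloexec are made up for illustration and are not from the dotnet or OSv sources.

```cpp
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: duplicates fd onto the lowest free descriptor that is
// >= min_fd, with FD_CLOEXEC set on the copy. Returns -1 on error.
int dup_cloexec(int fd, int min_fd)
{
    return fcntl(fd, F_DUPFD_CLOEXEC, min_fd);
}

// Hypothetical helper: true if the close-on-exec flag is set on fd.
bool is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    return flags != -1 && (flags & FD_CLOEXEC) != 0;
}
```

Since OSv has no exec(), treating the two commands identically, as the one-line patch does, seems sufficient there.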

Finally dotnet seems to be relying on inotify to monitor configuration changes which OSv does not implement (just stubs). To make the httpserver app work I managed to disable that dotnet feature but I wonder what it would take to implement inotify. Seems not a trivial exercise. 

Waldek

Nadav Har'El

Nov 24, 2019, 8:33:49 AM
to Waldek Kozaczuk, OSv Development
On Thu, Nov 21, 2019 at 6:01 AM Waldek Kozaczuk <jwkoz...@gmail.com> wrote:
Before I go into details of what I have found out and what the remedy might be, let me say that I have managed to run simple 'hello world' C# app on dotnet core on OSv. I have also made good progress on running httpserver example as well. You can find some of that here - https://github.com/dotnet/coreclr/issues/27847.

Here is the list of the issues I have identified:

1. The 'failed to find gCurrentThreadInfo' TLS symbol. I still do not understand the issue. It is most likely caused by a gold linker bug used with clang toolchain which coreclr uses. I am not clear how OSv dynamic linker should handle it. This musl issue might be related - https://www.openwall.com/lists/musl/2019/05/24/1. Meanwhile this super-hacky change fixes it:

@@ -857,6 +890,10 @@ Elf64_Sym* object::lookup_symbol_gnu(const char* name)
     if (idx == 0) {
         return nullptr;
     }
+    if (_pathname == "/libcoreclr.so" && strcmp(name,"gCurrentThreadInfo") == 0 ) {
+        return &symtab[31]; //31 is the index of gCurrentThreadInfo as reported by readelf -s 
+    }
     auto version_symtab = dynamic_exists(DT_VERSYM) ? dynamic_ptr<Elf64_Versym>(DT_VERSYM) : nullptr;
     do {
         if ((chains[idx] & ~1) != (hashval & ~1)) {


I also don't understand this problem. If Linux runs this correctly, then we can't consider this a gold linker "bug" but perhaps a peculiarity that we need to handle just like Linux does. I guess this needs more debugging, printouts, etc. to understand exactly why we can't find this symbol: whether we are ignoring the section that contains it, or thinking this symbol is hidden, or whatever.
 
2. Problem with libstdc++.a where, as Nadav explains, some old C++ ABI symbols are missing. It seems a way to deal with this would be to add a loader option to hide the libstdc++.so built into the OSv kernel, so that a specific libstdc++.so can be added to the image instead.

Hiding the libstdc++.so inside OSv is one of our long-term goals (https://github.com/cloudius-systems/osv/issues/821) but it sounds like there must be an easier short-term solution.
Can you take a look at libstdc++'s source code (part of the gcc project...), and try to understand how and why that v1 symbol reaches the shared library but not the static library? Maybe this should be reported to them as a bug?

Finally, I'm curious why that dotnet library *needs* the older version of this symbol. Was it deliberately compiled to use some older C++ ABI? If so, why is that? Can we recompile it without that option?
 

3. Small issue with fcntl that caused the "trying to execute null pointer" crash (took me a while to figure that out). Here is how it can be fixed:
diff --git a/fs/vfs/main.cc b/fs/vfs/main.cc
index 7e028ffb..4e6223d6 100644
--- a/fs/vfs/main.cc
+++ b/fs/vfs/main.cc
@@ -1454,6 +1454,7 @@ int fcntl(int fd, int cmd, int arg)
     // ignored in OSv anyway, as it doesn't support exec().
     switch (cmd) {
     case F_DUPFD:
+    case F_DUPFD_CLOEXEC:
         error = _fdalloc(fp, &ret, arg);
         if (error)
             goto out_errno;

Looks like a correct patch, good in general. Please apply it.
 
4. Dotnet insists that the app PID, as returned by getpid() and procfs, cannot be zero, which is the case with OSv. I am actually in the process of convincing the dotnet developers to lift this restriction (on Windows pid 0 is not valid) and they are leaning in favor of that. But meanwhile, how "married" is OSv really to the PID being 0? Could it be 1? I saw this comment in libc/signal.cc:
    } else if(!is_sig_ign(signal_actions[sig])) {
        if ((pid == 0) || (pid == -1)) {
            // That semantically means signalling everybody (or that, or the
            // user did getpid() and got 0, all the same. So we will signal
            // every thread that is waiting for this.
            //
            // The thread does not expect the signal handler to still be delivered,
            // so if we wake up some folks (usually just the one waiter), we should
            // not continue processing.
            //
            // FIXME: Maybe it could be a good idea for our getpid() to start
            // returning 1 so we can differentiate between those cases?
            if (wake_up_signal_waiters(sig)) {
                return 0;
            }
        }

To make dotnet work on OSv I changed getpid() and procfs to return 1.

Interesting question.

No, I think there is no real reason why our only process should be called "0". It made a lot of sense to use the number 0,
because most functions like kill() already take "0" to mean "all processes", which in a single-process setup is the same
as "this process", so instead of supporting both 0 and 17 (the id we'll choose), we supported just 0 :-)

But I agree, it makes a lot of sense to change it not to be 0. It can be any number - 1 or 17 or anything.
The nicest thing would be to have a constant, and use it everywhere (this constant can default to 1 but also support 0, etc.).


5. The dlsym() has a bug where it does properly locate symbols found in the dependencies.

Does *not*, I guess.
Interesting, I think you're right. Can you please open an issue about this and/or fix it?


It seems this could be a fix:
diff --git a/core/elf.cc b/core/elf.cc
index 349e3515..d2f58aad 100644
--- a/core/elf.cc
+++ b/core/elf.cc
@@ -889,6 +926,21 @@ Elf64_Sym* object::lookup_symbol(const char* name)
     return sym;
 }
 
+symbol_module object::lookup_symbol_with_dependencies(const char* name) {
+    auto sym = lookup_symbol(name);
+    if (!sym) {
+        for (auto dep: _needed) {
+            auto sm = dep->lookup_symbol_with_dependencies(name); 

What you're doing here is depth-first search, while the dlsym(3) manual page specifically calls for
"The search performed by dlsym() is breadth first through the dependency tree of these shared objects."
In other words, you need to first check all the dependencies of this object, and only then recursively
descend into their dependencies.

I wonder if we need to use elf::get_program()->with_modules() to ensure that concurrent dlsym() and dlopen()
calls don't cause a big mess. But I don't remember the details... Maybe just ignore this complication for now. 

+            if (sm.sym) {
+                return sm;
+            }
+        }
+    } else {
+        return { sym, this};
+    }
+    return {nullptr, nullptr};
+}
+
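
The breadth-first order that the dlsym(3) manual page asks for could be sketched roughly as below. This uses a toy stand-in type rather than OSv's real elf::object, and all names in it (toy_object, lookup_breadth_first) are made up for illustration; it also ignores the locking question raised above.

```cpp
#include <deque>
#include <set>
#include <string>
#include <unordered_set>
#include <vector>

// Toy stand-in for an ELF object: its name, the symbols it defines and its
// direct dependencies. This is NOT OSv's elf::object.
struct toy_object {
    std::string name;
    std::set<std::string> symbols;
    std::vector<const toy_object*> needed;
};

// Breadth-first lookup: the object itself, then all of its direct
// dependencies, then their dependencies, and so on. Returns the first
// object defining sym, or nullptr if none does.
const toy_object* lookup_breadth_first(const toy_object& root,
                                       const std::string& sym)
{
    std::deque<const toy_object*> queue{&root};
    std::unordered_set<const toy_object*> seen{&root};
    while (!queue.empty()) {
        const toy_object* obj = queue.front();
        queue.pop_front();
        if (obj->symbols.count(sym)) {
            return obj;
        }
        for (const toy_object* dep : obj->needed) {
            // Skip objects already queued (shared or cyclic dependencies).
            if (seen.insert(dep).second) {
                queue.push_back(dep);
            }
        }
    }
    return nullptr;
}
```

With a root that needs a and b, where a needs c and both b and c define the symbol, this returns b, whereas the recursive depth-first version would return c.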

Finally dotnet seems to be relying on inotify to monitor configuration changes which OSv does not implement (just stubs). To make the httpserver app work I managed to disable that dotnet feature but I wonder what it would take to implement inotify. Seems not a trivial exercise.

Waldek Kozaczuk

Nov 25, 2019, 4:29:55 PM
to OSv Development
I will create issues for some of those to better track them.
And possibly make PID value (0, 1, etc) a boot parameter? 
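
A rough sketch of the "single constant used everywhere" idea discussed above, just to make it concrete. The names OSV_PID, sketch_getpid and targets_this_process are made up for illustration and do not exist in OSv; a boot parameter would replace the compile-time constant with a variable set at startup.

```cpp
#include <sys/types.h>

// Hypothetical constant; the thread suggests defaulting it to 1.
constexpr pid_t OSV_PID = 1;

// Hypothetical stand-in for getpid() in a single-process system.
pid_t sketch_getpid()
{
    return OSV_PID;
}

// True when a kill()-style pid argument addresses the single process:
// 0 and -1 are the "signal everybody" forms, anything else must match
// the constant.
bool targets_this_process(pid_t pid)
{
    return pid == 0 || pid == -1 || pid == OSV_PID;
}
```

With a non-zero constant, the signal code can tell "the user passed 0" apart from "the user passed the result of getpid()", which is what the FIXME in libc/signal.cc asks for.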