Hiding C++ standard library and other non-glibc symbols and its implications for the tests and OSv apps and modules

Waldek Kozaczuk

Nov 3, 2021, 8:47:44 PM
to OSv Development
Hi,

I have been pretty quiet on this list since the P99 conference. Partly because I was taking some time off and partly because I have been working quietly on designing and implementing the changes to the OSv build system to accommodate hiding C++ std and other non-glibc symbols as described in my P99 presentation.

This email will be pretty long so bear with me. My objective here is to describe in more detail what my exact plan is, what I have done so far and what is left. As you will see not everything is as easy as I hoped for. Or it is not easy to decide what the right thing to do is and this is where I am seeking your advice. Obviously, any feedback regarding my plan and what I have done so far is also very welcome.

The goal of this exercise is not only to lower the kernel size (and therefore reduce memory utilization) and to make the kernel more secure by exposing only glibc/musl libc symbols, as described in issue #97 ("Be more selective on symbols exported from the kernel" - https://github.com/cloudius-systems/osv/issues/97). I think hiding the symbols would also address other issues:
  • #821 - "Combining pre-compiled OSv kernel with pre-compiled executable"
  • #1161 - "Support running .NET apps on OSv" (depends on hiding std C++ to allow adding different version of it to the image)
  • #1110 - "Modularization/"Librarization" - create toolchain to optionally build custom kernel tailored to specific hypervisor or app"
  • #1009 - "Make ZFS optional as a shared library"
So doing it right will benefit the users of OSv on many fronts. But I think there is some painful road ahead of us to accomplish that. And maybe we will have to make some compromises.

Roughly my original plan was this:
  1. Link libstdc++.a with '-no-whole-archive' flag.
  2. Expose the glibc symbols and hide everything else using a version script that lists symbols we want to expose.
  3. Enable garbage collection using the compiler flags (-ffunction-sections, -fdata-sections) and a linker flag "--gc-sections"
The new plan is slightly different with an extra initial step that I had thought originally would be too tedious to achieve:
  1. Hide most symbols in the compilation phase using the '-fvisibility=hidden' and '-fvisibility-inlines-hidden' flags for most non-musl and non-'./libc/*' sources, and use newly defined macros to mark specific symbols as public (i.e. default visibility) or as hidden. This actually seems to work pretty well and was not as tedious as I expected.
  2. Link libstdc++.a, libgcc_eh.a and libboost_system.a with '-no-whole-archive' flag and add extra linker flag '--exclude-libs libstdc++.a' to hide most std C++ symbols (to solve the problem of 5K symbols left with original approach).
  3. Expose the glibc symbols and hide everything else using a version script that lists symbols we want to expose - this should be very close to what is left after compiler hiding symbols in step 2 (and is based on my experiments).
  4. Enable garbage collection using the compiler flags (-ffunction-sections, -fdata-sections) and a linker flag "--gc-sections"
Why do I think it is a good idea to compile with '-fvisibility=hidden' which seems to be redundant with the step 3 (use version script)? It is because the compiler may be able to produce better machine code knowing that all or most symbols are hidden. This effect may be even more profound if we decide to enable -flto/lto optimizations later.
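To make the version-script step more concrete, here is a minimal C++ sketch that turns a frozen list of symbol names into the text of a GNU ld version script. The helper name make_version_script and the sample symbols are assumptions made for illustration only; nothing like this exists on the branch, where the list would come from the generated symbol files.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of the OSv tree): build the text of an
// anonymous GNU ld version script that keeps the listed symbols global
// and turns every other symbol into a local (hidden) one.
std::string make_version_script(const std::vector<std::string>& exported)
{
    std::string out = "{\n  global:\n";
    for (const auto& sym : exported) {
        // An empty name would produce an invalid script entry.
        assert(!sym.empty());
        out += "    " + sym + ";\n";
    }
    // The wildcard catches everything that was not listed above.
    out += "  local:\n    *;\n};\n";
    return out;
}
```

The resulting text would be written to a file and handed to the linker with --version-script=<file>.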

So far I have reached a point close to complete on steps 1 and 2 (steps 3 and 4 are fairly easy to do). You can see all my work so far on my branch https://github.com/wkozaczuk/osv/tree/hide_stdcxx_and_most_symbols. But here is a specific list with more details (so you can comment):
  • Create new script extract_symbols.sh to generate lists of symbols to be exported by OSv kernel for each library advertised in core/elf.cc (see https://github.com/wkozaczuk/osv/blob/hide_stdcxx_and_most_symbols/scripts/extract_symbols.sh)
    • the script uses 'nm -C --dynamic --defined-only --extern-only' against given library on the host and then intersects them with symbols exported by loader-stripped.elf; this obviously means the lists might be slightly different on each distribution/version of Linux host as glibc keeps expanding
    • for now the script stores the list files at ./exported_symbols/osv_{lib}.symbols but eventually we would actually "freeze" those as they would become an input to assemble a version script used in the step 3 (see above); we would refresh every so often
  • Update conf/release.mk to add 'conf-hide = -fvisibility=hidden' and 'conf-hide-cxx = -fvisibility-inlines-hidden' to allow turning hiding of symbols on or off.
  • Update main Makefile (https://raw.githubusercontent.com/wkozaczuk/osv/hide_stdcxx_and_most_symbols/Makefile) to hide/expose symbols as needed which includes:
    • Modifying the variables 'kernel-defines' and CXXFLAGS accordingly to add conf-hide and conf-hide-cxx to the compiler rules for all files BUT musl/* and most of the ones under libc/*; the reason we do NOT hide musl sources is because musl has its own mechanism that uses macros like 'hidden' to annotate its internal symbols with __attribute__((__visibility__("hidden"))), therefore we do not need to hide or expose anything extra
    • Adding rules for subset of files under libc/* like pthread.cc to apply conf-hide and conf-hide-cxx to hide most symbols in those files except the ones we want to keep public using new OSV_*_API macros
    • Modifying linker rules for loader.elf and kernel.elf to enable '--no-whole-archive' like so: '--no-whole-archive $(libgcc.a) $(libstdc++.a) $(libgcc_eh.a) $(boost-libs) --exclude-libs libstdc++.a' (the last bit hides most symbols from libstdc++)
  • Add new header include/osv/export.h that defines a number of macros to expose or hide specific symbols:
    • #define OSV_LIBC_API __attribute__((__visibility__("default"))) // There are others for each library like OSV_LIBM_API, OSV_LIBAIO_API, etc which should help documenting symbols we export
    • #define OSV_HIDDEN __attribute__((__visibility__("hidden")))
  • Annotate selected symbols in a number of files where we hide with conf-hide in Makefile with OSV_.*_API macros to expose selected symbols if they are part of libc, libm, etc
    • a good example is ./bsd/sys/kern/uipc_syscalls_wrap.cc where we expose socket API calls like so: 'OSV_LIBC_API int listen(int fd, int backlog)'
    • there are 33 source files with 315 symbols I had to modify and most of those were in libc/*, runtime.cc, ./fs/vfs/main.cc and ./bsd/sys/kern/uipc_syscalls_wrap.cc
    • the files under ./libc in this category would be added to the rule libc_to_hide in the Makefile
  • Same as above, annotate some symbols with OSV_HIDDEN in some files under libc/* to hide them (the files that are not part of the libc_to_hide set)
    • there are only 3 files where we use OSV_HIDDEN
  • Update most assembly files (*.s and *.S) to hide corresponding symbols with '.hidden' like so:
    •  .hidden gdt64_desc
  • Finally comment out the "libstdc++.so.6" in the list of libraries advertised in core/elf.cc and update relevant build scripts to NOT exclude the libstdc++.so.6 from the image 
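As a rough illustration of how the two kinds of macros from the list above are meant to be used, here is a small C++ sketch. The macro definitions are the ones quoted above; the two functions (example_backlog_limit and example_clamp_backlog) are hypothetical names invented for this example and do not exist in the kernel.

```cpp
// Macros as described above for the new export header.
#define OSV_LIBC_API __attribute__((__visibility__("default")))
#define OSV_HIDDEN __attribute__((__visibility__("hidden")))

// Hypothetical internal helper. The explicit OSV_HIDDEN matters in files
// that are NOT compiled with -fvisibility=hidden, where the default would
// otherwise leave the symbol exported.
OSV_HIDDEN int example_backlog_limit()
{
    return 128;
}

// Hypothetical exported entry point. OSV_LIBC_API keeps it visible to
// applications even when the file is compiled with -fvisibility=hidden.
extern "C" OSV_LIBC_API int example_clamp_backlog(int backlog)
{
    int limit = example_backlog_limit();
    return backlog > limit ? limit : backlog;
}
```

After building such a file into a shared object, 'nm -D --defined-only' should list only the exported function among the two.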
I still need to make some tweaks to make sure that we can turn ON or OFF this symbol hiding mechanism depending on what is in ./conf/release.mk. That way I can start sending patches with what I have now as it would still support building kernel with all symbols exposed as it is now which will be needed until we somehow resolve other painful issues I am describing below later.

Obviously I still need to address the step 3 (version script) and 4 (gc) but these are easy ones. 

So far the tests I have conducted seem to indicate my changes have not broken debugging using 'scripts/loader.py' - all the symbols including the hidden ones are still in loader.elf but NOT in loader-stripped.elf.

The consequence of what I have described above and my plan in general is that OSv would ideally ONLY provide strict glibc/musl API to normal Linux apps including the C++ ones. The unit tests and OSv modules and apps written in C++ that currently often use C++ kernel symbols unfortunately might ruin this ideal and force us to make some compromises.

Let me start with the unit tests and how hiding symbols affects them (I will skip the tests/misc-*cc files for now but similar issues exist there). Right off the bat, I was able to successfully execute only 97 out of 138 tests, which left me with 41 broken ones. In most of these 41 cases the tests fail because some symbol they reference is no longer exposed by the kernel. By the way, I could only build a ROFS test image, as building ZFS images is broken due to many symbols being hidden.

Here are the specific 10 tests that were broken and were quite easy to fix by changing them to use standard glibc or C++ std:: symbols (which I think was the right thing to do):
  • tests that use OSv debug() and I had to replace it with printf():
    • tst-except.so
    • tst-readdir-rofs.so
    • tst-sleep.so
  • tests that use OSv debug() and sched::* API and I had to replace with printf() and C++ std::thread and glibc get_nprocs() and sched_getcpu() routines:
    • tst-af-local.so 
    • tst-bsd-tcp1.so 
    • tst-pthread-affinity-inherit.so
    • tst-pthread-affinity.so
    • tst-yield.so
  • tests that use various OSv internal API that I changed to link with corresponding kernel object file:
    • tst-options.so - linked it with core/options.o
    • tst-commands.so - linked with core/commands.o and libc/string/stresep.o
And here are the remaining 31 unit tests which I am not sure how to fix, but I have various ideas:
  • tst-app.so - uses critical OSv internal API: application::*, osv/latch.hh
    • UNLIKELY to be changed to not be OSv-specific
  • tst-async.so - uses internal OSv apis - async, clock, trace and migration-lock
  • tst-bsd-evh.so - uses BSD event handler
  • tst-bsd-kthread.so - uses lots of BSD API - all seems C
  • tst-bsd-taskqueue.so - uses BSD task API - all seems C
  • tst-bsd-tcp1-zrcv.so - just like tst-bsd-tcp1.so PLUS uses OSv zero copy API
    • seems to be updatable NOT to use OSv sched and debug
    • zcopy is a C api so can be exposed
  • tst-bsd-tcp1-zsnd.so - same as tst-bsd-tcp1-zrcv.so
  • tst-bsd-tcp1-zsndrcv.so - same as tst-bsd-tcp1-zrcv.so
  • tst-clock.so - uses OSv clock API
    • can be changed to use standard C++ api or the point was to directly test OSv clock API
  • tst-condvar.so - uses osv/condvar.hh
    • what would we lose by converting to some standard condvar API instead of OSv sched::* API? Is it even possible?
  • tst-dax.cc - uses fs/virtiofs/virtiofs_dax.hh
    • can we link with fs/virtiofs/virtiofs-dax objects?
  • tst-fpu.cc - uses ton of sched::* PLUS debug
    • not sure how easy to not use OSv internal API
  • tst-hub.so - uses TON of OSv internal API
    • uses tracepoint and memory:: API
    • possibly can add to OSv unit test API
  • tst-mmap.so - uses TON of sched:: symbols
  • tst-namespace.so - uses osv::run()
    • expose osv::run as C API?
  • tst-pin.so - uses TON of sched:: API
  • tst-preempt.so - uses a little of sched:: API
  • tst-rcu-hashtable.so - uses sched:: and rcu:: API 
  • tst-rcu-list.so - uses sched:: and rcu:: API 
  • tst-run.so - same as tst-namespace.so PLUS uses debug
  • tst-sampler.so - uses OSv prof API (only 2 symbols)
  • tst-sem-timed-wait.so - uses OSv sched::, clock:: and semaphore::
  • tst-small-malloc.so - uses tracepoint API
  • tst-solaris-taskq.so - uses BSD/solaris API
  • tst-threadcomplete.so - uses TON of sched:: API
  • tst-tls-pie.so - does NOT seem to depend on any OSv symbols but crashes (tst-tls.so works fine)
    • I discovered that a reference to std::cout in the DT_INIT function causes the crash and replacing it with printf fixes it; maybe we have a bug in the dynamic linker?
  • tst-tracepoint.so - depends on tracepoint API and osv::printf or debug
    • at least replace OSv debug() with printf()
  • tst-unordered-ring-mpsc.so - uses a little of sched:: API
  • tst-vfs.so - uses a little of sched:: API and some internal OSv VFS api - namei or drele?
    • replacing sched::thread with std::thread is not enough, and when I tried to link in the object files with namei() and drele(), the test ended up referencing even more internal symbols
  • tst-wait-for.so - uses TON of sched:: API
  • tst-without-namespace.so - uses osv::run()
    • same thing to do as with tst-namespace.so
Here are some thoughts and ideas of how to address the above:
  1. Shall we consider those unit tests or more of integration tests? This will drive how we should address those.
  2. Can we freely replace sched::thread and related with std::thread in all cases? Was sched:: API used out of convenience or was there a specific reason to use OSv api and changing them to use std:: API would invalidate some of it? In general I think that using std:: or glibc API is better as it truly tests how real apps would behave.
  3. Sometimes linking the tests with the specific kernel objects works like with tst-options.cc and tst-commands.cc but it goes only so far. Like with tst-vfs.cc we would need to link quite big chunk of the kernel into a test object. Maybe there is a way to make some of that code in kernel to be more testable?
  4. In some cases we should expose some parts of OSv C++ api like osv::run() as C. We would need to do it as well for some modules as well.
  5. Another way (and possibly the only one) of making some of these tests work is to expose the internal OSv C++ or C symbols just for unit tests reason. More specifically we would define OSV_TEST_API macro that would make certain symbols visible if specific -DUNIT_TESTS option was set when compiling a file. Then we would have special version of the kernel elf file linked from the objects with such exposed symbols needed by unit tests. For example loader.elf could be used for unit tests, and kernel.elf would be a generic version with glibc symbols only exposed intended for all normal non-OSv apps. This might be necessary to make some ZFS apps like tools/mkfs.so, toolz/zfs.so work correctly anyway.
  6. Some tests like tst-run.cc rely on the osv::launch_error exception thrown by kernel and caught by an app. I do not think we have many cases of this and I think we should change how osv::run() is implemented and propagate all error conditions (like missing file) as error code instead of an exception in this case. I think that in general to make hiding non-glibc symbols work correctly we should NOT need/have to support passing exceptions between kernel and app. Obviously throwing and catching exceptions within kernel should continue to work unaffected and same goes for an app and it seems to do so based on tst-exception.cc passing fine.
  7. Finally, not only for unit tests we should expose some non-glibc symbols regardless. For example zero-copy API functions constitute OSv "extra" API that should be made public, no? Also we already have some (one) C wrapper function - osv_get_all_app_threads() defined in core/osv_c_wrappers.c - this would be useful in httpserver module. We will probably need to expose more C++ apis like so.
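The OSV_TEST_API idea from the list above could look roughly like the following C++ sketch. Only the macro name and the -DUNIT_TESTS option come from the proposal; the choice to make the symbol explicitly hidden in the non-test case, and the example::next_thread_id() function, are assumptions for illustration.

```cpp
// Symbols tagged OSV_TEST_API are exported only when the file is compiled
// with -DUNIT_TESTS; in a regular build they are hidden like any other
// internal symbol (assumed behaviour, not taken from the branch).
#ifdef UNIT_TESTS
#define OSV_TEST_API __attribute__((__visibility__("default")))
#else
#define OSV_TEST_API __attribute__((__visibility__("hidden")))
#endif

namespace example {

// Hypothetical internal function that only unit tests would need to call
// from outside the kernel.
OSV_TEST_API int next_thread_id()
{
    static int last_id = 0;
    return ++last_id;
}

}
```

With this shape, the kernel used for unit tests and the generic kernel would be built from the same sources and differ only in the -DUNIT_TESTS define.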
Lastly the problems with unit tests described above illustrate very well similar problems we will face with the modules and other OSv specific apps:
  • libzfs.so - this seems to reference many ZFS related symbols that we could possibly link in
  • zfs.so
  • zpool.so
  • mount/mount-fs.so
  • mount/umount.so
  • mkfs/mkfs.so
  • uush/mkdir.so
  • uush/uush.so
  • uush/ls.so
  • tools/cpiod/cpiod.so
  • java.so - this might be most painful to fix as it depends on many internal OSv symbols
    • BTW regular java (/usr/bin/java not based on our wrapper) works just fine - for example java_no_wrapper unit test works perfectly fine).
  • golang.so
  • httpserver-api***so
  • cloud init
  • many other modules that possibly are obsolete at this point
Many of the modules/apps above use functions from core/options.cc. But I think this can be easily fixed by linking core/options.o with those apps (it is pretty tiny) rather than trying to expose it somehow as C API.

What shall we do in light of these issues with unit tests and OSv modules and apps?

Regards,
Waldek

PS. On a side note hiding most symbols affects the stacktraces:

OSv v0.55.0-15-g362accda
page fault outside application, addr: 0x0000000000000000
[registers]
RIP: 0x00000000403c64de <???+1077699806>
RFL: 0x0000000000010286  CS:  0x0000000000000008  SS:  0x0000000000000010
RAX: 0x0000004000000000  RBX: 0x0000000000020000  RCX: 0x0000000000000026  RDX: 0x0000000000000000
RSI: 0x0000000040687330  RDI: 0xffff800000013040  RBP: 0xffff80000010ffc0  R8:  0xffff800000165f90
R9:  0xffff800000016500  R10: 0x8000000000000000  R11: 0xffff800000084170  R12: 0xffff80000010ff90
Out of memory: could not reclaim any further. Current memory: -88 Kb
[backtrace]
0x00000000403c23c0 <???+1077683136>
0x00000000403c3d4f <???+1077689679>
0x00000000403c3e1f <???+1077689887>
0x00000000403dae6b <???+1077784171>
0x000000004037ea72 <???+1077406322>

vs 

OSv v0.55.0-15-g67fcc08e
page fault outside application, addr: 0x0000000000000000
[registers]
RIP: 0x00000000403f951e <memory::page_pool::l2::refill()+206>
RFL: 0x0000000000010286  CS:  0x0000000000000008  SS:  0x0000000000000010
RAX: 0x0000004000000000  RBX: 0x0000000000020000  RCX: 0x0000000000000026  RDX: 0x0000000000000000
RSI: 0x0000000040912370  RDI: 0xffff800000013040  RBP: 0xffff80000010ffc0  R8:  0xffff80000007cf90
R9:  0xffff800000016500  R10: 0xOut of memory: could not reclaim any further. Current memory: -8 Kb
[backtrace]
0x00000000403f6620 <memory::oom()+32>
0x00000000403f779f <memory::reclaimer::_do_reclaim()+287>
0x00000000403f786f <???+1077901423>
0x000000004040ee3b <thread_main_c+43>
0x00000000403add32 <???+1077599538>

I think we can deal with it with some tooling which would allow users to translate an "unreadable"
stack trace to a readable one based on the information in loader.elf which has all the symbols.
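
The core of such a tool is an address-to-symbol lookup. Below is a minimal C++ sketch of that lookup only, assuming the symbol table (start address and demangled name) has already been extracted from loader.elf, for example with nm. The symbol struct and the symbolize() function are hypothetical names; reading the ELF file itself is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One entry of a symbol table extracted from loader.elf (assumed input).
struct symbol {
    std::uint64_t start;   // address of the first byte of the function
    std::string name;      // demangled name
};

// Translate addr into "name+offset" using a table sorted by start address.
// Returns "???" when addr lies below the first known symbol. Symbol sizes
// are not tracked, so an address past the end of a function is attributed
// to the closest preceding symbol.
std::string symbolize(const std::vector<symbol>& table, std::uint64_t addr)
{
    assert(std::is_sorted(table.begin(), table.end(),
        [](const symbol& a, const symbol& b) { return a.start < b.start; }));
    // Find the first symbol starting after addr; the entry before it is
    // the closest symbol at or below addr.
    auto it = std::upper_bound(table.begin(), table.end(), addr,
        [](std::uint64_t a, const symbol& s) { return a < s.start; });
    if (it == table.begin()) {
        return "???";
    }
    --it;
    return it->name + "+" + std::to_string(addr - it->start);
}
```

For example, with memory::oom() starting at 0x403f6600 (as implied by the second trace above), the address 0x403f6620 resolves to "memory::oom()+32".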

Nadav Har'El

Nov 4, 2021, 3:22:50 AM
to Waldek Kozaczuk, OSv Development
On Thu, Nov 4, 2021 at 2:47 AM Waldek Kozaczuk <jwkoz...@gmail.com> wrote:
Hi,

I have been pretty quiet on this list since the P99 conference. Partly because I was taking some time off and partly because I have been working quietly on designing and implementing the changes to the OSv build system to accommodate hiding C++ std and other non-glibc symbols as described in my P99 presentation.

This email will be pretty long so bear with me. My objective here is to describe in more detail what my exact plan is, what I have done so far and what is left. As you will see not everything is as easy as I hoped for. Or it is not easy to decide what the right thing to do is and this is where I am seeking your advice. Obviously, any feedback regarding my plan and what I have done so far is also very welcome.

The goal of this exercise is not only to lower the kernel size (and therefore reduce memory utilization) and to make the kernel more secure by exposing only glibc/musl libc symbols, as described in issue #97 ("Be more selective on symbols exported from the kernel" - https://github.com/cloudius-systems/osv/issues/97). I think hiding the symbols would also address other issues:
  • #821 - "Combining pre-compiled OSv kernel with pre-compiled executable"
  • #1161 - "Support running .NET apps on OSv" (depends on hiding std C++ to allow adding different version of it to the image)
  • #1110 - "Modularization/"Librarization" - create toolchain to optionally build custom kernel tailored to specific hypervisor or app"
  • #1009 - "Make ZFS optional as a shared library"
So doing it right will benefit the users of OSv on many fronts. But I think there is some painful road ahead of us to accomplish that. And maybe we will have to make some compromises.

Roughly my original plan was this:
  1. Link libstdc++.a with '-no-whole-archive' flag.
  2. Expose the glibc symbols and hide everything else using a version script that lists symbols we want to expose.
  3. Enable garbage collection using the compiler flags (-ffunction-sections, -fdata-sections) and a linker flag "--gc-sections"
The new plan is slightly different with an extra initial step that I had thought originally would be too tedious to achieve:
  1. Hide most symbols in the compilation phase using the '-fvisibility=hidden' and '-fvisibility-inlines-hidden' flags for most non-musl and non-'./libc/*' sources, and use newly defined macros to mark specific symbols as public (i.e. default visibility) or as hidden. This actually seems to work pretty well and was not as tedious as I expected.

This seems a good idea, but if I understand correctly, it means you will need to add some macro in front of each function we want to export which will export it.

I'm not worried that this would be "tedious", but what does worry me is how we can verify that we don't miss a lot of the symbols we were supposed to export.
Over the years, we tested OSv with many different Linux applications, and slowly slowly found more and more symbols we need to define. If we now forget to
export one of those, we may break one of these applications which we made to work on OSv some five years ago, and aren't using today. Our test suite does
not, unfortunately, have a long list of applications that we test to make sure they continue to work.

What I propose you could do - and reading what you wrote below I'm not sure whether you were planning to do this - is to write a script (like you wanted to write anyway)
that gets a list of symbols from some library (e.g., glibc) and then looks in OSv for symbols which 1. exist in OSv, but 2. are *not* exported.
Symbols found thus are a sure sign we forgot to export.
Note that I'm not proposing that we look for symbols which exist in the library but not in OSv - there may be many of those. Rather, I'm proposing we look for library symbols which *are* in OSv, but we forgot to export them.
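
The check proposed here boils down to simple set arithmetic. A minimal C++ sketch, assuming the three symbol lists have already been collected (for example with nm against the host library, loader.elf and loader-stripped.elf); the function name forgotten_exports and its parameters are made up for this illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Return the library symbols that OSv defines but does not export.
// Library symbols that OSv does not implement at all are deliberately
// ignored, as there may be many of those.
std::vector<std::string> forgotten_exports(
    const std::set<std::string>& library_symbols,
    const std::set<std::string>& osv_defined,
    const std::set<std::string>& osv_exported)
{
    // Every exported symbol must also be a defined one.
    assert(std::includes(osv_defined.begin(), osv_defined.end(),
                         osv_exported.begin(), osv_exported.end()));
    std::vector<std::string> missing;
    for (const auto& sym : library_symbols) {
        if (osv_defined.count(sym) && !osv_exported.count(sym)) {
            missing.push_back(sym);
        }
    }
    return missing;
}
```

Any non-empty result would be a warning for a human to review rather than something to fix automatically.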
 
  2. Link libstdc++.a, libgcc_eh.a and libboost_system.a with '-no-whole-archive' flag and add extra linker flag '--exclude-libs libstdc++.a' to hide most std C++ symbols (to solve the problem of 5K symbols left with original approach).
  3. Expose the glibc symbols and hide everything else using a version script that lists symbols we want to expose - this should be very close to what is left after compiler hiding symbols in step 2 (and is based on my experiments).

I thought you meant we'll have some macro on each individual symbol definition  (I see you had "OSV_LIBC_API" macro). Why do we also need a version script? What does it do?
  4. Enable garbage collection using the compiler flags (-ffunction-sections, -fdata-sections) and a linker flag "--gc-sections"
Why do I think it is a good idea to compile with '-fvisibility=hidden' which seems to be redundant with the step 3 (use version script)? It is because the compiler may be able to produce better machine code knowing that all or most symbols are hidden. This effect may be even more profound if we decide to enable -flto/lto optimizations later.

So far I have reached a point close to complete on steps 1 and 2 (steps 3 and 4 are fairly easy to do). You can see all my work so far on my branch https://github.com/wkozaczuk/osv/tree/hide_stdcxx_and_most_symbols. But here is a specific list with more details (so you can comment):
  • Create new script extract_symbols.sh to generate lists of symbols to be exported by OSv kernel for each library advertised in core/elf.cc (see https://github.com/wkozaczuk/osv/blob/hide_stdcxx_and_most_symbols/scripts/extract_symbols.sh)
    • the script uses 'nm -C --dynamic --defined-only --extern-only' against given library on the host and then intersects them with symbols exported by loader-stripped.elf; this obviously means the lists might be slightly different on each distribution/version of Linux host as glibc keeps expanding
    • for now the script stores the list files at ./exported_symbols/osv_{lib}.symbols but eventually we would actually "freeze" those as they would become an input to assemble a version script used in the step 3 (see above); we would refresh every so often

I would propose that what the automatic script should do is to *warn* us about symbols we forgot to export, as well as (separately) symbols we never implemented.
I think that a real human should then mark the symbols exported or not, manually.

It's definitely possible that different versions of glibc export different symbols, and OSv might want to export some sort of "union" of different versions - not necessarily what is available in some specific version.
On the other hand, if glibc is meticulous about backward-compatibility, this shouldn't actually happen. So I don't know.
 
  • Update conf/release.mk to add 'conf-hide = -fvisibility=hidden' and 'conf-hide-cxx = -fvisibility-inlines-hidden' to allow turning hiding of symbols on or off.
  • Update main Makefile (https://raw.githubusercontent.com/wkozaczuk/osv/hide_stdcxx_and_most_symbols/Makefile) to hide/expose symbols as needed which includes:
    • Modifying the variables 'kernel-defines' and CXXFLAGS accordingly to add conf-hide and conf-hide-cxx to the compiler rules for all files BUT musl/* and most of the ones under libc/*; the reason we do NOT hide musl sources is because musl has its own mechanism that uses macros like 'hidden' to annotate its internal symbols with __attribute__((__visibility__("hidden"))), therefore we do not need to hide or expose anything extra
    • Adding rules for subset of files under libc/* like pthread.cc to apply conf-hide and conf-hide-cxx to hide most symbols in those files except the ones we want to keep public using new OSV_*_API macros
    • Modifying linker rules for loader.elf and kernel.elf to enable '--no-whole-archive' like so: '--no-whole-archive $(libgcc.a) $(libstdc++.a) $(libgcc_eh.a) $(boost-libs) --exclude-libs libstdc++.a' (the last bit hides most symbols from libstdc++)
  • Add new header include/osv/export.h that defines a number of macros to expose or hide specific symbols:
    • #define OSV_LIBC_API __attribute__((__visibility__("default"))) // There are others for each library like OSV_LIBM_API, OSV_LIBAIO_API, etc which should help documenting symbols we export
    • #define OSV_HIDDEN __attribute__((__visibility__("hidden")))
I don't think the difference between LIBC and LIBM is well-defined in modern Linux, so I'm not sure it helps to make this separation.
  • Annotate selected symbols in a number of files where we hide with conf-hide in Makefile with OSV_.*_API macros to expose selected symbols if they are part of libc, libm, etc
    • a good example is ./bsd/sys/kern/uipc_syscalls_wrap.cc where we expose socket API calls like so: 'OSV_LIBC_API int listen(int fd, int backlog)'
    • there are 33 source files with 315 symbols I had to modify and most of those were in libc/*, runtime.cc, ./fs/vfs/main.cc and ./bsd/sys/kern/uipc_syscalls_wrap.cc
    • the files under ./libc in this category would be added to the rule libc_to_hide in the Makefile

To repeat my suggestion above - how do you know that these 315 symbols is the full list, maybe there are actually 317 and you missed two?
This is the purpose of the script I proposed above - to help warn you if you missed any.
  • Same as above, annotate some symbols with OSV_HIDDEN in some files under libc/* to hide them (the files that are not part of the libc_to_hide set)
    • there are only 3 files where we use OSV_HIDDEN
I didn't understand why you need OSV_HIDDEN if everything is hidden by default.
  • Update most assembly files (*.s and *.S) to hide corresponding symbols with '.hidden' like so:
    •  .hidden gdt64_desc
  • Finally comment out the "libstdc++.so.6" in the list of libraries advertised in core/elf.cc and update relevant build scripts to NOT exclude the libstdc++.so.6 from the image 

Yes, but as you also suggested below, I think this should be a build-time option. I think that in some cases - like a C++ program compiled on the same host as OSv -
the image may be smaller if it uses the same C++ library as OSv instead of including a second copy. This may even be true for interpreters (like Java) which
are written in C++ - not just for C++ application.
 
I still need to make some tweaks to make sure that we can turn ON or OFF this symbol hiding mechanism depending on what is in ./conf/release.mk. That way I can start sending patches with what I have now as it would still support building kernel with all symbols exposed as it is now which will be needed until we somehow resolve other painful issues I am describing below later.

Obviously I still need to address the step 3 (version script) and 4 (gc) but these are easy ones. 

So far the tests I have conducted seem to indicate my changes have not broken debugging using 'scripts/loader.py' - all the symbols including the hidden ones are still in loader.elf but NOT in loader-stripped.elf.

The consequence of what I have described above and my plan in general is that OSv would ideally ONLY provide strict glibc/musl API to normal Linux apps including the C++ ones. The unit tests and OSv modules and apps written in C++ that currently often use C++ kernel symbols unfortunately might ruin this ideal and force us to make some compromises.

We can split the unit tests to two types:

1. Some (hopefully most - this is confirmed by your findings below) tests check standard Linux APIs and whether they work correctly. Those don't need any hidden symbols.
2. Other tests check non-standard OSv APIs, and will not be built when the build mode is not to export those symbols.

We can modify Jenkins to run tests on both build modes.


Let me start with the unit tests and how hiding symbols affects them (I will skip the tests/misc-*cc files for now but similar issues exist there). Right off the bat, I was able to successfully execute only 97 out of 138 tests, which left me with 41 broken ones. In most of these 41 cases the tests fail because some symbol they reference is no longer exposed by the kernel. By the way, I could only build a ROFS test image, as building ZFS images is broken due to many symbols being hidden.

Here are the specific 10 tests that were broken and were quite easy to fix by changing them to use standard glibc or C++ std:: symbols (which I think was the right thing to do):

Right. Tests shouldn't use non-standard OSv functions when standard ones exist.
This is also important so we can run the same test on Linux (to verify that the test actually checks the correct behavior).
I suggested above maybe we need to split the unit tests to two types. Some check *exported* functions (maybe you'd
call them "integration tests", I don't know, they still check only specific functionality. Maybe functional tests?), while
others check unexported functions and need to either be built with all symbols exported, or perhaps even could be
*statically linked* with OSv during a special linking phase for tests. In other words, I'm suggesting that maybe we
don't need to *compile* anything differently for tests - just to link differently.
 
  2. Can we freely replace sched::thread and related with std::thread in all cases? Was sched:: API used out of convenience or was there a specific reason to use OSv api and changing them to use std:: API would invalidate some of it? In general I think that using std:: or glibc API is better as it truly tests how real apps would behave.
I think in many cases the answer is yes, but we need to check on a case-by-case basis.
As I noted, another benefit of not unnecessarily using OSv internals in tests is that the same test can also be run on Linux.
 
  1. Sometimes linking the tests with the specific kernel objects works like with tst-options.cc and tst-commands.cc but it goes only so far. Like with tst-vfs.cc we would need to link quite big chunk of the kernel into a test object. Maybe there is a way to make some of that code in kernel to be more testable?
  2. In some cases we should expose some parts of the OSv C++ API, like osv::run(), as C. We would need to do the same for some modules as well.
  3. Another way (and possibly the only one) of making some of these tests work is to expose the internal OSv C++ or C symbols just for the sake of unit tests. More specifically, we would define an OSV_TEST_API macro that would make certain symbols visible if a specific -DUNIT_TESTS option was set when compiling a file. Then we would have a special version of the kernel ELF file linked from the objects with the exposed symbols needed by unit tests. For example, loader.elf could be used for unit tests, and kernel.elf would be a generic version exposing glibc symbols only, intended for all normal non-OSv apps. This might be necessary anyway to make some ZFS apps like tools/mkfs.so and tools/zfs.so work correctly.
  4. Some tests like tst-run.cc rely on the osv::launch_error exception being thrown by the kernel and caught by an app. I do not think we have many cases of this, and I think we should change how osv::run() is implemented and propagate all error conditions (like a missing file) as an error code instead of an exception in this case. I think that in general, to make hiding non-glibc symbols work correctly, we should NOT need to support passing exceptions between kernel and app. Obviously throwing and catching exceptions within the kernel should continue to work unaffected, and the same goes for an app; it seems to do so based on tst-exception.cc passing fine.
  5. Finally, not only for unit tests we should expose some non-glibc symbols regardless. For example zero-copy API functions constitute OSv "extra" API that should be made public, no? Also we already have some (one) C wrapper function - osv_get_all_app_threads() defined in core/osv_c_wrappers.c - this would be useful in httpserver module. We will probably need to expose more C++ apis like so.
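To make items 2 and 4 above a bit more concrete, here is a minimal sketch of a C entry point that catches the launch exception inside the kernel and hands an error code to the app. Everything in it is an assumption made for illustration: the demo_kernel namespace is a stand-in (it is not the real osv::run()), demo_run_app is a hypothetical name, and the mapping to ENOENT/EIO is just one possible choice.

```cpp
#include <cerrno>
#include <stdexcept>
#include <string>

// Stand-in for a kernel-internal C++ API. This is NOT the real osv::run();
// it only mimics "throw an exception when the program cannot be launched".
namespace demo_kernel {

struct launch_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Pretend launcher: only "/hello.so" exists in this toy image.
inline int run(const std::string& path) {
    if (path != "/hello.so") {
        throw launch_error("failed to load " + path);
    }
    return 0; // exit status of the launched app
}

} // namespace demo_kernel

// Hypothetical C wrapper. Only this symbol needs default visibility, and the
// exception is caught here, so it never crosses the kernel/app boundary.
extern "C" __attribute__((__visibility__("default")))
int demo_run_app(const char* path, int* exit_status) noexcept {
    if (!path || !exit_status) {
        return EINVAL;
    }
    try {
        *exit_status = demo_kernel::run(path);
        return 0;
    } catch (const demo_kernel::launch_error&) {
        return ENOENT; // launch failure reported as an error code
    } catch (...) {
        return EIO;    // anything else, including allocation failure
    }
}
```

With something along these lines, a test like tst-run.cc could check the returned code instead of catching osv::launch_error, and the app would not need to share any C++ exception types with the kernel.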
Lastly the problems with unit tests described above illustrate very well similar problems we will face with the modules and other OSv specific apps:
  • libzfs.so - this seems to reference many ZFS related symbols that we could possibly link in
I think this is an important observation. While we wanted to *only* export glibc symbols, this is probably not possible. By splitting pieces of OSv shared libraries (like this libzfs),
we're sort of making OSv a "microkernel" which supplies its own non-standard APIs and the separate libraries (like libzfs) use to implement the more traditional APIs.
So we should export these "microkernel" APIs as well. I'm not sure the "risk" of end-users using these symbols, or colliding with them, is very big - although it's definitely
not zero.
This is sad, although perhaps a case of "not a bug, but a feature" - we wanted to make the kernel smaller, and all these
names made it bigger.

Am I correct that attaching with gdb, you do get the full backtrace here?


vs 

OSv v0.55.0-15-g67fcc08e
page fault outside application, addr: 0x0000000000000000
[registers]
RIP: 0x00000000403f951e <memory::page_pool::l2::refill()+206>
RFL: 0x0000000000010286  CS:  0x0000000000000008  SS:  0x0000000000000010
RAX: 0x0000004000000000  RBX: 0x0000000000020000  RCX: 0x0000000000000026  RDX: 0x0000000000000000
RSI: 0x0000000040912370  RDI: 0xffff800000013040  RBP: 0xffff80000010ffc0  R8:  0xffff80000007cf90
R9:  0xffff800000016500  R10: 0xOut of memory: could not reclaim any further. Current memory: -8 Kb
[backtrace]
0x00000000403f6620 <memory::oom()+32>
0x00000000403f779f <memory::reclaimer::_do_reclaim()+287>
0x00000000403f786f <???+1077901423>
0x000000004040ee3b <thread_main_c+43>
0x00000000403add32 <???+1077599538>

I think we can deal with it with some tooling which would allow users to translate an "unreadable"
stack trace to a readable one based on the information in loader.elf which has all the symbols.

Right. addr2line may do the trick.


Waldek Kozaczuk

unread,
Nov 4, 2021, 11:33:43 AM11/4/21
to Nadav Har'El, OSv Development
This is a valid concern and only now I realized that I have not adequately explained the mechanics of what I am doing here. What you are suggesting above is more or less what the new script ./scripts/extract_symbols.sh does or allows one to do (see below for a hyperlink to the copy of it in my branch). In essence, as I was working to come up with the correct changes to the makefile and all affected source files where I had to add the "OSV_LIBC_API" macro (and very occasionally the OSV_HIDDEN one), I would first run extract_symbols.sh against loader-stripped.elf before hiding any of the 17,000 symbols in it (in general, before making any changes) in order to get lists of symbols already exported by the OSv kernel for each Linux library like libc.so.6. The script would actually produce these files:

// The below lists contain symbols exported by each library on Linux host found using 'nm -C --dynamic --defined-only --extern-only ${ELF_PATH}'
_ld-linux-x86-64.so.2_all.symbols
_ld-musl-x86_64.so.1_all.symbols
_libaio.so.1_all.symbols
_libboost_system.so_all.symbols
_libbsd.so.0_all.symbols
_libcrypt.so.1_all.symbols
_libc.so.6_all.symbols
_libdl.so.2_all.symbols
_libm.so.6_all.symbols
_libpthread.so.0_all.symbols
_libresolv.so.2_all.symbols
_librt.so.1_all.symbols
_libutil.so_all.symbols
_libxenstore.so.3.0_all.symbols

// This file would contain all ~17K symbols exported by OSv kernel
loader.symbols

// These files contain lists of symbols that are intersections between loader.symbols and the corresponding *_all.symbols above, found using 'comm -12 loader.symbols _ld-linux-x86-64.so.2_all.symbols > osv_${LIB_NAME}.symbols'
osv_ld-linux-x86-64.so.2.symbols
osv_ld-musl-x86_64.so.1.symbols
osv_libaio.so.1.symbols
osv_libboost_system.so.symbols
osv_libbsd.so.0.symbols
osv_libcrypt.so.1.symbols
osv_libc.so.6.symbols
osv_libdl.so.2.symbols
osv_libm.so.6.symbols
osv_libpthread.so.0.symbols
osv_libresolv.so.2.symbols
osv_librt.so.1.symbols
osv_libutil.so.symbols
osv_libxenstore.so.3.0.symbols

So the osv_lib*.symbols files would give me a starting point where I know all the symbols that should stay exported. From this point on, as I would be iterating on my changes to the Makefile and the relevant source files to hide specific symbols, I would re-run extract_symbols.sh again and again and compare the new osv_lib*.symbols to the original ones. This would give me a quite meticulous method to verify that we are not losing any symbols as we hide them.
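The comparison step itself is just a set difference over two symbol lists. A minimal sketch of that check, assuming one symbol name per line in each file (load_symbols and lost_symbols are hypothetical helper names, not part of extract_symbols.sh):

```cpp
#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Reads one symbol name per line; empty lines are skipped.
// Returns an empty list if the file cannot be opened.
std::vector<std::string> load_symbols(const std::string& path) {
    std::vector<std::string> symbols;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty()) {
            symbols.push_back(line);
        }
    }
    return symbols;
}

// Returns the symbols present in `before` but missing from `after`,
// i.e. the ones that were lost while hiding symbols.
// The copies are sorted first because std::set_difference needs sorted ranges.
std::vector<std::string> lost_symbols(std::vector<std::string> before,
                                      std::vector<std::string> after) {
    std::sort(before.begin(), before.end());
    std::sort(after.begin(), after.end());
    std::vector<std::string> lost;
    std::set_difference(before.begin(), before.end(),
                        after.begin(), after.end(),
                        std::back_inserter(lost));
    return lost;
}
```

On already sorted files the same answer comes out of 'comm -23 original.symbols new.symbols', so in practice a one-liner in the script is probably enough; the sketch only spells out what is being checked. Swapping the two arguments reports symbols that became exported but were not in the original list.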

BTW through this process I found a new library we should be advertising in core/elf.cc- libbsd.so.0 - because we implement these symbols:
arc4random
explicit_bzero
fgetln
fpurge
MD5Final
MD5Init
MD5Update
optreset
reallocarray
strlcat
strlcpy

 
  1. Link libstdc++.a, libgcc_eh.a and libboost_system.a with '-no-whole-archive' flag and add extra linker flag '--exclude-libs libstdc++.a' to hide most std C++ symbols (to solve the problem of 5K symbols left with original approach).
  2. Expose the glibc symbols and hide everything else using a version script that lists symbols we want to expose - this should be very close to what is left after the compiler hiding symbols in step 2 (and is based on my experiments).

I thought you meant we'll have some macro on each individual symbol definition  (I see you had "OSV_LIBC_API" macro). Why do we also need a version script? What does it do?
 
The version script is applied in the linking phase to provide the ultimate list of symbols to export in the elf and is critical to make garbage collection of the machine code to work.

  1. Enable garbage collection using the compiler flags (-ffunction-sections, -fdata-sections) and a linker flag "--gc-sections"
Why do I think it is a good idea to compile with '-fvisibility=hidden' which seems to be redundant with the step 3 (use version script)? It is because the compiler may be able to produce better machine code knowing that all or most symbols are hidden. This effect may be even more profound if we decide to enable -flto/lto optimizations later.

So far I have reached a point close to complete on step 1 and 2 (the step 3 and 4 is fairly easy to do). You can see all my work so far on my branch https://github.com/wkozaczuk/osv/tree/hide_stdcxx_and_most_symbols. But here a specific list with more details (so you can comment):
  • Create new script extract_symbols.sh to generate lists of symbols to be exported by OSv kernel for each library advertised in core/elf.cc (see https://github.com/wkozaczuk/osv/blob/hide_stdcxx_and_most_symbols/scripts/extract_symbols.sh)
    • the script uses 'nm -C --dynamic --defined-only --extern-only' against given library on the host and then intersects them with symbols exported by loader-stripped.elf; this obviously means the lists might be slightly different on each distribution/version of Linux host as glibc keeps expanding
    • for now the script stores the list files at ./exported_symbols/osv_{lib}.symbols but eventually we would actually "freeze" those as they would become an input to assemble a version script used in the step 3 (see above); we would refresh every so often

I would propose that what the automatic script should do is to *warn* us about symbols we forgot to export, as well as (separately) symbols we never implemented.
I think that a real human should then mark the symbols exported or not, manually.
 
Right, extract_symbols.sh or something else called by scripts/build should do that.


It's definitely possible that different versions of glibc export different symbols, and OSv might want to export some sort of "union" of different versions - not necessarily what is available in some specific version.
On the other hand, if glibc is meticulous about backward-compatibility, this shouldn't actually happen. So I don't know.
 
Yep we need some sort of union across many distributions (Ubuntu, Fedora, etc) and possible. And whatever that union is we should store the resulting osv_l*.symbols as part of the repo which we will use to verify we do not lose (add?) any symbols. From time to time we would need to update these lists as for example we add new glibc symbols to OSv.

 
  • Update conf/release.mk to add 'conf-hide = -fvisibility=hidden' and 'conf-hide-cxx = -fvisibility-inlines-hidden' to allow turning hiding of symbols on or off.
  • Update main Makefile (https://raw.githubusercontent.com/wkozaczuk/osv/hide_stdcxx_and_most_symbols/Makefile) to hide/expose symbols as needed which includes:
    • Modifying the variables ' kernel-defines' and CXXFLAGS accordingly to add conf-hide and conf-hide-cxx to the compiler rules for all files BUT musl/* and most of the ones under libc/*; the reason we do NOT hide musl sources is because musl has its own mechanism that uses macros like 'hidden' to annotate its internal symbols with __attribute__((__visibility__("hidden"))), therefore we do not need to hide or expose anything extra
    • Adding rules for subset of files under libc/* like pthread.cc to apply conf-hide and conf-hide-cxx to hide most symbols in those files except the ones we want to keep public using new OSV_*_API macros
    • Modifying linker rules for loader.elf and kernel.elf to enable '--no-whole-archive' like so: '--no-whole-archive $(libgcc.a) $(libstdc++.a) $(libgcc_eh.a) $(boost-libs) --exclude-libs libstdc++.a' (the last bit hides most symbols from libstdc++)
  • Add new header include/osv/export.h that defines a number of macros to expose or hide specific symbols:
    • #define OSV_LIBC_API __attribute__((__visibility__("default"))) // There are others for each library like OSV_LIBM_API, OSV_LIBAIO_API, etc which should help documenting symbols we export
    • #define OSV_HIDDEN __attribute__((__visibility__("hidden")))
I don't think the difference between LIBC and LIBM is well-defined in modern Linux, so I'm not sure it helps to make this separation.
The main benefit is the automatic documentation of which library SO file a given symbol exported by OSv belongs to. On the other hand, the osv_*symbols files would provide it, so maybe we should just have one OSV_LIBC_API macro instead of 10 which really do the same thing.
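For reference, a minimal sketch of how such an export header and its use could look. The two attribute macros are the ones listed above; the OSV_TEST_API part follows the -DUNIT_TESTS idea from the earlier discussion and is hypothetical, and the demo_* functions are made-up names used only for illustration (demo_listen stands in for a real libc function like listen).

```cpp
// Macros as listed above: one to export a symbol, one to hide it.
#define OSV_LIBC_API __attribute__((__visibility__("default")))
#define OSV_HIDDEN   __attribute__((__visibility__("hidden")))

// Hypothetical: visible only in a kernel built with -DUNIT_TESTS.
#ifdef UNIT_TESTS
#define OSV_TEST_API __attribute__((__visibility__("default")))
#else
#define OSV_TEST_API OSV_HIDDEN
#endif

// Internal helper: explicitly hidden, useful in a file that is compiled
// WITHOUT -fvisibility=hidden.
OSV_HIDDEN int demo_clamp_backlog(int backlog) {
    return backlog < 0 ? 0 : backlog;
}

// Internal function that only unit tests would need to reach.
OSV_TEST_API int demo_max_backlog() {
    return 128;
}

// Public API: stays exported even when the file is compiled
// WITH -fvisibility=hidden.
extern "C" OSV_LIBC_API int demo_listen(int fd, int backlog) {
    if (fd < 0) {
        return -1;
    }
    return demo_clamp_backlog(backlog) <= demo_max_backlog() ? 0 : -1;
}
```

Building this into a shared object with -fvisibility=hidden and listing it with 'nm -C --dynamic --defined-only' should show demo_listen only; adding -DUNIT_TESTS should make demo_max_backlog show up as well.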
  • Annotate selected symbols in number of files where we hide with conf-hide in Makefile with OSV_.*_API macros to expose selected symbols if they are part of libc, libm, etc
    • a good example is ./bsd/sys/kern/uipc_syscalls_wrap.cc where we expose socket API calls like so: 'OSV_LIBC_API int listen(int fd, int backlog)'
    • there are 33 source files with 315 symbols I had to modify and most of those were in libc/*, runtime.cc, ./fs/vfs/main.cc and ./bsd/sys/kern/uipc_syscalls_wrap.cc
    • the files under ./libc in this category would be added to the rule libc_to_hide in the Makefile

To repeat my suggestion above - how do you know that these 315 symbols is the full list, maybe there are actually 317 and you missed two?
This is the purpose of the script I proposed above - to help warn you if you missed any.
I hope my explanation about extract_symbols.sh above and how I used it addresses this.
  • Same as above annotate some symbols with OSV_HIDDEN in some files under libc/* to hide them (the files that are not in part of the libc_to_hide set) 
    • there are only 3 files where we use OSV_HIDDEN
I didn't understand why you need OSV_HIDDEN if everything is hidden by default.
It was used very occasionally. Most non-musl source files (which are C++) would have symbols hidden per the makefile, so we would have to use OSV_LIBC_API in them, for example in libc/pthread.cc. But in some cases, in order to minimize the places where OSV_LIBC_API has to be added, I would have a file compiled without -fvisibility=hidden and hide specific symbols in it with OSV_HIDDEN (just like the musl 'hidden' macro does).
  • Update most assembly files (*.s and *.S) to hide corresponding symbols with '.hidden' like so:
    •  .hidden gdt64_desc
  • Finally comment out the "libstdc++.so.6" in the list of libraries advertised in core/elf.cc and update relevant build scripts to NOT exclude the libstdc++.so.6 from the image 

Yes, but as you also suggested below, I think this should be a build-time option. I think that in some cases - like a C++ program compiled on the same host as OSv -
the image may be smaller if it uses the same C++ library as OSv instead of including a second copy. This may even be true for interpreters (like Java) which
are written in C++ - not just for C++ application.
Sure I want hiding symbols to be driven by an option. 
Right this is what I was able to do with tst-options.cc and tst-commands.cc. 
Please note this is an old example I found somewhere in my earlier emails and put it here. 

Am I correct that attaching with gdb, you do get the full backtrace here?
Yes. 