Hi,
I have been pretty quiet on this list since the P99 conference, partly because I was taking some time off and partly because I have been busy designing and implementing the changes to the OSv build system needed to hide the C++ std and other non-glibc symbols, as described in my P99 presentation.
This email will be pretty long, so bear with me. My objective here is to describe in more detail what exactly my plan is, what I have done so far and what is left. As you will see, not everything is as easy as I had hoped, and in some cases it is not easy to decide what the right thing to do is; this is where I am seeking your advice. Obviously, any feedback regarding my plan and what I have done so far is also very welcome.
The goal of this exercise is not only to lower the kernel size (and therefore reduce memory utilization) and to make it more secure by exposing only the glibc/musl libc symbols, as described in issue #97 ("Be more selective on symbols exported from the kernel" - https://github.com/cloudius-systems/osv/issues/97). I think hiding the symbols would address other issues as well:
- #821 - "Combining pre-compiled OSv kernel with pre-compiled executable"
- #1161 - "Support running .NET apps on OSv" (depends on hiding std C++ to allow adding a different version of it to the image)
- #1110 - "Modularization/'Librarization' - create toolchain to optionally build custom kernel tailored to specific hypervisor or app"
- #1009 - "Make ZFS optional as a shared library"
So doing it right will benefit OSv users on many fronts. But I think there is a painful road ahead of us to accomplish that, and maybe we will have to make some compromises.
Roughly, my original plan was this:
1. Link libstdc++.a with the '--no-whole-archive' flag.
2. Expose the glibc symbols and hide everything else using a version script that lists the symbols we want to expose.
3. Enable garbage collection using the compiler flags (-ffunction-sections, -fdata-sections) and the linker flag "--gc-sections".
The new plan is slightly different, with an extra initial step that I originally thought would be too tedious to achieve:
1. Hide most symbols in the compilation phase using the '-fvisibility=hidden' and '-fvisibility-inlines-hidden' flags for most non-musl and non-'./libc/*' sources, and use newly defined macros to explicitly mark specific symbols as public (default visibility) or as hidden. This actually seems to work pretty well and was not as tedious as I expected.
2. Link libstdc++.a, libgcc_eh.a and libboost_system.a with the '--no-whole-archive' flag and add the extra linker flag '--exclude-libs libstdc++.a' to hide most std C++ symbols (to solve the problem of the 5K symbols left with the original approach).
3. Expose the glibc symbols and hide everything else using a version script that lists the symbols we want to expose; based on my experiments, this list should be very close to what is left exposed after steps 1 and 2.
4. Enable garbage collection using the compiler flags (-ffunction-sections, -fdata-sections) and the linker flag "--gc-sections".
Why do I think it is a good idea to compile with '-fvisibility=hidden' when it seems redundant with step 3 (the version script)? Because the compiler may be able to produce better machine code knowing that all or most symbols are hidden. This effect may be even more profound if we decide to enable -flto (link-time) optimizations later.
Here is what I have done so far:
- Create a new script, extract_symbols.sh, to generate the lists of symbols to be exported by the OSv kernel for each library advertised in core/elf.cc (see https://github.com/wkozaczuk/osv/blob/hide_stdcxx_and_most_symbols/scripts/extract_symbols.sh)
  - the script runs 'nm -C --dynamic --defined-only --extern-only' against a given library on the host and then intersects the result with the symbols exported by loader-stripped.elf; this obviously means the lists might be slightly different on each distribution/version of the Linux host, as glibc keeps expanding
  - for now the script stores the list files at ./exported_symbols/osv_{lib}.symbols, but eventually we would "freeze" those, as they would become the input used to assemble the version script in step 3 (see above); we would refresh them every so often
- Update conf/release.mk to add 'conf-hide = -fvisibility=hidden' and 'conf-hide-cxx = -fvisibility-inlines-hidden' to allow turning symbol hiding on or off.
- Update the main Makefile (https://raw.githubusercontent.com/wkozaczuk/osv/hide_stdcxx_and_most_symbols/Makefile) to hide/expose symbols as needed, which includes:
  - modifying the variables 'kernel-defines' and CXXFLAGS to add conf-hide and conf-hide-cxx to the compiler rules for all files EXCEPT musl/* and most of the ones under libc/*; we do NOT hide the musl sources because musl has its own mechanism that uses macros like 'hidden' to annotate its internal symbols with __attribute__((__visibility__("hidden"))), so we do not need to hide or expose anything extra there
  - adding rules for a subset of files under libc/*, like pthread.cc, to apply conf-hide and conf-hide-cxx and hide most symbols in those files except the ones we want to keep public using the new OSV_*_API macros
  - modifying the linker rules for loader.elf and kernel.elf to enable '--no-whole-archive' like so: '--no-whole-archive $(libgcc.a) $(libstdc++.a) $(libgcc_eh.a) $(boost-libs) --exclude-libs libstdc++.a' (the last bit hides most symbols from libstdc++)
- Add a new header, include/osv/export.h, that defines a number of macros to expose or hide specific symbols:
  - #define OSV_LIBC_API __attribute__((__visibility__("default"))) // There are others for each library, like OSV_LIBM_API, OSV_LIBAIO_API, etc., which should help document the symbols we export
  - #define OSV_HIDDEN __attribute__((__visibility__("hidden")))
- Annotate selected symbols with the OSV_.*_API macros in a number of files that we hide with conf-hide in the Makefile, to expose the symbols that are part of libc, libm, etc.
  - a good example is ./bsd/sys/kern/uipc_syscalls_wrap.cc, where we expose the socket API calls like so: 'OSV_LIBC_API int listen(int fd, int backlog)'
  - there are 33 source files with 315 symbols I had to modify, and most of those were in libc/*, runtime.cc, ./fs/vfs/main.cc and ./bsd/sys/kern/uipc_syscalls_wrap.cc
  - the files under ./libc in this category were added to the libc_to_hide rule in the Makefile
- Similarly, annotate some symbols with OSV_HIDDEN in some files under libc/* to hide them (the files that are not part of the libc_to_hide set)
  - there are only 3 files where we use OSV_HIDDEN
- Update most assembly files (*.s and *.S) to hide the corresponding symbols with the '.hidden' directive.
- Finally, comment out "libstdc++.so.6" in the list of libraries advertised in core/elf.cc and update the relevant build scripts to NOT exclude libstdc++.so.6 from the image.
I still need to make some tweaks to make sure that we can turn this symbol hiding mechanism ON or OFF depending on what is in ./conf/release.mk. That way I can start sending patches with what I have now, as the build would still support building the kernel with all symbols exposed, as it does now. This will be needed until we somehow resolve the other painful issues I describe below.
Obviously, I still need to address steps 3 (version script) and 4 (gc), but these are the easy ones.
So far, the tests I have conducted seem to indicate that my changes have not broken debugging with 'scripts/loader.py': all the symbols, including the hidden ones, are still in loader.elf but NOT in loader-stripped.elf.
The consequence of what I have described above, and of my plan in general, is that OSv would ideally ONLY provide the strict glibc/musl API to normal Linux apps, including the C++ ones. Unfortunately, the unit tests and the OSv modules and apps written in C++, which currently often use C++ kernel symbols, might ruin this ideal and force us to make some compromises.
Let me start with the unit tests and how hiding symbols affects them (I will skip the tests/misc-*.cc files for now, but similar issues exist there). Right off the bat, I was able to successfully execute only 97 out of 138 tests, which left 41 broken. In most of these 41 cases, the tests fail because some symbol they reference is no longer exposed by the kernel. By the way, I could only build the ROFS test image, as building ZFS images is broken because many symbols are now hidden.
Here are the 10 specific tests that were broken and were quite easy to fix by changing them to use standard glibc or C++ std:: symbols (which I think was the right thing to do):
- tests that use OSv debug(), which I replaced with printf():
  - tst-except.so
  - tst-readdir-rofs.so
  - tst-sleep.so
- tests that use OSv debug() and the sched::* API, which I replaced with printf(), C++ std::thread and the glibc get_nprocs() and sched_getcpu() routines:
  - tst-af-local.so
  - tst-bsd-tcp1.so
  - tst-pthread-affinity-inherit.so
  - tst-pthread-affinity.so
  - tst-yield.so
- tests that use various OSv internal APIs, which I changed to link with the corresponding kernel object files:
  - tst-options.so - linked with core/options.o
  - tst-commands.so - linked with core/commands.o and libc/string/stresep.o
And here are the remaining 31 unit tests, which I am not sure how to fix, although I have various ideas:
- tst-app.so - uses critical OSv internal APIs: application::*, osv/latch.hh
  - UNLIKELY to be changed to not be OSv-specific
- tst-async.so - uses internal OSv APIs: async, clock, trace and migration-lock
- tst-bsd-evh.so - uses the BSD event handler
- tst-bsd-kthread.so - uses lots of BSD APIs - all seem to be C
- tst-bsd-taskqueue.so - uses the BSD task API - all seems to be C
- tst-bsd-tcp1-zrcv.so - just like tst-bsd-tcp1.so PLUS uses the OSv zero-copy API
  - seems to be updatable to NOT use OSv sched and debug
  - zcopy is a C API, so it can be exposed
- tst-bsd-tcp1-zsnd.so - same as tst-bsd-tcp1-zrcv.so
- tst-bsd-tcp1-zsndrcv.so - same as tst-bsd-tcp1-zrcv.so
- tst-clock.so - uses the OSv clock API
  - could be changed to use the standard C++ API, unless the point was to test the OSv clock API directly
- tst-condvar.so - uses osv/condvar.hh
  - what would we lose by converting it to use some standard condvar API instead of the OSv sched::* API? Is it even possible?
- tst-dax.cc - uses fs/virtiofs/virtiofs_dax.hh
  - can we link it with the fs/virtiofs/virtiofs-dax objects?
- tst-fpu.cc - uses a ton of sched::* PLUS debug
  - not sure how easy it would be to stop using the OSv internal API
- tst-hub.so - uses a TON of OSv internal APIs
  - uses the tracepoint and memory:: APIs
  - possibly can be added to the OSv unit test API
- tst-mmap.so - uses a TON of sched:: symbols
- tst-namespace.so - uses osv::run()
  - expose osv::run() as a C API?
- tst-pin.so - uses a TON of the sched:: API
- tst-preempt.so - uses a little of the sched:: API
- tst-rcu-hashtable.so - uses the sched:: and rcu:: APIs
- tst-rcu-list.so - uses the sched:: and rcu:: APIs
- tst-run.so - same as tst-namespace.so PLUS uses debug
- tst-sampler.so - uses the OSv prof API (only 2 symbols)
- tst-sem-timed-wait.so - uses OSv sched::, clock:: and semaphore::
- tst-small-malloc.so - uses the tracepoint API
- tst-solaris-taskq.so - uses the BSD/Solaris API
- tst-threadcomplete.so - uses a TON of the sched:: API
- tst-tls-pie.so - does NOT seem to depend on any OSv symbols but crashes (tst-tls.so works fine)
  - I discovered that a reference to std::cout in the DT_INIT function causes the crash and replacing it with printf fixes it; maybe we have a bug in the dynamic linker?
- tst-tracepoint.so - depends on the tracepoint API and osv::printf or debug
  - at least replace OSv debug() with printf()
- tst-unordered-ring-mpsc.so - uses a little of the sched:: API
- tst-vfs.so - uses a little of the sched:: API and some internal OSv VFS APIs - namei or drele?
  - replacing sched::thread with std::thread is not enough, and after I tried to link in the object files with namei() and drele(), it referenced even more internal symbols
- tst-wait-for.so - uses a TON of the sched:: API
- tst-without-namespace.so - uses osv::run()
  - same thing to do as with tst-namespace.so
Here are some thoughts and ideas on how to address the above:
- Should we consider these unit tests or rather integration tests? This will drive how we address them.
- Can we freely replace sched::thread and related classes with std::thread in all cases? Was the sched:: API used out of convenience, or was there a specific reason to use the OSv API, such that changing the tests to use the std:: API would invalidate some of them? In general, I think that using the std:: or glibc API is better, as it truly tests how real apps would behave.
- Sometimes linking the tests with specific kernel objects works, as with tst-options.cc and tst-commands.cc, but it only goes so far. With tst-vfs.cc, for example, we would need to link quite a big chunk of the kernel into the test object. Maybe there is a way to make some of that kernel code more testable?
- In some cases we should expose some parts of the OSv C++ API, like osv::run(), as C functions. We would need to do this for some modules as well.
- Another way (and possibly the only one) of making some of these tests work is to expose the internal OSv C++ or C symbols just for the unit tests. More specifically, we would define an OSV_TEST_API macro that would make certain symbols visible if a specific -DUNIT_TESTS option was set when compiling a file. Then we would have a special version of the kernel ELF file, linked from the objects with the symbols needed by the unit tests exposed. For example, loader.elf could be used for unit tests, and kernel.elf would be a generic version, exposing only glibc symbols, intended for all normal non-OSv apps. This might be necessary anyway to make some ZFS apps like tools/mkfs.so and tools/zfs.so work correctly.
- Some tests, like tst-run.cc, rely on the osv::launch_error exception thrown by the kernel and caught by the app. I do not think we have many cases like this, and I think we should change how osv::run() is implemented so that it propagates all error conditions (like a missing file) as error codes instead of exceptions. In general, to make hiding non-glibc symbols work correctly, we should NOT need to support passing exceptions between the kernel and an app. Obviously, throwing and catching exceptions within the kernel should continue to work unaffected, and the same goes for an app; this seems to be the case based on tst-exception.cc passing fine.
- Finally, not only for the unit tests, we should expose some non-glibc symbols regardless. For example, the zero-copy API functions constitute an OSv "extra" API that should be made public, no? Also, we already have one C wrapper function, osv_get_all_app_threads(), defined in core/osv_c_wrappers.c; this would be useful in the httpserver module. We will probably need to expose more C++ APIs through C wrappers like this one.
Lastly, the problems with the unit tests described above illustrate very well the similar problems we will face with the modules and other OSv-specific apps:
- libzfs.so - this seems to reference many ZFS related symbols that we could possibly link in
- zfs.so
- zpool.so
- mount/mount-fs.so
- mount/umount.so
- mkfs/mkfs.so
- uush/mkdir.so
- uush/uush.so
- uush/ls.so
- tools/cpiod/cpiod.so
- java.so - this might be the most painful to fix, as it depends on many internal OSv symbols
  - BTW, regular java (/usr/bin/java, not based on our wrapper) works just fine; for example, the java_no_wrapper unit test works perfectly fine.
- golang.so
- httpserver-api .so files
- cloud-init
- many other modules that possibly are obsolete at this point
Many of the modules/apps above use functions from core/options.cc. But I think this can be easily fixed by linking core/options.o with those apps (it is pretty tiny) rather than trying to expose it somehow as a C API.
What shall we do in light of these issues with unit tests and OSv modules and apps?
Regards,
Waldek
PS. On a side note, hiding most symbols affects the stack traces:
OSv v0.55.0-15-g362accda
page fault outside application, addr: 0x0000000000000000
[registers]
RIP: 0x00000000403c64de <???+1077699806>
RFL: 0x0000000000010286 CS: 0x0000000000000008 SS: 0x0000000000000010
RAX: 0x0000004000000000 RBX: 0x0000000000020000 RCX: 0x0000000000000026 RDX: 0x0000000000000000
RSI: 0x0000000040687330 RDI: 0xffff800000013040 RBP: 0xffff80000010ffc0 R8: 0xffff800000165f90
R9: 0xffff800000016500 R10: 0x8000000000000000 R11: 0xffff800000084170 R12: 0xffff80000010ff90
Out of memory: could not reclaim any further. Current memory: -88 Kb
[backtrace]
0x00000000403c23c0 <???+1077683136>
0x00000000403c3d4f <???+1077689679>
0x00000000403c3e1f <???+1077689887>
0x00000000403dae6b <???+1077784171>
0x000000004037ea72 <???+1077406322>
vs
OSv v0.55.0-15-g67fcc08e
page fault outside application, addr: 0x0000000000000000
[registers]
RIP: 0x00000000403f951e <memory::page_pool::l2::refill()+206>
RFL: 0x0000000000010286 CS: 0x0000000000000008 SS: 0x0000000000000010
RAX: 0x0000004000000000 RBX: 0x0000000000020000 RCX: 0x0000000000000026 RDX: 0x0000000000000000
RSI: 0x0000000040912370 RDI: 0xffff800000013040 RBP: 0xffff80000010ffc0 R8: 0xffff80000007cf90
R9: 0xffff800000016500 R10: 0xOut of memory: could not reclaim any further. Current memory: -8 Kb
[backtrace]
0x00000000403f6620 <memory::oom()+32>
0x00000000403f779f <memory::reclaimer::_do_reclaim()+287>
0x00000000403f786f <???+1077901423>
0x000000004040ee3b <thread_main_c+43>
0x00000000403add32 <???+1077599538>
I think we can deal with this using some tooling that would allow users to translate an "unreadable" stack trace into a readable one, based on the information in loader.elf, which has all the symbols.