Shrinking kernel


Waldek Kozaczuk

May 17, 2020, 12:59:35 AM
to OSv Development
One of the things I would like to tackle for the next release is making loader.elf smaller. The primary motivation is to lower memory utilization. At the same time, I would also like to keep the kernel (the release version) as debuggable as it is now (or close to it).

There are already 2 issues that could help us with that:
https://github.com/cloudius-systems/osv/issues/97 - Be more selective on symbols exported from the kernel 
https://github.com/cloudius-systems/osv/issues/106 - Consider single instantiation for some templates (this may apply to more templates than just debug()).

Besides that, there are other things we can try:
1. Use the compiler flags -ffunction-sections and -fdata-sections together with the linker flag --gc-sections.
2. Remove RTTI using -fno-rtti (I think we have over 1000 typeinfo entries in the symbol tables). I do not think we have many dynamic_cast uses, and hopefully these can be eliminated.
3. Controversial: eliminate exception usage in the kernel (how critical is this?) and then use -fno-exceptions.
4. Use LTO. Given that we have Travis enabled, there could be an option passed to the makefile/build that lets one build the kernel with LTO if we deem it a dangerous/experimental feature.
5. Do not link the C++ standard library with --whole-archive, and effectively hide it. How would we then support internal apps like cpiod, httpserver, cloud init, etc.? Create a C API for any symbols that are C++ right now and link those C++ apps statically against libstdc++?
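
As a rough way to check the typeinfo estimate in item 2, the symbol table can be scanned for mangled RTTI names. The sketch below is only an illustration: it assumes the text comes from plain `nm loader.elf` (mangled names, no -C), relies on the Itanium C++ ABI prefixes `_ZTI` (typeinfo) and `_ZTS` (typeinfo name), and uses a made-up three-line sample in place of real output.

```python
def count_rtti_symbols(nm_output):
    """Count typeinfo (_ZTI) and typeinfo-name (_ZTS) symbols in `nm` text."""
    counts = {"typeinfo": 0, "typeinfo_name": 0}
    for line in nm_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Mangled names contain no spaces, so the name is the last field.
        name = fields[-1]
        if name.startswith("_ZTI"):
            counts["typeinfo"] += 1
        elif name.startswith("_ZTS"):
            counts["typeinfo_name"] += 1
    return counts


# Hypothetical sample, not taken from a real loader.elf.
SAMPLE_NM = """\
0000000000601000 V _ZTI3Foo
0000000000601010 V _ZTS3Foo
0000000000401000 T _ZN3Foo3barEv
"""

print(count_rtti_symbols(SAMPLE_NM))
```

Running the same function on the real `nm` output before and after a change would show whether the number of RTTI entries actually moves.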

Are there other things we should try that do not sacrifice any functionality?

As far as option 1 goes, I have already played with it a bit and applied this patch:

diff --git a/Makefile b/Makefile
index db3c68cf..1b121fd8 100644
--- a/Makefile
+++ b/Makefile
@@ -279,7 +279,7 @@ gcc-sysroot = $(if $(CROSS_PREFIX), --sysroot $(aarch64_gccbase)) \
 #
 #   mydir/*.o EXTRA_FLAGS = <MY_STUFF>
 EXTRA_FLAGS = -D__OSV_CORE__ -DOSV_KERNEL_BASE=$(kernel_base) -DOSV_KERNEL_VM_BASE=$(kernel_vm_base) \
-       -DOSV_KERNEL_VM_SHIFT=$(kernel_vm_shift) -DOSV_LZKERNEL_BASE=$(lzkernel_base)
+       -DOSV_KERNEL_VM_SHIFT=$(kernel_vm_shift) -DOSV_LZKERNEL_BASE=$(lzkernel_base) -ffunction-sections
 EXTRA_LIBS =
 COMMON = $(autodepend) -g -Wall -Wno-pointer-arith $(CFLAGS_WERROR) -Wformat=0 -Wno-format-security \
        -D __BSD_VISIBLE=1 -U _FORTIFY_SOURCE -fno-stack-protector $(INCLUDES) \
@@ -1859,7 +1859,7 @@ $(out)/loader.elf: $(stage1_targets) arch/$(arch)/loader.ld $(out)/bootfs.o $(lo
            $(^:%.ld=-T %.ld) \
            --whole-archive \
              $(libstdc++.a) $(libgcc_eh.a) \
-             $(boost-libs) \
+             $(boost-libs) --gc-sections \
            --no-whole-archive $(libgcc.a), \
                LINK loader.elf)
        @# Build libosv.so matching this loader.elf. This is not a separate
@@ -1875,7 +1875,7 @@ $(out)/kernel.elf: $(stage1_targets) arch/$(arch)/loader.ld $(out)/empty_bootfs.
            $(^:%.ld=-T %.ld) \
            --whole-archive \
              $(libstdc++.a) $(libgcc_eh.a) \
-             $(boost-libs) \
+             $(boost-libs) --gc-sections \
            --no-whole-archive $(libgcc.a), \
                LINK kernel.elf)
        $(call quiet, $(STRIP) $(out)/kernel.elf -o $(out)/kernel-stripped.elf, STRIP kernel.elf -> kernel-stripped.elf )
diff --git a/arch/x64/loader.ld b/arch/x64/loader.ld
index f981859d..ab5cf75b 100644
--- a/arch/x64/loader.ld
+++ b/arch/x64/loader.ld
@@ -56,10 +56,10 @@ SECTIONS
         memcpy_decode_end = .;
     } :text
 
-    .eh_frame : AT(ADDR(.eh_frame) - OSV_KERNEL_VM_SHIFT) { *(.eh_frame) } : text
+    .eh_frame : AT(ADDR(.eh_frame) - OSV_KERNEL_VM_SHIFT) { *(.eh_frame) KEEP(*(.eh_frame)); } : text
     .rodata : AT(ADDR(.rodata) - OSV_KERNEL_VM_SHIFT) { *(.rodata*) } :text
-    .eh_frame : AT(ADDR(.eh_frame) - OSV_KERNEL_VM_SHIFT) { *(.eh_frame) } :text
-    .eh_frame_hdr : AT(ADDR(.eh_frame_hdr) - OSV_KERNEL_VM_SHIFT) { *(.eh_frame_hdr) } :text :eh_frame
+    .eh_frame : AT(ADDR(.eh_frame) - OSV_KERNEL_VM_SHIFT) { *(.eh_frame) KEEP(*(.eh_frame)); } :text
+    .eh_frame_hdr : AT(ADDR(.eh_frame_hdr) - OSV_KERNEL_VM_SHIFT) { *(.eh_frame_hdr) KEEP(*(.eh_frame_hdr)); } :text :eh_frame
     .note : AT(ADDR(.note) - OSV_KERNEL_VM_SHIFT) { *(.note*) } :text :note
     .gcc_except_table : AT(ADDR(.gcc_except_table) - OSV_KERNEL_VM_SHIFT) { *(.gcc_except_table) *(.gcc_except_table.*) } : text
     .tracepoint_patch_sites ALIGN(8) : AT(ADDR(.tracepoint_patch_sites) - OSV_KERNEL_VM_SHIFT) {

Unfortunately, OSv aborts like so:

OSv v0.55.0-5-g13d2b5fc
Aborted
Halting.

Any ideas what could have happened? I am guessing the linker removed too much code :-) but which code?

I have also experimented with LTO on one of the modules, httpserver-monitoring-api, and saw quite a significant reduction in the size of the app's .so file: from 828K to 600K. But the linking phase took over 30 seconds (up from a couple of seconds).

The app seemed to work (at least it started). But at some point, in an earlier experiment, I saw some issues with exception handling, which I am not able to recreate any more, possibly because of the slightly newer gcc on Ubuntu 20.04.

But some promising results.

Waldek

Nadav Har'El

May 26, 2020, 4:26:09 AM
to Waldek Kozaczuk, OSv Development
On Sun, May 17, 2020 at 7:59 AM Waldek Kozaczuk <jwkoz...@gmail.com> wrote:
One of the things I would like to tackle for the next release is making loader.elf smaller. The primary motivation is to lower memory utilization. At the same time, I would also like to keep the kernel (the release version) as debuggable as it is now (or close to it).

There are already 2 issues that could help us with that:
https://github.com/cloudius-systems/osv/issues/97 - Be more selective on symbols exported from the kernel 
https://github.com/cloudius-systems/osv/issues/106 - Consider single instantiation for some templates (this may apply to more templates than just debug()).

It is unclear to me why fixing issue 106 would make the kernel smaller. It shouldn't... Please make sure you look at the size of loader-stripped.elf, not loader.elf.

If you really want to work on the size of loader-stripped.elf, you should probably use objdump/readelf/nm to figure out which parts of this object are the biggest. Do we have some big functions we need to fix? Do we have big static arrays (BSS) that we could allocate at runtime instead?
Here are some example commits I made in the past that shrank the kernel using these ideas:

45f93e16f4727d506135101b51b9f2ea98e3a651 - zfs: smaller kernel by dropping utf8 normalization support
8693761b737ad74e3d116f923bf0b4323d9df8b4 - build: don't put unnecessary libraries in every image
c50a090f086968fbfc24ff0a1f085ebd570aa77d - trace: only allocate trace_log when tracepoints are enabled
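
The nm-based survey suggested above could be sketched roughly as follows. This is only an illustration under a few assumptions: symbol sizes are read from the unstripped loader.elf (a stripped file normally has no regular symbol table left for nm to read), the `nm -S -C` output has the usual "address size type name" layout, and the helper names and the sample text are made up for the example.

```python
import re
import subprocess

# Matches `nm -S` lines that carry a size: address, size, type letter, name.
_SIZED = re.compile(r"^([0-9a-fA-F]{8,}) ([0-9a-fA-F]{8,}) (\S) (.+)$")


def parse_nm_sizes(nm_output):
    """Return (size, type, name) tuples for every symbol that has a size."""
    symbols = []
    for line in nm_output.splitlines():
        match = _SIZED.match(line.strip())
        if match:
            _addr, size, sym_type, name = match.groups()
            symbols.append((int(size, 16), sym_type, name))
    return symbols


def biggest(symbols, count=10):
    """Return the `count` largest symbols, largest first."""
    return sorted(symbols, key=lambda s: s[0], reverse=True)[:count]


def nm_sizes(elf_path):
    """Run `nm -S -C` on an ELF file (hypothetical helper, not called here)."""
    result = subprocess.run(["nm", "-S", "-C", elf_path],
                            check=True, capture_output=True, text=True)
    return parse_nm_sizes(result.stdout)


# Hypothetical sample, not taken from a real loader.elf.
SAMPLE = """\
0000000000401000 0000000000000040 T small_function
0000000000600000 0000000000100000 B big_static_array
0000000000402000 T symbol_without_size
                 U undefined_symbol
"""

for size, sym_type, name in biggest(parse_nm_sizes(SAMPLE)):
    print(f"{size:>10} {sym_type} {name}")
```

Symbols of type B or b at the top of such a list would be the static arrays that might be allocated at runtime instead; large T or t entries would point at big functions.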


Besides that, there are other things we can try:
1. Use the compiler flags -ffunction-sections and -fdata-sections together with the linker flag --gc-sections.
2. Remove RTTI using -fno-rtti (I think we have over 1000 typeinfo entries in the symbol tables). I do not think we have many dynamic_cast uses, and hopefully these can be eliminated.
3. Controversial: eliminate exception usage in the kernel (how critical is this?) and then use -fno-exceptions.

Yes, this would be controversial... I believe we do have some written-for-C++ code in the kernel which does use exceptions, but I never tried to estimate how much, or how difficult it would be to get rid of it.
 
4. Use LTO. Given that we have Travis enabled, there could be an option passed to the makefile/build that lets one build the kernel with LTO if we deem it a dangerous/experimental feature.
5. Do not link the C++ standard library with --whole-archive, and effectively hide it. How would we then support internal apps like cpiod, httpserver, cloud init, etc.? Create a C API for any symbols that are C++ right now and link those C++ apps statically against libstdc++?

Other things we should try without sacrificing any functionality?

As far as option 1 goes, I have already played with it a bit and applied this patch:

diff --git a/Makefile b/Makefile
index db3c68cf..1b121fd8 100644
--- a/Makefile
+++ b/Makefile
@@ -279,7 +279,7 @@ gcc-sysroot = $(if $(CROSS_PREFIX), --sysroot $(aarch64_gccbase)) \
 #
 #   mydir/*.o EXTRA_FLAGS = <MY_STUFF>
 EXTRA_FLAGS = -D__OSV_CORE__ -DOSV_KERNEL_BASE=$(kernel_base) -DOSV_KERNEL_VM_BASE=$(kernel_vm_base) \
-       -DOSV_KERNEL_VM_SHIFT=$(kernel_vm_shift) -DOSV_LZKERNEL_BASE=$(lzkernel_base)
+       -DOSV_KERNEL_VM_SHIFT=$(kernel_vm_shift) -DOSV_LZKERNEL_BASE=$(lzkernel_base) -ffunction-sections
 EXTRA_LIBS =
 COMMON = $(autodepend) -g -Wall -Wno-pointer-arith $(CFLAGS_WERROR) -Wformat=0 -Wno-format-security \
        -D __BSD_VISIBLE=1 -U _FORTIFY_SOURCE -fno-stack-protector $(INCLUDES) \
@@ -1859,7 +1859,7 @@ $(out)/loader.elf: $(stage1_targets) arch/$(arch)/loader.ld $(out)/bootfs.o $(lo
            $(^:%.ld=-T %.ld) \
            --whole-archive \
              $(libstdc++.a) $(libgcc_eh.a) \
-             $(boost-libs) \
+             $(boost-libs) --gc-sections \

I wonder what kind of size saving this might bring. Why should it bring any saving at all? Just by using shorter (?) relative jumps in code?

As for the abort, it is hard to say without gdb... The text "Aborted" suggests abort() was deliberately called by something. gdb may tell you what.




Waldek Kozaczuk

May 26, 2020, 11:28:45 AM
to OSv Development
Right now, not much, as everything gets exported/included from all objects that are supplied to the linker. But once we start using a version script to export only what we really want (as per #97), the compiler flags "-ffunction-sections -fdata-sections" and the linker flag "--gc-sections" should automatically make unneeded code simply "fall off" (be garbage collected), leaving only what is needed. No? Am I wrong about how this works?
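
To make the reasoning above concrete, here is a toy model of section garbage collection as a mark-and-sweep over a reference graph. It is a sketch of the idea, not of what the linker really does internally: it assumes that exported symbols, like the entry point, act as roots, and the section names and references are invented for the example.

```python
from collections import deque


def gc_sections(references, roots):
    """Return the set of sections reachable from `roots`.

    `references` maps each section name to the sections it refers to.
    """
    kept = set()
    queue = deque(roots)
    while queue:
        section = queue.popleft()
        if section in kept:
            continue
        kept.add(section)
        queue.extend(references.get(section, []))
    return kept


# Hypothetical layout: one section per function/object, as produced by
# -ffunction-sections and -fdata-sections.
REFERENCES = {
    ".text.start": [".text.used"],
    ".text.used": [".data.table"],
    ".data.table": [],
    ".text.unused": [".data.unused"],
    ".data.unused": [],
}

# Everything exported: every section is a root, so nothing is dropped.
all_exported = gc_sections(REFERENCES, REFERENCES.keys())

# Narrow export list: only the entry point is a root.
narrow_export = gc_sections(REFERENCES, [".text.start"])

print(sorted(set(REFERENCES) - all_exported))
print(sorted(set(REFERENCES) - narrow_export))
```

In this model nothing is removed while every symbol is exported, and the two unused sections are dropped once only the entry point remains a root, which matches the expectation that the flags only pay off together with a restricted export list.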