Library unikernel or modularization of OSv kernel


Waldek Kozaczuk

May 5, 2018, 12:48:16 PM
to OSv Development
What if there were a way, early in the boot process, to load specific features of OSv that are currently linked into the kernel as libraries instead? For example, if we mounted ROFS as early as possible, we could load other elements of the logic (configuration, ZFS, the Boost program options library, DHCP, etc.) on demand and only as needed. In the same way, even some drivers could be loaded on demand later.

Why would it be beneficial?
  • OSv kernel would be smaller and load even faster
  • OSv kernel can be modularized and therefore better tailored for specific hypervisor or application needs  
What do you think?

Waldek
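
To make the proposal concrete, here is a minimal sketch of on-demand feature loading. It is a toy model, not OSv code: the `ModuleRegistry` class and the loader callables are hypothetical, and in a real kernel each loader would map a shared object from the early-mounted ROFS rather than return a string.

```python
from typing import Callable, Dict, List


class ModuleRegistry:
    """Toy model of on-demand feature loading (all names are hypothetical)."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], object]] = {}
        self._loaded: Dict[str, object] = {}

    def register(self, name: str, loader: Callable[[], object]) -> None:
        # Record how to load a feature without loading it yet.
        self._loaders[name] = loader

    def get(self, name: str) -> object:
        # Load the feature on first use, then reuse the cached instance.
        if name not in self._loaded:
            self._loaded[name] = self._loaders[name]()
        return self._loaded[name]

    def loaded(self) -> List[str]:
        # Names of the features that have actually been loaded so far.
        return sorted(self._loaded)


registry = ModuleRegistry()
registry.register("zfs", lambda: "zfs driver")
registry.register("dhcp", lambda: "dhcp client")

# Nothing is loaded at "boot"; only dhcp is pulled in when first requested.
registry.get("dhcp")
print(registry.loaded())
```

The point of the sketch is only that registration is cheap and the loading cost is paid by whoever first needs the feature.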

Geraldo Netto

May 5, 2018, 11:02:30 PM
to Waldek Kozaczuk, OSv Development
Hello Waldek/Friends!

That would be great Waldek!

Maybe some of those references might sound interesting/inspiring :)

You're probably already aware of them, but there are some really interesting papers from TU Dresden/KIT/DATA61 CSIRO.
They are mainly focused on the L4 family, though:
http://os.itec.kit.edu (if I'm not mistaken, our friend "problame" from issue #952 studies here :P)



Kind Regards,

Geraldo Netto
Sapere Aude => Non dvcor, dvco
http://exdev.sf.net/

--
You received this message because you are subscribed to the Google Groups "OSv Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to osv-dev+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dor Laor

May 6, 2018, 12:53:23 AM
to Waldek Kozaczuk, OSv Development
On Sat, May 5, 2018 at 9:48 AM, Waldek Kozaczuk <jwkoz...@gmail.com> wrote:
What if there were a way, early in the boot process, to load specific features of OSv that are currently linked into the kernel as libraries instead? For example, if we mounted ROFS as early as possible, we could load other elements of the logic (configuration, ZFS, the Boost program options library, DHCP, etc.) on demand and only as needed. In the same way, even some drivers could be loaded on demand later.

Why would it be beneficial?

You can gain it all by preparing the right image ahead of time, so you'll only have the features/setup you wish before boot.

What's good about your approach is that you can boot a generic image ahead of time and customize it only once
you actually need to use it (when the first packet arrives), and thus save boot time.
 
  • OSv kernel would be smaller and load even faster
  • OSv kernel can be modularized and therefore better tailored for specific hypervisor or application needs  
What do you think?

Waldek


Nadav Har'El

May 6, 2018, 4:20:16 AM
to Waldek Kozaczuk, OSv Development
I think the benefits of loadable kernel modules are obvious, and I still remember how nice they were when they were introduced to Linux, circa 1995 :-)

However, these benefits can be offset by other issues such as performance (dynamically loaded code may be slightly slower because of issues like TLS and other stuff, though I bet the difference will be very small) and complexity (of the build system, of the code, etc.).

There's also the question of what to split into modules: If we split off something small, the benefit of taking it out of the kernel will be small. If we split off something which is commonly used, loading it dynamically may be even slower than loading it as part of the kernel (though this will need to be measured).

I think before starting such an effort, we should have some use case in mind where (1) we know that we don't need certain parts of the kernel, and (2) we really, really care about the kernel's load time or size (and the already small size and time we have aren't small enough).

Nadav.
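
The trade-off described above can be put into a small worked example. This is only a sketch under made-up assumptions: the function names and all the millisecond figures are hypothetical and have not been measured on OSv. It assumes that a dynamically loaded module costs a bit more to load than the same code linked into the kernel.

```python
# Hypothetical per-feature load costs in milliseconds (not measured on OSv).
MODULES = {
    "zfs": {"static_ms": 40, "dynamic_ms": 50},
    "dhcp": {"static_ms": 2, "dynamic_ms": 5},
}
BASE_MS = 100  # hypothetical cost of the part of the kernel that always loads


def monolithic_ms(base_ms, modules):
    # Every feature is linked in, so its load cost is always paid.
    return base_ms + sum(m["static_ms"] for m in modules.values())


def modular_ms(base_ms, modules, used):
    # Only used features are loaded, but each pays the dynamic-loading cost.
    return base_ms + sum(modules[name]["dynamic_ms"] for name in used)


print(monolithic_ms(BASE_MS, MODULES))               # everything linked in
print(modular_ms(BASE_MS, MODULES, {"dhcp"}))        # large module not needed
print(modular_ms(BASE_MS, MODULES, {"zfs", "dhcp"})) # everything needed anyway
```

With these numbers, the modular kernel wins clearly only when the large module is not needed; when every module ends up being used, it is slower than the monolithic one, which is why real measurements would have to come first.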

Nadav Har'El

May 6, 2018, 4:38:58 AM
to Dor Laor, Waldek Kozaczuk, OSv Development
On Sun, May 6, 2018 at 7:52 AM, Dor Laor <d...@scylladb.com> wrote:
On Sat, May 5, 2018 at 9:48 AM, Waldek Kozaczuk <jwkoz...@gmail.com> wrote:
What if there were a way, early in the boot process, to load specific features of OSv that are currently linked into the kernel as libraries instead? For example, if we mounted ROFS as early as possible, we could load other elements of the logic (configuration, ZFS, the Boost program options library, DHCP, etc.) on demand and only as needed. In the same way, even some drivers could be loaded on demand later.

Why would it be beneficial?

You can gain it all by preparing the right image ahead of time, so you'll only have the features/setup you wish before boot.

Right. I remember that Linux started this way (you chose the parts of the kernel you wanted at compile time), and then around 1995, they added loadable kernel modules.

The best feature of kernel modules was that they allowed you not to compile your own kernel, and instead take a large collection of pre-compiled modules from some distribution and then use only the ones you need.
This may be useful for projects like Capstan where you want to use pre-compiled code, but may still want a smaller kernel.

The biggest question I have is what sort of applications will truly see a big gain from the smaller kernel. After all, OSv is relatively small, and doesn't have thousands of rarely used drivers and features like Linux has, so there's not *too* much to shave off this way. There is some, but it's not as dramatic as it is in Linux.

Geraldo Netto

May 6, 2018, 2:07:57 PM
to Nadav Har'El, Dor Laor, Waldek Kozaczuk, OSv Development
Hey Guys,

Considering Dor's and Nadav's input, maybe we could approach it differently.
What if we did a static analysis of the application code to determine which syscalls/POSIX APIs the application uses,
and then added only the syscalls/POSIX interfaces that are really required?
IMHO, Musl is flexible enough for this.
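
A rough sketch of what such an analysis could look like, assuming the application's imports are available as the output of `nm -D --undefined-only` on its binary. The component-to-symbol table and the sample output below are made up for illustration and do not reflect how OSv is actually organized.

```python
# Sample text in the format printed by `nm -D --undefined-only <binary>`.
SAMPLE_NM_OUTPUT = """\
                 U open@GLIBC_2.2.5
                 U read@GLIBC_2.2.5
                 U socket@GLIBC_2.2.5
                 U printf@GLIBC_2.2.5
"""

# Hypothetical mapping from kernel component to the symbols it provides.
COMPONENT_SYMBOLS = {
    "vfs": {"open", "read", "write", "close"},
    "net": {"socket", "bind", "connect"},
    "zfs": {"zfs_ioctl"},
}


def undefined_symbols(nm_output):
    """Collect the names of undefined ('U') symbols from nm output."""
    symbols = set()
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "U":
            # Strip a symbol version suffix such as "@GLIBC_2.2.5".
            symbols.add(parts[1].split("@")[0])
    return symbols


def required_components(symbols, component_symbols):
    """Return the components that provide at least one needed symbol."""
    return sorted(
        name
        for name, provided in component_symbols.items()
        if provided & symbols
    )


needed = required_components(undefined_symbols(SAMPLE_NM_OUTPUT), COMPONENT_SYMBOLS)
print(needed)
```

Note that looking only at imported symbols would miss anything resolved at run time (for example via `dlsym`) or invoked through a raw syscall instruction, so a real tool would need a fallback for those cases.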


Kind Regards,

Geraldo Netto
Sapere Aude => Non dvcor, dvco
http://exdev.sf.net/

Joe Duarte

May 7, 2018, 3:40:17 AM
to OSv Development
Hi all – I'd call Geraldo's idea exogenous tree-shaking, where the code to shake out would not be application code, but external libraries, kernel modules, APIs, syscalls, etc. This sounds good if OSv supports it – if it's modular enough along the right boundaries for such tree-shaking.

Nadav asked: "The biggest question I have is what sort of applications will truly see a big gain from the smaller kernel."

Good question. Lots of possible answers. Using conventional approaches, I predict you'll see the biggest gains if you use link-time optimization (-flto) and profile guided optimization on the remaining, smaller kernel. I searched the enormous OSv root makefile, and lto is never used. Some of the biggest wins in real world applications are not from -O2 or -O3, but from -flto and pgo.

But there's an interesting approach that would bring OSv and the application together in a novel way: the ALLVM project:

An ‘ALLVM system’ is one in which all software components — except a small set needed for bootstrapping — are represented in a virtual instruction set instead of native machine code. The goal of the approach is to enable sophisticated compiler analyses and transformations to be applied across arbitrary software boundaries — not just caller-callee boundaries analyzed using traditional interprocedural techniques, but also several others: between applications and third-party libraries; applications and the underlying operating system; and between communicating processes in a distributed system.
Many software components already ship as virtual instruction sets (loosely defined as “not a native hardware instruction set”), including software in managed languages like Java, C# and Scala; scripting languages like Python and Javascript; and GPGPU code in languages like CUDA and OpenCL. The major change ALLVM enables is for statically compiled languages like C, C++, Fortran, OCaml, Swift, etc. For software written in these languages, we represent and ship code using the LLVM Virtual Instruction Set (see http://llvm.org), previously developed in our research group and now widely used in production systems, including MacOS, iOS, and FreeBSD. LLVM already provides some of the capabilities required for an ALLVM system, including the ability to ship software in LLVM bitcode form and the ability to perform install-time and just-in-time compilation.
 
The key difference between LLVM and ALLVM is that LLVM enables individual software components to be analyzed and optimized throughout their lifetime (“lifelong compilation”) whereas ALLVM enables all the software on a system to be analyzed and optimized together, throughout the lifetime of the software (“system-wide, lifelong compilation”). Several research projects within the ALLVM umbrella are exploring the performance, reliability, security and software engineering benefits of the ALLVM approach.

This sounds cool as hell. If it works – I haven't had time to explore how experimental or usable it is right now, but I feel like it's inevitable. It makes too much sense to fail.

Another observation. When OSv is built/installed, you take the standard *nix approach of installing countless app and library binaries that were compiled by random strangers using extremely weak compiler optimization settings, as in this line:

apt-get install build-essential libboost-all-dev genromfs autoconf libtool openjdk-7-jdk ant qemu-utils maven libmaven-shade-plugin-java python-dpkt tcpdump gdb qemu-system-x86 gawk gnutls-bin openssl python-requests lib32stdc++-4.9-dev p11-kit


My big complaint against the Unix/Linux culture is that people don't pay nearly enough attention to the compiler, and to all the things modern compilers can do for them. Behind every one of the packages above will be a vanilla makefile that does not use lto or other major optimizations, fast math, and which treats a Pentium 4 or something as the default baseline hardware and instruction set (because they don't even stipulate a -march in many cases). Beyond that, there are other compilers in the world – you don't have to use gcc. For example, it's likely (from scattered evidence) that the Intel compiler (Parallel Studio) is better than gcc 7.x. clang might be better now too, though the gcc vs. clang benchmarks of the past were somewhat murky. And MS Visual Studio 2017 might build better code than gcc for both Windows Servers and Linux. (It might still lean on clang for Linux, or the Intel compiler – not sure).

For both optimizations and modularity, you might want to take a look at Intel's Clear Linux. Note also that compiling the OS oneself, or at least the kernel, is somewhat more common in the FreeBSD community than in Linux land – there might be something useful in what they're doing other than pruning unneeded drivers, for example, so it might be worth a look.

I'm pretty sure we're giving away more than 10 percent performance with the status quo approach of vanilla makefiles and unused compiler optimization flags. At this point, people shouldn't even be talking about gcc 4.x or even 5.x. We need to move forward, and I think there's a lot of performance headroom left for OSv right now, with gcc 7.3, lto, pgo, etc. Ideally a unikernel should support exogenous tree-shaking of the sort Geraldo proposed, where the app only gets the libraries and calls it needs. (I think the Microsoft unikernels or library OSes were like this.) Then we should be able to optimize the hell out of such app/kernel hybrids with modern compilers, Haswell-and-up stipulations on -march, and boundary-busting bytecode like the ALLVM project.

Speaking of static analysis, why did OSv stop using the free Coverity Scan service?


It would be handy to be able to break out different profiles and subsets of OSv and submit them to Coverity separately. In principle, the application/kernel hybrids that y'all are talking about could be forked into different projects and also submitted to Coverity Scan.

Cheers,

Joe Duarte, PhD
Phoenix, AZ

