Benchmarking OSv

twee...@comcast.net

Feb 24, 2020, 9:32:25 PM
to OSv Development
I am running some benchmarks on OSv and getting results opposite to what I expected based on what I have learned about unikernels. When I run my benchmark on the host with 32 cores, it completes in about 2s. However, when I run the OSv build on KVM (run.py) and pass in 32 cores, it takes 9s to complete. My assumption was that it would be the same speed, possibly faster. Any ideas as to why it is slower?

Dor Laor

Feb 24, 2020, 9:45:06 PM
to twee...@comcast.net, OSv Development
It depends on many factors. You should check what the bottleneck is first.
I'd start with a single core, then 2 cores, and grow it exponentially.
It may be either the I/O overhead (most likely the network virtualization)
or the CPU overhead with regard to locking. OSv has a different filesystem too.
Let's see what you have to share first.
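
A minimal sketch of the core-count sweep suggested above, written as a small Python helper that only builds the run.py command lines (it does not launch anything). The flags `-c`, `--nics 0` and `-e` are the ones used later in this thread; the helper name `build_sweep` and the `/radix -p N` guest command are assumptions for illustration, and the commands are meant to be run from the OSv source tree.

```python
import shlex

def build_sweep(max_cores, app="/radix"):
    """Return one run.py command line per core count: 1, 2, 4, ... up to max_cores."""
    commands = []
    cores = 1
    while cores <= max_cores:
        # The guest command gets the same processor count as the VM has vCPUs.
        guest_cmd = f"{app} -p {cores}"
        argv = ["./scripts/run.py", "--nics", "0", "-c", str(cores), "-e", guest_cmd]
        commands.append(shlex.join(argv))
        cores *= 2
    return commands

for line in build_sweep(32):
    print(line)
```

Running the printed commands one by one and noting the time for each core count gives the data needed to see where scaling stops.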


twee...@comcast.net

Feb 24, 2020, 10:30:17 PM
to OSv Development
As follows:
OSv booted on KVM:
1 core: 21.18s
2 cores: 12.25s
4 cores: 7.08s
8 cores: 4.95s
16 cores: 6.59s
32 cores: 9.28s

Now on regular Ubuntu:
1 core: 15.22s
2 cores: 9.64s
4 cores: 6.4s
8 cores: 3.6s
16 cores: 2.93s
32 cores: 2.21s


To your networking statement, is there a way to disable setting up the networking environment? My benchmark does not require any form of an internet connection.

In case anyone is curious what I am benchmarking: it is the Splash-2 apps/kernels, solving a rather large problem size. In this benchmark it is a 2048x2048 matrix.
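
To make the scaling in the numbers above easier to read, here is a small sketch that turns the reported run times into speedup and parallel efficiency per core count. The timings are copied from this message; the function names are made up for illustration.

```python
# Wall-clock times in seconds, copied from the message above.
osv = {1: 21.18, 2: 12.25, 4: 7.08, 8: 4.95, 16: 6.59, 32: 9.28}
ubuntu = {1: 15.22, 2: 9.64, 4: 6.4, 8: 3.6, 16: 2.93, 32: 2.21}

def scaling(times):
    """Map core count -> (speedup over 1 core, parallel efficiency)."""
    base = times[1]
    return {n: (base / t, base / t / n) for n, t in sorted(times.items())}

def best_core_count(times):
    """Core count with the lowest run time."""
    return min(times, key=times.get)

for name, times in (("OSv on KVM", osv), ("Ubuntu host", ubuntu)):
    print(name)
    for n, (speedup, eff) in scaling(times).items():
        print(f"  {n:>2} cores: speedup {speedup:5.2f}, efficiency {eff:6.1%}")
    print(f"  fastest at {best_core_count(times)} cores")
```

On these numbers OSv is fastest at 8 cores and gets slower beyond that, while the Ubuntu host keeps improving up to 32 cores.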

Nadav Har'El

Feb 25, 2020, 3:12:38 AM
to Matthew Weekley, OSv Development
On Tue, Feb 25, 2020 at 5:30 AM <twee...@comcast.net> wrote:
> As follows:
> OSv booted on KVM:
> 1 core: 21.18s
> 2 cores: 12.25s
> 4 cores: 7.08s
> 8 cores: 4.95s
> 16 cores: 6.59s
> 32 cores: 9.28s
>
> Now on regular Ubuntu:
> 1 core: 15.22s
> 2 cores: 9.64s
> 4 cores: 6.4s
> 8 cores: 3.6s
> 16 cores: 2.93s
> 32 cores: 2.21s


While *ideally* a unikernel like OSv could provide better performance than traditional kernels because of things like lower system call and context switch overhead, less locking, and other things, there are many things working *against* this ideal, resulting in disappointing performance comparisons:

1. Most modern high-performance software has evolved on Linux, and evolved with its limitations in mind.
So for example, if Linux's context switches are slow, application developers start writing software which lowers the number of context switches - even to the point of just one thread per core. If Linux's system calls are slow, application developers start to batch many operations in one system call, starting with epoll() some 20 years ago, and culminating with io_uring recently introduced to Linux. With these applications, it is pointless to speed up system calls or context switches, because these take a tiny percentage of the runtime.

2. They say a chain is as weak as its weakest link.
This is even more true in many-core performance (due to Amdahl's law). Complex software uses many, many OS features. If OSv speeds up context switches and system calls and networking (say) by 10%, but then some other thing the software does is 2 times slower than in Linux, it is very possible that OSv's overall performance will be *lower* than Linux. Unfortunately, this is exactly what we saw a few years ago when ScyllaDB (then "Cloudius Systems") was actively benchmarking and developing OSv: Many benchmarks we tried were initially slower in OSv than in Linux. When we profiled what happened, we discovered that although many things in OSv were better than Linux, there were one (or a few) specific things which were significantly less efficient in OSv than in Linux. This could be some silly filesystem feature we never thought was very important but was very frequently used in this application, or it could be that OSv's scheduler wasn't as clever as Linux's at handling this specific use case. It could be that some specific algorithm was lock-free in Linux but uses locks in OSv, so it becomes increasingly worse on OSv the more CPUs you have. The main point is that if an application uses 100 different OS features, Linux had hundreds of developers optimizing *each* of these 100 features for years. The handful of OSv developers focused on specific features and made clever improvements to them, but the rest are probably less optimized than Linux.

3. Many-core development is hot
When the OSv project started 7 years ago, it was already becoming clear that many-core machines were the future, but it wasn't as obvious as it is today. So OSv could get some early wins by developing some clever lock-reducing improvements to its networking stack and other places. But the Linux developers are not idiots, and spent the last 7 years improving Linux's scalability on many-core systems. And they went further than OSv ever got - they support NUMA configurations, multi-queue network cards, and a plethora of new ideas for improving scalability on many core systems. On modern many-core, multi-socket, multi-queue-network-card systems, there is a high chance that Linux will be faster than OSv.

4. Posix API is slow
This is related to the first issue (of software having evolved on Linux), but this time for more "traditional" software and not really state-of-the-art applications using the latest fads like io_uring. This traditional software is using the Posix API - filesystem, networking, memory handling etc. - that was designed decades ago and makes various problematic guarantees. Just as one example, the possibility to poll the same file descriptor from many threads requires a lock every time this file descriptor is used. This slows down both Linux and OSv, without giving OSv any advantage over Linux, because both need to correctly support the same slow API. Or even the contrary - Linux and its hundreds of developers continue to come up with clever tricks for each of these details (e.g., use RCU instead of locks for file descriptors) while OSv's few (today, *very* few) developers only had time to optimize a few specific cases.
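
The "weakest link" argument in point 2 can be made concrete with a little arithmetic. The sketch below uses made-up fractions, not measurements: it assumes a workload that spends 80% of its Linux run time in paths that take 10% less time on OSv and 20% in one path that is 2 times slower, and it also shows Amdahl's law for a hypothetical 5% serial fraction.

```python
def relative_runtime(parts):
    """parts: list of (fraction of Linux run time, OSv cost relative to Linux).
    Returns the OSv run time as a multiple of the Linux run time."""
    assert abs(sum(f for f, _ in parts) - 1.0) < 1e-9
    return sum(f * cost for f, cost in parts)

def amdahl_speedup(serial_fraction, cores):
    """Amdahl's law: best possible speedup when serial_fraction cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

# Hypothetical mix: 80% of the time 10% cheaper on OSv, 20% of the time 2x slower.
mix = [(0.8, 0.9), (0.2, 2.0)]
print(f"OSv run time relative to Linux: {relative_runtime(mix):.2f}")

# Hypothetical 5% serial (for example lock-protected) part on 32 cores.
print(f"Amdahl speedup on 32 cores: {amdahl_speedup(0.05, 32):.1f}")
```

With these assumed numbers the overall run is about 12% slower on OSv despite most paths being faster, and a 5% serial part already caps the 32-core speedup at roughly 12.5x.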

So it's no longer clear that if raw performance is your goal, OSv is the right direction. OSv can still be valuable for other reasons - smaller self-contained images, smaller code base, faster boot, etc. For raw performance, our company (ScyllaDB) went in a different direction: The Seastar library (http://nadav.harel.org.il/seastar/, https://github.com/scylladb/seastar) allows writing high-performance applications on regular Linux, by avoiding or minimizing all the features that make Linux slow (like context switches) or cause scalability problems on modern many-core machines. Initially Seastar ran on both Linux and OSv (with identical performance, because it avoided the heavy parts of both), but unfortunately today it is using too many new Linux features which don't work on OSv - so it no longer runs on OSv.

Matias Vara

Feb 25, 2020, 6:41:25 AM
to Nadav Har'El, Matthew Weekley, OSv Development
Hello, 

This is a very interesting analysis. I wonder if you always want to use Linux as the OS for your guests. I played a bit with microservices and I had the impression that this may be a use case in which you could try something other than Linux as the guest. For example, if you want to do continuous deployment of microservices, your guest has to boot fast. This requirement may not be easily achieved by using Linux. This is just an example, and I know some people are booting Linux very fast.

Matias

twee...@comcast.net

Feb 25, 2020, 7:00:16 AM
to OSv Development
Very well explained. Thank you for that. That does make perfect sense as well.

Waldek Kozaczuk

Feb 25, 2020, 8:52:48 AM
to OSv Development
Hi,

I am quite late to the party :-) Could you run OSv on a single CPU with verbose output on (add -V to run.py) and send us the output so we can see a little more of what is happening? To disable networking you need to add '--nics=0' (for all 50 options run.py supports, run it with '--help'). I am not familiar with that benchmark, but I wonder if it needs a read-write FS (ZFS in OSv's case); if not, you can build OSv images with a read-only FS (./scripts/build fs=rofs). Lastly, you can improve boot time by running OSv on Firecracker (https://github.com/cloudius-systems/osv/wiki/Running-OSv-on-Firecracker) or on QEMU microvm (-p qemu_microvm, requires QEMU >= 4.1); with a read-only FS, OSv should boot within 5ms on both, and with ZFS within 40ms. One last thing: writing to the console on OSv can be quite slow, and I wonder how much of that this benchmark does.

While I definitely agree with my colleague Nadav, where he essentially says do not use OSv if the raw performance matters (database for example) and Linux will beat it no matter what, OSv may have advantages in use cases where pure performance does not matter (it still needs to be reasonable). I think the best use cases for OSv are serverless or stateless apps (microservices or web assembly) running on single CPU where all state management is delegated to a remote persistent store (most custom-built business apps are like that) and where high isolation matters. 

Relatedly, I think it might be more useful to think of OSv (and other unikernels) as highly isolated processes. To that end, we still need to optimize memory overhead (stacks for example) and improve virtio-fs support (in this case to run the app on OSv you do not need full image, just kernel to run a Linux app).

Also, I think the lack of good tooling in unikernel space affects their adoption. Compare it with docker - build, push, pull, run. OSv has its equivalent - capstan - but at this point, we do not really have a registry where one can pull the latest OSv kernel or push, pull images. Trying to run an app on OSv is still quite painful to a business app developer - it probably takes at least 30 minutes or so. 

Lastly, I think one of the main reasons for Docker adoption, was repeatability (besides its fantastic ease of use) where one can create an image and expect it to run almost the same way in production. Imagine you can achieve that with OSv. 

Waldek

twee...@comcast.net

Feb 25, 2020, 10:05:08 AM
to OSv Development
Thanks for the response! I will get this information to you after work with the few modifications you recommended! The application is essentially just testing CPU performance using multiprocessing. Nothing too fancy about it! The code I am using can be found at:


It is inside the kernels folder, in radix.c, and I change the problem size to 16,777,216.

If you happen to examine the code, do ignore the lacking cleanness of the code...we just smashed everything into one file for simplicity on our end. (Running the same code across all platforms being benchmarked). 

Waldek Kozaczuk

Feb 25, 2020, 1:09:08 PM
to OSv Development
So I did try to build and run the radix test (please note my Ubuntu laptop has only 4 cores and hyper-threading disabled). BTW, it seems that this particular benchmark does not need a read-write FS, so I used ROFS:

./scripts/manifest_from_host.sh -w ../splash2-posix/kernels/radix/radix && ./scripts/build fs=rofs --append-manifest -j4


Linux host 1 cpu:
./radix -p 1 -r4096

Integer Radix Sort
     262144 Keys
     1 Processors
     Radix = 4096
     Max key = 524288


                 PROCESS STATISTICS
               Total            Rank            Sort
 Proc          Time             Time            Time
    0           7335            2568            4765

                 TIMING INFORMATION
Start time                        : 1582652832386234
Initialization finish time        : 1582652832444092
Overall finish time               : 1582652832451427
Total time with initialization    :            65193
Total time without initialization :             7335



Linux host 2 cpus:
./radix -p 2 -r4096

Integer Radix Sort
     262144 Keys
     2 Processors
     Radix = 4096
     Max key = 524288


                 PROCESS STATISTICS
               Total            Rank            Sort
 Proc          Time             Time            Time
    0           4325            1571            2704

                 TIMING INFORMATION
Start time                        : 1582652821496771
Initialization finish time        : 1582652821531279
Overall finish time               : 1582652821535604
Total time with initialization    :            38833
Total time without initialization :             4325

host 4 cpus:
./radix -p 4 -r4096

Integer Radix Sort
     262144 Keys
     4 Processors
     Radix = 4096
     Max key = 524288


                 PROCESS STATISTICS
               Total            Rank            Sort
 Proc          Time             Time            Time
    0           2599            1077            1470

                 TIMING INFORMATION
Start time                        : 1582653906150199
Initialization finish time        : 1582653906171932
Overall finish time               : 1582653906174531
Total time with initialization    :            24332
Total time without initialization :             2599


OSv 1 CPU
./scripts/run.py -p qemu_microvm --qemu-path /home/wkozaczuk/projects/qemu/bin/release/native/x86_64-softmmu/qemu-system-x86_64 --nics 0 --nogdb -m 64M -c 1 --block-device-cache writeback,aio=threads -e '/radix -p 1 -r4096'
OSv v0.54.0-119-g4ee4b788
Booted up in 3.75 ms
Cmdline: /radix -p 1 -r4096 

Integer Radix Sort
     262144 Keys
     1 Processors
     Radix = 4096
     Max key = 524288


                 PROCESS STATISTICS
               Total            Rank            Sort
 Proc          Time             Time            Time
    0           6060            2002            4049

                 TIMING INFORMATION
Start time                        : 1582652845450708
Initialization finish time        : 1582652845500348
Overall finish time               : 1582652845506408
Total time with initialization    :            55700
Total time without initialization :             6060

OSv 2 CPUs:
./scripts/run.py -p qemu_microvm --qemu-path /home/wkozaczuk/projects/qemu/bin/release/native/x86_64-softmmu/qemu-system-x86_64 --nics 0 --nogdb -m 64M -c 2 --block-device-cache writeback,aio=threads -e '/radix -p 2 -r4096'
OSv v0.54.0-119-g4ee4b788
Booted up in 4.81 ms
Cmdline: /radix -p 2 -r4096 

Integer Radix Sort
     262144 Keys
     2 Processors
     Radix = 4096
     Max key = 524288


                 PROCESS STATISTICS
               Total            Rank            Sort
 Proc          Time             Time            Time
    0           5797            1702            4089

                 TIMING INFORMATION
Start time                        : 1582653305076852
Initialization finish time        : 1582653305129792
Overall finish time               : 1582653305135589
Total time with initialization    :            58737
Total time without initialization :             5797

OSv 4 cpus
./scripts/run.py -p qemu_microvm --qemu-path /home/wkozaczuk/projects/qemu/bin/release/native/x86_64-softmmu/qemu-system-x86_64 --nics 0 --nogdb -m 64M -c 4 --block-device-cache writeback,aio=threads -e '/radix -p 4 -r4096'
OSv v0.54.0-119-g4ee4b788
Booted up in 5.26 ms
Cmdline: /radix -p 4 -r4096 

Integer Radix Sort
     262144 Keys
     4 Processors
     Radix = 4096
     Max key = 524288


                 PROCESS STATISTICS
               Total            Rank            Sort
 Proc          Time             Time            Time
    0           6498            2393            4099

                 TIMING INFORMATION
Start time                        : 1582653946823458
Initialization finish time        : 1582653946875522
Overall finish time               : 1582653946882020
Total time with initialization    :            58562
Total time without initialization :             6498


As you can see, with a single CPU the benchmark seems to be 10-15% faster on OSv. But with two and four CPUs OSv barely sees any improvement, whereas on the host the app runs about 40% faster with each doubling of CPUs. So OSv does not seem to scale at all here (somebody mentioned it used to), and it would be nice to understand why. OSv has many sophisticated tracing tools that can help here: https://github.com/cloudius-systems/osv/wiki/Trace-analysis-using-trace.py


Waldek

BTW1. I tried to bump size of the matrix to something higher but with -r8192 the app crashes on both Linux and OSv.
BTW2. It would be interesting to compare OSv with a Linux guest (vs host).
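
When comparing many runs like the ones above, it can help to pull the totals out of the radix output automatically. This is a minimal sketch that assumes the output layout shown in this message; the function name `parse_radix` is made up, and the time unit is presumably microseconds (the start and finish values look like epoch timestamps in microseconds), which is an inference and not something the benchmark output states.

```python
import re

# Trimmed copy of the "Linux host 1 cpu" output from this message.
SAMPLE = """\
Integer Radix Sort
     262144 Keys
     1 Processors
     Radix = 4096
     Max key = 524288

Total time with initialization    :            65193
Total time without initialization :             7335
"""

def parse_radix(output):
    """Return (processors, total with init, total without init) from radix output."""
    procs = int(re.search(r"(\d+)\s+Processors", output).group(1))
    with_init = int(
        re.search(r"Total time with initialization\s*:\s*(\d+)", output).group(1))
    without_init = int(
        re.search(r"Total time without initialization\s*:\s*(\d+)", output).group(1))
    return procs, with_init, without_init

print(parse_radix(SAMPLE))
```

Feeding each run's captured output through this gives one tuple per run, which is easier to compare than the raw listings.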

Waldek Kozaczuk

Feb 25, 2020, 1:31:17 PM
to OSv Development
Also I ran the 2 CPU example with all tracepoints on and here is what I got:

./scripts/run.py -p qemu_microvm --qemu-path /home/wkozaczuk/projects/qemu/bin/release/native/x86_64-softmmu/qemu-system-x86_64 --nics 0 -m 64M -c 2 --block-device-cache writeback,aio=threads -e '/radix -p 2 -r4096' -H --trace \*

#In other terminal
./scripts/trace.py extract
./scripts/trace.py summary
Collected 38141 samples spanning 100.38 ms

Time ranges:

  CPU 0x01:  0.000000000 -  0.100380272 =  100.38 ms
  CPU 0x00:  0.083725677 -  0.100295947 =   16.57 ms

Tracepoint statistics:

  name                           count
  ----                           -----
  access_scanner                  5145
  async_worker_started               1
  clear_pte                        256
  condvar_wait                       8
  condvar_wake_all                  12
  memory_free                       64
  memory_malloc                     68
  memory_malloc_large                9
  memory_malloc_mempool             38
  memory_malloc_page                 3
  memory_page_alloc                  9
  memory_page_free                 262
  mutex_lock                      5367
  mutex_lock_wait                   28
  mutex_lock_wake                   30
  mutex_receive_lock                 8
  mutex_send_lock                    8
  mutex_unlock                    5377
  pcpu_worker_sheriff_started        1
  pool_alloc                        38
  pool_free                         52
  pool_free_same_cpu                52
  sched_idle                        13
  sched_idle_ret                    13
  sched_ipi                          7
  sched_load                       118
  sched_migrate                      1
  sched_preempt                     23
  sched_queue                       71
  sched_sched                      101
  sched_switch                      70
  sched_wait                        46
  sched_wait_ret                    43
  sched_wake                      5197
  thread_create                      4
  timer_cancel                    5209
  timer_fired                     5150
  timer_set                       5211
  vfs_pwritev                       13
  vfs_pwritev_ret                   13
  waitqueue_wake_all                 1
  waitqueue_wake_one                 1

./scripts/trace.py cpu-load
 0.000000000             1
 0.000000000             1
 0.000000000             1
 0.000002133             0
 0.000002546             1
 0.000002987             1
 0.000030307             2
 0.000030768             2
 0.000032967             1
 0.000040996             2
 0.000041268             2
 0.000041831             1
 0.000043297             2
 0.000043585             2
 0.000045945             1
 0.000046650             0
 0.000290645             1
 0.000291750             1
 0.000294524             2
 0.000295683             1
 0.000297979             0
 0.000304896             1
 0.000305348             1
 0.000306794             2
 0.000307488             1
 0.000309413             0
 0.000316847             1
 0.000317216             1
 0.000318711             2
 0.000319370             1
 0.000321079             0
 0.000327622             1
 0.000328009             1
 0.000531069             2
 0.000532382             1
 0.000539432             0
 0.000573914             1
 0.000574651             1
 0.000576728             0
 0.000584365             1
 0.000584997             1
 0.000587286             0
 0.000591755             1
 0.000592399             1
 0.000594461             0
 0.000598470             1
 0.000599040             1
 0.000611236             0
 0.000835164             1
 0.000836416             1
 0.000843416             2
 0.000843890             2
 0.000845046             1
 0.000856800             2
 0.000857064             2
 0.000858037             1
 0.000862489             0
 0.086250040          2  0
 0.086252051          3  0
 0.086253257          2  0
 0.086254377          3  0
 0.086296669          2  0
 0.086297441          3  0
 0.086336375          2  0
 0.086337328          3  0
 0.086337723          2  0
 0.086338657          3  0
 0.087719001          2  0
 0.087720113          3  0
 0.089164101          2  0
 0.089165836          3  0
 0.089166234          2  0
 0.089167249          3  0

Is my understanding correct that the load was not spread evenly across both cpus?
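
One rough way to look at this from the summary alone is to compare the time range each CPU's samples cover. The sketch below parses the "Time ranges" lines as printed above; the function name `cpu_ranges` is made up. Note the assumption: a range only says when the first and last samples were recorded on that CPU, so it hints at when a CPU became active but is not a direct measure of load.

```python
import re

# "Time ranges" section copied from the trace.py summary output above.
SUMMARY = """\
Time ranges:

  CPU 0x01:  0.000000000 -  0.100380272 =  100.38 ms
  CPU 0x00:  0.083725677 -  0.100295947 =   16.57 ms
"""

def cpu_ranges(summary):
    """Return {cpu_id: (start_s, end_s)} from the summary's time-range lines."""
    pattern = re.compile(r"CPU 0x([0-9a-fA-F]+):\s+([\d.]+)\s+-\s+([\d.]+)")
    return {int(c, 16): (float(s), float(e)) for c, s, e in pattern.findall(summary)}

ranges = cpu_ranges(SUMMARY)
total = max(e for _, e in ranges.values()) - min(s for s, _ in ranges.values())
for cpu, (start, end) in sorted(ranges.items()):
    share = (end - start) / total
    print(f"CPU {cpu}: first sample at {start * 1e3:.2f} ms, "
          f"range covers {share:.0%} of the trace")
```

Here the second CPU's samples only start about 84 ms into a 100 ms trace, which is at least consistent with the work not being spread evenly for most of the run.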

zhiting zhu

Feb 25, 2020, 1:38:29 PM
to Waldek Kozaczuk, OSv Development
Since OSv is running as a VM, would it be fairer to run the benchmark in a Linux VM for comparison?

Zhiting


twee...@comcast.net

Feb 25, 2020, 2:12:10 PM
to OSv Development
Wow, awesome results! I will try to reproduce shortly from here! If you want to increase the problem size, add -n 16777216 (which is 64x bigger than the current problem size of 262144, and will take a couple of seconds to run).

I am taking note of your commands to run the build and will be doing the same on my machine to see what I produce. I will post screenshots :)



Waldek Kozaczuk

Feb 25, 2020, 2:50:52 PM
to OSv Development
With the problem size bigger I see OSv consistently beating Linux host (at least on my laptop, Ubuntu 19.10).

Linux:
./radix -p 1 -r4096 -n16777216

Integer Radix Sort
     16777216 Keys
     1 Processors
     Radix = 4096
     Max key = 524288


                 PROCESS STATISTICS
               Total            Rank            Sort
 Proc          Time             Time            Time
    0         620025          127815          492206

                 TIMING INFORMATION
Start time                        : 1582659397056280
Initialization finish time        : 1582659400086786
Overall finish time               : 1582659400706811
Total time with initialization    :          3650531
Total time without initialization :           620025

./radix -p 2 -r4096 -n16777216

Integer Radix Sort
     16777216 Keys
     2 Processors
     Radix = 4096
     Max key = 524288


                 PROCESS STATISTICS
               Total            Rank            Sort
 Proc          Time             Time            Time
    0         333298           74005          258944

                 TIMING INFORMATION
Start time                        : 1582659471193401
Initialization finish time        : 1582659472761435
Overall finish time               : 1582659473094733
Total time with initialization    :          1901332
Total time without initialization :           333298

./radix -p 4 -r4096 -n16777216

Integer Radix Sort
     16777216 Keys
     4 Processors
     Radix = 4096
     Max key = 524288


                 PROCESS STATISTICS
               Total            Rank            Sort
 Proc          Time             Time            Time
    0         176586           38013          137823

                 TIMING INFORMATION
Start time                        : 1582659586192838
Initialization finish time        : 1582659586997985
Overall finish time               : 1582659587174571
Total time with initialization    :           981733
Total time without initialization :           176586

OSv:
./scripts/run.py -p qemu_microvm --qemu-path /home/wkozaczuk/projects/qemu/bin/release/native/x86_64-softmmu/qemu-system-x86_64 --nics 0 --nogdb -m 1G -c 1 --block-device-cache writeback,aio=threads -e '/radix -p 1 -r4096 -n16777216'
OSv v0.54.0-119-g4ee4b788
Booted up in 4.15 ms
Cmdline: /radix -p 1 -r4096 -n16777216 

Integer Radix Sort
     16777216 Keys
     1 Processors
     Radix = 4096
     Max key = 524288


                 PROCESS STATISTICS
               Total            Rank            Sort
 Proc          Time             Time            Time
    0         535304          130142          405145

                 TIMING INFORMATION
Start time                        : 1582659265555955
Initialization finish time        : 1582659268568276
Overall finish time               : 1582659269103580
Total time with initialization    :          3547625
Total time without initialization :           535304

./scripts/run.py -p qemu_microvm --qemu-path /home/wkozaczuk/projects/qemu/bin/release/native/x86_64-softmmu/qemu-system-x86_64 --nics 0 --nogdb -m 1G -c 2 --block-device-cache writeback,aio=threads -e '/radix -p 2 -r4096 -n16777216'
OSv v0.54.0-119-g4ee4b788
Booted up in 5.39 ms
Cmdline: /radix -p 2 -r4096 -n16777216 

Integer Radix Sort
     16777216 Keys
     2 Processors
     Radix = 4096
     Max key = 524288


                 PROCESS STATISTICS
               Total            Rank            Sort
 Proc          Time             Time            Time
    0         293180           78253          211421

                 TIMING INFORMATION
Start time                        : 1582659500041834
Initialization finish time        : 1582659501640537
Overall finish time               : 1582659501933717
Total time with initialization    :          1891883
Total time without initialization :           293180

./scripts/run.py -p qemu_microvm --qemu-path /home/wkozaczuk/projects/qemu/bin/release/native/x86_64-softmmu/qemu-system-x86_64 --nics 0 --nogdb -m 1G -c 4 --block-device-cache writeback,aio=threads -e '/radix -p 4 -r4096 -n16777216'
OSv v0.54.0-119-g4ee4b788
Booted up in 5.03 ms
Cmdline: /radix -p 4 -r4096 -n16777216 

Integer Radix Sort
     16777216 Keys
     4 Processors
     Radix = 4096
     Max key = 524288


                 PROCESS STATISTICS
               Total            Rank            Sort
 Proc          Time             Time            Time
    0         163844           47966          114362

                 TIMING INFORMATION
Start time                        : 1582659574048566
Initialization finish time        : 1582659575031430
Overall finish time               : 1582659575195274
Total time with initialization    :          1146708
Total time without initialization :           163844

So maybe (at least in this case) OSv scales pretty well with the number of CPUs.

Most likely this is because this use case is very computation-heavy and OSv does not make many exits to the host (I wonder if the OSv tracing mechanism has a way to show a count of all exits).
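
To put numbers on the scaling in the runs above, the sketch below computes the speedup and the Karp-Flatt metric (an estimate of the serial fraction implied by a measured speedup) from the "Total time without initialization" values. The timings are copied from this message; using Karp-Flatt here is a suggestion for reading the data, not something the benchmark itself reports.

```python
# "Total time without initialization" from the runs above (-r4096 -n16777216).
linux = {1: 620025, 2: 333298, 4: 176586}
osv = {1: 535304, 2: 293180, 4: 163844}

def speedup(times, n):
    """Speedup on n processors relative to the 1-processor run."""
    return times[1] / times[n]

def karp_flatt(times, n):
    """Experimentally determined serial fraction for n > 1 processors."""
    s = speedup(times, n)
    return (1 / s - 1 / n) / (1 - 1 / n)

for n in (2, 4):
    print(f"{n} CPUs: "
          f"Linux speedup {speedup(linux, n):.2f} (serial ~{karp_flatt(linux, n):.1%}), "
          f"OSv speedup {speedup(osv, n):.2f} (serial ~{karp_flatt(osv, n):.1%}), "
          f"OSv/Linux time ratio {osv[n] / linux[n]:.2f}")
```

On these numbers OSv is faster than the Linux host at every CPU count, although its speedup from 1 to 4 CPUs is slightly lower than the host's.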

twee...@comcast.net

Feb 25, 2020, 7:12:50 PM
to OSv Development
1 cpu on linux:

Integer Radix Sort
     16777216 Keys
     1 Processors
     Radix = 1024
     Max key = 524288


                 PROCESS STATISTICS
               Total            Rank            Sort
 Proc          Time             Time            Time
    0        1510833          311553         1199271

                 TIMING INFORMATION
Start time                        : 1582675055600374
Initialization finish time        : 1582675061040740
Overall finish time               : 1582675062551573
Total time with initialization    :          6951199
Total time without initialization :          1510833




2cpu on linux:
Integer Radix Sort
     16777216 Keys
     2 Processors
     Radix = 1024
     Max key = 524288


                 PROCESS STATISTICS
               Total            Rank            Sort
 Proc          Time             Time            Time
    0         760267          159099          597266

                 TIMING INFORMATION
Start time                        : 1582675165095355
Initialization finish time        : 1582675167810016
Overall finish time               : 1582675168570283
Total time with initialization    :          3474928
Total time without initialization :           760267


4cpu on linux:
Integer Radix Sort
     16777216 Keys
     4 Processors
     Radix = 1024
     Max key = 524288


                 PROCESS STATISTICS
               Total            Rank            Sort
 Proc          Time             Time            Time
    0         463616          104424          358749

                 TIMING INFORMATION
Start time                        : 1582675303803195
Initialization finish time        : 1582675305167855
Overall finish time               : 1582675305631471
Total time with initialization    :          1828276
Total time without initialization :           463616


8cpu on linux
Integer Radix Sort
     16777216 Keys
     8 Processors
     Radix = 1024
     Max key = 524288


                 PROCESS STATISTICS
               Total            Rank            Sort
 Proc          Time             Time            Time
    0         247667           57513          162350

                 TIMING INFORMATION
Start time                        : 1582675359221291
Initialization finish time        : 1582675359903200
Overall finish time               : 1582675360150867
Total time with initialization    :           929576
Total time without initialization :           247667


1cpu on osv:
 ./scripts/run.py -p kvm -V --nics 0 --nogdb -m 1G -c 1 --block-device-cache writeback,aio=threads -e '/radix -p 1 -n16777216'
OSv v0.54.0-108-g69486729
1 CPUs detected
Firmware vendor: SeaBIOS
bsd: initializing - done
VFS: mounting ramfs at /
VFS: mounting devfs at /dev
net: initializing - done
vga: Add VGA device instance
virtio-blk: Add blk device instances 0 as vblk0, devsize=4357632
random: virtio-rng registered as a source.
random: <Software, Yarrow> initialized
VFS: unmounting /dev
VFS: mounting rofs at /rofs
VFS: mounting devfs at /dev
VFS: mounting procfs at /proc
VFS: mounting sysfs at /sys
VFS: mounting ramfs at /tmp
Booted up in 320.89 ms
Cmdline: /radix -p 1 -n16777216


Integer Radix Sort
     16777216 Keys
     1 Processors
     Radix = 1024
     Max key = 524288

random: device unblocked.


                 PROCESS STATISTICS
               Total            Rank            Sort
 Proc          Time             Time            Time
    0        1536974          312648         1224318

                 TIMING INFORMATION
Start time                        : 1582675604420160
Initialization finish time        : 1582675609828738
Overall finish time               : 1582675611365712
Total time with initialization    :          6945552
Total time without initialization :          1536974

program exited with status 0
VFS: unmounting /dev
VFS: unmounting /proc
VFS: unmounting /
ROFS: spent 2.42 ms reading from disk
ROFS: read 76 512-byte blocks from disk
ROFS: allocated 73 512-byte blocks of cache memory
ROFS: hit ratio is 91.89%
Powering off.

2cpu osv:
./scripts/run.py -p kvm -V --nics 0 --nogdb -m 1G -c 2 --block-device-cache writeback,aio=threads -e '/radix -p 2 -n16777216'
OSv v0.54.0-108-g69486729
2 CPUs detected
Firmware vendor: SeaBIOS
bsd: initializing - done
VFS: mounting ramfs at /
VFS: mounting devfs at /dev
net: initializing - done
vga: Add VGA device instance
virtio-blk: Add blk device instances 0 as vblk0, devsize=4357632
random: virtio-rng registered as a source.
random: <Software, Yarrow> initialized
VFS: unmounting /dev
VFS: mounting rofs at /rofs
VFS: mounting devfs at /dev
VFS: mounting procfs at /proc
VFS: mounting sysfs at /sys
VFS: mounting ramfs at /tmp
Booted up in 329.94 ms
Cmdline: /radix -p 2 -n16777216


Integer Radix Sort
     16777216 Keys
     2 Processors
     Radix = 1024
     Max key = 524288

random: device unblocked.


                 PROCESS STATISTICS
               Total            Rank            Sort
 Proc          Time             Time            Time
    0         897061          172935          721892

                 TIMING INFORMATION
Start time                        : 1582675669879986
Initialization finish time        : 1582675672671371
Overall finish time               : 1582675673568432
Total time with initialization    :          3688446
Total time without initialization :           897061

program exited with status 0
VFS: unmounting /dev
VFS: unmounting /proc
VFS: unmounting /
ROFS: spent 1.48 ms reading from disk
ROFS: read 76 512-byte blocks from disk
ROFS: allocated 73 512-byte blocks of cache memory
ROFS: hit ratio is 91.89%
Powering off.

4 cpu OSV:
./scripts/run.py -p kvm -V --nics 0 --nogdb -m 1G -c 4 --block-device-cache writeback,aio=threads -e '/radix -p 4 -n16777216'
OSv v0.54.0-108-g69486729
4 CPUs detected
Firmware vendor: SeaBIOS
bsd: initializing - done
VFS: mounting ramfs at /
VFS: mounting devfs at /dev
net: initializing - done
vga: Add VGA device instance
virtio-blk: Add blk device instances 0 as vblk0, devsize=4357632
random: virtio-rng registered as a source.
random: <Software, Yarrow> initialized
VFS: unmounting /dev
VFS: mounting rofs at /rofs
VFS: mounting devfs at /dev
VFS: mounting procfs at /proc
VFS: mounting sysfs at /sys
VFS: mounting ramfs at /tmp
Booted up in 319.83 ms
Cmdline: /radix -p 4 -n16777216


Integer Radix Sort
     16777216 Keys
     4 Processors
     Radix = 1024
     Max key = 524288

random: device unblocked.


                 PROCESS STATISTICS
               Total            Rank            Sort
 Proc          Time             Time            Time
    0         600536          155428          336133

                 TIMING INFORMATION
Start time                        : 1582675736500393
Initialization finish time        : 1582675738118761
Overall finish time               : 1582675738719297
Total time with initialization    :          2218904
Total time without initialization :           600536

program exited with status 0
VFS: unmounting /dev
VFS: unmounting /proc
VFS: unmounting /
ROFS: spent 2.52 ms reading from disk
ROFS: read 76 512-byte blocks from disk
ROFS: allocated 73 512-byte blocks of cache memory
ROFS: hit ratio is 91.89%
Powering off.

8cpu osv
./scripts/run.py -p kvm -V --nics 0 --nogdb -m 1G -c 8 --block-device-cache writeback,aio=threads -e '/radix -p 8 -n16777216'
OSv v0.54.0-108-g69486729
8 CPUs detected
Firmware vendor: SeaBIOS
bsd: initializing - done
VFS: mounting ramfs at /
VFS: mounting devfs at /dev
net: initializing - done
vga: Add VGA device instance
virtio-blk: Add blk device instances 0 as vblk0, devsize=4357632
random: virtio-rng registered as a source.
random: <Software, Yarrow> initialized
VFS: unmounting /dev
VFS: mounting rofs at /rofs
VFS: mounting devfs at /dev
VFS: mounting procfs at /proc
VFS: mounting sysfs at /sys
VFS: mounting ramfs at /tmp
Booted up in 331.23 ms
Cmdline: /radix -p 8 -n16777216


Integer Radix Sort
     16777216 Keys
     8 Processors
     Radix = 1024
     Max key = 524288

random: device unblocked.


                 PROCESS STATISTICS
               Total            Rank            Sort
 Proc          Time             Time            Time
    0         585043          129105          234489

                 TIMING INFORMATION
Start time                        : 1582675788233688
Initialization finish time        : 1582675789537907
Overall finish time               : 1582675790122950
Total time with initialization    :          1889262
Total time without initialization :           585043

program exited with status 0
VFS: unmounting /dev
VFS: unmounting /proc
VFS: unmounting /
ROFS: spent 2.56 ms reading from disk
ROFS: read 76 512-byte blocks from disk
ROFS: allocated 73 512-byte blocks of cache memory
ROFS: hit ratio is 91.89%
Powering off.
Here are my results!
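
To make the scaling easier to compare, here is a rough Python sketch that tabulates the "Total time without initialization" figures from the runs above. The numbers are copied from the logs; the function names are just illustrative, and treating the unit as microseconds is an assumption based on the timestamps looking like Unix time in microseconds. Only the 2, 4 and 8 CPU Linux runs are used, since those are the ones labelled above.

```python
# "Total time without initialization" values copied from the logs above.
# Assumption: the unit is microseconds.
LINUX_US = {2: 760267, 4: 463616, 8: 247667}
OSV_US = {1: 1536974, 2: 897061, 4: 600536, 8: 585043}


def speedup(times, base_cpus=1):
    """Speedup of each run relative to the run with base_cpus CPUs."""
    base = times[base_cpus]
    return {cpus: base / t for cpus, t in sorted(times.items())}


def slowdown_vs_linux(osv, linux):
    """OSv time divided by Linux time, for CPU counts present in both sets."""
    return {cpus: osv[cpus] / linux[cpus] for cpus in sorted(osv) if cpus in linux}


if __name__ == "__main__":
    for cpus, s in speedup(OSV_US).items():
        print(f"OSv {cpus} CPU(s): speedup {s:.2f}x, efficiency {s / cpus:.0%}")
    for cpus, r in slowdown_vs_linux(OSV_US, LINUX_US).items():
        print(f"{cpus} CPU(s): OSv/Linux time ratio {r:.2f}")
```

With these figures OSv reaches only about a 2.6x speedup at 8 CPUs over its own 1 CPU run, and at 8 CPUs it takes roughly 2.4 times as long as Linux.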

Waldek Kozaczuk

Feb 25, 2020, 11:11:31 PM2/25/20
to OSv Development
I see that you are running OSv on regular QEMU ('-p kvm' rather than '-p qemu_microvm'), hence the slow boot time. But when I run the same command on regular QEMU on my laptop, I get a boot time of around 126 ms, which is roughly 2.5 times faster than your 320-330 ms:

./scripts/run.py -p kvm -V --nics 0 --nogdb -m 1G -c 1 --block-device-cache writeback,aio=threads -e '/radix -p 1 -n16777216'
OSv v0.54.0-119-g4ee4b788
1 CPUs detected
Firmware vendor: SeaBIOS
bsd: initializing - done
VFS: mounting ramfs at /
VFS: mounting devfs at /dev
net: initializing - done
vga: Add VGA device instance
virtio-blk: Add blk device instances 0 as vblk0, devsize=6480896
random: virtio-rng registered as a source.
random: intel drng, rdrand registered as a source.
random: <Software, Yarrow> initialized
VFS: unmounting /dev
VFS: mounting rofs at /rofs
VFS: mounting devfs at /dev
VFS: mounting procfs at /proc
VFS: mounting sysfs at /sys
VFS: mounting ramfs at /tmp
Booted up in 126.02 ms
Cmdline: /radix -p 1 -n16777216

Integer Radix Sort
     16777216 Keys
     1 Processors
     Radix = 1024
     Max key = 524288

random: device unblocked.

                 PROCESS STATISTICS
               Total            Rank            Sort
 Proc          Time             Time            Time
    0         459824          131091          328714

                 TIMING INFORMATION
Start time                        : 1582690117331280
Initialization finish time        : 1582690120377120
Overall finish time               : 1582690120836944
Total time with initialization    :          3505664
Total time without initialization :           459824

program exited with status 0
VFS: unmounting /dev
VFS: unmounting /proc
VFS: unmounting /
ROFS: spent 0.28 ms reading from disk
ROFS: read 84 512-byte blocks from disk
ROFS: allocated 81 512-byte blocks of cache memory
ROFS: hit ratio is 94.12%
Powering off.

Could it be that you run OSv in a nested virtualization setup?
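
One quick way to check from the Linux side is to look for the `hypervisor` CPU flag in /proc/cpuinfo; on x86 Linux it is listed when the kernel itself runs as a guest. A minimal Python sketch (the function name is made up for illustration, and this only tells you Linux is virtualized, not which hypervisor is underneath):

```python
from pathlib import Path


def has_hypervisor_flag(cpuinfo_text):
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists 'hypervisor'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags" and "hypervisor" in value.split():
            return True
    return False


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux only
    if path.exists():
        if has_hypervisor_flag(path.read_text()):
            print("hypervisor flag set: this Linux is itself a guest")
        else:
            print("no hypervisor flag: likely bare metal")
    else:
        print("/proc/cpuinfo not found; this check only works on Linux")
```

If the flag is set on the machine where you launch run.py, then OSv is running as a nested guest.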

Waldek Kozaczuk

Feb 25, 2020, 11:24:54 PM2/25/20
to OSv Development
Can you also tell us a bit more about your setup: machine (VM or bare metal), Linux host version, hyperthreading, QEMU version, etc.?

When you say you run on Linux, do you mean on the host or in a Linux guest?


twee...@comcast.net

Feb 26, 2020, 6:18:22 AM2/26/20
to OSv Development
I am running OSv on Ubuntu 18.04 in a VM, hyperthreading is enabled, and my QEMU version is:

/usr/bin/qemu-system-x86_64 --version
QEMU emulator version 2.11.1(Debian 1:2.11+dfsg-1ubuntu7.23)

That explains why I don't have QEMU microvm. My system reports this as the latest version, and you say I need 4.1 or newer?
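
For anyone who wants to script this check, here is a small Python sketch that parses the `--version` banner shown above and compares it with a minimum version. The function name is hypothetical, and the (4, 1, 0) threshold is simply the figure quoted in this thread, not something verified against the QEMU release notes.

```python
import re


def parse_qemu_version(version_output):
    """Extract (major, minor, patch) from a 'QEMU emulator version X.Y[.Z]' banner."""
    match = re.search(r"version\s+(\d+)\.(\d+)(?:\.(\d+))?", version_output)
    if match is None:
        raise ValueError("no version number found in: %r" % version_output)
    # A missing patch component is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


if __name__ == "__main__":
    banner = "QEMU emulator version 2.11.1(Debian 1:2.11+dfsg-1ubuntu7.23)"
    minimum = (4, 1, 0)  # assumption: the version mentioned in this thread
    version = parse_qemu_version(banner)
    verdict = "new enough" if version >= minimum else "too old"
    print("QEMU %d.%d.%d is %s" % (version + (verdict,)))
```

The tuple comparison avoids the classic mistake of comparing "2.11" and "4.1" as strings or floats.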

Waldek Kozaczuk

Feb 27, 2020, 12:17:31 AM2/27/20
to OSv Development
2.11 is not terribly old, but since then QEMU has made big improvements affecting OSv boot time (even on the non-microvm machine type). If you do not want to upgrade your machine to a newer Ubuntu (19.10 comes with QEMU 4.0) or build QEMU from source, you may also consider running QEMU and OSv in a Fedora 31 Docker container (Fedora 31 comes with QEMU 4.1). You can use this Dockerfile as an example to create the corresponding image:

FROM fedora:31

RUN yum install -y git python3 qemu-system-x86 qemu-img

# - prepare directories
RUN mkdir /git-repos

# - clone OSv
WORKDIR /git-repos

CMD /bin/bash

Also, I am interested in how fast the same benchmark app boots on Firecracker (a lightweight KVM-based hypervisor) on your machine (scripts/firecracker.py will automatically pull the Firecracker binary). I would expect it to boot in 5-10 ms.

Lastly, could you tell us a bit more about the specs of the hardware you are running your Linux host on? My laptop is a 7-year-old MacBook Pro with a 4-core 2.3 GHz i7 CPU.