Golang
1 CPU
Requests/sec: 24313.06 / 23874.74 / 23300.26
2 CPUs
Requests/sec: 37089.26 / 35475.22 / 33581.87
4 CPUs
Requests/sec: 42747.11 / 43057.99 / 42346.27
Java
1 CPU
Requests/sec: 41049.41 / 43622.81 / 44777.60
2 CPUs
Requests/sec: 46245.95 / 45746.48 / 46224.42
4 CPUs
Requests/sec: 48128.33 / 45467.53 / 45776.45
Rust
1 CPU
Requests/sec: 43455.34 / 43927.73 / 41100.07
2 CPUs
Requests/sec: 49120.31 / 49298.28 / 48076.98
4 CPUs
Requests/sec: 51477.57 / 51587.92 / 49118.68
Golang
1 CPU
Requests/sec: 16721.56 / 16422.33 / 16540.24
2 CPUs
Requests/sec: 28538.35 / 26676.68 / 28100.00
4 CPUs
Requests/sec: 36448.57 / 33808.45 / 34383.20
Java
1 CPU
Requests/sec: 20191.95 / 21384.60 / 21705.82
2 CPUs
Requests/sec: 40876.17 / 40625.69 / 43766.45
4 CPUs
Requests/sec: 46336.07 / 45933.35 / 45467.22
Rust
1 CPU
Requests/sec: 23604.27 / 23379.86 / 23477.19
2 CPUs
Requests/sec: 46973.84 / 46590.41 / 46128.15
4 CPUs
Requests/sec: 49491.98 / 50255.20 / 50183.11
Golang
1 CPU
Requests/sec: 14498.02 / 14373.21 / 14213.61
2 CPUs
Requests/sec: 28201.27 / 28600.92 / 28558.33
4 CPUs
Requests/sec: 48983.83 / 47590.97 / 45758.82
Java
1 CPU
Requests/sec: 18217.58 / 17709.30 / 19829.01
2 CPUs
Requests/sec: 33188.75 / 33233.55 / 36951.05
4 CPUs
Requests/sec: 47718.13 / 46456.51 / 48408.99
Rust
Could not get the same Rust app working on Alpine Linux, which uses musl
Golang
1 CPU
Requests/sec: 24568.70 / 24621.82 / 24451.52
2 CPUs
Requests/sec: 49366.54 / 48510.87 / 43809.97
4 CPUs
Requests/sec: 53613.09 / 53033.38 / 51422.59
Java
1 CPU
Requests/sec: 40078.52 / 43850.54 / 44588.22
2 CPUs
Requests/sec: 48792.39 / 51170.05 / 52033.04
4 CPUs
Requests/sec: 51409.24 / 52756.73 / 47126.19
Rust
1 CPU
Requests/sec: 40220.04 / 44601.38 / 44419.06
2 CPUs
Requests/sec: 53420.56 / 53490.33 / 53320.99
4 CPUs
Requests/sec: 53892.23 / 52814.93 / 54050.13
[{"name":"Write presentation","completed":false,"due":"2019-03-23T15:30:40.579556117+00:00"},{"name":"Host meetup","completed":false,"due":"2019-03-23T15:30:40.579599959+00:00"},{"name":"Run tests","completed":false,"due":"2019-03-23T15:30:40.579600610+00:00"},{"name":"Stand in traffic","completed":false,"due":"2019-03-23T15:30:40.579601081+00:00"},{"name":"Learn Rust","completed":false,"due":"2019-03-23T15:30:40.579601548+00:00"}]-----------------------------------Running 30s test @ http://192.168.1.73:8080/todos 10 threads and 100 connections Thread Stats Avg Stdev Max +/- Stdev Latency 1.86ms 1.20ms 30.81ms 62.92% Req/Sec 5.42k 175.14 5.67k 87.71% 1622198 requests in 30.10s, 841.55MB readRequests/sec: 53892.23Transfer/sec: 27.96MB-----------------------------------Running 30s test @ http://192.168.1.73:8080/todos 10 threads and 100 connections Thread Stats Avg Stdev Max +/- Stdev Latency 1.90ms 1.19ms 8.98ms 58.18% Req/Sec 5.31k 324.18 5.66k 90.10% 1589778 requests in 30.10s, 824.73MB readRequests/sec: 52814.93Transfer/sec: 27.40MB-----------------------------------Running 30s test @ http://192.168.1.73:8080/todos 10 threads and 100 connections Thread Stats Avg Stdev Max +/- Stdev Latency 1.85ms 1.14ms 8.39ms 54.70% Req/Sec 5.44k 204.22 7.38k 92.12% 1626902 requests in 30.10s, 843.99MB readRequests/sec: 54050.13Transfer/sec: 28.04MB
Connecting to host 192.168.1.102, port 5201
[  5] local 192.168.1.98 port 65179 connected to 192.168.1.102 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   111 MBytes   930 Mbits/sec
[  5]   1.00-2.00   sec   111 MBytes   932 Mbits/sec
[  5]   2.00-3.00   sec   112 MBytes   938 Mbits/sec
[  5]   3.00-4.00   sec   112 MBytes   939 Mbits/sec
[  5]   4.00-5.00   sec   112 MBytes   940 Mbits/sec
[  5]   5.00-6.00   sec   111 MBytes   933 Mbits/sec
[  5]   6.00-7.00   sec   112 MBytes   940 Mbits/sec
[  5]   7.00-8.00   sec   112 MBytes   940 Mbits/sec
[  5]   8.00-9.00   sec   112 MBytes   941 Mbits/sec
[  5]   9.00-10.00  sec   112 MBytes   941 Mbits/sec
[  5]  10.00-11.00  sec   112 MBytes   939 Mbits/sec
[  5]  11.00-12.00  sec   112 MBytes   941 Mbits/sec
[  5]  12.00-13.00  sec   112 MBytes   941 Mbits/sec
[  5]  13.00-14.00  sec   112 MBytes   942 Mbits/sec
[  5]  14.00-15.00  sec   112 MBytes   941 Mbits/sec
[  5]  15.00-16.00  sec   111 MBytes   927 Mbits/sec
[  5]  16.00-17.00  sec   112 MBytes   941 Mbits/sec
[  5]  17.00-18.00  sec   112 MBytes   942 Mbits/sec
[  5]  18.00-19.00  sec   112 MBytes   941 Mbits/sec
[  5]  19.00-20.00  sec   112 MBytes   941 Mbits/sec
[  5]  20.00-21.00  sec   112 MBytes   936 Mbits/sec
[  5]  21.00-22.00  sec   112 MBytes   940 Mbits/sec
[  5]  22.00-23.00  sec   112 MBytes   941 Mbits/sec
[  5]  23.00-24.00  sec   112 MBytes   941 Mbits/sec
[  5]  24.00-25.00  sec   112 MBytes   941 Mbits/sec
[  5]  25.00-26.00  sec   112 MBytes   941 Mbits/sec
[  5]  26.00-27.00  sec   112 MBytes   940 Mbits/sec
[  5]  27.00-28.00  sec   112 MBytes   941 Mbits/sec
[  5]  28.00-29.00  sec   112 MBytes   940 Mbits/sec
[  5]  29.00-30.00  sec   112 MBytes   941 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-30.00  sec  3.28 GBytes   939 Mbits/sec   sender
[  5]   0.00-30.00  sec  3.28 GBytes   939 Mbits/sec   receiver
iperf Done.
Golang
1 CPU
Requests/sec: 25188.60 / 24664.43 / 23935.77
2 CPUs
Requests/sec: 37118.95 / 37108.96 / 35997.58
4 CPUs
Requests/sec: 49987.20 / 48710.74 / 44789.96
Java
1 CPU
Requests/sec: 43648.02 / 45457.98 / 41818.13
2 CPUs
Requests/sec: 76224.39 / 75734.63 / 70597.35
4 CPUs
Requests/sec: 80543.30 / 75187.46 / 72986.93
Rust
1 CPU
Requests/sec: 42392.75 / 39679.21 / 37871.49
2 CPUs
Requests/sec: 82484.67 / 83272.65 / 71671.13
4 CPUs
Requests/sec: 95910.23 / 86811.76 / 83213.93
Golang
1 CPU
Requests/sec: 24191.63 / 23574.89 / 23716.33
2 CPUs
Requests/sec: 34889.01 / 34487.01 / 34468.03
4 CPUs
Requests/sec: 48850.24 / 48690.09 / 48356.66
Java
1 CPU
Requests/sec: 32267.09 / 34670.41 / 34828.68
2 CPUs
Requests/sec: 47533.94 / 50734.05 / 50203.98
4 CPUs
Requests/sec: 69644.61 / 72704.40 / 70805.84
Rust
1 CPU
Requests/sec: 37061.52 / 36637.62 / 33154.57
2 CPUs
Requests/sec: 51743.94 / 51476.78 / 50934.27
4 CPUs
Requests/sec: 75125.41 / 74051.27 / 74434.78
Last week I spent some time investigating OSv performance and comparing it to Docker and Linux guests.
The test setup looked like this:

Host:
- MacBook Pro with a 4-core Intel i7 CPU with hyperthreading (8 CPUs reported by lscpu) and 16 GB of RAM, running Ubuntu 18.10
- firecracker 0.15.0
- QEMU 2.12.0
Client machine:
- similar to the one above, with wrk as the test client firing requests using 10 threads and 100 open connections for 30 seconds, in 3 series run one after another (please see this test script - https://github.com/wkozaczuk/unikernels-v-containers/blob/master/test-restapi-with-wrk.sh).
- wrk by default uses Keep-Alive for HTTP connections, so TCP handshake overhead is minimal
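The wrk invocation behind these numbers presumably looks roughly like this (the exact loop is in the test script linked above; the host/port is taken from the sample output further down and is otherwise illustrative):

    # 3 consecutive 30-second runs, 10 threads, 100 keep-alive connections
    for i in 1 2 3; do
      wrk -t10 -c100 -d30s http://192.168.1.73:8080/todos
    done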
The host and client machine were connected directly to a 1 GBit ethernet switch, and the host exposed the guest IP using a bridged TAP nic (please see the script used - https://raw.githubusercontent.com/cloudius-systems/osv/master/scripts/setup-external-bridge.sh).

You can find the scripts used to start the applications on OSv and Docker here - https://github.com/wkozaczuk/unikernels-v-containers (run* scripts). Please note the --cpu-set parameter used in the Docker script to limit the number of CPUs.

You can find the detailed results under https://github.com/wkozaczuk/unikernels-v-containers/tree/master/test_results/remote.
Hello Waldek,

The experiments are very interesting. I showed something similar at OSSummit'18 (see https://github.com/torokernel/papers/blob/master/OSSummit18.pdf). What I do not understand from your conclusions is why you expect OSv to scale with the number of cores. Maybe I did not understand something.
While the performance numbers indicate something, a MacBook is a horrible environment for performance testing. There are effects of other desktop apps, hyperthreading, etc.
Also, a 1 Gbps network can be a bottleneck.
Every benchmark case should have a matching performance analysis and point to the bottleneck reason - cpu/networking/context switching/locking/filesystem/...
Just a hyperthread vs. a different thread in another core is a very significant change. You need to pin the QEMU threads on the host to the right physical threads.
Better to run on a good physical server (like i3.metal on AWS or similar; it could be smaller, but not 2 cores) and track all the metrics appropriately. Best is to isolate workloads (and make sure they scale linearly too) in terms of cpu/mem/net/disk, and only then show how a more complex workload performs.
On Wed, Mar 27, 2019 at 10:48 AM Pekka Enberg <pen...@scylladb.com> wrote:

Hi Waldek!

On Wed, Mar 27, 2019 at 12:29 AM Waldek Kozaczuk <jwkoz...@gmail.com> wrote:
Last week I spent some time investigating OSv performance and comparing it to Docker and Linux guests.

Nice!

On Wed, Mar 27, 2019 at 12:29 AM Waldek Kozaczuk <jwkoz...@gmail.com> wrote:
The test setup looked like this:

Host:
- MacBook Pro with Intel i7 4 cores CPU with hyperthreading (8 cpus reported by lscpu) with 16GB of RAM with Ubuntu 18.10 on it
- firecracker 0.15.0
- QEMU 2.12.0
Client machine:
- similar to the one above with wrk as a test client firing requests using 10 threads and 100 open connections for 30 seconds in 3 series one by one (please see this test script - https://github.com/wkozaczuk/unikernels-v-containers/blob/master/test-restapi-with-wrk.sh).
- wrk by default uses Keep-Alive for http connections so TCP handshake is minimal
The host and client machine were connected directly to 1 GBit ethernet switch and host exposed guest IP using a bridged TAP nic (please see the script used - https://raw.githubusercontent.com/cloudius-systems/osv/master/scripts/setup-external-bridge.sh). You can find scripts to start applications on OSv and docker here - https://github.com/wkozaczuk/unikernels-v-containers (run* scripts). Please note --cpu-set parameter used in docker script to limit number of CPUs. You can find detailed results under https://github.com/wkozaczuk/unikernels-v-containers/tree/master/test_results/remote.

Some questions about the evaluation setup and measurements:

- Did you establish a baseline with a bare metal configuration?
- Did you measure CPU utilization during the throughput tests? This is important because you could be hitting CPU limits with QEMU and Firecracker because of software processing needed by virtualized networking.
- Are the QEMU and Firecracker tests using virtio or vhost?
- Is Docker also configured to use the bridge device? If not, QEMU and Firecracker also have some additional overheads from the bridging.
- Is multiqueue enabled for QEMU and Firecracker? If not, this would limit the ability to leverage multiple vCPUs.
- Is QEMU or Firecracker setting CPU affinity for the vCPU threads? If not, two or more vCPUs could be running on the same physical CPU, which obviously limits throughput.
Oh, forgot the obvious:

- Is the CPU scaling governor set to performance? Also, if the CPU has TurboBoost, is it disabled?
- Pekka
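For reference, on Linux the scaling governor and Turbo Boost state can usually be inspected and changed through sysfs; a minimal sketch, assuming the intel_pstate driver (not taken from the thread):

    # set the scaling governor to performance on all CPUs
    echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    # disable Turbo Boost (intel_pstate driver only)
    echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo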
Overall I must say I am not a performance tuning/measuring expert and clearly have lots of things to learn ;-) BTW can you point me to any performance setup/procedures/docs that you guys used with OSv?
I also feel I have tried to kill too many birds with one stone. Ideally I should have divided the whole thing into 3 categories:
- OSv on firecracker vs QEMU
- OSv vs Docker
- OSv vs Linux guest

On Tuesday, March 26, 2019 at 8:32:00 PM UTC-4, דור לאור wrote:
While the performance numbers indicate something, a mac book is a horrible environment for performance testing. There are effects of other desktop apps, hyperthreading, etc.

Well, that is what I have available in my home lab :-) I understand you are suggesting that apps running on the MacBook might affect and skew the results. I made sure the only apps open were one or two terminal windows. I also had mpstat open, and most of the time the CPUs were idle when tests were not running. But I get your point that ideally I should use a proper headless server machine. I also get the effect of hyperthreading - is there a way to switch it off in Linux by some kind of boot parameter?
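For what it's worth, hyperthreading can typically be turned off either with the nosmt kernel boot parameter or, on kernels that expose the SMT control file, at runtime; a sketch:

    # at boot: add "nosmt" to the kernel command line (e.g. GRUB_CMDLINE_LINUX)
    # at runtime, on kernels with SMT control support:
    echo off | sudo tee /sys/devices/system/cpu/smt/control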
Also, a 1 Gbps network can be a bottleneck.

Very likely, I have been suspecting the same thing.

Every benchmark case should have a matching performance analysis and point to the bottleneck reason - cpu/networking/context switching/locking/filesystem/...

To figure this out I guess I would need to use the OSv tracing capability - https://github.com/cloudius-systems/osv/wiki/Trace-analysis-using-trace.py
Just a hyperthread vs. a different thread in another core is a very significant change. Need to pin the qemu threads in the host to the right physical threads.

I was not even aware that one can pin to specific CPUs. What parameters do I pass to qemu?
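As far as I know, plain QEMU does not pin vCPUs by itself; pinning is normally done from the host, either for the whole process or per vCPU thread, for example with taskset (core numbers and thread IDs below are illustrative):

    # pin the whole QEMU process to physical cores 0-3
    taskset -c 0-3 qemu-system-x86_64 ...
    # or pin a single already-running vCPU thread (thread IDs can be found
    # under /proc/<qemu-pid>/task or via the QEMU monitor's "info cpus")
    taskset -cp 2 <vcpu-thread-id>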
Better to run on a good physical server (like i3.metal on AWS or similar; it could be smaller, but not 2 cores) and track all the metrics appropriately. Best is to isolate workloads (and make sure they scale linearly too) in terms of cpu/mem/net/disk, and only then show how a more complex workload performs.

Cannot afford $5 per hour ;-) Unless I have a fully automated test suite. My dream would be to have an automated process I could trigger with a single click of a button that would:
1) Use a CloudFormation template to create a VPC with all components of the test environment.
2) Automatically start each instance under test and the corresponding test client.
3) Automatically collect all test results (both wrk and possibly tracing data) and put them somewhere in S3.
Finally, if I had a suite of visualization tools that would generate whatever graphs I need to analyze, it would save soooooo much time. Possibly under an hour => then I could pay 5 bucks for it ;-) But it takes time to build one ;-)
Some questions about the evaluation setup and measurements:
- Did you establish a baseline with bare metal configuration?

How would I create a baseline with a bare metal configuration for 1, 2, 4 CPUs? With docker or qemu I can specify the number of cpus.
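One way to approximate a 1/2/4-CPU baseline on bare metal would be to restrict the server process to a fixed set of cores with taskset (the binary names and core numbers below are made up for illustration; for the Go app GOMAXPROCS can be limited as well):

    # run the native binary restricted to 2 cores
    taskset -c 0,1 ./rest-api-server
    # for Go, also cap the runtime scheduler
    GOMAXPROCS=2 taskset -c 0,1 ./rest-api-go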
- Did you measure CPU utilization during the throughput tests? This is important because you could be hitting CPU limits with QEMU and Firecracker because of software processing needed by virtualized networking.

Nothing rigorous. I had mpstat running and I could see that during the 1 and 2 cpu tests they were pretty highly utilized (80-90%), but only 40-50% for the 4 cpu tests. But nothing I recorded.
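To have a record rather than just a live view, mpstat output can simply be captured during each run, e.g.:

    # per-CPU utilization, 1-second samples for the 30-second test, saved to a file
    mpstat -P ALL 1 30 > mpstat-4cpu-run1.log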
- Are the QEMU and Firecracker tests using virtio or vhost?

I thought OSv only supports virtio. Sorry to be ignorant - I have heard the terms, but what is actually the difference between vhost and virtio?
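For reference: virtio is the paravirtual device interface the guest driver talks to, while vhost-net moves the host-side processing of the virtio-net queues from the QEMU process into the host kernel, which usually lowers overhead. With a TAP backend it is enabled on the netdev; a sketch with illustrative names:

    qemu-system-x86_64 ... \
      -netdev tap,id=net0,ifname=tap0,script=no,vhost=on \
      -device virtio-net-pci,netdev=net0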
- Is Docker also configured to use the bridge device? If not, QEMU and Firecracker also have some additional overheads from the bridging.

I need to check. Per this - https://raw.githubusercontent.com/wkozaczuk/unikernels-v-containers/master/run-rest-in-docker.sh - I am sure I exposed the container port to the host, so I think I was bypassing the bridge. BTW, is there a way to run OSv on QEMU without a bridge to make it visible on the LAN?
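One bridge-less option with QEMU is user-mode networking with a port forward, though that makes the guest reachable only through the host's own IP rather than as a separate address on the LAN; a sketch with an illustrative port:

    qemu-system-x86_64 ... \
      -netdev user,id=net0,hostfwd=tcp::8080-:8080 \
      -device virtio-net-pci,netdev=net0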
- Is multiqueue enabled for QEMU and Firecracker? If not, this would limit the ability to leverage multiple vCPUs.

No idea what you are talking about ;-)
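For reference, with QEMU and a TAP backend, multi-queue virtio-net is enabled roughly like this (the queue count and MSI-X vector count are illustrative; whether the guest driver actually uses the extra queues is a separate matter):

    qemu-system-x86_64 ... \
      -netdev tap,id=net0,ifname=tap0,script=no,vhost=on,queues=4 \
      -device virtio-net-pci,netdev=net0,mq=on,vectors=10   # vectors is typically 2*queues+2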