Virtio-net poor parallel latency

Darren L

Mar 13, 2024, 3:13:58 PM
to OSv Development
Hello!

I was wondering if I could get any pointers on why I am seeing significant latency with the virtio-net driver when serving multiple parallel clients. Hopefully I can explain my issue well enough for it to be replicated.

Testing environment:
- Comparison: Ubuntu Server (Linux) VM and OSv (used the option "-nv" in the run.py script for tap networking)
- In common: 4 CPU cores, 4GB of RAM, QEMU KVM, used "taskset" to pin to the same cores
- Program: java-httpserver program from the apps directory, java8
- What was sent: data of varying sizes (1KB to 1MB, 4MB, 8MB...), from a client on the same machine as the VMs

Observations:
- With single-threaded requests and low data sizes, I was able to measure a latency on OSv that is lower than the Linux VM latency
    - example: for 32KB I measured ~4ms for OSv and 9.8ms for the Linux VM
- At high data sizes (256KB+), OSv started to measure a higher latency than the Linux VM
- When I sent multiple requests at the same time, OSv suffered a much larger average latency penalty
    - example, at 1MB data size and 16 parallel requests, average latency was:
        - OSv: 120ms (min-max 14-225ms, std: 62ms)
        - Linux VM: 82ms (min-max 24-144ms, std: 34ms)
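
For anyone trying to reproduce the parallel numbers above, here is a minimal sketch of one way such a measurement could be taken. It is not the client used for these results: the use of POST, the payload contents, and the names `timed_request`, `summarize`, and `run_parallel` are all assumptions made for illustration.

```python
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def timed_request(url, payload):
    """Send payload to url and return the round-trip time in milliseconds."""
    start = time.perf_counter()
    # Assumption: the server accepts a POST body; adjust to the real endpoint.
    req = urllib.request.Request(url, data=payload, method="POST")
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return (time.perf_counter() - start) * 1000.0


def summarize(latencies_ms):
    """Return mean, min, max and sample standard deviation of the latencies."""
    return {
        "mean": statistics.mean(latencies_ms),
        "min": min(latencies_ms),
        "max": max(latencies_ms),
        "std": statistics.stdev(latencies_ms) if len(latencies_ms) > 1 else 0.0,
    }


def run_parallel(url, size_bytes, clients, request_fn=timed_request):
    """Issue `clients` requests from a thread pool and summarize their latency."""
    payload = b"x" * size_bytes
    with ThreadPoolExecutor(max_workers=clients) as pool:
        latencies = list(pool.map(lambda _: request_fn(url, payload), range(clients)))
    return summarize(latencies)
```

Calling `run_parallel(url, 1024 * 1024, 16)` with the guest's address as `url` would correspond to the 1MB, 16-client case. A thread pool does not guarantee that all requests start at exactly the same instant, so the spread between min and max partly reflects client-side scheduling.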

Other notes:
- I've been using the OSv profiling tools, and the hot spots were typically in virtio::virtio_driver::wait_for_queue and virtio::net::receiver, but I was unable to identify exactly why they would cause this latency
- When tracing the network layer (https://github.com/cloudius-systems/osv/wiki/Trace-analysis-using-trace.py#tracing-network-layer), I also noticed a lot of net_packet_handling lines, about as many as there were net_packet_in lines for 1MB. This might indicate that packets are not being processed fast enough and are delayed because they are put in a queue?

Hope this is clear enough! I am hoping to understand whether I am misconfiguring OSv or something similar to figure out why this latency difference is occurring. Thank you for the help in advance, and happy to provide any more information as needed.

Dor Laor

Mar 13, 2024, 5:23:53 PM
to Darren L, OSv Development
Lots of good details. It's not simple to figure out what the issue is, since
you have hypervisor, host, OS and JVM variables.

How many threads does the host have? Make sure there are enough hardware threads for the
guest, virtio on the host, and the client. This way all of OSv's runnable threads will be schedulable.
You can also measure the number of vmexits and the process scheduling on the host.
There is a chance the JVM is an issue too. Can you do the same with netperf?



Darren L

Mar 18, 2024, 2:28:58 PM
to OSv Development
Hello!

Thank you for the suggestions. My testing environment is an i9-13900H, which has 20 total threads, of which I am allocating the first 8 to OSv and the next 4 to the client. These 12 in-use threads exist on 6 hyperthreaded cores.
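
One way to double-check a layout like this on a Linux host is to read the hyperthread sibling lists from sysfs and see whether the guest and the client end up on the same physical cores. The following is a minimal sketch; the function names are made up for illustration, and the assumption that "first 8" and "next 4" mean logical CPUs 0-7 and 8-11 should be checked against the actual taskset masks.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a Linux CPU list such as "0-3,8" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def read_siblings(sysfs=Path("/sys/devices/system/cpu")):
    """Map each logical CPU to the logical CPUs sharing its physical core."""
    result = {}
    for cpu_dir in sysfs.glob("cpu[0-9]*"):
        sib_file = cpu_dir / "topology" / "thread_siblings_list"
        if sib_file.exists():
            result[int(cpu_dir.name[3:])] = parse_cpu_list(sib_file.read_text())
    return result


def shared_cores(siblings_by_cpu, group_a, group_b):
    """Return the sibling sets that contain CPUs from both groups."""
    a, b = set(group_a), set(group_b)
    shared = set()
    for siblings in siblings_by_cpu.values():
        core = frozenset(siblings)
        if core & a and core & b:
            shared.add(core)
    return sorted(sorted(core) for core in shared)
```

For example, `shared_cores(read_siblings(), range(0, 8), range(8, 12))` returns an empty list when no physical core is split between the two groups.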

I wasn't sure how to measure the vmexits and/or process scheduling on the host.

I didn't see netperf under /tools in recent versions of OSv; it seems to have existed in OSv v0.5 and was removed afterwards. I did run a similar benchmark on Python 3.10 with the same parallel tests, and I measured much lower latency than with the Java 8 version: 45ms (min-max 18-63, std: 12) on OSv, compared to a possibly confusing measurement of 837ms (min-max 39-1903, std: 817) for the Linux VM. If this does suggest that the JVM is the issue, what steps should I take to debug this problem? The application I am using must use Java 8 to run; it cannot be run on any other platform.

I know these are not the exact details you requested, but I am more than happy to learn how I can capture the other details, if necessary. Thank you!

Dor Laor

Mar 18, 2024, 5:54:18 PM
to Darren L, OSv Development
On Mon, Mar 18, 2024 at 8:29 PM Darren L <lucern...@gmail.com> wrote:
> Hello!
>
> Thank you for the suggestions. My testing environment is an i9-13900H, which has 20 total threads, of which I am allocating the first 8 to OSv and the next 4 to the client. These 12 in-use threads exist on 6 hyperthreaded cores.
>
> I wasn't sure how to measure the vmexits and/or process scheduling on the host.

Guest/hypervisor efficiency is often a function of how many times the guest
exits to the host. Lower is better.
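
As a rough illustration of how an exit rate could be sampled on a Linux/KVM host, here is a minimal sketch that reads a cumulative counter twice and divides by the elapsed time. It assumes debugfs is mounted and that the host kernel exposes an aggregate KVM exit counter at /sys/kernel/debug/kvm/exits, which usually requires root; the path and the function names are assumptions to verify on the actual host. Existing tools such as `kvm_stat` or `perf kvm stat` report similar information in more detail.

```python
import time
from pathlib import Path

# Assumed location of the aggregate KVM exit counter (debugfs, root only).
KVM_EXITS = Path("/sys/kernel/debug/kvm/exits")


def read_counter(path):
    """Read a single integer counter from a text file."""
    return int(Path(path).read_text().strip())


def exits_per_second(path=KVM_EXITS, interval=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Sample the counter twice and return the average exits per second."""
    before, t0 = read_counter(path), clock()
    sleep(interval)
    after, t1 = read_counter(path), clock()
    return (after - before) / (t1 - t0)
```

Sampling once while the benchmark is idle and once while the 16 parallel requests are in flight gives a rough comparison. The counter is host-wide in this sketch, so other running VMs would add to it.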
 

> I didn't see netperf under /tools in recent versions of OSv; it seems to have existed in OSv v0.5 and was removed afterwards. I did run a similar benchmark on Python 3.10 with the same parallel tests, and I measured much lower latency than with the Java 8 version: 45ms (min-max 18-63, std: 12) on OSv, compared to a possibly confusing measurement of 837ms (min-max 39-1903, std: 817) for the Linux VM. If this does suggest that the JVM is the issue, what steps should I take to debug this problem? The application I am using must use Java 8 to run; it cannot be run on any other platform.

Eliminating Java is one option. Another is to use a recent JVM (17) with ZGC, so that hopefully
there are no GC events (not sure it's a real issue here).
 

Waldek Kozaczuk

Mar 21, 2024, 11:45:58 AM
to OSv Development
Hi,

I would add that java-httpserver is NOT the best representative of Java HTTP server examples. It uses the internal JDK HTTP server, com.sun.net.httpserver.HttpServer. I suggest you try other Java examples such as jetty, tomcat, or akka-http (a Scala app on the JVM).

In my experience, Java networking apps on OSv have performed quite well compared to a Linux guest (please see slide 23 of https://www.p99conf.io/session/osv-unikernel-optimizing-guest-os-to-run-stateless-and-serverless-apps-in-the-cloud/).

Regards,
Waldek 