OSv feedback, TCP performance and profiling?


Michael Clark

Jun 17, 2014, 10:31:01 AM
to osv...@googlegroups.com
Hi All,

I'm working on a C++11 httpd with a slightly different architecture to nginx and I have recently tried it in OSv.

Forgive me as I am new to OSv, and I understand it is a work in progress.

In any case, here is some background on what I'm doing and some feedback on OSv.

Firstly some background. I have an m:n threaded server design (no forks), and I'm using socketpair AF_LOCAL SOCK_DGRAM sockets with sendmsg/recvmsg and kqueue/epoll to message between threads (to wake up thread-pool poll event loops to accept inbound connections, and to pass connections between thread pools of varying widths). This approach has the advantage of a single address space and TLB versus the forked processes in nginx. nginx is fine assuming the scheduler keeps the processes on the same core (doesn't thrash the TLB). I have better performance than nginx in specific scenarios. I have not aggressively optimized yet and the server is still a work in progress (and not yet published). My primary goal is simple C++11 code using modern patterns, e.g. no exceptions, returning pair<size_t,io_error>, avoiding nullptr, etc. (modern Go-style patterns in C++11).
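As an aside, here is a minimal, self-contained sketch of that wakeup pattern using plain POSIX APIs (illustrative only, not the actual server code): a SOCK_DGRAM socketpair whose read end sits in a thread's poll set, so any other thread can wake the poller by sending a one-byte datagram.

// Sketch of the socketpair wakeup pattern described above (illustrative only).
#include <sys/socket.h>
#include <poll.h>
#include <unistd.h>
#include <cstdio>

int main()
{
    int sv[2];
    if (socketpair(AF_LOCAL, SOCK_DGRAM, 0, sv) < 0) {
        perror("socketpair");
        return 1;
    }

    // sv[0] would normally be added to a thread's poll/epoll/kevent set.
    struct pollfd pfd = { sv[0], POLLIN, 0 };

    // Another thread wakes the poller by sending a one-byte datagram.
    char wake = 1;
    send(sv[1], &wake, 1, 0);

    if (poll(&pfd, 1, -1) > 0 && (pfd.revents & POLLIN)) {
        char buf[16];
        recv(sv[0], buf, sizeof buf, 0);   // drain the wakeup datagram
        std::printf("poller woken\n");
    }

    close(sv[0]);
    close(sv[1]);
    return 0;
}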

I thought I would give OSv a try as I like the concept and understand the benefit of the libOS design (removing a servo-loop). In any case, during my port the first issue I encountered was the lack of SOCK_DGRAM. I was able to change to SOCK_STREAM, so I solved this problem. The reason I am using AF_LOCAL/SOCK_DGRAM sockets is primarily for wakeups (as threads are sitting in poll/epoll_wait/kevent) and eventually so I can use SO_PASSCRED to send file descriptors into sandboxes (for a zero-copy alternative to FastCGI/SCGI); however, this design would be radically different with OSv. My server design also allows different states within the HTTP state machine to be assigned to different-sized thread pools, but in this case I have the server configured to load-balance connections over 4 threads that handle all states (less re-scheduling). The design approach is to allow requests that block to be assigned to larger thread pools or be suspended. In any case, back to OSv.
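(A second aside on the descriptor-passing part of that design: the actual transfer of a file descriptor over an AF_LOCAL socket is done with SCM_RIGHTS ancillary data via sendmsg. A hedged sketch follows; the send_fd helper is illustrative, not part of my server, and error handling is minimal.)

// Sketch of passing a file descriptor over an AF_LOCAL socket with SCM_RIGHTS.
#include <sys/socket.h>
#include <sys/uio.h>
#include <cstring>

// Send fd_to_pass over the already-connected AF_LOCAL socket 'sock'.
int send_fd(int sock, int fd_to_pass)
{
    char dummy = 0;
    struct iovec iov = { &dummy, 1 };      // must send at least one byte of data

    char ctrl[CMSG_SPACE(sizeof(int))];
    std::memset(ctrl, 0, sizeof ctrl);

    struct msghdr msg = {};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl;
    msg.msg_controllen = sizeof ctrl;

    struct cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;          // ancillary payload is a descriptor
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    std::memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

    return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}

The receiving side mirrors this with recvmsg and reads the new descriptor out of the SCM_RIGHTS control message.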

Here are some benchmarks with a Realtek RTL8111/8168/8411 on the Core i7 975 server and a Core i7 990X client with a Marvell 88E8056, both running linux 3.14.1-amd64. I have old hardware. I wish I had an SR-IOV system or a spare NIC so I could test using PCI passthrough. At the moment it is a bridged setup on the server.

# nginx (4 processes) running linux-3.14-1-amd64
mclark@liquid:~/src/wrk$ ./wrk -t8 -c5000 -d10s http://192.168.0.8:80/index.html
  8 threads and 5000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    30.38ms  137.95ms   8.46s    95.74%
    Req/Sec    14.62k     2.30k   28.22k    77.28%
  1151107 requests in 9.99s, 0.91GB read
  Socket errors: connect 0, read 14446, write 0, timeout 3244
Requests/sec: 115277.95
Transfer/sec:     93.33MB

# c++11 httpd (4 threads) running linux-3.14-1-amd64
mclark@liquid:~/src/wrk$ ./wrk -t8 -c5000 -d10s http://192.168.0.8:8080/index.html
  8 threads and 5000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    33.73ms  127.36ms   4.04s    95.79%
    Req/Sec    15.18k     3.17k   32.57k    74.83%
  1196463 requests in 9.99s, 0.90GB read
  Socket errors: connect 0, read 0, write 0, timeout 3335
Requests/sec: 119790.26
Transfer/sec:     92.65MB

# c++11 httpd (4 threads) running in OSv with 4 CPUs
# sudo ./scripts/run.py -nv -c 4 -b br0 -e http_server.so
# compiled with g++-4.8 -c -fPIC -shared -pthread -g -O3 -std=c++11 -static-libstdc++
mclark@liquid:~/src/wrk$ ./wrk -t8 -c5000 -d10s http://192.168.0.12:8080/index.html
  8 threads and 5000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   119.70ms  179.72ms   3.23s    87.79%
    Req/Sec     3.46k     1.87k    8.34k    63.06%
  294862 requests in 9.99s, 228.68MB read
  Socket errors: connect 0, read 0, write 0, timeout 6550
Requests/sec:  29516.47
Transfer/sec:     22.89MB

This is the OSv startup log for my server

OSv v0.09-135-g1fefc90
4 CPUs detected
VFS: mounting ramfs at /
VFS: mounting devfs at /dev
RAM disk at 0x0xffff80003ee46030 (4096K bytes)
net: initializing - done
eth0: ethernet address: 52:54:0:12:34:56
virtio-blk: Add blk device instances 0 as vblk0, devsize=10842275840
random: <Software, Yarrow> initialized
VFS: mounting zfs at /zfs
zfs: mounting osv/zfs from device /dev/vblk0.1
VFS: mounting devfs at /dev
VFS: mounting procfs at /proc
BSD shrinker: event handler list found: 0xffffa0003ed90d80
BSD shrinker found: 1
BSD shrinker: unlocked, running
[I/39 dhcp]: Waiting for IP...
random: unblocking device.
[I/212 dhcp]: Server acknowledged IP for interface eth0
eth0: 192.168.0.12
[I/212 dhcp]: Configuring eth0: ip 192.168.0.12 subnet mask 255.255.255.0 gateway 192.168.0.1 MTU 1500
DEBUG: OSv http_server
INFO: listening on: 0.0.0.0:8080
ERROR: setsockopt(SO_RCVBUF) failed: Not a socket
ERROR: setsockopt(SO_SNDBUF) failed: Not a socket
DEBUG: listener,dispatcher,keepalive,worker,linger:0: started
ERROR: setsockopt(SO_RCVBUF) failed: Not a socket
ERROR: setsockopt(SO_SNDBUF) failed: Not a socket
DEBUG: listener,dispatcher,keepalive,worker,linger:0: started
ERROR: setsockopt(SO_RCVBUF) failed: Not a socket
ERROR: setsockopt(SO_SNDBUF) failed: Not a socket
DEBUG: listener,dispatcher,keepalive,worker,linger:0: started
ERROR: setsockopt(SO_RCVBUF) failed: Not a socket
ERROR: setsockopt(SO_SNDBUF) failed: Not a socket
DEBUG: listener,dispatcher,keepalive,worker,linger:0: started

I was successfully using gdb with OSv thanks to the good documentation and that helped me solve a few issues. At this point I just have some observations:

* We are making lots of new connections, so it is slightly different from the memcached case; a good torture test perhaps ;-)
* At one point I ran out of memory in operator new[]; however, instead of seeing a crash or exit, the server hung (this could be a bug in my code).
* I noticed epoll was implemented using poll, so I am running my poll implementation (I have poll, epoll and kqueue backends; I'm not using libevent as it is all in C++11).
* I am using std::thread with 4 threads; however, I seem to get the same performance as 1 thread whether I start OSv with 1 CPU or 4 CPUs.
* C++11 <thread> thread.get_id() is returning 0, hence the 0 in the output above.
* Is there any way to run a profiler in OSv, e.g. by linking in gperftools?
* setsockopt TCP_CORK (or TCP_NOPUSH in the BSD API), SO_RCVBUF and SO_SNDBUF seem to be missing. I used TCP_CORK to combine the headers and the first part of the response body into a single packet (see the sketch after this list). I could perhaps fix this with smarter buffer handling on my side.
* SOCK_DGRAM is missing. I changed my code to send/recv and SOCK_STREAM, so I didn't see whether the sendmsg/recvmsg iovec methods were implemented. As I mentioned, this design approach was to enable passing file descriptors between process sandboxes and would need a bit of a rethink for OSv.
* Where do I look to find the new way to implement a TCP server? I assume it is not via poll. I saw it mentioned in the slide deck... any pointers?
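For reference, a minimal sketch of the TCP_CORK pattern mentioned in the list above, assuming Linux headers (the send_response helper is illustrative, not my actual code, and error handling is omitted):

// Sketch: cork the socket, write the headers and the first part of the body,
// then uncork so the kernel flushes them as a single full segment where possible.
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

void send_response(int sock, const char* hdr, size_t hdr_len,
                   const char* body, size_t body_len)
{
    int on = 1, off = 0;
    setsockopt(sock, IPPROTO_TCP, TCP_CORK, &on, sizeof on);

    write(sock, hdr, hdr_len);     // header stays buffered in the kernel
    write(sock, body, body_len);   // appended to the same segment(s)

    setsockopt(sock, IPPROTO_TCP, TCP_CORK, &off, sizeof off);  // flush
}

On the BSD API the analogous option is TCP_NOPUSH; the "smarter buffer handling" alternative would be to combine the header and the first chunk of the body into a single writev call, which needs no socket option at all.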

I wouldn't put a high priority on any of these issues as the C++11 httpd is still in its very early days; however, I thought some of the feedback might be useful. I am willing to perform further tests... and can perhaps share a private fork (with apps/httpc1x) with someone (my code is not quite at a publishable level yet).

Thanks and Regards,
Michael.

Pekka Enberg

Jun 17, 2014, 11:07:11 AM
to Michael Clark, Osv Dev, Tomasz Grabiec, Vladislav Zolotarov, Nadav Har'El
Hi Michael,
Thanks for reporting these on the list. Very interesting results!

We have a lock contention issue in the TCP/IP stack with routing and
ARP entries and an issue in TX throughput that we're currently working
on. You might be affected by those.

BTW, if you are able to dig in a little bit deeper, you can use
"virt-stat" to produce an overall picture of the workload for both
Linux and OSv:

https://github.com/penberg/virt-stat

You can then also try out the built-in sampling profiler in OSv:

https://github.com/cloudius-systems/osv/wiki/Trace-analysis-using-trace.py#cpu-sampler

If it's a lock contention issue, you can use the lock tracing infrastructure:

https://github.com/cloudius-systems/osv/wiki/Debugging-Excessive-and-Contended-Mutex-Locks

Pekka

Michael Clark

Jun 17, 2014, 11:27:35 AM
to osv...@googlegroups.com, michae...@mac.com, tgra...@cloudius-systems.com, vl...@cloudius-systems.com, n...@cloudius-systems.com


On Tuesday, 17 June 2014 23:07:11 UTC+8, Pekka Enberg wrote:

> Thanks for reporting these on the list. Very interesting results!
>
> We have a lock contention issue in the TCP/IP stack with routing and
> ARP entries and an issue in TX throughput that we're currently working
> on. You might be affected by those.
>
> BTW, if you are able to dig in a little bit deeper, you can use
> "virt-stat" to produce an overall picture of the workload for both
> Linux and OSv:
>
> https://github.com/penberg/virt-stat
>
> You can then also try out the built-in sampling profiler in OSv:
>
> https://github.com/cloudius-systems/osv/wiki/Trace-analysis-using-trace.py#cpu-sampler
>
> If it's a lock contention issue, you can use the lock tracing infrastructure:
>
> https://github.com/cloudius-systems/osv/wiki/Debugging-Excessive-and-Contended-Mutex-Locks

Hi Pekka,

Thanks for the info. It's getting close to midnight here in Singapore. I'll take a look at virt-stat, the cpu-sampler and contended lock debugging when I get time over the next few days (or the weekend) and report back.

I do have a mutex in one place for dispatching messages when the socketpair buffer is full, and in another for the pre-allocated connection pool. I'm still iterating on my design. I would like to use the socketpair purely for thread wakeup in poll, and implement a multiple-producer single-consumer lockless queue for each thread (for inter-thread IPC) using the new C++11 <atomic> stuff... it is still a work in progress... I also have redundant calls to epoll_ctl in the Linux implementation that I can cut down...
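For illustration, here is a minimal C++11 <atomic> sketch of that kind of multiple-producer single-consumer queue (a Vyukov-style linked queue; illustrative only, not my actual design):

// Multiple producers may push concurrently; exactly one consumer may pop.
#include <atomic>
#include <utility>

template <typename T>
class mpsc_queue {
    struct node {
        std::atomic<node*> next{nullptr};
        T value{};
    };
    std::atomic<node*> head_;   // producers swing head_ with an atomic exchange
    node* tail_;                // touched only by the single consumer
public:
    mpsc_queue() : head_(new node), tail_(head_.load()) {}
    ~mpsc_queue() {
        for (node* n = tail_; n; ) {
            node* next = n->next.load(std::memory_order_relaxed);
            delete n;
            n = next;
        }
    }
    void push(T v) {                         // lock-free, any thread
        node* n = new node;
        n->value = std::move(v);
        node* prev = head_.exchange(n, std::memory_order_acq_rel);
        prev->next.store(n, std::memory_order_release);
    }
    bool pop(T& out) {                       // single consumer thread only
        node* next = tail_->next.load(std::memory_order_acquire);
        if (!next)
            return false;                    // empty (or a push is mid-flight)
        out = std::move(next->value);
        delete tail_;                        // retire the consumed stub node
        tail_ = next;
        return true;
    }
};

In the design above, a producer would push onto this queue and then send a one-byte datagram on the socketpair so the consumer's poll loop wakes up and drains the queue.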

Regards,
Michael

Dor Laor

Jun 18, 2014, 3:41:03 AM
to Michael Clark, Osv Dev
On Tue, Jun 17, 2014 at 5:31 PM, Michael Clark <michae...@mac.com> wrote:
Does the httpd on Linux run on the host? Just for an apples-to-apples comparison, can you please test it inside a Linux guest?
However, we do need to complete SOCK_DGRAM and the rest.
You're welcome to open GitHub issues for it and even help to enable it (it was disabled when we first ported the FreeBSD TCP stack to OSv; you can find it using git blame).
 


Michael Clark

Jun 18, 2014, 11:09:43 PM
to osv...@googlegroups.com, michae...@mac.com
Hi,

Yes, I don't think it's an apples-to-apples test. From what I understand about OSv, there is potential for OSv to outperform the host, especially if we were to map hardware NICs into the OSv VM. OSv has several long-term advantages once the network stack has been tuned, as there is no TLB thrashing and essentially no kernel-mode context switch for network syscalls (e.g. no MMU programming and cache flushes), if I understand correctly (please correct me if I am wrong), i.e. everything is running in a single address space and the only remaining servo-loop is the BSD sockets API. I am aware it is a work in progress and I think you guys have done great work. No blame anywhere. I was surprised at how easy it was to port my C++11 code to a new OS architecture so quickly.

I think an apples-to-kiwis comparison would be to use PCI passthrough or a virtual function on an SR-IOV NIC. I may need to get some hardware... e.g. a couple of Intel X540-T2s; however, I don't think my BIOS supports SR-IOV, and I am on a budget, so let me know if you have spare NICs and I'll PM you my postal address ;-). My hardware supports VT-d. The CPU on the target system is Nehalem. It might be better to swap around the client and server and run the HTTP server and OSv on the Westmere system. Is there a tangible difference in VT support between Nehalem and Westmere that would make any difference to my tests? I also believe I am bandwidth-limited. The server can do 160,000 reqs/sec on loopback, similar to nginx...

This is the newer Westmere CPU on the client (running wrk):

flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt aes lahf_lm ida arat epb dtherm tpr_shadow vnmi flexpriority ept vpid

This is the older Nehalem CPU on the server (running KVM and OSv):

flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm ida dtherm tpr_shadow vnmi flexpriority ept vpid

Here are some tests with nginx and httpc1x in a Linux VM (4 vCPUs pinned to 4 physical cores, i.e. different core ids in /proc/cpuinfo) on the exact same host, for a kiwis-to-oranges comparison.

# nginx (4 processes) VM guest running linux-3.14-1-amd64 (host linux-3.14-1-amd64)
mclark@liquid:~/src/wrk$ ./wrk -t8 -c5000 -d10s http://192.168.0.18:80/index.html
  8 threads and 5000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    48.44ms  251.54ms   3.78s    98.52%
    Req/Sec    10.58k     4.07k   27.49k    65.99%
  851575 requests in 9.99s, 689.47MB read
  Socket errors: connect 0, read 5289, write 0, timeout 8650
Requests/sec:  85216.76
Transfer/sec:     68.99MB
# c++11 httpd (4 threads) VM guest running linux-3.14-1-amd64 (host linux-3.14-1-amd64)
mclark@liquid:~/src/wrk$ ./wrk -t8 -c5000 -d10s http://192.168.0.18:8080/index.html
  8 threads and 5000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   162.86ms  570.03ms   4.06s    94.96%
    Req/Sec    10.78k     4.16k   25.28k    66.55%
  886562 requests in 9.99s, 685.69MB read
  Socket errors: connect 0, read 0, write 0, timeout 3583
Requests/sec:  88758.97
Transfer/sec:     68.65MB

Pekka Enberg

Jun 19, 2014, 2:39:44 AM
to Michael Clark, Osv Dev, Avi Kivity
On Tue, Jun 17, 2014 at 5:31 PM, Michael Clark <michae...@mac.com> wrote:
> * SOCK_DGRAM is missing. I changed my code to send/recv and SOCK_STREAM so I
> didn't see whether the sendmsg/recvmsg iovec methods were implemented. As I
> mentioned this design approach was to enable passing file descriptors
> between process sandboxes and would need a bit of a rethink for OSv

What kind of error are you seeing? Reading the code, SOCK_DGRAM seems
to be wired in parts of the TCP stack. Perhaps we missed something.

- Pekka

Michael Clark

Jun 19, 2014, 3:32:46 AM
to osv...@googlegroups.com, michae...@mac.com, a...@cloudius-systems.com

I believe I hit this assertion (however I changed to SOCK_STREAM and am using send/recv instead of sendmsg/recvmsg).

osv/libc/af_local.cc:100

int socketpair_af_local(int type, int proto, int sv[2])
{
    assert(type == SOCK_STREAM);
    assert(proto == 0);
    pipe_buffer_ref b1{new pipe_buffer};
    pipe_buffer_ref b2{new pipe_buffer};
    try {
        fileref f1 = make_file<af_local>(b1, b2);
        fileref f2 = make_file<af_local>(std::move(b2), std::move(b1));
        fdesc fd1(f1);
        fdesc fd2(f2);
        // all went well, user owns descriptors now
        sv[0] = fd1.release();
        sv[1] = fd2.release();
        return 0;
    } catch (int error) {
        return libc_error(error);
    }
}

However, I'm not exercising this code path at the moment, as during the benchmarks I had my server configured not to shift connections between threads. My concept is to get the HTTP connection state machine to shift connections to different threads depending on their poll behavior (and this can work well regardless of the availability of epoll/kqueue). The idea, eventually, is that handlers that do blocking things (like accessing a database) get handed to larger thread pools or put to sleep, but for static content I have each thread handling all states (as in the previous benchmarks). The dispatcher does header processing (which you need to do before you know where to route the connection), e.g.:

INFO: listening on: 0.0.0.0:8080
INFO: listening on: [::]:8886
INFO: listening on: 127.0.0.1:8887
DEBUG: listener,dispatcher,keepalive,worker,linger:0x7ffb68e4b700: started
DEBUG: listener,dispatcher,keepalive,worker,linger:0x7ffb67648700: started
DEBUG: listener,dispatcher,keepalive,worker,linger:0x7ffb6864a700: started
DEBUG: listener,dispatcher,keepalive,worker,linger:0x7ffb67e49700: started

Below is another configuration that bounces handlers into different threads; it has lower performance for static content but is useful for dynamic content, i.e. handlers that can block, or for sending file descriptors into sandboxes (other processes) using SO_PASSCRED. This web server is still very much a work in progress. I don't have SO_PASSCRED sandboxes implemented yet, and OSv sandboxes need a lot of re-thinking. I can't send a file descriptor into OSv for obvious reasons, so I would need to proxy using virtio, map a ring buffer and inject interrupts. OSv is a bit of a game-changer for my thinking...

INFO: listening on: 0.0.0.0:8080
INFO: listening on: [::]:8886
INFO: listening on: 127.0.0.1:8887
DEBUG:                               dispatcher:0x7f9855792700: started
DEBUG:                               dispatcher:0x7f9854f91700: started
DEBUG:                                   worker:0x7f9854790700: started
DEBUG:                                   worker:0x7f985378e700: started
DEBUG:                                   worker:0x7f9853f8f700: started
DEBUG:                                   worker:0x7f9852f8d700: started
DEBUG:                                keepalive:0x7f985278c700: started
DEBUG:                                   linger:0x7f9851f8b700: started
DEBUG:                                 listener:0x7f985178a700: started

I'm currently switching around my client and server, running wrk on the Nehalem and httpc1x on the Westmere. It seems on the more powerful Westmere I am substantially faster than nginx in KVM, most likely due to the cost of MMU/TLB ops with a multi-process server like nginx vs a multi-threaded server... I will redo the OSv benchmark with OSv running on the more powerful Westmere. I also need to verify that ./scripts/run.py --vcpus 4 allocates physically separate cores, or disable hyper-threading on my machine. I haven't had a chance to look at the cpu-sampler or lock contention tracing yet.

# nginx (4 processes) VM guest running linux-3.14-1-amd64 (Westmere host linux-3.14-1-amd64)
mclark@munty:~/src/wrk$ ./wrk -t8 -c5000 -d10s http://192.168.0.19:80/index.html
  8 threads and 5000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    33.40ms   99.72ms   7.02s    91.81%
    Req/Sec    11.08k     2.00k   20.65k    71.56%
  854693 requests in 10.00s, 692.01MB read
  Socket errors: connect 0, read 30317, write 0, timeout 7167
Requests/sec:  85485.49
Transfer/sec:     69.21MB
# c++11 httpd (4 threads) VM guest running linux-3.14-1-amd64 (Westmere host linux-3.14-1-amd64)
mclark@munty:~/src/wrk$ ./wrk -t8 -c5000 -d10s http://192.168.0.19:8080/index.html
  8 threads and 5000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    41.44ms  136.89ms   5.06s    89.71%
    Req/Sec    13.10k     1.88k   28.10k    77.39%
  1018554 requests in 10.00s, 787.78MB read
  Socket errors: connect 0, read 0, write 0, timeout 3774
Requests/sec: 101892.21
Transfer/sec:     78.81MB


Glauber Costa

Jun 19, 2014, 3:38:42 AM
to Michael Clark, Osv Dev
Your analysis is spot on. We have delivered extremely good performance in some workloads already, which made us very happy. But some of the best architectural features will only be leveraged long term. We are actively working on some of them. We also lack maturity in some code paths, and every once in a while we find some very low numbers that we try to address as quickly as we can =)
 
> I think an apples-to-kiwis comparison would be to use PCI passthrough or a virtual function on an SR-IOV NIC. I may need to get some hardware... e.g. a couple of Intel X540-T2s; however, I don't think my BIOS supports SR-IOV, and I am on a budget, so let me know if you have spare NICs and I'll PM you my postal address ;-). My hardware supports VT-d. The CPU on the target system is Nehalem. It might be better to swap around the client and server and run the HTTP server and OSv on the Westmere system. Is there a tangible difference in VT support between Nehalem and Westmere that would make any difference to my tests? I also believe I am bandwidth-limited. The server can do 160,000 reqs/sec on loopback, similar to nginx...

The main problem with the NICs is that we lack the drivers and support. This is certainly in our plans, but the human time to actually code it is harder to find than the actual hardware =) (we are all tackling other priorities ATM).


Nadav Har'El

Jun 20, 2014, 4:55:32 PM
to Michael Clark, Osv Dev, Avi Kivity
On Thu, Jun 19, 2014 at 3:32 AM, Michael Clark <michae...@mac.com> wrote:
>> What kind of error are you seeing? Reading the code, SOCK_DGRAM seems
>> to be wired in parts of the TCP stack. Perhaps we missed something.

> I believe I hit this assertion (however I changed to SOCK_STREAM and am using send/recv instead of sendmsg/recvmsg).
>
> osv/libc/af_local.cc:100
>
> int socketpair_af_local(int type, int proto, int sv[2])
> {
>     assert(type == SOCK_STREAM);

This is not the "network stack" per se; this is the AF_LOCAL ("Unix domain socket") code, which we did not take from BSD but rather wrote on our own. If you're curious why: at that time the BSD network stack still wasn't working correctly in OSv, and we needed Unix-domain sockets quickly because Java used them for various things unrelated to any real networking. Also, we had such a hard time porting the BSD stuff that Avi decided, quite rightly, that it was easier just to implement this part from scratch.

Adding a datagram unix-domain socket implementation should be pretty easy. I'll add a bug tracker issue for that.
 

Michael Clark

Jul 1, 2014, 5:30:59 AM
to osv...@googlegroups.com, michae...@mac.com


On Thursday, 19 June 2014 15:38:42 UTC+8, Glauber Costa wrote:
> Your analysis is spot on. We have delivered extremely good performance in some workloads already, which made us very happy. But some of the best architectural features will only be leveraged long term. We are actively working on some of them. We also lack maturity in some code paths, and every once in a while we find some very low numbers that we try to address as quickly as we can =)
>
>> I think an apples-to-kiwis comparison would be to use PCI passthrough or a virtual function on an SR-IOV NIC. I may need to get some hardware... e.g. a couple of Intel X540-T2s; however, I don't think my BIOS supports SR-IOV, and I am on a budget, so let me know if you have spare NICs and I'll PM you my postal address ;-). My hardware supports VT-d. The CPU on the target system is Nehalem. It might be better to swap around the client and server and run the HTTP server and OSv on the Westmere system. Is there a tangible difference in VT support between Nehalem and Westmere that would make any difference to my tests? I also believe I am bandwidth-limited. The server can do 160,000 reqs/sec on loopback, similar to nginx...
>
> The main problem with the NICs is that we lack the drivers and support. This is certainly in our plans, but the human time to actually code it is harder to find than the actual hardware =) (we are all tackling other priorities ATM).

OK. Sorry for the lag. Multi-tasking. I am using vhost-net on my benchmark setup, which is only a Marvell GigE on the server and a Realtek GigE on the client. I am toying with the idea of getting a couple of Intel X540-T2 10GbE cards on eBay, although I don't want to invest too much unless it's worthwhile.

I read that ntop's PF_RING zero copy can support KVM with specific Intel drivers (and single copy with other drivers) without the need for PCI passthrough or special guest device drivers. Is there a strategy to use this kind of approach in OSv? E.g. mapping zero-copy ring buffers from the host drivers to avoid porting drivers individually.

  http://www.ntop.org/pf_ring/introducing-pf_ring-zc-zero-copy/
  https://github.com/xtao/PF_RING/tree/master/drivers/PF_RING_aware/intel/ixgbe/ixgbe-3.21.2-zc

I notice ntop's PF_RING ZC has some proprietary components; however, the changes to ixgbe to support PF_RING must be GPL-licensed, which means it may well be feasible to use other means to plumb the ring buffer from the network card to the guest using virtio. I'm not a Linux or FreeBSD MM guru, but I know the principles.

Or is netmap a better approach (to avoid porting drivers individually)? netmap supports both Linux and FreeBSD and is not proprietary, AFAIK:

  http://info.iet.unipi.it/~luigi/netmap/
  https://code.google.com/p/netmap/

Just curious about whether either of these are on the roadmap...

Pekka Enberg

Jul 2, 2014, 6:29:05 AM
to Michael Clark, Osv Dev, Takuya ASADA
On Tue, Jul 1, 2014 at 12:30 PM, Michael Clark <michae...@mac.com> wrote:
> I read that ntop's PF_RING zero copy can support KVM with specific Intel
> drivers (and single copy with other drivers) without the need for PCI
> passthrough or special guest device drivers. Is there a strategy to use this
> kind of approach in OSv? e.g. map zerocopy ringbuffers from the host drivers
> to avoid individually porting drivers.
>
> http://www.ntop.org/pf_ring/introducing-pf_ring-zc-zero-copy/
>
> I notice ntop's PF_RING ZC has some proprietary components however the
> changes to ixgbe to support PF_RING must be GPL licensed, which means it
> wouldn't necessarily be infeasible to use other means to plumb the ring
> buffer from the network card to the guest using virtio? I'm not a linux or
> FreeBSD MM guru however I know the principles.
>
> or is netmap a better approach (to avoid individually porting drivers) as
> netmap supports both Linux and FreeBSD and is not proprietary AFAIK:
>
> http://info.iet.unipi.it/~luigi/netmap/
> https://code.google.com/p/netmap/
>
> Just curious about whether either of these are on the roadmap...

IIRC, Takuya was talking about netmap in the past. I'm not sure how
beneficial that is with OSv's network channels.

As for PF_RING_ZC, this is the first time I've heard about it. If it
requires no new guest side drivers, then it should be usable from OSv.

- Pekka