Make virtio-net.c ring size configurable?


Luke Gorrie

Feb 14, 2014, 8:43:14 AM
to qemu-devel, snabb...@googlegroups.com
Howdy!

Observation: virtio-net.c hard-codes the vring size to 256 buffers.

Could this reasonably be made configurable, or would that be likely to cause a problem?

In Snabb Switch we are creating a 1:1 mapping between Virtio-net descriptors and VMDq hardware receive descriptors. The VMDq queues support 32768 buffers and I'd like to match this on the QEMU/Virtio-net side -- or at least come close.
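
For reference, a rough sketch (not a tested patch) of what "configurable" could look like on the QEMU side: replace the literal 256 in hw/net/virtio-net.c with a qdev property. The "x-ring-size" name and the ring_size field are invented for illustration; virtio_add_queue() and DEFINE_PROP_UINT16() are existing QEMU interfaces, and the VIRTQUEUE_MAX_SIZE (1024) ceiling in the virtio core would still cap whatever value is set.

/* 1. Hypothetical new field in struct virtio_net_conf: */
    uint16_t ring_size;

/* 2. Use it where the RX/TX queues are created, instead of the literal 256: */
    n->rx_vq = virtio_add_queue(vdev, n->net_conf.ring_size,
                                virtio_net_handle_rx);

/* 3. Expose it so e.g. "-device virtio-net-pci,x-ring-size=1024" can set it: */
    DEFINE_PROP_UINT16("x-ring-size", VirtIONet, net_conf.ring_size, 256),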

Cheers!
-Luke


Mario Smarduch

Feb 14, 2014, 2:34:36 PM
to Luke Gorrie, qemu-devel, snabb...@googlegroups.com
For PCI the size seems to be hardcoded. For 'virtio-mmio', the read that gets
the maximum queue size checks whether vring.num != 0 and returns
VIRTQUEUE_MAX_SIZE (1024). Later (early on in its probe) the guest writes the
new size to VIRTIO_MMIO_QUEUE_NUM, and virtio_queue_set_num() adjusts the
vring_desc, avail, etc. values accordingly. The PCI variant doesn't support
writes to VIRTIO_PCI_QUEUE_NUM.

You might be able to try something like that for PCI, adjusting the max value.
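
Roughly, the two virtio-mmio paths described above look like this (paraphrased; the register names follow the Linux virtio_mmio.h spelling, and QEMU's own constants and surrounding code differ in detail):

/* Read path: the transport advertises the largest ring it supports. */
case VIRTIO_MMIO_QUEUE_NUM_MAX:
    if (!virtio_queue_get_num(vdev, vdev->queue_sel)) {
        return 0;                      /* queue not present */
    }
    return VIRTQUEUE_MAX_SIZE;         /* 1024 */

/* Write path: early in its probe the guest writes the size it actually
 * wants, and virtio_queue_set_num() resizes the descriptor table plus
 * the avail and used rings to match. */
case VIRTIO_MMIO_QUEUE_NUM:
    virtio_queue_set_num(vdev, vdev->queue_sel, value);
    break;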

- Mario

Stefan Hajnoczi

Feb 24, 2014, 10:20:04 AM
to Luke Gorrie, qemu-devel, snabb...@googlegroups.com
In reality virtio-net can use many more buffers because it has the
VIRTIO_RING_F_INDIRECT_DESC feature. Each descriptor can point to a
whole new descriptor table.
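
Concretely, with VIRTIO_RING_F_INDIRECT_DESC one slot in the 256-entry ring points at a separately allocated table of descriptors, so the ring can describe far more buffers than it has slots. A minimal illustration of the layout (struct and flag values as in the virtio spec and <linux/virtio_ring.h>; the helper is only a sketch and ignores DMA mapping and guest-physical address translation):

#include <stdint.h>

struct vring_desc {
    uint64_t addr;   /* guest-physical address of the buffer */
    uint32_t len;    /* buffer length in bytes */
    uint16_t flags;  /* VRING_DESC_F_* */
    uint16_t next;   /* next index within the same table, if F_NEXT is set */
};

#define VRING_DESC_F_NEXT     1
#define VRING_DESC_F_INDIRECT 4

/* Chain n buffers in an indirect table and make main_ring[slot] point at it. */
static void post_indirect(struct vring_desc *main_ring, uint16_t slot,
                          struct vring_desc *table, uint16_t n,
                          const uint64_t addrs[], const uint32_t lens[])
{
    for (uint16_t i = 0; i < n; i++) {
        table[i].addr  = addrs[i];
        table[i].len   = lens[i];
        table[i].flags = (i + 1 < n) ? VRING_DESC_F_NEXT : 0;
        table[i].next  = (i + 1 < n) ? i + 1 : 0;
    }
    main_ring[slot].addr  = (uint64_t)(uintptr_t)table;  /* really a guest-physical addr */
    main_ring[slot].len   = n * sizeof(struct vring_desc);
    main_ring[slot].flags = VRING_DESC_F_INDIRECT;
    main_ring[slot].next  = 0;
}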

Do you want the 1:1 mapping to achieve best performance or just to
simplify the coding?

Since vhost_net does many Gbit/s I doubt the ring size is a limiting
factor although there are still periodic discussions about tweaking the
direct vs indirect descriptor heuristic.

Stefan

Luke Gorrie

Feb 24, 2014, 11:14:04 AM
to snabb...@googlegroups.com, qemu-devel
On 24 February 2014 16:20, Stefan Hajnoczi <stef...@gmail.com> wrote:
> Do you want the 1:1 mapping to achieve best performance or just to
> simplify the coding?

We want to keep the real-time constraints on the data plane comfortable.

The question I ask myself is: How long can I buffer packets during processing before something is dropped?

256 buffers can be consumed in 17 microseconds on a 10G interface. That's uncomfortably tight for me. I would like every buffer in the data path to be dimensioned for at least 100us of traffic - ideally more like 1ms. That gives us more flexibility for scheduling work, handling configuration changes, etc. So I'd love to have the guest know to keep us fed with e.g. 32768 buffers at all times.
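
The arithmetic behind those figures, for anyone who wants to check them (a minimum-size Ethernet frame occupies 84 bytes on the wire once preamble and inter-frame gap are included, i.e. roughly 14.88 Mpps at 10 Gbit/s):

#include <stdio.h>

int main(void)
{
    const double line_rate_bps = 10e9;
    const double wire_bits_per_min_frame = 84 * 8;                 /* 672 */
    const double pps = line_rate_bps / wire_bits_per_min_frame;    /* ~14.88 Mpps */

    for (int ring = 256; ring <= 32768; ring *= 2) {
        printf("%6d buffers buys %8.1f us of 10G minimum-size traffic\n",
               ring, ring / pps * 1e6);
    }
    return 0;
}

256 buffers is about 17.2 us; roughly 1,500 buffers cover 100 us and roughly 15,000 cover 1 ms, which is why 32768 looks comfortable.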

Our data plane is batch-oriented and deals with "breaths" of 100+ packets at a time. So we're a bit more hungry for buffers than a data plane that's optimized for minimum latency instead.

What do you think? Can I reliably get the buffers I want with VIRTIO_RING_F_INDIRECT_DESC or should I increase the vring size?

> Since vhost_net does many Gbit/s I doubt the ring size is a limiting
> factor although there are still periodic discussions about tweaking the
> direct vs indirect descriptor heuristic.

FWIW the workloads I'm focused on are high rates of small packets as seen by switch/router/firewall/etc. devices. I've found that it's possible to struggle with these workloads even when getting solid performance on e.g. TCP bulk transfer with TSO. So I'm prepared for the possibility that what works well for others may well not work well for our application.


Luke Gorrie

Feb 24, 2014, 2:16:45 PM
to snabb...@googlegroups.com, qemu-devel
On 24 February 2014 16:20, Stefan Hajnoczi <stef...@gmail.com> wrote:
> On Fri, Feb 14, 2014 at 02:43:14PM +0100, Luke Gorrie wrote:
> > In Snabb Switch we are creating a 1:1 mapping between Virtio-net
> > descriptors and VMDq hardware receive descriptors. The VMDq queues support
> > 32768 buffers and I'd like to match this on the QEMU/Virtio-net side -- or
> > at least come close.
>
> [...]
>
> Do you want the 1:1 mapping to achieve best performance or just to
> simplify the coding?

More background:

The 1:1 mapping between hardware RX descriptors and Virtio-net descriptors is for best performance, specifically for zero-copy operation. We want the NIC to DMA the packets directly into guest memory and that's why we need to pre-populate the NIC descriptor lists with suitable memory obtained from the guest via the Virtio-net avail ring.
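
In pseudo-C the host-side refill loop looks roughly like this. Everything here is illustrative: hw_rx_desc is a stand-in for the real VMDq descriptor format, translate() stands for the guest-physical-to-DMA address mapping the host must already have, and memory barriers and used-ring handling are omitted.

#include <stdint.h>

struct vring_desc  { uint64_t addr; uint32_t len; uint16_t flags, next; };
struct vring_avail { uint16_t flags, idx; uint16_t ring[]; };

struct hw_rx_desc  { uint64_t buf_addr; uint64_t status; };   /* hypothetical */

/* Keep the hardware RX ring 1:1 with the guest's avail ring: every buffer the
 * guest offers is programmed straight into a NIC descriptor, so the NIC DMAs
 * received packets directly into guest memory. */
static uint16_t refill_hw_ring(struct hw_rx_desc *hw, uint32_t *hw_tail,
                               const struct vring_desc *desc,
                               const struct vring_avail *avail,
                               uint16_t last_seen, uint16_t ring_size,
                               uint64_t (*translate)(uint64_t gpa))
{
    while (last_seen != avail->idx) {                 /* new buffers from the guest */
        uint16_t head = avail->ring[last_seen % ring_size];
        hw[*hw_tail % ring_size].buf_addr = translate(desc[head].addr);
        hw[*hw_tail % ring_size].status   = 0;
        (*hw_tail)++;
        last_seen++;
    }
    return last_seen;    /* caller remembers how far into the avail ring we got */
}

The scheme only works while the guest keeps the avail ring topped up faster than the NIC drains it, which is why the usable buffer count (ring size or indirect-descriptor capacity) matters here.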


Stefan Hajnoczi

Feb 27, 2014, 9:17:44 AM
to Michael S. Tsirkin, snabb...@googlegroups.com, qemu-devel, Luke Gorrie
On Mon, Feb 24, 2014 at 05:14:04PM +0100, Luke Gorrie wrote:
> On 24 February 2014 16:20, Stefan Hajnoczi <stef...@gmail.com> wrote:
>
> > Do you want the 1:1 mapping to achieve best performance or just to
> > simplify the coding?
> >
>
> We want to keep the real-time constraints on the data plane comfortable.
>
> The question I ask myself is: How long can I buffer packets during
> processing before something is dropped?
>
> 256 buffers can be consumed in 17 microseconds on a 10G interface.

This is a good point. The virtio-net vring is too small at 256 buffers
for workloads that want to send/receive small packets at 10 Gbit/s line
rate. (Minimum UDP packet size is 52 bytes!)

Michael: Luke has asked to increase the virtio-net virtqueue size.
Thoughts?

Stefan

Michael S. Tsirkin

Feb 27, 2014, 9:49:49 AM
to Stefan Hajnoczi, snabb...@googlegroups.com, qemu-devel, Luke Gorrie
Heh you want to increase the bufferbloat?
Each buffer pointer takes up 16 bytes, so we are using order-2
allocations as it is; anything more and it'll start to fail
if hotplug happens long after boot.
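
For context, the arithmetic behind that concern, using the split-vring layout from the spec (the same formula as vring_size() in Linux's <linux/virtio_ring.h>); the exact allocator behaviour is beyond this sketch, but the contiguous-memory requirement grows quickly with the ring size:

#include <stdio.h>
#include <stddef.h>

/* descriptor table + avail ring, padded to `align`, then the used ring */
static size_t vring_bytes(size_t num, size_t align)
{
    size_t desc_avail = 16 * num + 2 * (3 + num);
    size_t used       = 6 + 8 * num;
    return ((desc_avail + align - 1) & ~(align - 1)) + used;
}

int main(void)
{
    for (size_t num = 256; num <= 32768; num *= 2) {
        size_t bytes = vring_bytes(num, 4096);
        unsigned order = 0;
        while (((size_t)4096 << order) < bytes) {
            order++;
        }
        printf("%6zu entries: %7zu bytes (order-%u if allocated contiguously)\n",
               num, bytes, order);
    }
    return 0;
}

A 256-entry vring already needs about 10 KiB (hence an order-2, 16 KiB, contiguous allocation); 32768 entries would need roughly 840 KiB of physically contiguous memory, which is the kind of allocation that starts failing once memory is fragmented, e.g. when a device is hotplugged long after boot.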

AFAIK baremetal does not push line rate with 1 byte payload
either.

--
MST

Luke Gorrie

Feb 28, 2014, 3:02:10 AM
to snabb...@googlegroups.com, Stefan Hajnoczi, qemu-devel
On 27 February 2014 15:49, Michael S. Tsirkin <m...@redhat.com> wrote:
> > Michael: Luke has asked to increase the virtio-net virtqueue size.
> > Thoughts?
> >
> > Stefan

> Heh you want to increase the bufferbloat?

I'm sensitive to this. (I have actually built a commercial anti-bufferbloat network device for ISPs in the recent past.) I will go to great lengths to keep latency below 1 millisecond but beyond that I'm more flexible.

> Each buffer pointer takes up 16 bytes, so we are using order-2
> allocations as it is; anything more and it'll start to fail
> if hotplug happens long after boot.

(Sorry I don't have the background to understand this issue.)
 
> AFAIK baremetal does not push line rate with 1 byte payload
> either.

To me it feels normal to do this in the commercial networking industry. Many networking vendors will sell you a NIC with a software interface to drive it at line rate from userspace: Intel, Myricom, SolarFlare, Chelsio, Mellanox. They really work. Lots of high-end commercial network devices are built on these simple and cheap components.

Here's one detailed performance test that Luca Deri did based on standard Intel CPU and NIC and all packet sizes: http://www.ntop.org/wp-content/uploads/2012/04/DNA_ip_forward_RFC2544.pdf

For my project now I need to drive 6x10G ports' worth of network traffic through Virtio-net to KVM guests. That's the ballpark of what the ISPs I'm talking with require to be able to use Virtio-net instead of SR-IOV+Passthrough. They really want to use Virtio-net for a variety of reasons, and the only barrier is performance for router-like workloads.

I'm working on Deutsche Telekom's TeraStream project [1] [2] and success will mean that Virtio-net drives all internet traffic for national ISPs. That would be really cool imo :-).



xch...@gmail.com

May 12, 2016, 1:38:06 PM
to Snabb Switch development, qemu-...@nongnu.org
Luke, I might have a similar problem... I am wondering if you ended up increasing the ring buffer size yourself.

My problem is on the tx side. When sending many small UDP packets, I see "outgoing packets dropped" in "netstat -s" increase quickly. Increasing the txqueue of the interface and the wmem size in sysctl doesn't seem to help at all. The tx ring size is what I am looking at now. My VM, however, is connected to a bridge and then to Open vSwitch, so I might have other bottlenecks...

Thanks!