virtio and offloading


Javier Guerra Giraldez

Nov 30, 2014, 5:51:24 AM
to snabb...@googlegroups.com
Hi all,

I need some help from the virtio gurus...

i've been trying to make the checksum offloading capabilities of the
Intel NIC available to NFV setups. all the tests done with made-up
packets do work, but there's at least a couple of issues when trying
to get it meshed with the virtio code.

first about negotiation:

it seems that i only need to add C.VIRTIO_NET_F_CSUM to the
supported_features variable defined in lib/virtio/net_device.lua,
right?

when i do that, the output of "ethtool -k eth0" from within the VM is
as follows:

Features for eth0:
rx-checksumming: off [fixed]
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: on
tcp-segmentation-offload: off
tx-tcp-segmentation: off [fixed]
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp6-segmentation: off [fixed]
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: on
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]


i don't know if that's enough, there are still too many "off [fixed]"
lines, so the original test_checksum function of lib/nfv/selftest.sh
still fails, as it checks for rx-checksumming, tx-checksumming,
tx-checksum-ipv4 and rx-checksum-ipv6 to be all 'on'

regardless of all those 'off [fixed]', the iperf traffic i get does have
the C.PACKET_NEEDS_CSUM field of packet.info.flags set, so the driver
is delegating the checksum to the 'hardware'.

Q1:
maybe the "tx-checksumming: on" and "tx-checksum-ip-generic: on" are
enough even if the more specific tx-checksum-ipv4 and tx-checksum-ipv6
are off?

Q2:
rx-checksumming is off, but I can set the C.PACKET_CSUM_VALID field of
packet.info.flags if the NIC has already checked the checksums anyway.
Would the virtio driver accept it and skip checking incoming packets
again?



Second, kernel driver complains:

even while small traffic (ping) does work, bigger loads (iperf) die
with a message like "virtio_net virtio0: output.0:id 4294967295 out of
range" that seems to be generated from the virtqueue_get_buf()
function in drivers/virtio/virtio_ring.c
(http://lxr.free-electrons.com/source/drivers/virtio/virtio_ring.c#L552)
it looks like there's an '.id' field of some vring structure that
gets an absurd number

does that mean i'm putting corrupted data in the virtio ring?

the checksum engines in the Intel chip are managed by inserting some
non-packet entries in the TX ring. initially, things were a little
confusing in the packet-releasing code until i managed to make it
ignore those entries. maybe some of these non-packet entries are leaking
to the virtio code?

Q3:
what does that "id" field mean? it comes from
"vq->vring.used->ring[last_used].id". is that a private structure of
the virtio driver? or does it come from the vhost part in SnS?

Q4:
is there any way to know if this bug is happening at the sending or
receiving path?

Q5:
some debugging tips... at which points of the Lua virtio code could I
check if the data flowing to/from the VM is sane?

--
Javier

Luke Gorrie

Nov 30, 2014, 9:39:41 AM
to snabb...@googlegroups.com
Howdy Javier,

Great hacking! Getting offloads working from Virtio<->Hardware will really benefit VMs that terminate traffic locally (as opposed to the packet-forwarding, router-like VMs we have been optimizing for already).

On 30 November 2014 at 11:51, Javier Guerra Giraldez <jav...@snabb.co> wrote:
it seems that i only need to add C.VIRTIO_NET_F_CSUM to the
supported_features variable defined in lib/virtio/net_device.lua,
right?
[...] 
regardless of all those 'off [fixed]', the iperf traffic i get does have
the C.PACKET_NEEDS_CSUM field of packet.info.flags set, so the driver
is delegating the checksum to the 'hardware'.

If you are getting packets with the PACKET_NEEDS_CSUM flag then I reckon you're on the right track.
 

Q1:
maybe the "tx-checksumming: on" and "tx-checksum-ip-generic: on" are
enough even if the more specific tx-checksum-ipv4 and tx-checksum-ipv6
are off?

Since you are getting packets with the PACKET_NEEDS_CSUM flag set it does sound like the feature you are negotiating now is enough for this test case.
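
For readers following along, here is a minimal C sketch of how virtio feature negotiation works in general: the device offers a set of feature bits, the driver acknowledges the ones it understands, and only the intersection takes effect. The bit numbers are the ones from the virtio network device specification; the helper names `negotiate` and `has_feature` are hypothetical, and how Snabb Switch encodes `supported_features` internally is not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Feature bit numbers from the virtio network device specification. */
#define VIRTIO_NET_F_CSUM       0  /* device accepts packets with a partial checksum */
#define VIRTIO_NET_F_GUEST_CSUM 1  /* driver accepts packets with a partial checksum */

/* Hypothetical helper: only features offered by both sides are in effect. */
static uint64_t negotiate(uint64_t device_features, uint64_t driver_features)
{
    return device_features & driver_features;
}

/* Hypothetical helper: returns 1 if feature bit `bit` is set, else 0. */
static int has_feature(uint64_t features, unsigned bit)
{
    assert(bit < 64);
    return (int)((features >> bit) & 1u);
}
```

With a device that offers only `VIRTIO_NET_F_CSUM`, `has_feature(negotiate(device, driver), VIRTIO_NET_F_CSUM)` is 1 while the `VIRTIO_NET_F_GUEST_CSUM` bit stays 0. That would be consistent with the guest showing "tx-checksumming: on" and "rx-checksumming: off [fixed]", on the assumption that the receive side is governed by the separate guest checksum bit.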
 
Q2:
rx-checksumming is off, but I can set the C.PACKET_CSUM_VALID field of
packet.info.flags if the NIC has already checked the checksums anyway.
Would the virtio driver accept it and skip checking incoming packets
again?

Looks that way to me for the Linux kernel driver (http://lxr.free-electrons.com/source/drivers/net/virtio_net.c#L485).
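
A rough C sketch of the receive-side decision being discussed, for illustration only. The two flag values are the ones the virtio specification defines for the `flags` byte of `struct virtio_net_hdr`; the enum and the function `classify_rx` are hypothetical stand-ins for what a guest driver does, and the assumption here is that Snabb's C.PACKET_CSUM_VALID would be translated into VIRTIO_NET_HDR_F_DATA_VALID on the way to the VM.

```c
#include <stdint.h>

/* Flag bits of struct virtio_net_hdr, as defined by the virtio specification. */
#define VIRTIO_NET_HDR_F_NEEDS_CSUM 1  /* checksum is partial and must be completed */
#define VIRTIO_NET_HDR_F_DATA_VALID 2  /* checksum has already been verified */

/* Hypothetical stand-ins for the guest's per-packet checksum state. */
enum csum_state {
    CSUM_VERIFY_IN_SOFTWARE,  /* no hint: the network stack checks it itself */
    CSUM_PARTIAL,             /* checksum still has to be filled in */
    CSUM_ALREADY_VALID        /* the stack can skip verification */
};

/* Decide how a received packet's checksum is treated, based on header flags. */
static enum csum_state classify_rx(uint8_t hdr_flags)
{
    if (hdr_flags & VIRTIO_NET_HDR_F_NEEDS_CSUM)
        return CSUM_PARTIAL;
    if (hdr_flags & VIRTIO_NET_HDR_F_DATA_VALID)
        return CSUM_ALREADY_VALID;
    return CSUM_VERIFY_IN_SOFTWARE;
}
```

In this sketch a packet delivered with only the data-valid flag set is classified as `CSUM_ALREADY_VALID`, which is the case Q2 asks about.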

even while small traffic (ping) does work, bigger loads (iperf) die
with a message like "virtio_net virtio0: output.0:id 4294967295 out of
range" that seems to be generated from the virtqueue_get_buf()
function in drivers/virtio/virtio_ring.c
(http://lxr.free-electrons.com/source/drivers/virtio/virtio_ring.c#L552)
 it looks like there's an '.id' field of some vring structure that
gets an absurd number

does that mean i'm putting corrupted data in the virtio ring?

This looks like a problem with acknowledging back to the VM that a packet has been transmitted.

TL;DR Fix is: sed -i 's/int16_t header_id/uint16_t header_id/' src/core/packet.h

Background:

Each time we free a buffer obtained from a virtual machine we make a callback like this:

  buffer.free() -> net_device.return_virtio_buffer() -> virtq.put_buffer()

to notify the virtual machine that we are done with that buffer. This is done by pushing the buffer's index onto the "used ring" shared memory area.

The error in the guest is saying that it received a bad index on its used ring. The index should be between 0 and 65535 but instead it got 4,294,967,295 i.e. the uint32 representation of -1.
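
To make the used-ring mechanics concrete, here is a small C sketch. The layout of `struct vring_used_elem` (a 32-bit `id` and a 32-bit `len`) follows the virtio specification; `RING_SIZE`, `used_ring_put`, and `used_id_in_range` are hypothetical names chosen for this example, and the host-side assert is an illustration rather than the actual Snabb Switch code.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 256  /* assumed queue size for this example */

/* Used-ring entry layout as defined by the virtio specification. */
struct vring_used_elem {
    uint32_t id;   /* index of the head descriptor of the completed chain */
    uint32_t len;  /* number of bytes written into the buffer */
};

struct vring_used {
    uint16_t flags;
    uint16_t idx;  /* free-running count of entries added by the host */
    struct vring_used_elem ring[RING_SIZE];
};

/* Host side: tell the guest that descriptor chain `id` is finished. */
static void used_ring_put(struct vring_used *used, uint32_t id, uint32_t len)
{
    assert(id < RING_SIZE);  /* catch a bad id before the guest sees it */
    struct vring_used_elem *e = &used->ring[used->idx % RING_SIZE];
    e->id = id;
    e->len = len;
    used->idx++;  /* real code needs a write barrier before publishing idx */
}

/* Guest side: the kind of range check behind the "id ... out of range" error. */
static int used_id_in_range(const struct vring_used *used, uint16_t last_used)
{
    return used->ring[last_used % RING_SIZE].id < RING_SIZE;
}
```

An entry whose `id` holds 4294967295 fails the guest-side check for any realistic queue size, which matches the message reported above.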

The root problem is that our vhost-user code has a bug when freeing multi-iovec packets transmitted by the VM. The reason we didn't see this before is that Linux guests have a strange behaviour that they don't send multi-iovec packets unless the offloads are enabled. So when you enabled offloads it caused the Linux guest to behave in a different way that exposed a bug in Snabb Switch.

The fix is simple: in src/core/packet.h the field buffer_origin.info.header_id should be unsigned (replace int16_t with uint16_t). This way net_device:return_virtio_buffer() will correctly detect the value 'invalid_header_id' (0xFFFF) instead of being tricked by -1.
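
The signedness problem can be reproduced in a few lines of plain C. This is an analogy under stated assumptions, not the Snabb Switch code itself (which reads the field from Lua): the function names are hypothetical, and only the sentinel value 0xFFFF and the `int16_t` versus `uint16_t` distinction come from the explanation above.

```c
#include <stdint.h>

#define INVALID_HEADER_ID 0xFFFF  /* sentinel value named in the thread */

/* Buggy variant: a signed 16-bit field holding the sentinel reads back as -1. */
static int is_invalid_signed(int16_t header_id)
{
    int promoted = header_id;              /* bit pattern 0xFFFF becomes -1 */
    return promoted == INVALID_HEADER_ID;  /* -1 == 65535 is false */
}

/* Fixed variant: an unsigned field keeps the value 65535. */
static int is_invalid_unsigned(uint16_t header_id)
{
    int promoted = header_id;              /* stays 65535 */
    return promoted == INVALID_HEADER_ID;
}

/* Widening to the 32-bit used-ring id: signed -1 sign-extends. */
static uint32_t widen_signed(int16_t header_id)
{
    return (uint32_t)header_id;
}

/* Widening an unsigned 16-bit value zero-extends instead. */
static uint32_t widen_unsigned(uint16_t header_id)
{
    return header_id;
}
```

With the signed field the sentinel test never matches, so the value falls through and is written to the used ring as 4294967295; with the unsigned field the test matches and the value never reaches the ring.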

(This probably answers all of your questions, but here are some potentially redundant inline answers:)

Q3:
what does that "id" field mean?  it comes from
"vq->vring.used->ring[last_used].id".  is that a private structure of
the virtio driver? or does it come from the vhost part in SnS?


This is a shared memory structure that the vhost code uses to communicate when buffers have been used (DMA completed).

Q4:
is there any way to know if this bug is happening at the sending or
receiving path?

This was on the VM->network transmit path. I see this from "output.0" in the error message in the guest kernel i.e. the error relates to output ring #0 of the guest.
 

Q5:
some debugging tips... at which points of the Lua virtio code could I
check if the data flowing to/from the VM is sane?

The main areas are lib.virtio.virtq (I added an assert in put_buffer() there) or lib.virtio.net_device.

You can also debug compile the virtio_net driver inside the virtual machine. That is really helpful though I'm not sure what the most convenient way to do that within our test framework is.

Hope that helps!
-Luke


Javier Guerra Giraldez

Nov 30, 2014, 11:43:41 PM
to snabb...@googlegroups.com
On 30 November 2014 at 09:39, Luke Gorrie <lu...@snabb.co> wrote:
> TL;DR Fix is: sed -i 's/int16_t header_id/uint16_t header_id/'
> src/core/packet.h

yes! that was it!


> The root problem is that our vhost-user code has a bug when freeing
> multi-iovec packets transmitted by the VM. The reason we didn't see this
> before is that Linux guests have a strange behaviour that they don't send
> multi-iovec packets unless the offloads are enabled. So when you enabled
> offloads it caused the Linux guest to behave in a different way that exposed
> a bug in Snabb Switch.


the 'ethtool -k eth0' output includes these lines:

scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: on

but only when supported_features includes C.VIRTIO_NET_F_CSUM.

i've updated the PR, now on to TSO...

--
Javier