SONiC VS performance

Ben Jarvis

Jul 15, 2022, 11:39:57 AM
to sonicproject
Hi all,

I am attempting to build a virtual environment using SONiC that mirrors our physical deployment. To do this I'm building a SONiC VS 202205 image, but instead of running it in KVM I'm running it in VMware vSphere. The VM has multiple NICs connected to VM networks, and there are other VMs on those networks that want to communicate with each other via the SONiC VS switch.

The VS build uses the baked-in Force10 config, which uses eth0 as the mgmt port and binds eth1 to Ethernet0, eth2 to Ethernet4, and so on. This seems to work: I can generate a config_db.json, put a config on the VS switch that is similar to my config on real switches, and exercise the entire environment in a way analogous to the physical deployment. Cool.
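
As a quick sanity check (just an example; names depend on the hwsku, and the Ethernet# tap side only exists once SONiC has created the ports), both ends of a binding show up as ordinary netdevs inside the VS switch:

# ip -br link show eth1
# ip -br link show Ethernet0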

That said, network performance is very slow when going through the Ethernet# interfaces, as opposed to going through the raw ethX interfaces. For example, suppose I just manually put an IP address and a route directly on eth6, which is connected to my "upstream" network:

# ifconfig eth6 10.67.25.19 netmask 255.255.255.0
# ip route add 0.0.0.0/0 via 10.67.25.1

Then I download a file with curl from a server on the other side of that link:

# curl --output iso http://anotherserver/ubuntu-20.04.4-live-server-amd64.iso
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 54 1270M   54  687M    0     0   183M      0  0:00:06  0:00:03  0:00:03  183M

I get a throughput of about 183 MiB/s, essentially bypassing the SONiC stack.

Then I remove the IP from eth6 and instead put it on Ethernet20, which is bound to eth6:

# ip addr del  10.67.25.19/24 dev eth6
# ip route del 0.0.0.0/0 via 10.67.25.1
# config interface ip add Ethernet20  10.67.25.19/24
# config route  add prefix vrf default 0.0.0.0/0 nexthop vrf default 10.67.25.1

Then I download the same file again:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0 1270M    0 6768k    0     0   432k      0  0:50:10  0:00:15  0:49:55  470k

I get only about 470 KiB/s, which is drastically slower. With tcpdump I can confirm that the packets are going out the eth6 interface in both cases.
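
(For reference, the tcpdump check is just something like the following on the VS switch; port 80 is because the download is plain HTTP in my case:

# tcpdump -ni eth6 tcp port 80
# tcpdump -ni Ethernet20 tcp port 80

In the second configuration the traffic shows up on both Ethernet20 and eth6, as you'd expect for the tap/veth pair.)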

I expected that the VS implementation would involve some compromises and not necessarily be super performant, but such an extreme difference seems suspicious.  It is as though something in the stack is intentionally or unintentionally injecting a lot of additional latency.

If anyone has any suggestions about how to investigate further, I would appreciate it.

Thanks,
Ben

Ben Jarvis

Sep 26, 2022, 4:19:39 PM
to sonicproject
For posterity's sake --

Disabling "generic receive offload" (GRO) on the veth interface improves performance for me by about 500x in this scenario.

E.g.:

# ethtool --offload eth6 gro off
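
For completeness, here is roughly what I do to check the current setting and then turn GRO off on all of the front-panel-bound interfaces (the eth1..eth32 range is just an example; adjust it to however many ports your topology binds):

# ethtool --show-offload eth6 | grep generic-receive-offload
# for i in $(seq 1 32); do ethtool --offload eth$i gro off; done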

Above and beyond that, the code in sonic-sairedis that handles movement of packets between the veth and the tap interfaces could be made much more performant by using PACKET_MMAP or maybe just libpcap. As is, it makes two or three system calls per packet in the veth->tap direction. I may try to make some enhancements there in the future.
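
(If you want to observe the per-packet syscall overhead yourself, attaching a syscall counter to syncd while a transfer is running makes it fairly obvious -- this assumes strace is installed and that, as in my build, the veth<->tap copy loop runs inside the syncd process:

# strace -f -c -p $(pidof syncd)

Interrupt it with Ctrl-C after a few seconds to get the per-syscall counts.)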

Ben