BBR shows lower throughput in 10Gb LAN than Reno


Clark Mi

Feb 19, 2019, 9:18:46 PM
to BBR Development
Hi,

I have done an iperf3 test to compare BBR vs. Reno throughput in a 10Gb LAN. The results show that TCP Tx throughput when using BBR is lower than when using Reno.

The test bed topology:

                  server 1 (192.168.200.5) ========== switch (10Gb) ========== server 2 (192.168.200.6)

The iperf3 server is running on server 2:

       iperf3 -s

The iperf3 client is running on server 1:

      iperf3 -c 192.168.200.6 -i 1 -t 10
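
For reference, the congestion control algorithm for each run below is selected via the global sysctl on the sender. A typical way to switch it, assuming the tcp_bbr module is built for the running kernel, is:

      sudo modprobe tcp_bbr
      sudo sysctl -w net.ipv4.tcp_congestion_control=bbr    # or =reno

Newer iperf3 builds can also pick the algorithm per test on Linux with the -C/--congestion option, e.g. iperf3 -c 192.168.200.6 -C bbr -i 1 -t 10.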

When using the BBR congestion control algorithm, the TCP throughput is:

sudo sysctl -a | grep tcp_congestion_control
net.ipv4.tcp_congestion_control = bbr

iperf3 -c 192.168.200.6 -i 1 -t 10
Connecting to host 192.168.200.6, port 5201
[ 5] local 192.168.200.5 port 60130 connected to 192.168.200.6 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   909 MBytes  7.62 Gbits/sec  104    195 KBytes
[  5]   1.00-2.00   sec   905 MBytes  7.59 Gbits/sec    3    195 KBytes
[  5]   2.00-3.00   sec   901 MBytes  7.55 Gbits/sec   16    195 KBytes
[  5]   3.00-4.00   sec   902 MBytes  7.57 Gbits/sec    0    195 KBytes
[  5]   4.00-5.00   sec   891 MBytes  7.47 Gbits/sec   12    195 KBytes
[  5]   5.00-6.00   sec   887 MBytes  7.44 Gbits/sec    0    195 KBytes
[  5]   6.00-7.00   sec   894 MBytes  7.50 Gbits/sec    1    195 KBytes
[  5]   7.00-8.00   sec   911 MBytes  7.65 Gbits/sec   10    195 KBytes
[  5]   8.00-9.00   sec   863 MBytes  7.24 Gbits/sec    0    195 KBytes
[  5]   9.00-10.00  sec   861 MBytes  7.22 Gbits/sec    0    195 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  8.71 GBytes  7.49 Gbits/sec  146             sender
[  5]   0.00-10.04  sec  8.71 GBytes  7.46 Gbits/sec                  receiver

When using the Reno congestion control algorithm, the TCP throughput is:

sudo sysctl -a | grep tcp_congestion_control
net.ipv4.tcp_congestion_control = reno

iperf3 -c 192.168.200.6 -i 1 -t 10
Connecting to host 192.168.200.6, port 5201
[ 5] local 192.168.200.5 port 60134 connected to 192.168.200.6 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.10 GBytes  9.43 Gbits/sec   43    481 KBytes
[  5]   1.00-2.00   sec  1.10 GBytes  9.42 Gbits/sec    4    506 KBytes
[  5]   2.00-3.00   sec  1.10 GBytes  9.42 Gbits/sec    9    505 KBytes
[  5]   3.00-4.00   sec  1.10 GBytes  9.42 Gbits/sec    0    518 KBytes
[  5]   4.00-5.00   sec  1.10 GBytes  9.42 Gbits/sec    0    527 KBytes
[  5]   5.00-6.00   sec  1.10 GBytes  9.41 Gbits/sec    0    539 KBytes
[  5]   6.00-7.00   sec  1.09 GBytes  9.41 Gbits/sec    0    546 KBytes
[  5]   7.00-8.00   sec  1.10 GBytes  9.42 Gbits/sec    0    546 KBytes
[  5]   8.00-9.00   sec  1.10 GBytes  9.42 Gbits/sec    0    547 KBytes
[  5]   9.00-10.00  sec  1.10 GBytes  9.42 Gbits/sec    5    556 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  11.0 GBytes  9.42 Gbits/sec   61             sender
[  5]   0.00-10.04  sec  11.0 GBytes  9.38 Gbits/sec                  receiver

We can see there is an obvious throughput gap: BBR 7.46 Gbps vs. Reno 9.38 Gbps. The congestion window with BBR is 195 KBytes, while the congestion window with Reno is 500+ KBytes. Based on the captured packet trace, the RTT in this test bed is about 0.2 ms, so the BDP should be around 250 KBytes (see the calculation below). It looks like, though I am not sure, the smaller BBR congestion window is limiting the throughput.
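
The BDP estimate works out as follows, taking the full 10 Gbit/s link rate and the ~0.2 ms RTT from the trace:

      BDP = bandwidth x RTT
          = (10 x 10^9 / 8) bytes/s x 0.0002 s
          = 1.25 x 10^9 bytes/s x 0.0002 s
          = 250,000 bytes, i.e. roughly 250 KBytes

so BBR's 195 KByte cwnd sits below the estimated BDP, while Reno's 500+ KBytes sits well above it.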

It seems BBR does not use the whole physical network bandwidth in this test. I expected BBR to reach the same peak throughput as other congestion control algorithms, such as Reno, but it does not. Does anyone know why? Thanks.

The captured TCP packets for the BBR run are attached.

BRs,
Clark
clear-bbr-tx.pcapng.xz

Eric Dumazet

Feb 19, 2019, 9:51:38 PM
to Clark Mi, BBR Development
This might very well be caused by a driver issue. Here we can reach ~30 Gbit/s
on a single BBR flow.

What NIC are you using? (ethtool -i ethX)

What is the output of "ethtool -c ethX"?

Neal Cardwell

Feb 19, 2019, 10:05:26 PM
to Eric Dumazet, Clark Mi, BBR Development
Thanks for the report!

I agree with Eric that this seems like a (receiver-side) driver/LRO/GRO issue.

Looking at the trace (attached) we can see high degrees of aggregation
in the ACK stream, with long 240us silences followed by aggregated
ACKs covering more than a single TSO burst. A well-configured receiver
will generally send an ACK for at least every maximally-sized TSO
burst. And this receiver is not doing that. If you can run the
commands Eric listed, we can help see if the receiver's behavior can
be optimized.
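
For anyone who wants to inspect this in the attached capture, a rough way to list the pure-ACK arrival times and ACK numbers, assuming tshark is installed, is:

      xz -dk clear-bbr-tx.pcapng.xz
      tshark -r clear-bbr-tx.pcapng -Y "ip.src==192.168.200.6 && tcp.len==0" \
          -T fields -e frame.time_relative -e tcp.ack

The gaps between consecutive timestamps and the jumps in the ACK numbers should make the batching described above visible.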

That said, if you upgrade the sender by checking out and building the
latest Linux net-next branch then the latest sender-side Linux TCP BBR
should be able to cope with that high degree of aggregation and still
reach pretty good utilization, using the following recent Linux
net-next commit:
78dc70ebaa38 tcp_bbr: adapt cwnd based on ack aggregation estimation
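
A sketch of bringing the sender up to that commit, assuming the net-next tree is at its usual kernel.org location, would be roughly:

      git clone git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
      cd net-next
      git show --stat 78dc70ebaa38    # confirm the commit is present

followed by building, installing, and booting that kernel on the sender.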

thanks,
neal
bbr-2019-02-19-10G-aggregation-1.png

Clark Mi

Feb 19, 2019, 10:32:45 PM
to BBR Development
The same NICs are used on both servers.

Here is the output:

 ethtool -i eno2
driver: i40e
version: 2.3.2-k
firmware-version: 3.31 0x80000d31 1.1767.0
expansion-rom-version:
bus-info: 0000:3d:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes


ethtool -c eno2
Coalesce parameters for eno2:
Adaptive RX: on  TX: on
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 50
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 256

tx-usecs: 50
tx-frames: 0
tx-usecs-irq: 0
tx-frames-irq: 256

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0
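
Side note: the adaptive RX coalescing above, with rx-usecs at 50, is one plausible contributor to the ACK batching Neal described. As an experiment only, and assuming the i40e driver accepts these knobs, interrupt coalescing on the receiver could be reduced and the test re-run:

      sudo ethtool -C eno2 adaptive-rx off rx-usecs 8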


Thanks



On Wednesday, February 20, 2019 at 10:51:38 AM UTC+8, Eric Dumazet wrote:

Neal Cardwell

Feb 19, 2019, 11:11:18 PM
to Clark Mi, BBR Development
Thanks! Can you also please run the following to show offload settings:

ethtool -k <device_name>

thanks,
neal

Clark Mi

Feb 19, 2019, 11:54:31 PM
to BBR Development
Here is the output:

 ethtool -k eno2
Features for eno2:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: on
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: on
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: on
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: on
        tx-tcp-mangleid-segmentation: off
        tx-tcp6-segmentation: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
highdma: on
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: on
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
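
Side note: large-receive-offload is off [fixed] and generic-receive-offload is on here, so any receive-side aggregation would come from GRO plus interrupt coalescing rather than hardware LRO. If further receiver-side experiments were wanted, GRO could be toggled temporarily on the receiver's interface, assuming it is also eno2 on server 2:

      sudo ethtool -K eno2 gro off    # re-run the test, then restore with: sudo ethtool -K eno2 gro on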


On Wednesday, February 20, 2019 at 12:11:18 PM UTC+8, Neal Cardwell wrote:

Clark Mi

Feb 20, 2019, 4:07:27 AM
to BBR Development
Thanks a lot, Neal.

The issue does indeed look to be caused by ACK aggregation. I followed your suggestion and applied the 78dc70ebaa38 patch to my kernel. The test result looks great, and the BBR TCP throughput now reaches the same level as Reno.
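
For anyone repeating this, a minimal sketch of carrying just that commit onto the kernel tree being built, assuming the net-next tree has been added as a remote named net-next, is:

      git fetch net-next
      git cherry-pick 78dc70ebaa38    # the ack-aggregation estimation patch

followed by the usual kernel rebuild and reboot on the sender.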

$sudo sysctl -a | grep congestion
net.ipv4.tcp_allowed_congestion_control = reno bbr
net.ipv4.tcp_available_congestion_control = reno bbr
net.ipv4.tcp_congestion_control = bbr

Iperf3 output:

iperf3 -c 192.168.200.6 -i 1 -t 10
Connecting to host 192.168.200.6, port 5201
[  5] local 192.168.200.5 port 33648 connected to 192.168.200.6 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.10 GBytes  9.42 Gbits/sec  104    390 KBytes
[  5]   1.00-2.00   sec  1.10 GBytes  9.42 Gbits/sec    0    365 KBytes
[  5]   2.00-3.00   sec  1.09 GBytes  9.41 Gbits/sec    8    354 KBytes
[  5]   3.00-4.00   sec  1.09 GBytes  9.41 Gbits/sec    0    376 KBytes
[  5]   4.00-5.00   sec  1.10 GBytes  9.42 Gbits/sec    0    351 KBytes
[  5]   5.00-6.00   sec  1.09 GBytes  9.41 Gbits/sec    4    368 KBytes
[  5]   6.00-7.00   sec  1.10 GBytes  9.42 Gbits/sec    0    370 KBytes
[  5]   7.00-8.00   sec  1.10 GBytes  9.42 Gbits/sec    0    376 KBytes
[  5]   8.00-9.00   sec  1.09 GBytes  9.41 Gbits/sec    0    402 KBytes
[  5]   9.00-10.00  sec  1.09 GBytes  9.41 Gbits/sec    7    373 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  11.0 GBytes  9.41 Gbits/sec  123             sender
[  5]   0.00-10.04  sec  11.0 GBytes  9.37 Gbits/sec                  receiver

BTW, this patch should be extremely important for 802.11ac WiFi networks, as 11ac relies heavily on aggregation (A-MSDU and A-MPDU).

Thanks again for your work.


On Wednesday, February 20, 2019 at 11:05:26 AM UTC+8, Neal Cardwell wrote:

Neal Cardwell

Feb 20, 2019, 10:24:42 AM
to Clark Mi, BBR Development
Great! Thanks for applying the patch and running the experiment and reporting the detailed results! Looks like the latest Linux TCP BBR achieved the same throughput as Reno, with a smaller cwnd.
 
> BTW, this patch should be extremely important for 802.11ac WiFi networks, as 11ac relies heavily on aggregation (A-MSDU and A-MPDU).

Yes, great point. In fact the patch was primarily motivated by, and tested with, wifi links, exactly for that reason.
 
> Thanks again for your work.

Thanks for your testing and reports!

neal

Clark Mi

Feb 21, 2019, 4:34:56 AM
to BBR Development
Hi Neal,

I see this patch has been committed into the upstream mainline branch. Do you have plans to merge it into other long-term stable branches? Thanks.
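
One way to check which releases already contain it, assuming an up-to-date mainline clone, is:

      git describe --contains 78dc70ebaa38
      git tag --contains 78dc70ebaa38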

On Wednesday, February 20, 2019 at 11:24:42 PM UTC+8, Neal Cardwell wrote: