How do you measure the performance benefit, and what is it observed to be?


Kevin Burke

Sep 18, 2016, 11:21:22 PM
to BBR Development
Hi,
I read the patch description posted here: https://patchwork.ozlabs.org/patch/671069/

It seems really neat! 

In the first paragraph, you state that BBR has "significantly increased throughput and reduced latency for connections on Google's internal backbone networks and google.com and YouTube Web servers."

How do you measure these? Do you have the actual numbers for improvements?

Stephen Gunn

Sep 19, 2016, 12:02:52 AM
to Kevin Burke, BBR Development
Hi Kevin -

I'm one of the Site Reliability Engineers who helped test/deploy BBR.
I suspect the ACM Queue paper will cover the details you seek.

For now -- I will say that the latency improvement you quoted is a
comparison of completion-tail-latency measured in the RPC library.

- Steve

Eric Dumazet

Sep 19, 2016, 12:18:43 AM
to BBR Development
The BBR ACM paper will have some numbers, but they are really limited.

The best numbers are the ones you can get yourself, now that this is upstream/public code.

Compared to Cubic, there is a 2 to 4 order of magnitude difference in lossy environments.

Here is an example with 100 ms RTT and 1% packet loss; Cubic performs very badly there.

$ netperf -H 10.246.7.152 -l 30 -- -K cubic
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    40.00       3.27   

$ netperf -H 10.246.7.152 -l 30 -- -K bbr  
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    30.25    9150.01   

While the BBR flow is running, "ss -temoi dst 10.246.7.152" also tells us that rtt is kept minimal, no 'bufferbloat'.

ESTAB      0      189086440           10.x.y.z:38627       10.246.7.152:43995                 timer:(on,301ms,0) ino:8564173 sk:26 <->
skmem:(r0,rb428800,t132880,tb200000000,f1560,w192141800,o0,bl11520,d0) ts sack 
 bbr {bw:12078.7Mbps mrtt:100.024 pacing_gain:1 cwnd_gain:2} 
 wscale:14,7 rto:301 rtt:100.134/0.003 mss:1448 cwnd:208728 bytes_acked:27140670745 segs_out:19056011 
 segs_in:2157289 data_segs_out:19056009 send 24146.7Mbps lastrcv:80856873 
 pacing_rate 12512.5Mbps unacked:123692 retrans:197/188716 lost:197 sacked:25810 rcv_space:29200 notsent:9980944 minrtt:100.022
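
For anyone who wants to reproduce a similar comparison, here is a minimal sketch (the interface name and server address are placeholders; it assumes a kernel with tcp_bbr available and netperf installed):

# sender side: load BBR and install the fq qdisc, which provides the pacing BBR relies on here
modprobe tcp_bbr
tc qdisc replace dev eth0 root fq

# run the same back-to-back comparison; -K selects the congestion control for the test
netperf -H 10.246.7.152 -l 30 -- -K cubic
netperf -H 10.246.7.152 -l 30 -- -K bbr

# while the BBR test runs, watch rtt/cwnd/pacing for that destination
ss -temoi dst 10.246.7.152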

Dave Taht

Sep 19, 2016, 2:01:43 PM
to Eric Dumazet, BBR Development
I put up some very limited test results here.

http://blog.cerowrt.org/post/bbrs_basic_beauty/

I was *really* impressed by how low it held the RTT, while holding
bandwidth high. Please let me know if I'm misinterpreting the new
"sawtooth" or got anything else wrong.

Eric Dumazet

Sep 19, 2016, 2:09:17 PM
to BBR Development, edum...@google.com
Hi Dave

I'm not sure if you used sch_fq (which includes pacing) instead of fq_codel (FQ without pacing) in your tests?
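
A quick way to check and switch on the sending host (the interface name is a placeholder):

tc qdisc show dev eth0
tc qdisc replace dev eth0 root fq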

Dave Taht

Sep 19, 2016, 2:21:02 PM
to Eric Dumazet, BBR Development
On Mon, Sep 19, 2016 at 11:09 AM, 'Eric Dumazet' via BBR Development
<bbr...@googlegroups.com> wrote:
> Hi Dave
>
> Not sure if you used sch_fq (does include pacing) instead of fq_codel (FQ
> without pacing) in your tests ?

I used sch_fq on the server side throughout what I posted. The
middlebox was htb + fq_codel.

The clients were generally fq_codel or the fq_codel'd wifi stuff, but
it was generally testing in their upload direction only.

I did a whole bunch more tests than this, but it would take me ages to write them up.
What sort of data would you like? Do you have any suggestions for
other stuff to try on my testbeds? Most of my tests until 3 days ago
were primarily oriented toward blowing up a bunch of wifi stations.

A mental model has always been "someone streaming" as a background
wifi workload for one out of the 4 people in a given home, and I
figure I'll use BBR for that now.

btw: What sort of damage can I do to myself with bbr + fq_codel server
side? I'll go try that at some point, by accident, I'm sure. :)





--
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org

Eric Dumazet

Sep 19, 2016, 2:25:57 PM
to BBR Development, edum...@google.com
Well, not using pacing would increase latencies and drops in some situations.

Thanks for the clarification!

Dave Taht

Sep 19, 2016, 2:39:34 PM
to Eric Dumazet, BBR Development
I pushed out an update of the topology to the post; Akamai will take a
while to propagate it. Testers missing the sch_fq requirement would
certainly be bad!

<pre>
server 1Gbit running cubic/reno/bbr with sch_fq on its interfaces
|
enp3s0 w/netem 24 ms each way
delay box
enp4s0 w/sqm-scripts 20mbit both ways
|
client(s) 1Gbit (or wifi) with fq_codel, cubic
</pre>

Also, over the years I've settled on using the sqm-scripts to always
get the rate limiter right on one interface (and I probably should fold
the bfifo and pfifo modes into the mainline code)...

and on using netem on an entirely different interface via ifb. I NEVER run
netem on the same box as the client or server, and I distrust netem's
rate limiter. I have seen so many people get netem wrong...
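
In case it helps anyone reading along, a minimal sketch of that netem-on-ifb pattern for one direction on a dedicated delay box (the interface name and the 24 ms figure are just illustrative; the limit is raised above netem's 1000-packet default):

modprobe ifb numifbs=1
ip link set dev ifb0 up
# steer ingress traffic from the physical interface into ifb0
tc qdisc add dev eth1 ingress
tc filter add dev eth1 parent ffff: matchall action mirred egress redirect dev ifb0
# delay the redirected traffic on ifb0, rather than on the client or server hosts
tc qdisc add dev ifb0 root netem limit 10000 delay 24ms
# the other direction can get its own netem on the egress side of the second interface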

On the other hand, Linux is at 4.8 now, and for all I know I can now safely
run all this stuff on one box inside containers and still be able to
believe the result?

Jeremy Carroll

Sep 19, 2016, 3:54:28 PM
to BBR Development
I had the exact same question: what metrics are used to determine improvement? We are thinking of evaluating this at my current work, and the first thing is to set up observability to see the impact.

I'm wondering what synthetic benchmarks were used (netperf, etc.), and what actual measurements were used (client side, kernel side, etc.).



Yuchung Cheng

Sep 19, 2016, 6:37:34 PM
to BBR Development


Let me add to Stephen's answer above.

Netperf is used in the synthetic tests, with emulated BW, RTT, buffer size, loss, etc. In production, in addition to network stats like RTT, throughput, and losses, we look at e2e metrics like RPC-level request-to-response time, request backlogs, and application-specific metrics (e.g., video quality of experience, browser object load time), with an A/B test framework. Our paper does not cover all of that (due to the size limit), but we'd be happy to discuss the detailed instrumentation further. An interesting lesson we've learned is that BBR often can't reach its potential because of the current system config. We should upstream some of the instrumentation to netdev later.

Tomasz Jamroszczak

Sep 20, 2016, 3:33:46 AM
to Dave Taht, BBR Development
You write: "I think what we are doing for wifi remains worthwhile. And,
to give the BBR developers their day in the sun, I’m not going to publish
those results". That is interesting, though: how does BBR deal with WiFi
and its non-random, bursty packet loss? Do you have any insights to share?

--
Best Regards
Tomasz Jamroszczak

Neal Cardwell

Sep 21, 2016, 1:24:09 PM
to Tomasz Jamroszczak, Dave Taht, BBR Development
By design, BBR first and foremost bases its sending rate on the actual delivery rate of the network, rather than packet loss or delay signals. So if the packet loss is low enough that it does not impact the overall delivery rate of the path, then BBR is able to fully utilize the path. In general, if the packet loss rate is below 15% then BBR is able to fully utilize the path (reaching link_bandwidth*(1 - loss_rate)). This 15% threshold is a design parameter, rather than a fundamental limit of the algorithm.
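
As a rough worked example of that formula: on a 10 Gbit/s path with 1% random loss, link_bandwidth*(1 - loss_rate) works out to about 10 * 0.99 = 9.9 Gbit/s, which is in the same ballpark as the ~9.1 Gbit/s netperf numbers Eric posted earlier in the thread.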

In the A/B experiments I have done with real wifi networks I have access to, BBR tends to do as well or better than CUBIC. And the throughput numbers we see for YouTube traffic show similar trends: on the whole, BBR tends to do as well or better than CUBIC on most cellular, wifi, DSL, or cable modem paths.

neal

 

Yuchung Cheng

Sep 21, 2016, 1:44:01 PM
to Neal Cardwell, Tomasz Jamroszczak, Dave Taht, BBR Development
On Wed, Sep 21, 2016 at 10:23 AM, 'Neal Cardwell' via BBR Development <bbr...@googlegroups.com> wrote:
On Tue, Sep 20, 2016 at 3:33 AM, Tomasz Jamroszczak <tjamro...@opera.com> wrote:


        You write: "I think what we are doing for wifi remains worthwhile. And, to give the BBR developers their day in the sun, I’m not going to publish those results".  But that is interesting - how does the BBR deal with WiFi and its non-random, bursty packet loss.  Do you have any insights to share?
My experience from studying YouTube traces is that WiFi tends to perform aggressive link-layer retransmission, so what the TCP sender sees is sudden delay spikes and stretched ACKs (or sometimes ACKs decimated by some driver/cable networks). This can inflate the delivery rate, which we have to compensate for pragmatically in our rate estimation module (by taking min(tx_rate, delivery_rate)).

On another front, the sudden delay and subsequent burst would cause the data in flight to fill a cwnd = BDP and lose utilization, so we make a pragmatic compromise and use a higher cwnd to deal with this. The Queue paper describes these real-world mitigations.

Occasionally we do see bursts of TCP packet losses, but the traces look like a drive-by shooting: competing (Cubic/Reno?) traffic blows up the buffer and everybody pays the toll.

And Dave, please share both good and bad results about BBR. We'd like to see them and understand why it's not sunny :-)
 


x...@redhat.com

Oct 15, 2018, 11:34:04 PM
to BBR Development
Hi Eric,
I cannot get the 9 Gbps for BBR; I only get 800 Mbps on a 10G card with delay 100ms loss 1%. Can you help me see what's wrong with my test?

[root@server bbr]# cat server.sh 
#!/bin/bash
# client(sender)    ---      server(receiver)
# tc qdisc: fq               tc qdisc: ingress + netem(delay 100ms loss 1%)

#setup env
ethtool -i ens2f0| grep driver
ethtool ens2f0| grep Speed
ip link set dev ens2f0 up
ip add add 192.168.100.2/24 dev ens2f0

modprobe ifb numifbs=1
ip link set dev ifb0 up
tc qdisc add dev ens2f0 ingress
tc filter add dev ens2f0 parent ffff: matchall action mirred egress redirect dev ifb0
tc filter show dev ens2f0 ingress
tc qdisc add dev ifb0 root netem delay 100ms loss 1%

[root@client bbr]# cat client.sh
#!/bin/bash
# client(sender)    ---      server(receiver)
# tc qdisc: fq               tc qdisc: ingress + netem(delay 100ms loss 1%)

# setup env
ethtool -i ens1| grep driver
ethtool ens1| grep Speed
ip link set dev ens2f0 up
ip add add 192.168.100.1/24 dev ens2f0

tc qdisc del dev ens1 root
tc qdisc add dev ens1 root fq
tc qdisc show dev ens1

====
Steps to Reproduce:
1. set tcp buff to 128M on client and server(10 Gbps x 100ms = 125MB BDP) 
net.core.rmem_max = 268435456
net.core.wmem_max = 268435456
net.ipv4.tcp_rmem = 4096	87380	268435456
net.ipv4.tcp_wmem = 4096	65536	268435456
2. on server:
 # sh ./server.sh
 # iperf3 -s -1

3. on client:
 # sh ./client.sh
 # iperf3 -c 192.168.100.2 -C bbr

Eric Dumazet

Oct 15, 2018, 11:43:56 PM
to x...@redhat.com, BBR Development
On Mon, Oct 15, 2018 at 8:34 PM <x...@redhat.com> wrote:
>
> Hi Eric,
> I can not get the 9Gbps for bbr, I just get 800Mbps on 10G card with delay 100ms loss 1%, can you help to show me what's problem of my test?
>
> [root@server bbr]# cat server.sh
> #!/bin/bash
> # client(sender) --- server(receiver)
> # tc qdisc: fq tc qdisc: ingress + netem(delay 100ms loss 1%)
>
> #setup env
> ethtool -i ens2f0| grep driver
> ethtool ens2f0| grep Speed
> ip link set dev ens2f0 up
> ip add add 192.168.100.2/24 dev ens2f0
>
> modprobe ifb numifbs=1
> ip link set dev ifb0 up
> tc qdisc add dev ens2f0 ingress
> tc filter add dev ens2f0 parent ffff: matchall action mirred egress redirect dev ifb0
> tc filter show dev ens2f0 ingress
> tc qdisc add dev ifb0 root netem delay 100ms loss 1%

My guess is that you need to increase the number of packets that this
netem is able to store before tail-dropping in-excess packets.

Default limit for netem is 1000 packets

e.g.:

tc qdisc add dev ifb0 root netem limit 100000 delay 100ms loss 1%

While test is running you can monitor the backlog of this netem

tc -s qdisc show dev ifb0
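
As a rough sanity check on the limit: one BDP at 10 Gbit/s and 100 ms is about 10e9/8 * 0.1 ≈ 125 MB, i.e. roughly 125e6 / 1514 ≈ 83,000 full-size packets (fewer if netem sees aggregated GRO packets), so the default limit of 1000 tail-drops heavily on top of the intended 1% loss, while limit 100000 can hold around a BDP.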


>
> [root@client bbr]# cat client.sh
> #!/bin/bash
>
> # client(sender) --- server(receiver)
> # tc qdisc: fq tc qdisc: ingress + netem(delay 100ms loss 1%)
>
> # setup env
> ethtool -i ens1| grep driver
> ethtool ens1| grep Speed
> ip link set dev ens2f0 up
> ip add add 192.168.100.1/24 dev ens2f0
>
> tc qdisc del dev ens1 root
> tc qdisc add dev ens1 root fq
> tc qdisc show dev ens1
>
>
> ====
>
> Steps to Reproduce:
> 1. set tcp buff to 128M on client and server(10 Gbps x 100ms = 125MB BDP)
> net.core.rmem_max = 268435456
> net.core.wmem_max = 268435456

These net.core. sysctls are not used by TCP.

Eric Dumazet

Oct 16, 2018, 12:05:06 AM
to x...@redhat.com, BBR Development
Also, setting the rmem[2] and wmem[2] to 125 MB (the BDP of 100ms *
10Gbit) would be OK for a lossless link.

If you want a loss to be repaired without a stall, you need at least
twice the BDP, in the case where packets are lost only once
(i.e., the retransmit itself is not lost).

Looking at my old message, the ss command was displaying wscale:14,7, meaning
that I probably used 1 GB of rmem[2] and wmem[2] for the tests.
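
To put rough numbers on that: the BDP here is 10 Gbit/s * 100 ms = 125 MB, so riding out a single retransmission round trip needs on the order of 250 MB of send/receive buffer, and a window scale of 14 allows advertised windows up to 65535 * 2^14 ≈ 1 GB.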

x...@redhat.com

Oct 16, 2018, 5:48:58 AM
to BBR Development
Eric,
Thanks, that helped a lot.
I tried setting rmem[2] and wmem[2] to the max value of 2G, and netem limit 100000.

Using the same netperf command, I got at most 8.5 Gbps and still cannot reach 9.1G; is there anything else I missed?
If I use iperf3 and omit the first n seconds, I can get 9.1 Gbps.

# netperf -H 192.168.168.2 -l 30 -- -K bbr
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.168.2 () port 0 AF_INET

Recv   Send    Send                         
Socket Socket  Message  Elapsed             
Size   Size    Size     Time     Throughput 
bytes  bytes   bytes    secs.    10^6bits/sec 

 87380  65536  65536    31.29    8423.69  

# iperf3 -c 192.168.168.2 -C bbr -t 30 -i 2 -O 8
Connecting to host 192.168.168.2, port 5201
[  5] local 192.168.168.1 port 33958 connected to 192.168.168.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-2.00   sec   300 MBytes  1.26 Gbits/sec  351   44.2 MBytes       (omitted)
[  5]   2.00-4.00   sec  2.68 GBytes  11.5 Gbits/sec  13895    223 MBytes       (omitted)
[  5]   4.00-6.00   sec  2.05 GBytes  8.79 Gbits/sec  16404    223 MBytes       (omitted)
[  5]   2.00-2.00   sec  2.13 GBytes  4.57 Gbits/sec  16551    223 MBytes      
[  5]   2.00-4.00   sec  2.16 GBytes  9.29 Gbits/sec  14982    221 MBytes      
[  5]   4.00-6.00   sec  2.12 GBytes  9.10 Gbits/sec  15484    223 MBytes      
[  5]   6.00-8.00   sec  2.00 GBytes  8.61 Gbits/sec  16936    222 MBytes      
[  5]   8.00-10.00  sec  2.23 GBytes  9.58 Gbits/sec  15713    223 MBytes      
[  5]  10.00-12.00  sec  2.06 GBytes  8.83 Gbits/sec  15086    221 MBytes      
[  5]  12.00-14.00  sec  2.16 GBytes  9.28 Gbits/sec  14822    221 MBytes      
[  5]  14.00-16.00  sec  2.14 GBytes  9.21 Gbits/sec  15977    223 MBytes      
[  5]  16.00-18.00  sec  2.13 GBytes  9.16 Gbits/sec  15096    224 MBytes      
[  5]  18.00-20.00  sec  2.12 GBytes  9.12 Gbits/sec  15818    222 MBytes      
[  5]  20.00-22.00  sec  2.12 GBytes  9.11 Gbits/sec  15889    221 MBytes      
[  5]  22.00-24.00  sec  2.03 GBytes  8.72 Gbits/sec  16642    222 MBytes      
[  5]  24.00-26.00  sec  2.15 GBytes  9.25 Gbits/sec  15410    223 MBytes      
[  5]  26.00-28.00  sec  2.13 GBytes  9.15 Gbits/sec  16528    222 MBytes      
[  5]  28.00-30.00  sec  2.03 GBytes  8.71 Gbits/sec  14985    222 MBytes      
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-30.00  sec  31.7 GBytes  9.08 Gbits/sec  235919             sender
[  5]   0.00-30.10  sec  32.0 GBytes  9.12 Gbits/sec                  receiver

iperf Done.

Eric Dumazet

Oct 16, 2018, 8:52:21 AM
to x...@redhat.com, BBR Development
To be clear, my hosts had 40Gbit NICs, not 10Gbit ones.

1) You really cannot get line rate if you have drops...

2) Sustained 10Gbit with 1% drops (of full-size GRO packets, given your
current netem script) probably requires a lot of CPU cycles;
you might be CPU limited on your hosts.


Let me do the test on a 10Gbit testbed (mlx4 NIC) with the latest David
Miller net-next tree
(thus lacking one patch from Neal Cardwell fixing BBR after EDT adoption).

First round, with no netem to get base numbers (One run with cubic,
another with bbr)

lpaa5:/export/hda3/google/edumazet# ./netperf -P0 -H lpaa6,4 -l 30 -Cc
-- -K cubic

540000 262144 262144 30.00 9399.02 0.52 1.26 0.216 0.526

lpaa5:/export/hda3/google/edumazet# ./netperf -P0 -H lpaa6,4 -l 30 -Cc
-- -K bbr

540000 262144 262144 30.00 9282.61 0.56 1.31 0.235 0.555


Then with a 100ms delay (and no losses) netem at ingress on receiver

lpaa5:/export/hda3/google/edumazet# ./netperf -P0 -H lpaa6,4 -l 30 -Cc
-- -K cubic

540000 262144 262144 30.24 6147.90 0.59 1.02 0.378 0.655

lpaa5:/export/hda3/google/edumazet# ./netperf -P0 -H lpaa6,4 -l 30 -Cc
-- -K bbr

540000 262144 262144 31.21 8714.94 0.77 1.53 0.347 0.690

Then adding one percent drops in netem:

lpaa5:/export/hda3/google/edumazet# ./netperf -P0 -H lpaa6,4 -l 30 -Cc
-- -K cubic
540000 262144 262144 33.22 2.40 0.06 0.07 95.610 112.048

lpaa5:/export/hda3/google/edumazet# ./netperf -P0 -H lpaa6,4 -l 30 -Cc
-- -K bbr
540000 262144 262144 31.15 8438.88 0.70 1.41 0.328 0.657

We can see that having 1% drops hardly impacts BBR.
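
For anyone else reproducing this, a minimal sketch of a persistent sender-side configuration (the file name is just an example):

# /etc/sysctl.d/90-bbr.conf
net.core.default_qdisc=fq
net.ipv4.tcp_congestion_control=bbr

then run "sysctl --system" (or reboot) to apply.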

Xiumei Mu

Oct 16, 2018, 10:22:40 AM
to Eric Dumazet, BBR Development
Yes, BBR really is a very good congestion control. Thanks for your great help!