How to explain BBR's throughput slumps at 20% loss?


Tao Tse

Sep 11, 2019, 1:52:30 AM
to BBR Development

[Attached image: 微信截图_20190911134408.png (WeChat screenshot)]

Hi,

In the test shown in the paper, why does the throughput slump at 20% loss?
* Is there any hard limitation in the implementation?
* Or does it theoretically behave like this?

Thanks!

--
xtao

Neal Cardwell

Sep 11, 2019, 10:06:25 AM
to Tao Tse, BBR Development
Hi,

Thanks for your question! The fact that BBRv1 throughput is low with loss rates beyond 20% is due to the way a key design parameter in BBRv1 impacts algorithm behavior. We would have liked to expand upon this in the original BBR article, but ran into space limits, so I'll try to outline the answer here.

In BBRv1, the estimated bandwidth is the windowed maximum of recent observed delivery rate samples. When a BBRv1 flow probes for bandwidth, it sends at 1.25x the estimated bandwidth. As noted in the BBR article, "The maximum possible throughput is the link rate times fraction delivered (= 1 − lossRate)". So while probing, the maximum possible throughput is:

  bw_probe_delivery_rate = bw_probe_send_rate * (1 - lossRate)  
  bw_probe_delivery_rate = 1.25 * BtlBw       * (1 - lossRate)
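
As a rough illustration of the two lines above, here is a minimal Python sketch of the idealized model. The function name, the `probe_gain` parameter, and the example rate of 100 are assumptions made for illustration; none of this is taken from the BBR code.

```python
def probe_delivery_rate(btl_bw, loss_rate, probe_gain=1.25):
    """Idealized delivery rate while probing.

    Hypothetical helper: the flow sends at probe_gain * btl_bw and only
    the fraction (1 - loss_rate) of what it sends is delivered.
    """
    send_rate = probe_gain * btl_bw
    return send_rate * (1.0 - loss_rate)


# Example with an assumed BtlBw estimate of 100 (arbitrary units).
for loss in (0.0, 0.1, 0.2, 0.3):
    rate = probe_delivery_rate(100.0, loss)
    print(f"loss={loss:.1f}  bw_probe_delivery_rate={rate:.2f}")
```

In this simple model the measured rate stays at or above the assumed BtlBw only while the loss rate is at most 0.2.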

So then the question is: what is the maximum lossRate at which the flow will be able to probe for bandwidth and measure a bw_probe_delivery_rate that is big enough to at least match its current BtlBw, allowing the flow to "refresh" its BtlBw estimate and continue sending at BtlBw? We can express this as:

          bw_probe_delivery_rate >= BtlBw
   1.25 * BtlBw * (1 - lossRate) >= BtlBw

Solving for lossRate we get:

 1.25 * BtlBw * (1 - lossRate) >= BtlBw
        1.25  * (1 - lossRate) >= 1
                  1 - lossRate >= 1/1.25
                             1 >= 1/1.25 + lossRate
                    1 - 1/1.25 >= lossRate
                      lossRate <= 0.2
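
The same algebra works for any probing gain greater than 1, giving lossRate <= 1 - 1/gain. The sketch below only restates that formula; the function name is hypothetical, and gains other than 1.25 are shown purely to illustrate the formula, not because any BBR version uses them.

```python
def max_sustainable_loss(probe_gain):
    """Largest loss rate at which the idealized probe sample still reaches BtlBw.

    Derived from probe_gain * BtlBw * (1 - lossRate) >= BtlBw.
    Hypothetical helper for illustration only.
    """
    if probe_gain < 1.0:
        raise ValueError("probe_gain is assumed to be at least 1")
    return 1.0 - 1.0 / probe_gain


# With the BBRv1 probing gain of 1.25 this is approximately 0.2.
print(max_sustainable_loss(1.25))
```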

So this model says that a flow will only be able to sustain or refresh its BtlBw estimate if the lossRate is <= 0.2. This explains why at loss rates above 0.2, the flow is unable to "refresh" its BtlBw estimate, and its throughput falls. Of course the code does not exactly match this model, and at loss rates near the border, say 10%-20%, the detailed dynamics are such that the throughput falls slightly below this idealized simple model. But I think this captures the essence of the theoretical 20% loss rate limit for BBRv1.
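
To see how the estimate behaves over time under this model, here is a toy simulation. It is a sketch under strong assumptions, not a model of the real code: every cycle yields exactly one probe sample, the send rate is capped at an assumed link rate, and the window length, cycle count, and all names are chosen only for illustration.

```python
from collections import deque


def simulate_btlbw(loss_rate, link_bw=100.0, probe_gain=1.25,
                   window=10, cycles=200):
    """Toy model: BtlBw is the max of the last `window` probe samples.

    Assumptions (not from the BBR code): one sample per cycle, the send
    rate cannot exceed link_bw, and delivered = sent * (1 - loss_rate).
    Returns the BtlBw estimate after `cycles` cycles.
    """
    # Start with an estimate equal to the link rate.
    samples = deque([link_bw], maxlen=window)
    for _ in range(cycles):
        btl_bw = max(samples)                      # windowed-max estimate
        send_rate = min(probe_gain * btl_bw, link_bw)
        samples.append(send_rate * (1.0 - loss_rate))
    return max(samples)


for loss in (0.1, 0.2, 0.3):
    print(f"loss={loss:.1f}  final BtlBw estimate={simulate_btlbw(loss):.2f}")
```

In this toy model the estimate settles at link_bw * (1 - lossRate) for loss rates up to 0.2. Above 0.2 each probe sample falls short of the current estimate, so the estimate shrinks every time the window expires.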

Please note that the dynamics for BBRv2 are considerably different, and for packet loss BBRv2 instead has a separate explicit loss_thresh parameter, a target loss rate that it tries to stay below, using explicit measurements of the loss rate. BBRv2 was discussed at IETF 104 and 105, and the BBRv2 alpha Linux TCP code is here.

best,
neal



--
You received this message because you are subscribed to the Google Groups "BBR Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bbr-dev+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bbr-dev/f13cd4a9-c916-4846-bd35-5c8a7eb3be31%40googlegroups.com.

Tao Tse

Sep 11, 2019, 1:02:51 PM
to BBR Development
Neal,

Thanks for your detailed explanation! 

I've got it.

--
sincerely
xtao