Are we heading towards a BBR-dominant Internet?


Dave Taht

Aug 25, 2022, 8:01:58 PM
to bloat, BBR Development, ay...@comp.nus.edu.sg
I rather enjoyed this one. I can't help but wonder what would happen
if we plugged some different assumptions into their model.

https://www.comp.nus.edu.sg/~bleong/publications/imc2022-nash.pdf

--
FQ World Domination pending: https://blog.cerowrt.org/post/state_of_fq_codel/
Dave Täht CEO, TekLibre, LLC

Neal Cardwell

Aug 26, 2022, 9:36:47 AM
to Dave Taht, bloat, BBR Development, ay...@comp.nus.edu.sg
Yes, I agree the assumptions are key here. One important aspect of this paper is that it focuses on the steady-state behavior of bulk flows.

Once you allow for short flows (like web pages, RPCs, etc) to dynamically enter and leave a bottleneck, the considerations become different. As is well-known, Reno/CUBIC will starve themselves if new flows enter and cause loss too frequently. For CUBIC, for a somewhat typical 30ms broadband path with a flow fair share of 25 Mbit/sec, if new flows enter and cause loss more frequently than roughly every 2 seconds then CUBIC will not be able to utilize its fair share. For a high-speed WAN path, with 100ms RTT and fair share of 10 Gbit/sec,  if new flows enter and cause loss more frequently than roughly every 40 seconds then CUBIC will not be able to utilize its fair share. Basically, loss-based CC can starve itself in some very typical kinds of dynamic scenarios that happen in the real world.
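For a rough sense of where numbers like these come from, here is a back-of-the-envelope sketch using the CUBIC window-growth model from RFC 8312 (beta = 0.7, C = 0.4 segments/s^3). It is an illustration, not the exact calculation above; at small BDPs CUBIC actually runs in its Reno-emulation region, so the broadband figure is ballpark only.

BETA = 0.7           # CUBIC multiplicative decrease factor
C = 0.4              # CUBIC scaling constant, segments / s^3
SEG_BITS = 1500 * 8  # bits per MTU-sized segment

def min_time_between_losses(fair_share_bps, rtt_s):
    """Time a single CUBIC flow needs between losses to sustain fair_share_bps."""
    avg_window = fair_share_bps * rtt_s / SEG_BITS   # segments (the path BDP)
    w_max = avg_window * 4 / (3 + BETA)              # peak window for that average
    k = (w_max * (1 - BETA) / C) ** (1 / 3)          # seconds to regrow after a loss
    loss_rate = rtt_s / (avg_window * k)             # per-packet loss rate, one loss per epoch
    return k, loss_rate

for label, bw, rtt in [("broadband, 25 Mbit/s, 30 ms", 25e6, 0.030),
                       ("WAN, 10 Gbit/s, 100 ms", 10e9, 0.100)]:
    k, p = min_time_between_losses(bw, rtt)
    print(f"{label}: needs >= {k:.1f} s between losses (loss rate <= {p:.1e})")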

BBR is not trying to maintain a higher throughput than CUBIC in these kinds of scenarios with steady-state bulk flows. BBR is trying to be robust to the kinds of random packet loss that happen in the real world when there are flows dynamically entering/leaving a bottleneck.

cheers,
neal





Bob McMahon

Aug 26, 2022, 4:54:45 PM
to Neal Cardwell, Dave Taht, bloat, BBR Development, ay...@comp.nus.edu.sg
Hi Neal,

Any thoughts on tooling to generate and measure the traffic flows BBR is designed to optimize? I've been adding some low-duty-cycle support in iperf 2 with things like --bounceback, --burst-size, and --burst-period. We could pull the size and period from a known distribution or distributions, though I'm not sure what to pick.
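One possibility, purely as an assumption about what "a known distribution" might look like (web-style object sizes are often modeled as heavy-tailed, and gaps between bursts as exponential): sample a size and period per transfer and feed them to the iperf 2 options above. The parameters below are made up.

import random

def sample_burst():
    size_bytes = int(random.paretovariate(1.2) * 10_000)  # heavy-tailed "object" size
    period_s = random.expovariate(1 / 0.5)                # mean 0.5 s between bursts
    return size_bytes, period_s

for _ in range(5):
    print(sample_burst())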

Thanks,
Bob




Neal Cardwell

Aug 27, 2022, 10:44:46 AM
to Bob McMahon, Dave Taht, bloat, BBR Development, ay...@comp.nus.edu.sg
Hi Bob,

Good question. I can imagine a number of different techniques to generate and measure the traffic flows for this kind of study, but I don't have any particular suggestions.

neal

Bob McMahon

Aug 27, 2022, 4:43:18 PM
to Neal Cardwell, Dave Taht, bloat, BBR Development, ay...@comp.nus.edu.sg
Curious as to what you're doing during development, if you can share?

Thanks,
Bob

Neal Cardwell

Aug 28, 2022, 2:43:26 PM
to Bob McMahon, Dave Taht, bloat, BBR Development, ay...@comp.nus.edu.sg
Sure. For testing these kinds of properties of the BBR algorithm we use various transperf test cases. The transperf tool is something Soheil Hassas Yeganeh and our team cooked up and open-sourced here: https://github.com/google/transperf

Bob McMahon

Aug 28, 2022, 6:39:40 PM
to Neal Cardwell, Dave Taht, bloat, BBR Development, ay...@comp.nus.edu.sg
Hi Neal,

These look like steady-state bulk flow tests unless I'm missing something.

Bob

Neal Cardwell

Aug 28, 2022, 7:54:08 PM
to Bob McMahon, Dave Taht, bloat, BBR Development, ay...@comp.nus.edu.sg
If you are talking about the screenshot of the UI at https://github.com/google/transperf, yes, that particular test is a simple bulk-flow case, shown to give a sense of what the UI looks like. :-)

We use a few different approaches that can examine dynamic flows causing packet loss: 

(1) The test configuration language is Python, so you can construct arbitrarily fancy dynamic flow scenarios with arbitrary numbers of flows starting and stopping at arbitrary times.

(2) The tests can also use netperf command line options to run periodic short transfers. (And we welcome patches to integrate support for other tools.)

(3) We also run a fair number of tests for robustness to loss just using randomly injected packet loss (using netem).
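A minimal sketch of (3), assuming a Linux test host where netem is available; the interface name and loss rate are just examples:

import subprocess

def set_random_loss(iface, loss_pct):
    # e.g. "tc qdisc replace dev eth0 root netem loss 0.01%"
    subprocess.run(["tc", "qdisc", "replace", "dev", iface, "root",
                    "netem", "loss", f"{loss_pct}%"], check=True)

set_random_loss("eth0", 0.01)   # 0.01% random loss on the bottleneck egress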

These are just some of the approaches we have used, and I don't claim that these are the only or best approaches to look at this. :-)

cheers,
neal

Bob McMahon

Aug 29, 2022, 12:47:51 PM
to Neal Cardwell, Dave Taht, bloat, BBR Development, ay...@comp.nus.edu.sg
Thanks Neal. You might want to check out the flows code released with iperf 2. Basically, it instantiates flows and runs them. There is typically a controller running Python 3 (3.10 or later) that uses ssh pipes to the DUTs. The design is event-driven and uses Python's asyncio, which is quite powerful. The DUTs just need iperf 2 and ssh.

The code is at an alpha level and we're looking for broader industry support and contributions, both in real-time plotting and in things like multivariate regression detection using statistical process control (SPC), e.g. Hotelling's T². There is also some crude clustering code around latency, which uses Kolmogorov-Smirnov distance matrices computed from the histograms.
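A sketch of the pairwise Kolmogorov-Smirnov distance matrix described above, assuming raw latency samples per test run (with iperf 2's binned histograms you would compare the empirical CDFs instead); the sample data is synthetic:

import numpy as np
from scipy.stats import ks_2samp

def ks_distance_matrix(latency_runs):
    n = len(latency_runs)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = ks_2samp(latency_runs[i], latency_runs[j]).statistic
    return d

runs = [np.random.normal(5.0, 0.5, 1000),   # ms; hypothetical baseline run
        np.random.normal(5.1, 0.5, 1000),   # similar run
        np.random.normal(9.0, 2.0, 1000)]   # a latency regression
print(ks_distance_matrix(runs))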

A suggestion is that those in development and test roles synchronize their device clocks with PTP. Iperf 2 supports one-way delay (OWD) calculations, but these only work if the clocks are synced. The OWD measurements can in turn be used, per Little's law, to calculate effective average queue depth, though this typically assumes a steady-state measurement.
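A rough sketch of that Little's-law estimate, assuming PTP-synced clocks so the OWD samples are meaningful; the function and numbers are illustrative, not iperf 2 output fields:

def avg_queue_depth_bytes(throughput_bps, owd_samples_s, base_owd_s):
    # Little's law: L = lambda * W, with W taken as the queueing delay
    # (OWD minus the minimum/base OWD), averaged over a steady-state interval.
    waits = [owd - base_owd_s for owd in owd_samples_s]
    avg_wait = sum(waits) / len(waits)
    return throughput_bps / 8 * avg_wait    # average bytes sitting in the queue

# e.g. 100 Mbit/s with ~4 ms average queueing delay -> ~50 KB standing queue
print(avg_queue_depth_bytes(100e6, [0.006, 0.004, 0.005], 0.001))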

Bob 

Neal Cardwell

Aug 29, 2022, 4:07:37 PM
to Bob McMahon, Dave Taht, bloat, BBR Development, ay...@comp.nus.edu.sg
Thanks for the pointers,  Bob.

best regards,
neal


Bob McMahon

Aug 29, 2022, 6:17:04 PM
to Neal Cardwell, Dave Taht, bloat, BBR Development, ay...@comp.nus.edu.sg
Sure thing. Defining some multivariate signals based on "non-bulk-flow" or "realistic" traffic scenarios that could be automated (and serve as a proxy for user QoE) would be very useful for L2 driver, MAC/PHY, and AP scheduling engineers, allowing them to provide the best-quality packet-forwarding-plane products possible for the transport layers and those engineering teams.

Bob 

Ayush Mishra

Mar 28, 2023, 5:36:14 AM
to Neal Cardwell, Dave Taht, bloat, BBR Development, ay...@comp.nus.edu.sg
Hey Neal,

I was revisiting this thread before presenting this paper in iccrg tomorrow - and I was particularly intrigued by one of the motivations you mentioned for BBR:

"BBR is not trying to maintain a higher throughput than CUBIC in these kinds of scenarios with steady-state bulk flows. BBR is trying to be robust to the kinds of random packet loss that happen in the real world when there are flows dynamically entering/leaving a bottleneck."

BBRv1 essentially tried to deal with this problem by doing away with packet loss as a congestion signal and having an entirely different philosophy to congestion control. However, if we set aside the issue of buffer bloat, I would imagine packet loss is a bad congestion signal in this situation because most loss-based congestion control algorithms use it as a binary signal with a binary response (back-off or no back-off). In other words, I feel the blame must be placed on not just the congestion signal, but also on how most algorithms respond to this congestion signal.

On a per-packet basis, packet loss is a binary signal. But over a window, the loss percentage and distribution, for example, can be a rich signal. There is probably scope for differentiating between different kinds of packet losses (and deciding how to react to them) when packet loss is coupled with the most recent delay measurement too. Now that BBRv2 reacts to packet loss, are you making any of these considerations too?
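A toy illustration of that idea (not any shipping algorithm): the loss fraction over the last window, paired with the freshest delay sample, gives a graded signal rather than a binary one. The thresholds are made up.

def loss_signal(delivered, lost, latest_rtt_ms, min_rtt_ms):
    loss_frac = lost / max(delivered + lost, 1)     # loss rate over the window
    queue_hint_ms = latest_rtt_ms - min_rtt_ms      # delay inflation when the loss was seen
    if loss_frac >= 0.01 and queue_hint_ms > 2:
        return "likely persistent congestion"
    if loss_frac < 0.01 and queue_hint_ms <= 2:
        return "likely random loss or a short burst"
    return "ambiguous"

print(loss_signal(delivered=980, lost=3, latest_rtt_ms=31, min_rtt_ms=30))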

This is not something I plan to present in iccrg tomorrow, just something I was curious about :)

Warmest regards,
Ayush


Dave Taht

Mar 28, 2023, 6:44:21 AM
to Ayush Mishra, Neal Cardwell, bloat, BBR Development, ay...@comp.nus.edu.sg
I am not keeping up with iccrg as well as I could, but IMHO, loss,
marking and delay can often be correlated. I did recently start up a
bit of testing of BBRv2 over starlink over on the starlink mailing
list.
--
AMA March 31: https://www.broadband.io/c/broadband-grant-events/dave-taht

Neal Cardwell

Apr 2, 2023, 9:45:31 AM
to Ayush Mishra, Dave Taht, bloat, BBR Development, ay...@comp.nus.edu.sg
On Tue, Mar 28, 2023 at 5:36 AM Ayush Mishra <ayumis...@gmail.com> wrote:
> Hey Neal,
>
> I was revisiting this thread before presenting this paper in iccrg tomorrow - and I was particularly intrigued by one of the motivations you mentioned for BBR:
>
> "BBR is not trying to maintain a higher throughput than CUBIC in these kinds of scenarios with steady-state bulk flows. BBR is trying to be robust to the kinds of random packet loss that happen in the real world when there are flows dynamically entering/leaving a bottleneck."
>
> BBRv1 essentially tried to deal with this problem by doing away with packet loss as a congestion signal and having an entirely different philosophy to congestion control. However, if we set aside the issue of buffer bloat, I would imagine packet loss is a bad congestion signal in this situation because most loss-based congestion control algorithms use it as a binary signal with a binary response (back-off or no back-off). In other words, I feel the blame must be placed on not just the congestion signal, but also on how most algorithms respond to this congestion signal.

I would even go a little further, and say we don't need to "blame" loss as a congestion signal: usually it's telling us something useful and important.

AFAICT the problem is in the combination of:
 (a) only using loss as a signal
 (b) only reacting to whether there is packet loss in a round trip as a signal
 (c) only using a single multiplicative decrease as a response to loss detected in fast recovery

AFAICT any algorithm that has those properties (like Reno and CUBIC) simply can't scale to large BDPs if there are typical levels of loss or the traffic or available bandwidth is dynamic. At large BDPs and typically achievable loss rates, there will be packet loss in every round trip and the connection will always be decreasing rather than increasing, so will starve. For example, with a BDP of 10 Gbps * 100ms and MTU of 1500 bytes and loss rate of  0.0012% we'd expect a packet loss every round trip, and so we would expect starvation. In particular, a single CUBIC flow over such a path needs >40 secs between experiencing any losses, or a loss rate less than 0.0000029%  (2.9e-8) [ https://tools.ietf.org/html/rfc8312#section-5.2 ].
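The arithmetic behind those figures, assuming MTU-sized segments and one expected loss per window of in-flight packets:

bdp_packets = 10e9 * 0.100 / (1500 * 8)   # ~83,333 packets in flight at 10 Gbit/s, 100 ms
loss_per_rtt_threshold = 1 / bdp_packets  # ~1.2e-5, i.e. ~0.0012%: one expected loss per round trip
print(f"{bdp_packets:.0f} packets per RTT; loss threshold {loss_per_rtt_threshold:.4%}")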
 
> On a per-packet basis, packet loss is a binary signal. But over a window, the loss percentage and distribution, for example, can be a rich signal. There is probably scope for differentiating between different kinds of packet losses (and deciding how to react to them) when packet loss is coupled with the most recent delay measurement too. Now that BBRv2 reacts to packet loss, are you making any of these considerations too?

Yes, I agree there is useful information there, and BBRv2 does look explicitly and indirectly at the loss rate when making decisions. BBRv2 does not look at coupling the loss signal with the most recent delay measurement, but I agree that seems like a fruitful direction, and we have been considering that as a component of future CC algorithms.
 
> This is not something I plan to present in iccrg tomorrow, just something I was curious about :)

Thanks for posting! I agree these are interesting topics. :-)

best regards,
neal

Neal Cardwell

Apr 2, 2023, 10:03:02 AM
to Sebastian Moeller, Ayush Mishra, BBR Development, ay...@comp.nus.edu.sg, bloat


On Sun, Apr 2, 2023 at 8:14 AM Sebastian Moeller <moel...@gmx.de> wrote:
Hi Ayush,


> On Mar 28, 2023, at 11:36, Ayush Mishra via Bloat <bl...@lists.bufferbloat.net> wrote:
>
> Hey Neal,
>
> I was revisiting this thread before presenting this paper in iccrg tomorrow - and I was particularly intrigued by one of the motivations you mentioned for BBR:
>
> "BBR is not trying to maintain a higher throughput than CUBIC in these kinds of scenarios with steady-state bulk flows. BBR is trying to be robust to the kinds of random packet loss that happen in the real world when there are flows dynamically entering/leaving a bottleneck."

But isn't "when there are flows dynamically entering" actually a bona fide reason for the already established flows to scale back a bit, to give the newcomers some room to establish themselves?

Yes, I agree that "when there are flows dynamically entering" is actually a bona fide reason for the already established flows to scale back to give the newcomers some room to establish themselves. I'm not arguing against scaling back to give the newcomers some room to establish themselves. I'm arguing against the specific way that Reno and CUBIC behave to try to accomplish that. :-)

 
> BBRv1 essentially tried to deal with this problem by doing away with packet loss as a congestion signal and having an entirely different philosophy to congestion control. However, if we set aside the issue of buffer bloat, I would imagine packet loss is a bad congestion signal in this situation because most loss-based congestion control algorithms use it as a binary signal with a binary response (back-off or no back-off). In other words, I feel the blame must be placed on not just the congestion signal, but also on how most algorithms respond to this congestion signal.

        Fair enough, but even if we assume a capacity based loss we really do not know:
a) did the immediate traffic simply exceed the bottleneck's queue (assuming a fixed egress capacity/rate)
b) did the immediate traffic simply exceed the bottleneck's egress capacity (think variable rate link that just dropped in rate, while traffic rate was constant)

In case a) we might be OK with doing a gentle reduction (and take a bit to do so) in case b) we probably should be doing a less gentle reduction and preferably ASAP.

Agreed. And that's the approach that BBRv2 takes; it would behave differently in the two cases. In case (a) it would essentially notice that packets are being dropped and yet the delivery rate remains high, so would infer that in-flight is too high but the estimated bandwidth seems OK, so it would immediately reduce the cwnd slightly but maintain the pacing rate. In case (b) it would notice that the loss rate is high and delivery rate has reduced substantially, so would immediately and substantially reduce both the cwnd and pacing rate.
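A schematic of the two responses described here (not the actual BBRv2 code; the thresholds and reduction factors are illustrative only), evaluated at the end of a recovery round:

HIGH_LOSS = 0.02   # illustrative loss-rate threshold

def recovery_round_response(loss_rate, delivery_rate, est_bw, cwnd, pacing_rate):
    if loss_rate > HIGH_LOSS and delivery_rate < 0.8 * est_bw:
        # case (b): high loss AND the delivery rate fell -> the link rate itself dropped;
        # cut both cwnd and pacing rate substantially, right away
        return 0.7 * cwnd, delivery_rate
    if loss_rate > HIGH_LOSS:
        # case (a): loss but delivery rate still ~ estimated bw -> queue overflow;
        # trim in-flight slightly, keep the pacing rate
        return 0.9 * cwnd, pacing_rate
    return cwnd, pacing_rate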
 
>
> On a per-packet basis, packet loss is a binary signal. But over a window, the loss percentage and distribution, for example, can be a rich signal. There is probably scope for differentiating between different kinds of packet losses

        Sure, as long as a veridical congestion detection is still timely enough not to make case b) above worse...

Agreed.
 
> (and deciding how to react to them) when packet loss is coupled with the most recent delay measurement too.

        Hmm, say we get a "all is fine" delay probe at time X, at X+1 the capacity drops to 50% and we incur a drop, will the most recent delay data actually be informative for the near future?

Usually it takes an ACK (a dupack or ACK carrying a SACK block) ACKing data that transited the network path *after* the loss to infer the loss (consistent with the RACK philosophy), and that ACK will usually provide a delay sample. So when there is loss usually there will be a delay signal that is at least as fresh as the loss signal, providing a hint about the state of the bottleneck queue after the loss. So even with loss I'd imagine that using that most recent delay data should usually be informative about the near future.

best regards,
neal

 
Regards
        Sebastian

Ayush Mishra

Apr 2, 2023, 9:49:54 PM
to Neal Cardwell, Sebastian Moeller, BBR Development, ay...@comp.nus.edu.sg, bloat
On Sun, Apr 2, 2023 at 10:03 PM Neal Cardwell <ncar...@google.com> wrote:


On Sun, Apr 2, 2023 at 8:14 AM Sebastian Moeller <moel...@gmx.de> wrote:
Hi Ayush,

> On Mar 28, 2023, at 11:36, Ayush Mishra via Bloat <bl...@lists.bufferbloat.net> wrote:
>
> Hey Neal,
>
> I was revisiting this thread before presenting this paper in iccrg tomorrow - and I was particularly intrigued by one of the motivations you mentioned for BBR:
>
> "BBR is not trying to maintain a higher throughput than CUBIC in these kinds of scenarios with steady-state bulk flows. BBR is trying to be robust to the kinds of random packet loss that happen in the real world when there are flows dynamically entering/leaving a bottleneck."

But isn't "when there are flows dynamically entering" actually a bona fide reason for the already established flows to scale back a bit, to give the new-commers some room to establish themselves?

Yes, I agree that "when there are flows dynamically entering" is actually a bona fide reason for the already established flows to scale back to give the newcomers some room to establish themselves. I'm not arguing against scaling back to give the newcomers some room to establish themselves. I'm arguing against the specific way that Reno and CUBIC behave to try to accomplish that. :-)
==> I agree too. But I think one of the key challenges here could be when the dynamically entering flows are extremely tiny (which I imagine is quite common). In those cases, there is a possibility that by the time the long-running flow backs off, the congestion it was responding to has already ended because the tiny flows have exited the bottleneck (think microbursts caused by flows that last 1-2 RTTs). In a perfect world we'd like to deal with elephant and mice flows in isolation at the switch, but there are likely things we can do from the endpoint too. Maybe some kind of a two-phase backoff, with the second phase only kicking in after a period of hysteresis to make sure it's responding to persistent congestion and not just brief microbursts. This is just off the top of my head, so I'm not sure how something like this would play out in the overall dynamics and convergence of the algorithm that implements it.
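A very rough sketch of that two-phase idea (purely hypothetical, not an implemented algorithm): react gently to an isolated loss round, and only apply the full back-off if congestion persists across several consecutive rounds. The constants are made up.

PERSISTENCE_ROUNDS = 3   # hysteresis length, in round trips

def on_round_end(cwnd, saw_loss, congested_rounds):
    if not saw_loss:
        return cwnd, 0                         # congestion cleared; reset hysteresis
    congested_rounds += 1
    if congested_rounds < PERSISTENCE_ROUNDS:
        return 0.95 * cwnd, congested_rounds   # phase 1: gentle trim, likely a microburst
    return 0.7 * cwnd, congested_rounds        # phase 2: full back-off, persistent congestion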

Neal Cardwell

Apr 3, 2023, 9:41:27 AM
to Sebastian Moeller, Ayush Mishra, BBR Development, ay...@comp.nus.edu.sg, bloat


On Mon, Apr 3, 2023 at 2:25 AM Sebastian Moeller <moel...@gmx.de> wrote:
Hi Neal,


thanks for your response. To make it clear: I appreciate this discussion, and I do in no way want to imply the BBRs are doing anything untoward here; this is about understanding the principles better.



> On Apr 2, 2023, at 16:02, Neal Cardwell <ncar...@google.com> wrote:
>
>
>
> On Sun, Apr 2, 2023 at 8:14 AM Sebastian Moeller <moel...@gmx.de> wrote:
> Hi Ayush,
>
> > On Mar 28, 2023, at 11:36, Ayush Mishra via Bloat <bl...@lists.bufferbloat.net> wrote:
> >
> > Hey Neal,
> >
> > I was revisiting this thread before presenting this paper in iccrg tomorrow - and I was particularly intrigued by one of the motivations you mentioned for BBR:
> >
> > "BBR is not trying to maintain a higher throughput than CUBIC in these kinds of scenarios with steady-state bulk flows. BBR is trying to be robust to the kinds of random packet loss that happen in the real world when there are flows dynamically entering/leaving a bottleneck."
>
> But isn't "when there are flows dynamically entering" actually a bona fide reason for the already established flows to scale back a bit, to give the new-commers some room to establish themselves?
>
> Yes, I agree that "when there are flows dynamically entering" is actually a bona fide reason for the already established flows to scale back to give the newcomers some room to establish themselves. I'm not arguing against scaling back to give the newcomers some room to establish themselves. I'm arguing against the specific way that Reno and CUBIC behave to try to accomplish that. :-)

        [SM] Fair enough. There likely is room for improvement.



> > BBRv1 essentially tried to deal with this problem by doing away with packet loss as a congestion signal and having an entirely different philosophy to congestion control. However, if we set aside the issue of buffer bloat, I would imagine packet loss is a bad congestion signal in this situation because most loss-based congestion control algorithms use it as a binary signal with a binary response (back-off or no back-off). In other words, I feel the blame must be placed on not just the congestion signal, but also on how most algorithms respond to this congestion signal.
>
>         Fair enough, but even if we assume a capacity based loss we really do not know:
> a) did the immediate traffic simply exceed the bottleneck's queue (assuming a fixed egress capacity/rate)
> b) did the immediate traffic simply exceed the bottleneck's egress capacity (think variable rate link that just dropped in rate, while traffic rate was constant)
>
> In case a) we might be OK with doing a gentle reduction (and take a bit to do so) in case b) we probably should be doing a less gentle reduction and preferably ASAP.
>
> Agreed. And that's the approach that BBRv2 takes; it would behave differently in the two cases. In case (a) it would essentially notice that packets are being dropped and yet the delivery rate remains high, so would infer that in-flight is too high but the estimated bandwidth seems OK, so it would immediately reduce the cwnd slightly but maintain the pacing rate.

        [SM] Showing my confusion here: will reducing the cwnd not result in a reduced pacing rate at some point? Or are we talking about the immediate response here and not the (slightly) longer-term average?

Good question. In BBR (v1 and v2) the cwnd and pacing rate are largely independent. In the BBR model the pacing rate is not computed using (pacing_rate = k * cwnd / srtt), as is the practice for most CC modules in Linux TCP, for example (including CUBIC and Reno, if those are used with pacing from the fq qdisc). So I do mean that the cwnd would not, by itself result in a reduced pacing rate. The cwnd might indirectly result in a reduced pacing rate if the lower cwnd causes a lower delivery rate.
 
> In case (b) it would notice that the loss rate is high and delivery rate has reduced substantially, so would immediately and substantially reduce both the cwnd and pacing rate.

        [SM] But to notice a high loss rate, will we not have to wait and withhold our response (a bit) longer, or are we talking about DupACKs showing more than one segment missing? (Both can be fine and used in conjunction, I just wonder what you had in mind here)

The BBRv2 code makes the decision about the amount to reduce cwnd and pacing rate at the end of the first round trip of loss recovery (and at each round trip boundary beyond that if the recovery lasts multiple rounds). So the response is indeed slightly delayed.
 

> >
> > On a per-packet basis, packet loss is a binary signal. But over a window, the loss percentage and distribution, for example, can be a rich signal. There is probably scope for differentiating between different kinds of packet losses
>
>         Sure, as long as a veridical congestion detection is still timely enough not to make case b) above worse...
>
> Agreed.

> > (and deciding how to react to them) when packet loss is coupled with the most recent delay measurement too.
>
>         Hmm, say we get a "all is fine" delay probe at time X, at X+1 the capacity drops to 50% and we incur a drop, will the most recent delay data actually be informative for the near future?
>
> Usually it takes an ACK (a dupack or ACK carrying a SACK block) ACKing data that transited the network path *after* the loss to infer the loss (consistent with the RACK philosophy), and that ACK will usually provide a delay sample. So when there is loss usually there will be a delay signal that is at least as fresh as the loss signal, providing a hint about the state of the bottleneck queue after the loss. So even with loss I'd imagine that using that most recent delay data should usually be informative about the near future.

        [SM] Thanks, I think I was confusing the timing of the bandwidth probing steps with the latency measurements, thanks for clearing that up, and sorry... Yes, I agree that delay measurement is as good as it gets, and yes typically we should be able to extrapolate a bit into the future...

Thanks!

best regards,
neal

 
Many Thanks & Best Regards