How does loss-based congestion control cause the bufferbloat problem?


Tao X

Nov 5, 2016, 4:01:03 AM
to BBR Development
From the patch description I read the passage below, but it's still confusing to me.
Could anybody give me more explanation?
Thanks a lot!

On today's Internet, loss-based congestion control causes the infamous
bufferbloat problem, often causing seconds of needless queuing delay,
since it fills the bloated buffers in many last-mile links.

Jonathan Morton

Nov 5, 2016, 5:38:02 AM
to Tao X, BBR Development
Simply because loss does not typically begin to occur until the buffers are already full - and the buffers can often hold many seconds' worth of data.

This applies to dumb FIFO buffers, which are the most common type at the moment. There are various types of AQM (Active Queue Management) which do better. BBR, however, is designed to work well even with dumb FIFOs.

- Jonathan Morton

Tao X

Nov 5, 2016, 5:55:41 AM
to BBR Development, g.xi...@gmail.com
Thank you, Jonathan.

As you said, BBR is designed to work well even when bufferbloat is present.
That suggests the bufferbloat problem always exists, since switches and routers always hold data packets.

But the passage I quoted says 'loss-based congestion control CAUSES the infamous bufferbloat problem', and the reason given, 'since it fills the bloated buffers in many last-mile links', still confuses me!

Neal Cardwell

Nov 5, 2016, 9:13:00 AM
to Tao X, BBR Development
On Sat, Nov 5, 2016 at 5:55 AM, Tao X <g.xi...@gmail.com> wrote:
> Thank you, Jonathan.
> As you said, BBR is designed to work well even when bufferbloat is present.

Yes, exactly. BBR avoids bufferbloat by building a model of the
network path that allows the sender to keep a reasonable amount of
data inflight.

> That suggests the bufferbloat problem always exists, since switches and
> routers always hold data packets.

Bufferbloat doesn't always exist. Let's take the definition of
bufferbloat from the bufferbloat project site at
https://www.bufferbloat.net/projects/bloat/wiki/Introduction/ :

"Bufferbloat is the undesirable latency that comes from a router
or other network equipment buffering too much data."

So, from my perspective, to get bufferbloat you need two ingredients:

(1) a network bottleneck link with a buffer that is very deep relative
to the link's bandwidth, so that it *can* buffer too much data

(2) a congestion control algorithm (controlling the sending rate of
packets going through that link) that decides to use so much of that
buffer space that the buffer *does* buffer too much data, leading to
unacceptably high queuing latency at that bottleneck.

Many last-mile links in the Internet fulfill (1), and the dominant
loss-based congestion control algorithms in the Internet, Reno and
CUBIC, fulfill (2).
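
To put rough, illustrative numbers on ingredient (1): a 1 Mbit/sec
last-mile uplink with a 256 KByte FIFO holds 256*1024*8 ~= 2.1 Mbits
of data, which takes about 2 seconds to drain. So a sender that keeps
that buffer full adds roughly 2 seconds of queuing delay, orders of
magnitude more than the path's propagation delay.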

> But the passage I quoted says 'loss-based congestion control CAUSES the
> infamous bufferbloat problem', and the reason given, 'since it fills the
> bloated buffers in many last-mile links', still confuses me!

Yes, "loss-based congestion control causes the infamous bufferbloat
problem", in the sense that it provides the second ingredient (2) in
the recipe above: it decides to fill all available buffer space.

And we can also say that loss-based congestion control "fills the
bloated buffers in many last-mile links". Here by "bloated" we mean
that the buffers in many last-mile links are excessively large, given
the bandwidth of those links.

I'd recommend one of the nice articles explaining bufferbloat. Here's
one good one:

Bufferbloat: Dark Buffers in the Internet
http://queue.acm.org/detail.cfm?id=2071893

Hope that helps to clarify that passage.

Thanks,
neal

Jonathan Morton

Nov 5, 2016, 9:29:00 AM
to Tao X, BBR Development

> On 5 Nov, 2016, at 11:55, Tao X <g.xi...@gmail.com> wrote:
>
> That suggests the bufferbloat problem always exists, since switches and routers always hold data packets.
>
> But the passage I quoted says 'loss-based congestion control CAUSES the infamous bufferbloat problem', and the reason given, 'since it fills the bloated buffers in many last-mile links', still confuses me!

It might be more correct to say that the “bufferbloat problem” of induced delay is caused by the *interaction* of loss-based congestion control with large, dumb buffers.

All of the “conventional” TCP congestion control algorithms currently in use detect congestion only through loss. (AQMs can also use ECN signalling, but we’re talking about dumb FIFOs here.) However, we *perceive* congestion when the queue induces a lot of delay.

Delay-based congestion control has been tried before, and it works pretty well (see TCP Vegas) - but only if there aren't any conventional loss-based flows sharing the link. If there are, they will fill up the buffers *anyway*, while the delay-based algorithm backs off and gets almost zero throughput. This effect can actually be seen within a single algorithm, if you look up Microsoft's "Compound TCP".

We say that loss-based congestion control “outcompetes” delay-based congestion control. For that reason, the latter has never caught on for general use, even though it’s objectively superior. It *is* used, in the form of LEDBAT, in uTP - *specifically* for the purpose of yielding to conventional traffic, and thus reducing the perceived network load of torrent traffic.

So, the bufferbloat problem exists with any loss-based congestion control algorithm, and would not exist if everyone used delay-based congestion control. We can therefore fairly say that loss-based congestion control is a direct cause of bufferbloat. It is not the *only* necessary condition, but it is necessary.

In other places, you will see statements that large buffers are the cause of bufferbloat (which, indeed, is where the term comes from). These work from the assumption that loss-based congestion control is in use (as is normally the case), and AQM is not (likewise).

In fact, AQM is usually applied on top of an underlying buffer which is quite large, but it acts to keep the queue almost empty on average, and (if it is the flow-isolating type) to reorder packets so that latency-sensitive packets experience less delay than with strict FIFO ordering.

- Jonathan Morton

Tao X

Nov 5, 2016, 11:16:53 AM
to BBR Development, g.xi...@gmail.com
Thanks, neal
After reading the bufferbloat project's wiki, the problem has become much clearer to me.

I had never looked deeply into this problem before.
Actually, I didn't even think of it as a problem.

Thanks to your explanation, I now know more about the network and will pay more attention to it.

Thanks a lot!
xtao

Tao X

Nov 5, 2016, 11:31:48 AM
to BBR Development, g.xi...@gmail.com
Thank you, Jonathan, for the further information.

It helps me a lot.
----
Now, as I understand it, delay-based congestion control algorithms are very sensitive to bufferbloat as it happens, so they slow down immediately to let the link return to normal.
With loss-based congestion control algorithms, the bloated buffer tries not to drop packets and simply queues them, so the algorithms can't react promptly; in the end, the more packets are buffered, the worse the network gets.

Is that right?

Thanks!
xtao

Neal Cardwell

Nov 6, 2016, 2:40:31 PM
to Tao X, BBR Development
Thanks, Jonathan. A very nice summary.

On the topic of "delay-based" congestion control, I just wanted to add a bit
more commentary. As has been discussed on the list previously, BBR is not
really "delay-based", at least in the traditional sense (like Vegas, Timely,
CDG). BBR is not based on backing off in response to a single signal like loss
or RTT increases. If BBR could be said to be "based" on any one thing, it's
"model-based": it has a model of the network with two parameters: bottleneck
bandwidth and round-trip propagation time. So delay increases do not always
lead to a slower sending rate or lower volume of data in flight. For example,
if the round-trip propagation delay increases but the bandwidth stays constant,
BBR can actually increase the amount of data in flight in order to achieve its
fair share of the bandwidth available in the longer pipe.

Because of this, while Vegas tends to be beaten by CUBIC at almost any buffer
depth, if the bottleneck buffer is "reasonably sized" (say, up to roughly
2*BDP) then BBR will tend to at least match CUBIC/Reno throughput.

To illustrate this, consider the goodput for two 60-second flows over an
emulated 10 Mbit/sec link with 40ms RTT, with buffers of various sizes
(expressed as a fraction of BDP). Here we consider two cases:

(1) 1 CUBIC flow vs. 1 Vegas flow:

buf  CUBIC  Vegas
---- -----  -----
0.5   6.81   2.73
1     8.49   1.06
2     9.12   0.43
4     9.15   0.4


(2) 1 CUBIC flow vs. 1 BBR flow:

buf  CUBIC  BBR
---- -----  -----
0.5   1.74   7.67
1     1.71   7.7
2     4.79   4.72
4     6.12   3.43
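
(For reference, the BDP of this emulated path is 10 Mbit/sec * 40 ms
= 50 KBytes, or roughly 33 full-sized 1500-byte packets; so buf=0.5
corresponds to about 25 KBytes of buffer and buf=4 to about 200 KBytes.)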

If the buffer is bloated (say, bigger than 4*BDP), then so far the BBR
team's philosophy has been not to pour gasoline on a fire by bloating the
queue still further to try to gain a larger share of bandwidth. That said,
if need be we could put heuristics in place to estimate when BBR is in such
situations, and behave like Reno or CUBIC.

neal


Tao X

Nov 22, 2016, 3:14:12 AM
to BBR Development, g.xi...@gmail.com
Hi, neal

I'm interested in your test results.
Could you provide the "UNIT" of the values in your test results? (I guess the CUBIC/Vegas/BBR values are in Mbps?)
And, what testing tools did you use to get the results?

--------------------------------------------

On the question of "what is BBR based on?"

In my understanding, BBR could be called "BDP-based", since BBR always
estimates the bandwidth and RTT, and these two factors give the BDP.

BTW, according to the BBR paper's title, it's congestion-based.

Neal Cardwell

Nov 22, 2016, 2:17:30 PM
to Tao X, BBR Development
On Tue, Nov 22, 2016 at 3:14 AM, Tao X <g.xi...@gmail.com> wrote:
> Hi, neal
>
> I'm interested in your test results.
> Could you provide the "UNIT" of the values in your test results? (I guess
> the CUBIC/Vegas/BBR values are in Mbps?)

Yes, those numbers are in Mbps.

> And, what testing tools did you use to get the results?

Those tests used netperf for traffic and netem for network emulation.
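
As a rough sketch (not the exact commands we used), an emulation along
those lines could look something like this, assuming a separate middlebox
with interface eth0 on the path between sender and receiver:

# On the middlebox: emulate a 10 Mbit/s bottleneck with 40 ms of one-way
# delay (giving a 40 ms RTT for the data direction). The 'limit' (in
# packets) bounds netem's internal queue, including packets being delayed,
# so it roughly sets the available buffer depth.
tc qdisc add dev eth0 root netem delay 40ms rate 10mbit limit 1000
# On the receiver: start the netperf server
netserver
# On the sender: run a 60-second bulk transfer and report goodput
netperf -H <receiver-ip> -l 60 -t TCP_STREAM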

> On the question of "what is BBR based on?"
>
> In my understanding, BBR could be called "BDP-based", since BBR always
> estimates the bandwidth and RTT, and these two factors give the BDP.

Yes, saying BBR is "BDP-based" would be one way to capture a large
aspect of BBR's behavior.

As the BBR paper in ACM Queue discusses, there are two conditions that
BBR tries to meet, in order to achieve high throughput and low delay:

A connection runs with the highest throughput and lowest delay
when (rate balance) the bottleneck packet arrival rate equals BtlBw,
and (full pipe) the total data in flight is equal to the BDP
(= BtlBw × RTprop).

So yes, BBR tries to operate near the BDP (= BtlBw × RTprop), to meet
the "full pipe" condition.

But BBR also tries to meet the "rate balance" condition, by pacing at
or very near the bottleneck bandwidth most of the time.

> BTW, according to the BBR paper's title, it's congestion-based.

To quote the BBR paper in ACM Queue again:

Congestion is just sustained operation to the right of the BDP line,
and congestion control is some scheme to bound how far to the right
a connection operates on average.

So you might say that BBR is "congestion-based" in the sense that it
explicitly tries to bound how far above the BDP the sending flows
operate. In fact, BBR explicitly bounds inflight to 2*BDP, and in
practice BBR's pacing-gain cycling algorithm can often keep inflight
closer to 1*BDP. And we're working now on expanding the set of cases
in which inflight is closer to 1*BDP than 2*BDP.
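
To make that concrete with the emulated path from earlier in this
thread: with BtlBw = 10 Mbit/sec and RTprop = 40 ms, the BDP is about
50 KBytes, so BBR caps inflight at roughly 100 KBytes no matter how
deep the bottleneck buffer is.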

This is in contrast to loss-based congestion control, which reacts to
packet losses, which can happen much later than when congestion occurs
(bufferbloat, caused by deep FIFO buffers) or much earlier than when
congestion occurs (in high-speed WAN traffic going through
shallow-buffered switches).
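
(Rough illustrative numbers for those two extremes: a 5 Mbit/sec
last-mile link with a 640 KByte FIFO accumulates about 1 second of
queuing delay before the first drop, so loss arrives long after
congestion started; conversely, a 10 Gbit/sec WAN path with a 40 ms
RTT has a BDP of about 50 MBytes, so a switch port with only a couple
of MBytes of buffer drops packets while the pipe is still far from
full.)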

neal

Shiyao Ma

Dec 26, 2016, 8:06:16 AM
to BBR Development
Hi, Neal.

In your simulation, you said netem was used.

I am interested in how you managed to chain two classless qdiscs, namely fq and netem.

On my box, doing a:
tc qdisc add dev SOME-ETH root handle 1: fq
tc qdisc add dev SOME-ETH parent 1: handle 2: netem delay 10ms

will output: RTNETLINK answers: Operation not supported.

Jonathan Morton

Dec 26, 2016, 9:46:08 AM
to Shiyao Ma, BBR Development

> On 26 Dec, 2016, at 15:06, Shiyao Ma <i...@introo.me> wrote:
>
> I am interested in how you managed to chain two classless qdiscs, namely fq and netem.

There are two reasonable approaches that are often used:

The most common is to use a separate machine from both endpoints for “network emulation” purposes. This results in the classic “dumbbell” topology, which has the virtue of leaving the endpoints ignorant and independent of the network being emulated.

If you don't have a spare machine, you could instead run netem on ingress, rather than on egress. To do this, you need to redirect ingress traffic to an IFB device, and attach netem to the IFB interface. I also suggest setting this up symmetrically, i.e. on the ingress of both endpoints, rather than on just one of them.
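
For reference, the usual ingress-redirect recipe looks roughly like this (a sketch only, assuming the interface is eth0 and the ifb module is available):

modprobe ifb numifbs=1
ip link set dev ifb0 up
# Redirect all ingress traffic arriving on eth0 to ifb0
tc qdisc add dev eth0 handle ffff: ingress
tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 action mirred egress redirect dev ifb0
# Apply netem to the redirected (ingress) traffic on ifb0's egress
tc qdisc add dev ifb0 root netem delay 10ms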

- Jonathan Morton

Neal Cardwell

Dec 26, 2016, 10:06:34 AM
to Jonathan Morton, Shiyao Ma, BBR Development
Jonathan did a nice job of outlining the two easiest methods I'm aware of for network emulation with Linux.

I would also add that for Linux it's important not to do the network emulation on the sending machine, because the TCP Small Queues (TSQ) mechanism applies flow control to all the transmit-side queues on the sending machine, including the qdisc layer. So if netem is rate-shaping traffic on the sending machine, then TSQ will adapt to that, slowing the TCP transmit rate to limit the amount of data in the netem queue, and thus TSQ itself (rather than the congestion control module) will tend to become the controlling factor. Such a test would end up testing TSQ rather than congestion control. :-)

cheers,
neal

