BBR quits STARTUP mode prematurely when rate is limited by receive window


Qian Li

Feb 17, 2024, 8:49:28 AM
to bbr...@googlegroups.com
Hello,

The following comment (copied from tcp_bbr.c in the Linux kernel) describes when BBR quits STARTUP: "BBR estimates that STARTUP filled the pipe if the estimated bw hasn't changed by at least bbr_full_bw_thresh (25%) after bbr_full_bw_cnt (3) non-app-limited rounds."

However, BBR neglects rwnd-limited cases. A receiver may employ a receive window tuning algorithm [1] or a receiver-side congestion control algorithm [2]. I am also working on a receiver-side CCA, X.

I discovered that when X increases its rate by less than 25% per RTT for 3 consecutive RTTs, BBR quits STARTUP even though cwnd is still well below the BDP (with only one BBR flow in the emulated network). This seriously hurts performance.

I have noticed that bbr_check_full_bw_reached() does not update full_bw_reached if the rate is limited by the application. It would be preferable for BBR to also check rwnd-limited cases, e.g. by changing
      from: if (bbr_full_bw_reached(sk) || !bbr->round_start || rs->is_app_limited) return;
      to:   if (bbr_full_bw_reached(sk) || !bbr->round_start || rs->is_app_limited || is_rwnd_limited) return;
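For concreteness, here is a minimal sketch of the change I have in mind against the v1 tcp_bbr.c code. The rest of the function is as in the mainline source, if I copied it correctly; the rwnd-limited test is only one possible signal (here I assume tp->chrono_type, which tcp_output.c sets to TCP_CHRONO_RWND_LIMITED while sending is blocked by the receive window), so treat this as an illustration rather than a tested patch:

      static void bbr_check_full_bw_reached(struct sock *sk,
                                            const struct rate_sample *rs)
      {
            struct bbr *bbr = inet_csk_ca(sk);
            /* Hypothetical: treat "blocked on rwnd" like app-limited rounds. */
            bool is_rwnd_limited =
                  tcp_sk(sk)->chrono_type == TCP_CHRONO_RWND_LIMITED;
            u32 bw_thresh;

            if (bbr_full_bw_reached(sk) || !bbr->round_start ||
                rs->is_app_limited || is_rwnd_limited)
                  return;  /* don't count rwnd-limited rounds toward the exit */

            bw_thresh = (u64)bbr->full_bw * bbr_full_bw_thresh >> BBR_SCALE;
            if (bbr_max_bw(sk) >= bw_thresh) {
                  bbr->full_bw = bbr_max_bw(sk);
                  bbr->full_bw_cnt = 0;
                  return;
            }
            ++bbr->full_bw_cnt;
            bbr->full_bw_reached = bbr->full_bw_cnt >= bbr_full_bw_cnt;
      }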
           
Further, the Linux kernel's pacing algorithm sends out well-spaced single packets when the pacing rate is very low. This may help when the bottleneck capacity is very small, but not when the bottleneck capacity is sufficient and only rwnd/cwnd is small.

In the case of X, it starts from 2 segments per RTT and gradually increases rwnd to the BDP. In a setup with a 100 Mbps bottleneck and 200 ms RTT, when rwnd is 2 MSS, BBR sends single packets with roughly 100 ms inter-packet spacing. Sometimes a packet is sent nearly 100 ms after an ACK is received. This breaks receiver-side RTT estimation (used by dynamic right-sizing and some delay-based receiver-side algorithms).
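To spell out where the roughly 100 ms comes from (assuming a 1448-byte MSS; the sending rate is capped near rwnd/RTT regardless of the pacing gain):

\[
\text{rate} \approx \frac{\text{rwnd}}{\text{RTT}} = \frac{2 \times 1448\ \text{B}}{200\ \text{ms}} \approx 14.5\ \text{kB/s},
\qquad
\Delta t \approx \frac{1448\ \text{B}}{14.5\ \text{kB/s}} \approx 100\ \text{ms}.
\]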

Because the unit of pacing is an SKB, we may need to ensure that there are at least two segments in an SKB when the bottleneck capacity is sufficient, even though the pacing rate (determined by min(cwnd, rwnd)) is very low.
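As a very rough sketch of what I mean, one could tweak bbr_min_tso_segs() in tcp_bbr.c (if I read the mainline code correctly, it returns 1 when the pacing rate is below bbr_min_tso_rate, about 1.2 Mbit/s). This is only an illustration, not a tested patch:

      static u32 bbr_min_tso_segs(struct sock *sk)
      {
            /* Mainline: return sk->sk_pacing_rate < (bbr_min_tso_rate >> 3) ? 1 : 2;
             * Illustration only: always keep at least two segments per TSO/GSO
             * burst, so a window-limited flow still emits back-to-back pairs
             * that the receiver can use for RTT estimation.
             */
            return 2;
      }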

Would you consider making adaptations to BBR to support receiver-side window tuning algorithms?

[1] Heesu Im et al. Receiver-Side TCP Countermeasure to Bufferbloat in Wireless Access Networks. IEEE Transactions on Mobile Computing.
[2] M. Bagnulo et al. rLEDBAT: Receiver-Driven Low Extra Delay Background Transport for TCP. https://datatracker.ietf.org/doc/html/draft-irtf-iccrg-rledbat-03
This document specifies rLEDBAT, a set of mechanisms that enable the execution of a less-than-best-effort congestion control algorithm for TCP at the receiver end.


Best regards,
Qian

Neal Cardwell

Feb 17, 2024, 12:08:18 PM
to Qian Li, bbr...@googlegroups.com


On Sat, Feb 17, 2024 at 8:49 AM Qian Li <li_qi...@hotmail.com> wrote:
Hello,

Thanks for your post!
 
The following comment (copied from tcp_bbr.c in the Linux kernel) describes when BBR quits STARTUP: "BBR estimates that STARTUP filled the pipe if the estimated bw hasn't changed by at least bbr_full_bw_thresh (25%) after bbr_full_bw_cnt (3) non-app-limited rounds."

However, BBR neglects rwnd-limited cases. A receiver may employ a receive window tuning algorithm [1] or a receiver-side congestion control algorithm [2]. I am also working on a receiver-side CCA, X.

I discovered that when X increases its rate by less than 25% per RTT for 3 consecutive RTTs, BBR quits STARTUP even though cwnd is still well below the BDP (with only one BBR flow in the emulated network). This seriously hurts performance.

I have noticed that bbr_check_full_bw_reached() does not update full_bw_reached if the rate is limited by the application. It would be preferable for BBR to also check rwnd-limited cases, e.g. by changing
      from: if (bbr_full_bw_reached(sk) || !bbr->round_start || rs->is_app_limited) return;
      to:   if (bbr_full_bw_reached(sk) || !bbr->round_start || rs->is_app_limited || is_rwnd_limited) return;

Yes, we are aware of this issue. :-) In fact, early versions of BBR used an approach equivalent to the one you suggest, where receive window limits are treated as equivalent to limits in the sender application's rate of sending. This meant that, when it came to decisions about whether to exit STARTUP mode, if a flow was persistently receive-window limited then it would not leave STARTUP. The problem we found with that approach was that (at least with TCP) persistent receive-window limits (where the receive window is between, say, 1x and 2.5x BDP) are extremely common, and so it was common for BBR to stay in STARTUP for very long intervals, often the entire lifetime of connections. In those cases, due to the higher cwnd gain and pacing gain values in STARTUP, the result was persistently high queuing at the bottleneck – often a BDP or more of queue. This seemed undesirable.  So, while I agree there is room for improvement in this area, I suspect the best solution is not as simple as this. :-) We welcome research studies about this question...

 
Further, the Linux kernel's pacing algorithm sends out well-spaced single packets when the pacing rate is very low. This may help when the bottleneck capacity is very small, but not when the bottleneck capacity is sufficient and only rwnd/cwnd is small.

In the case of X, it starts from 2 segments per RTT and gradually increases rwnd to the BDP. In a setup with a 100 Mbps bottleneck and 200 ms RTT, when rwnd is 2 MSS, BBR sends single packets with roughly 100 ms inter-packet spacing. Sometimes a packet is sent nearly 100 ms after an ACK is received. This breaks receiver-side RTT estimation (used by dynamic right-sizing and some delay-based receiver-side algorithms).

Because the unit of pacing is an SKB, we may need to ensure that there are at least two segments in an SKB when the bottleneck capacity is sufficient, even though the pacing rate (determined by min(cwnd, rwnd)) is very low.

FWIW, BBRv3 uses a TSO sizing approach that AFAICT corresponds to your suggestion: it uses a min TSO/GSO burst size of sysctl_tcp_min_tso_segs, which defaults to 2 segments per TSO/GSO burst:


How does BBRv3 perform in your tests? (You can find the BBRv3 README here, with instructions on how to check it out and build and test it.)

best regards,
neal

 

Qian Li

Feb 18, 2024, 4:12:55 AM
to Neal Cardwell, bbr...@googlegroups.com
Hello Neal,

I apologize for my ignorance about BBR development history.  Please see my inline responses below.


From: Neal Cardwell <ncar...@google.com>
Sent: Sunday, February 18, 2024 1:07 AM
To: Qian Li <li_qi...@hotmail.com>
Cc: bbr...@googlegroups.com <bbr...@googlegroups.com>
Subject: Re: [bbr-dev] BBR quits STARTUP mode prematurely when rate is limited by receive window
 


On Sat, Feb 17, 2024 at 8:49 AM Qian Li <li_qi...@hotmail.com> wrote:
Hello,

Thanks for your post!
 
The following comment (copied from tcp_bbr.c in the Linux kernel) describes when BBR quits STARTUP: "BBR estimates that STARTUP filled the pipe if the estimated bw hasn't changed by at least bbr_full_bw_thresh (25%) after bbr_full_bw_cnt (3) non-app-limited rounds."

However, BBR neglects rwnd-limited cases. A receiver may employ a receive window tuning algorithm [1] or a receiver-side congestion control algorithm [2]. I am also working on a receiver-side CCA, X.

I discovered that when X increases its rate by less than 25% per RTT for 3 consecutive RTTs, BBR quits STARTUP even though cwnd is still well below the BDP (with only one BBR flow in the emulated network). This seriously hurts performance.

I have noticed that bbr_check_full_bw_reached() does not update full_bw_reached if the rate is limited by the application. It would be preferable for BBR to also check rwnd-limited cases, e.g. by changing
      from: if (bbr_full_bw_reached(sk) || !bbr->round_start || rs->is_app_limited) return;
      to:   if (bbr_full_bw_reached(sk) || !bbr->round_start || rs->is_app_limited || is_rwnd_limited) return;

Yes, we are aware of this issue. :-) In fact, early versions of BBR used an approach equivalent to the one you suggest, where receive window limits are treated as equivalent to limits in the sender application's rate of sending. This meant that, when it came to decisions about whether to exit STARTUP mode, if a flow was persistently receive-window limited then it would not leave STARTUP. The problem we found with that approach was that (at least with TCP) persistent receive-window limits (where the receive window is between, say, 1x and 2.5x BDP) are extremely common, and so it was common for BBR to stay in STARTUP for very long intervals, often the entire lifetime of connections. In those cases, due to the higher cwnd gain and pacing gain values in STARTUP, the result was persistently high queuing at the bottleneck – often a BDP or more of queue. This seemed undesirable.  So, while I agree there is room for improvement in this area, I suspect the best solution is not as simple as this. :-) We welcome research studies about this question...

  • Thanks for the explanation. Now I understand why BBR quits STARTUP even when the rate is limited by rwnd. Then I have a new question: could BBR do such a check in bbr_reset_probe_bw_mode()? For example:

        if (is_rwnd_limited)
              enter phase 0;
        else
              randomly choose one phase (excluding phase 1);

  • I only skimmed BBR's specification and source code (version 1), so my understanding might be wrong. To me, the above change can prevent BBR from fixing its rate/cwnd to a constant for several RTTs (if it starts in a phase other than 0 or 1) while the receiver is increasing rwnd and there is still plenty of available bandwidth at the bottleneck. A sketch against the v1 code follows below.
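This is roughly what I mean, sketched against the v1 tcp_bbr.c, if I read the gain-cycling code correctly. Here is_rwnd_limited() is a hypothetical predicate (e.g. derived from tp->chrono_type == TCP_CHRONO_RWND_LIMITED), and the random-phase line may differ slightly across kernel versions:

      static void bbr_reset_probe_bw_mode(struct sock *sk)
      {
            struct bbr *bbr = inet_csk_ca(sk);

            bbr->mode = BBR_PROBE_BW;
            if (is_rwnd_limited(sk))
                  /* bbr_advance_cycle_phase() below wraps this to phase 0,
                   * the 5/4 probing gain, so an rwnd-limited flow keeps
                   * probing upward instead of cruising at gain 1.
                   */
                  bbr->cycle_idx = CYCLE_LEN - 1;
            else
                  /* as in mainline: random phase, excluding the 3/4 drain phase */
                  bbr->cycle_idx = CYCLE_LEN - 1 - prandom_u32_max(bbr_cycle_rand);
            bbr_advance_cycle_phase(sk);
      }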
 
Further, the Linux kernel's pacing algorithm sends out well-spaced single packets when the pacing rate is very low. This may help when the bottleneck capacity is very small, but not when the bottleneck capacity is sufficient and only rwnd/cwnd is small.

In the case of X, it starts from 2 segments per RTT and gradually increases rwnd to the BDP. In a setup with a 100 Mbps bottleneck and 200 ms RTT, when rwnd is 2 MSS, BBR sends single packets with roughly 100 ms inter-packet spacing. Sometimes a packet is sent nearly 100 ms after an ACK is received. This breaks receiver-side RTT estimation (used by dynamic right-sizing and some delay-based receiver-side algorithms).

Because the unit of pacing is an SKB, we may need to ensure that there are at least two segments in an SKB when the bottleneck capacity is sufficient, even though the pacing rate (determined by min(cwnd, rwnd)) is very low.

FWIW, BBRv3 uses a TSO sizing approach that AFAICT corresponds to your suggestion: it uses a min TSO/GSO burst size of sysctl_tcp_min_tso_segs, which defaults to 2 segments per TSO/GSO burst:


How does BBRv3 perform in your tests? (You can find the BBRv3 README here, with instructions on how to check it out and build and test it.)

  • Thanks for the pointers. I will test BBR v3. 

Best regards,
Qian

Qian Li

Feb 23, 2024, 9:03:17 AM
to Neal Cardwell, bbr...@googlegroups.com
Hello again,

In my last email, I proposed making PROBE_BW always start from phase 0 if a connection is limited by rwnd. I implemented this idea and discovered that it only mitigates the problem. Because BBR increases more slowly in PROBE_BW than in STARTUP, the convergence time (to the BDP) is still longer than expected. This is more pronounced when a connection's RTT is long.

Then I implemented another idea of mine: BBR only quits STARTUP when the rate increase is less than 25% AND the estimated queuing delay is greater than T ms (set to 4 ms in my tests) for three consecutive RTTs. This new proposal works fine with my receiver-side CC. I also made a receiver always set its rwnd to 1.5 BDP for the lifetime of a connection, and the modified BBR code works well with this receiver too. Finally, I tested the modified BBR with a normal receiver and it works as expected. I haven't tested it in other scenarios. The following is the pseudocode.

      /* bw must grow by at least 25% over the last recorded full_bw */
      bw_thresh = (u64)qbbr->full_bw * qbbr_full_bw_thresh >> BBR_SCALE;
      /* estimated queuing delay, in microseconds */
      rtt_diff = current_rtt - min_rtt;
      if (qbbr_max_bw(sk) < bw_thresh && rtt_diff > 4000) {
            /* bw stalled AND queue building: count this round toward the exit */
            ++qbbr->full_bw_cnt;
      } else {
            /* bw still growing, or no queue yet: restart the count */
            qbbr->full_bw = qbbr_max_bw(sk);
            qbbr->full_bw_cnt = 0;
      }
      qbbr->full_bw_reached = qbbr->full_bw_cnt >= qbbr_full_bw_cnt;

In order to get quick initial results, in my implementation current_rtt is simply tp->srtt_us >> 3; however, a different RTT filter could be used. min_rtt is BBR's min_rtt plus the transmission delay of a full-length segment. In a formal implementation, min_rtt could be the minimum RTT measured over full-length segments only (currently, BBR's RTT measurements include non-full-length segments, which can bias the min_rtt estimate). This adjustment is necessary because when the bottleneck capacity is low, the transmission delay can be non-negligible.
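As a rough illustration of why the transmission delay matters (numbers for a hypothetical 10 Mbit/s bottleneck and a 1500-byte segment):

\[
t_{\text{tx}} = \frac{1500 \times 8\ \text{bit}}{10\ \text{Mbit/s}} = 1.2\ \text{ms},
\]

which is already a sizable fraction of the 4 ms threshold used above.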

Would you test my second proposal further and consider it as a bug fix if it works well in tested scenarios?

Best regards,
Qian 


Neal Cardwell

Feb 23, 2024, 2:35:45 PM
to Qian Li, bbr...@googlegroups.com
Thanks for sharing your experiments!

>      if (qbbr_max_bw(sk) < bw_thresh && rtt_diff > 4000) {
...
> Would you test my second proposal further and 
> consider it as a bug fix if it works well in tested scenarios?

IMHO this magic constant of 4000 microseconds seems somewhat concerning. In many common low-RTT scenarios (e.g., an Ethernet LAN in a home / enterprise / datacenter) we would never expect a flow to have an rtt_diff > 4000 us, but would still like this mechanism to be able to exit STARTUP...

best regards,
neal

Qian Li

Feb 25, 2024, 3:59:56 AM
to Neal Cardwell, bbr...@googlegroups.com
Hello Neal,

Thanks for your response.

It was just a proof-of-concept experiment, and 4000 us was almost arbitrarily chosen. I used it because, based on my observations, a couple of milliseconds of difference between the minimum RTT and the current RTT is common (on emulated long-RTT paths) even when the network is not congested.

By using a positive threshold (4000 us), I just wanted to make sure that a slight RTT increase caused by random factors other than congestion won't make BBR quit STARTUP.

However, you can experiment with other values. For example, you could probably replace 4000 with min(a * min_RTT, b), where a is a fraction and b is a positive integer, as in the sketch below. I will leave further investigation of this issue to you.
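A one-line sketch of that idea against the fields in struct bbr (the values a = 1/8 and b = 4 ms are arbitrary illustration choices, not recommendations):

      /* Hypothetical adaptive threshold: scales with the path's min RTT,
       * capped at 4 ms, so low-RTT LAN flows can still exit STARTUP.
       */
      u32 queue_thresh_us = min_t(u32, bbr->min_rtt_us >> 3, 4000U);

rtt_diff would then be compared against queue_thresh_us instead of the constant 4000.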

Best regards,
Qian




