Comments on new BBRv2 draft (draft-cardwell-iccrg-bbr-congestion-control-01)

238 views
Skip to first unread message

Adrian Z

unread,
Nov 16, 2021, 6:58:13 AM11/16/21
to BBR Development
Hi,

First of all, thanks for posting the draft for BBRv2! It's very appreciated.

I've got a few questions about it:

-According to the draft, pacing_gain in Startup is 4*ln(2) ~= 2.773, which is the "new" value from the derivation from 2018 [1]. The code still uses the "old" value 2/ln(2) ~= 2.885. I suppose the code is simply "outdated" here and future versions of BBRv2 will use 2.773?

-What is cwnd_gain in Startup and Drain? According to the draft (page 19, Section 4.3.1.1 and the "control behavior" table on page 51, Section 4.6.1) it is 2. But in the code, it is the same as pacing_gain (2.885 or 2.773).

-On page 26, Section 4.3.3.5.2, the draft says "T_reno_bound = pick_randomly_either({62, 63})" and then never uses T_reno_bound again. Right after defining it, it says "T_reno = min(BDP in packets, 62) round trips" - should this 62 be T_reno_bound instead? Accordingly, on page 28, Section 4.3.3.5.3, it says "rounds = min(reno_rounds, 63)" (copied from the source code, where BBRv2 also just uses the fixed value 63 without any randomization). Should this 63 be the randomized T_reno_bound, too? Is it even worth randomizing this when you merely pick one out of {62, 63}?

-Also regarding page 26, Section 4.3.3.5.2, isn't T_reno = min( min(bdp, cwnd), 62 or 63) ? (cf. code lines 1748-1750)

-Startup exits if the loss rate exceeds its 2% threshold, but only if there where at least bbr_full_loss_cnt loss events in the current round. The draft states bbr_full_loss_cnt = 3 (page 21, Section 4.3.1.3), but in the code bbr_full_loss_cnt = 8. Which value is more up-to-date?

-On page 24, Section 4.3.3.4, the draft explains how a flow behaves in ProbeBW:Up "If the flow has not set BBR.inflight_hi or BBR.bw_hi". How can bw_hi not be set in ProbeBW:Up?

-The "control behavior" table on page 51, Section 4.6.1 states that pacing_gain in Drain is 0.5, but isn't it the multiplicative inverse of its value in Startup? [2]

-The description of ProbeBW:Down on page 22, Section 4.3.3.1 seems to miss the part of the exit condition that checks whether inflight < estimated BDP. The code as well as the code sample on page 30 of the draft include this check.



Best regards
Adrian

Neal Cardwell

unread,
Nov 16, 2021, 12:01:01 PM11/16/21
to Adrian Z, BBR Development
On Tue, Nov 16, 2021 at 6:58 AM Adrian Z <adria...@gmail.com> wrote:
Hi,

First of all, thanks for posting the draft for BBRv2! It's very appreciated.

Thanks for the careful reading and excellent comments/questions! I will add your name to the Acknowledgments section of the draft; if you are comfortable sharing your last name I can add that as well.

 
I've got a few questions about it:

-According to the draft, pacing_gain in Startup is 4*ln(2) ~= 2.773, which is the "new" value from the derivation from 2018 [1]. The code still uses the "old" value 2/ln(2) ~= 2.885. I suppose the code is simply "outdated" here and future versions of BBRv2 will use 2.773?

Yes, the draft reflects the theory and the plan,  and the code is in transition and still using the old value. There are ongoing production A/B experiments and the plan was to update the code once those experiments  have established that the 2.773 value yields good results in practice in production and not just in theory.
 

-What is cwnd_gain in Startup and Drain? According to the draft (page 19, Section 4.3.1.1 and the "control behavior" table on page 51, Section 4.6.1) it is 2. But in the code, it is the same as pacing_gain (2.885 or 2.773).

Here again (for cwnd_gain in Startup and Drain) the code is outdated. Same situation as the pacing_gain in Startup.
 
-On page 26, Section 4.3.3.5.2, the draft says "T_reno_bound = pick_randomly_either({62, 63})" and then never uses T_reno_bound again. Right after defining it, it says "T_reno = min(BDP in packets, 62) round trips" - should this 62 be T_reno_bound instead?

Yes, thanks for catching that. I have updated the draft from:
T_reno = min(BDP in packets, 62) round trips
to:
T_reno = min(BDP in packets, T_reno_bound) round trips
 
Accordingly, on page 28, Section 4.3.3.5.3, it says "rounds = min(reno_rounds, 63)" (copied from the source code, where BBRv2 also just uses the fixed value 63 without any randomization). Should this 63 be the randomized T_reno_bound, too? Is it even worth randomizing this when you merely pick one out of {62, 63}?

The "rounds = min(reno_rounds, 63)" is intentional. Even though the 63 is a constant, the dynamics are still randomized, because the initialization of BBR.rounds_since_bw_probe is randomly picked to be 0 or 1. In our experience having even one round trip of randomization can be important for fairness convergence; otherwise low-bandwidth flows can probe for bandwidth in lockstep with high-bandwidth flows, causing the low-bandwidth flows to be unable to effectively probe bandwidth and converge upward to their fair share.
 
-Also regarding page 26, Section 4.3.3.5.2, isn't T_reno = min( min(bdp, cwnd), 62 or 63) ? (cf. code lines 1748-1750)

Yes, thanks. I have tightened up:
  • T_reno = min(BDP in packets, T_reno_bound) round trips

to:
  • reno_bdp = min(BBR.bdp, cwnd)

  • T_reno = min(reno_bdp, T_reno_bound) round trips


-Startup exits if the loss rate exceeds its 2% threshold, but only if there where at least bbr_full_loss_cnt loss events in the current round. The draft states bbr_full_loss_cnt = 3 (page 21, Section 4.3.1.3), but in the code bbr_full_loss_cnt = 8. Which value is more up-to-date?

Here again (bbr_full_loss_cnt) the draft reflects the theory and the plan, and the code is in transition and still using the old value.
 
-On page 24, Section 4.3.3.4, the draft explains how a flow behaves in ProbeBW:Up "If the flow has not set BBR.inflight_hi or BBR.bw_hi". How can bw_hi not be set in ProbeBW:Up?

The inflight_hi and bw_hi values will not be set in ProbeBW_UP if the flow has never met the conditions for packet loss or ECN setting inflight_hi or bw_hi.
  
-The "control behavior" table on page 51, Section 4.6.1 states that pacing_gain in Drain is 0.5, but isn't it the multiplicative inverse of its value in Startup? [2]

As noted in [2] the pacing_gain in Drain is the multiplicative inverse of the cwnd_gain in Startup, rather than the pacing_gain in Startup. Basically, how slowly the flow needs to send to try to drain the queue is a function of the estimated size of the queue, which is a function of the cwnd_gain rather than the pacing_gain.
 
-The description of ProbeBW:Down on page 22, Section 4.3.3.1 seems to miss the part of the exit condition that checks whether inflight < estimated BDP. The code as well as the code sample on page 30 of the draft include this check.

Thanks for catching the inconsistency. At the time I wrote the prose in the draft I was thinking that we'd have to give up on the "drain_to_target" mechanism, so omitted it from the prose, but forgot to go back to the pseudocode snippet and switch that. Since then I think I see how we can keep the "drain_to_target" mechanism, so I have attempted to change both the prose and the pseudocode snippet to reflect that.

We will post an update of the draft with the updates above as soon as we get a chance. Thanks!

Many thanks!
neal
 
--
You received this message because you are subscribed to the Google Groups "BBR Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bbr-dev+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bbr-dev/b73502f7-4fc1-4f19-9881-14462fd39e62n%40googlegroups.com.

Adrian Z

unread,
Nov 18, 2021, 2:26:44 AM11/18/21
to BBR Development
Hi Neal,

>Thanks for the careful reading and excellent comments/questions! I will add your name to the Acknowledgments section of the draft; if you are comfortable sharing your last name I can add that as well.
Cool, thank you. My full name is Adrian Zapletal.

>The "rounds = min(reno_rounds, 63)" is intentional. Even though the 63 is a constant, the dynamics are still randomized, because the initialization of BBR.rounds_since_bw_probe is randomly picked to be 0 or 1. In our experience having even one round trip of randomization can be important for fairness convergence; otherwise low-bandwidth flows can probe for bandwidth in lockstep with high-bandwidth flows, causing the low-bandwidth flows to be unable to effectively probe bandwidth and converge upward to their fair share.
That's interesting!

>Here again (bbr_full_loss_cnt) the draft reflects the theory and the plan, and the code is in transition and still using the old value.
Could you maybe share some insights on why you chose 3 for bbr_full_loss_cnt?

>The inflight_hi and bw_hi values will not be set in ProbeBW_UP if the flow has never met the conditions for packet loss or ECN setting inflight_hi or bw_hi.
I think I misunderstood how bw_hi works. From the code, it seems like bw_hi keeps track of the recent maximum bandwidth at all times (via bbr2_take_bw_hi_sample()), in which case it is always set to some value in ProbeBW_UP. When loss/ECN is too high, BBRv2 seems to set inflight_hi explicitly, but not bw_hi.
In the draft, it looks like bw_hi gets set upon loss similar to inflight_hi, and max_bw is the value that records the maximum bandwidth at all times. So basically there is a little mismatch between the code and the draft. Is that correct?

Best,
Adrian

Neal Cardwell

unread,
Mar 7, 2022, 2:28:51 PM3/7/22
to Adrian Z, BBR Development
On Thu, Nov 18, 2021 at 2:26 AM Adrian Z <adria...@gmail.com> wrote:
Hi Neal,

>Thanks for the careful reading and excellent comments/questions! I will add your name to the Acknowledgments section of the draft; if you are comfortable sharing your last name I can add that as well.
Cool, thank you. My full name is Adrian Zapletal.

Thanks! The next version of the draft should reflect that.
 
>The "rounds = min(reno_rounds, 63)" is intentional. Even though the 63 is a constant, the dynamics are still randomized, because the initialization of BBR.rounds_since_bw_probe is randomly picked to be 0 or 1. In our experience having even one round trip of randomization can be important for fairness convergence; otherwise low-bandwidth flows can probe for bandwidth in lockstep with high-bandwidth flows, causing the low-bandwidth flows to be unable to effectively probe bandwidth and converge upward to their fair share.
That's interesting!

>Here again (bbr_full_loss_cnt) the draft reflects the theory and the plan, and the code is in transition and still using the old value.
Could you maybe share some insights on why you chose 3 for bbr_full_loss_cnt?

The idea was to have a heuristic to try to filter out short-term/bursty congestion and only exit Startup if the bottleneck is sufficiently full on a sustained basis. As with the rest of the BBR v2alpha code, this is still open for discussion/iteration if there are better ideas.
 
>The inflight_hi and bw_hi values will not be set in ProbeBW_UP if the flow has never met the conditions for packet loss or ECN setting inflight_hi or bw_hi.
I think I misunderstood how bw_hi works. From the code, it seems like bw_hi keeps track of the recent maximum bandwidth at all times (via bbr2_take_bw_hi_sample()), in which case it is always set to some value in ProbeBW_UP. When loss/ECN is too high, BBRv2 seems to set inflight_hi explicitly, but not bw_hi.
In the draft, it looks like bw_hi gets set upon loss similar to inflight_hi, and max_bw is the value that records the maximum bandwidth at all times. So basically there is a little mismatch between the code and the draft. Is that correct?

Sorry, the Internet draft is using names based on our current internal version of the BBRv2 code, and we have had an internal patch that has renamed some of the variables related to bandwidth, so reading the draft along with the v2alpha code may currently be a little confusing. The bw_hi in the current v2alpha code corresponds to max_bw in the draft-cardwell-iccrg-bbr-congestion-control text. I'll try to update the v2alpha code ASAP to make this less confusing. Thanks for pointing this out!

Thanks for reading the draft and code so carefully!

Sorry about the delayed response to this thread! I just noticed this in my inbox as I was searching for something else.

best,
neal

 
Reply all
Reply to author
Forward
0 new messages