Why did the gain in the STARTUP phase be 2/ln(2)?

870 views

marywan...@gmail.com

Jun 7, 2018, 11:42:30 PM6/7/18
to BBR Development
Hi ALL!

I encountered a problem while reading the source code. I hope I can get replies. Thanks!
------

In the following statement, there is a magic number, as noted, which is 2/ln(2) :

static const int bbr_high_gain = BBR_UNIT * 2885/1000 + 1;

But I don't know how it came about.
I have read the paper of BBR and have the following explanation:

To handle the Internet link bandwidths span 12 orders of magnitude,Startup implements a binary search for BtlBw by using a gain of 2 / ln2 to double the sending rate while delivery rate is increasing. This discovers BtlBw in log2BDP RTTs but creates up to 2 BDP excessThe queue in the process.

But I still don't understand the details.Why not 3/ln(3), why not 2.5, why not e?

I think there might be a smooth curve that satisfies the startup constraint, such as a two-fold increase in speed per round.And then you take the derivative of this curve, and you get some value...

There must be a mathematical explanation behind this, but I don't know what it is.

Regards.

Neal Cardwell

Jun 8, 2018, 11:14:25 AM6/8/18
to marywan...@gmail.com, BBR Development
Hi,

The gain of 2/ln(2) (used for both pacing and cwnd in STARTUP mode) was derived as the gain required to allow the pacing rate (computed as pacing_gain * bw_sample) to exactly double each round trip. It arises because the pacing rate evolves as a function of time as 2^t, doubling every round trip, and the integral of 2^t is 2^t / ln(2).

cheers,
neal

--
You received this message because you are subscribed to the Google Groups "BBR Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bbr-dev+u...@googlegroups.com.

刘文斌

Jun 8, 2018, 1:11:59 PM6/8/18
to BBR Development

But why 2 ^ t integral can be as gain?

I can understand f (t) = (2 ^ t) of the independent variable t represents the RTT, so what is the physical meaning of the integral?

And why the minimum?

Jonathan Morton

Jun 8, 2018, 6:34:46 PM6/8/18
to Neal Cardwell, marywan...@gmail.com, BBR Development
> On 8 Jun, 2018, at 6:14 pm, 'Neal Cardwell' via BBR Development <bbr...@googlegroups.com> wrote:
>
> The gain of 2/ln(2) (used for both pacing and cwnd in STARTUP mode) was derived as the gain required to allow the pacing rate (computed as pacing_gain * bw_sample) to exactly double each round trip. It arises because the pacing rate evolves as a function of time as 2^t, doubling every round trip, and the integral of 2^t is 2^t / ln(2).

This would be true if the measured rate varies continuously and smoothly over time. However, it is actually the result of discrete measurement events, and I'm not aware of a smoothing function being applied.

In a simple simulation I just conducted assuming the latter (that is, the measured BW tracks the pacing rate with an RTT lag), I got a sharply stepped pacing rate, which averaged a gain-rate of very nearly 2x per RTT when the pacing gain was 2x, and a significant higher value when the ln(2) factor was included.

So I'm afraid a little more explanation is required.

- Jonathan Morton

marywan...@gmail.com

Jun 10, 2018, 10:22:23 PM6/10/18
to BBR Development
Hi~

I think it is ok to take 2 as the gain value, why take into account the smooth will pick up 2 ^ t points?And why the minimum?

Message has been deleted

marywan...@gmail.com

Jun 12, 2018, 5:05:34 AM6/12/18
to BBR Development
Hi neal,

I thought about it a little bit again. How about the following derivation?

The startup stage of "pacing rate" relationship with time fixed point has the following:
The rate of change of the "pacing rate" is exactly equal to the current "pacing rate"

so,
let x be the time,Assuming that the RTT is constant, use it to normalize:
the BDP evolves as a function of time as f1(x)=2^x
the pacing rate evolves as a function of time as f2(x)=2^x

Let g(x) be the derivative of f1(x):
g(x)=f1'(x)=ln2*2^x

the derivative of f1'(x) is the rate, so g(x) is pacing rate.

let G be the pacing gain:
g(x+1) = G*g(x) = G*ln2*2^x = f2(x+1) = 2*2^x

so G*ln2*2^x = 2*2^x ==> G = 2/ln2

best regards.

On Friday, 8 June 2018 23:14:25 UTC+8, Neal Cardwell wrote:

Neal Cardwell

Jun 15, 2018, 10:22:50 PM6/15/18
to BBR Development
Hi,

Thanks for the discussion on this question.

Here's our team's latest detailed, step-by-step analytic derivation and a discussion of a simulation evaluation for the theoretical value for Startup gain, using a simplified continuous model:
Hopefully this doc should provide a detailed enough answer.

Here's a quick diagram illustrating some different options for slow start and startup behavior:

And here's the code for a simulator mentioned in that doc. This simulator allows fast and deterministic exploration of different BBR gain values:

Of course the real world has discrete packets, TSO/GSO bursts, serialization delays, multiple flows, and various kinds of noise. So it does not quite fit an idealized model, particularly in the first few round trips. We are planning on running production tests to evaluate an appropriate strategy for Startup gain and other parameters using A/B tests in real YouTube and Google.com workloads.

cheers,
neal

marywan...@gmail.com

Jun 19, 2018, 2:29:08 AM6/19/18
to BBR Development
Hi,

The explanation is very detailed
Thanks.

But will future versions replace 2.89 with 2.77? Just because of strict doubling, right?

Neal Cardwell

Jun 19, 2018, 1:30:23 PM6/19/18
to marywan...@gmail.com, BBR Development
On Tue, Jun 19, 2018 at 2:29 AM <marywan...@gmail.com> wrote:
Hi,

The explanation is very detailed
Thanks.

You're welcome!

But will future versions replace 2.89 with 2.77? Just because of strict doubling, right?

We would need to run production experiments to assess the real-world impact before making a change like that. So as I mentioned in my most recent note, we are planning on running production tests to evaluate an appropriate strategy for Startup gain and other parameters using A/B tests in real YouTube and Google.com workloads.

I am not sure what you mean by "Just because of strict doubling, right?" Several thoughts come to mind here.

First, because BBR has a cwnd as well, and doubles its cwnd each round trip, the cwnd would keep it from exceeding a strict doubling per round trip time, in any cas2e.

Second, the growth rate per round trip time is not the only consideration. Another big issue is the congestion window and pacing behavior in the first few round trips. I have added to the bbr github repo a zoomed-in version of the graph that illustrates the first five round trips of slow-start and Startup. This graph shows that, although a gain of 2.77 or 2.89 can basically allow the in-flight data to double each round trip, there is a large difference in the overall behavior depending on how the cwnd and pacing implementation in the first few round "seed" the growth process.

Finally, as mentioned previously, the 2.77 comes from an idealized model, and  the real world has discrete packets, TSO/GSO bursts, serialization delays, multiple flows, and various kinds of noise. So real-world Startup does not quite fit an idealized model, particularly in the first few round trips.

cheers,
neal

Neal Cardwell

Jun 19, 2018, 1:32:02 PM6/19/18
to BBR Development
I also wanted to explicitly mention these updates:

(1) I added an updated version of the zoomed-out Slow-Start/Startup graph, with a fix for the unpaced CUBIC flow. Previously I had accidentally shown the result for an unpaced CUBIC flow with IW20. I have fixed the graph to show an unpaced CUBIC flow with IW10. With this fix the graph makes much more sense, and the result looks quite nice for BBR with IW32.

(2) I added a zoomed-in version of the graph that illustrates the first five round trips of slow-start and Startup. This graph shows that, although a gain of 2.77 or 2.89 can basically allow the in-flight data to double each round trip, there is a large difference in the overall behavior of the various approaches depending on how the cwnd and pacing implementation in the first few round "seed" the growth process.

cheers,
neal

marywan...@gmail.com

Jun 22, 2018, 1:05:07 AM6/22/18
to BBR Development
Thank you very much!

That's a great explanation
But can you tell me the reason for the change in gain from 2/ln2 to 4ln2 and the original derivation of 2/ln2.

Thanks again.

Best regards.
Message has been deleted

Neal Cardwell

Jun 22, 2018, 12:03:25 PM6/22/18
to marywan...@gmail.com, BBR Development
On Fri, Jun 22, 2018 at 1:05 AM <marywan...@gmail.com> wrote:
Thank you very much!

You are very welcome!

That's a great explanation

Thanks!

But can you tell me the reason for the change in gain from 2/ln2 to 4ln2 and the original derivation of 2/ln2.

The team member who did the original derivation that arrived at 2/ln(2)=2.89 did not have easy access to his notes from that original derivation (which was several years ago at this point). So our team undertook to reconstruct the derivation, and in this process we ended up with 4*ln(2)=2.77.

Note that the 2.77 gain is only 4% lower than the original 2.89. And the original 2.89 is indeed sufficient to double the pacing rate each round trip. It's just that, in an idealized system, the 2.77 gain appears to be the theoretical lower limit on a gain that can double the pacing rate each round trip, when calculating the pacing rate as a multiple of the estimated bandwidth.

cheers,
neal

赵亚

Jun 25, 2018, 8:02:17 PM6/25/18
to BBR Development
Oh,i see..

I see why gain is 2/ln2 or 4*ln2. Thank you very much!

But as you say "Another big issue is the congestion window and pacing behaviors in the first what round trips.".Why did pacing_rate start with (high_gain*init_cwnd/RTT) instead of (init_cwnd/RTT)?
I did several rounds of experiments (init_cwnd=4), and the tcptrace of the first round trip almost showed the following

The reason for "the window of silence" seems to be the coefficient high_gain.
If the initial pacing_rate did not gain multiplied by the coefficient high_gain, So it looks like the delivery of init_cwnd packets will be evenly distributed in one round trip, but on the other hand, when connecting the initial, the round trip time measurement has not been smooth, affected by the fluctuation is great, the round trip time is not reliable.

So what is the latest development in this area for the BBR team? Can this problem be solved now?

Best regards.
Auto Generated Inline Image 1

Jonathan Morton

Jun 25, 2018, 8:16:33 PM6/25/18
to 赵亚, BBR Development
> On 26 Jun, 2018, at 3:02 am, 赵亚 <maryw...@126.com> wrote:
>
> The reason for "the window of silence" seems to be the coefficient high_gain.

The gain is greater than unity in order to permit the bandwidth estimate to converge to the true path bandwidth as soon as possible. In a TCP that was entirely dependent on managing cwnd, you could more reasonably set the pacing rate to transmit the window over exactly one RTT.

The real reason for the "window of silence" is that the cwnd is not similarly scaled in this initial phase, and this initial cwnd is still respected by BBR. You can actually see smaller silences, with the same cause, after the packet pairs resulting from the first few acks. Later on, the cwnd ceases to be a constraint under normal conditions, as long as the bandwidth estimate matches reality on the wire.

- Jonathan Morton

赵亚

Jun 25, 2018, 8:56:19 PM6/25/18
to BBR Development

On Tuesday, 26 June 2018 08:16:33 UTC+8, Jonathan Morton wrote:
> On 26 Jun, 2018, at 3:02 am, 赵亚 <maryw...@126.com> wrote:
>
> The reason for "the window of silence" seems to be the coefficient high_gain.

The gain is greater than unity in order to permit the bandwidth estimate to converge to the true path bandwidth as soon as possible.  In a TCP that was entirely dependent on managing cwnd, you could more reasonably set the pacing rate to transmit the window over exactly one RTT.

Yes, it is
I may not have made myself clear before.What I mean is that "only the initial of the first round trip of the initial pacing_rate calculation is not multiplied by the coefficient of high_gain", because this time has not yet formed ACK-clock,  and I think this time need to use A uniform sending rate, since when, after receipt of ACK calculation pacing_rate normal when multiplied by the coefficient of high_gain.
What do you think?

The real reason for the "window of silence" is that the cwnd is not similarly scaled in this initial phase, and this initial cwnd is still respected by BBR.  You can actually see smaller silences, with the same cause, after the packet pairs resulting from the first few acks.  Later on, the cwnd ceases to be a constraint under normal conditions, as long as the bandwidth estimate matches reality on the wire.

- Jonathan Morton

Thank you

Best regards.

Neal Cardwell

Jun 26, 2018, 10:48:16 AM6/26/18
to maryw...@126.com, BBR Development
On Mon, Jun 25, 2018 at 8:56 PM 赵亚 <maryw...@126.com> wrote:

> I may not have made myself clear before.What I mean is that "only the
> initial of the first round trip of the initial pacing_rate calculation
> is not multiplied by the coefficient of high_gain", because this time
> has not yet formed ACK-clock, and I think this time need to use A
> uniform sending rate, since when, after receipt of ACK calculation
> pacing_rate normal when multiplied by the coefficient of high_gain.
> What do you think?

I confess that I don't understand the comment or your proposal.
Perhaps your idea would be more clear when expressed as a proposed
patch?

One thing to keep in mind here is that in STARTUP, every pacing rate
calculation *is* indeed scaled by high_gain:

(1) The initial pacing rate is set to high_gain * initial_cwnd /
initial_rtt_sample (in Linux TCP BBR, see
bbr_init_pacing_rate_from_rtt()).

(2) After that, candidate pacing rates are computed with: high_gain *
estimated_bandwidth. For any of those candidates that are higher than
the current pacing rate, we increase the pacing rate to the candidate
pacing rate (in Linux TCP BBR, see bbr_set_pacing_rate()).

There are two ways I like to think about (1):

(a) To bootstrap BBR we need some initial guess as to the bandwidth
that might be available along the network path. For traditional TCP
this guess is effectively initial_bw_guess = initial_cwnd /
initial_rtt_sample. So for BBR we use the same initial guess. The
result is that when BBR calculates an initial pacing rate with
high_gain * initial_bw_guess, we get: high_gain * initial_cwnd /
initial_rtt_sample.

(b) The Linux paced CUBIC implementation uses an initial pacing rate
of 2 * initial_cwnd / initial_rtt_sample (see
tcp_update_pacing_rate()). To avoid regressions vs paced CUBIC, BBR
needs to use something similar to or slightly higher than that
expression; that's another reason why BBR uses an initial pacing rate
of high_gain * initial_cwnd / initial_rtt_sample.

Another thing to keep in mind is that any attempt to always close that
initial gap of silence is likely to result in a doubling of the
latency for request/response traffic that has less than an initial
cwnd of data to send. Because smoothing out such flights of data will
cause the latency to expand from one round trip to two round trips. So
there are costs to a simple scheme that merely tries to eliminate that
gap: either a latency cost, or a complexity cost in the form of code
to adaptively estimate whether there is enough data in the send buffer
to warrant spreading out the initial paced flight.

cheers,
neal

Aug 26, 2021, 6:50:38 AM8/26/21
to BBR Development

cutesocks

Dec 19, 2023, 10:59:42 AM12/19/23
to BBR Development
(lowest gain that will allow the pacing rate to double each round trip)

but doesn't 3.465 [5ln(2)] instead of 2.77 [4ln(2)] give the exact 2x pacing_rate each round trip?

Neal Cardwell

Dec 19, 2023, 11:29:30 AM12/19/23
to cutesocks, BBR Development
On Tue, Dec 19, 2023 at 10:59 AM cutesocks <0xcut...@gmail.com> wrote:
(lowest gain that will allow the pacing rate to double each round trip)

As you probably saw, from our continuous model of Startup behavior at:
...with that model, the minimum pacing gain in Startup that would allow the sending rate to double is 4*ln(2) = 2.77.

Of course, that is a continuous model, and networking packets are discrete rather than continuous, so that model can only approximate the real behavior.

but doesn't 3.465 [5ln(2)] instead of 2.77 [4ln(2)] give the exact 2x pacing_rate each round trip?

As you allude to, with that simulator one can experimentally find values that are better than 2.77 at approximating this "minimum Startup pacing gain" value. Your value of 5*ln(2) ~= 3.465 is one such value. However, picking 3.465 is quite arbitrary; I don't see a particular reason to pick that value when we can easily find lower values that also allow full doubling each round trip. Since lower values will cause less network impact in terms of queuing and loss, I would argue that lower values that allow doubling are preferable to higher values that allow doubling.

For example, in simulations it seems 3.1 is not quite big enough to allow exact doubling of cwnd, but 3.14 is able to allow doubling of cwnd, at least out to 19 rounds:

) ./startup  3.14 | grep ROUND
ROUND: bw: 0.000x t: 0.100000 round: 1 cwnd:     20 pif:      0 bw: 100.000000 pacing_rate: 314.000000 release: 0.003185
ROUND: bw: 0.000x t: 0.200000 round: 2 cwnd:     21 pif:     19 bw: 100.000000 pacing_rate: 314.000000 release: 0.163694
ROUND: bw: 2.000x t: 0.300000 round: 3 cwnd:     41 pif:     39 bw: 200.000000 pacing_rate: 628.000000 release: 0.286334
ROUND: bw: 2.000x t: 0.400000 round: 4 cwnd:     81 pif:     79 bw: 400.000000 pacing_rate: 1256.000000 release: 0.397141
ROUND: bw: 2.000x t: 0.500000 round: 5 cwnd:    161 pif:    159 bw: 800.000000 pacing_rate: 2512.000000 release: 0.499777
ROUND: bw: 2.000x t: 0.600000 round: 6 cwnd:    321 pif:    319 bw: 1600.000000 pacing_rate: 5024.000000 release: 0.599964
ROUND: bw: 2.000x t: 0.700000 round: 7 cwnd:    641 pif:    639 bw: 3200.000000 pacing_rate: 10048.000000 release: 0.699964
ROUND: bw: 2.000x t: 0.800000 round: 8 cwnd:   1281 pif:   1279 bw: 6400.000000 pacing_rate: 20096.000000 release: 0.799964
ROUND: bw: 2.000x t: 0.900000 round: 9 cwnd:   2561 pif:   2559 bw: 12800.000000 pacing_rate: 40192.000000 release: 0.899965
ROUND: bw: 2.000x t: 0.999999 round: 10 cwnd:   5121 pif:   5119 bw: 25610.000000 pacing_rate: 80415.400000 release: 0.999965
ROUND: bw: 2.000x t: 1.099999 round: 11 cwnd:  10241 pif:  10239 bw: 51200.000000 pacing_rate: 160768.000000 release: 1.099965
ROUND: bw: 2.000x t: 1.199999 round: 12 cwnd:  20481 pif:  20479 bw: 102400.000000 pacing_rate: 321536.000000 release: 1.199965
ROUND: bw: 2.000x t: 1.299999 round: 13 cwnd:  40961 pif:  40959 bw: 204800.000000 pacing_rate: 643072.000000 release: 1.299965
ROUND: bw: 2.000x t: 1.399999 round: 14 cwnd:  81921 pif:  81919 bw: 409600.000000 pacing_rate: 1286144.000000 release: 1.399965
ROUND: bw: 2.000x t: 1.499999 round: 15 cwnd: 163842 pif: 163838 bw: 819210.000000 pacing_rate: 2572319.400000 release: 1.499965
ROUND: bw: 2.000x t: 1.599999 round: 16 cwnd: 327683 pif: 327677 bw: 1638410.000000 pacing_rate: 5144607.400000 release: 1.599965
ROUND: bw: 2.000x t: 1.699999 round: 17 cwnd: 655366 pif: 655354 bw: 3276830.000000 pacing_rate: 10289246.200000 release: 1.699965
ROUND: bw: 2.000x t: 1.799999 round: 18 cwnd: 1310731 pif: 1310709 bw: 6553650.000000 pacing_rate: 20578461.000000 release: 1.799965
ROUND: bw: 2.000x t: 1.899999 round: 19 cwnd: 2621461 pif: 2621419 bw: 13107300.000000 pacing_rate: 41156922.000000 release: 1.899965

Since it is somewhat cute that 3.14 ~= pi (𝜋) works for this, if we are going to pick values via simulation, then I would rather go with 𝜋 than 5*ln(2) ~= 3.465. :-)

However, I imagine it would be even better to pick a number that works well not only in simulation but has also been validated in real-world networking tests...

best regards,
neal

On Thursday, August 26, 2021 at 6:50:38 PM UTC+8 muhamm...@umt.edu.pk wrote:

On Friday, June 8, 2018 at 8:42:30 AM UTC+5 maryw...@126.com wrote:
Hi ALL!

I encountered a problem while reading the source code. I hope I can get replies. Thanks!
------

In the following statement, there is a magic number, as noted, which is 2/ln(2) :

static const int bbr_high_gain = BBR_UNIT * 2885/1000 + 1;

But I don't know how it came about.
I have read the paper of BBR and have the following explanation:

To handle the Internet link bandwidths span 12 orders of magnitude,Startup implements a binary search for BtlBw by using a gain of 2 / ln2 to double the sending rate while delivery rate is increasing. This discovers BtlBw in log2BDP RTTs but creates up to 2 BDP excessThe queue in the process.

But I still don't understand the details.Why not 3/ln(3), why not 2.5, why not e?

I think there might be a smooth curve that satisfies the startup constraint, such as a two-fold increase in speed per round.And then you take the derivative of this curve, and you get some value...

There must be a mathematical explanation behind this, but I don't know what it is.

Regards.

--
You received this message because you are subscribed to the Google Groups "BBR Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bbr-dev+u...@googlegroups.com.