an update on BBR work at Google (from IETF 101 ICCRG)


Neal Cardwell

Mar 27, 2018, 12:24:29 PM3/27/18
to BBR Development
Hi all,

Last Friday we gave a quick update on some recent BBR work at Google, at
the ICCRG session at IETF 101:

Slides:

https://datatracker.ietf.org/meeting/101/materials/slides-101-iccrg-an-update-on-bbr-work-at-google-00

Video:
https://www.youtube.com/watch?v=rHH9wFbms80&feature=youtu.be&t=52m09s

Comments/questions welcome.

thanks,
neal

Thanos Koutsaftis

Mar 31, 2018, 8:09:27 PM3/31/18
to BBR Development

Hi all,

 

Following up on the update posted a couple of days ago about the delay-variation scenario, we would like to share some interesting simulation results we have on BBR's performance.

Specifically, we wanted to compare the performance of BBR and CUBIC on links where we experience delay variance. In one of our scenarios, we use the netem tool on a wired connection between two hosts in order to produce delay whose mean and variance follow a normal distribution. Below, we present the throughput achieved for different values of the mean (link delay in the plot) and variance.

 


Overall, we see that an increase in delay variance results in degraded throughput. Note that these results were produced a few weeks ago, without the recently released update.
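For reference, the delay model in this experiment can be sketched in a few lines of Python (the mean and standard deviation below are illustrative, not the values from our runs; negative draws are clamped to zero, since real delay cannot be negative):

```python
import random

def sample_delays(mean_ms, std_ms, n=100_000, seed=1):
    """Draw one-way delays (ms) from a normal distribution, clamping
    negative draws to zero, since real delay cannot be negative."""
    rng = random.Random(seed)
    return [max(0.0, rng.gauss(mean_ms, std_ms)) for _ in range(n)]

# Illustrative parameters: 50 ms mean delay, 15 ms standard deviation.
delays = sample_delays(mean_ms=50.0, std_ms=15.0)
zero_frac = sum(d == 0.0 for d in delays) / len(delays)
print(f"fraction of samples clamped to zero delay: {zero_frac:.4%}")
```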

 

Do you think this is related to the problem presented in the update, and if so, does the estimation of extra ACKed data solve this and related issues?

 

Thank you,

Thanos Koutsaftis

Matt Mathis

Apr 1, 2018, 10:10:39 PM4/1/18
to apk...@nyu.edu, BBR Development
Yes, BBR makes a fairly explicit assumption that the path delay is mostly stationary. It would be interesting to see how it performs with authentic examples of non-stationary delay, such as LEO satellite links.

Thanks,
--MM--
The best way to predict the future is to create it.  - Alan Kay

We must not tolerate intolerance;
       however our response must be carefully measured: 
            too strong would be hypocritical and risks spiraling out of control;
            too weak risks being mistaken for tacit approval.

Jonathan Morton

Apr 1, 2018, 10:56:53 PM4/1/18
to Matt Mathis, apk...@nyu.edu, BBR Development
> On 2 Apr, 2018, at 5:10 am, 'Matt Mathis' via BBR Development <bbr...@googlegroups.com> wrote:
>
> Yea, BBR makes a pretty explicit assumption that the path delay is mostly stationary. It would be interesting to see how it performed with authentic examples of non-stationary delay, such as LEO.

On that note, I think it's worth talking about what "normally distributed delay variance" actually means in this context, and why it's not the best idea ever to use it in a traffic model.

The normal distribution is defined by a mean and a variance (the variance being the square of the standard deviation). The interquartile range, the interval in which the central 50% of samples fall, is about 1.35 standard deviations wide. That means 50% of samples fall *outside* that interval, and in particular, 25% fall below it. In fact, samples from a normal distribution can fall arbitrarily far from the mean; they merely become less probable as the distance increases.

This means that as soon as you start using a normally-distributed delay model, you can get samples of *zero* delay (or less). The frequency with which these occur depends on the standard deviation relative to the mean. Approximately 1 sample per thousand falls more than 3 standard deviations below the mean, so at a standard deviation of 33% of the mean, you should expect zero-delay packets about once a second at 1k pps (~12 Mbps with a standard 1500-byte MTU). At the same rate, you will also get packets delayed to more than twice the mean, and those will dominate the congestion-window mechanism unless they are replaced by retransmissions that are delayed less.
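These tail figures are easy to check with a few lines of Python (stdlib only):

```python
import math

def normal_lower_tail(z):
    """P(X < mu - z*sigma) for a normally distributed X, via erfc."""
    return 0.5 * math.erfc(z / math.sqrt(2))

p3 = normal_lower_tail(3)   # probability of a sample 3+ std devs below the mean
print(f"P(sample < mean - 3*sigma) = {p3:.2e}")   # ~1.35e-3, about 1 in 740

# At a std dev of 33% of the mean, a zero-delay sample is a 3-sigma event,
# so at 1000 packets/sec:
print(f"expected zero-delay packets/sec at 1k pps: {1000 * p3:.2f}")
```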

With a link rate of 1 Gbps, you need 66 kpps to fill the pipe, so to keep the probability of BBR's 20-second minRTT window holding a zero-delay sample below 50%, you need roughly a 1/2,640,000 quantile, which I'll round to 3.5*10^-7. You get that at roughly 5 standard deviations below the mean, corresponding to a standard deviation of 20% of the mean. (Variance means something else.) But even with that criterion, BBR spends half its time with a minimum-sized congestion window, which it can only increase slightly during the other half, due to occasional samples which come *close* to 5 standard deviations below the mean.
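The quantile and the 20% figure can be reproduced numerically (a sketch; the packet rate and window length are taken from the estimates above):

```python
import math

def normal_lower_tail(z):
    """P(X < mu - z*sigma) for a normally distributed X, via erfc."""
    return 0.5 * math.erfc(z / math.sqrt(2))

pps = 66_000       # packets/sec to fill 1 Gbps (estimate from the text)
window_s = 20      # minRTT filter window assumed in the text
packets = pps * window_s

# Per-packet probability needed so that, by a union bound, the window
# holds a zero-delay sample less than half the time: 1/2,640,000.
p_max = 1 / (2 * packets)

# Scan for how many standard deviations below the mean that quantile sits.
z = 1.0
while normal_lower_tail(z) > p_max:
    z += 0.01

print(f"required quantile: {p_max:.2e}")
print(f"that is about {z:.2f} standard deviations below the mean")
print(f"allowed standard deviation: ~{100 / z:.0f}% of the mean delay")
```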

If you re-run this experiment, you should probably examine the region between 10% and 20% standard deviation more closely, say in 1% steps, and observe how frequently very-low-delay packets actually occur during the run. You should also explore other distribution models to find one without such extremely long tails.

- Jonathan Morton

Klatsky, Carl

Apr 8, 2018, 6:04:40 PM4/8/18
to BBR Development
Thanks Neal & team for the BBR status update. Just one clarifying question: on slide 11 a delay_variation_budget variable is introduced. Is that being used as a proxy value to account for aggregated/decimated ACK conditions?

Regards,
Carl Klatsky

Neal Cardwell

Apr 8, 2018, 6:39:31 PM4/8/18
to Klatsky, Carl, BBR Development
On Sun, Apr 8, 2018 at 6:04 PM Klatsky, Carl <Carl_K...@comcast.com> wrote:
Thanks Neal & team for the BBR status update.  Just one clarification question, on slide 11 delay_variation_budget variable is introduced.  Is that being used as the proxy value to account for aggregated / decimated ACK conditions?

Yes, exactly. In the lines in slide 11 that say:

BBR v1
 cwnd ~= (pipe_budget) + (delay_variation_budget)
       = (bw*min_rtt)  + (bw*min_rtt)

...the delay_variation_budget is an explanatory pseudo-variable capturing the model behind why the steady-state cwnd_gain in BBR thus far is 2 (where thus far cwnd ~= cwnd_gain * bw * min_rtt = 2 * bw * min_rtt). This is discussed a bit in the original ACM Queue article that gives an overview of BBR:

Delayed and Stretched ACKs

Cellular, Wi-Fi, and cable broadband networks often delay and aggregate ACKs. When inflight is limited to one BDP, this results in throughput-reducing stalls. Raising ProbeBW's cwnd_gain to two allowed BBR to continue sending smoothly at the estimated delivery rate, even when ACKs are delayed by up to one RTT. This largely avoids stalls.

In going from a static delay_variation_budget of bw * min_rtt to an explicit estimate of recent aggregation effects, the goal is to avoid such stalls in a wider variety of network scenarios. The main case where we've seen this help significantly is network paths with a wifi hop and a min_rtt of a few milliseconds, where the delay variations from the wifi link can often be much larger than the min_rtt itself.
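To make the budget model on slide 11 concrete, here is a rough Python sketch (names and structure are hypothetical, not the actual implementation) of estimating extra ACKed data as the excess of delivered data over what the bandwidth estimate predicts, then folding it into the cwnd budget:

```python
class ExtraAckedEstimator:
    """Tracks how much data has been ACKed beyond what the estimated
    bandwidth predicts over the current sampling epoch (a sketch of
    the idea, not the production algorithm)."""

    def __init__(self):
        self.epoch_start = 0.0     # seconds
        self.epoch_acked = 0.0     # bytes ACKed since epoch start
        self.extra_acked_max = 0.0

    def on_ack(self, now, newly_acked, bw):
        """now: seconds; newly_acked: bytes; bw: bytes/sec estimate."""
        self.epoch_acked += newly_acked
        expected = bw * (now - self.epoch_start)
        extra = self.epoch_acked - expected
        if extra < 0:
            # ACKs arriving no faster than bw: restart the epoch.
            self.epoch_start, self.epoch_acked, extra = now, 0.0, 0.0
        # In practice this maximum would be kept in a sliding window
        # so that stale aggregation episodes age out.
        self.extra_acked_max = max(self.extra_acked_max, extra)
        return self.extra_acked_max

def cwnd_budget(bw, min_rtt, extra_acked):
    # cwnd ~= pipe budget + aggregation budget, per slide 11
    return bw * min_rtt + extra_acked

# Example: on a 1 MB/s path, an aggregated ACK for 200 kB arrives at
# t = 0.1 s, twice what 1 MB/s predicts, so extra_acked is ~100 kB.
est = ExtraAckedEstimator()
extra = est.on_ack(now=0.1, newly_acked=200_000, bw=1_000_000)
print(extra, cwnd_budget(bw=1_000_000, min_rtt=0.010, extra_acked=extra))
```

The aggregation budget thus grows only when ACKs actually arrive in bursts, instead of always reserving a full extra bw * min_rtt.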

cheers,
neal
