Some questions about BBRv2's behavior


Adrian Z

Oct 22, 2021, 4:53:40 AM
to BBR Development
Hi Neal,

I've been taking a detailed look at BBRv2 recently and I've got a few questions about its inner workings. It would be very kind if you could explain this.

1. If BBRv2 exits ProbeBW:Up due to the flow hitting a "hard ceiling" (i.e. loss/ECN too high), it exits the phase and sets inflight_hi = max(inflight, 0.7 * target_inflight) (code line 1880) where target_inflight is basically min(estimated_bdp, cwnd). Can we ever have cwnd < estimated_bdp? And do we not always have inflight >= estimated_bdp in ProbeBW:Up anyway? (Since that is what Refill achieves.) In what kind of situation can a flow set inflight_hi = 0.7 * target_inflight?

2. ProbeBW:Up can also exit if the previous Up saw a hard ceiling and we have reached inflight >= inflight_hi (code line 2068). Let's call this "risky probe condition". But then it seems like ProbeBW:Down transitions into ProbeBW:Refill if the previous Up exited due to the risky probe condition (code line 1918). After this comes another ProbeBW:Up, so exiting the previous one due to the risky probe condition did not change a thing because now we're probing anyway. The reason for this behavior is not obvious to me.

3. After ProbeRTT, BBRv2 resets the timer that initiates bandwidth probing. Why is this necessary? The code says "Raising inflight after PROBE_RTT may cause loss". Isn't that the case in most situations?

4. cwnd gets increased according to the level of ACK aggregation by 256 * extra_acked (code line 879). What does this factor of 256 do? Moreover, it seems to get capped at the number of packets sent within the last 100ms. Is there a rationale for why you've picked 100ms?

Best regards,
Adrian

Neal Cardwell

Oct 22, 2021, 10:55:52 AM
to Adrian Z, BBR Development
On Fri, Oct 22, 2021 at 4:53 AM Adrian Z <adria...@gmail.com> wrote:
Hi Neal,

I've been taking a detailed look at BBRv2 recently and I've got a few questions about its inner workings. It would be very kind if you could explain this.

Thanks for the great questions! Some responses in-line below...
 
1. If BBRv2 exits ProbeBW:Up due to the flow hitting a "hard ceiling" (i.e. loss/ECN too high), it exits the phase and sets inflight_hi = max(inflight, 0.7 * target_inflight) (code line 1880) where target_inflight is basically min(estimated_bdp, cwnd).

Yes, for those reading this thread who do not have the code pulled up, that's in bbr2_handle_inflight_too_high() where the code says:

  bbr->inflight_hi = max_t(u32, rs->tx_in_flight,
                           (u64)bbr2_target_inflight(sk) *
                           (BBR_UNIT - beta) >> BBR_SCALE);

 
Can we ever have cwnd < estimated_bdp?

Good question. Yes, cwnd can be less than estimated_bdp after packet loss events. For the most dramatic example, RTO resets cwnd to 1, which generally is far below the estimated_bdp.
 
And do we not always have inflight >= estimated_bdp in ProbeBW:Up anyway? (Since that is what Refill achieves.) In what kind of situation can a flow set inflight_hi = 0.7 * target_inflight?

Good question. We do not always have tx_in_flight >= estimated_bdp in ProbeBW:UP. For example, if a flow restarts from idle, tx_in_flight may be just a few packets at the time the flow detects a loss. That is the kind of situation where the flow sets inflight_hi = 0.7 * target_inflight. It may sound like an unimportant corner case, but in earlier revisions of BBRv2 we saw this specifically happen in YouTube traces, and the incorporation of the max() with 0.7 * target_inflight was important in addressing that kind of scenario.

2. ProbeBW:Up can also exit if the previous Up saw a hard ceiling and we have reached inflight >= inflight_hi (code line 2068). Let's call this "risky probe condition". But then it seems like ProbeBW:Down transitions into ProbeBW:Refill if the previous Up exited due to the risky probe condition (code line 1918). After this comes another ProbeBW:Up, so exiting the previous one due to the risky probe condition did not change a thing because now we're probing anyway.

That description of the dynamics is missing one key aspect. The second refill only happens if the first attempt to probe bandwidth did not find loss or ECN exceeding the tolerance thresholds.
 
The reason for this behavior is not obvious to me.

Sorry this is not so clear. The motivation for this "is_risky" mechanism is to try to avoid having a BBRv2 flow knowingly hold the buffer at exactly full for a whole round trip time, when there is no need to do so.

The tricky part is that there is a one-round-trip delay between the moment a flow exercises a behavior that causes a buffer to overflow and cause loss, and the moment the flow detects that loss.

The inflight_hi value comes from the flow's previous measurements of the amount of data that most recently seemed like it would cause the loss rate to cross the loss_thresh threshold; so it's often right around the level at which the buffer is full and starts dropping packets. If a flow drives inflight up to inflight_hi and then waits a round trip time to find out whether this caused significant loss, then it's likely holding the buffer near the exactly-full point for a full round trip time, causing high delay and elevated loss for any other flows sharing the bottleneck.

The "is_risky" mechanism is designed to momentarily push inflight up to inflight_hi, then pull it down again until we can find out whether this caused loss. Once the flow sees that this bandwidth probing did not seem to cause excessive loss, the bandwidth probing continues persistently.

This may be more complexity than is warranted, and we may be able to shave it off before we transition out of "alpha". But hopefully that at least clarifies the motivation.
 

3. After ProbeRTT, BBRv2 resets the timer that initiates bandwidth probing. Why is this necessary?

This is a judgment call and probably isn't strictly necessary. We may want to change the behavior based on experience and on whatever other mechanisms are ultimately in place to respond to packet loss.
 
The code says "Raising inflight after PROBE_RTT may cause loss". Isn't that the case in most situations?

Not sure what you are imagining falls into the category of "most situations". :-) But if I am understanding your point then yes, almost any time a flow raises its inflight it is greatly elevating the risk that the flow causes loss. While a flow maintains its pacing rate matching the delivery rate and respects packet conservation, IMHO it can be reasonably said to not be "causing" loss, because it can be seen as mostly just leaving a fixed, smaller amount of buffer space and bottleneck link bandwidth available for other flows, which is nearly equivalent to having a slower bottleneck link and shallower buffer than would be there otherwise. Any time a flow exceeds the delivery rate or packet conservation rate, it is greatly increasing the risk of loss. And the reset of the timer that initiates bandwidth probing is meant to correspond to that.

I have also played with variants where after ProbeRTT BBRv2 does *not* reset the timer that initiates bandwidth probing. And instead, the code explicitly tracks when the last packet loss was seen and the "Reno emulation" mechanism uses that as a yardstick. We'll see how that line of experimentation goes.
 
4. cwnd gets increased according to the level of ACK aggregation by 256 * extra_acked (code line 879). What does this factor of 256 do?

That extra_acked_gain parameter is there for experimentation, and is currently configured to be a NOP. Please note that the specific full line of code is:

  aggr_cwnd = (bbr->params.extra_acked_gain * bbr_extra_acked(sk)) >> BBR_SCALE;

So this code is multiplying by 256 (extra_acked_gain) and dividing by 256 (BBR_SCALE is 8), so those cancel out. For the eventual non-alpha version of this code we'll either remove the extra_acked_gain and >> BBR_SCALE, or if we keep the parameter then it will be a compile-time constant and the compiler will probably optimize away the compensating multiply and shift.
 
Moreover, it seems to get capped at the number of packets sent within the last 100ms. Is there a rationale for why you've picked 100ms?

The 100ms is a sort of safety cap, chosen to be the next round (decimal) number above the maximum time scale over which we have seen ACKs aggregated in the wild in wifi, cellular, or DOCSIS traffic.

For an example, please see page 5 of our IETF 101 slides, where you can see the max RTT in this wifi test was just over 80ms:
We have seen aggregation, or silences in the ACK stream, of roughly up to this level, but not much beyond. If others are seeing different behavior, we would love to hear about it.

Thanks for the great questions!

neal


 
