Hi Greg,
Thank you so much for your suggestion. Handling ACK
suppression/decimation/acceleration is indeed one of the challenges
faced by BBR, and we appreciate your very nicely explained and
carefully reasoned suggestion.
Based on my understanding of your proposal ("calculate the RTT based
on the oldest of the ACKed segments rather than the newest") it sounds
like this would, when it is applicable, increase the RTT samples. But
since the RTprop (min_rtt) filter is a min-filter, the question would
be whether these increased RTT samples would "survive" the min-filter
and be chosen as the estimated RTprop/min_rtt. The RTprop/min_rtt
filter window is pretty long (currently 10 secs). So my guess would be
that (unless the DOCSIS link was severely congested) there would
typically be some RTT sample during the 10 sec window that did not
suffer ACK suppression/decimation/acceleration and was low enough to
be chosen as the RTprop/min_rtt estimate. So I am not sure such RTT
sample selection logic would have a significant effect, if I'm
understanding it and anticipating its effects correctly.
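
To make the min-filter point concrete, here is a minimal sketch of a sliding-window min over RTT samples. It is a simplification for illustration, not the kernel's min_rtt logic; the function name, the trace, and the sample values are all made up. It shows that a single clean sample inside the 10 sec window is enough to keep the inflated samples from being chosen.

```python
from collections import deque

def windowed_min_rtt(samples, window_s=10.0):
    """Return the windowed-min RTT estimate after each sample.

    samples: list of (time_s, rtt_ms) tuples in time order.
    Simplified sliding-window min (an assumption for illustration,
    not the actual Linux BBR min_rtt code).
    """
    window = deque()  # (time_s, rtt_ms), RTTs increasing from left to right
    estimates = []
    for t, rtt in samples:
        # Older samples that are no lower than the new one can never be the min.
        while window and window[-1][1] >= rtt:
            window.pop()
        window.append((t, rtt))
        # Expire samples that have aged out of the window.
        while window[0][0] < t - window_s:
            window.popleft()
        estimates.append(window[0][1])
    return estimates

# Hypothetical trace: most samples inflated to ~30 ms (e.g. by measuring
# from the oldest ACKed segment), with an occasional clean ~20 ms sample.
samples = [(0.0, 30.0), (1.0, 31.0), (2.0, 20.0), (3.0, 32.0),
           (6.0, 30.0), (9.0, 20.5), (11.0, 33.0), (13.0, 31.0)]
estimates = windowed_min_rtt(samples)
```

In this trace the estimate drops to the clean sample as soon as it arrives and stays near 20 ms for the rest of the trace, regardless of how the other samples are computed.
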
However, I like the direction of this suggestion a lot. In the same
vein as your suggestion, BBR could track the maximum number of packets
(or bytes) recently ACKed in a single stretch ACK, and essentially add
that amount (or some small multiple thereof) to the cwnd.
If we're going to do something like this, it would be nice to also
have the mechanism handle cases with "compressed" ACKs as well; quite
often we see a sequence of ACKs arrive in a burst, perhaps because
they were granted a slot of air time on a wifi/cellular link
together. So perhaps we can improve behavior on a wider class of these
links if we generalize this mechanism to track the number of packets
(or bytes) recently ACKed in a "time slot" equal to the amount of time
in which we'd expect a TSO burst to be ACKed.
For example, if a BBR sender is sending TSO bursts of 4 packets in
every 1ms time slot (48Mbps), and yet sometimes sees 20 packets
ACKed in a single 1ms time slot, then perhaps it should inflate its
cwnd by at least 20 packets, to allow itself (if needed) to keep
sending while those 20 ACKs are being held up.
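
A rough sketch of that idea, under stated assumptions: 1500-byte packets (which is what makes 4 packets per 1ms come out to 48Mbps), timestamps in integer microseconds, fixed slot boundaries, and a made-up base cwnd of 40 packets. The function names are hypothetical, and the notion of "recently" (aging out old slots) is left out for brevity.

```python
def max_acked_per_slot(ack_events, slot_us):
    """Largest number of packets ACKed within any single time slot.

    ack_events: list of (time_us, packets_acked) tuples.
    Counts both one big stretch ACK and several compressed ACKs
    landing in the same slot.
    """
    per_slot = {}
    for t_us, n in ack_events:
        slot = t_us // slot_us
        per_slot[slot] = per_slot.get(slot, 0) + n
    return max(per_slot.values(), default=0)

def inflated_cwnd(base_cwnd, extra_acked, multiple=1):
    """Add the observed ACK burst size (or a small multiple of it) to cwnd."""
    return base_cwnd + multiple * extra_acked

# Sending rate from the example: 4 packets per 1000 us slot,
# assuming 1500-byte packets. Bits per microsecond equals Mbps.
rate_mbps = 4 * 1500 * 8 / 1000

# Hypothetical ACK trace: steady 4 packets per slot, then a gap, then
# two ACKs covering 12 + 8 packets arriving within the same 1 ms slot.
acks = [(0, 4), (1000, 4), (2000, 4), (5200, 12), (5700, 8), (6000, 4)]
burst = max_acked_per_slot(acks, slot_us=1000)
cwnd = inflated_cwnd(base_cwnd=40, extra_acked=burst)
```
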
In fact, since September we have been experimenting in the lab with
this approach for Linux TCP BBR. Our hope was that such an approach
would help in many scenarios: ACK suppression/decimation/acceleration, ACK
compression, LRO, GRO, or TCP delayed ACKs. In any of these cases BBR
can see a big gap in the ACK stream followed by a big burst of packets
being ACKed. The conjecture is that in such cases BBR will want to
allow more packets in flight.
Does it sound to you like a mechanism like this might help BBR
operate over DOCSIS links with ACK decimation?
As to your question about when BBR applies cwnd_gain: BBR always
applies cwnd_gain and a few other provisions to allow cwnd to be
bigger than the estimated BDP (see bbr_target_cwnd()). There's no
heuristic to decide when to apply those and when not to. But a key
part of the design of BBR is that it tries (and usually succeeds) to
run mostly limited by its pacing rate, rather than cwnd. So the
fact that cwnd is bigger than BDP largely only comes into play if the
ACK rate temporarily falls below the estimated bandwidth (e.g. with ACK
decimation or compression, or delayed ACKs).
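
As a simplified illustration of that relationship (not a transcription of bbr_target_cwnd()), the sketch below computes a cwnd as a gain times the estimated BDP plus some headroom. The cwnd_gain value of 2, the headroom_pkts parameter, and the function name are assumptions for the example.

```python
import math

def target_cwnd_packets(bw_pkts_per_ms, min_rtt_ms, cwnd_gain=2.0, headroom_pkts=0):
    """cwnd = cwnd_gain * estimated BDP, plus optional extra headroom.

    headroom_pkts is a hypothetical stand-in for the "few other
    provisions" that let cwnd exceed the estimated BDP.
    """
    bdp_pkts = bw_pkts_per_ms * min_rtt_ms
    return math.ceil(cwnd_gain * bdp_pkts) + headroom_pkts

# Example: 4 packets/ms (48Mbps with 1500-byte packets) and a 20 ms min_rtt
# give a BDP of 80 packets, so the cwnd is always allowed well above the BDP.
bdp = 4 * 20
cwnd = target_cwnd_packets(4, 20)
```

While ACKs arrive smoothly the pacing rate keeps roughly one BDP in flight, so the extra room is unused; it only matters when the ACK stream stalls and then arrives in a burst.
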
cheers,
neal