Hi All,
We wanted to share some Linux TCP BBR patches we are testing, to get feedback and hear how the patches perform for folks on this list.
We are interested in comments on the code, and especially in hearing how these patches affect throughput over low-RTT paths with a wifi hop. Our test results (examples attached) show improvements for the case where the TCP sender is on Ethernet and the receiver is on a wifi network. (Additional work is under way for the case where the BBR sender has a wifi interface.)
These patches are intended to be applied on top of the Linux net-next branch. (For tips on compiling and building a net-next kernel with TCP BBR, please check out the Linux TCP BBR quick start guide.)
There are two main efforts reflected in these patches:
1: Higher throughput for wifi and other paths with aggregation
Aggregation effects are extremely common with wifi, cellular, and cable modem link technologies, ACK decimation in middleboxes, and LRO and GRO in receiving hosts. The aggregation can happen in either direction, data or ACKs, but in either case the aggregation effect is visible to the sender in the ACK stream.
Previously, BBR's sending was often cwnd-limited under severe ACK aggregation/decimation, because BBR sized the cwnd at 2*BDP. If packets were ACKed in bursts after long delays, then BBR would stop sending once 2*BDP was in flight, leaving the bottleneck idle for potentially long periods. Note that loss-based congestion control does not have this issue, because when facing aggregation it continues increasing cwnd after bursts of ACKs, growing cwnd until the buffer is full.
To achieve good throughput in the presence of aggregation effects, this new algorithm allows the BBR sender to put extra data in flight to keep the bottleneck utilized during silences in the ACK stream that the evidence suggests were caused by aggregation.
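To make the idea concrete, here is a minimal standalone sketch of that approach. This is not the patch code itself; the function and field names, the fixed 2*BDP base cwnd, and the epoch/aging details are simplifying assumptions for illustration only:

/*
 * Sketch: on each ACK, compare the bytes cumulatively ACKed in the current
 * measurement epoch against what the estimated bandwidth predicts. Any
 * excess ("extra acked") is evidence of aggregation, and we budget for it
 * by allowing that much extra data in flight on top of the usual 2*BDP.
 */
#include <stdint.h>

struct agg_est {
	uint64_t epoch_start_us;   /* start time of current measurement epoch */
	uint64_t epoch_acked;      /* bytes ACKed since the epoch started */
	uint64_t extra_acked_max;  /* running max of excess ACKed bytes */
};

/* Called on each ACK; returns a cwnd (bytes) that budgets for aggregation. */
static uint64_t cwnd_with_aggregation_budget(struct agg_est *a,
					     uint64_t now_us,
					     uint64_t newly_acked_bytes,
					     uint64_t bw_bytes_per_sec,
					     uint64_t bdp_bytes)
{
	uint64_t expected, extra;

	if (!a->epoch_start_us)
		a->epoch_start_us = now_us;
	a->epoch_acked += newly_acked_bytes;

	/* Bytes we would expect to have been ACKed at the estimated bw. */
	expected = bw_bytes_per_sec * (now_us - a->epoch_start_us) / 1000000;
	extra = a->epoch_acked > expected ? a->epoch_acked - expected : 0;

	if (extra > a->extra_acked_max)
		a->extra_acked_max = extra;  /* real code would age this out */

	/* Base cwnd of 2*BDP plus a budget for the measured aggregation. */
	return 2 * bdp_bytes + a->extra_acked_max;
}

The intended effect is that after a burst of aggregated ACKs the sender can refill the pipe far enough to cover the next ACK silence, instead of stalling once 2*BDP is in flight.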
2: Lower queuing delays by frequently draining excess in-flight data
In BBR v1.0 the "drain" phase of the pacing gain cycle holds pacing_gain at 0.75 for essentially 1*min_rtt (or less, if inflight falls below the BDP).
This patch modifies the "drain" phase to "drain to target": it adaptively extends the phase until inflight falls to the target level that matches the estimated BDP (bandwidth-delay product).
This can significantly reduce the amount of data queued at the bottleneck, and hence reduce queuing delay and packet loss, in cases where there are multiple flows sharing a bottleneck.
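As a rough illustration of the difference (again only a sketch under assumed names, not the patch itself), the exit condition for the drain phase changes from "about one min_rtt has elapsed" to "inflight has actually fallen to the BDP target":

#include <stdbool.h>
#include <stdint.h>

/*
 * BBR v1.0 (time-based): leave the 0.75-gain drain phase once ~1*min_rtt
 * has elapsed, or earlier if inflight is already at or below the BDP.
 */
static bool v1_done_draining(uint64_t elapsed_us, uint64_t min_rtt_us,
			     uint64_t inflight_bytes, uint64_t bdp_bytes)
{
	return elapsed_us >= min_rtt_us || inflight_bytes <= bdp_bytes;
}

/*
 * "Drain to target" (inflight-based): keep pacing at the reduced gain until
 * inflight has actually fallen to the estimated BDP, however long that takes.
 */
static bool drain_to_target_done(uint64_t inflight_bytes, uint64_t bdp_bytes)
{
	return inflight_bytes <= bdp_bytes;
}

Holding the reduced gain until inflight reaches the target is what lets multiple flows sharing a bottleneck periodically drain the queue they have collectively built, rather than draining for only a fixed interval.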
For more info on both of these efforts, please check out the video and/or slides from our recent presentation at the ICCRG session at IETF 101.
If you have a chance to test these patches, we would be grateful to see throughput numbers comparing BBR to CUBIC. And if the BBR performance is significantly lower, it would be useful to see sender-side ss output and headers-only tcpdump traces (example command lines below).
Enjoy!
Thanks,
Neal, on behalf of the Google BBR team
ps: Here are some examples for collecting sender-side headers-only tcpdump traces:
tcpdump -w /tmp/trace.pcap -s 100 -i $DEVICE -c 1000000 port $PORT &
...and 30 seconds of sender-side ss output dumped every 100ms:
(for i in `seq 1 300`; do
   ss -tin "dst $HOST and dport = $PORT"
   usleep 100000
 done) > /tmp/ss.out.txt &