D* tcp looks pretty good, on paper


Dave Taht

Jan 7, 2021, 1:35:14 PM
to bloat, co...@lists.bufferbloat.net, ECN-Sane, Make-Wifi-fast, flent-users, BBR Development, gho...@cs.ucdavis.edu, tfl...@ucdavis.edu
See: https://arxiv.org/pdf/2012.14996.pdf

Things I really like:

* they used flent
* Using "variance" as the principal signal. This is essentially one of
the great unpublished and unanalyzed improvements on the minstrel
algorithm as well
* Conventional ecn response
* outperforms bbr on variable links

The only negative so far is that I haven't found any published source for it. :(

Otherwise a very promising start to a year.

"The choice of feedback mechanism between delay and packet loss has
long been a point of contention in TCP congestion control. This has
partly been resolved, as it has become increasingly evident that delay
based methods are needed to facilitate modern interactive web
applications. However, what has not been resolved is what control
should be used, with the two candidates being the congestion window
and the pacing rate. BBR is a new delay based congestion control
algorithm that uses a pacing rate as its primary control and the
congestion window as a secondary control. We propose that a congestion
window first algorithm might give more desirable performance
characteristics in situations where latency must be minimized even at
the expense of some loss in throughput. To evaluate this hypothesis we
introduce a new congestion control algorithm called TCP D*, which is a
congestion window first algorithm that adopts BBR's approach of
maximizing delivery rate while minimizing latency. In this paper, we
discuss the key features of this algorithm, discuss the differences
and similarity to BBR, and present some preliminary results based on a
real implementation."




--
"For a successful technology, reality must take precedence over public
relations, for Mother Nature cannot be fooled" - Richard Feynman

da...@taht.net <Dave Täht> CTO, TekLibre, LLC Tel: 1-831-435-0729

Dave Taht

Jan 7, 2021, 4:41:59 PM
to Bob McMahon, Taran Lynn, gho...@cs.ucdavis.edu, BBR Development, Make-Wifi-fast, ECN-Sane, bloat, co...@lists.bufferbloat.net, flent-users
This is through one of the last remaining cerowrt boxes in the world,
running fq_codel. tcp-davis takes about a 20% single-stream throughput
hit vs bbr.

I note that I don't care one whit about throughput anymore. I care
that nothing, NOTHING messes up my videoconference...

and thus the tcp-rtt stats attached for davis are pleasing.

On Thu, Jan 7, 2021 at 12:26 PM Bob McMahon <bob.m...@broadcom.com> wrote:
>
> FYI, one can try this out using iperf 2.1 with --trip-times, which gives end-to-end delay at the application level. When clocks
> are sync'd, --trip-times reports the write-to-read latencies, i.e. the latencies seen at the application level.
>
> Note: I set up a Raspberry Pi 4 with a GPS HAT from Uputronics for solderless pulse-per-second, then configured it as a PTP grandmaster. This cost me around $200.
>
> I also added support for a very crude --near-congestion option that paces the writes based upon a weighting of the RTT. The tcp_info struct is sampled and available
> for other experiments, though one would have to modify the source a bit. The current technique used by iperf 2.1 is designed for test networks only, where all
> traffic is under script control. We've had too many people measuring bloat as latency. We really need separate measurements of the two phenomena,
> bloat vs latency, because they require different engineering actions from a semiconductor supplier.
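
A rough sketch of the RTT-weighted write-pacing idea (not iperf's actual source; the delay formula and names are assumptions for illustration, and it is Linux-only since it samples TCP_INFO):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <time.h>

static unsigned int sample_srtt_us(int sock)
{
    struct tcp_info ti;
    socklen_t len = sizeof(ti);
    if (getsockopt(sock, IPPROTO_TCP, TCP_INFO, &ti, &len) == 0)
        return ti.tcpi_rtt;  /* kernel's smoothed RTT, microseconds */
    return 0;
}

/* Before each write, sleep for weight * sRTT, so the sender backs
 * off in proportion to the queueing delay it is inducing. */
static void pace_next_write(int sock, double weight)
{
    double delay_us = weight * (double)sample_srtt_us(sock);
    time_t sec = (time_t)(delay_us / 1e6);
    struct timespec ts;
    ts.tv_sec  = sec;
    ts.tv_nsec = (long)((delay_us - (double)sec * 1e6) * 1000.0);
    nanosleep(&ts, NULL);
}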
>
> Below are examples over a 10G link, first with no write pacing, then with it. The server output, shown first, has the latency data (as well as the net power
> and Little's law calculations). (Note: use --histograms to get full distributions.)
>
> No write pacing
>
> [rjmcmahon@localhost iperf2-code]$ src/iperf -s -i 1 -e
> ------------------------------------------------------------
> Server listening on TCP port 5001 with pid 24568
> Read buffer size: 128 KByte (Dist bin width=16.0 KByte)
> TCP window size: 85.3 KByte (default)
> ------------------------------------------------------------
> [ 1] local 192.168.1.10%enp2s0 port 5001 connected with 192.168.1.62 port 50056 (MSS=1448) (trip-times) (sock=4) (peer 2.1.0-rc) on 2021-01-07 12:11:04 (PST)
> [ ID] Interval Transfer Bandwidth Burst Latency avg/min/max/stdev (cnt/size) inP NetPwr Reads=Dist
> [ 1] 0.00-1.00 sec 1.09 GBytes 9.34 Gbits/sec 2.959/1.180/3.681/0.388 ms (8905/131072) 3.31 MByte 394522 18480=2459:2580:2475:2354:2203:2192:1974:2243
> [ 1] 1.00-2.00 sec 1.10 GBytes 9.41 Gbits/sec 2.993/2.302/3.703/0.359 ms (8978/131072) 3.36 MByte 393209 19482=2526:2850:3102:2622:2344:2297:1867:1874
> [ 1] 2.00-3.00 sec 1.10 GBytes 9.42 Gbits/sec 3.010/2.302/3.692/0.347 ms (8978/131085) 3.38 MByte 391047 19387=2563:2757:2928:2708:2432:2244:1829:1926
> [ 1] 3.00-4.00 sec 1.10 GBytes 9.41 Gbits/sec 3.009/2.301/3.668/0.348 ms (8979/131060) 3.38 MByte 391094 18821=2456:2585:2660:2545:2270:2239:1906:2160
> [ 1] 4.00-5.00 sec 1.10 GBytes 9.42 Gbits/sec 2.985/2.299/3.696/0.359 ms (8979/131070) 3.35 MByte 394295 19441=2509:2886:2959:2728:2336:2200:1971:1852
> [ 1] 5.00-6.00 sec 1.10 GBytes 9.41 Gbits/sec 2.977/2.258/3.671/0.363 ms (8978/131082) 3.34 MByte 395352 18509=2352:2602:2464:2380:2263:2142:2095:2211
> [ 1] 6.00-7.00 sec 1.10 GBytes 9.41 Gbits/sec 2.980/2.290/3.680/0.363 ms (8978/131072) 3.34 MByte 394873 18522=2407:2499:2565:2334:2213:2268:1999:2237
> [ 1] 7.00-8.00 sec 1.10 GBytes 9.42 Gbits/sec 2.980/2.253/3.702/0.362 ms (8979/131073) 3.35 MByte 394972 18615=2427:2592:2493:2460:2281:2057:2062:2243
> [ 1] 8.00-9.00 sec 1.10 GBytes 9.41 Gbits/sec 2.976/2.277/3.663/0.364 ms (8979/131065) 3.34 MByte 395443 18632=2338:2615:2647:2351:2192:2317:2063:2109
> [ 1] 9.00-10.00 sec 1.10 GBytes 9.41 Gbits/sec 2.976/2.293/3.690/0.366 ms (8978/131076) 3.34 MByte 395416 18428=2281:2622:2497:2275:2178:2253:2129:2193
> [ 1] 0.00-10.00 sec 11.0 GBytes 9.41 Gbits/sec 2.984/1.180/3.703/0.362 ms (89736/131072) 3.35 MByte 394014 188367=24320:26609:26793:24757:22712:22211:19916:21049
>
>
> [rjmcmahon@localhost iperf2-code]$ src/iperf -c 192.168.1.10 --trip-times -i 1 -e
> ------------------------------------------------------------
> Client connecting to 192.168.1.10, TCP port 5001 with pid 18961 (1 flows)
> Write buffer size: 131072 Byte
> TCP window size: 85.0 KByte (default)
> ------------------------------------------------------------
> [ 1] local 192.168.1.62%enp2s0 port 50056 connected with 192.168.1.10 port 5001 (MSS=1448) (trip-times) (sock=3) (ct=0.41 ms) on 2021-01-07 12:11:04 (PST)
> [ ID] Interval Transfer Bandwidth Write/Err Rtry Cwnd/RTT NetPwr
> [ 1] 0.00-1.00 sec 1.09 GBytes 9.37 Gbits/sec 8937/0 0 1508K/1099 us 1065750
> [ 1] 1.00-2.00 sec 1.10 GBytes 9.41 Gbits/sec 8975/0 0 1508K/1087 us 1082218
> [ 1] 2.00-3.00 sec 1.10 GBytes 9.41 Gbits/sec 8975/0 0 1508K/1081 us 1088225
> [ 1] 3.00-4.00 sec 1.10 GBytes 9.42 Gbits/sec 8984/0 0 1508K/1085 us 1085300
> [ 1] 4.00-5.00 sec 1.10 GBytes 9.42 Gbits/sec 8980/0 0 1508K/1105 us 1065182
> [ 1] 5.00-6.00 sec 1.10 GBytes 9.41 Gbits/sec 8975/0 0 1582K/1100 us 1069428
> [ 1] 6.00-7.00 sec 1.10 GBytes 9.42 Gbits/sec 8979/0 0 1582K/1121 us 1049862
> [ 1] 7.00-8.00 sec 1.10 GBytes 9.41 Gbits/sec 8976/0 0 1582K/1133 us 1038396
> [ 1] 8.00-9.00 sec 1.10 GBytes 9.41 Gbits/sec 8978/0 0 1582K/1115 us 1055394
> [ 1] 9.00-10.00 sec 1.10 GBytes 9.42 Gbits/sec 8986/0 0 1582K/1122 us 1049744
> [ 1] 0.00-10.00 sec 11.0 GBytes 9.41 Gbits/sec 89748/0 0 1582K/1122 us 1048294
>
>
> With write pacing
>
> [rjmcmahon@localhost iperf2-code]$ src/iperf -s -i 1 -e
> ------------------------------------------------------------
> Server listening on TCP port 5001 with pid 24702
> Read buffer size: 128 KByte (Dist bin width=16.0 KByte)
> TCP window size: 85.3 KByte (default)
> ------------------------------------------------------------
> [ 1] local 192.168.1.10%enp2s0 port 5001 connected with 192.168.1.62 port 50072 (MSS=1448) (trip-times) (sock=4) (peer 2.1.0-rc) on 2021-01-07 12:14:59 (PST)
> [ ID] Interval Transfer Bandwidth Burst Latency avg/min/max/stdev (cnt/size) inP NetPwr Reads=Dist
> [ 1] 0.00-1.00 sec 1.08 GBytes 9.31 Gbits/sec 0.401/0.193/2.682/0.168 ms (8876/131084) 456 KByte 2904347 19868=3296:2404:2508:2797:3559:1778:1551:1975
> [ 1] 1.00-2.00 sec 1.10 GBytes 9.41 Gbits/sec 0.400/0.219/0.627/0.053 ms (8971/131071) 460 KByte 2937822 19117=3069:2267:2307:2510:3029:1824:1683:2428
> [ 1] 2.00-3.00 sec 1.09 GBytes 9.39 Gbits/sec 0.374/0.193/0.541/0.055 ms (8958/131060) 428 KByte 3143030 18942=2846:2423:2304:2417:2927:1831:1856:2338
> [ 1] 3.00-4.00 sec 1.09 GBytes 9.39 Gbits/sec 0.385/0.190/0.664/0.070 ms (8952/131072) 441 KByte 3050401 19248=3041:2175:2343:2749:3320:1805:1526:2289
> [ 1] 4.00-5.00 sec 1.09 GBytes 9.40 Gbits/sec 0.380/0.197/0.546/0.057 ms (8965/131075) 436 KByte 3095915 19959=3321:2398:2551:2738:3500:1840:1532:2079
> [ 1] 5.00-6.00 sec 1.09 GBytes 9.39 Gbits/sec 0.369/0.198/0.536/0.051 ms (8956/131072) 423 KByte 3177431 21060=3627:2456:2886:3189:4246:1813:1190:1653
> [ 1] 6.00-7.00 sec 1.09 GBytes 9.39 Gbits/sec 0.380/0.202/0.562/0.054 ms (8959/131077) 436 KByte 3086914 19263=3044:2338:2424:2505:3155:1809:1636:2352
> [ 1] 7.00-8.00 sec 1.09 GBytes 9.40 Gbits/sec 0.376/0.198/0.541/0.053 ms (8965/131061) 432 KByte 3122495 19137=3079:2303:2340:2455:3017:1822:1683:2438
> [ 1] 8.00-9.00 sec 1.10 GBytes 9.41 Gbits/sec 0.381/0.208/0.576/0.054 ms (8974/131073) 438 KByte 3083767 19162=3050:2269:2392:2486:3019:1891:1667:2388
> [ 1] 9.00-10.00 sec 1.09 GBytes 9.40 Gbits/sec 0.371/0.194/0.582/0.057 ms (8964/131070) 425 KByte 3169244 19143=3006:2411:2303:2462:3067:1744:1760:2390
> [ 1] 0.00-10.00 sec 10.9 GBytes 9.39 Gbits/sec 0.382/0.190/2.682/0.076 ms (89544/131072) 437 KByte 3074913 194908=31380:23444:24362:26308:32839:18161:16084:22330
>
>
> [rjmcmahon@localhost iperf2-code]$ src/iperf -c 192.168.1.10 --near-congestion=0.05 --trip-times -i 1 -e
> ------------------------------------------------------------
> Client connecting to 192.168.1.10, TCP port 5001 with pid 19320 (1 flows)
> Write buffer size: 131072 Byte
> TCP near-congestion delay weight set to 0.0500
> TCP window size: 85.0 KByte (default)
> ------------------------------------------------------------
> [ 1] local 192.168.1.62%enp2s0 port 50072 connected with 192.168.1.10 port 5001 (MSS=1448) (trip-times) (sock=3) (ct=0.40 ms) on 2021-01-07 12:14:59 (PST)
> [ ID] Interval Transfer Bandwidth Write/Err Rtry Cwnd/RTT NetPwr
> [ 1] 0.00-1.00 sec 1.08 GBytes 9.31 Gbits/sec 8881/0 0 1135K/373 us 3120427
> [ 1] 1.00-2.00 sec 1.10 GBytes 9.41 Gbits/sec 8971/0 0 1135K/391 us 3007281
> [ 1] 2.00-3.00 sec 1.09 GBytes 9.39 Gbits/sec 8958/0 0 1135K/331 us 3547260
> [ 1] 3.00-4.00 sec 1.09 GBytes 9.39 Gbits/sec 8952/0 0 1135K/288 us 4074155
> [ 1] 4.00-5.00 sec 1.09 GBytes 9.40 Gbits/sec 8965/0 0 1135K/301 us 3903855
> [ 1] 5.00-6.00 sec 1.09 GBytes 9.39 Gbits/sec 8955/0 0 1135K/414 us 2835144
> [ 1] 6.00-7.00 sec 1.09 GBytes 9.40 Gbits/sec 8961/0 0 1135K/470 us 2499013
> [ 1] 7.00-8.00 sec 1.09 GBytes 9.40 Gbits/sec 8964/0 0 1135K/350 us 3356941
> [ 1] 8.00-9.00 sec 1.10 GBytes 9.41 Gbits/sec 8973/0 0 1135K/472 us 2491756
> [ 1] 9.00-10.00 sec 1.09 GBytes 9.40 Gbits/sec 8964/0 0 1135K/402 us 2922710
> [ 1] 0.00-10.00 sec 10.9 GBytes 9.39 Gbits/sec 89547/0 0 1135K/402 us 2919642
>
> Bob
>
>
> On Thu, Jan 7, 2021 at 11:22 AM Taran Lynn via Make-wifi-fast <make-wi...@lists.bufferbloat.net> wrote:
>>
>> The source can be found at https://github.com/lambda-11235/tcp_davis .
>>
>> The code mentioned in the paper can be found under the tag "arxiv_2020". The current master branch has an additional stable mode that I was testing out.
>> _______________________________________________
>> Make-wifi-fast mailing list
>> Make-wi...@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/make-wifi-fast
>
>
tcp_nup-2021-01-07T132650.356720.tcp-davis.flent.gz
tcp_nup-2021-01-07T133304.619697.bbr.flent.gz
bbrvsdavis_bw.png
bbrvsdavis.png

Neal Cardwell

Jan 8, 2021, 10:38:25 AM
to Dave Taht, bloat, co...@lists.bufferbloat.net, ECN-Sane, Make-Wifi-fast, flent-users, BBR Development, gho...@cs.ucdavis.edu, tfl...@ucdavis.edu
On Thu, Jan 7, 2021 at 1:35 PM Dave Taht <dave...@gmail.com> wrote:
> See: https://arxiv.org/pdf/2012.14996.pdf

Thanks for the link!
 

> Things I really like:
>
> * they used flent
> * Using "variance" as the principal signal. This is essentially one of
> the great unpublished and unanalyzed improvements on the minstrel
> algorithm as well
> * Conventional ecn response
> * outperforms bbr on variable links

What did you have in mind by "variable links" here? (I did not see that term in the paper.)

Rather than characterizing the algorithm as using "variance" as the principal signal, my sense is that the estimated BDP is the primary signal, and the algorithm uses variance as a secondary signal to adapt the gain.
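
If that reading is right, the control loop might look roughly like this sketch (entirely a paraphrase; the coefficient-of-variation form and the constants are made up for illustration):

/* "Estimated BDP primary, variance adapts the gain": noisier BDP
 * estimates lead to more aggressive probing. */
static double adapted_gain(double bdp_mean, double bdp_variance)
{
    double cv2 = bdp_mean > 0.0
               ? bdp_variance / (bdp_mean * bdp_mean) : 0.0;
    double gain = 1.0 + cv2;          /* unit gain on a stable path */
    return gain > 2.0 ? 2.0 : gain;   /* cap to bound queue growth */
}
/* target_cwnd = adapted_gain(mean, var) * estimated_bdp_bytes / mss; */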

I would be interested to hear how the algorithm performs in real-world paths with high degrees of aggregation and RTT variance, including wifi, cellular, and 10Gbps+ Ethernet LANs. The paper mentions "TCP D* sets the window to its estimated BDP," and our experience is that setting cwnd to the estimated BDP produces unusably low throughput over these kinds of paths. In these paths the min_rtt is very different from the typical RTT, so setting the cwnd purely using the min_rtt can lead to very significant underutilization.
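
As a back-of-the-envelope illustration (the numbers are hypothetical, not measurements):

#include <stdio.h>

int main(void)
{
    double link_bps = 1e9;    /* hypothetical 1 Gbit/s path */
    double min_rtt  = 0.001;  /* 1 ms RTT floor */
    double typ_rtt  = 0.010;  /* 10 ms typical, inflated by aggregation */

    double cwnd_bits = link_bps * min_rtt;   /* BDP computed at min_rtt */
    double tput_bps  = cwnd_bits / typ_rtt;  /* cwnd-limited throughput */

    /* Prints ~100 Mbit/s on a 1000 Mbit/s path: a 10x shortfall. */
    printf("throughput: %.0f of %.0f Mbit/s\n",
           tput_bps / 1e6, link_bps / 1e6);
    return 0;
}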

Another interesting aspect is that it seems completely agnostic to packet losses. It would be interesting to see how the algorithm behaves in shallow or mid-sized buffers with a highly dynamic mix of traffic.

best,
neal

Jonathan Morton

Jan 8, 2021, 11:13:52 AM
to Neal Cardwell, Dave Taht, gho...@cs.ucdavis.edu, BBR Development, tfl...@ucdavis.edu, Make-Wifi-fast, ECN-Sane, bloat, co...@lists.bufferbloat.net, flent-users
> On 8 Jan, 2021, at 5:38 pm, Neal Cardwell via Make-wifi-fast <make-wi...@lists.bufferbloat.net> wrote:
>
> What did you have in mind by "variable links" here? (I did not see that term in the paper.)

Wifi and LTE tend to vary their link characteristics a lot over time.

- Jonathan Morton

Bryson Richard

Jan 8, 2021, 11:24:16 AM
to Jonathan Morton, Neal Cardwell, Dave Taht, gho...@cs.ucdavis.edu, BBR Development, tfl...@ucdavis.edu, Make-Wifi-fast, ECN-Sane, bloat, co...@lists.bufferbloat.net, flent-users
STOP ALL NOW AND FOREVER


Taran Lynn

Jan 11, 2021, 12:17:56 AM
to Neal Cardwell, BBR Development
One of our hopes was that by releasing the code others would test it on
their own systems. It would be interesting to see how the algorithm
reacts to systems with high RTT variance. We based the gain in the
congestion window off of the variance in the BDP estimate, in order to
accommodate unstable networks by being more aggressive. However, it is
possible that variance could be a problem if the min RTT estimate is
inaccurate.

Would you be interested in working with us to test the algorithm under
these conditions? Since your team has a lot of experience in congestion
control, we would like to get your advice on what tests we should run,
how to set them up, and pitfalls we should avoid.

Best, Taran

Neal Cardwell

Jan 11, 2021, 10:50:28 AM
to Taran Lynn, BBR Development
Hi Taran,

I'm afraid our team does not have cycles to help out with testing D*,
but in terms of advice, the main points that come to mind:

o It's useful to test with real DOCSIS, wifi, cellular, and high-speed
Ethernet links, since these links have characteristics that are very
difficult to simulate/emulate.

o It's important to test with senders that have a very low min_rtt
relative to the typical RTT. This is because
(a) this is very common, since most Internet traffic is from CDNs
with a wide footprint of servers very close to users, and most
high-speed Ethernet traffic has low min_rtt but high RTT variance due
to GSO/TSO, LRO/GRO, interrupt coalescing, etc.;
(b) this is the most challenging scenario for an algorithm that is
trying to minimize the amount of data in flight while maintaining
usably high throughput.
You could try to achieve this by running your senders on a cloud or
bare-metal hosting provider in the same metropolitan area as your test
clients, with an O(1 ms) min_rtt to your clients.

best,
neal