Inquiry on BBRv1 Pacing Rate Behavior Under Bandwidth Fluctuations


Li Zonglun

Mar 22, 2024, 11:27:34 AM
to BBR Development

Hi All,

I am reaching out to discuss an observation I've made while experimenting with the BBRv1 algorithm, specifically regarding the behavior of the bbr_set_pacing_rate function within the tcp_bbr1.c file from your BBRv3 branch on GitHub.

For context, I have compiled the BBRv1 and BBRv3 congestion control code as loadable kernel modules to facilitate repeated modification and debugging of the source. My current focus is the original BBRv1 version. During my experiments, which involved dynamically adjusting the available bandwidth of the test environment, I noted that the pacing rate computed by BBRv1, and assigned to sk_pacing_rate in the socket structure, does not decrease immediately after a significant drop in the available bandwidth. This matches my understanding, given BBRv1's reliance on the maximum bandwidth observed over the past 10 round trips.

However, when analyzing the actual send rate through pcap files captured with tcpdump, I observed a rapid decline in the send rate concurrent with the bandwidth reduction. This was unexpected, as I assumed the send rate should be primarily dictated by the sk_pacing_rate, which remained at a higher value indicative of previous bandwidth conditions.

To illustrate my point more clearly, I've included a graph below that visualizes my observations. In this graph:

The red line represents the pacing_rate as determined by BBRv1. I get this value using printk in the bbr_set_pacing_rate function.

The orange line corresponds to the send rate calculated from pcap files captured with tcpdump.

The blue line illustrates the changes in the actual available bandwidth.

1.png

Could you please provide insights or clarifications on why there's a substantial discrepancy between the maintained sk_pacing_rate and the actual send rate observed through pcap analysis? Is there an underlying mechanism or factor that could explain this rapid adjustment in send rate despite the sk_pacing_rate not showing a corresponding decrease?

In pondering this discrepancy, I wonder if, after BBR sets the sk_pacing_rate using the calculated pacing_rate, there might be some internal TCP mechanisms that subsequently adjust the sk_pacing_rate, leading to the significant difference between the actual send rate and the sk_pacing_rate. I sincerely inquire whether your engineering team has previously encountered this issue or can provide any insights into such behavior within the TCP stack.

Your expertise and any guidance on this matter would be greatly appreciated, as it would significantly aid in my understanding and further experimentation with BBR.

Best Regards,

Zonglun Li

Neal Cardwell

Mar 22, 2024, 12:11:50 PM
to Li Zonglun, BBR Development
On Fri, Mar 22, 2024 at 11:27 AM Li Zonglun <gunp...@gmail.com> wrote:


Could you please provide insights or clarifications on why there's a substantial discrepancy between the maintained sk_pacing_rate and the actual send rate observed through pcap analysis? Is there an underlying mechanism or factor that could explain this rapid adjustment in send rate despite the sk_pacing_rate not showing a corresponding decrease?

Yes, cwnd is a mechanism that is a second constraint on sending behavior, which might explain this. You might try looking at cwnd and packets_in_flight during these tests, using ss, e.g.:
 
(while true; do ss -tinm "dst $REMOTE_HOST"; sleep 0.025; done) > /tmp/ss.out.txt &

For tips on building a recent ss binary:

If you find packets_in_flight (unacked - sacked - lost + retrans) is >= cwnd, then that indicates cwnd is constraining behavior.

In particular, if packet loss rates are high, then BBR is often cwnd-limited rather than pacing-limited.
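If it helps, here is a rough, untested sketch of a script to post-process that ss log; it assumes each per-connection detail line in /tmp/ss.out.txt contains fields like unacked:, sacked:, lost:, retrans:A/B, and cwnd: (sacked/lost may be omitted when zero), and simply counts how often inflight >= cwnd:

#!/usr/bin/env python3
# Untested sketch: count how often packets_in_flight >= cwnd in the ss log.
import re, sys

def field(line, name):
    m = re.search(r"\b" + name + r":(\d+)", line)
    return int(m.group(1)) if m else 0   # ss omits sacked/lost when they are zero

total = cwnd_limited = 0
with open(sys.argv[1] if len(sys.argv) > 1 else "/tmp/ss.out.txt") as f:
    for line in f:
        if "cwnd:" not in line:
            continue                     # keep only the per-connection detail lines
        m = re.search(r"\bretrans:(\d+)/\d+", line)
        retrans = int(m.group(1)) if m else 0
        inflight = (field(line, "unacked") - field(line, "sacked")
                    - field(line, "lost") + retrans)
        total += 1
        if inflight >= field(line, "cwnd"):
            cwnd_limited += 1

print("cwnd-limited in %d of %d samples" % (cwnd_limited, total))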

If that doesn't explain it, would you be able to post your .pcap files (e.g., headers-only) on a web server, Google Drive, Dropbox, etc.?

best regards,
neal



Bob McMahon

Mar 22, 2024, 1:20:09 PM
to Neal Cardwell, Li Zonglun, BBR Development

If you find packets_in_flight (unacked - sacked - lost + retrans) is >= cwnd, then that indicates cwnd is constraining behavior.


A bit of a tangent - would it be useful for a tool like iperf 2 to output this inflight calculation? It would be sampled at the report interval rate.

rjmcmahon@fedora:~/Code/inflight/iperf2-code$ src/iperf -c 192.168.1.35 -i 1 -e
------------------------------------------------------------
Client connecting to 192.168.1.35, TCP port 5001 with pid 164094 (1/0 flows/load)
Write buffer size: 131072 Byte
TCP congestion control using cubic
TOS set to 0x0 (dscp=0,ecn=0) (Nagle on)
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  1] local 192.168.1.103%enp4s0 port 37468 connected with 192.168.1.35 port 5001 (sock=3) (icwnd/mss/irtt=14/1448/182) (ct=0.24 ms) on 2024-03-22 10:15:19.259 (PDT)
[ ID] Interval        Transfer    Bandwidth       Write/Err  Rtry     InF(pkts)/Cwnd/RTT(var)        NetPwr
[  1] 0.00-1.00 sec   116 MBytes   969 Mbits/sec  924/0        71     1077/1535K/13126(144) us  9227
[  1] 1.00-2.00 sec   113 MBytes   945 Mbits/sec  901/0         0     1178/1678K/14317(90) us  8249
[  1] 2.00-3.00 sec   112 MBytes   941 Mbits/sec  897/0         0     1227/1790K/15308(108) us  7680
[  1] 3.00-4.00 sec   112 MBytes   941 Mbits/sec  897/0         0     1305/1877K/16158(142) us  7276
[  1] 4.00-5.00 sec   112 MBytes   937 Mbits/sec  894/0         2      959/1360K/11557(83) us  10139
[  1] 5.00-6.00 sec   112 MBytes   941 Mbits/sec  897/0         0      998/1455K/12428(106) us  9460
[  1] 6.00-7.00 sec   112 MBytes   940 Mbits/sec  896/0         0     1044/1527K/13034(104) us  9010
[  1] 7.00-8.00 sec   112 MBytes   941 Mbits/sec  897/0         0     1093/1578K/13416(96) us  8764
[  1] 8.00-9.00 sec   114 MBytes   952 Mbits/sec  908/0         0     1127/1612K/13757(111) us  8651
[  1] 9.00-10.00 sec   112 MBytes   941 Mbits/sec  897/0         0     1138/1641K/13930(92) us  8440
[  1] 0.00-10.04 sec  1.10 GBytes   941 Mbits/sec  9009/0        73        0/1643K/14132(108) us  8322

 Bob


MUHAMMAD AHSAN

Mar 22, 2024, 1:26:40 PM
to Bob McMahon, BBR Development
The congestion control shown below is not BBR, but rather CUBIC.

Regards,
Ahsan


Bob McMahon

Mar 22, 2024, 1:31:11 PM
to MUHAMMAD AHSAN, BBR Development
Sure, the congestion control is selectable with the --tcp-cca option; which algorithms are available is kernel-build dependent. I think the inflight calculation is independent of the CCA on a socket, so it should be valid for any CCA.

--tcp-cca
Set the congestion control algorithm to be used for TCP connections & exchange with the server (same as --tcp-congestion)


Bob

Neal Cardwell

Mar 22, 2024, 2:18:41 PM
to Bob McMahon, Li Zonglun, BBR Development
On Fri, Mar 22, 2024 at 1:20 PM Bob McMahon <bob.m...@broadcom.com> wrote:

If you find packets_in_flight (unacked - sacked - lost + retrans) is >= cwnd, then that indicates cwnd is constraining behavior.


A bit of a tangent - would it be useful for a tool like iperf 2 to output this inflight calculation? It would be sampled at the report interval rate.

Sure, for any tool that outputs cwnd, the number of packets in flight is also interesting to output. Particularly for BBR, which prefers to try to control sending with the pacing rate, and thus often does not use the full cwnd.

best regards,
neal

Bob McMahon

Mar 22, 2024, 2:24:21 PM
to Neal Cardwell, Li Zonglun, BBR Development

A bit of a tangent - would it be useful for a tool like iperf 2 to output this inflight calculation? It would be sampled at the report interval rate.

Sure, for any tool that outputs cwnd, the number of packets in flight is also interesting to output. Particularly for BBR, which prefers to try to control sending with the pacing rate, and thus often does not use the full cwnd.


The units for inflight are packets, while cwnd is in bytes. Is that OK, or is there a way to use the same units, assuming that would be preferred (or maybe not)?

Thanks,
Bob 

Neal Cardwell

Mar 22, 2024, 2:32:04 PM
to Bob McMahon, Li Zonglun, BBR Development
If you want cwnd and inflight in the same units (sounds good to me), then for Linux TCP you would probably want to use units of packets. Both cwnd and inflight are natively in units of packets in Linux TCP. So if you are printing the cwnd in bytes, you are presumably multiplying the cwnd by the MSS? In that case you could just print the inflight and cwnd values directly in units of packets, and not multiply the cwnd by the MSS. :-)
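Roughly, in Python pseudocode (a sketch only; "ti" here stands for a parsed tcp_info sample, not any particular tool's struct):

def cwnd_and_inflight(ti):
    # Both of these are natively in units of packets in Linux TCP:
    inflight_pkts = ti["unacked"] - ti["sacked"] - ti["lost"] + ti["retrans"]
    cwnd_pkts = ti["snd_cwnd"]
    # This is what you get if you scale cwnd by the MSS, as iperf is doing today:
    cwnd_bytes = ti["snd_cwnd"] * ti["snd_mss"]
    return inflight_pkts, cwnd_pkts, cwnd_bytes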

neal

Bob McMahon

Mar 22, 2024, 2:52:47 PM
to Neal Cardwell, Li Zonglun, BBR Development

If you want cwnd and packets in the same units (sounds good to me), then for Linux TCP you would probably want to use the units of packets. Both cwnd and inflight are natively in units of packets in Linux TCP. So if you are printing the cwnd in bytes you are presumably multiplying the cwnd by the MSS? So I guess you could just print the inflight and cwnd values directly in units of packets, and not multiply the cwnd by the MSS. :-)


Oops, you're right: the iperf code converts the cwnd units to KBytes via stats->cwnd = tcp_info_buf.tcpi_snd_cwnd * tcp_info_buf.tcpi_snd_mss / 1024.

I did this so that the server-side end-to-end in-progress (InP) bytes, computed per Little's Law and available with --trip-times, would be in similar byte units.

Not sure whether network engineers will want the same units for all three or not.

Forwarding-plane engineers like units of packets, but the Little's Law calculation doesn't really have that information unless tcp_info is exposed, which would make it platform dependent.

root@rpi5-35:~# iperf -s -i 0.5 -e
------------------------------------------------------------
Server listening on TCP port 5001 with pid 22124
Read buffer size:  128 KByte (Dist bin width=16.0 KByte)
TCP congestion control default cubic
TCP window size:  128 KByte (default)
------------------------------------------------------------
[  1] local 192.168.1.35%eth0 port 5001 connected with 192.168.1.103 port 57496 (trip-times) (sock=4/reno) (peer 2.2.0-rc) (icwnd/mss/irtt=14/1448/177) on 2024-03-22 11:47:34.416 (PDT)
[ ID] Interval        Transfer    Bandwidth    Burst Latency avg/min/max/stdev (cnt/size) inP NetPwr  Reads=Dist
[  1] 0.00-0.50 sec  55.4 MBytes   930 Mbits/sec  7.634/6.585/8.031/0.210 ms (443/131160)  881 KByte 15222  9687=9685:1:0:1:0:0:0:0

rjmcmahon@fedora:~/Code/inflight/iperf2-code$ src/iperf -c 192.168.1.35 -i 0.5 -e --tcp-write-prefetch 1K --tcp-cca reno --trip-times --sync-transfer-id -t 2
------------------------------------------------------------
Client connecting to 192.168.1.35, TCP port 5001 with pid 182166 (1/0 flows/load)

Write buffer size: 131072 Byte
TCP congestion control set to reno using reno

TOS set to 0x0 (dscp=0,ecn=0) (Nagle on)
TCP window size: 85.0 KByte (default)
Event based writes (pending queue watermark at 1024 bytes)
------------------------------------------------------------
[  1] local 192.168.1.103%enp4s0 port 57496 connected with 192.168.1.35 port 5001 (prefetch=1024) (trip-times) (sock=3/reno) (icwnd/mss/irtt=14/1448/162) (ct=0.22 ms) on 2024-03-22 11:47:34.411 (PDT)

[ ID] Interval        Transfer    Bandwidth       Write/Err  Rtry     InF(pkts)/Cwnd/RTT(var)        NetPwr
[  1] 0.00-0.50 sec  56.3 MBytes   944 Mbits/sec  450/0         0      180/514K/1908(293) us  61827
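For what it's worth, a rough Little's Law back-of-the-envelope on the 0.00-0.50 sec server sample above (my own sanity check, not iperf output):

rate_bytes_per_sec = 930e6 / 8       # 930 Mbits/sec reported for that interval
avg_burst_latency_sec = 7.634e-3     # average burst latency from the same interval
inp_bytes = rate_bytes_per_sec * avg_burst_latency_sec
print(inp_bytes / 1024)              # ~867 KByte, in the same ballpark as the reported 881 KByte inP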

Bob McMahon

Mar 22, 2024, 3:09:37 PM
to Neal Cardwell, Li Zonglun, BBR Development
I'll provide both units as shown below unless someone can propose a better way.

rjmcmahon@fedora:~/Code/inflight/iperf2-code$ src/iperf -c 192.168.1.35 -i 0.5 -e --tcp-write-prefetch 1K --tcp-cca reno --trip-times --sync-transfer-id -t 2
------------------------------------------------------------
Client connecting to 192.168.1.35, TCP port 5001 with pid 183209 (1/0 flows/load)

Write buffer size: 131072 Byte
TCP congestion control set to reno using reno
TOS set to 0x0 (dscp=0,ecn=0) (Nagle on)
TCP window size: 93.5 KByte (default)

Event based writes (pending queue watermark at 1024 bytes)
------------------------------------------------------------
[  1] local 192.168.1.103%enp4s0 port 47194 connected with 192.168.1.35 port 5001 (prefetch=1024) (trip-times) (sock=3/reno) (icwnd/mss/irtt=14/1448/184) (ct=0.25 ms) on 2024-03-22 12:06:47.462 (PDT)

[ ID] Interval        Transfer    Bandwidth       Write/Err  Rtry     InF(pkts)/Cwnd/RTT(var)        NetPwr
[  1] 0.00-0.50 sec  56.3 MBytes   944 Mbits/sec  450/0         0      248K(176)/518K/1780(316) us  66272
[  1] 0.50-1.00 sec  56.0 MBytes   940 Mbits/sec  448/0         0      244K(173)/518K/1731(345) us  67845
[  1] 1.00-1.50 sec  56.1 MBytes   942 Mbits/sec  449/0         0      255K(181)/518K/1861(298) us  63247
[  1] 1.50-2.00 sec  56.1 MBytes   942 Mbits/sec  449/0         0      247K(175)/518K/1756(321) us  67029

Bob

Neal Cardwell

Mar 22, 2024, 5:07:53 PM
to Bob McMahon, Li Zonglun, BBR Development
Sounds fine to me.

thanks,
neal

Bob McMahon

Mar 22, 2024, 5:49:02 PM
to Neal Cardwell, Li Zonglun, BBR Development
Thanks. My apologies for derailing the thread a bit. Hopefully, the OP will provide the information you requested.

Bob

Li Zonglun

Mar 26, 2024, 6:22:09 AM
to BBR Development

I would like to express my gratitude for your prompt and insightful response. Following your advice, I used ss to gather statistics on the inflight-related fields. As you mentioned, the formula for calculating inflight is unacked - sacked - lost + retrans. Based on the ss output, which includes bytes_sent, bytes_acked, bytes_sacked, lost, and bytes_retrans, my initial understanding led me to formulate inflight as bytes_sent - bytes_acked - bytes_sacked - lost + bytes_retrans. However, I observed significant discrepancies between the inflight curve (the yellow one) and the cwnd curve (as provided by ss, the gray one) in the latter stages of my experiment:

1.png

I revised the formula to inflight = bytes_sent - bytes_acked - bytes_sacked - lost. This adjustment brought the inflight and cwnd curves closer, yet the inflight values remained consistently higher than cwnd in the mid-to-late phases of the experiment.

2.png 

Given this context, I seek your expertise in clarifying which of these approaches to the inflight calculation is more accurate, or whether neither is correct and there is a more appropriate method. By the way, to show the corresponding performance of rate and inflight more explicitly, here are two graphs sharing the same x-axis:

- figure one: orange line --- send rate; blue line --- actual available bandwidth; red line --- BBR pacing rate

- figure two: yellow line --- inflight count; gray line --- cwnd from the ss output

3.png

4.png

Furthermore, I find myself puzzled by the assertion that when inflight >= cwnd, it is the cwnd and not the pacing_rate that constrains BBR's sending rate. I would greatly appreciate a more detailed explanation on this matter. Specifically, how does the BBR algorithm utilize cwnd and pacing_rate for data transmission upon receiving a new ACK, especially when inflight exceeds cwnd? My current understanding is that the reception of a new ACK prompts BBR to dispatch a new packet, with the rate of ACK reception dictating the pace of packet transmission. Therefore, even with a higher pacing_rate, the actual sending rate cannot surpass the limits imposed when inflight > cwnd.
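To make my mental model concrete, here is how I currently picture the two send gates, as rough Python-style pseudocode (this is only my own sketch, not the kernel code; as far as I can tell it corresponds loosely to tcp_pacing_check() and tcp_cwnd_test() in net/ipv4/tcp_output.c):

def can_send_next_packet(inflight, cwnd, pacing_timer_pending):
    if pacing_timer_pending:     # pacing gate: sk_pacing_rate spaces packets out in time
        return False
    if inflight >= cwnd:         # cwnd gate: must wait for an ACK to free up window quota
        return False
    return True

If that is roughly right, then whenever inflight >= cwnd the second test fails no matter how large sk_pacing_rate is, and each arriving ACK only releases as much new data as it acknowledges, so the observed send rate collapses to the ACK (delivery) rate rather than the pacing rate.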

I hope my interpretations are on the right track, but I am open to corrections and further enlightenment on these topics.

Thank you for your time and assistance.

Best regards,

Zonglun Li

Neal Cardwell

Mar 26, 2024, 1:00:36 PM
to Li Zonglun, BBR Development
> led me to formulate inflight as 
> bytes_sent - bytes_acked - bytes_sacked - lost + bytes_retrans

I think there may be several issues with estimating the amount of data in flight using that technique:

+ bytes_sent includes both original transmissions and retransmissions, so given that you are adding both bytes_sent and bytes_retrans you are double-counting retransmissions

+ adding bytes_sent plus bytes_retrans and subtracting bytes_acked is also problematic because, even if this were not double-counting retransmissions, there would be the problem that if a given sequence range of R bytes is transmitted and then retransmitted N times, it would add R * (N + 1) bytes to your total for transmissions and retransmissions, but when that sequence range is cumulatively ACKed your expression would only subtract out R bytes as ACKed in the bytes_acked count (see the small numeric example after this list)

+ sacked and lost and cwnd information is available only in packets; the rest of the numbers you mention are in bytes
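To put hypothetical numbers on the second point (in Python, just for illustration):

R, N = 1448, 2             # a 1448-byte range, retransmitted twice after the original send
added = R * (N + 1)        # 4344 bytes credited as transmitted + retransmitted
removed = R                # only 1448 bytes debited via bytes_acked when the range is cumulatively ACKed
phantom = added - removed  # 2896 bytes that never drain out of that inflight estimate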

FWIW, when I suggested computing "packets_in_flight" as (unacked - sacked - lost + retrans), I meant in tcp_info terms, and in units of packets:

   (tcpi_unacked - tcpi_sacked - tcpi_lost + tcpi_retrans)

In ss output terms, those are available as:

  tcpi_unacked: the "unacked" field
  tcpi_sacked: the "sacked" field
  tcpi_lost: the "lost" field
  tcpi_retrans: in the "retrans:A/B" field this is the A number

Then you can directly compare a packets_in_flight calculated that way (the same way the Linux TCP stack computes it), with cwnd, which is also in units of packets.
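For example, if one ss sample showed (made-up numbers) unacked:1205 sacked:40 lost:15 retrans:2/73 cwnd:1150, then packets_in_flight = 1205 - 40 - 15 + 2 = 1152, which is >= cwnd (1150), so that sample would be cwnd-limited.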

cheers,
neal

ps: for this kind of project, it can be useful to use the source for the "ss" tool and the Linux TCP code that exports the tcp_info metrics:


Li Zonglun

Mar 29, 2024, 11:54:03 AM
to BBR Development
Thanks, Neal!

It's very kind of you to help me figure out why using bytes_sent and bytes_retrans doesn't give the correct result. Now it's clear to me.

After comparing the inflight data with the cwnd value using the fields you mentioned in the ss output, I found that whenever the sending rate is not equal to sk_pacing_rate, the inflight value is equal to the cwnd value (in units of packets), which shows that the constraining factor here is cwnd rather than BBR's pacing_rate.

The picture below shows the latest inflight (yellow) and cwnd (gray) lines.
5.png
Thanks for your reply!

Best regards,

Zonglun Li
