BBR suddenly dropped in-flight packets to 0

Shehab Sarar Ahmed

Jan 24, 2025, 8:32:44 AM
to BBR Development
Hello Neal,

I observed an unusual behavior of BBR while running a BBR flow concurrently with a CUBIC flow on a shared link. For the first 80 seconds, both flows performed as expected. However, BBR then abruptly reduced its sending rate, keeping just one packet in flight: after receiving the acknowledgment for each packet, it sent the next one. This pattern persisted for the remainder of the experiment, which lasted over 20 seconds.

The pcap file is here: https://drive.google.com/file/d/1JHBdO7_Gxl1_YcnL6EAFvSZfWr839WNs/view?usp=sharing

Shehab Sarar Ahmed

Jan 24, 2025, 2:38:18 PM
to BBR Development
Here is another example: https://drive.google.com/file/d/1a-071yRdIXCmxjFf0N8r8aKLcyE_RYEO/view?usp=sharing

Here, BBR is using port number 34688. At around ~48 seconds, BBR abruptly drops its window and continues to send only one packet at a time for the rest of the ~60 seconds.

Neal Cardwell

Jan 25, 2025, 4:01:31 PM
to Shehab Sarar Ahmed, BBR Development
Thanks for the report. 

I have a theory. We have a BBR bug fix in our queue to send upstream that may address this.

Can you please share some more information about these cases:

+ I guess this is a Linux kernel? Can you please share your exact kernel version, from `uname -r` or similar?

+ This looks like MPTCP traffic, since all packets seem to have "mptcp" options. Is that intentional? I have not heard any reports about use of BBR with MPTCP in Linux, so I'm not sure how well that combination is tested. Are you running various MPTCP tests?

+ This traffic looks like it is unpaced. BBR congestion control must be used with pacing enabled; without pacing, queuing and loss can be excessive. Can you please share the output of `tc qdisc show` so we can see which qdiscs are in place (e.g., whether "fq" is installed on the device over which this traffic flowed)? (See the sketch after this list for the kind of check and fix I have in mind.)

+ Are you able to try compiling, booting, and testing a kernel patch if I post one in this thread?
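
Here is roughly what I have in mind for the pacing question; this is only a sketch, and "eth0" is a placeholder for whatever interface actually carries the test traffic:

  # Show the qdiscs on the interface carrying the flows (interface name is a guess):
  tc qdisc show dev eth0
  # Install fq as the root qdisc so that sch_fq enforces BBR's pacing rate:
  sudo tc qdisc replace dev eth0 root fq
  # Confirm BBR is the congestion control in use for these flows:
  sysctl net.ipv4.tcp_congestion_control

On recent mainline kernels BBR can fall back to internal TCP-layer pacing when fq is absent, but having fq in place removes that variable.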

thanks,
neal



Shehab Sarar Ahmed

Jan 26, 2025, 10:01:38 AM
to BBR Development
1. I am actually using an emulated link built by modifying mahimahi, so there are two relevant qdiscs: one for the emulated link, and another for packets going from the egress to the actual destination. Output of tc qdisc show:
i. qdisc fq_codel 0: dev ingress root refcnt 2 limit 10240p flows 1024 quantum 1500 target 5.0ms interval 100.0ms memory_limit 32Mb ecn.
ii. qdisc fq_codel 0: dev delay-link-rrc- root refcnt 2 limit 10240p flows 1024 quantum 1500 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
2. uname -r: 5.4.230.mptcp. I am using an MPTCP kernel for several MPTCP-related tests, but for this specific case BBR was only provided with a single link.
3. Yes, I am able to compile a new Linux kernel patch.

Thank you, Neal, for your prompt response.

Neal Cardwell

Jan 26, 2025, 10:23:04 AM
to Shehab Sarar Ahmed, BBR Development
Thanks for the details.

> 2. uname -r: 5.4.230.mptcp. I am using an MPTCP kernel for several MPTCP-related tests, but for this specific case BBR was only provided with a single link.

Interesting. Regarding your kernel, "5.4.230.mptcp", is that from https://github.com/multipath-tcp/mptcp/releases ? Did you download it, or build it? FWIW, I notice the message for that repo at https://github.com/multipath-tcp/mptcp seems to suggest this repo is deprecated:

 "Deprecated 🚫 Out-of-tree Linux Kernel implementation of MultiPath TCP. 👉 Use https://github.com/multipath-tcp/mptcp_net-next repo instead"

The kernel you are using seems to have broken TCP-layer pacing, FWIW. This may be related to that kernel being deprecated? That said, it seems that the excessive losses caused by the broken pacing may be helping you to reproduce a real issue.
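
One way to sanity-check whether pacing is actually taking effect while a test is running is to look at the per-socket stats; this is just a sketch, assuming iproute2's `ss` is available on the sender and using the port number from your second trace:

  # Dump socket details for the BBR flow; look at the pacing_rate and delivery_rate fields:
  ss -tin 'sport = :34688'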

> 3. Yes, I am able to compile a new Linux kernel patch.

Great! If I were to provide a patch for you to compile and test, what GitHub repo and SHA1 should I use as the tree against which to build it, so that you can apply it with "git am foo.patch"?
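
To make that concrete, the workflow I have in mind is roughly the following; it is only a sketch, and the repo is just a guess (the out-of-tree MPTCP tree mentioned above) pending your answer:

  # Clone whichever tree you are actually building from:
  git clone https://github.com/multipath-tcp/mptcp.git && cd mptcp
  # Report this SHA1 back, so the patch can be generated against the same tree:
  git rev-parse HEAD
  # Once the patch is posted, apply it and rebuild the kernel as you normally do:
  git am foo.patch
  make olddefconfig && make -j"$(nproc)"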

thanks,
neal



Shehab Sarar Ahmed

Feb 2, 2025, 9:13:10 AM
to BBR Development
Hello Neal,

Sorry for the late reply. When you asked whether I am able to compile a patch, what I understood was that you would provide a complete new kernel to download and install; I am currently using one of the downloaded ones. However, I would also be able to clone a GitHub repo and build from there. Could you kindly suggest which git repo to clone and use, so that I can then also apply the patch you provide? Sorry for the confusion.

Thanks
Shehab

Taifeng Tan

Mar 24, 2025, 11:33:21 AM
to BBR Development
hi Shehab,

Sorry to jump in; in case you still want to dig into this issue, here are my findings:

This might not be an issue with the BBR protocol itself but rather related to TLP (Tail Loss Probe).

Taking packet 416797 (416798 is a duplicate of the same packet) as an example:

Observations:

  • The SACK block 3299226501-3299227821 indicates the receiver has successfully received data within this range.
  • The ACK number 3299227821 signifies the receiver expects data starting from 3299227821.
  • Logical relationship: the right edge of the SACK block (3299227821) equals the ACK number, implying the receiver has acknowledged all data prior to 3299227821, while the data starting at 3299227821 has not yet arrived or was lost.

So, if the ACK number has advanced to 3299227821, confirming receipt of all prior data, why does the receiver redundantly declare receipt of this range via the SACK block 3299226501-3299227821?
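
For anyone who wants to check this directly from the capture, the relevant fields can be dumped with tshark along these lines (the capture file name is a placeholder):

  # Print the ACK number and SACK block edges for the packets discussed above:
  tshark -r trace.pcap -Y "frame.number == 416797 || frame.number == 416799" \
      -T fields -e frame.number -e tcp.ack -e tcp.options.sack_le -e tcp.options.sack_re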

My speculative analysis (without a theoretical basis): the TCP receiver might retain old SACK blocks even after advancing the ACK number. This is not normal behavior.

From the observed behavior, after receiving such a SACK the TCP sender transmitted only one segment (packet 416799), and this transmission occurred after a TLP (Tail Loss Probe). This might indicate that the sender kept falling back to TLP, though this is also mere speculation on my part.

 

Suggestions for testing (for experimental purposes only):

  1. Modify the network environment to test under low packet-loss conditions, so that TLP is not triggered. If my speculation is correct, then under low packet loss the phenomenon seen here (sending isolated packets and waiting) should not occur even if the above issue exists.
  2. Disable TLP while keeping other settings unchanged. If my speculation is correct, disabling TLP should prevent the phenomenon from recurring (see the sketch after this list for one way to toggle it).
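
For suggestion 2, assuming this MPTCP tree keeps the stock sysctl, TLP can be toggled via tcp_early_retrans; on a 5.4-based kernel, 0 disables TLP and the default of 3 enables it (a sketch, for experimentation only):

  # Check the current setting, then disable TLP for the next run:
  sysctl net.ipv4.tcp_early_retrans
  sudo sysctl -w net.ipv4.tcp_early_retrans=0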

 

Let me know if you'd like to explore these tests further.

 Cook

