Hi,
Could you please share the pcap files captured at the sender, receiver, and any other points where you might have done packet captures? Reviewing them in Wireshark should help with the analysis.
Thanks,
Taifeng
Hi Braden,
Thanks for sharing the pcap files and explaining more about the setup.
To confirm the current topology: NGINX and Synproxy are deployed on 157.245.93.111, which connects to 207.47.245.215. Is that correct?
Based on both theoretical analysis and the packet captures, I agree with your point: the overly small iRTT in the initial phase is used as the minRTT in the cwnd calculation, resulting in lower traffic during the first 10 seconds of the connection. This makes sense from both a theoretical and a packet-level perspective.
Figure 1: Theoretical Analysis
Figure 2: BIF on Wireshark Graph
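To put rough numbers on that point, here is a small back-of-the-envelope sketch of BBR's cwnd ≈ cwnd_gain × bottleneck_bw × min_rtt. The bandwidth, RTT, and gain values below are assumptions I picked for illustration, not values taken from the capture:

```c
#include <stdio.h>

/* Illustrative only: BBR sets cwnd to roughly cwnd_gain * BDP, where
 * BDP = bottleneck_bw * min_rtt.  All constants below are assumptions
 * chosen for this example, not values measured from the capture. */
int main(void)
{
    double bw_bytes_per_sec = 12.5e6;  /* ~100 Mbit/s bottleneck          */
    double mss_bytes        = 1448.0;  /* payload bytes per segment       */
    double cwnd_gain        = 2.0;     /* rough BBR cwnd gain             */

    double rtt_handshake_s  = 0.0002;  /* 0.2 ms: synproxy <-> nginx iRTT */
    double rtt_real_s       = 0.080;   /* 80 ms: true end-to-end RTT      */

    double cwnd_bogus = cwnd_gain * bw_bytes_per_sec * rtt_handshake_s / mss_bytes;
    double cwnd_real  = cwnd_gain * bw_bytes_per_sec * rtt_real_s / mss_bytes;

    printf("cwnd with handshake RTT as min_rtt: %.0f packets\n", cwnd_bogus);
    printf("cwnd with true RTT as min_rtt:      %.0f packets\n", cwnd_real);
    return 0;
}
```

With the handshake RTT used as min_rtt, the window works out to only a few packets, which matches the low throughput seen in the first ~10 seconds of the connection.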
However, since the packet capture tool (tcpdump) sits on the receiver side of the synproxy (between the synproxy and the receiver), the current captures cannot see that extremely small iRTT or minRTT. Below is the RTT graph:
To see this extremely small iRTT, the setup can be modified as follows: deploy NGINX on 157.245.93.111, use 207.47.245.215 as the receiver, and place a separate device in between as the synproxy. By capturing packets on all three devices, we can clearly observe the minRTT at any phase.
At present, I don’t have an environment available to conduct this experiment, but I will try to find the necessary resources. If you are able to deploy this experiment, that would be ideal.
Thanks
Taifeng
Hi Neal, thanks for the reply. Would it be fair to say that the problem is in the current SYNPROXY implementation and not necessarily how BBR interacts with RTT?
Is omitting the SYNACK RTT sample something that already exists, or would that require a kernel change?
It seems that the default syncookie implementation doesn't exhibit this same behaviour in my testing.
I am currently investigating bypassing SYNPROXY completely and relying on the default syncookie implementation to handle the SYN/ACK with the cookie, and then putting more of the logic on when to generate a SYN cookie in our XDP program.
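For reference, a minimal XDP skeleton of the kind of program this logic could live in is sketched below. It only parses down to TCP and identifies pure SYNs, and every path returns XDP_PASS, so the actual drop/rate-limit policy and the syncookie-trigger decision are left as comments; the program name is invented and this is an illustrative sketch, not your program:

```c
/* Minimal XDP sketch (clang -target bpf): shows where per-SYN decision
 * logic could sit.  Everything here passes traffic to the kernel stack,
 * where the default syncookie handling would apply. */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <linux/tcp.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int classify_syn(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;

    struct iphdr *iph = (void *)(eth + 1);
    if ((void *)(iph + 1) > data_end || iph->protocol != IPPROTO_TCP)
        return XDP_PASS;

    struct tcphdr *tcph = (void *)iph + iph->ihl * 4;
    if ((void *)(tcph + 1) > data_end)
        return XDP_PASS;

    if (tcph->syn && !tcph->ack) {
        /* Decision point: under attack conditions the program could
         * drop or rate-limit SYNs here, and let the kernel's own
         * syncookie path handle whatever is passed up. */
        return XDP_PASS;
    }
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";
```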
Hi Braden,
Let me clarify a few points:
You are correct that there is no third connection. In my diagram, Nginx, Synproxy, and tcpdump are all hosted on the same physical device (157.245.93.111). Logically, the packet path follows the order Nginx → Synproxy → tcpdump when analyzing traffic flow.
I agree that the netstat statistics alone suffice to prove the artificially low RTT. While I’m not a Synproxy expert, my intent was to demonstrate this phenomenon visually via Wireshark by capturing the abnormally small min_RTT. To achieve this, placing the three-way handshake proxy device between the TCP data sender (server) and receiver (client) is necessary. Anyway, no worries.
@Neal:
The behavior Braden observed is actually quite common in proxy/load‑balancer/anti‑DDoS deployments.
Here’s what happens:
- Client ↔ Proxy RTT: the RTT between the TCP client (data receiver) and the proxy/LB/anti‑DDoS device is relatively large and closely matches the true end‑to‑end RTT.
- Proxy ↔ Backend RTT: once the proxy terminates the client’s handshake (SYN proxy) and establishes its own handshake with the backend server (the TCP sender), that RTT is typically very small.
- Accurate RTT measurement: after data transfer begins, RTT₃ (the time from sending a segment to receiving its ACK) reflects the true end‑to‑end RTT.
If BBR uses the handshake RTT as its min_rtt, it may underestimate the real RTT. Although I haven’t built a testbed to prove this yet, the theory aligns exactly with Braden’s scenario.
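To illustrate why that single bogus sample matters for so long, here is a toy model of a 10‑second windowed min filter, which is roughly how BBR maintains min_rtt. The sample values (0.2 ms handshake RTT, 80 ms data-phase RTT) are assumptions for illustration, not taken from Braden’s capture, and this is not the kernel’s actual implementation:

```c
#include <stdio.h>

/* Toy windowed-min filter: keeps the smallest RTT seen in the last
 * WIN_SEC seconds, roughly how BBR maintains min_rtt.  Numbers are
 * assumptions for illustration only. */
#define WIN_SEC 10.0

struct minrtt_filter {
    double min_rtt_ms;
    double stamp_sec;   /* when the current minimum was recorded */
};

static void minrtt_update(struct minrtt_filter *f, double now_sec, double rtt_ms)
{
    /* Take the new sample if it is lower, or if the old minimum expired. */
    if (rtt_ms <= f->min_rtt_ms || now_sec - f->stamp_sec > WIN_SEC) {
        f->min_rtt_ms = rtt_ms;
        f->stamp_sec  = now_sec;
    }
}

int main(void)
{
    struct minrtt_filter f = { .min_rtt_ms = 1e9, .stamp_sec = 0.0 };
    double t;

    minrtt_update(&f, 0.0, 0.2);          /* RTT2: synproxy<->backend handshake */
    for (t = 0.5; t < 9.5; t += 0.5)
        minrtt_update(&f, t, 80.0);       /* RTT3: true end-to-end samples      */
    printf("min_rtt at t=9s:  %.1f ms\n", f.min_rtt_ms);   /* still 0.2 ms      */

    for (; t < 12.0; t += 0.5)
        minrtt_update(&f, t, 80.0);
    printf("min_rtt at t=12s: %.1f ms\n", f.min_rtt_ms);   /* now 80.0 ms       */
    return 0;
}
```

The tiny handshake sample stays in the window until it ages out roughly 10 seconds later, which lines up with the low throughput during the first 10 seconds of the connection.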
Please consider excluding the initial handshake RTT (iRTT) from RTT calculations (min_rtt, RTO, etc.).
Side note: different proxy/LB/anti‑DDoS devices handle traffic in various ways, but whenever SYN proxy (three‑way handshake proxying) is enabled and backend keep‑alives are disabled, you’ll see exactly this pattern. It’s a common configuration, not a rare edge case. These devices are ubiquitous, and despite their differing purposes, their SYN proxy implementations behave similarly, so the potential impact is broad.
Thanks,
Taifeng
Hi Neal,
Would it be reasonable to set different minRTO values for different network scenarios, similar to how Windows configures it? For example, Windows assigns different minRTO values based on the network environment:

Parameter | InternetCustom | DatacenterCustom | Compat | Datacenter | Internet
--- | --- | --- | --- | --- | ---
MinRTO (ms) | 300 | 20 | 300 | 20 | 300

After obtaining the initial RTT (iRTT), would it be reasonable to categorize it into predefined ranges and assign different minRTO values accordingly?
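To make that question concrete, here is a rough sketch of what such a categorization could look like. The thresholds and values are made up for illustration, loosely mirroring the Windows table above; nothing like this exists in the Linux stack today:

```c
#include <stdio.h>

/* Hypothetical mapping from the handshake RTT (iRTT) to a minRTO floor.
 * Thresholds and values are illustrative assumptions only. */
static unsigned int min_rto_ms_for_irtt(unsigned int irtt_us)
{
    if (irtt_us < 2000)        /* < 2 ms: looks like a datacenter path */
        return 20;
    if (irtt_us < 20000)       /* < 20 ms: regional / metro path       */
        return 100;
    return 300;                /* otherwise: treat as an Internet path */
}

int main(void)
{
    unsigned int samples_us[] = { 200, 5000, 80000 };  /* 0.2 ms, 5 ms, 80 ms */

    for (int i = 0; i < 3; i++)
        printf("iRTT %6u us -> minRTO %u ms\n",
               samples_us[i], min_rto_ms_for_irtt(samples_us[i]));
    return 0;
}
```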
Other mechanisms, such as TLP, would also help mitigate the impact during loss recovery. Also, would it be possible to use the RTT from previous connections (if any) to set a more reasonable initial RTO?
Thanks,
Taifeng
To view this discussion visit https://groups.google.com/d/msgid/bbr-dev/e2a04f83-6af4-41da-b243-91d1f2b6169dn%40googlegroups.com.
I did some searching. It looks like the RTO clamping mechanism is a universal feature of the TCP protocol stack and is present across all standard TCP congestion control algorithms; BBR's RTO calculation also follows this. If the RTT is very small (such as in the synproxy scenario here) and the system's TCP_RTO_MIN is set to 200 ms, then the final RTO will be clamped up to that minRTO value, i.e. 200 ms. Is that right?
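For illustration, here is a small sketch of the textbook RFC 6298-style RTO calculation (RTO = SRTT + 4·RTTVAR, floored at the minimum RTO), showing how the floor dominates when the measured RTT is tiny. The inputs are made-up values, and this is not the Linux code itself, which has additional smoothing details:

```c
#include <stdio.h>

/* RFC 6298-style RTO with a floor, as a userspace illustration of the
 * clamping effect.  Inputs are made-up values for the example. */
static double rto_ms(double srtt_ms, double rttvar_ms, double min_rto_ms)
{
    double rto = srtt_ms + 4.0 * rttvar_ms;
    return rto < min_rto_ms ? min_rto_ms : rto;
}

int main(void)
{
    /* Tiny synproxy-style handshake RTT: the raw formula gives ~0.6 ms,
     * but the 200 ms floor (TCP_RTO_MIN on Linux) dominates. */
    printf("synproxy-like RTT: RTO = %.1f ms\n", rto_ms(0.2, 0.1, 200.0));

    /* 80 ms path with larger variance: the computed value (240 ms)
     * exceeds the floor, so the floor no longer matters. */
    printf("80 ms path:        RTO = %.1f ms\n", rto_ms(80.0, 40.0, 200.0));
    return 0;
}
```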
Hi Neal,
Regarding setting different minRTO values for different network scenarios (e.g., using a minRTO smaller than 1 s, such as 20 ms or 300 ms), would that be better than using a small RTO calculated from the iRTT (just my guess)?
In the current RTO calculation during the first 10 seconds, it relies on the iRTT, right? Under what circumstances would it fall back to using 1 second as the RTO?
As for calculating the minRTT during the first 10 seconds, is it reasonable to use the RTT measured during data transmission—such as RTT3 in my diagram—instead of relying on the iRTT?
(1) The TCP stack could have a static branch that controls whether the TCP stack uses the SYNACK RTT sample for connections for which TCP has seen a SYN and SYNACK; by default this is enabled, to use such RTT samples.
(2) If/when the SYNPROXY iptables module is enabled, it would set the static branch to disable use of the SYNACK RTT sample.
(3) If/when the SYNPROXY iptables module is unloaded, it would set the static branch to re-enable use of the SYNACK RTT sample.
Hopefully that would allow SYNPROXY connections to not be fooled by the bogus SYNACK RTT sample, but also have a minimal performance impact for the case of machines not using the SYNPROXY feature.
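For illustration only, here is a rough userspace model of that idea. In the kernel the toggle would presumably be a static key (DEFINE_STATIC_KEY_TRUE plus static_branch_enable/disable from the SYNPROXY module's init/exit paths); the function and variable names below are invented for the sketch and are not actual kernel symbols:

```c
#include <stdbool.h>
#include <stdio.h>

/* Models the proposed static branch: defaults to "use SYNACK RTT samples".
 * In the kernel this would be a static key flipped by the SYNPROXY module's
 * load/unload paths; here it is just a global flag for illustration. */
static bool tcp_use_synack_rtt = true;

/* Hypothetical hook called when the SYNPROXY module is loaded/unloaded. */
static void synproxy_set_loaded(bool loaded)
{
    tcp_use_synack_rtt = !loaded;
}

/* Hypothetical stand-in for the point where the stack would otherwise feed
 * the SYN/SYNACK handshake RTT into the RTT estimator (min_rtt, srtt, RTO). */
static void tcp_maybe_take_synack_rtt(long synack_rtt_us)
{
    if (!tcp_use_synack_rtt) {
        printf("SYNPROXY active: ignoring SYNACK RTT sample of %ld us\n",
               synack_rtt_us);
        return;
    }
    printf("taking SYNACK RTT sample of %ld us\n", synack_rtt_us);
}

int main(void)
{
    tcp_maybe_take_synack_rtt(200);   /* default: sample is used          */
    synproxy_set_loaded(true);        /* SYNPROXY loaded: disable usage   */
    tcp_maybe_take_synack_rtt(200);   /* bogus handshake RTT is ignored   */
    return 0;
}
```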