BBR v1 + XDP SYN Proxy in WAN has low throughput for 10s


Braden Bassingthwaite

Apr 5, 2025, 5:23:28 PM
to BBR Development
Hello,

I am investigating an issue with our use of XDP SYNPROXY + BBR which results in low throughput for the first 10 seconds of a connection, mainly in WAN scenarios.

I believe it's because SYNPROXY delays the SYN from entering the kernel networking stack, and when the client ACKs after the SYN/ACK, SYNPROXY submits both the client's SYN and ACK in quick succession, causing an artificially low RTT. In this scenario, I see mrtt:0.037, which would indicate a min RTT of 37 microseconds, orders of magnitude lower than the actual RTT of 44 ms.

Because mrtt is kept over a 10 s window, I believe we see low bandwidth (< 1 Mbps) until the initial RTT sample ages out, after which the bandwidth rises to > 500 Mbps. After the 10 s, I see mrtt:45.209 as the correct value and the cwnd rises as well.

Has anyone experienced this issue? Any thoughts on a possible path forward to resolve it? Is this something BBR should handle?

Thanks!

Taifeng Tan

Apr 8, 2025, 12:11:32 PM
to Braden Bassingthwaite, BBR Development

Hi,

Could you please share the pcap files captured at the sender, receiver, and any other points where you might have done packet captures? Reviewing them in Wireshark should help with the analysis.

Thanks,
Taifeng


On Sun, Apr 6, 2025 at 05:23, 'Braden Bassingthwaite' via BBR Development <bbr...@googlegroups.com> wrote:

Braden Bassingthwaite

Apr 8, 2025, 3:01:27 PM
to BBR Development
I will attach a PCAP, but an easy way to replicate the issue is to:

Enable BBR:

sysctl net.core.default_qdisc=fq
sysctl net.ipv4.tcp_congestion_control=bbr

Enable SYNPROXY: 

iptables -t raw -I PREROUTING -i eth0 -p tcp -m tcp --syn --dport 80 -j CT --notrack
iptables -t filter -A INPUT -i eth0 -p tcp -m tcp --dport 80 -m state --state INVALID,UNTRACKED -j SYNPROXY --sack-perm --timestamp --wscale 7 --mss 1460
iptables -t filter -A INPUT -i eth0 -m state --state INVALID -j DROP
sysctl -w net.netfilter.nf_conntrack_tcp_loose=0
sysctl -w net.ipv4.tcp_timestamps=1

And then have something like NGINX serve a large file over port 80 and download it from a client with RTT > 100ms.
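
If it helps, the effect is also visible without a capture by watching the socket on the server during the download; something like the following should work (exact ss filter syntax may vary by iproute2 version):

ss -tin '( sport = :80 )'
# with SYNPROXY enabled, rtt/minrtt show tens of microseconds instead of the real path RTT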

There are two PCAPs attached:

1) bbr.pcap, which shows the expected behaviour without SYNPROXY enabled
2) bbr_syn_proxy.pcap, which shows the issue when SYNPROXY is enabled.

Neal Cardwell

Apr 9, 2025, 8:56:52 AM
to Braden Bassingthwaite, BBR Development, Eric Dumazet, Yuchung Cheng, Kevin Yang
Hi Braden,

Thanks for the report. It seems like this artificially low RTT estimate that derives from the SYNPROXY approach would cause buggy behavior in several parts of the TCP stack:

+ RTT and RTO estimates will be incorrect, for loss recovery

+ RTT will be incorrect, causing incorrect pacing rates for TCP connections using pacing due to going through the fq qdisc (e.g., paced CUBIC)

+ for TCP connections using BBR, the min_rtt can be orders of magnitude too low, causing low cwnd values and very low throughput, as we see in your traces (5 Mbps instead of 440 Mbps, until the min_rtt expires around t=10 secs)

The most practical approach that occurs to me for fixing this would be something like: 

(1) the TCP stack could have a static branch that controls whether the TCP stack uses the SYNACK RTT sample for connections for which TCP has seen a SYN and SYNACK; by default this is enabled, to use such RTT samples

(2) If/when the SYNPROXY iptables module is enabled, it would set the static branch to disable use of the SYNACK RTT sample

(3) If/when the SYNPROXY iptables module is unloaded, it would set the static branch to re-enable use of the SYNACK RTT sample

Hopefully that would allow SYNPROXY connections to not be fooled by the bogus SYNACK RTT sample, but also have a minimal performance impact for the case of machines not using the SYNPROXY feature.
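
For concreteness, a rough sketch of that shape (the identifiers below are made up for illustration; only the static-key API itself, i.e. DEFINE_STATIC_KEY_FALSE / static_branch_unlikely / static_branch_inc / static_branch_dec, is the existing kernel interface):

/* (1) somewhere in the TCP stack, e.g. net/ipv4/tcp_input.c */
DEFINE_STATIC_KEY_FALSE(tcp_skip_synack_rtt);   /* hypothetical key name */
EXPORT_SYMBOL_GPL(tcp_skip_synack_rtt);

/* ...at the point where the SYNACK RTT sample is taken today: */
if (static_branch_unlikely(&tcp_skip_synack_rtt))
        synack_rtt_us = -1;                     /* treat as "no valid RTT sample" */

/* (2)/(3) in the SYNPROXY module's init/exit paths: */
static_branch_inc(&tcp_skip_synack_rtt);        /* on module load */
static_branch_dec(&tcp_skip_synack_rtt);        /* on module unload */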

Eric, Yuchung, and Kevin: any other ideas, or thoughts on that proposal?

thanks,
neal





Taifeng Tan

Apr 9, 2025, 8:56:57 AM
to Braden Bassingthwaite, BBR Development

Hi Braden,

 

Thanks for sharing the pcap files and explaining more about the setup.

Is the current topology set up with NGINX and Synproxy deployed on 157.245.93.111, connecting to 207.47.245.215? Is that correct?

Based on both theoretical analysis and the packet captures, I agree with your point: the overly small iRTT in the initial phase is used as the minRTT in the cwnd calculation, resulting in lower throughput during the first 10 seconds of the connection.
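(Roughly, BBR v1 caps cwnd at about cwnd_gain × estimated_bandwidth × min_rtt, so a min_rtt that is ~1000 times too small keeps the cwnd ~1000 times too low until the 10-second min_rtt filter window expires.)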

Figure 1: Theoretical Analysis [image attachment]

Figure 2: BIF on Wireshark graph [image attachment]

However, since the packet capture tool (tcpdump) is placed before the synproxy (between the synproxy and the receiver), the current captures are unable to see that extremely small iRTT or minRTT.
Below is the RTT graph: [image attachment]

To see this extremely small iRTT, the setup can be modified as follows:
Deploy NGINX on 157.245.93.111, and use 207.47.245.215 as the receiver. Place a separate device in between as the synproxy. In this way, by capturing packets on all three devices, we can clearly observe the minRTT at any phase.
At present, I don’t have an environment available to conduct this experiment, but I will try to find the necessary resources. If you are able to deploy this experiment, that would be ideal.

[diagram image attachment]

Gentle note:

  • Capturing packets on all three devices will help better observe the phenomena.
  • Disabling all TCP offload features on 157.245.93.111 and the synproxy will assist in clearer observations.

 

Thanks

Taifeng


On Wed, Apr 9, 2025 at 03:01, 'Braden Bassingthwaite' via BBR Development <bbr...@googlegroups.com> wrote:

Braden Bassingthwaite

Apr 9, 2025, 10:47:02 AM
to BBR Development
Hi Neal, thanks for the reply. Would it be fair to say that the problem is in the current SYNPROXY implementation and not necessarily how BBR interacts with RTT?

Is omitting a SYNACK RTT an existing thing or is that a change necessary in the kernel? It seems that the default syncookie implementation doesn't exhibit this same behaviour in my testing.

I am currently investigating bypassing SYNPROXY completely and relying on the default syncookie implementation to handle the SYN/ACK with the cookie, and then putting more of the logic for when to generate a SYN cookie into our XDP program.

Braden Bassingthwaite

Apr 9, 2025, 10:47:12 AM
to BBR Development
Hey Taifeng, thanks for the reply.

> Is the current topology set up with NGINX and Synproxy deployed on 157.245.93.111, connecting to 207.47.245.215? Is that correct?

It's the other way around: the client is 207.47.245.215, connecting to 157.245.93.111 (SYNPROXY + NGINX). The tcpdump capture is happening on the NGINX side (157.245.93.111).

In this setup, NGINX is serving a file that is local to the box, so there isn't a 3rd connection in the picture.


> However, since the packet capture tool (tcpdump) is placed before the synproxy (between the synproxy and the receiver), the current captures are unable to see that extremely small iRTT or minRTT.

I don't believe it's possible to place a capture between SYNPROXY and the server, at least not with tcpdump, since SYNPROXY is embedded directly within netfilter on the local machine. I do feel that looking at the netstat statistics of the connection is enough to prove that the RTT is artificially low.

ts sack cubic wscale:6,8 rto:248 rtt:0.029/0.014 ato:40 mss:1448 pmtu:1500 rcvmss:536 advmss:1448 cwnd:1000
bytes_sent:131768 bytes_retrans:1448 bytes_received:85 segs_out:92 segs_in:3 data_segs_out:91 data_segs_in:1 send 399448275862bps
lastsnd:11 lastrcv:15 lastack:15 pacing_rate 798896551720bps delivered:1 app_limited busy:15ms rwnd_limited:15ms(100.0%)
unacked:90 retrans:0/1 rcv_space:14600 rcv_ssthresh:1448000 notsent:195480 minrtt:0.029 snd_wnd:131712 rcv_wnd:1448192

while a normal one looks like:

ts sack bbr wscale:6,8 rto:253 rtt:52.811/0.68 ato:40 mss:1448 pmtu:1500 rcvmss:536 advmss:1448 cwnd:7146 ssthresh:1870
bytes_sent:117419768 bytes_retrans:3202976 bytes_acked:110085648 bytes_received:84 segs_out:81093 segs_in:2476
data_segs_out:81091 data_segs_in:1 bbr:(bw:631047480bps,mrtt:46.078,pacing_gain:1.25,cwnd_gain:2) send 1567462536bps
lastrcv:2860 pacing_rate 780921256bps delivery_rate 544938504bps delivered:76140 busy:2860ms rwnd_limited:484ms(16.9%)
unacked:2853 retrans:0/2212 dsack_dups:113 reordering:282 reord_seen:125 rcv_space:14600 rcv_ssthresh:1460000
notsent:130320 minrtt:46.078 snd_wnd:4194240 rcv_wnd:1460224

Let me know your thoughts.

Neal Cardwell

Apr 9, 2025, 10:51:34 AM
to Braden Bassingthwaite, BBR Development
On Wed, Apr 9, 2025 at 10:47 AM 'Braden Bassingthwaite' via BBR Development <bbr...@googlegroups.com> wrote:
> Hi Neal, thanks for the reply. Would it be fair to say that the problem is in the current SYNPROXY implementation and not necessarily how BBR interacts with RTT?

That is my sense, yes.
 
> Is omitting a SYNACK RTT an existing thing or is that a change necessary in the kernel?

That would be a change needed in the kernel.
 
> It seems that the default syncookie implementation doesn't exhibit this same behaviour in my testing.

Right. That's because the TCP stack knows it is using syncookies, so it knows not to try to calculate an RTT sample (e.g., see "treq->snt_synack = 0;" in cookie_tcp_reqsk_init()).

The SYNPROXY implementation is manipulating packet timing without TCP knowing, so TCP doesn't yet know that the SYNACK RTT is bogus.

 
> I am currently investigating bypassing SYNPROXY completely and relying on the default syncookie implementation to handle the SYN/ACK with the cookie. And then putting more of the logic on when to generate a SYN cookie in our XDP program.

Yes, if you can get that approach to work ("relying on the default syncookie implementation to handle the SYN/ACK with the cookie"), that sounds preferable to my proposal above. Please let us know how that goes. Thanks!

Thanks,
neal

 

Taifeng Tan

Apr 10, 2025, 8:18:33 AM
to Neal Cardwell, Braden Bassingthwaite, BBR Development

Hi Braden,

Let me clarify a few points:

You are correct that there is no third connection. In my diagram, Nginx, Synproxy, and tcpdump are all hosted on the same physical device (157.245.93.111). Logically, the packet path follows the order Nginx → Synproxy → tcpdump when analyzing traffic flow.

[diagram image attachment]

I agree that the netstat statistics alone suffice to prove the artificially low RTT. While I'm not a Synproxy expert, my intent was to demonstrate this phenomenon visually via Wireshark by capturing the abnormally small min_RTT. To achieve this, placing the three-way handshake proxy device between the TCP data sender (server) and receiver (client) is necessary. Anyway, no worries.

@Neal:

The behavior Braden observed is actually quite common in proxy/load‑balancer/anti‑DDoS deployments.

[diagram image attachment]

Here’s what happens:

  • Client ↔ Proxy RTT
    The RTT between the TCP client (data receiver) and the proxy/LB/anti‑DDoS device is relatively large and closely matches the true end‑to‑end RTT.
  • Proxy ↔ Backend RTT
    Once the proxy terminates the client’s handshake (SYN proxy) and establishes its own handshake with the backend server (the TCP sender), that RTT is typically very small.
  • Accurate RTT measurement
    After data transfer begins, the RTT₃ (time from sending a segment to receiving its ACK) reflects the true end‑to‑end RTT.

If BBR uses the handshake RTT as its min_rtt, it may underestimate the real RTT. Although I haven’t built a testbed to prove this yet, the theory aligns exactly with Braden’s scenario.

Please consider excluding the initial handshake RTT (iRTT) from RTT calculations (min_rtt, RTO, etc.).

Side note: Different proxy/LB/anti-DDoS devices handle traffic in various ways, but whenever SYN proxy (three-way handshake proxying) is enabled and backend keep-alives are disabled, you'll see exactly this pattern. It's a common configuration, not a rare edge case. These devices are ubiquitous, and despite their differing purposes, their SYN proxy implementations behave similarly, so the potential impact is broad.

Thanks,
Taifeng

 


On Wed, Apr 9, 2025 at 22:51, 'Neal Cardwell' via BBR Development <bbr...@googlegroups.com> wrote:

Neal Cardwell

Apr 10, 2025, 9:03:15 AM
to Taifeng Tan, Braden Bassingthwaite, BBR Development
On Thu, Apr 10, 2025 at 12:30 AM Taifeng Tan <cook.ta...@gmail.com> wrote:

> Hi Braden,
>
> Let me clarify a few points:
>
> You are correct that there is no third connection. In my diagram, Nginx, Synproxy, and tcpdump are all hosted on the same physical device (157.245.93.111). Logically, the packet path follows the order Nginx → Synproxy → tcpdump when analyzing traffic flow.
>
> [diagram image attachment]
>
> I agree that the netstat statistics alone suffice to prove the artificially low RTT. While I'm not a Synproxy expert, my intent was to demonstrate this phenomenon visually via Wireshark by capturing the abnormally small min_RTT. To achieve this, placing the three-way handshake proxy device between the TCP data sender (server) and receiver (client) is necessary. Anyway, no worries.
>
> @Neal:
>
> The behavior Braden observed is actually quite common in proxy/load-balancer/anti-DDoS deployments.
>
> [diagram image attachment]
>
> Here's what happens:
>
>   • Client ↔ Proxy RTT
>     The RTT between the TCP client (data receiver) and the proxy/LB/anti-DDoS device is relatively large and closely matches the true end-to-end RTT.
>   • Proxy ↔ Backend RTT
>     Once the proxy terminates the client's handshake (SYN proxy) and establishes its own handshake with the backend server (the TCP sender), that RTT is typically very small.
>   • Accurate RTT measurement
>     After data transfer begins, the RTT₃ (time from sending a segment to receiving its ACK) reflects the true end-to-end RTT.
>
> If BBR uses the handshake RTT as its min_rtt, it may underestimate the real RTT. Although I haven't built a testbed to prove this yet, the theory aligns exactly with Braden's scenario.
>
> Please consider excluding the initial handshake RTT (iRTT) from RTT calculations (min_rtt, RTO, etc.).


How do you propose that the server should set the RTO for its initial flight of data that it sends? If the server were to use the 1 second initial RTO default from  RFC 6298 then this would impose an enormous latency penalty for connections that suffer a tail packet loss in the first flight of data sent by the server.

best,
neal

Taifeng Tan

Apr 10, 2025, 12:15:22 PM
to Neal Cardwell, Braden Bassingthwaite, BBR Development

Hi Neal,

Would it be reasonable to set different minRTO values for different network scenarios, similar to how Windows configures it?

For example, Windows assigns different minRTO values based on the network environment:

Parameter   | InternetCustom | DatacenterCustom | Compat | Datacenter | Internet
MinRTO (ms) | 300            | 20               | 300    | 20         | 300

After obtaining the initial RTT (iRTT), would it be reasonable to categorize it into predefined ranges and assign different minRTO values accordingly?

Other mechanisms, such as TLP, would also help mitigate the impact during loss recovery. Also, is it possible to use historical connection RTT (if any) to set a more reasonable initial RTO?

Thanks,

Taifeng


On Thu, Apr 10, 2025 at 21:02, Neal Cardwell <ncar...@google.com> wrote:

Braden Bassingthwaite

Apr 10, 2025, 12:53:28 PM
to BBR Development
> Yes, if you can get that approach to work ("relying on the default syncookie implementation to handle the SYN/ACK with the cookie"), that sounds preferable to my proposal above. Please let us know how that goes. Thanks!

Unfortunately this doesn't work, since the kernel will only validate syncookies if it has recently detected a SYN flood. Since we're handling this in XDP, the kernel doesn't know about the flood and therefore won't check for SYN cookies in the incoming ACKs.

Going to reach out to the SYNPROXY folks to see if they have any thoughts on the matter.

Thanks!

Braden Bassingthwaite

Apr 10, 2025, 10:24:32 PM
to BBR Development
I ported the BBR v1 implementation to BPF and run it as a custom congestion control in the kernel.

This line within the bbr_init function has odd behaviour:

 bbr->min_rtt_us = tcp_min_rtt(tp);

It is quite frequently set to 4294967295 (2^32-1) when the connection goes through the SYN proxy.

I've since also patched the implementation to exclude the initial RTT if the value is less than 100 μs or equal to 2^32-1, and adjusted bbr_update_min_rtt to trigger an update if the current min_rtt_us == 0.
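
Roughly, the bbr_init change looks like this (just a sketch; the field and helper names follow the in-kernel tcp_bbr.c and may differ slightly in the BPF port):

static void bbr_init(struct sock *sk)
{
        struct tcp_sock *tp = tcp_sk(sk);
        struct bbr *bbr = inet_csk_ca(sk);
        u32 rtt_us = tcp_min_rtt(tp);

        /* Discard the handshake RTT if it is unset (~0U == 2^32-1) or
         * implausibly small (< 100us), which is what SYNPROXY produces.
         * Leaving min_rtt_us at 0 lets the patched bbr_update_min_rtt()
         * adopt the first data-ACK RTT sample instead. */
        if (rtt_us == ~0U || rtt_us < 100)
                bbr->min_rtt_us = 0;
        else
                bbr->min_rtt_us = rtt_us;
        bbr->min_rtt_stamp = tcp_jiffies32;

        /* ...rest of bbr_init() unchanged... */
}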

My experiments so far are looking promising. It's likely not suitable for inclusion in mainline, but it works as a stopgap until SYNPROXY can resolve the RTT issue.

Thanks for your time, folks!

Also, kudos to https://github.com/zmrui/bbr-bpf for taking it 99% of the way. There were a couple of verifier errors that needed fixing, and the function signature for cong_control had changed in my kernel.

Taifeng Tan

Apr 11, 2025, 1:29:48 AM
to Braden Bassingthwaite, BBR Development
Hi Neal,

Regarding setting different minRTO values for different network scenarios (e.g., using a minRTO smaller than 1 s, such as 20 ms or 300 ms), would that be better than using a small RTO calculated from the iRTT (just my guess)?
In the current RTO calculation during the first 10 seconds, it relies on the iRTT, right? Under what circumstances would it fall back to using 1 second as the RTO?

As for calculating the minRTT during the first 10 seconds, is it reasonable to use the RTT measured during data transmission—such as RTT3 in my diagram—instead of relying on the iRTT?

Thank you so much for sharing your deep-dive insights.

Taifeng

On Fri, Apr 11, 2025 at 10:24, 'Braden Bassingthwaite' via BBR Development <bbr...@googlegroups.com> wrote:

Taifeng Tan

Apr 11, 2025, 1:53:32 AM
to Braden Bassingthwaite, BBR Development

I did some searching. It looks like the RTO clamping mechanism is a universal feature of the TCP protocol stack and is present across all standard TCP congestion control algorithms. BBR's RTO calculation also follows this.

If the RTT is very small (such as in the synproxy scenario here), and the system's TCP_RTO_MIN is set to 200ms, then the final RTO will be clamped to the minRTO value, such as 200ms. Is that right?


On Fri, Apr 11, 2025 at 13:29, Taifeng Tan <cook.ta...@gmail.com> wrote:

Neal Cardwell

Apr 11, 2025, 9:05:54 AM
to Taifeng Tan, Braden Bassingthwaite, BBR Development
On Thu, Apr 10, 2025 at 12:15 PM Taifeng Tan <cook.ta...@gmail.com> wrote:

> Hi Neal,
>
> Would it be reasonable to set different minRTO values for different network scenarios, similar to how Windows configures it?
>
> For example, Windows assigns different minRTO values based on the network environment:
>
> Parameter   | InternetCustom | DatacenterCustom | Compat | Datacenter | Internet
> MinRTO (ms) | 300            | 20               | 300    | 20         | 300
>
> After obtaining the initial RTT (iRTT), would it be reasonable to categorize it into predefined ranges and assign different minRTO values accordingly?


I don't think that would work as a fix. As you noted, the problem is that the iRTT is unreliable.  So, much as using iRTT would lead to incorrect timer-based loss recovery, congestion control, and pacing decisions, using iRTT to try to categorize connections into these classes would lead to incorrect categorization. For sites with the SYNPROXY-style deployments, iRTT would typically be O(1ms) or less, so would be spuriously categorized as a "Datacenter" environment when the traffic might be over a long-RTT public Internet path.
 

> And other mechanisms, such as TLP, would also help mitigate the impact during loss recovery.

TLP needs an RTT estimate in order to schedule a TLP probe timer. You are asking us not to use the iRTT to obtain an RTT estimate ("Please consider excluding the initial handshake RTT (iRTT) from RTT calculations"). So TLP won't have an RTT estimate to use, if we use your suggested approach. TLP would have to fall back to using a default value for the first flight. Using a default value would cause a performance cost.
 

> And, is it possible to use history connection RTT (if any) to set a more reasonable initial RTO?

Yes, Linux caches RTT values. See "ip tcp_metrics". But busy servers see a diverse enough population of client IPs that they cannot rely on caching RTTs for every remote IP. And NAT (especially carrier-grade NAT) means that there can be clients with very different RTTs that have the same public IP.

best, 
neal

Neal Cardwell

Apr 11, 2025, 9:05:58 AM
to Taifeng Tan, Braden Bassingthwaite, BBR Development
On Fri, Apr 11, 2025 at 1:53 AM Taifeng Tan <cook.ta...@gmail.com> wrote:

> I did some searching. It looks like the RTO clamping mechanism is a universal feature of the TCP protocol stack and is present across all standard TCP congestion control algorithms. BBR's RTO calculation also follows this.

BBR does not calculate RTOs. The core Linux TCP stack does that. 

> If the RTT is very small (such as in the synproxy scenario here), and the system's TCP_RTO_MIN is set to 200ms, then the final RTO will be clamped to the minRTO value, such as 200ms. Is that right?

Close. :-) Please read the RFC; the Linux TCP RTO calculation is similar in spirit to RFC 6298, where the TCP_RTO_MIN of 200ms plays the role of "G" in the formulas in the RFC.
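(For reference, RFC 6298 computes RTO = SRTT + max(G, K × RTTVAR) with K = 4; in the Linux code the RTTVAR term is effectively floored at TCP_RTO_MIN, so even a near-zero SRTT cannot pull the RTO much below that floor.)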

Neal Cardwell

Apr 11, 2025, 9:06:06 AM
to Taifeng Tan, Braden Bassingthwaite, BBR Development
On Fri, Apr 11, 2025 at 1:29 AM Taifeng Tan <cook.ta...@gmail.com> wrote:
> Hi Neal,

> Regarding setting different minRTO values for different network scenarios (e.g., using a minRTO smaller than 1 s, such as 20 ms or 300 ms), would that be better than using a small RTO calculated from the iRTT (just my guess)?

Yes, it would be better to choose one of those minRTO from the network scenario menu, but how would TCP do that, given that SYNPROXY is destroying the iRTT information that would be used to pick a minRTO from that network scenario menu? :-)
 
> In the current RTO calculation during the first 10 seconds, it relies on the iRTT, right? Under what circumstances would it fall back to using 1 second as the RTO?

No. The 10-second window is for BBR's min_rtt estimate, used for BBR's cwnd calculation.

The 1-second fallback is what the core Linux TCP stack uses for TLP or RTO timeouts when there is no valid RTT sample yet. If Linux TCP used the proposal to ignore the iRTT then when sending the first flight of data the TLP/RTO timeouts would need to use 1 second.
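(That fallback is TCP_TIMEOUT_INIT in the kernel, which is defined as 1 second.)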
 
> As for calculating the minRTT during the first 10 seconds, is it reasonable to use the RTT measured during data transmission, such as RTT3 in my diagram, instead of relying on the iRTT?

Yes, if the TCP stack were set up to ignore the iRTT, then the min_rtt and RTT estimates would be initialized from the RTT sample on the first ACK of data.

neal

 

Taifeng Tan

Apr 12, 2025, 1:51:03 AM
to Neal Cardwell, Braden Bassingthwaite, BBR Development
Thanks for the explanation, Neal.

On Fri, Apr 11, 2025 at 21:05, Neal Cardwell <ncar...@google.com> wrote:

yoshi nishida

Apr 15, 2025, 8:19:13 PM
to Taifeng Tan, Neal Cardwell, Braden Bassingthwaite, BBR Development
Hi,
I guess I am missing something, but I am still not very sure if this is just an issue with the initial RTT.
Let's say the initial min_rtt is miscalculated to be 1000 times lower than the actual value.
In my understanding, even with a miscalculated cwnd, BBR can catch up in around 10 RTTs (log2(1000)). So, if the actual RTT is 40 ms, it should only take about 400 ms.
What are the other factors that cause more than 10 seconds of lower performance?
--
Yoshi

Daniel Micay

Apr 15, 2025, 8:19:18 PM
to BBR Development
On Wednesday, April 9, 2025 at 8:56:52 a.m. UTC-4 Neal Cardwell wrote:

> (1) the TCP stack could have a static branch that controls whether the TCP stack uses the SYNACK RTT sample for connections for which TCP has seen a SYN and SYNACK; by default this is enabled, to use such RTT samples
>
> (2) If/when the SYNPROXY iptables module is enabled, it would set the static branch to disable use of the SYNACK RTT sample
>
> (3) If/when the SYNPROXY iptables module is unloaded, it would set the static branch to re-enable use of the SYNACK RTT sample
>
> Hopefully that would allow SYNPROXY connections to not be fooled by the bogus SYNACK RTT sample, but also have a minimal performance impact for the case of machines not using the SYNPROXY feature.

Similarly to the rest of the netfilter functionality, synproxy is designed to work the same way whether it's on the same machine or a separate machine. A host-based firewall using synproxy on the same machine is a special case. In general, netfilter doesn't handle the host-based firewall case in a special way. It could theoretically create the TCP session directly instead of sending a spoofed TCP handshake via loopback for the host-based firewall case, but that wouldn't be a general solution. The host-based firewall case is mainly for small deployments using it to defend the host-based firewall from resource exhaustion (conntrack table exhaustion). At scale, it will be handled on edge nodes acting as a reverse proxy and load balancer. That can be at an enormous scale, such as a DDoS protection provider with a huge anycast network using it to handle TCP handshakes as part of their service, along with potentially implementing established-connection limits which can't be exhausted with spoofed SYN packets.

Synproxy can be set up so that it only activates for packets above a certain rate-limit threshold under SYN flood conditions. That's a nice way of doing it and means it's not as simple as it being either enabled or not enabled: SYN packets over the rate limit get handled with it, and those under the rate limit without it. Configuring the TCP stack as if there's an ongoing SYN flood and synproxy is in use, even in the general case where it isn't, would be far from ideal.

Synproxy is crucial when using a stateful firewall, since otherwise it's trivial to exhaust the resources of the stateful firewall via a SYN flood filling the conntrack table. Many deployments don't have it, which is a disaster. It's a core part of what DDoS protection services for TCP services need to provide, whether or not they implement an application-layer reverse proxy or just filter the connections.

It would be nice if synproxy special-cased the host-based firewall scenario, where it could inform the TCP stack of what's happening, but I don't think that would help with most real-world usage.