Long RTT Problem with BBRv1 and v3


Li Zonglun

Jan 22, 2025, 9:32:00 AM
to BBR Development

Dear Neal,

I hope this email finds you well.

I have been running some tests involving BBRv1 and BBRv3 between two cloud servers and have encountered an issue that I'm hoping you could shed some light on. During traffic testing, I observed that the peak bandwidth I could achieve was around 100 Mbps (1 s granularity). However, when performing the same test using iperf3 with UDP traffic, I was able to achieve at least 500 Mbps between the same servers.

Further tests between two servers with lower RTT (100-200 ms) yielded better results, without such significant bandwidth limitations. This led me to hypothesize that the high RTT (approximately 300 ms) between the cloud servers might be affecting the performance of both BBR versions.

I am curious as to why high RTT seems to degrade BBR's performance in this case. To further investigate, I've attached graphs from a pcap capture (sender side, with BBRv1), where the statistics granularity is 10 ms.

[two throughput graphs attached]

As you can see, the pacing behavior appears to be lost, and the sending rate fluctuates significantly, which seems unusual to me.

Could you help clarify why this might be happening, and whether high RTT could cause such a performance drop in BBR? Any insights you have would be greatly appreciated.

Thank you very much for your time and expertise.


Best regards,  
Zonglun Li

Neal Cardwell

Jan 22, 2025, 10:12:33 AM
to Li Zonglun, BBR Development
Hi,

Thanks for the report.

Usually with TCP, performance problems at high RTTs are caused by the default receive buffer and send buffer settings on most Linux distributions, which are too small to achieve high throughput over high-RTT paths.
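As a rough rule of thumb, the connection needs buffer space for a full bandwidth-delay product (BDP). A quick back-of-the-envelope sketch, plugging in the numbers from this thread purely as illustrative values:

```shell
# Back-of-the-envelope BDP estimate, using the (illustrative) numbers
# from this thread: a 500 Mbit/s flow over a 300 ms RTT path needs
# roughly this many bytes in flight, so send/receive buffers smaller
# than this will cap throughput well below the link rate.
rate_bps=500000000   # target throughput, bits per second
rtt_ms=300           # round-trip time, milliseconds
bdp_bytes=$(( rate_bps * rtt_ms / 1000 / 8 ))
echo "BDP: ${bdp_bytes} bytes"
# prints: BDP: 18750000 bytes (~18.75 MB)
```

That is far above the send/receive buffer maximums many distributions ship by default, which is why the tuning below matters for long-RTT WAN flows.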

I would recommend the following:

+ please check the throughput you get with the "cubic" congestion control, to quickly get a sense of whether this is a BBR issue or a wider TCP issue

+ please try running something like the following as root on the sender and receiver machines, to tune the sysctl settings to enable the large receive and send buffers needed for long-RTT WAN flows:

 # print the starting values:
 ssh root@${HOST} "sysctl net.core.wmem_max net.core.rmem_max net.ipv4.tcp_wmem net.ipv4.tcp_rmem"
 # tune the settings:
 ssh root@${HOST} "sysctl -w net.core.wmem_max=900000000 net.ipv4.tcp_wmem='4096 262144 900000000'"
 ssh root@${HOST} "sysctl -w net.core.rmem_max=900000000 net.ipv4.tcp_rmem='4096 540000 900000000'"
 # print the ending values:
 ssh root@${HOST} "sysctl net.core.wmem_max net.core.rmem_max net.ipv4.tcp_wmem net.ipv4.tcp_rmem"

+ if throughput is still slow after you make those tuning changes and rerun your tests, then please share ss and tcpdump traces; something like the following:

(while true; do date +%s.%N; ss -tinmo; sleep 0.025; done) > /tmp/ss.txt &
tcpdump -n -c 3000000 -w /tmp/tcpdump.pcap -i any -s 128 port $PORT &
# run test
killall tcpdump
gzip /tmp/tcpdump.pcap
gzip /tmp/ss.txt
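Once you have the ss.txt capture, you can grep out the rate samples to see how the estimates evolve over the run. A sketch (the here-doc is a fake two-sample stand-in for the real file, and the field format assumes the output of a modern iproute2 `ss -ti`):

```shell
# Post-processing sketch for the ss capture above: extract the
# delivery_rate samples that modern iproute2 'ss -ti' prints, to see
# how the estimated rate evolves over time. The here-doc below is a
# fake two-sample stand-in for the real /tmp/ss.txt.
cat > /tmp/ss_sample.txt <<'EOF'
1737556800.000000
	 bbr wscale:7,7 rtt:300.2/1.1 pacing_rate 120Mbps delivery_rate 98.2Mbps
1737556800.025000
	 bbr wscale:7,7 rtt:300.9/1.0 pacing_rate 125Mbps delivery_rate 101.4Mbps
EOF
grep -oE 'delivery_rate [0-9.]+[KMG]?bps' /tmp/ss_sample.txt
# prints one delivery_rate sample per ss snapshot, e.g.:
#   delivery_rate 98.2Mbps
#   delivery_rate 101.4Mbps
```

The same pattern works for `pacing_rate`, `cwnd`, and the other per-socket fields, which makes it easy to spot whether the rate estimate or the window is the limiter.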

Before using ss, I recommend downloading and using the latest ss binary, since some distributions ship very old ss tools.

Please see the "Considerations When Benchmarking TCP Bulk Flows" doc from the following page for more info:

thanks,
neal






Li Zonglun

Jan 23, 2025, 8:14:15 AM
to BBR Development
After setting wmem and rmem to larger values, BBR works really well on my cloud servers!
It turned out to be the buffer limit.

Thanks Neal!

Li

Neal Cardwell

Jan 23, 2025, 8:15:14 AM
to Li Zonglun, BBR Development
Great. Glad to hear that the buffer tuning resolved the performance issue!
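For what it's worth, the numbers line up with that diagnosis. Assuming the common Linux default tcp_rmem max of 6291456 bytes (an assumption; distributions vary), a quick sketch of the throughput ceiling over a 300 ms path:

```shell
# Rough consistency check: with the receive buffer capped at a common
# Linux default tcp_rmem max (6291456 bytes; an assumption, distros
# vary), throughput over a 300 ms RTT path can't exceed roughly
# window/RTT. Actual throughput lands lower still, since part of the
# buffer goes to skb overhead rather than advertised window, which is
# consistent with the ~100 Mbps observed earlier in this thread.
rmem_max_bytes=6291456   # assumed default; check with sysctl
rtt_ms=300
cap_mbps=$(( rmem_max_bytes * 8 / rtt_ms / 1000 ))
echo "upper bound: ~${cap_mbps} Mbit/s"
# prints: upper bound: ~167 Mbit/s
```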

Thanks,
neal


Taifeng Tan

Mar 24, 2025, 8:35:04 AM
to Li Zonglun, BBR Development
Sorry to jump into the discussion, Zonglun. Just in case you need to troubleshoot another issue like this one: a pcap from the sender (and sometimes also one from the receiver) can help pinpoint issues such as the small wmem/rmem problem, among many others. I'd be happy to help anytime.

Li Zonglun <gunp...@gmail.com> wrote on Thu, Jan 23, 2025 at 21:14: