RTT jump after the RTT-Probe


Jyotirmoy Banik

Jan 5, 2024, 8:45:30 PM
to BBR Development
Hi All,

I was simulating a wireless network using ns-3 with BBRv1 (I know this is somewhat obsolete, but it is the default version that ships with ns-3). In this simulation I chose a very low link-layer error rate (1e-6), so link-layer retransmissions are not happening, and there is only one path between the source and the destination. I noticed something interesting. I am attaching the throughput and RTT plots that I got from the Wireshark trace.

[Attached plots: tput.png (throughput), rtt.png (RTT)]

The throughput is in line with expectations, and the RTT is generally within the expected range, but what I am struggling to explain is why the RTT plot is clearly divided into three segments. If I understand correctly, the boundaries occur when the RTT probe takes place (they line up with the throughput drops in the top figure). But I am not sure why that would happen. If I understand correctly, BBR is designed to keep the RTT from climbing. One possible hypothesis is that during the RTT probe BBR somehow overestimates the RTT, which results in a higher BDP (i.e. operating point) and essentially creates a backlog. But honestly, I am not very convinced by this explanation. I would appreciate it if someone could shed some light on this. Thank you.

Neal Cardwell

Jan 5, 2024, 9:10:49 PM
to Jyotirmoy Banik, BBR Development
On Fri, Jan 5, 2024 at 6:45 PM Jyotirmoy Banik <jba...@gmail.com> wrote:
The throughput is in line with expectations, and the RTT is generally within the expected range, but what I am struggling to explain is why the RTT plot is clearly divided into three segments. If I understand correctly, the boundaries occur when the RTT probe takes place (they line up with the throughput drops in the top figure).

Yes, the three segments seem to be due to PROBE_RTT phases happening near t=10 secs and t=20 secs.
 
But I am not sure why that would happen. If I understand correctly, BBR is designed to keep the RTT from climbing. One possible hypothesis is that during the RTT probe BBR somehow overestimates the RTT, which results in a higher BDP (i.e. operating point) and essentially creates a backlog. But honestly, I am not very convinced by this explanation. I would appreciate it if someone could shed some light on this. Thank you.

I would bet that the RTT is growing around t=11-15 secs and t=21-25 secs because during those intervals the max-filtered BBR bandwidth estimate is slightly higher than the average delivery rate of the bottleneck link. This causes the average sending rate to be slightly too high, which causes a standing queue to gradually build at the bottleneck link.
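
The mechanism described above can be sketched with a toy model. This is only an illustration under stated assumptions, not BBR's actual code: the names `WindowedMax` and `SimulateStandingQueue`, the window length, and the sample values are all hypothetical, and the model ignores BBR's pacing-gain cycle and its cwnd cap. It paces at the windowed maximum of recent delivery-rate samples while the link drains at the rate of each sample, so any excess accumulates as queue.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <numeric>
#include <vector>

// Toy windowed max filter: reports the largest of the last `window` samples.
// (Hypothetical helper; the window length here is illustrative only.)
class WindowedMax {
public:
    explicit WindowedMax(std::size_t window) : window_(window) { assert(window > 0); }

    double Update(double sample) {
        samples_.push_back(sample);
        if (samples_.size() > window_) samples_.pop_front();
        return *std::max_element(samples_.begin(), samples_.end());
    }

private:
    std::size_t window_;
    std::deque<double> samples_;
};

struct QueueResult {
    double mean_rate_mbps;    // average of the delivery-rate samples
    double final_queue_mbit;  // backlog left at the bottleneck
};

// Send at the max-filtered estimate while the link delivers at the sampled
// rate. Each sample covers `dt_sec` seconds; the excess accumulates as queue.
QueueResult SimulateStandingQueue(const std::vector<double>& rate_mbps,
                                  std::size_t window, double dt_sec) {
    assert(!rate_mbps.empty());
    WindowedMax filter(window);
    double queue_mbit = 0.0;
    for (double delivered : rate_mbps) {
        double send_rate = filter.Update(delivered);     // never below the current sample
        queue_mbit += (send_rate - delivered) * dt_sec;  // excess is queued
    }
    double mean = std::accumulate(rate_mbps.begin(), rate_mbps.end(), 0.0) /
                  static_cast<double>(rate_mbps.size());
    return {mean, queue_mbit};
}
```

For example, with ten one-second samples of 10, 9, 11, 9, 10, 9, 11, 9, 10, 9 Mbit/s and a window of 4, the mean rate is 9.7 Mbit/s but the filter sits at 11 Mbit/s for most of the run, and this toy model ends with an 11 Mbit backlog.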

BBRv3 should do better in a single-flow wireless scenario like this, since it will explicitly attempt to fully drain the bottleneck queue (reduce in-flight data to the estimated BDP) once per bandwidth-probing cycle. So a slight overestimate in the available bandwidth should result in a small queue accumulating over the course of one bandwidth-probing cycle (which is then almost entirely drained), rather than a substantial queue like this accumulating and remaining over long periods.
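
The difference between a queue that persists and one that is drained once per cycle can be sketched as follows. This is a deliberately crude model with hypothetical names and parameters (`PeakQueueMbit`, `excess_mbit`, `drain_fraction`); it is not how BBRv3 is implemented. It only assumes that each probing cycle adds a fixed excess to the queue and that a drain step removes a fraction of the backlog at the end of the cycle.

```cpp
#include <algorithm>
#include <cassert>

// Peak backlog over `cycles` probing cycles when each cycle adds `excess_mbit`
// of queue. If `drain_each_cycle` is true, `drain_fraction` of the backlog is
// removed at the end of every cycle (a stand-in for reducing in-flight data
// to the estimated BDP).
double PeakQueueMbit(int cycles, double excess_mbit,
                     bool drain_each_cycle, double drain_fraction) {
    assert(cycles >= 0);
    assert(drain_fraction >= 0.0 && drain_fraction <= 1.0);
    double queue = 0.0;
    double peak = 0.0;
    for (int i = 0; i < cycles; ++i) {
        queue += excess_mbit;          // overestimate adds to the queue
        peak = std::max(peak, queue);
        if (drain_each_cycle) {
            queue *= (1.0 - drain_fraction);  // most of the backlog is drained
        }
    }
    return peak;
}
```

Calling `PeakQueueMbit(10, 0.5, false, 0.0)` lets the backlog grow linearly across all ten cycles, while `PeakQueueMbit(10, 0.5, true, 0.9)` keeps the peak close to the excess from a single cycle.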

best regards,
neal

 

--
You received this message because you are subscribed to the Google Groups "BBR Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bbr-dev+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bbr-dev/44052d58-943e-4699-b160-3c3e2a42a559n%40googlegroups.com.

Jyotirmoy Banik

Jan 8, 2024, 5:27:52 PM
to BBR Development


On Friday 5 January 2024 at 18:10:49 UTC-8 Neal Cardwell wrote:
I would bet that the RTT is growing around t=11-15 secs and t=21-25 secs because during those intervals the max-filtered BBR bandwidth estimate is slightly higher than the average delivery rate of the bottleneck link. This causes the average sending rate to be slightly too high, which causes a standing queue to gradually build at the bottleneck link.
[Jyotirmoy] Thanks for confirming my hypothesis. I do have a follow-up question: what causes BBR to overestimate the max-filtered bandwidth right after PROBE_RTT? I am trying to understand the correlation here.

Neal Cardwell

Jan 8, 2024, 5:46:31 PM
to Jyotirmoy Banik, BBR Development
On Mon, Jan 8, 2024 at 5:27 PM Jyotirmoy Banik <jba...@gmail.com> wrote:


[Jyotirmoy] Thanks for confirming my hypothesis. I do have a follow-up question: what causes BBR to overestimate the max-filtered bandwidth right after PROBE_RTT? I am trying to understand the correlation here.

It's not clear to me from these plots what, specifically, is causing the bandwidth overestimation. A conjecture: it might be some kind of aggregation behavior or link-layer retransmission behavior that is causing a burst of ACKs to arrive at a rate that is faster than the sustainable throughput of that simulated path. This is a known issue, and can happen in the real world, particularly for wifi or cellular links. It's not ideal, but the resulting queue is generally no worse than CUBIC / Reno in the same scenario. If folks in the research community have ideas about how to improve the bandwidth estimation approach in practical/feasible ways, that's great. :-)
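
To make the conjecture concrete, here is a minimal sketch of how a burst of ACKs can inflate a rate sample. It is a naive estimator written for illustration, not BBR's delivery-rate sampling (which, as I understand it, measures over a longer interval and also bounds the sample by the send rate). The `Ack` struct, both function names, and the timestamps in the usage note are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Ack {
    double time_sec;        // arrival time of the ACK at the sender
    double delivered_mbit;  // cumulative data acknowledged so far
};

// Long-run rate: total data acknowledged divided by total elapsed time.
double AverageRateMbps(const std::vector<Ack>& acks) {
    assert(acks.size() >= 2);
    double dt = acks.back().time_sec - acks.front().time_sec;
    assert(dt > 0.0);
    return (acks.back().delivered_mbit - acks.front().delivered_mbit) / dt;
}

// Largest rate seen between consecutive ACKs. When ACKs are held back and
// released together, the short gaps inside the burst produce samples far
// above the long-run rate, and a max filter would keep the largest one.
double MaxAckRateMbps(const std::vector<Ack>& acks) {
    assert(acks.size() >= 2);
    double best = 0.0;
    for (std::size_t i = 1; i < acks.size(); ++i) {
        double dt = acks[i].time_sec - acks[i - 1].time_sec;
        assert(dt > 0.0);
        double rate = (acks[i].delivered_mbit - acks[i - 1].delivered_mbit) / dt;
        if (rate > best) best = rate;
    }
    return best;
}
```

For instance, five ACKs spaced 0.1 s apart, each acknowledging 1 Mbit, give about 10 Mbit/s by both measures. If the same ACKs instead arrive at 0.0, 0.37, 0.38, 0.39, and 0.40 s, the long-run rate is still about 10 Mbit/s, but the per-ACK samples inside the burst are roughly ten times higher.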

I would suggest graphing a time-sequence plot if you want to root-cause the overestimation.

The easiest way to generate a time-sequence plot might be to ask ns-3 to generate a pcap output file, and then use tcptrace and xplot.org or wireshark to take the pcap file as input and make the plot as output.

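To generate pcap output, AFAICT from poking around, you may be able to use the PcapHelperForDevice class to enable pcap tracing on specific devices. Note that PcapHelperForDevice is an abstract base class, so EnablePcap is called through a concrete device helper that inherits from it (YansWifiPhyHelper is shown here as an example, assuming a wifi device):
YansWifiPhyHelper wifiPhy;  // assumption: the helper used to install the wifi devices
wifiPhy.EnablePcap("my-trace", devices, false);  // devices is a NetDeviceContainer
The EnablePcap method takes:
    • A prefix for the generated file names.
    • The NetDeviceContainer holding the devices to trace.
    • A boolean flag for promiscuous mode (capture all traffic, not just traffic addressed to the device).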
best regards,
neal

 


 
