bbrv1 is unable to detect more bandwidth when min_rtt is much smaller than typical rtt


mingkun bian

Feb 4, 2021, 4:54:12 AM
to BBR Development
Hi, 

    I use wifi to measure speed in the early morning, because the network is likely to be idle at that time. I found that the measured speed sometimes varies widely, e.g. 75 Mbps vs. 7.5 Mbps. I then used bcc to collect protocol-stack data on the server, compared the runs, and found a problem:

    min_rtt is 20 ms, typical rtt is about 80 ms; part of the sampling data is roughly as follows:

    at t1,        cwnd 84, pacing rate 2532203, pacing gain 1.25, max bw 24313
    at t1+20ms,   cwnd 84, pacing rate 1519321, pacing gain 0.75, max bw 24313
    at t1+40ms,   cwnd 84, pacing rate 2025762, pacing gain 1,    max bw 24313
    at t1+60ms,   cwnd 84, pacing rate 2025762, pacing gain 1,    max bw 24313
    at t1+80ms,   cwnd 84, pacing rate 2025762, pacing gain 1,    max bw 24313  (acks the segments sent at t1)
   
    The problem is that the sample at "t1+80ms" will not show much more delivered data, because the slowdown at "t1+20ms" (pacing gain 0.75) cancels out the speedup at "t1" (pacing gain 1.25).

    Do you have any suggestions for this kind of scenario?
  
    Thanks. 
    

Neal Cardwell

Feb 4, 2021, 11:13:41 AM
to mingkun bian, BBR Development
Hi,

Thanks for your note! I agree with your observation.

On paths like wifi, the min_rtt is often much smaller than the typical RTT, as can be seen, for example, in our IETF 101 slides:


In such cases, the BBRv1 approach of only probing for bandwidth for a single min_rtt can lead to bandwidth growth problems, as you note.

One of the work items under way in our team for BBRv2 addresses this issue by making the bandwidth probing phase longer: always at least K packet-timed round trips rather than one min_rtt.

If there are other ideas out there for addressing this issue, we'd be interested in hearing them.

Thanks!
neal





mingkun bian

Feb 5, 2021, 1:49:44 AM
to BBR Development
Hi Neal,

    I still have some doubts about the bw calculation. It seems that the total amount of data sent from "t1" to "t1 + 80ms" is bw*80ms because of the pacing gains (1.25 -> 0.75), so new_bw = (bw*20*1.25 + bw*20*0.75 + bw*20*1 + bw*20*1)/80ms = bw, and it does not probe more bandwidth.

    But I looked at the logic for calculating bandwidth in the kernel (tcp_rate.c: tcp_rate_skb_delivered and tcp_rate_gen), and it is not like this; rather, it works as follows:

    min_rtt is 20 ms, typical rtt is about 80 ms; part of the sampling data is roughly as follows (inflight is almost constant at about 82):

    at t1-60ms,   cwnd 84, pacing rate 2025762, pacing gain 1,    max bw 24313
    at t1-40ms,   cwnd 84, pacing rate 2025762, pacing gain 1,    max bw 24313
    at t1-20ms,   cwnd 84, pacing rate 2025762, pacing gain 1,    max bw 24313
    at t1,        cwnd 84, pacing rate 2532203, pacing gain 1.25, max bw 24313, delivered is d1
    at t1+20ms,   cwnd 84, pacing rate 1519321, pacing gain 0.75, max bw 24313
    at t1+40ms,   cwnd 84, pacing rate 2025762, pacing gain 1,    max bw 24313
    at t1+60ms,   cwnd 84, pacing rate 2025762, pacing gain 1,    max bw 24313
    at t1+80ms,   cwnd 84, pacing rate 2025762, pacing gain 1,    max bw 24313  (acks the segments sent at t1)
    at t1+100ms,  delivered is d1 + (data delivered from "t1-60ms" to "t1+20ms"), so new_bw = (d1 + (bw*20*3 + bw*20*1.25) - d1)/80ms = bw + bw/16 > bw

    As above, at time "t1+100ms" new_bw is only a little bigger than bw, so it is hard to probe more bandwidth. Is this the reason for what you said, that "the BBRv1 approach of only probing for bandwidth for a single min_rtt can lead to bandwidth growth problems"? Am I right?

   Thanks.

Neal Cardwell

Feb 5, 2021, 10:42:30 AM
to mingkun bian, BBR Development
On Fri, Feb 5, 2021 at 1:49 AM mingkun bian <bianm...@gmail.com> wrote:
Hi Neal,

    I still have some doubts about the bw calculation. It seems that the total amount of data sent from "t1" to "t1 + 80ms" is bw*80ms because of the pacing gains (1.25 -> 0.75), so new_bw = (bw*20*1.25 + bw*20*0.75 + bw*20*1 + bw*20*1)/80ms = bw, and it does not probe more bandwidth.

    But I looked at the logic for calculating bandwidth in the kernel (tcp_rate.c: tcp_rate_skb_delivered and tcp_rate_gen), and it is not like this; rather, it works as follows:

    min_rtt is 20 ms, typical rtt is about 80 ms; part of the sampling data is roughly as follows (inflight is almost constant at about 82):

    at t1-60ms,   cwnd 84, pacing rate 2025762, pacing gain 1,    max bw 24313
    at t1-40ms,   cwnd 84, pacing rate 2025762, pacing gain 1,    max bw 24313
    at t1-20ms,   cwnd 84, pacing rate 2025762, pacing gain 1,    max bw 24313
    at t1,        cwnd 84, pacing rate 2532203, pacing gain 1.25, max bw 24313, delivered is d1
    at t1+20ms,   cwnd 84, pacing rate 1519321, pacing gain 0.75, max bw 24313
    at t1+40ms,   cwnd 84, pacing rate 2025762, pacing gain 1,    max bw 24313
    at t1+60ms,   cwnd 84, pacing rate 2025762, pacing gain 1,    max bw 24313
    at t1+80ms,   cwnd 84, pacing rate 2025762, pacing gain 1,    max bw 24313  (acks the segments sent at t1)
    at t1+100ms,  delivered is d1 + (data delivered from "t1-60ms" to "t1+20ms"), so new_bw = (d1 + (bw*20*3 + bw*20*1.25) - d1)/80ms = bw + bw/16 > bw

    As above, at time "t1+100ms" new_bw is only a little bigger than bw, so it is hard to probe more bandwidth. Is this the reason for what you said, that "the BBRv1 approach of only probing for bandwidth for a single min_rtt can lead to bandwidth growth problems"? Am I right?

Yes, that matches my sense as well. That's a great numerical example. In paths like this where the typical RTT is far larger than the min_rtt, the bandwidth growth rate is highly "diluted" because only a small fraction of the round trip is spent at a higher sending rate, so only a small fraction of the bandwidth sampling interval reflects the higher sending rate.

best,
neal

 

mingkun bian

Feb 25, 2021, 7:08:29 AM
to BBR Development
Hi Neal,

    I just read the bbrv2 code roughly. You said that the probing phase is "always at least K packet-timed round trips rather than one min_rtt", but I found that the logic for the probing time is like this:
    
    static void bbr2_update_cycle_phase(struct sock *sk,
                                        const struct rate_sample *rs)
    {
        ...
        case BBR_BW_PROBE_UP:
            ...
            if (bbr2_has_elapsed_in_phase(sk, bbr->min_rtt_us) &&
                inflight >=
                bbr_inflight(sk, bw,
                             bbr->params.bw_probe_pif_gain)) {
                is_queuing = true;
            }
            if (is_risky || is_queuing) {
                bbr->prev_probe_too_high = 0;  /* no loss/ECN (yet) */
                bbr2_start_bw_probe_down(sk);  /* restart w/ down */
            }
        ...
    }
    
    Compared to bbrv1, bbrv2 does not seem to have the logic you mentioned ("at least K packet-timed round trips rather than one min_rtt").
    Has this logic not been incorporated into bbrv2 yet? And how is K calculated?

Thanks.
   
On Friday, February 5, 2021 at 12:13:41 AM UTC+8, Neal Cardwell wrote:

Neal Cardwell

Feb 25, 2021, 9:19:53 AM
to mingkun bian, BBR Development
> Has this logic not been incorporated into bbrv2 yet?

Right, the code for "making the bandwidth probing phase longer: always at least K packet-timed round trips" is, as mentioned, "One of the work items under way in our team", and as such, is under internal testing and not yet pushed to github.

> And how is K calculated?

It seems the most likely options are either a K of 2 or 3 round trips. In the first round trip, the delivery rate sample will only reflect the previous round trip, which will reflect the lower non-probing bandwidth, or if the flow is restarting from idle then will be close to zero. In the second round trip the delivery rate sample  will reflect one round of bandwidth probing, so AFAICT this would be the minimum required for the scheme to work. Allowing further rounds of probing could potentially allow for being more robust to noise and also allow the receiver to autotune its receive window to a higher value, allowing the delivery rate to grow still further, so that artificial receive window limits do not spuriously constrain bandwidth growth.

Overall this proposal is similar to the bandwidth-saturation estimator for exiting STARTUP, so if that approach works out I was hoping to reuse the same bandwidth-saturation estimator for exiting (a) STARTUP and (b) bandwidth probing in the steady-state.

best,
neal


mingkun bian

Feb 26, 2021, 4:46:49 AM
to BBR Development
Hi Neal,
 
    I do not quite understand why we need to probe up for 2 or 3 round trips:

    1. "In the first round trip, the delivery rate sample will only reflect the previous round trip, which will reflect the lower non-probing bandwidth, or if the flow is restarting from idle then will be close to zero."
    >> In the first round trip, the delivery rate sample only reflects the previous round trip because we have just begun to probe up at 1.25*bw, and, as you mentioned, in the second round trip the delivery rate sample will reflect one round of bandwidth probing.
       So it seems that only 1 round trip of probing up (which is bigger than min_rtt) is needed for this scenario; why do we need 2 or 3 round trips? Will more round trips of probing up make the inflight too big?

    2. We cannot use bbrv2 yet because we cannot run a newer kernel, and using 2 or 3 round trips of probing up in bbrv1 may cause too-high inflight or loss, because bbrv1 lacks bbrv2's inflight_lo/inflight_hi mechanism.
       So can 1 round trip of probing up in bbrv1 partially solve this problem?

Thanks.

Neal Cardwell

Feb 26, 2021, 9:47:57 AM
to mingkun bian, BBR Development
Hi,

I agree that the single packet-timed round trip of bandwidth probing should be sufficient to solve the problem you raise in this thread (slow throughput growth over a single round trip when the min_rtt is much lower than the typical RTT).

Our internal effort related to this is looking at holding the bandwidth-probing state for 2 or 3 round trips because we are also trying to address a separate but related issue: in BBRv2, where the time between bandwidth probing can be much longer than BBRv1 (for Reno/CUBIC coexistence), it may be highly advantageous to stay in the bandwidth-probing state, growing the sending rate rapidly each round trip until the pipe appears to be full. This can allow unutilized bandwidth to be fully utilized in something like log_{1.25}(final_bw/starting_bw)*RTT rather than log_{1.25}(final_bw/starting_bw)*(bw_probing_cycle_length). Given that in BBRv2 the bw_probing_cycle_length can be up to about 2sec or 62 round trips, this can be a very large difference in the time required to utilize free bandwidth in the path.

One suggestion: given that it sounds like you are investigating having BBRv1 hold the 1.25 gain phase for a full packet-timed round trip, I would suggest that you correspondingly lengthen the following phase (which is intended to drain the queue) to a full packet-timed round trip as well, and use a pacing gain of 1/1.25 = 0.8 there, in order to fully drain the queue. In short, experiment with replacing the BBRv1 phases of (1) one min_rtt at 1.25x and then (2) one min_rtt at 0.75x pacing gain with instead:

(1) packet-timed round trip of sending using pacing gain of 1.25
(2) packet-timed round trip of sending using pacing gain of 1/1.25 = 0.8

Why am I suggesting (2) using a pacing gain of 1/1.25? The rationale is the same as the one that motivates DRAIN using a pacing gain that's the reciprocal of the STARTUP pacing gain. For an idealized case where there is a single flow that has already fully utilized the pipe, in steady state, we can estimate the expected queue at the end of the packet-timed round trip by considering:

+ starting queue after (1): 0.25*BDP

+ after we do (2), if we consider that one RTT of time has elapsed, then queue should be roughly:

end_queue = starting_queue  + packets_arriving - packets_departing
          = 0.25*BDP        + (1/1.25)*bw*RTT  - 1*bw*RTT
          = 0.25*bw*min_rtt + (1/1.25 - 1)*bw*RTT
          = 0.25*bw*min_rtt + (1/1.25 - 1)*bw*(min_rtt + starting_queue/bw)
          = 0.25*bw*min_rtt + (1/1.25 - 1)*bw*(min_rtt + (0.25*BDP)/bw)
          = 0.25*bw*min_rtt + (1/1.25 - 1)*bw*(min_rtt + (0.25*bw*min_rtt)/bw)
          = 0.25*bw*min_rtt + (1/1.25 - 1)*bw*min_rtt*(1 + 0.25)
          = 0.25*bw*min_rtt + (1/1.25 - 1)*bw*min_rtt*1.25
          = bw*min_rtt*(0.25 + (1/1.25 - 1)*1.25)
          = bw*min_rtt*(0.25 + (1 - 1.25))
          = bw*min_rtt*(0)
          = 0

So in this idealized scenario, having the (2) phase use a pacing gain of 1/1.25 should hopefully fully drain the queue.

best,
neal

 

