Hello,

Here is a graph which plots the BBR bottleneck bandwidth (->bw, in bytes/s) from a QUIC/BBR implementation during a 3GB object download, as a function of time (in seconds), on a 100 Mbit/s (i.e. 12.5 MB/s) link.

My first question is: is the estimated bandwidth of more than 16 MB/s after 200s suspect/wrong for this download? Indeed, as the packet loss is not negligible (more than 15% after the estimate exceeds 16 MB/s on a 100 Mbit/s link), should I conclude that this implementation is overestimating the delivery rate, and as a consequence causing this big packet loss rate?

A second question: is the delivery rate estimation wrong, and is that what leads BBR to take more than 2/3 of the connection time to reach the ~16 MB/s download rate (but with high packet loss), instead of the more realistic 12.5 MB/s rate?
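(For reference, the raw arithmetic behind the question: 100 Mbit/s / 8 = 12.5 MB/s, so a sustained 16 MB/s estimate would exceed the physical link rate by 16 / 12.5 - 1 = 28%.)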
Regards,
Fred.
What is weird is that my BBR implementation only decides to stop
oscillating after downloading 2/3 of the object.
As for the packet loss rate, I guess it should not exceed 1% over the
whole connection? It is still too big on my side. Investigating...
To be sure: BBR should be continuously oscillating around the max
estimated bandwidth, shouldn't it?
Please find attached to this mail a new plot (bbr.bbr.3.png). The purple
curve plots the estimated bandwidth; the green curve plots the
difference between the pacing rate and the estimated bandwidth.
bbr.bw.3.2.png is a plot of the last ~100s.
Regards,
Fred.
On 12/6/24 18:17, Neal Cardwell wrote:
>
> On Thu, Dec 5, 2024 at 6:15 PM Frederic Lecaille <flec...@haproxy.com> wrote:
>
>     On 12/4/24 20:02, Neal Cardwell wrote:
>     >
>     > On Wed, Dec 4, 2024 at 1:39 PM Frederic Lecaille <flec...@haproxy.com> wrote:
>     > [... earlier quoted text trimmed; it referenced
>     > https://datatracker.ietf.org/doc/draft-ietf-ccwg-bbr/ ...]
>
> [...] transperf tool at https://github.com/google/transperf ? (It was
> designed for this kind of thing...)
As the oscillation issue occurs at the beginning of the download
session, I have reduced the download size to 200MB (~20s).
About transperf, at this time I am not sure I can install all its
requirements on the sender. I will check this.
> To be sure: BBR should be continuously oscillating around the max
> estimated bandwidth, shouldn't it?
>
>
> The pacing rate should be continuously oscillating around the max
> estimated bandwidth, but the max estimated bandwidth should not be
> oscillating at all if the link's available bandwidth is not oscillating.
Ok.
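To make sure I read that correctly: the pacing rate oscillates because a
cyclic pacing gain is applied on top of BBR.max_bw, while max_bw itself
should stay flat on a stable link. A minimal sketch in C (draft-style
phase names and approximate gain values, not code from any real
implementation):

    #include <stdint.h>

    /* ProbeBW phases; the gains below are approximate draft values and
     * are only meant to show the shape of the cycle. */
    enum probe_bw_phase { PROBE_BW_UP, PROBE_BW_DOWN,
                          PROBE_BW_CRUISE, PROBE_BW_REFILL };

    static double pacing_gain(enum probe_bw_phase p)
    {
        switch (p) {
        case PROBE_BW_UP:   return 1.25; /* probe above the estimate */
        case PROBE_BW_DOWN: return 0.90; /* drain the queue */
        default:            return 1.00; /* CRUISE/REFILL: pace at the estimate */
        }
    }

    /* The pacing rate oscillates with the gain; max_bw (bytes/s) does not. */
    static uint64_t pacing_rate(uint64_t max_bw, enum probe_bw_phase p)
    {
        return (uint64_t)(pacing_gain(p) * (double)max_bw);
    }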
> Please find attached to this mail a new plot (bbr.bbr.3.png). The purple
> curve plots the estimated bandwidth; the green curve plots the
> difference between the pacing rate and the estimated bandwidth.
> bbr.bw.3.2.png is a plot of the last ~100s.
>
>
> Thanks. I think this comment from me still applies: "I would suggest
> double-checking that the estimated bandwidth is correctly computed using
> the max bandwidth sample from the last two bandwidth-probing cycles. The
> way the bandwidth in the graphs oscillates makes it seem like perhaps
> the estimated bandwidth is using the most recent bandwidth sample, and
> not the max over a longer time range?"
Ok. Perhaps I have missed something, but I have double-checked the code
for the max bandwidth (BBR.max_bw) filter (BBR.MaxBwFilter). We use the
same logic as the quiche QUIC implementation to implement the windowed
max filter:
https://quiche.googlesource.com/quiche/+/5be974e29f7e71a196e726d6e2272676d33ab77d/quic/core/congestion_control/windowed_filter.h
That said, I have just realized that this code differs from the one used
by the Linux kernel in lib/win_minmax.c: when possibly updating the 2nd
and 3rd best choices, the kernel compares the sample times, whereas
quiche compares the sample values.
Here is a plot with the 3 sampled values of the max window filter (smp1
(green), smp2 (blue), smp3 (orange)); the BBR.max_bw value is equal to
smp1. In black is the last sampled rs.delivery_rate value. The
BBR.cycle_count value is also plotted; this is the time used to update
the windowed filter for the max bandwidth (BBR.MaxBwFilter), and its
scale is on the right. During this test, the same code as the kernel's
was used to update the max filter.
Note that during such a short test, there are big losses (more than 10%)
during PROBE_BW_UP. Could this explain the max bandwidth oscillation?
As long as the delivery_rate increases during PROBE_BW_UP, I do not see
why the max bandwidth would not increase during this state; this is the
state during which the max bandwidth increases the most. Then BBR.max_bw
stays stable for 2 cycles, and the same goes for the other samples.
Also please note that our pacer is newly implemented, so perhaps there
are bugs related to it. One question that comes to mind about the
pacing: what if the pacer "lies" to BBR? I mean, what if it does not
pace the flow at the rate computed by BBR?
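One cheap way to check for that would be to audit the pacer
independently of BBR (a minimal sketch; none of these names come from
haproxy): accumulate the bytes actually handed to the wire and
periodically compare the achieved rate with the rate BBR asked for.

    #include <stdint.h>

    struct pacer_audit {
        uint64_t bytes;    /* bytes sent since the window started */
        uint64_t start_ns; /* window start, monotonic clock in ns */
    };

    /* Call on every transmitted datagram. */
    static void pacer_audit_tx(struct pacer_audit *a, uint64_t len)
    {
        a->bytes += len;
    }

    /* Return the achieved rate in bytes/s over the elapsed window and
     * restart the window; compare the result with BBR's pacing rate. */
    static uint64_t pacer_audit_rate(struct pacer_audit *a, uint64_t now_ns)
    {
        uint64_t elapsed_ns = now_ns - a->start_ns;
        uint64_t rate = elapsed_ns ? a->bytes * 1000000000ULL / elapsed_ns : 0;

        a->bytes = 0;
        a->start_ns = now_ns;
        return rate;
    }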
Some other questions came to mind when I had to implement BBR. In
haproxy, packet loss detection is done before the acknowledged packets
are processed. I am not sure this is a good idea for BBR. So the BBR
functions are called in this order:

    BBRHandleLostPacket()
    GenerateRateSample()
    BBRUpdateOnACK()
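In other words, the per-ACK driver looks roughly like this (a sketch
with hypothetical helper and type names; as far as I recall, Linux TCP
also marks losses before invoking the congestion control's main hook):

    static void on_ack_received(struct conn *c, const struct ack_frame *ack)
    {
        struct pkt *p;

        /* 1. Detect and handle the newly lost packets first. */
        for (p = first_newly_lost(c, ack); p; p = next_newly_lost(p))
            BBRHandleLostPacket(p);

        /* 2. Build the rate sample from the newly acknowledged packets. */
        GenerateRateSample(c, ack);

        /* 3. Run the main BBR update with that rate sample. */
        BBRUpdateOnACK();
    }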
Regards,
Fred.
On 12/9/24 19:13, Neal Cardwell wrote:
>
> AFAICT there's a bug in the way the BBR.cycle_count variable is updated
> in your implementation. In BBRv3 it should only be updated once per
> bandwidth-probing cycle (as long as the flow was not app-limited while
> it was attempting to probe for bandwidth). I only see about 5 bandwidth-
> probing cycles in this trace. So BBR.cycle_count should only be
> incremented about 5 times. The code seems to be incrementing
> BBR.cycle_count about once per round trip, or something like that?
Well, I do not know how this is possible, because this concerns an easy
part of the code. :-)
BBR.cycle_count is incremented by BBRAdvanceMaxBwFilter(). The latter is
called only under the same conditions as BBRAdaptUpperBounds():

    if (BBR.ack_phase == ACKS_PROBE_STOPPING and BBR.round_start)
        /* end of samples from bw probing phase */
        if (IsInAProbeBWState() and !rs.is_app_limited)
            BBRAdvanceMaxBwFilter()
Same code on my side.
BBR.ack_phase may be set to ACKS_PROBE_STOPPING only by
BBRCheckProbeRTT() and BBRStartProbeBW_DOWN(). But BBRCheckProbeRTT()
sets BBR.ack_phase to ACKS_PROBE_STOPPING only when entering the
PROBE_RTT state, which is impossible here!
So I guess that if BBR.cycle_count is incremented when it should not be,
it is because BBR.ack_phase remains in the ACKS_PROBE_STOPPING phase for
too long.
Another condition for calling BBRAdvanceMaxBwFilter() is that
BBR.round_start is not null; but as far as I see/understand the RFC,
this happens once per round trip, each time the packets that were in
flight at the start of the round have been acknowledged (according to
BBRUpdateRound()).
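For reference, the round-tracking logic in the draft is roughly the
following (paraphrased from memory; the draft text is authoritative):

    BBRUpdateRound():
      if (packet.delivered >= BBR.next_round_delivered)
        BBRStartRound()  /* BBR.next_round_delivered = C.delivered */
        BBR.round_count++
        BBR.rounds_since_bw_probe++
        BBR.round_start = true
      else
        BBR.round_start = false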
Given this part of the draft, I am not sure that the code mentioned
above can ensure that cycle_count is correctly updated:

    4.5.6. Tracking Time for the BBR.max_bw Max Filter

    BBR tracks time for the BBR.max_bw filter window using a virtual
    (non-wall-clock) time tracked by counting the cyclical progression
    through ProbeBW cycles. Each time through the ProbeBW cycle, one
    round trip after exiting ProbeBW_UP (the point at which the flow has
    its best chance to measure the highest throughput of the cycle), BBR
    increments BBR.cycle_count, the virtual time used by the BBR.max_bw
    filter window. Note that BBR.cycle_count only needs to be tracked
    with a single bit, since the BBR.max_bw filter only needs to track
    samples from two time slots: the previous ProbeBW cycle and the
    current ProbeBW cycle.
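Read literally, that paragraph suggests that a generic windowed filter
is not even needed: a two-slot max keyed on the cycle parity is enough.
A minimal sketch (the names are mine, not from any implementation):

    #include <stdint.h>

    struct max_bw_filter {
        uint64_t slot[2];     /* max delivery rate (bytes/s), one slot
                               * per ProbeBW cycle parity */
        uint32_t cycle_count; /* virtual time: one tick per cycle */
    };

    /* Feed a delivery rate sample into the current cycle's slot. */
    static void max_bw_update(struct max_bw_filter *f, uint64_t bw)
    {
        uint32_t i = f->cycle_count & 1;

        if (bw > f->slot[i])
            f->slot[i] = bw;
    }

    /* Called once per ProbeBW cycle, one round after leaving ProbeBW_UP. */
    static void max_bw_advance(struct max_bw_filter *f)
    {
        f->cycle_count++;
        f->slot[f->cycle_count & 1] = 0; /* start the new cycle fresh */
    }

    /* BBR.max_bw: the max over the previous and current cycles. */
    static uint64_t max_bw(const struct max_bw_filter *f)
    {
        return f->slot[0] > f->slot[1] ? f->slot[0] : f->slot[1];
    }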
> https://github.com/ietf-wg-ccwg/draft-ietf-ccwg-bbr/pull/5/commits/ee98c12ad6f0e93153656218a7df1b1ef92618d7
>
> That BBRv3 state machine approach gets rid of the BBR.ack_phase variable
> entirely. I have not tried it myself, but it sounds simpler, and promising.
>
Ok, it really does seem promising. I have tested this patch and I
confirm that the max bw oscillation issue has disappeared, as shown by
the last plot file attached to this mail.
One issue fixed! Thank you Neal!
The next one is the remaining big packet loss issue. I am still
investigating. One thing I have noted on this plot is that BBR settles
on an overestimated max bw (~16MB/s), I think, with a startup period
that lasted 4s. This seems too long to me.
Regards,
Fred