QUIC's BBR2: ERROR message logged

69 views
Skip to first unread message

Stanislav Slusny

unread,
Dec 14, 2023, 7:41:42 AM12/14/23
to BBR Development

Hello,

we observed that QUICHE implementation of BBR2occasionally logs the following message at ERROR level:

quiche/quiche/quic/core/congestion_control/bbr2_misc.cc

Example:

prior_bytes_in_flight:0 is smaller than the sum of bytes_acked:28 and bytes_lost:0

I have troubles to estimate the impact of the error – can you help me to determine the severity, please ?
Is that log message at ERROR level appropriate here ? 


One scenario that leads to this error:

This can happen during processing of ACKs, when sent packet is at first marked as LOST, but later an ACK for this packet is received. There are two entities that estimate "bytes in flight" and process the ACKs:

  1. QuicUnackedPacketMap, see QuicUnackedPacketMap:bytes_in_flight()
  2. Bbr2NetworkModel/BandwidthSampler, see Bbr2NetworkModel::OnCongestionEventStart

When a packet is marked as LOST, QuicUnackedPacketMap is updated and the packet is removed from "bytes in flight".
Later, when a ACK for this packet is received, QuicUnackedPacketMap does NOT modify "bytes in flight" again, as the packet is not in flight anymore. There is the following check:

void QuicUnackedPacketMap::RemoveFromInFlight(QuicTransmissionInfo* info) { if (info->in_flight) { ...

 

However, such a check is not applied when processing ACKs in Bbr2NetworkModel/BandwidthSampler, and the counter that computes the number of acked bytes is increased despite the fact that the packet is not "in flight" anymore.

This leads to a mismatch between "bytes in flight" from QuicUnackedPacketMap and "bytes in flight" from Bbr2NetworkModel/BandwidthSampler. In that case the ERROR message can be logged.

 

Thanks,

Stanislav Slusny

Bin Wu

unread,
Dec 14, 2023, 11:53:06 AM12/14/23
to Stanislav Slusny, BBR Development
Hi Stanislav,

QUICHE dev here.  Your observation makes sense at a high level, for historical reasons, both QuicUnackedPacketMap and BandwidthSampler tracks in-flight packet information, there could be issues when they disagree.  But I'm not sure if the scenario you described leads to that ERROR message. In the message,"bytes_acked" is 28, but if the packet has been marked as lost before, I'd expect it to be 0 instead, see here.

Is the ERROR reproducible in the form of a unit test? I might be able to look deeper if I can reproduce it myself.

I wouldn't expect any visible impact other than the logging itself, as long as prior_bytes_in_flight is 0.

Thanks,
Bin

--
You received this message because you are subscribed to the Google Groups "BBR Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bbr-dev+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bbr-dev/CAACc7tpsnAm4rTAz0JhzBJ26FzHo0m2b5PpQdHH5cP8TzudLjw%40mail.gmail.com.

Bin Wu

unread,
Dec 14, 2023, 12:36:18 PM12/14/23
to Slusny, Stanislav, Stanislav Slusny, BBR Development
I see, in that case, I think we skip the packet in this loop if packet.bytes_acked is 0.

On Thu, Dec 14, 2023 at 12:21 PM Slusny, Stanislav <ssl...@akamai.com> wrote:

Hi Bin,

The problem is that the bytes_acked, that  is printed in the message, is not the counter computed in QuicSentPacketManager (you posted a link pointing to QuicSentManager).


It is computed in
BandwidthSampler, and there is no check to see if packet is in flight, see https://github.com/google/quiche/blob/4069625b5260ab0df30f733e17d6ab7d1703cbce/quiche/quic/core/congestion_control/bandwidth_sampler.cc#L398

and https://github.com/google/quiche/blob/4069625b5260ab0df30f733e17d6ab7d1703cbce/quiche/quic/core/congestion_control/bbr2_misc.cc#L146

 

Unfortunately I have not managed to build a unit test.


Stanislav

Bin Wu

unread,
Dec 14, 2023, 2:04:14 PM12/14/23
to Slusny, Stanislav, Stanislav Slusny, BBR Development
I meant "we should skip the packet in that loop", not that we are already doing so.  I'll fix it when I have some free time.

Sorry for the confusion, I should have double checked my email before I hit send.

Thanks,
Bin

On Thu, Dec 14, 2023 at 1:12 PM Slusny, Stanislav <ssl...@akamai.com> wrote:

Well in fact we do not skip the packet there, because  the packet is really ACKed.

The packet is spuriously marked as lost at first (and removed from flight), but an ACK is eventually received and processed.

 

So the packed is really ACKed, it is marked as ACKed  in QuicUnackedPacketMap (https://github.com/google/quiche/blob/4069625b5260ab0df30f733e17d6ab7d1703cbce/quiche/quic/core/quic_sent_packet_manager.cc#L620)

and “bytes in flight” is correctly updated there.


Then, the BandwidthSampler loops through all acked packets again – but BandwidthSampler does not know that the packet has been marked as lost in the past, and it incorrectly increases bytes_acked, without checking if packet is in flight.

Reply all
Reply to author
Forward
0 new messages