Possible bug in retransmission optimization

76 views
Skip to first unread message

Petr Matrix

unread,
Aug 30, 2022, 10:32:43 AM8/30/22
to QUIC Prototype Protocol Discussion group
Hi,
we use your quiche lib and we found out a bug in retransmission in IETF Quic.
The problem is caused when the last packet from a given packet number space is lost.
During PUT/POST method, client usualy has lot of data to send, so it preferes sending new data with skipped packet number instead of retransmitting the lost packet. Unfortunately, it doesn't provoke ack sending from previous packet number space.

Can you look at it?
Sending a patch to fix it.

Thanks
Petr
0001-prefer-sending-new-data-before-retransmitting-only-a.patch

Fan Yang

unread,
Aug 30, 2022, 11:43:51 AM8/30/22
to proto...@chromium.org
Hi,
  Oh, this is interesting. Can you please provide an example of the bug?
  "it prefers sending new data with skipped packet number instead of retransmitting the lost packet", presumably, you are referring to PTO retransmission. When PTO fires, the sender does not know which packet is lost, and, as you said, the sender would prefer new data over retransmissions. But the sender *should* make sure to send data of the right packet number space. For example, QuicConnection::SendStreamData makes sure not to preempt handshake data with stream data before handshake gets confirmed.
  Thank you very much, Fan

--
You received this message because you are subscribed to the Google Groups "QUIC Prototype Protocol Discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to proto-quic+...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/proto-quic/e54cc561-41ab-4692-87c7-157be1b8b75en%40chromium.org.

Petr Matrix

unread,
Sep 5, 2022, 12:00:38 PM9/5/22
to QUIC Prototype Protocol Discussion group, fay...@chromium.org
Hi Fan,
I am sorry for delay. I will try to describe an example which we can reliably reproduce in our test framework.

Imagine you have a quic client which tries to do a POST or PUT request.
1) Client sends 3 packets with handshake which are received by server.
2) Client sends 4th packet with last crypto frame (handhake confirmation). This packet is the last from hanshake packet number space and is lost.
3) Client beleives it confirmed handshake and starts sending data with forward secure encryption.
4) Server misses 4th packet from the client with handshake confirmation and gets 5th packet with data but cannot decrypt them because even if server has keys, it is not allowed to use them until the confirmation, so server buffers them for later decryption.
5) Client didn't get ack to 4th packet, so PTO fires. But because client does PUT/POST, he usualy has much data to send, so client prefers new data sending with skipped packet number to provoke ack response.
6) Server gets new data, buffers them. Unfortunetelly, I don't remember if server sends any ack or not.
7) Client doesn't know it should retransmit the last packet from handskake space and sends new data again.
8) If client has enough data to send, it might deplete server buffer space.
9) Connection end due to an error or handshake timeout.

The patch I provided solves the situation. It prefers retransmit in case handshake is not confirmed, otherwise it uses the optimization. Maybe it could be solved somewhere else, don't know, I am not so skilled in quiche source as you surely are.
I briefly looked at SendStreamData and it might solve the situation (not sure) but it is conditioned to server only.
Note: We currently work with sources at 7f718d04e3 (quiche repository).

Hope it helps.
Petr

Fan Yang

unread,
Sep 7, 2022, 5:17:08 PM9/7/22
to Petr Matrix, QUIC Prototype Protocol Discussion group
Ah, thanks for the details! Great catch. Yes, you are correct, QuicConnection::SendStreamData is conditioned to server only. 
The patch you provided might be less optimal because we do want to send new handshake data (if any) when PTO fires. So we should make SendStreamData work for the client as well.
I will land a fix for this if you do not mind.

Again, thanks for the great catch! Fan

Petr Matrix

unread,
Sep 8, 2022, 4:50:58 AM9/8/22
to QUIC Prototype Protocol Discussion group, fay...@chromium.org, QUIC Prototype Protocol Discussion group, Petr Matrix
You are welcome. FIx it in the way you thing is the best. Patch was just to indicate where the problem could be.
Petr

Fan Yang

unread,
Oct 3, 2022, 6:05:24 PM10/3/22
to Petr Matrix, QUIC Prototype Protocol Discussion group
Hi Petr,
  Thanks again for reporting this issue,
  Fan
  

Petr Matrix

unread,
Oct 20, 2022, 6:37:24 PM10/20/22
to QUIC Prototype Protocol Discussion group, fay...@chromium.org, QUIC Prototype Protocol Discussion group, Petr Matrix
Thanks for the fix. It looks like it works with our tests.
But I noticed, you reverted it in https://quiche.googlesource.com/quiche/+/44da7b642b96fabf8af3eeda44210ae92290cc00. Is anything wrong with the fix? Why it remained for server and not for the client? Could you explain it to me, please?

Thanks,
Petr

Fan Yang

unread,
Oct 20, 2022, 9:23:12 PM10/20/22
to Petr Matrix, QUIC Prototype Protocol Discussion group
Hi Petr,
  Yeah, the change gets reverted because it made one of our integration tests flaky. The scenario is when the client is doing an upload, before handshake confirmed, a stream can be woken up > 20 times but no data gets sent and this check fails. We have landed another fix for this issue: https://quiche.googlesource.com/quiche/+/1a61e925fdaaa424e20483bedd6c1c1840eb72baHopefully, this will fix the problem.
  Thank you, Fan
Reply all
Reply to author
Forward
0 new messages