Question about BBR2 ECN mode

Fejes Ferenc

Mar 13, 2020, 5:12:07 PM

to bbr...@googlegroups.com

Hi!

I have been experimenting with the ECN mode of BBR2. I set tcp_ecn=1 on both peers, enabled the ecn_enabled flag in the bbr2 module parameters, and set ecn_max_rtt_us to zero. I'm using RED at the bottleneck for ECN marking, which seems to work fine: I can verify the CE flags at the receiver with tcpdump. However, the sender oddly turns off the ECT flag in the IP header for seemingly random bursts of packets. I experimented with different RTTs and bottleneck capacities, but it happened every time. With a small buffer and 100 flows, 12% of the transmitted packets were sent with ECT set to zero. Is this the expected behavior? Does BBR2 with ECN turn off ECT during normal operation? I would like to learn more about this, but I can't find any reference to it in tcp_bbr2.c.
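For readers following along: the ECT and CE flags discussed here are codepoints of the two-bit ECN field, which sits in the low two bits of the IPv4 TOS byte (RFC 3168). A minimal Python sketch of decoding it follows; the name `ecn_codepoint` is made up for illustration and is not part of tcpdump, tshark, or the kernel.

```python
# ECN codepoints from RFC 3168: the low two bits of the IPv4 TOS byte
# (or of the IPv6 Traffic Class byte).
ECN_NAMES = {
    0b00: "Not-ECT",
    0b01: "ECT(1)",
    0b10: "ECT(0)",
    0b11: "CE",
}


def ecn_codepoint(tos_byte: int) -> str:
    """Return the ECN codepoint name for a TOS / Traffic Class byte."""
    if not 0 <= tos_byte <= 0xFF:
        raise ValueError("TOS byte must be in 0..255")
    return ECN_NAMES[tos_byte & 0b11]


print(ecn_codepoint(0x02))  # ECT(0)
print(ecn_codepoint(0x00))  # Not-ECT
```

A packet "sent with ECT zero" in the sense of this message is one whose field decodes to Not-ECT.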

Thanks,

Ferenc

Neal Cardwell

Mar 14, 2020, 2:08:01 PM

to Fejes Ferenc, BBR Development

Thanks for the report!

For the data packets that are marked Not-ECT: are they retransmits?

If you don't know whether they were retransmits, it would be great if you could post a binary tcpdump .pcap file containing just the headers at some public HTTP(S) URL, e.g.:

tcpdump -w /tmp/test.pcap -s 100 -i $ETH_DEVICE -c 100000 port $PORT

As I am reviewing the upstream/public bbr2 code we posted, and comparing it to our internal version, I see there is some code missing in the upstream version that would ensure that even retransmits are marked as ECT (much like DCTCP, in that respect). I have a hunch that upstreaming that missing code would fix the issue you are seeing. But if you can provide the data points above, that could help us get a sense of whether that's indeed the issue.

Thanks!

neal

Fejes Ferenc

Mar 14, 2020, 5:57:03 PM

to Neal Cardwell, bbr...@googlegroups.com

Thank you for the answer!

You can find my pcap here: http://fejesferenc.web.elte.hu/full.pcap

This is a 60-second trace at 1 Gbps with 100 BBR2 flows; the sender IP is always 10.0.0.7. I tried to inspect the retransmissions and non-ECT packets and got the following:

$ tshark -Y "tcp.analysis.retransmission" -r full.pcap | wc -l
13856
$ tshark -Y "ip.dsfield.ecn == 0x00" -r full.pcap | wc -l
139437

I tried my best to avoid drops: I configured a very large buffer (10 times the BDP) and set the target queue length in RED to 1 BDP. I even tried modifying RED to mark non-ECT packets with CE above the target, to avoid drops.

Thanks for looking into this,

Ferenc

Neal Cardwell

Mar 19, 2020, 4:26:05 PM

to Fejes Ferenc, BBR Development

Thanks for the trace!

AFAICT there seems to be a bug in the tshark tool, where for some packets
that should be deemed retransmits, it marks them as merely "Out-Of-Order"
instead. For example, it is declaring this an out-of-order packet, when it
should be marked a retransmission:

8212 0.404242 10.0.0. 10.0.0.50 TCP 1514 [TCP Out-Of-Order]
6666 [ACK] Seq=308425 Ack=1 Win=65536 Len=1448 TSval=151696734
TSecr=2255428540

From the tcpdump output we can see that the packet is a retransmit:

Here is the original offload burst of 3*MSS:
09:32:21.763592 IP 10.0.0.7.39860 > 10.0.0.50.6666: Flags [P.],
seq 306976:311320, ack 1, win 64, options [nop,nop,TS val 151696674
ecr 2255428481], length 4344

And here is the retransmit of 1MSS at 308424:309872:
09:32:21.822726 IP 10.0.0.7.39860 > 10.0.0.50.6666: Flags [.], seq
308424:309872, ack 1, win 64, options [nop,nop,TS val 151696734 ecr
2255428540], length 1448

My guess would be that this is why the not-ECT packet count is 10x
higher than the retransmit count.
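The check done by eye on the tcpdump output above can be sketched in Python: a segment counts as a retransmit if its sequence range starts below the highest sequence number already sent, even when the original went out as part of a larger offload burst. This is a simplified illustration, not how tshark classifies packets. `find_retransmits` and its input format are hypothetical, and the sketch assumes the capture order matches the send order and ignores sequence-number wraparound.

```python
from typing import List, Optional, Tuple


def find_retransmits(segments: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return the segments that resend bytes already sent earlier.

    `segments` lists (seq_start, seq_end) pairs for one flow in send order,
    with seq_end exclusive, as tcpdump prints them ("seq 306976:311320").
    Simplification: only the highest sequence sent so far is tracked, and
    32-bit sequence wraparound is ignored.
    """
    retransmits = []
    highest_sent: Optional[int] = None  # highest seq_end seen so far
    for start, end in segments:
        if highest_sent is not None and start < highest_sent:
            retransmits.append((start, end))
        if highest_sent is None or end > highest_sent:
            highest_sent = end
    return retransmits


# The two packets quoted above: a 3*MSS offload burst, then 1 MSS resent.
trace = [(306976, 311320), (308424, 309872)]
print(find_retransmits(trace))  # [(308424, 309872)]
```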

So I still have the same theory that I noted above. I will try to prepare a fix
and follow up in this thread when it is ready for testing.

Thanks!
neal

Dave Taht

Mar 19, 2020, 4:30:50 PM

to Neal Cardwell, Fejes Ferenc, BBR Development

Could you also share your RED configuration?

--
Make Music, Not War

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-435-0729

Fejes Ferenc

Mar 19, 2020, 5:44:15 PM

to Neal Cardwell, BBR Development

Thanks for looking into this!

Well, that explains the difference; I'll be more careful with tshark in the future. It would have been clearer with GRO/GSO turned off.

However, as it turned out yesterday, my RED bottleneck (implemented in DPDK) miscalculated the checksum. The bad checksum somehow slipped through the receiver NIC's checksum verification, so ip_rcv_core dropped all of my CE-marked packets with a checksum error. The remaining packets of the window then went straight to the OFO (out-of-order) queue. That's why I saw so many retransmits/OFO packets. The "randomly" missing ECN flags at the BBR sender just amplified my confusion.

Now BBR uses the CE signals and I see zero retransmits. However, I would still be grateful for the patches, so that I can try the setup with small buffers, where real drops and retransmissions can occur.

Thanks,

Ferenc

Fejes Ferenc

Mar 19, 2020, 5:56:56 PM

to Dave Taht, Neal Cardwell, BBR Development

I'm using a custom DPDK implementation which, as it recently turned out, had an incorrect checksum calculation (now fixed).

My configuration (with 5ms RTT propagation delay):

limit: very high, 62500000 bytes (1 Gbps * 500 ms), to make sure we never actually drop at the bottleneck
min = max: 106250 bytes (0.17 * 5 ms * 1 Gbps)
mark probability: 1.0, to mark everything above the target qlen
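The byte values in this configuration follow from the link rate and the delays given in parentheses. A small Python sketch of the arithmetic; the helper name `bytes_in_flight` is only for illustration.

```python
def bytes_in_flight(rate_bps: int, delay_ms: float, fraction: float = 1.0) -> int:
    """Bytes a link of `rate_bps` carries in `delay_ms`, scaled by `fraction`."""
    return round(rate_bps * (delay_ms / 1000.0) * fraction / 8)


RATE_BPS = 1_000_000_000  # 1 Gbps bottleneck

limit = bytes_in_flight(RATE_BPS, 500)          # 500 ms worth of buffering
threshold = bytes_in_flight(RATE_BPS, 5, 0.17)  # 0.17 * BDP at 5 ms RTT

print(limit)      # 62500000
print(threshold)  # 106250
```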

Best,

Ferenc

Dave Taht

Mar 19, 2020, 5:58:56 PM

to Fejes Ferenc, Neal Cardwell, BBR Development

On Thu, Mar 19, 2020 at 2:44 PM Fejes Ferenc
<fejes.fe...@gmail.com> wrote:
>
> Thanks for looking into this!
>
> Well that explains the difference I'll be more careful with tshark in the future. It would have been more clearer with turned off GRO/GSO.
>
> However as it turned out yesterday my RED bottleneck (implemented in DPDK) miscalculated the checksum which somehow slipped through the receiver NIC's checksum verification so ip_rcv_core dropped all of my CE marked packets with checksum error.

I was afraid we'd see this incorrect behavior long before now on WAY
more things, particularly after Apple turned on ECN negotiation on
their devices. It's difficult to check for, and one of many reasons
why I've treated a DCTCP-style ECN rollout with caution. Old versions
of that code in Linux have no response to packet loss.

I have had no faith, either, in RED implementations (software or
hardware) as a conventional AQM or with the simplified L4S-style ramp
without extensive per-device testing. (
https://gettys.wordpress.com/2010/12/17/red-in-a-different-light/
paper: http://mirrors.bufferbloat.net/~jg/RelevantPapers/Red_in_a_different_light.pdf
)

I knew the checksumming was done "right" on Linux and BSD only, hadn't
looked into DPDK at all. I figure the DPDK code is trying to rely on
checksum offload?

> Then the remaining packets of the window went straight to the OFO queue. That's why I got that much retransmits/OFO packets. The "randomly" missing ECN flags at the BBR sender just amplified my confusion.

Yep.

> Now the BBR uses the CE signals and I see zero retransmits. However I would be grateful for the patches to try out the setup with small buffers where real drops and retransmissions can occur.

Well, I keep hoping we'll see an RFC 3168 version of BBRv1 or v2, and
something that bounds the BDP even better in those cases with fq_codel
on the path.

Fejes Ferenc

Mar 19, 2020, 6:21:34 PM

to Dave Taht, Neal Cardwell, BBR Development

Dave Taht <dave...@gmail.com> wrote (on Thu, Mar 19, 2020, 22:58):

> I knew the checksumming was done "right" on Linux and BSD only, hadn't
> looked into DPDK at all. I figure the DPDK code is trying to rely on
> checksum offload?

No, there are options: you can calculate the checksum manually or let DPDK calculate it for you. After some failed attempts to let DPDK calculate it (passing the mbuf to the NIC with a special flag and the checksum set to zero), I decided to calculate it incrementally (RFC 1624) with the help of some example code I found in DPDK's examples folder. For some reason that still failed, so I switched to the helper (https://doc.dpdk.org/api/rte__ip_8h.html#a8be1ccea98d6afa79fe0c8531eb266f7), which worked.
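To make the two manual options concrete, here is a Python sketch (not DPDK code) of a full IPv4 header checksum and of the RFC 1624 incremental update, HC' = ~(~HC + ~m + m'), applied when a marker rewrites the ECN bits to CE. The function names and the sample header values are invented for illustration.

```python
import struct


def fold(total: int) -> int:
    """Fold carries back in until the sum fits in 16 bits (one's complement add)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ipv4_checksum(header: bytes) -> int:
    """Full checksum over a header whose checksum field (bytes 10-11) is zero."""
    words = struct.unpack("!%dH" % (len(header) // 2), header)
    return ~fold(sum(words)) & 0xFFFF


def incremental_update(old_cksum: int, old_word: int, new_word: int) -> int:
    """RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m') for one changed 16-bit word."""
    total = (~old_cksum & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    return ~fold(total) & 0xFFFF


def mark_ce(header: bytes) -> bytes:
    """Set ECN to CE (0b11) in the TOS byte and patch the checksum incrementally."""
    hdr = bytearray(header)
    old_word = struct.unpack_from("!H", hdr, 0)[0]   # version/IHL + TOS
    old_cksum = struct.unpack_from("!H", hdr, 10)[0]
    hdr[1] |= 0b11
    new_word = struct.unpack_from("!H", hdr, 0)[0]
    struct.pack_into("!H", hdr, 10, incremental_update(old_cksum, old_word, new_word))
    return bytes(hdr)


def build_header(tos: int) -> bytes:
    """Build a sample 20-byte IPv4 header (made-up field values) with a full checksum."""
    hdr = bytearray(struct.pack(
        "!BBHHHBBH4s4s",
        0x45, tos, 1500, 0x1234, 0x4000, 64, 6, 0,
        bytes([10, 0, 0, 7]), bytes([10, 0, 0, 50]),
    ))
    struct.pack_into("!H", hdr, 10, ipv4_checksum(bytes(hdr)))
    return bytes(hdr)


# Marking an ECT(0) packet incrementally should match a full recomputation.
marked = mark_ce(build_header(tos=0x02))
print(marked == build_header(tos=0x03))  # True
```

If the two results ever disagree, the receiver's IP stack will discard the marked packet, which is the failure mode described earlier in this thread.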

Neal Cardwell

Jul 7, 2021, 10:56:43 AM

to Fejes Ferenc, BBR Development, Adithya Abraham Philip

We have pushed a fix for this issue with missing ECT code points on BBRv2 retransmissions when ECN is enabled:

https://github.com/google/bbr/commits/v2alpha

https://github.com/google/bbr/releases/tag/v2alpha-2021-07-07

The specific commit is:

https://github.com/google/bbr/commit/3d76056b85feab3aade8007eb560c3451e7d3433

As a reminder, the recipe for downloading and building and testing BBRv2 is here:

https://github.com/google/bbr/blob/v2alpha/README.md

Thanks again for the report!

best,

neal

Fejes Ferenc

Jul 7, 2021, 12:25:56 PM

to Neal Cardwell, BBR Development, Adithya Abraham Philip

Hi!

Thank you for the fix!

Best,
Ferenc
