Aggregating without bloating - hard times for tcp and wifi

Dave Taht

Nov 22, 2022, 1:04:44 AM
to Make-Wifi-fast, bloat, BBR Development
This paper came out last month. Good work: a really exhaustive look at
two chipsets, multiple congestion controls, and their interactions with
TSQ, with lots and lots of flent.

https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9772053

As for wifi6... don't get me started on wifi6... but some of
these tests look like a good baseline for comparing the ath11k,
mt79, etc.

The paper kind of misses the negative impact of AQL on the ath10k (and
most likely also on the mt76 and mt79 chips).

--
This song goes out to all the folk that thought Stadia would work:
https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
Dave Täht CEO, TekLibre, LLC

Bob McMahon

Nov 22, 2022, 2:43:06 PM
to Dave Taht, Make-Wifi-fast, bloat, BBR Development
Thanks for sharing this. I'm curious how the xTSQ value can be set. Can it be done with sysctl?

We continue our analysis by using the ms-version of TSQ patch, which enables the tune of the TSQ size allowing each TCP variant to enqueue more than 1 ms of data at the current TCP rate. In particular, we allow to enqueue the equivalent of x ms of data, naming each test xTSQ, with x being an integer value. It is important to notice that this patch has been included in the Linux kernel mainline, and each Wi-Fi driver can now set the desired xTSQ value.

Another thing that could be interesting is the WiFi aggregation stop reasons, e.g., how many times aggregation stopped because it hit the maximum number of MPDUs per A-MPDU versus because the software FIFO went empty (i.e., no more packets were available to the driver from the TCP stack) during that TXOP.
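To make the two stop reasons concrete, here is a minimal userspace model of the bookkeeping such a counter would need. It assumes each TXOP drains up to `max_mpdu` frames from a software queue holding `queued` frames; the enum, struct, and function names are hypothetical and do not correspond to any real driver's statistics.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reasons a driver stops building an A-MPDU in one TXOP. */
enum agg_stop_reason {
    AGG_STOP_MAX_MPDU = 0,   /* hit the max MPDUs-per-A-MPDU limit */
    AGG_STOP_QUEUE_EMPTY = 1 /* software FIFO ran dry before the limit */
};

/* Hypothetical per-station counters, not an existing driver structure. */
struct agg_stats {
    unsigned long stop[2];   /* indexed by enum agg_stop_reason */
    unsigned long mpdus;     /* total MPDUs sent in aggregates */
    unsigned long txops;     /* total TXOPs used */
};

/* Model one TXOP: aggregate up to max_mpdu frames out of `queued`,
 * record why aggregation stopped, and return the aggregate size.
 * When queued == max_mpdu both conditions hold; it is counted as
 * MAX_MPDU because the limit, not the queue, capped the aggregate. */
static size_t agg_build(struct agg_stats *st, size_t queued, size_t max_mpdu)
{
    assert(st != NULL && max_mpdu > 0 && queued > 0);
    size_t n = queued < max_mpdu ? queued : max_mpdu;
    enum agg_stop_reason why =
        (queued >= max_mpdu) ? AGG_STOP_MAX_MPDU : AGG_STOP_QUEUE_EMPTY;
    st->stop[why]++;
    st->mpdus += n;
    st->txops++;
    return n;
}
```

Under this model, a low average aggregate size (`mpdus / txops`) together with mostly `AGG_STOP_QUEUE_EMPTY` stops would suggest the driver is being starved from above (e.g., by TSQ) rather than capped by the A-MPDU limit.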

Finally, many (most?) APs are forwarding and feeding packets at the hardware level, so I'm not sure the Linux stack matters as much for an AP-based analysis, particularly when considering multi-user transmissions, i.e., multiple WiFi clients that are active and sharing TXOPs.

Bob

Neal Cardwell

Nov 22, 2022, 3:10:42 PM
to Bob McMahon, Dave Taht, Make-Wifi-fast, bloat, BBR Development
On Tue, Nov 22, 2022 at 2:43 PM 'Bob McMahon' via BBR Development <bbr...@googlegroups.com> wrote:
> Thanks for sharing this. Curious about how the xTSQ value can be set? Can it be done with sysctl?
>
> We continue our analysis by using the ms-version of TSQ patch, which enables the tune of the TSQ size allowing each TCP variant to enqueue more than 1 ms of data at the current TCP rate. In particular, we allow to enqueue the equivalent of x ms of data, naming each test xTSQ, with x being an integer value. It is important to notice that this patch has been included in the Linux kernel mainline, and each Wi-Fi driver can now set the desired xTSQ value.

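I believe they are setting the xTSQ value using the sk_pacing_shift field, which was added here:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3a9b76fd0db9f0d426533f96a68a62a58753a51e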

AFAIK the intent is only for drivers to set that, and there's no sysctl for that, but of course you could add a sysctl for testing if you wanted. :-)
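To show what the shift actually controls, here is a simplified userspace model of the per-socket TSQ budget, written from a reading of tcp_small_queue_check() in net/ipv4/tcp_output.c: roughly max(2 * skb truesize, pacing_rate >> sk_pacing_shift), with the pacing rate in bytes per second. It is a sketch, not the kernel code: it omits the clamp to net.ipv4.tcp_limit_output_bytes that the real function applies in some cases, along with other details. The function name and the example numbers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified TSQ budget: bytes a socket may have queued below TCP
 * (qdisc + driver) before TCP stops handing down more data.
 * pacing_rate_Bps: current pacing rate in bytes/second.
 * pacing_shift:    sk_pacing_shift (kernel default 10, i.e. ~1 ms).
 * skb_truesize:    memory footprint of one skb. */
static uint64_t tsq_limit_bytes(uint64_t pacing_rate_Bps,
                                unsigned pacing_shift,
                                uint64_t skb_truesize)
{
    assert(pacing_shift < 64);
    /* rate >> shift == data sent in 2^-shift seconds at that rate */
    uint64_t by_rate = pacing_rate_Bps >> pacing_shift;
    /* never go below two skbs, so slow flows can still make progress */
    uint64_t floor_bytes = 2 * skb_truesize;
    return by_rate > floor_bytes ? by_rate : floor_bytes;
}
```

At 100 Mbit/s (12,500,000 bytes/s), a shift of 10 allows about 12 KB in flight below TCP, while a shift of 7 allows about 98 KB, which is why lowering the shift gives a Wi-Fi driver more packets to build aggregates from.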

cheers,
neal


Bob McMahon

Nov 22, 2022, 3:13:31 PM
to David Lang, Dave Taht, Make-Wifi-fast, BBR Development, bloat
An AP's radio complex may have a CPU, but that doesn't mean it runs the standard Linux stack as most people think of it. Many consider this part of the "firmware", which can be Linux, a Linux derivative, or something else. Also, some wired/wireless forwarding-plane integration is done at the hardware level, to a degree many might be surprised by.

Bob

On Tue, Nov 22, 2022 at 12:03 PM David Lang <da...@lang.hm> wrote:
On Tue, 22 Nov 2022, Bob McMahon via Make-wifi-fast wrote:

> Finally, many (most?) APs are forwarding and feeding packets at the
> hardware level so not sure that the linux stack matters as much for an AP
> based analysis, particularly when considering multi user transmissions,
> i.e. multiple WiFi clients are active and sharing TXOPs.

APs forward packets within the switch at the hardware level, but the radios have
to go through the CPU, so any wired <-> wireless traffic needs to go through the CPU,
and I would be incredibly surprised if the wifi chips did wireless <-> wireless
routing at the hardware level.

David Lang

Bob McMahon

Nov 22, 2022, 3:29:12 PM
to David Lang, Dave Taht, Make-Wifi-fast, BBR Development, bloat
Among the main purposes of the WiFi CPU are 802.3-to-802.11 L2 translational bridging and handling 802.11 protocols for things like association. Most forwarded packets don't hit the main CPU anymore. This first software-to-hardware transition happened decades ago with real internet routers (equipment that runs IGPs and BGP), which started as software in the early 90s and then moved to hardware. The same engineering has been happening for home gateways and WiFi APs bridging wired to wireless.

Bob 

On Tue, Nov 22, 2022 at 12:16 PM David Lang <da...@lang.hm> wrote:
Sorry, when I said 'the CPU', I meant the main one running Linux,
not something that's part of the wifi chipset.

I would be very surprised if the wifi chipset is doing any packet routing, as
opposed to just sending the packets to the main processor.

Remember, the common case isn't forwarding from one wifi device to another; it's
moving between wifi devices and the wired uplink.

David Lang

Bob McMahon

Nov 22, 2022, 3:48:22 PM
to David Lang, Dave Taht, Make-Wifi-fast, BBR Development, bloat
I don't know Qualcomm's offerings, but here are some from Broadcom.

https://www.broadcom.com/products/wireless/wireless-lan-infrastructure/bcm67263

The BCM4916's forwarding plane is implemented with a network processor and doesn't run Linux. Linux may be used to build the forwarding tables; this is the standard "merchant silicon" forwarding approach: let some CPU/stack build the topology tables, and then realize the packet forwarding in (programmable) hardware.
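The control-plane/data-plane split described above can be sketched in a few lines: a slow path (the CPU/stack) installs entries, and a fast path does nothing but per-packet lookups, punting misses back to the CPU. This is a toy exact-match MAC table under that assumption; all names, the table size, and the `PORT_TO_CPU` convention are hypothetical, and real network processors use hashed or TCAM lookups rather than a linear scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FDB_SIZE 64
#define PORT_TO_CPU (-1)   /* hypothetical "punt to slow path" result */

struct fdb_entry {
    uint8_t mac[6];        /* destination MAC */
    int port;              /* egress port (wired port or radio) */
    int valid;
};

struct fdb {
    struct fdb_entry e[FDB_SIZE];
};

/* Control plane (CPU/Linux side): install or update an entry.
 * Returns 0 on success, -1 if the table is full. */
static int fdb_install(struct fdb *t, const uint8_t mac[6], int port)
{
    assert(t != NULL && mac != NULL);
    int free_slot = -1;
    for (int i = 0; i < FDB_SIZE; i++) {
        if (t->e[i].valid && memcmp(t->e[i].mac, mac, 6) == 0) {
            t->e[i].port = port;           /* update existing entry */
            return 0;
        }
        if (!t->e[i].valid && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return -1;
    memcpy(t->e[free_slot].mac, mac, 6);
    t->e[free_slot].port = port;
    t->e[free_slot].valid = 1;
    return 0;
}

/* Data plane (what the forwarding hardware does per packet): a pure
 * lookup with no table building; unknown destinations go to the CPU. */
static int fdb_forward(const struct fdb *t, const uint8_t dst[6])
{
    for (int i = 0; i < FDB_SIZE; i++)
        if (t->e[i].valid && memcmp(t->e[i].mac, dst, 6) == 0)
            return t->e[i].port;
    return PORT_TO_CPU;
}
```

The point of the split is that only the first packet of an unknown destination touches the CPU; everything after the table is populated stays in the fast path.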

https://docs.broadcom.com/doc/4916-PB1XX

Bob

Bob McMahon

Nov 22, 2022, 4:00:28 PM
to Toke Høiland-Jørgensen, Neal Cardwell, Make-Wifi-fast, BBR Development, bloat
Does the TSQ code honor no-aggregation for the voice access class, or TCP_NODELAY, where the app making the socket write calls knows that WiFi aggregation isn't likely to help? Sorry, my Linux stack expertise is quite limited.

Bob

On Tue, Nov 22, 2022 at 12:53 PM Toke Høiland-Jørgensen <to...@toke.dk> wrote:
Neal Cardwell via Bloat <bl...@lists.bufferbloat.net> writes:

> On Tue, Nov 22, 2022 at 2:43 PM 'Bob McMahon' via BBR Development <
> bbr...@googlegroups.com> wrote:
>
>> Thanks for sharing this. Curious about how the xTSQ value can be set? Can
>> it be done with sysctl?
>>
>> We continue our analysis by using the ms-version of TSQ patch, which
>> enables the tune of the TSQ size allowing each TCP variant to enqueue more
>> than 1 ms of data at the current TCP rate. In particular, we allow to
>> enqueue the equivalent of x ms of data, naming each test xTSQ, with x being
>> an integer value. It is important to notice that this patch has been
>> included in the Linux kernel mainline, and each Wi-Fi driver can now set
>> the desired xTSQ value.
>>
>
> I believe they are setting the xTSQ value using the sk_pacing_shift field,
> which was added here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3a9b76fd0db9f0d426533f96a68a62a58753a51e
>
> AFAIK the intent is only for drivers to set that, and there's no sysctl for
> that, but of course you could add a sysctl for testing if you wanted.
> :-)

Yup, indeed this is what mac80211 fiddles with:
https://elixir.bootlin.com/linux/latest/source/net/mac80211/main.c#L739
https://elixir.bootlin.com/linux/latest/source/net/mac80211/tx.c#L4156

AFAICT, no in-tree drivers override the value set by mac80211.

I believe the tests in that paper were conducted with this series
applied:
https://lore.kernel.org/all/20180105113256.14835...@gmail.com/

-Toke

Bob McMahon

Nov 23, 2022, 3:36:22 PM
to Toke Høiland-Jørgensen, Neal Cardwell, Make-Wifi-fast, BBR Development, bloat
Thanks Toke.

Bob

On Wed, Nov 23, 2022 at 5:50 AM Toke Høiland-Jørgensen <to...@toke.dk> wrote:
Bob McMahon <bob.m...@broadcom.com> writes:

> Does the TSQ code honor no-aggregation per voice access class or
> TCP_NODELAY where the app making the socket write calls knows that the WiFi
> aggregation isn't likely helpful? Sorry, my Linux stack expertise is quite
> limited.

TSQ only influences the buffering in the TCP layer. The WiFi stack will
still limit aggregation using its own logic (I think it turns it off
entirely for voice?). TCP_NODELAY is also orthogonal to TSQ; TSQ only
kicks in when there's a bunch of data buffered, in which case
TCP_NODELAY has no effect...
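Since TSQ and TCP_NODELAY are orthogonal, the per-socket hints Bob asks about are set by the application independently of TSQ. The sketch below shows the two standard socket options involved: TCP_NODELAY, and a DSCP marking via IP_TOS. DSCP 46 (Expedited Forwarding) is only an illustrative choice, and the helper name is hypothetical. Whether a given Wi-Fi stack maps that DSCP to the voice access class, or disables aggregation for it, depends on the stack's DSCP-to-user-priority mapping and on the driver; this sketch does not establish either, and neither option changes TSQ's byte budget.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create an IPv4 TCP socket with Nagle disabled
 * and the given DSCP codepoint set. Returns the fd, or -1 on error. */
static int make_low_latency_socket(int dscp)
{
    assert(dscp >= 0 && dscp < 64);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Disable Nagle: small writes are sent immediately instead of
     * being coalesced while earlier data is unacknowledged. */
    int one = 1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one) < 0)
        goto fail;

    /* DSCP occupies the upper 6 bits of the IPv4 TOS byte; the low
     * 2 bits are ECN and are left at zero here. */
    int tos = dscp << 2;
    if (setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof tos) < 0)
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}
```

As Toke notes, TCP_NODELAY only matters when there is little data queued; once enough data is buffered for TSQ to kick in, segments are already full-sized and Nagle has nothing to hold back.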

-Toke

Muhammad Ahsan

Nov 24, 2022, 12:25:03 AM
to Neal Cardwell, Bob McMahon, Dave Taht, Make-Wifi-fast, bloat, BBR Development

Hi dev group guys,

I need to use sysctl to set the ms value for net.ipv4.tcp_limit_output_ms.

The attached TSQ patch is not working or not letting me do that. I am on Linux 5.13.12.

Manually changing the tx_sk_pacing_shift = 7; variable in main.c requires recompiling the kernel every time…

I need a sysctl to control the ms value, to set 2TSQ, 4TSQ, 8TSQ, etc. for my wifi chipset.
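Because a shift can only express power-of-two fractions of a second, the xTSQ values requested here map onto pacing shifts only approximately. The worked conversion below assumes the simplified TSQ budget in which the pacing rate R (bytes/s) is right-shifted by s, with a floor of two skb truesizes T_skb; whether the attached patch computes its ms budget exactly this way is not something this derivation establishes. Under that assumption, the tx_sk_pacing_shift = 7 mentioned above corresponds to roughly 8 ms.

```latex
\begin{aligned}
L(s) &= \max\bigl(2\,T_{\mathrm{skb}},\ \lfloor R / 2^{s} \rfloor\bigr) \\
t(s) &= \frac{R / 2^{s}}{R} = 2^{-s}\,\mathrm{s} = \frac{1000}{2^{s}}\,\mathrm{ms} \\
x\mathrm{TSQ} &\;\Rightarrow\; s \approx 10 - \log_2 x \\
s = 10 &: t \approx 0.98\,\mathrm{ms}\ (1\mathrm{TSQ}), \qquad s = 9: t \approx 1.95\,\mathrm{ms}\ (2\mathrm{TSQ}) \\
s = 8 &: t \approx 3.91\,\mathrm{ms}\ (4\mathrm{TSQ}), \qquad s = 7: t \approx 7.81\,\mathrm{ms}\ (8\mathrm{TSQ})
\end{aligned}
```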

I would be thankful if anyone can help me with it.

Rgds,

Ahsan

TSQ.patch

Dave Taht

Nov 24, 2022, 11:23:25 AM
to muhamm...@umt.edu.pk, Neal Cardwell, Bob McMahon, Make-Wifi-fast, bloat, BBR Development
On Wed, Nov 23, 2022 at 9:25 PM Muhammad Ahsan <muhamm...@umt.edu.pk> wrote:
> Hi dev group guys,
>
> I need to use sysctl to set ms value for net.ipv4.tcp_limit_output_ms .
>
> The TSQ patch attached is not working or letting me do that . I am on linux 5.13.12

Why would you test using a 2-year-old kernel? In general I try to
compile a net-next or an rc3+ candidate when doing network research. By
the time your paper is published, your
work is 3 or more years obsolete. Working with a modern kernel, you
can generally get help from the netdev mailing list.

Admittedly, embedded OSes like OpenWrt also tend to lag a few years behind
(presently at 5.15 for most chipsets).


> Manually changing tx_sk_pacing_shift = 7; variable in main.c needs to recompile kernel everytime…
>
> I need to have sysctl to control the ms value , to set 2TSQ,4TSQ,8TSQ etc for my wifi chipset.

One of the reasons I was reluctant to provide this knob was that
it seemed inevitable folk would optimize for bandwidth at the cost of
everything else, in a lab environment. We picked numbers for the initial
implementations of this that made a good compromise between latency
and throughput. Certainly tuning it up would be good, but *first* I
try to encourage researchers to emulate real-world conditions, where
interference and other stations can cause txop scheduling delays
measured in the 100s or 1000s of ms. The flent rtt_fair test, to 4 or
more stations, in particular, can be quite revealing; our test
results and benchmarks are here:

https://www.cs.kau.se/tohojo/airtime-fairness/

There is MUCH other low-hanging fruit in wifi. As one example, if you
look at aircaps, you will find a lot of single packets followed by an
aggregate, because the driver accepts and schedules that first packet
(and txop) and doesn't wait a little bit to assemble a larger
aggregate, essentially wasting a txop. Even 7+ years after
this preso of mine:
https://www.youtube.com/watch?v=Rb-UnHDw02o&t=1591s there is still not a lot of
understanding of how aggregation behaves. Furthermore, ack-filtering is
proving helpful in many scenarios.
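The "wait a little bit to assemble a larger aggregate" alternative can be sketched as a small decision function: start the TXOP once enough packets are queued, or once the head packet has waited long enough. The struct, the function, and any thresholds plugged into them are hypothetical, not taken from any driver; the trade-off is adding up to `max_hold_us` of delay to a lone first packet in exchange for fewer, larger aggregates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy knobs, not taken from any real driver. */
struct agg_hold_policy {
    unsigned target_mpdus;  /* aggregate size worth contending for */
    uint32_t max_hold_us;   /* never hold the head packet longer than this */
};

/* Decide whether to start a TXOP now or briefly wait for more packets.
 * queued:      packets ready for this station/TID
 * head_age_us: how long the oldest queued packet has been waiting */
static bool agg_should_send_now(const struct agg_hold_policy *p,
                                unsigned queued, uint32_t head_age_us)
{
    assert(p != NULL);
    if (queued == 0)
        return false;                     /* nothing to send */
    if (queued >= p->target_mpdus)
        return true;                      /* a large enough aggregate is ready */
    return head_age_us >= p->max_hold_us; /* waited long enough; send what we have */
}
```

A real scheduler would also have to account for the airtime and contention cost of the TXOP itself; this sketch only captures the "don't burn a txop on one packet" idea.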

I am sad to report that the initial mt79 firmware design does not
expose per-station queues, which are IMHO utterly necessary for low
latency in wifi6 and wifi7. I have high hopes for the new Qualcomm
chips...

Anyway, some more links:

https://forum.openwrt.org/t/reducing-multiplexing-latencies-still-further-in-wifi/133605/

https://forum.openwrt.org/t/aql-and-the-ath10k-is-lovely/59002

While I think most of the patches I care about are in your 5.13
version, there's stuff scheduled for 6.2 or 6.3, and the AQL mechanism
used by the ath10k, mt76, iwl, and now mt79 is proving to be a real
barrier to good throughput and latency over wifi.
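For readers who haven't looked at AQL (Airtime Queue Limits): roughly, as I understand it, mac80211 estimates the airtime of each packet handed to the driver and stops releasing packets for a station once its estimated pending airtime exceeds a limit, crediting the airtime back on tx completion. The sketch below models only that idea. The names, the 5000 us budget used in the example, and the overhead-free airtime estimate are all illustrative assumptions, not mac80211's actual code or defaults, and the sketch does not reproduce the specific ath10k/mt76 problems described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified airtime gate for one station (hypothetical structure). */
struct aql_sta {
    uint32_t pending_us;   /* estimated airtime already handed to hardware */
    uint32_t limit_us;     /* per-station budget (illustrative value) */
};

/* Estimated airtime in microseconds: bits / (kbit/s) gives ms, so
 * scale by 1000. Ignores preamble, MAC overhead, and retries. */
static uint32_t airtime_est_us(uint32_t bytes, uint32_t phy_rate_kbps)
{
    assert(phy_rate_kbps > 0);
    return (uint32_t)(((uint64_t)bytes * 8 * 1000) / phy_rate_kbps);
}

/* Release a packet to the hardware only while under budget. The check
 * happens before adding, so the last released packet may overshoot. */
static bool aql_try_release(struct aql_sta *s, uint32_t bytes, uint32_t phy_rate_kbps)
{
    if (s->pending_us >= s->limit_us)
        return false;      /* keep it queued above the driver */
    s->pending_us += airtime_est_us(bytes, phy_rate_kbps);
    return true;
}

/* On tx completion, credit the packet's estimated airtime back. */
static void aql_complete(struct aql_sta *s, uint32_t bytes, uint32_t phy_rate_kbps)
{
    uint32_t a = airtime_est_us(bytes, phy_rate_kbps);
    s->pending_us = a > s->pending_us ? 0 : s->pending_us - a;
}
```

One plausible way such a gate can hurt throughput is an airtime estimate that is too pessimistic relative to what the hardware actually achieves: the station is then throttled below what the link could carry, with fewer packets available for aggregation.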