TCP pacing without FQ packet scheduler


Eric Dumazet

May 16, 2017, 10:46:17 AM
to BBR Development
As promised, I did the changes in TCP to implement pacing internally,
for cases where having the FQ/pacing packet scheduler is not practical.

If accepted by Linux developers, this would land in linux-4.13, but it is easy to backport to older versions if needed.

https://patchwork.ozlabs.org/patch/762899/  tcp: internal implementation for pacing

Eric Dumazet

May 16, 2017, 6:29:34 PM
to BBR Development

Alexandre Ferrieux

Jun 5, 2017, 10:37:25 AM
to BBR Development
On Tuesday, May 16, 2017 at 7:46:17 AM UTC-7, Eric Dumazet wrote:
> As promised, I did the changes in TCP to implement pacing internally,
> for cases where having the FQ/pacing packet scheduler is not practical.
> If accepted by Linux developers, this would land in linux-4.13, but it is easy to backport to older versions if needed.
> [...]
Thanks a lot Eric, it does indeed address a practical need!

Two tiny things though:
 
 (1) The patch fails nontrivially on one hunk against the current Debian testing kernel source (4.9.25). Maybe that's a tad further "backporting" than you expected ;-)  If it is, forget it... (details below)

 (2) I'd like to evaluate the practicality of using fq anyway. So far I'm using a classful qdisc (htb) with a u32 filter to do per-client shaping from the command line, on a single interface facing all my clients. Since fq is classless, the method does not seem to apply simply. But maybe I'm missing another way?
So, short of writing an LD_PRELOAD hack to setsockopt(SO_MAX_PACING_RATE) from within applications, is there a command-line way (tc or other) to give a per-socket (selected by 5-tuple or socket inode) shaping rate, assuming fq is activated on the interface?
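(For the record, the per-socket hack I have in mind would just wrap socket creation and do something like the sketch below; the helper name, the fallback constant and the rate are only placeholders, and SO_MAX_PACING_RATE expects bytes per second.)

    /* minimal sketch of the per-socket pacing cap; placeholder names/values */
    #include <sys/socket.h>

    #ifndef SO_MAX_PACING_RATE
    #define SO_MAX_PACING_RATE 47  /* asm-generic value, for older libc headers */
    #endif

    static int cap_pacing(int fd, unsigned int bytes_per_sec)
    {
        /* cap the kernel's pacing of this socket to bytes_per_sec */
        return setsockopt(fd, SOL_SOCKET, SO_MAX_PACING_RATE,
                          &bytes_per_sec, sizeof(bytes_per_sec));
    }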

TIA,

-Alex


-------------------------------------
Hunk #2 FAILED at 397.
1 out of 3 hunks FAILED -- saving rejects to file include/net/sock.h.rej
--- include/net/sock.h
+++ include/net/sock.h
@@ -397,7 +398,7 @@ struct sock {
        __s32                   sk_peek_off;
        int                     sk_write_pending;
        __u32                   sk_dst_pending_confirm;
-       /* Note: 32bit hole on 64bit arches */
+       u32                     sk_pacing_status; /* see enum sk_pacing */
        long                    sk_sndtimeo;
        struct timer_list       sk_timer;
        __u32                   sk_priority;
 

Eric Dumazet

Jun 5, 2017, 10:50:10 AM
to Alexandre Ferrieux, BBR Development
As far as BBR is concerned, you do not have to use SO_MAX_PACING_RATE
in order for FQ to enable pacing.

Note that fq/pacing, regardless of the TCP CC, can be used with the maxrate
attribute (e.g. maxrate 25Mbit)
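For instance (device name and rate below are just an example):

    # example only: root fq qdisc, each flow capped at 25 Mbit/s
    tc qdisc replace dev eth0 root fq maxrate 25Mbit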

About mixing HTB and FQ in some HTB classes, I need to check if this
can work. I suspect some warnings might be generated...

The patch seems to apply; you just have to add this sk_pacing_status
somewhere in the "struct sock". The exact place does not really matter...

Alexandre Ferrieux

Jun 5, 2017, 11:00:39 AM
to Eric Dumazet, BBR Development
Sorry, I've been unclear: I need pacing independently of BBR or even of
TCP CC; I have some UDP flows to shape too ;-)
So yes, I'm basically wondering about ways to use, on the same
interface, HTB for some sockets (remote IP), FQ for others. Your
digging into this is appreciated.

Re the patch: OK, will do that; I was only afraid that the big
difference in the struct sock fields could be hiding something more
fundamental (a semantic change between 4.9 and 4.13).
But mixing HTB and FQ still interests me. In some cases patching the
kernel is not an option.

Alexandre Ferrieux

Jun 6, 2017, 4:13:21 AM
to BBR Development, edum...@google.com

Confirming the semantic change: even if I manually just insert sk_pacing_status, the rest of the code still needs many fields that were not there yet in 4.9.
But don't sweat it, the qdisc mix approach, if it can be made to work, will do it for me (and serve others as a smooth migration path to full-FQ).

Alexandre Ferrieux

Jun 7, 2017, 2:52:00 AM
to BBR Development, edum...@google.com

On Tuesday, June 6, 2017 at 10:13:21 AM UTC+2, Alexandre Ferrieux wrote:

> Confirming the semantic change: even if I manually just insert sk_pacing_status, the rest of the code still needs many fields that were not there yet in 4.9.
> But don't sweat it, the qdisc mix approach, if it can be made to work, will do it for me (and serve others as a smooth migration path to full-FQ).

I'd really appreciate assistance to do exactly this: mixing HTB and FQ on one interface. (Tc is a bitch and/or I'm not brain-compatible with its syntax and documentation).

Two things I tried:

 (1) FQ as root. Since it's classless, there is no way to insert HTB down the (nonexistent) tree, right?

 (2) HTB as root. I get the crystal-clear error message "RTNETLINK answers: No such file or directory":

    root@foo:~# tc qdisc show dev eth1
    qdisc htb 1: root refcnt 2 r2q 10 default 0 direct_packets_stat 3747584522 direct_qlen 1000

    root@foo:~# lsmod | grep fq
    sch_fq                 20480  1 

    root@foo:~# tc qdisc add dev eth1 parent 1: fq
    RTNETLINK answers: No such file or directory

Alexandre Ferrieux

Jun 7, 2017, 3:11:52 AM
to BBR Development, edum...@google.com


On Wednesday, June 7, 2017 at 8:52:00 AM UTC+2, Alexandre Ferrieux wrote:

> On Tuesday, June 6, 2017 at 10:13:21 AM UTC+2, Alexandre Ferrieux wrote:
>
>> Confirming the semantic change: even if I manually just insert sk_pacing_status, the rest of the code still needs many fields that were not there yet in 4.9.
>> But don't sweat it, the qdisc mix approach, if it can be made to work, will do it for me (and serve others as a smooth migration path to full-FQ).
>
> I'd really appreciate assistance to do exactly this: mixing HTB and FQ on one interface. (Tc is a bitch and/or I'm not brain-compatible with its syntax and documentation).
 
OK, finally got around to nesting classes and qdiscs properly (did I say that tc is a bitch?)

    root@foo:~# tc qdisc show dev eth1
    qdisc htb 1: root refcnt 2 r2q 10 default 0 direct_packets_stat 3747584522 direct_qlen 1000
    root@foo:~# tc class add dev eth1 parent 1: classid 1:42 htb rate 1gbit
    root@foo:~# tc qdisc add dev eth1 parent 1:42 fq

Now the semantics: will this do what I need, which is to allow BBR (on that interface) to pace with FQ, while (on the same interface) I add and delete HTB classes and filters to handle specific UDP flows?



 

Alexandre Ferrieux

Jun 18, 2017, 3:33:02 PM
to BBR Development, edum...@google.com
Hi,

I'm having a hard time mixing HTB and FQ on one interface.
After some effort, I've come up with the following:

- Initial state: dev eth1 has root qdisc HTB and default class 1:0
     root@foo:~# tc qdisc show dev eth1
     qdisc htb 1: root refcnt 2 r2q 10 default 0 direct_packets_stat 3747584522 direct_qlen 1000

- This class is configured with a rate of 1Gbps (line rate on that machine). Reason: we want FQ to take the helm and restrict the rate, not some fixed quota.
    root@foo:~# tc class add dev eth1 parent 1: classid 1:0 htb rate 1gbit 

- We then attach qdisc FQ to that class:
    root@foo:~# tc qdisc add dev eth1 parent 1:0 fq

However, detailed trace analysis shows that BBR, though pretty effective at staying near the optimal point (BIF = 1 BDP), doesn't really pace the packets. Instead, the stabilized pattern shows one-BDP bursts one RTT apart, where the slope of each burst stays near the line rate. I would have assumed that BBR's setting of sk->sk_pacing_rate would pace those bursts closer and closer to the BtlBw over time, eventually smoothing them out completely. Not so.

Any help understanding why FQ does not seem to apply in that case is welcome.

-Alex

Eric Dumazet

Jun 18, 2017, 3:53:52 PM
to Alexandre Ferrieux, BBR Development
Please provide:
tc -s -d class show dev eth1

Alexandre Ferrieux

Jun 18, 2017, 4:04:51 PM
to BBR Development, alexandre...@gmail.com
# tc -s -d class show dev eth1
class htb 1: root prio 0 quantum 200000 rate 1Gbit ceil 1Gbit linklayer ethernet burst 1375b/1 mpu 0b cburst 1375b/1 mpu 0b level 0 
 Sent 1891571656564 bytes 1538595839 pkt (dropped 0, overlimits 0 requeues 0) 
 rate 0bit 0pps backlog 0b 0p requeues 0 
 lended: 882453292 borrowed: 0 giants: 0
 tokens: -158 ctokens: -158

Alexandre Ferrieux

Jun 18, 2017, 5:05:32 PM
to BBR Development, alexandre...@gmail.com
Note that "tc class show" doesn't really help introspecting the nested qdiscs, like FQ which is present but not visible here !

Eric Dumazet

Jun 19, 2017, 3:59:14 AM
to Alexandre Ferrieux, BBR Development
tc -d -s qdisc show dev eth1
I suspect no packets go to your fq qdisc 



Alexandre Ferrieux

Jun 19, 2017, 5:39:17 AM
to BBR Development, alexandre...@gmail.com
# tc -d -s qdisc show dev eth1
qdisc htb 1: root refcnt 2 r2q 10 default 0 direct_packets_stat 1872726275 ver 3.17 direct_qlen 1000
 Sent 13846345660509 bytes 2423184683 pkt (dropped 1263772, overlimits 1887815842 requeues 2007)
 backlog 0b 0p requeues 2007

No trace of FQ indeed.

Alexandre Ferrieux

Jun 20, 2017, 2:49:26 AM
to BBR Development, edum...@google.com
Hi,

As it turns out, the problem was my unfortunate use of ID 1:0 for a class.
The ID naming space being common for qdiscs and classes, there's a collision... which tc FAILS to report (as in "everything is okay"). Did I say "tc is a bitch" ?

So the final, working incantation is

    tc qdisc add dev eth1 root handle 1: htb default 1
    tc class add dev eth1 parent 1: classid 1:1 htb rate 1gbit
    tc qdisc add dev eth1 parent 1:1 fq


With this we can mix HTB and FQ, with FQ being the default (no filters are needed for its enclosing class, thanks to the "default 1").
We can see it works with the packet counters:

# tc -s qdisc show dev eth1
qdisc htb 1: root refcnt 2 r2q 10 default 1 direct_packets_stat 28436 direct_qlen 1000
Sent 457225837 bytes 372324 pkt (dropped 42, overlimits 661879 requeues 0)
backlog 1454b 1p requeues 0
qdisc fq 8039: parent 1:1 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 3028 initial_quantum 15140 refill_delay 40.0ms
Sent 419738735 bytes 309712 pkt (dropped 0, overlimits 0 requeues 0)
backlog 1454b 1p requeues 0
47 flows (46 inactive, 1 throttled), next packet delay 1212889 ns
0 gc, 0 highprio, 136626 throttled, 16581 ns latency
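
The per-client UDP shaping then just needs an extra HTB class plus a u32 filter next to the FQ default; a sketch with placeholder address and rate:

    # placeholder example: steer UDP traffic to client 192.0.2.10 into its own 10 Mbit HTB class,
    # while everything unmatched keeps going through the default FQ class 1:1
    tc class add dev eth1 parent 1: classid 1:10 htb rate 10mbit
    tc filter add dev eth1 parent 1: protocol ip u32 \
        match ip protocol 17 0xff match ip dst 192.0.2.10/32 flowid 1:10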

simon....@meraki.net

Nov 17, 2017, 5:56:29 PM
to BBR Development
With this, will it be easier to port BBR to an older 3.x series kernel? Has anyone attempted that?

Simon



Jonathan Morton

Nov 20, 2017, 1:05:27 AM
to Simon Barber, BBR Development

Given that it now relies on a new kernel feature introduced even more recently than sch_fq...

I doubt it.

- Jonathan Morton
