
nftables: Clamping MSS size to lower MTU (on PPPoE connection) does not work


Ralph Aichinger

Jan 18, 2024, 6:50:07 AM
Hello everybody, here is a question related to what I asked a few days ago:

Since touching my /etc/nftables.conf rules a few days ago to enable
IPv6, I've got IPv6 working completely (thanks again for suggesting
that I log packets), but I seemingly broke MSS clamping for IPv4 in
doing so (or maybe this is an unrelated breakage? Unlikely).

Symptoms: There are two websites (https://ebanking.bawag.at/ and the
profile subpage of the online paper derstandard.at, which is not
accessible without logging in) that just hang indefinitely on clients
with the interface MTU set to the default 1500. If I lower the MTU on
the client interface to e.g. 1400, these pages load normally. Both
sites seem to be IPv4 only (no AAAA record), though I could be
overlooking something; the network dumps are very noisy, with lots of
tracking cookies loaded etc. The derstandard.at one seems to use QUIC.

This happens on all clients (e.g. also on Android phones on my WiFi
behind this PPPoE gateway), unless I get the client to reduce the MTU.

So it seems clamping the MSS on the NAT/PPPoE machine running Debian no
longer works. For this I use (or used) the following rules:

iifname "ppp0" tcp flags syn tcp option maxseg size set rt mtu;
oifname "ppp0" tcp flags syn tcp option maxseg size set rt mtu;

Setting a specific value as a constant instead of "rt mtu" does not
help either.
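(For reference, the constant variant looked roughly like the line below;
the exact value I tried may have differed. 1420 is the 1460-byte PPPoE
MTU minus 40 bytes of IPv4 and TCP headers.)

# sketch only: clamp to a fixed MSS instead of the route MTU
oifname "ppp0" tcp flags syn tcp option maxseg size set 1420;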

ppp0 is my PPPoE interface:

14: ppp0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1460 qdisc fq_codel state UNKNOWN group default qlen 3
    link/ppp
    inet 94.136.7.154 peer 94.136.0.40/32 scope global ppp0
       valid_lft forever preferred_lft forever
    inet6 2a02:ab8:201:5b0::1/64 scope global dynamic mngtmpaddr
       valid_lft forever preferred_lft forever
    inet6 fe80::1 peer fe80::e25f:b9ff:fe1e:a100/128 scope link
       valid_lft forever preferred_lft forever


Now I read the nftables wiki, which is where I got my maxseg rule from,
and under the heading "Interactions with conntrack" it says:

"Keep in mind the interactions with conntrack, flows with mangled
traffic must be untracked. You can do this in a single rule:
nft add rule ip6 raw prerouting ip6 daddr fd00::1 ip6 daddr set fd00::2 notrack"

https://wiki.nftables.org/wiki-nftables/index.php/Mangling_packet_headers

and I do not understand what is meant here. Do I need a rule like the
one mentioned in the nftables wiki, but for IPv4 instead of IPv6? Will
"untracking" break the stateful firewall and be a security problem?
Sadly there is not a lot of documentation and there are few
configuration examples to google for this with respect to nftables (as
opposed to e.g. the older iptables).

Is there a better explanation of what is meant by "flows with mangled
traffic must be untracked"? Is this relevant to my situation at all?
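(If it matters, I guess the literal IPv4 counterpart of that wiki
example would be something like the line below; I have not tried it,
the addresses are just placeholders from the documentation range, and
it assumes a raw/prerouting table and chain already exist.)

# hypothetical, untested IPv4 analogue of the wiki's notrack example
nft add rule ip raw prerouting ip daddr 192.0.2.1 ip daddr set 192.0.2.2 notrack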

Any help on how to debug this would be appreciated. There are lots of
tutorials on how to find the MTU of a connection by using "ping -M do -
s 1500" or similar, but very little on diagnosing more complex MTU
problems, e.g. with web pages.
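(In case it helps: with "ping -M do" the -s value is only the ICMP
payload, so 28 bytes of IPv4 and ICMP headers come on top; the host
name below is just a placeholder.)

# probe the path MTU: -M do sets the don't-fragment bit, -s is the payload size
# 1472 payload + 28 header bytes = a 1500-byte packet; shrink -s until the ping succeeds
ping -M do -s 1472 example.org
# with the 1460-byte PPPoE MTU above, 1432 should be the largest payload that fits
ping -M do -s 1432 example.org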

Also: Do I need the MSS clamp rule for IPv6, or is it unnecessary given
the path MTU discovery built into the protocol with IPv6? For now I
have included these lines there too; it probably makes no difference.

I've included the full nftables rules below. The interfaces en2 and en3
are IPv6 DMZs, seemingly unrelated to this problem; my problematic
connections all come from the internal network behind en0.

Thanks in advance,
Ralph

#!/usr/sbin/nft -f

flush ruleset

table ip natfilter {
    chain prerouting {
        type nat hook prerouting priority -100;
        policy accept;
    }
    chain postrouting {
        type nat hook postrouting priority 100;
        policy accept;
        oifname "ppp0" counter snat to 94.136.7.154;
    }

    chain input {
        type filter hook input priority 0;
        policy drop;
        ct state invalid counter drop;
        ct state related,established counter accept;
        iifname "lo" counter accept;
        ip protocol icmp counter accept;
        tcp dport 22 counter accept;
        tcp dport 25 counter accept;
        tcp dport 53 counter accept;
        udp dport 53 counter accept;
        tcp dport 80 counter accept;
        tcp dport 143 counter accept;
        tcp dport 443 counter accept;
    }
    chain forward {
        type filter hook forward priority 0;
        policy drop;
        ct state related,established counter accept;
        iifname "en0" counter accept;
        iifname "ppp0" tcp flags syn tcp option maxseg size set rt mtu;
        oifname "ppp0" tcp flags syn tcp option maxseg size set rt mtu;
    }
}

table ip6 filter {
    chain input {
        type filter hook input priority 0; policy drop;
        ct state invalid counter drop
        ct state {established, related} counter accept

        iif lo accept
        iif != lo ip6 daddr ::1/128 counter drop

        meta l4proto ipv6-icmp counter accept

        tcp dport 22 counter accept
        tcp dport 25 counter accept
        tcp dport 53 counter accept
        udp dport 53 counter accept
        tcp dport 80 counter accept
        tcp dport 143 counter accept
        tcp dport 443 counter accept
        udp dport 546 counter accept
        udp dport 547 counter accept
    }

    chain forward {
        type filter hook forward priority 0; policy drop;

        iif en0 accept

        iif en2 oif ppp0 accept
        iif en3 oif ppp0 accept
        iif ppp0 oif en2 accept
        iif ppp0 oif en3 accept
        iif en2 oif en3 accept
        iif en3 oif en2 accept

        iif ppp0 oif en0 ct state established,related accept
        iif en2 oif en0 ct state established,related accept
        iif en3 oif en0 ct state established,related accept

        iifname "ppp0" tcp flags syn tcp option maxseg size set rt mtu;
        oifname "ppp0" tcp flags syn tcp option maxseg size set rt mtu;

        meta l4proto ipv6-icmp accept
    }
}

Tixy

Jan 18, 2024, 8:00:07 AM
On Thu, 2024-01-18 at 12:31 +0100, Ralph Aichinger wrote:
[...]
> So it seems clamping the mss on the NAT/PPPoE-Machine running Debian no
> longer works. For this I use/used the following rules:
>
> iifname "ppp0" tcp flags syn tcp option maxseg size set rt mtu;
> oifname "ppp0" tcp flags syn tcp option maxseg size set rt mtu;
>
> setting a specific mtu as a constant instead of "rt mtu" does not help
> either.

I have the same options in the forward chain, except that I haven't
qualified them with an interface name. It didn't occur to me that I
would need to do that, as there are only two networks: my LAN and 'the
internet'.

In case it helps, my complete forward chain is below. From the comments
with links to Stack Exchange it's obvious I hit the MTU size problem
and had to fix it...

chain forward {
    type filter hook forward priority 0; policy drop;

    # Count packets
    iifname $DEV_PRIVATE counter name cnt_forward_out
    iifname $DEV_WORLD counter name cnt_forward_in

    # Allow traffic from established and related packets, drop invalid
    ct state vmap { established : accept, related : accept, invalid : drop }

    # Fix some sites not working correctly by hacking MTU size
    # See https://unix.stackexchange.com/questions/658952/router-with-nftables-doesnt-work-well
    # Also https://unix.stackexchange.com/questions/672742/why-mss-clamping-in-iptables-nft-seems-to-take-no-effect-in-nftables
    tcp flags syn tcp option maxseg size set rt mtu

    # connections from the internal net to the internet or to other
    # internal nets are allowed
    iifname $DEV_PRIVATE accept

    # the rest is dropped by the above policy
}

--
Tixy

Ralph Aichinger

Jan 18, 2024, 8:40:05 AM
On Thu, 2024-01-18 at 12:51 +0000, Tixy wrote:
>
> I have the same options in the forward chain except that I haven't
> qualified them with an interface name. Didn't occur to me that I
> would
> need to do that as there are only two networks my LAN and 'the
> internet'.

You probably don't need to, I just copied the example from the nftables
wiki. For my setup it might in theory make a difference, because it
could interfere with the use of jumbo frames on my LAN, but as the
machine in question is a lowly Raspberry Pi 4, that is a rather
theoretical concern.

Thanks for your reply, and for confirming that the maxseg line looks
sane in principle. Looking at all the configuration again, I noticed
something else: in testing I had apparently set the MTU of the internal
LAN interface en0 lower, to 1400. When I set that back to the Ethernet
default of 1500, my setup suddenly started working, with or without the
interface qualification in the maxseg line(s).
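(For the record, checking and resetting the LAN-side MTU is just the
following, assuming the interface name en0 as above.)

# show the current MTU of the internal interface
ip link show dev en0
# set it back to the Ethernet default
ip link set dev en0 mtu 1500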

It never occurred to me that I had broken the MTU on the LAN side. Oh well.

I'll read the Stack Exchange links.

Ralph

Tixy

Jan 18, 2024, 9:40:06 AM
On Thu, 2024-01-18 at 14:16 +0100, Ralph Aichinger wrote:
> On Thu, 2024-01-18 at 12:51 +0000, Tixy wrote:
> >
> > I have the same options in the forward chain except that I haven't
> > qualified them with an interface name. Didn't occur to me that I
> > would
> > need to do that as there are only two networks my LAN and 'the
> > internet'.
>
> You probably don't need to, I just copied the example from the nftables
> wiki. For my setup it might in theory make a difference because maybe
> it could interfere with the use of jumbo frames on my lan,

I'm not a network expert, but surely machines on your LAN are sending
packets directly to each other, not using this machine as a gateway?
Isn't that what the subnet mask is about? It identifies the IP
addresses that are directly reachable; packets for any other address
are sent to the 'gateway'.
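Something like "ip route get", run on one of the LAN machines, should
show the difference, I think (the addresses below are only
placeholders):

# a destination on the local subnet should be reported as reached directly ("dev ...")
ip route get 192.168.1.50
# an internet destination should be reported as going "via <gateway address>"
ip route get 8.8.8.8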

> but as the
> machine in question is a lowly Raspberry Pi 4, it is a rather
> theoretical aspect.

Not as lowly as my SheevaPlug ;-) Though to be fair, its SoC's built-in
Ethernet and SATA controllers do make it good for the use cases it was
designed for, e.g. a NAS or router.

--
Tixy