[syzbot] [net?] WARNING in mpls_gso_segment

12 views
Skip to first unread message

syzbot

unread,
Feb 21, 2024, 7:33:23 AMFeb 21
to da...@davemloft.net, dsa...@kernel.org, edum...@google.com, f...@strlen.de, ho...@kernel.org, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: 4934446297c2 Merge tag 'linux-can-next-for-6.9-20240220' o..
git tree: net-next
console+strace: https://syzkaller.appspot.com/x/log.txt?x=13c5770c180000
kernel config: https://syzkaller.appspot.com/x/.config?x=970c7b6c80a096da
dashboard link: https://syzkaller.appspot.com/bug?extid=99d15fcdb0132a1e1a82
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12d1ba2c180000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1536462c180000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/adbf5d8e38d7/disk-49344462.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/0f8e3fb78410/vmlinux-49344462.xz
kernel image: https://storage.googleapis.com/syzbot-assets/682f4814bf23/bzImage-49344462.xz

The issue was bisected to:

commit 219eee9c0d16f1b754a8b85275854ab17df0850a
Author: Florian Westphal <f...@strlen.de>
Date: Fri Feb 16 11:36:57 2024 +0000

net: skbuff: add overflow debug check to pull/push helpers

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=13262752180000
final oops: https://syzkaller.appspot.com/x/report.txt?x=10a62752180000
console output: https://syzkaller.appspot.com/x/log.txt?x=17262752180000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+99d15f...@syzkaller.appspotmail.com
Fixes: 219eee9c0d16 ("net: skbuff: add overflow debug check to pull/push helpers")

------------[ cut here ]------------
WARNING: CPU: 0 PID: 5068 at include/linux/skbuff.h:2723 pskb_may_pull_reason include/linux/skbuff.h:2723 [inline]
WARNING: CPU: 0 PID: 5068 at include/linux/skbuff.h:2723 pskb_may_pull include/linux/skbuff.h:2739 [inline]
WARNING: CPU: 0 PID: 5068 at include/linux/skbuff.h:2723 mpls_gso_segment+0x773/0xaa0 net/mpls/mpls_gso.c:34
Modules linked in:
CPU: 0 PID: 5068 Comm: syz-executor358 Not tainted 6.8.0-rc4-syzkaller-01071-g4934446297c2 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
RIP: 0010:pskb_may_pull_reason include/linux/skbuff.h:2723 [inline]
RIP: 0010:pskb_may_pull include/linux/skbuff.h:2739 [inline]
RIP: 0010:mpls_gso_segment+0x773/0xaa0 net/mpls/mpls_gso.c:34
Code: 48 81 c4 a0 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc e8 0c 5e 4a f6 48 c7 c3 ea ff ff ff eb d9 e8 fe 5d 4a f6 90 <0f> 0b 90 e9 ff f9 ff ff 44 89 ef 44 89 e6 e8 aa 5f 4a f6 45 39 e5
RSP: 0018:ffffc90003aa70c8 EFLAGS: 00010293
RAX: ffffffff8b490e62 RBX: 0000000000000000 RCX: ffff888077c1d940
RDX: 0000000000000000 RSI: 00000000ffffff94 RDI: 0000000000000000
RBP: ffff8880153ced30 R08: ffffffff8b49085c R09: 1ffffffff2593084
R10: dffffc0000000000 R11: ffffffff8b4906f0 R12: ffffffffffffff94
R13: 0000000000000000 R14: dffffc0000000000 R15: ffff8880153cec80
FS: 0000555556d2a380(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020010000 CR3: 000000007a3e6000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
skb_mac_gso_segment+0x383/0x740 net/core/gso.c:53
nsh_gso_segment+0x40a/0xad0 net/nsh/nsh.c:108
skb_mac_gso_segment+0x383/0x740 net/core/gso.c:53
__skb_gso_segment+0x324/0x4c0 net/core/gso.c:124
skb_gso_segment include/net/gso.h:83 [inline]
validate_xmit_skb+0x580/0x1120 net/core/dev.c:3611
validate_xmit_skb_list+0x95/0x130 net/core/dev.c:3661
sch_direct_xmit+0x11a/0x5f0 net/sched/sch_generic.c:327
__dev_xmit_skb net/core/dev.c:3759 [inline]
__dev_queue_xmit+0x1912/0x3b10 net/core/dev.c:4300
packet_snd net/packet/af_packet.c:3081 [inline]
packet_sendmsg+0x46a9/0x6130 net/packet/af_packet.c:3113
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:745
__sys_sendto+0x3a4/0x4f0 net/socket.c:2191
__do_sys_sendto net/socket.c:2203 [inline]
__se_sys_sendto net/socket.c:2199 [inline]
__x64_sys_sendto+0xde/0x100 net/socket.c:2199
do_syscall_64+0xf9/0x240
entry_SYSCALL_64_after_hwframe+0x6f/0x77
RIP: 0033:0x7f4a33ec7169
Code: 48 83 c4 28 c3 e8 d7 19 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffc1051d5a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007f4a33f15070 RCX: 00007f4a33ec7169
RDX: 000000000000ff88 RSI: 0000000020000180 RDI: 0000000000000004
RBP: 00007ffc1051d5c8 R08: 0000000020000140 R09: 0000000000000014
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffc1051d5c4
R13: 0000000000000000 R14: 00007ffc1051d5d0 R15: 00007f4a33f15004
</TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Florian Westphal

unread,
Feb 21, 2024, 8:15:58 AMFeb 21
to syzbot, da...@davemloft.net, dsa...@kernel.org, edum...@google.com, f...@strlen.de, ho...@kernel.org, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com
syzbot <syzbot+99d15f...@syzkaller.appspotmail.com> wrote:
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1536462c180000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/adbf5d8e38d7/disk-49344462.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/0f8e3fb78410/vmlinux-49344462.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/682f4814bf23/bzImage-49344462.xz
>
> The issue was bisected to:
>
> commit 219eee9c0d16f1b754a8b85275854ab17df0850a
> Author: Florian Westphal <f...@strlen.de>
> Date: Fri Feb 16 11:36:57 2024 +0000
>
> net: skbuff: add overflow debug check to pull/push helpers
>
> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=13262752180000
> final oops: https://syzkaller.appspot.com/x/report.txt?x=10a62752180000
> console output: https://syzkaller.appspot.com/x/log.txt?x=17262752180000
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+99d15f...@syzkaller.appspotmail.com
> Fixes: 219eee9c0d16 ("net: skbuff: add overflow debug check to pull/push helpers")
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 5068 at include/linux/skbuff.h:2723 pskb_may_pull_reason include/linux/skbuff.h:2723 [inline]
> WARNING: CPU: 0 PID: 5068 at include/linux/skbuff.h:2723 pskb_may_pull include/linux/skbuff.h:2739 [inline]
> WARNING: CPU: 0 PID: 5068 at include/linux/skbuff.h:2723 mpls_gso_segment+0x773/0xaa0 net/mpls/mpls_gso.c:34

Two possible solutions:

1.)

diff --git a/net/mpls/mpls_gso.c b/net/mpls/mpls_gso.c
index 533d082f0701..43801b78dd64 100644
--- a/net/mpls/mpls_gso.c
+++ b/net/mpls/mpls_gso.c
@@ -25,12 +25,13 @@ static struct sk_buff *mpls_gso_segment(struct sk_buff *skb,
netdev_features_t mpls_features;
u16 mac_len = skb->mac_len;
__be16 mpls_protocol;
- unsigned int mpls_hlen;
+ int mpls_hlen;

skb_reset_network_header(skb);
mpls_hlen = skb_inner_network_header(skb) - skb_network_header(skb);
- if (unlikely(!mpls_hlen || mpls_hlen % MPLS_HLEN))
+ if (unlikely(mpls_hlen <= 0 || mpls_hlen % MPLS_HLEN))
goto out;
+
if (unlikely(!pskb_may_pull(skb, mpls_hlen)))
goto out;

(or a variation thereof).

2) revert the pskb_may_pull_reason change added in 219eee9c0d16f1b754a8 to
make it tolerant to "negative" (huge) may-pull requests again.

With above repro, skb_inner_network_header() yields 0, skb_network_header()
returns 108, so we "pskb_may_pull(skb, -108)))" which now triggers
DEBUG_NET_WARN_ON_ONCE() check.

Before blamed commit, this would make pskb_may_pull hit:

if (unlikely(len > skb->len))
return SKB_DROP_REASON_PKT_TOO_SMALL;

and mpls_gso_segment takes the 'goto out' label.

So question is really if we should fix this in mpls_gso (and possible others
that try to pull negative numbers...) or if we should legalize this, either by
adding explicit if (unlikely(len > INT_MAX)) test to pskb_may_pull_reason or
by adding a comment that negative 'len' numbers are expected to be caught by
the check vs. skb->len.

Opinions?

Lizhi Xu

unread,
Feb 21, 2024, 10:15:44 PMFeb 21
to syzbot+99d15f...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, linux-...@vger.kernel.org
#syz test https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git main

diff --git a/net/mpls/mpls_gso.c b/net/mpls/mpls_gso.c
index 533d082f0701..2ab24b2fd90f 100644
--- a/net/mpls/mpls_gso.c
+++ b/net/mpls/mpls_gso.c
@@ -25,11 +25,11 @@ static struct sk_buff *mpls_gso_segment(struct sk_buff *skb,
netdev_features_t mpls_features;
u16 mac_len = skb->mac_len;
__be16 mpls_protocol;
- unsigned int mpls_hlen;
+ int mpls_hlen;

skb_reset_network_header(skb);
mpls_hlen = skb_inner_network_header(skb) - skb_network_header(skb);
- if (unlikely(!mpls_hlen || mpls_hlen % MPLS_HLEN))
+ if (unlikely(mpls_hlen <= 0 || mpls_hlen % MPLS_HLEN))
goto out;

syzbot

unread,
Feb 21, 2024, 10:41:05 PMFeb 21
to linux-...@vger.kernel.org, lizh...@windriver.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+99d15f...@syzkaller.appspotmail.com

Tested on:

commit: 6d5c3656 PPPoL2TP: Add more code snippets
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git main
console output: https://syzkaller.appspot.com/x/log.txt?x=1234b4f8180000
kernel config: https://syzkaller.appspot.com/x/.config?x=970c7b6c80a096da
dashboard link: https://syzkaller.appspot.com/bug?extid=99d15fcdb0132a1e1a82
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=11727f68180000

Note: testing is done by a robot and is best-effort only.

Lizhi Xu

unread,
Feb 21, 2024, 11:00:57 PMFeb 21
to syzbot+99d15f...@syzkaller.appspotmail.com, da...@davemloft.net, dsa...@kernel.org, edum...@google.com, f...@strlen.de, ho...@kernel.org, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com
When the network header pointer is greater than the inner network header, the
difference between the two can cause mpls_hlen overflow.

Reported-and-tested-by: syzbot+99d15f...@syzkaller.appspotmail.com
Signed-off-by: Lizhi Xu <lizh...@windriver.com>
---
net/mpls/mpls_gso.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/mpls/mpls_gso.c b/net/mpls/mpls_gso.c
index 533d082f0701..2ab24b2fd90f 100644
--- a/net/mpls/mpls_gso.c
+++ b/net/mpls/mpls_gso.c
@@ -25,11 +25,11 @@ static struct sk_buff *mpls_gso_segment(struct sk_buff *skb,
netdev_features_t mpls_features;
u16 mac_len = skb->mac_len;
__be16 mpls_protocol;
- unsigned int mpls_hlen;
+ int mpls_hlen;

skb_reset_network_header(skb);
mpls_hlen = skb_inner_network_header(skb) - skb_network_header(skb);
- if (unlikely(!mpls_hlen || mpls_hlen % MPLS_HLEN))
+ if (unlikely(mpls_hlen <= 0 || mpls_hlen % MPLS_HLEN))
goto out;
if (unlikely(!pskb_may_pull(skb, mpls_hlen)))
goto out;
--
2.43.0

Eric Dumazet

unread,
Feb 22, 2024, 3:11:31 AMFeb 22
to Lizhi Xu, syzbot+99d15f...@syzkaller.appspotmail.com, da...@davemloft.net, dsa...@kernel.org, f...@strlen.de, ho...@kernel.org, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com
I think Florian posted this patch, right ?

We must add a Fixes: tag

Also we should ask ourselves :
Why are we even looking at skb_inner_network_header(skb) if this was not set ?

Lets not hide a real bug, we need to understand the root cause.

Eric Dumazet

unread,
Feb 22, 2024, 3:14:29 AMFeb 22
to Florian Westphal, syzbot, da...@davemloft.net, dsa...@kernel.org, ho...@kernel.org, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com
I guess we should try this, or perhaps understand why
skb->encapsulation might not be set,
or why skb_inner_network_header(skb) is not set at this point.

>
> (or a variation thereof).
>
> 2) revert the pskb_may_pull_reason change added in 219eee9c0d16f1b754a8 to
> make it tolerant to "negative" (huge) may-pull requests again.
>
> With above repro, skb_inner_network_header() yields 0, skb_network_header()
> returns 108, so we "pskb_may_pull(skb, -108)))" which now triggers
> DEBUG_NET_WARN_ON_ONCE() check.
>
> Before blamed commit, this would make pskb_may_pull hit:
>
> if (unlikely(len > skb->len))
> return SKB_DROP_REASON_PKT_TOO_SMALL;
>
> and mpls_gso_segment takes the 'goto out' label.
>
> So question is really if we should fix this in mpls_gso (and possible others
> that try to pull negative numbers...) or if we should legalize this, either by
> adding explicit if (unlikely(len > INT_MAX)) test to pskb_may_pull_reason or
> by adding a comment that negative 'len' numbers are expected to be caught by
> the check vs. skb->len.
>
> Opinions?

Lets live without 2) for a while, try to fix callers ?

Florian Westphal

unread,
Feb 22, 2024, 7:23:42 AMFeb 22
to Eric Dumazet, Florian Westphal, syzbot, da...@davemloft.net, dsa...@kernel.org, ho...@kernel.org, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com
Eric Dumazet <edum...@google.com> wrote:
> I guess we should try this, or perhaps understand why
> skb->encapsulation might not be set,
> or why skb_inner_network_header(skb) is not set at this point.

syz repro injects data via packet socket, skb passed down stack
has ->protocol set to NSH (0x894f), gso type is SKB_GSO_UDP | SKB_GSO_DODGY.

This gets passed down to skb_mac_gso_segment(), which sees NSH as ptype
callback.

nsh_gso_segment() retrieves next type:

proto = tun_p_to_eth_p(nsh_hdr(skb)->np);

... which is mpls (TUN_P_MPLS_UC), it then updates
skb->protocol. This calls back into skb_mac_gso_segment() which
sees MPLS as ptype callback, we then end up in mpls_gso_segment()
without any inner headers set (skb->encapsulation is not set,
inner header offsets are 0) and mpls_gso_segment() attempts to pull
negative header size off the skb.

I don't see anything that could be done earlier in the stack about this.

As far as I understand NSH assumes its only called from openvswitch
and MPLS GSO code only via Openvswitch or mpls_iptunnel, but its
reachable by other means.

But skb_mac_gso_segment() doesn't have any info on the originator
to know if it can call into nsh or mpls 'as intended'.

So I'd guess best solution is to explicitly check for negative
header size, plus a comment that explains how this could happen.

Eric Dumazet

unread,
Feb 22, 2024, 7:30:01 AMFeb 22
to Florian Westphal, syzbot, da...@davemloft.net, dsa...@kernel.org, ho...@kernel.org, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com
SGTM, please send the patch with all this analysis captured in the changelog.

I was thinking about adding a debug check in skb_inner_network_header(skb)
if inner_network_header is zero (that would mean it is not 'set' yet),
but this would trigger even after your patch.

Florian Westphal

unread,
Feb 22, 2024, 7:57:47 AMFeb 22
to Eric Dumazet, Florian Westphal, syzbot, da...@davemloft.net, dsa...@kernel.org, ho...@kernel.org, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com
What about adding:

static inline bool skb_inner_network_header_was_set(const struct sk_buff *skb)
{
return skb->inner_network_header > 0;
}

... and using that instead of checking for negative header length
post-subtraction?

Eric Dumazet

unread,
Feb 22, 2024, 8:27:36 AMFeb 22
to Florian Westphal, syzbot, da...@davemloft.net, dsa...@kernel.org, ho...@kernel.org, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com
SGTM

Lizhi Xu

unread,
Feb 22, 2024, 10:30:24 PMFeb 22
to edum...@google.com, da...@davemloft.net, dsa...@kernel.org, f...@strlen.de, ho...@kernel.org, ku...@kernel.org, linux-...@vger.kernel.org, lizh...@windriver.com, net...@vger.kernel.org, pab...@redhat.com, syzbot+99d15f...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
After you mentioned it, I discovered it.
I forgot to check the email details before sending the patch.
>
> We must add a Fixes: tag
>
> Also we should ask ourselves :
> Why are we even looking at skb_inner_network_header(skb) if this was not set ?

__sys_sendto()->
__sock_sendmsg()->
netlink_sendmsg()->
netlink_broadcast_filtered()->
netlink_trim()->
1323 pskb_expand_head(skb, 0, -delta,
1 (allocation & ~__GFP_DIRECT_RECLAIM) |
2 __GFP_NOWARN | __GFP_NORETRY);
pskb_expand_head()->
skb_headers_offset_update()->
1977 skb->inner_network_header += off;

The root cause is:
Initialize inner_network_header to 0 in network_trim(), and without any other
assignment operations.
Reply all
Reply to author
Forward
0 new messages