[syzbot] [net?] INFO: rcu detected stall in unix_release

8 views
Skip to first unread message

syzbot

unread,
Aug 13, 2023, 4:46:01 PM8/13/23
to b...@vger.kernel.org, bra...@kernel.org, da...@davemloft.net, edum...@google.com, ji...@nvidia.com, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: d0378ae6d16c Merge branch 'enetc-probe-fix'
git tree: net
console output: https://syzkaller.appspot.com/x/log.txt?x=1052ea2ba80000
kernel config: https://syzkaller.appspot.com/x/.config?x=fa5bd4cd5ab6259d
dashboard link: https://syzkaller.appspot.com/bug?extid=a3618a167af2021433cd
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1152c6eda80000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=13b1eddda80000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/c893f52cd6ab/disk-d0378ae6.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/dfb7a8b86a99/vmlinux-d0378ae6.xz
kernel image: https://storage.googleapis.com/syzbot-assets/cb9134e0a22c/bzImage-d0378ae6.xz

The issue was bisected to:

commit c2368b19807affd7621f7c4638cd2e17fec13021
Author: Jiri Pirko <ji...@nvidia.com>
Date: Fri Jul 29 07:10:35 2022 +0000

net: devlink: introduce "unregistering" mark and use it during devlinks iteration

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=134f1179a80000
final oops: https://syzkaller.appspot.com/x/report.txt?x=10cf1179a80000
console output: https://syzkaller.appspot.com/x/log.txt?x=174f1179a80000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+a3618a...@syzkaller.appspotmail.com
Fixes: c2368b19807a ("net: devlink: introduce "unregistering" mark and use it during devlinks iteration")

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 0-....: (10499 ticks this GP) idle=9774/1/0x4000000000000000 softirq=8757/8758 fqs=5219
rcu: hardirqs softirqs csw/system
rcu: number: 1 0 0
rcu: cputime: 26308 26181 17 ==> 52490(ms)
rcu: (t=10500 jiffies g=8417 q=457 ncpus=2)
CPU: 0 PID: 5047 Comm: syz-executor224 Not tainted 6.5.0-rc4-syzkaller-00212-gd0378ae6d16c #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
RIP: 0010:taprio_dequeue_tc_priority+0x263/0x4b0 net/sched/sch_taprio.c:798
Code: 8b 74 24 10 89 ef 44 89 f6 e8 29 b8 2c f9 44 39 f5 0f 84 40 ff ff ff e8 2b bd 2c f9 49 83 ff 0f 0f 87 e1 01 00 00 48 8b 04 24 <0f> b6 00 38 44 24 36 7c 08 84 c0 0f 85 bf 01 00 00 8b 33 8b 4c 24
RSP: 0018:ffffc90000007d60 EFLAGS: 00000293
RAX: ffffed10047a4a72 RBX: ffff888023d25394 RCX: 0000000000000100
RDX: ffff888028efbb80 RSI: ffffffff88594af5 RDI: 0000000000000004
RBP: 0000000000000008 R08: 0000000000000004 R09: 0000000000000008
R10: 0000000000000000 R11: ffffc90000007ff8 R12: 0000000000000010
R13: ffff88802d19ab60 R14: 0000000000000000 R15: 0000000000000001
FS: 0000555555857380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000600 CR3: 000000002cdd1000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
taprio_dequeue+0x12e/0x5f0 net/sched/sch_taprio.c:868
dequeue_skb net/sched/sch_generic.c:292 [inline]
qdisc_restart net/sched/sch_generic.c:397 [inline]
__qdisc_run+0x1c4/0x19d0 net/sched/sch_generic.c:415
qdisc_run include/net/pkt_sched.h:125 [inline]
qdisc_run include/net/pkt_sched.h:122 [inline]
net_tx_action+0x71e/0xc80 net/core/dev.c:5049
__do_softirq+0x218/0x965 kernel/softirq.c:553
invoke_softirq kernel/softirq.c:427 [inline]
__irq_exit_rcu kernel/softirq.c:632 [inline]
irq_exit_rcu+0xb7/0x120 kernel/softirq.c:644
sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1109
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645
RIP: 0010:unwind_next_frame+0x5ba/0x2020 arch/x86/kernel/unwind_orc.c:517
Code: 31 02 00 00 41 80 fe 04 0f 84 08 0c 00 00 41 80 fe 05 0f 85 d7 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 8b 14 24 48 c1 ea 03 <80> 3c 02 00 0f 85 42 19 00 00 48 89 c8 4d 8b 7d 38 48 ba 00 00 00
RSP: 0018:ffffc90003b9f748 EFLAGS: 00000a02
RAX: dffffc0000000000 RBX: 0000000000000001 RCX: ffffffff8f3ed5c8
RDX: 1ffff92000773efe RSI: 0000000000000001 RDI: ffffffff8ec31910
RBP: ffffc90003b9f800 R08: ffffffff8f3ed646 R09: ffffffff8f3ed5cc
R10: ffffc90003b9f7b8 R11: 000000000000d9e9 R12: ffffc90003b9f808
R13: ffffc90003b9f7b8 R14: 0000000000000005 R15: 0000000000000000
arch_stack_walk+0x8b/0xf0 arch/x86/kernel/stacktrace.c:25
stack_trace_save+0x96/0xd0 kernel/stacktrace.c:122
kasan_save_stack+0x33/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2b/0x40 mm/kasan/generic.c:522
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free+0x15e/0x1b0 mm/kasan/common.c:200
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1792 [inline]
slab_free_freelist_hook+0x10b/0x1e0 mm/slub.c:1818
slab_free mm/slub.c:3801 [inline]
kmem_cache_free+0xf0/0x490 mm/slub.c:3823
sk_prot_free net/core/sock.c:2122 [inline]
__sk_destruct+0x49e/0x770 net/core/sock.c:2216
sk_destruct+0xc2/0xf0 net/core/sock.c:2231
__sk_free+0xc4/0x3a0 net/core/sock.c:2242
sk_free+0x7c/0xa0 net/core/sock.c:2253
sock_put include/net/sock.h:1975 [inline]
unix_release_sock+0xa76/0xf70 net/unix/af_unix.c:668
unix_release+0x88/0xe0 net/unix/af_unix.c:1065
__sock_release+0xcd/0x290 net/socket.c:654
sock_close+0x1c/0x20 net/socket.c:1386
__fput+0x3fd/0xac0 fs/file_table.c:384
task_work_run+0x14d/0x240 kernel/task_work.c:179
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x210/0x240 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:297
do_syscall_64+0x44/0xb0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fc3bb116ef7
Code: 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 c7 c2 b8 ff ff ff f7 d8 64 89 02 b8
RSP: 002b:00007ffd1d8ead88 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000004 RCX: 00007fc3bb116ef7
RDX: 0000000000000000 RSI: 0000000000008933 RDI: 0000000000000004
RBP: 00007ffd1d8ead90 R08: 0000000000000008 R09: 0000000000000004
R10: 000000000000000b R11: 0000000000000246 R12: 00007ffd1d8eafc0
R13: 00003faaaaaaaaaa R14: 00007ffd1d8eaff0 R15: 00007fc3bb164376
</TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection

If the bug is already fixed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to change bug's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the bug is a duplicate of another bug, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Hillf Danton

unread,
Aug 14, 2023, 6:42:27 AM8/14/23
to syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On Sun, 13 Aug 2023 13:45:59 -0700
> HEAD commit: d0378ae6d16c Merge branch 'enetc-probe-fix'
> git tree: net
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=13b1eddda80000

Check if it is a rcu hog for taprio to dequeue.

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git d0378ae6d16c

--- x/net/sched/sch_taprio.c
+++ y/net/sched/sch_taprio.c
@@ -787,6 +787,7 @@ static struct sk_buff *taprio_dequeue_tc
int num_tc = netdev_get_num_tc(dev);
struct sk_buff *skb;
int tc;
+ int loop = 0;

for (tc = num_tc - 1; tc >= 0; tc--) {
int first_txq = q->cur_txq[tc];
@@ -805,6 +806,8 @@ static struct sk_buff *taprio_dequeue_tc

if (skb)
return skb;
+ if (loop++ > 50)
+ return NULL;
} while (q->cur_txq[tc] != first_txq);
}

--

syzbot

unread,
Aug 14, 2023, 7:21:35 AM8/14/23
to hda...@sina.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+a3618a...@syzkaller.appspotmail.com

Tested on:

commit: d0378ae6 Merge branch 'enetc-probe-fix'
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git
console output: https://syzkaller.appspot.com/x/log.txt?x=1321419ba80000
kernel config: https://syzkaller.appspot.com/x/.config?x=fa5bd4cd5ab6259d
dashboard link: https://syzkaller.appspot.com/bug?extid=a3618a167af2021433cd
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=15a23e03a80000

Note: testing is done by a robot and is best-effort only.

Jakub Kicinski

unread,
Aug 14, 2023, 7:03:07 PM8/14/23
to syzbot, Vladimir Oltean, b...@vger.kernel.org, bra...@kernel.org, da...@davemloft.net, edum...@google.com, ji...@nvidia.com, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com
Hi Vladimir, any ideas for this one?
The bisection looks pooped, FWIW, looks like a taprio inf loop.

Vladimir Oltean

unread,
Aug 15, 2023, 7:38:47 AM8/15/23
to Jakub Kicinski, syzbot, b...@vger.kernel.org, bra...@kernel.org, da...@davemloft.net, edum...@google.com, ji...@nvidia.com, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com
On Mon, Aug 14, 2023 at 04:03:03PM -0700, Jakub Kicinski wrote:
> Hi Vladimir, any ideas for this one?
> The bisection looks pooped, FWIW, looks like a taprio inf loop.

I'm looking into it.

Vladimir Oltean

unread,
Aug 16, 2023, 6:58:08 PM8/16/23
to Jakub Kicinski, Jamal Hadi Salim, Cong Wang, Pedro Tammela, Victor Nogueira, syzbot, b...@vger.kernel.org, bra...@kernel.org, da...@davemloft.net, edum...@google.com, ji...@nvidia.com, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, Vinicius Costa Gomes
Hi Jakub,
Here's what I've found out and what help I'll need going forward.

Indeed there is an infinite loop in taprio_dequeue() -> taprio_dequeue_tc_priority(),
leading to an RCU stall.

Short description of taprio_dequeue_tc_priority(): it cycles
q->cur_txq[tc] in the range between [ offset, offset + count ), where:

int offset = dev->tc_to_txq[tc].offset;
int count = dev->tc_to_txq[tc].count;

with the initial q->cur_txq[tc], aka the "first_txq" variable, being set
by the control path: taprio_change(), also called by taprio_init():

if (mqprio) {
(...)
for (i = 0; i < mqprio->num_tc; i++) {
(...)
q->cur_txq[i] = mqprio->offset[i];
}
}

In the buggy case that leads to the RCU stall, the line in taprio_change()
which sets q->cur_txq[i] never gets executed. So first_txq will be 0
(pre-initialized memory), and if that's outside of the [ offset, offset + count )
range that taprio_dequeue_tc_priority() -> taprio_next_tc_txq() expects
to cycle through, the kernel is toast.

The nitty gritty of that is boring. What's not boring is how come the
control path skips the q->cur_txq[i] assignment. It's because "mqprio"
is NULL, and that's because taprio_change() (important: also tail-called
from taprio_init()) has this logic to detect a change in the traffic
class settings of the device, compared to the passed TCA_TAPRIO_ATTR_PRIOMAP
netlink attribute:

/* no changes - no new mqprio settings */
if (!taprio_mqprio_cmp(q, dev, mqprio))
mqprio = NULL;

And what happens is that:
- we go through taprio_init()
- a TCA_TAPRIO_ATTR_PRIOMAP gets passed to us
- taprio_mqprio_cmp() sees that there's no change compared to the
netdev's existing traffic class config
- taprio_change() sets "mqprio" to NULL, ignoring the given
TCA_TAPRIO_ATTR_PRIOMAP
- we skip modifying q->cur_txq[i], as if it was a taprio_change() call
that came straight from Qdisc_ops :: change(), rather than what it
really is: one from Qdisc_ops :: init()

So the next question: why does taprio_mqprio_cmp() see that there's no
change? Because there is no change. When Qdisc_ops :: init() is called,
the netdev really has a non-zero dev->num_tc, prio_tc_map, tc_to_txq and
all that.

But why? A previous taprio, if that existed, will call taprio_destroy()
-> netdev_reset_tc(), so it won't leave state behind that will hinder
the current taprio. Checking for stuff in the netdev state is just so
that taprio_change() can distinguish between a direct Qdisc_ops :: change()
call vs one coming from init().

Finally, here's where the syzbot repro becomes relevant. It crafts the
RTM_NEWQDISC netlink message in such a way, that it makes tc_modify_qdisc()
in sch_api.c call a Qdisc_ops sequence with which taprio wasn't written
in mind.

With "tc qdisc replace && tc qdisc replace", tc_modify_qdisc() is
supposed to call init() the first time and replace() the second time.
What the repro does is make the above sequence call two init() methods
back to back.

To create an iproute2-based reproducer rather than the C one provided by
syzbot, we need this iproute2 change:

diff --git a/tc/tc_qdisc.c b/tc/tc_qdisc.c
index 56086c43b7fa..20d9622b6bf3 100644
--- a/tc/tc_qdisc.c
+++ b/tc/tc_qdisc.c
@@ -448,6 +448,8 @@ int do_qdisc(int argc, char **argv)
return tc_qdisc_modify(RTM_NEWQDISC, NLM_F_EXCL|NLM_F_CREATE, argc-1, argv+1);
if (matches(*argv, "change") == 0)
return tc_qdisc_modify(RTM_NEWQDISC, 0, argc-1, argv+1);
+ if (strcmp(*argv, "replace-exclusive") == 0)
+ return tc_qdisc_modify(RTM_NEWQDISC, NLM_F_CREATE|NLM_F_REPLACE|NLM_F_EXCL, argc-1, argv+1);
if (matches(*argv, "replace") == 0)
return tc_qdisc_modify(RTM_NEWQDISC, NLM_F_CREATE|NLM_F_REPLACE, argc-1, argv+1);
if (matches(*argv, "link") == 0)

which basically implements a crafted alternative of "tc qdisc replace"
which also sets the NLM_F_EXCL flag in n->nlmsg_flags.

Then, the minimal repro script can simply be expressed as:

#!/bin/bash

ip link add veth0 numtxqueues 16 numrxqueues 16 type veth peer name veth1
ip link set veth0 up && ip link set veth1 up

for ((i = 0; i < 2; i++)); do
tc qdisc replace-exclusive dev veth0 root stab overhead 24 taprio \
num_tc 2 map 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \
queues 8@0 4@8 \
clockid REALTIME \
base-time 0 \
cycle-time 61679 \
sched-entry S 0 54336 \
sched-entry S 0x8a27 7343 \
max-sdu 18343 18343 \
flags 0
done

ip link del veth0

Here's how things go sideways if sch_api.c goes through the Qdisc_ops :: init()
code path instead of change() for the second Qdisc.

The first taprio_attach() (i=0) will attach the root taprio Qdisc (aka itself)
to all netdev TX queues, and qdisc_put() the existing pfifo default Qdiscs.

When the taprio_init() method executes for i=1, taprio_destroy() hasn't
been called yet. So neither has netdev_reset_tc() been called, and
that's part of the problem (the one that causes the infinite loop in
dequeue()).

But, taprio_destroy() will finally get called for the initial taprio
created at i=0. The call trace looks like this:

rtnetlink_rcv_msg()
-> tc_modify_qdisc()
-> qdisc_graft()
-> taprio_attach() for i=1
-> qdisc_put() for the old Qdiscs attached to the TX queues, aka the taprio from i=0
-> __qdisc_destroy()
-> taprio_destroy()

What's more interesting is that the late taprio_destroy() for i=0
effectively destroys the netdev state - the netdev_reset_tc() call -
done by taprio_init() -> taprio_change() for i=1, and that can't be
too good, either. Even if there's no immediately observable hang, the
traffic classes are reset even though the Qdisc thinks they aren't.

Taprio isn't the only one affected by this. Mqprio also has the pattern
of calling netdev_set_num_tc() from Qdisc_ops :: init() and destroy().
But with the possibility of destroy(i=0) not being serialized with
init(i=1), that's buggy.

Sorry for the long message. This is where I'm at. For me, this is the
bottom of where things are intuitive. I don't understand what is
considered to be expected behavior from tc_modify_qdisc(), and what is
considered to be sane Qdisc-facing API, and I need help.

I've completely stopped debugging when I saw that the code enters
through this path at i=1, so I really can't tell you more:

/* This magic test requires explanation.
*
* We know, that some child q is already
* attached to this parent and have choice:
* either to change it or to create/graft new one.
*
* 1. We are allowed to create/graft only
* if CREATE and REPLACE flags are set.
*
* 2. If EXCL is set, requestor wanted to say,
* that qdisc tcm_handle is not expected
* to exist, so that we choose create/graft too.
*
* 3. The last case is when no flags are set.
* Alas, it is sort of hole in API, we
* cannot decide what to do unambiguously.
* For now we select create/graft, if
* user gave KIND, which does not match existing.
*/
if ((n->nlmsg_flags & NLM_F_CREATE) &&
(n->nlmsg_flags & NLM_F_REPLACE) &&
((n->nlmsg_flags & NLM_F_EXCL) ||
(tca[TCA_KIND] &&
nla_strcmp(tca[TCA_KIND], q->ops->id)))) {
netdev_err(dev, "magic test\n");
goto create_n_graft;
}

I've added more Qdisc people to the discussion. The problem description
is pretty much self-contained in this email, and going to the original
syzbot report won't bring much else.

There are multiple workarounds that can be done in taprio (and mqprio)
depending on what is considered as being sane API. Though I don't want
to get ahead of myself. Maybe there is a way to fast-forward the
qdisc_destroy() of the previous taprio so it doesn't overlap with the
new one's qdisc_create().

Jakub Kicinski

unread,
Aug 16, 2023, 10:58:56 PM8/16/23
to Vladimir Oltean, Jamal Hadi Salim, Cong Wang, Pedro Tammela, Victor Nogueira, syzbot, b...@vger.kernel.org, bra...@kernel.org, da...@davemloft.net, edum...@google.com, ji...@nvidia.com, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, Vinicius Costa Gomes
On Thu, 17 Aug 2023 01:57:59 +0300 Vladimir Oltean wrote:
> There are multiple workarounds that can be done in taprio (and mqprio)
> depending on what is considered as being sane API. Though I don't want
> to get ahead of myself. Maybe there is a way to fast-forward the
> qdisc_destroy() of the previous taprio so it doesn't overlap with the
> new one's qdisc_create().

Thanks for the details. I'm going to let others comment, but sounds
a bit similar to the recent problem with the ingress qdisc. The qdisc
expects to own the netdev which explodes when its lifetime rules are
fully exercised :(

Jamal Hadi Salim

unread,
Aug 17, 2023, 12:30:39 PM8/17/23
to Vladimir Oltean, Jakub Kicinski, Cong Wang, Pedro Tammela, Victor Nogueira, syzbot, b...@vger.kernel.org, bra...@kernel.org, da...@davemloft.net, edum...@google.com, ji...@nvidia.com, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, Vinicius Costa Gomes
I will take a look tommorow.

cheers,
jamal

Jamal Hadi Salim

unread,
Aug 18, 2023, 11:27:40 AM8/18/23
to Vladimir Oltean, Jakub Kicinski, Cong Wang, Pedro Tammela, Victor Nogueira, syzbot, b...@vger.kernel.org, bra...@kernel.org, da...@davemloft.net, edum...@google.com, ji...@nvidia.com, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, Vinicius Costa Gomes
Can you try the attached patchlet?

cheers,
jamal
patchlet-qdisc

Vladimir Oltean

unread,
Aug 18, 2023, 12:07:18 PM8/18/23
to Jamal Hadi Salim, Jakub Kicinski, Cong Wang, Pedro Tammela, Victor Nogueira, syzbot, b...@vger.kernel.org, bra...@kernel.org, da...@davemloft.net, edum...@google.com, ji...@nvidia.com, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, Vinicius Costa Gomes
Hi Jamal,

On Fri, Aug 18, 2023 at 11:27:27AM -0400, Jamal Hadi Salim wrote:
> Can you try the attached patchlet?

Thanks for the patch. I've tried it, and it eliminates the code path
(and thus the problem) exposed by the syzbot program, by responding to
RTM_NEWQDISC messages having the NLM_F_CREATE|NLM_F_REPLACE|NLM_F_EXCL
flags with "Error: Exclusivity flag on, cannot modify.".

Actually, to be precise, the first such netlink message successfully
creates the qdisc, but then the subsequent ones leave that qdisc alone
(don't change it), by failing with this extack message.

If that's the behavior that you intended, then I guess the answer is
that it works. Thanks a lot.

What would be an appropriate Fixes: tag?

Side note: I believe that we can now also revert commit be3618d96510
("net/sched: taprio: fix slab-out-of-bounds Read in taprio_dequeue_from_txq"),
which was papering over an unknown (at the time) issue - the same as
this one - without really even completely covering it, either. Hence
this other syzbot report.
https://lore.kernel.org/netdev/3b977f76-0289-270e...@huawei.com/T/
https://lore.kernel.org/netdev/20230608062756.3626...@huawei.com/

Jamal Hadi Salim

unread,
Aug 18, 2023, 1:11:01 PM8/18/23
to Vladimir Oltean, Jakub Kicinski, Cong Wang, Pedro Tammela, Victor Nogueira, syzbot, b...@vger.kernel.org, bra...@kernel.org, da...@davemloft.net, edum...@google.com, ji...@nvidia.com, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, Vinicius Costa Gomes, Zhengchao Shao
Hi Vladimir,

On Fri, Aug 18, 2023 at 12:07 PM Vladimir Oltean
<vladimi...@nxp.com> wrote:
>
> Hi Jamal,
>
> On Fri, Aug 18, 2023 at 11:27:27AM -0400, Jamal Hadi Salim wrote:
> > Can you try the attached patchlet?
>
> Thanks for the patch. I've tried it, and it eliminates the code path
> (and thus the problem) exposed by the syzbot program, by responding to
> RTM_NEWQDISC messages having the NLM_F_CREATE|NLM_F_REPLACE|NLM_F_EXCL
> flags with "Error: Exclusivity flag on, cannot modify.".
>

Ok, that is more of the expected behavior.
Noone should ever send that mumbo-jumbo (I doubt there is a "legit"
control app that will do that).

> Actually, to be precise, the first such netlink message successfully
> creates the qdisc, but then the subsequent ones leave that qdisc alone
> (don't change it), by failing with this extack message.
>

Yes, the first one will succeed because the root qdisc hasnt been
grafted yet (and the only interesting bit is NLM_F_CREATE. everything
else is ignored).

> If that's the behavior that you intended, then I guess the answer is
> that it works. Thanks a lot.
>
> What would be an appropriate Fixes: tag?
>

This should have been from early days when we trusted that iproute2
would do the right thing. I will look.
I dont think this is a taprio only potential victim, it's just that
syzbot was able to aggravate taprio sooner (it probably would have got
to some other qdisc later in its adventures).

> Side note: I believe that we can now also revert commit be3618d96510
> ("net/sched: taprio: fix slab-out-of-bounds Read in taprio_dequeue_from_txq"),
> which was papering over an unknown (at the time) issue - the same as
> this one - without really even completely covering it, either.

Unfortunately the commit log is not helpful - i cant tell what
"replace" means and cant seem to find the repro either. If you revert
it and see the problem going away then we are good.
+Cc Zhengchao Shao <shaozh...@huawei.com>
Makes sense.
BTW, thanks for your report - it made it faster to zone on the issue.
The comments above that code also need a bit of fixing to provide clarity.

cheers,
jamal
Reply all
Reply to author
Forward
0 new messages