[syzbot] [net?] INFO: rcu detected stall in packet_release

syzbot

May 27, 2024, 8:16:34 AM

to da...@davemloft.net, edum...@google.com, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, willemdebr...@gmail.com

Hello,

syzbot found the following issue on:

HEAD commit: 3ab5720881a9 net: phy: at803x: replace msleep(1) with usle..
git tree: net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=16ac8a6ee80000
kernel config: https://syzkaller.appspot.com/x/.config?x=8f565e10f0b1e1fc
dashboard link: https://syzkaller.appspot.com/bug?extid=a7d2b1d5d1af83035567
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1086b376e80000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=16760ccee80000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/ab97503560c5/disk-3ab57208.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/ca56b0dccaf8/vmlinux-3ab57208.xz
kernel image: https://storage.googleapis.com/syzbot-assets/03161a7d4885/bzImage-3ab57208.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+a7d2b1...@syzkaller.appspotmail.com

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: 0-...0: (1 GPs behind) idle=38bc/1/0x4000000000000000 softirq=6131/6133 fqs=5249
rcu: hardirqs softirqs csw/system
rcu: number: 0 0 0
rcu: cputime: 0 0 0 ==> 52510(ms)
rcu: (detected by 1, t=10502 jiffies, g=7301, q=303 ncpus=2)
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 5101 Comm: syz-executor577 Not tainted 6.7.0-rc5-syzkaller-01533-g3ab5720881a9 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
RIP: 0010:arch_atomic64_read arch/x86/include/asm/atomic64_64.h:15 [inline]
RIP: 0010:raw_atomic64_read include/linux/atomic/atomic-arch-fallback.h:2569 [inline]
RIP: 0010:atomic64_read include/linux/atomic/atomic-instrumented.h:1597 [inline]
RIP: 0010:taprio_set_budgets+0x144/0x310 net/sched/sch_taprio.c:681
Code: e8 03 48 89 44 24 28 e9 c1 00 00 00 e8 25 f6 e4 f8 48 8b 7c 24 20 be 08 00 00 00 e8 06 f7 3b f9 48 8b 44 24 28 42 80 3c 38 00 <0f> 85 6a 01 00 00 4c 63 64 24 08 48 8b 44 24 18 49 83 fc 0f 4c 8b
RSP: 0018:ffffc90000007d20 EFLAGS: 00000046
RAX: 1ffff110035e3e5c RBX: ffff8880152ecc84 RCX: ffffffff88a2a78a
RDX: ffffed10035e3e5d RSI: 0000000000000008 RDI: ffff88801af1f2e0
RBP: 0000000000000001 R08: 0000000000000000 R09: ffffed10035e3e5c
R10: ffff88801af1f2e7 R11: 0000000000000002 R12: 0000000004000000
R13: ffff8880152ecc08 R14: 0000000000000008 R15: dffffc0000000000
FS: 0000555556770380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000600 CR3: 0000000072c1b000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<NMI>
</NMI>
<IRQ>
advance_sched+0x5e1/0xc60 net/sched/sch_taprio.c:988
__run_hrtimer kernel/time/hrtimer.c:1688 [inline]
__hrtimer_run_queues+0x203/0xc20 kernel/time/hrtimer.c:1752
hrtimer_interrupt+0x31b/0x800 kernel/time/hrtimer.c:1814
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1065 [inline]
__sysvec_apic_timer_interrupt+0x105/0x400 arch/x86/kernel/apic/apic.c:1082
sysvec_apic_timer_interrupt+0x90/0xb0 arch/x86/kernel/apic/apic.c:1076
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:649
RIP: 0010:queue_work_on+0x92/0x110 kernel/workqueue.c:1836
Code: ff 48 89 ee e8 9f c4 31 00 48 85 ed 75 3b e8 05 c9 31 00 9c 5b 81 e3 00 02 00 00 31 ff 48 89 de e8 83 c4 31 00 48 85 db 75 66 <e8> e9 c8 31 00 44 89 e8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 e8 d6 c8
RSP: 0018:ffffc9000430f9d8 EFLAGS: 00000293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff8155d4ed
RDX: ffff888072d5d940 RSI: ffffffff8155d4f7 RDI: 0000000000000007
RBP: 0000000000000200 R08: 0000000000000007 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000008
R13: 0000000000000001 R14: ffff888013072800 R15: ffff888076207200
queue_work include/linux/workqueue.h:562 [inline]
synchronize_rcu_expedited_queue_work kernel/rcu/tree_exp.h:519 [inline]
synchronize_rcu_expedited+0x5a2/0x800 kernel/rcu/tree_exp.h:1006
synchronize_rcu+0x2f5/0x3b0 kernel/rcu/tree.c:3568
synchronize_net+0x4e/0x60 net/core/dev.c:10989
packet_release+0xb2c/0xdd0 net/packet/af_packet.c:3167
__sock_release+0xae/0x260 net/socket.c:659
sock_close+0x1c/0x20 net/socket.c:1419
__fput+0x270/0xbb0 fs/file_table.c:394
__fput_sync+0x47/0x50 fs/file_table.c:475
__do_sys_close fs/open.c:1590 [inline]
__se_sys_close fs/open.c:1575 [inline]
__x64_sys_close+0x87/0xf0 fs/open.c:1575
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7fd03423c0c0
Code: ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 80 3d e1 df 07 00 00 74 17 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c
RSP: 002b:00007ffc9b2872b8 EFLAGS: 00000202 ORIG_RAX: 0000000000000003
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fd03423c0c0
RDX: 0000000000000000 RSI: 00000000200007c0 RDI: 0000000000000003
RBP: 00000000000f4240 R08: 0000000000000000 R09: 0000000100000000
R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffc9b287310
R13: 0000000000030165 R14: 00007ffc9b2872dc R15: 0000000000000003
</TASK>
INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 2.242 msecs

---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Eric Dumazet

May 27, 2024, 8:43:37 AM

to syzbot, da...@davemloft.net, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, willemdebr...@gmail.com, Vladimir Oltean, Vinicius Costa Gomes

This is another manifestation of a long-standing taprio bug.

Vladimir Oltean

May 27, 2024, 10:02:01 AM

to Eric Dumazet, syzbot, da...@davemloft.net, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, willemdebr...@gmail.com, Vinicius Costa Gomes

On Mon, May 27, 2024 at 02:43:19PM +0200, Eric Dumazet wrote:
> This is another manifestation of a long standing taprio bug.

Thanks for the heads up. I will send some patches after some testing.

Radoslaw

May 28, 2024, 8:03:58 AM

to syzkaller-bugs

Hello,

I'm working on a similar taprio bug: https://syzkaller.appspot.com/bug?extid=c4c6c3dc10cc96bcf723
I think I know what the root cause is.

The function advance_sched() [https://elixir.bootlin.com/linux/v5.10.173/source/net/sched/sch_taprio.c#L696] runs repeatedly, driven by an hrtimer. Each call to advance_sched() computes end_time and arms the timer so that the next execution happens at end_time: it first sets the expiration time with hrtimer_set_expires() and then returns HRTIMER_RESTART, so the timer is re-enqueued with the adjusted expiration time. The issue is that end_time is set far before the current time (now), causing advance_sched() to execute again immediately without a context switch.

__hrtimer_run_queues() [https://elixir.bootlin.com/linux/v5.10.173/source/kernel/time/hrtimer.c#L1615] is a function with a long loop. First, please note that now is calculated once and is not updated within this function. There is a statement basenow = now + base->offset, but it sits outside the loop (and in our case, the offset is 0). The loop terminates when the queue is empty or the next entry in the queue has an expiration time in the future. The issue here is that the queue can be updated within __run_hrtimer(). In our case, __run_hrtimer() re-adds the timer whose callback is advance_sched(). Since the expiration time is before now, advance_sched() has to be executed again. The loop runs for a very long time because, in our case, the cycle is set to 3 ns.
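The loop behaviour described above can be sketched in userspace C. This is a minimal model under stated assumptions, not kernel code: `struct sim_timer`, `restart_cb()` and `run_queues()` are hypothetical names, there is only one timer, and `now` is passed in once to mimic the stale snapshot.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's hrtimer types; not the real API. */
enum sim_restart { SIM_NORESTART, SIM_RESTART };

struct sim_timer {
	int64_t expires;                 /* absolute expiry, in ns */
	int64_t cycle;                   /* re-arm interval, in ns */
	enum sim_restart (*fn)(struct sim_timer *t);
};

/* Mimics advance_sched(): push the expiry forward by one cycle and restart. */
enum sim_restart restart_cb(struct sim_timer *t)
{
	t->expires += t->cycle;
	return SIM_RESTART;
}

/*
 * Mimics the shape of the loop in __hrtimer_run_queues(): "now" is sampled
 * once by the caller and never refreshed, so the timer keeps firing until
 * its expiry finally exceeds that snapshot. Returns the number of callback
 * invocations made before the loop exits.
 */
uint64_t run_queues(struct sim_timer *t, int64_t now)
{
	uint64_t calls = 0;

	assert(t->cycle > 0);
	while (t->expires <= now) {
		calls++;
		if (t->fn(t) != SIM_RESTART)
			break;
	}
	return calls;
}
```

In this model, a timer that is 1 ms behind with a 3 ns cycle would have its callback invoked roughly 333,000 times inside a single call to run_queues().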

My idea is to create a throttling mechanism. When advance_sched() sets the hrtimer expiration time to before the current time X consecutive times, we can postpone the next advance_sched().
You can see my PoC here: https://lore.kernel.org/all/00000000000089...@google.com/T/
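The PoC itself is not reproduced in this thread, so the following is only a sketch of the idea as worded above, not the posted patch. `struct throttle`, `throttle_next_expiry()`, the threshold and the postpone interval are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical throttle state; names and thresholds are illustrative only. */
struct throttle {
	unsigned int late_count;   /* consecutive expiries already in the past */
	unsigned int max_late;     /* the "X" from the description above */
	int64_t postpone_ns;       /* how far past "now" a throttled expiry goes */
};

/*
 * Given the end_time computed for the next schedule entry and the current
 * time, return the expiry to program into the timer. After max_late
 * consecutive late expiries, the next one is pushed beyond "now" so that
 * the timer interrupt can return instead of looping.
 */
int64_t throttle_next_expiry(struct throttle *th, int64_t end_time, int64_t now)
{
	assert(th->max_late > 0 && th->postpone_ns > 0);

	if (end_time > now) {
		th->late_count = 0;            /* caught up: reset the streak */
		return end_time;
	}
	if (++th->late_count < th->max_late)
		return end_time;               /* tolerated: fires again at once */

	th->late_count = 0;
	return now + th->postpone_ns;          /* throttled: defer past "now" */
}
```

Note that a throttled expiry in this sketch is derived from the current time rather than from the schedule, so it generally does not fall on a schedule boundary.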

Could you take a look at it? What do you think? Is it acceptable, or is it too aggressive with too much impact on the TAPRIO scheduler?

Radosław.

Radoslaw Zielonek

May 28, 2024, 8:28:03 AM

to vladimi...@nxp.com, da...@davemloft.net, edum...@google.com, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzbot+a7d2b1...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, viniciu...@intel.com, willemdebr...@gmail.com, Radoslaw Zielonek

Hello,

I'm working on a similar taprio bug:
https://syzkaller.appspot.com/bug?extid=c4c6c3dc10cc96bcf723
I think I know what the root cause is.

The function advance_sched()
[https://elixir.bootlin.com/linux/v5.10.173/source/net/sched/sch_taprio.c#L696]
runs repeatedly. It is executed from an hrtimer.
In every call to advance_sched(), end_time is calculated,
and the timer is set so that the next execution will be at end_time.
To achieve this, first, the expiration time is set using hrtimer_set_expires(),
and second, HRTIMER_RESTART is returned.
This means that the timer is re-enqueued with the adjusted expiration time.
The issue is that end_time is set far before the current time (now),
causing advance_sched() to execute immediately without a context switch.

__hrtimer_run_queues()
[https://elixir.bootlin.com/linux/v5.10.173/source/kernel/time/hrtimer.c#L1615]
is a function with a long loop.
First, please note that now is calculated once and not updated within this function.
We can see the statement basenow = now + base->offset,
but this statement is outside the loop (and in my case, the offset is 0).
The loop will terminate when the queue is empty or the next entry in the queue
has an expiration time in the future.
The issue here is that the queue can be updated within __run_hrtimer().
In my case, __run_hrtimer() re-adds the timer whose callback is advance_sched().

Vladimir Oltean

May 28, 2024, 8:55:36 AM

to Radoslaw Zielonek, da...@davemloft.net, edum...@google.com, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzbot+a7d2b1...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, viniciu...@intel.com, willemdebr...@gmail.com

On Tue, May 28, 2024 at 02:25:58PM +0200, Radoslaw Zielonek wrote:
> Hello,
>
> I'm working on similar taprio bug:
> https://syzkaller.appspot.com/bug?extid=c4c6c3dc10cc96bcf723

Could you please let me know if the patches I posted yesterday fix that?
https://lore.kernel.org/netdev/20240527153955.5533...@nxp.com/

> I think I know what is the root cause.
>
> The function advance_sched()
> [https://elixir.bootlin.com/linux/v5.10.173/source/net/sched/sch_taprio.c#L696]
> runs repeatedly. It is executed using HRTimer.
> In every call to advance_sched(), end_time is calculated,
> and the timer is set so that the next execution will be at end_time.
> To achieve this, first, the expiration time is set using hrtimer_set_expires(),
> and second, HRTIMER_RESTART is returned.
> This means that the timer is re-enqueued with the adjusted expiration time.
> The issue is that end_time is set far before the current time (now),
> causing advance_sched() to execute immediately without a context switch.
>
> __hrtimer_run_queues()
> [https://elixir.bootlin.com/linux/v5.10.173/source/kernel/time/hrtimer.c#L1615]
> is a function with a long loop.
> First, please note that now is calculated once and not updated within this function.
> We can see the statement basenow = now + base->offset,
> but this statement is outside the loop (and in my case, the offset is 0).
> The loop will terminate when the queue is empty or the next entry in the queue
> has an expiration time in the future.
> The issue here is that the queue can be updated within __run_timer().
> In my case, __run_timer() adds a new entry to the queue with advance_sched() function.
> Since the expiration time is before now, we need to execute advance_sched() again.
> The loop is very long because, in our case, the cycle is set to 3ns.

In plain English, the root cause is "the schedule is too tight for the
CPU to keep up with it". That said, a schedule with a 3 ns cycle time is
not practically valid in itself, either. Vinicius proposed that we should
just reject cycles that are unrealistically small, using a
simplistic heuristic based on the transmission time of a single small
packet. The problem is that the rejection mechanism was slightly broken.
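The heuristic can be worked through numerically. The sketch below is a userspace approximation, assuming the link speed is given in Mbit/s and that the smallest frame is 60 bytes (the value of ETH_ZLEN); the function names are hypothetical and only echo the kernel helpers mentioned in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIN_FRAME_BYTES 60   /* same value as ETH_ZLEN: minimum frame, no FCS */

/* Picoseconds needed to transmit one byte at the given speed in Mbit/s. */
uint64_t picos_per_byte(uint64_t speed_mbps)
{
	assert(speed_mbps > 0);
	return (1000000ULL * 8) / speed_mbps;
}

/* Nanoseconds needed to transmit len bytes, rounded down. */
uint64_t length_to_duration_ns(uint64_t len, uint64_t speed_mbps)
{
	return (len * picos_per_byte(speed_mbps)) / 1000;
}

/*
 * True if the cycle is long enough for each entry to carry, on average,
 * at least one minimum-size frame.
 */
bool cycle_time_is_plausible(uint64_t cycle_ns, uint64_t num_entries,
			     uint64_t speed_mbps)
{
	return cycle_ns >=
	       num_entries * length_to_duration_ns(MIN_FRAME_BYTES, speed_mbps);
}
```

At 1000 Mbit/s a 60-byte frame takes 480 ns under these assumptions, so a 3 ns cycle fails the check by more than two orders of magnitude.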

> My idea is to create throttling mechanism.
> When advance_sched() sets the hrtimer expiration time to before the current time
> for X consecutive times, we can postpone the new advance_sched().
> You can see my PoC here: https://lore.kernel.org/all/00000000000089...@google.com/T/

The link is not valid. Can you repost it without the "..."?

Radoslaw Zielonek

May 28, 2024, 9:04:08 AM

to vladimi...@nxp.com, da...@davemloft.net, edum...@google.com, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, radoslaw...@gmail.com, syzbot+a7d2b1...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, viniciu...@intel.com, willemdebr...@gmail.com

Hello,

Ah, sorry. I didn't notice that.
The PoC has been tested by syzbot
[https://syzkaller.appspot.com/bug?extid=c4c6c3dc10cc96bcf723]

The full link:
[https://lore.kernel.org/all/00000000000089...@google.com/T/]

Radosław.

Vladimir Oltean

May 29, 2024, 9:43:57 AM

to Radoslaw Zielonek, vladimi...@nxp.com, da...@davemloft.net, edum...@google.com, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzbot+a7d2b1...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, viniciu...@intel.com, willemdebr...@gmail.com

The patch, in the form you are presenting, obviously derails the phase
alignment of the new schedule when the core struggles to keep up with
the hrtimer. I am not in favor of adding any logic to taprio that
instructs it to behave out of spec.
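To make "phase alignment" concrete, here is a minimal sketch, assuming a schedule whose cycle boundaries are expected at base_time plus whole multiples of cycle_time. `phase_offset_ns()` is a hypothetical helper, not a taprio function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Offset of time t within its cycle, for a schedule anchored at base_time.
 * An expiry that lands exactly on a cycle boundary has offset 0; an expiry
 * postponed by some delay d (with d < cycle_time) has offset d.
 */
int64_t phase_offset_ns(int64_t t, int64_t base_time, int64_t cycle_time)
{
	assert(cycle_time > 0 && t >= base_time);
	return (t - base_time) % cycle_time;
}
```

With base_time = 1000 and cycle_time = 1000, an expiry at 6000 sits on a boundary, while one postponed to 6250 is 250 ns out of phase with the configured schedule.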

Hillf Danton

May 29, 2024, 6:52:32 PM

to syzbot, linux-...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com, Vladimir Oltean, Radoslaw Zielonek, Vinicius Costa Gomes, Eric Dumazet

Test Vlad's patch [1]
[1] https://lore.kernel.org/netdev/20240527153955.5533...@nxp.com/

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git main

--- x/net/sched/sch_taprio.c
+++ y/net/sched/sch_taprio.c
@@ -1847,6 +1847,7 @@ static int taprio_change(struct Qdisc *s
 		return -EOPNOTSUPP;
 	}
 	q->flags = taprio_flags;
+	taprio_set_picos_per_byte(dev, q);

 	err = taprio_parse_mqprio_opt(dev, mqprio, extack, q->flags);
 	if (err < 0)
@@ -1907,7 +1908,6 @@ static int taprio_change(struct Qdisc *s
 	if (err < 0)
 		goto free_sched;

-	taprio_set_picos_per_byte(dev, q);
 	taprio_update_queue_max_sdu(q, new_admin, stab);

 	if (FULL_OFFLOAD_IS_ENABLED(q->flags))
--

syzbot

May 29, 2024, 7:10:05 PM

to edum...@google.com, hda...@sina.com, linux-...@vger.kernel.org, net...@vger.kernel.org, radoslaw...@gmail.com, syzkall...@googlegroups.com, viniciu...@intel.com, vladimi...@nxp.com

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: rcu detected stall in sctp_addr_wq_timeout_handler

rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 1-...D } 2647 jiffies s: 2345 root: 0x2/.
rcu: blocking rcu_node structures (internal RCU debug):
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 5448 Comm: dhcpcd Not tainted 6.9.0-syzkaller-12116-g782471db6c72-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
RIP: 0010:check_kcov_mode kernel/kcov.c:173 [inline]
RIP: 0010:write_comp_data kernel/kcov.c:236 [inline]
RIP: 0010:__sanitizer_cov_trace_const_cmp4+0x1f/0x90 kernel/kcov.c:304
Code: 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 4c 8b 04 24 65 48 8b 14 25 00 d5 03 00 65 8b 05 60 4b 6e 7e a9 00 01 ff 00 74 10 <a9> 00 01 00 00 74 5b 83 ba 1c 16 00 00 00 74 52 8b 82 f8 15 00 00
RSP: 0018:ffffc90000a18168 EFLAGS: 00000006
RAX: 0000000000010303 RBX: ffffffff89814b22 RCX: ffff88807bd9da00
RDX: ffff88807bd9da00 RSI: 0000000000000001 RDI: 0000000000000000
RBP: 0000000000000001 R08: ffffffff89814b52 R09: 1ffffffff25f04b0
R10: dffffc0000000000 R11: fffffbfff25f04b1 R12: dffffc0000000000
R13: ffff888024155808 R14: ffff888024155800 R15: ffff888023e05360
FS: 00007fc8f8104740(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc8f805eff8 CR3: 000000007cda8000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<NMI>
</NMI>
<IRQ>
rcu_read_lock include/linux/rcupdate.h:782 [inline]
advance_sched+0xa32/0xca0 net/sched/sch_taprio.c:985
__run_hrtimer kernel/time/hrtimer.c:1687 [inline]
__hrtimer_run_queues+0x59b/0xd50 kernel/time/hrtimer.c:1751
hrtimer_interrupt+0x396/0x990 kernel/time/hrtimer.c:1813
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1032 [inline]
__sysvec_apic_timer_interrupt+0x110/0x3f0 arch/x86/kernel/apic/apic.c:1049
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
sysvec_apic_timer_interrupt+0x52/0xc0 arch/x86/kernel/apic/apic.c:1043
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
RIP: 0010:get_current arch/x86/include/asm/current.h:49 [inline]
RIP: 0010:write_comp_data kernel/kcov.c:235 [inline]
RIP: 0010:__sanitizer_cov_trace_const_cmp4+0x8/0x90 kernel/kcov.c:304
Code: 44 0a 20 c3 cc cc cc cc 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 4c 8b 04 24 <65> 48 8b 14 25 00 d5 03 00 65 8b 05 60 4b 6e 7e a9 00 01 ff 00 74
RSP: 0018:ffffc90000a185b8 EFLAGS: 00000246
RAX: ffffc90000a18700 RBX: 0000000000000002 RCX: ffffc90000a11000
RDX: 0000000000000003 RSI: 0000000000000002 RDI: 0000000000000000
RBP: 1ffff920001430e2 R08: ffffffff814090ad R09: ffffffff81409006
R10: 0000000000000003 R11: ffff88807bd9da00 R12: ffffc90000a186f8
R13: ffffc90000a19000 R14: 1ffff920001430e1 R15: dffffc0000000000
on_stack arch/x86/include/asm/stacktrace.h:58 [inline]
stack_access_ok arch/x86/kernel/unwind_orc.c:393 [inline]
deref_stack_reg arch/x86/kernel/unwind_orc.c:403 [inline]
unwind_next_frame+0x109d/0x2a00 arch/x86/kernel/unwind_orc.c:585
__unwind_start+0x641/0x7c0 arch/x86/kernel/unwind_orc.c:760
unwind_start arch/x86/include/asm/unwind.h:64 [inline]
arch_stack_walk+0x103/0x1b0 arch/x86/kernel/stacktrace.c:24
stack_trace_save+0x118/0x1d0 kernel/stacktrace.c:122
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579
poison_slab_object+0xe0/0x150 mm/kasan/common.c:240
__kasan_slab_free+0x37/0x60 mm/kasan/common.c:256
kasan_slab_free include/linux/kasan.h:184 [inline]
slab_free_hook mm/slub.c:2195 [inline]
slab_free mm/slub.c:4436 [inline]
kfree+0x14a/0x370 mm/slub.c:4557
sctp_addr_wq_timeout_handler+0x2e6/0x470 net/sctp/protocol.c:685
call_timer_fn+0x18e/0x650 kernel/time/timer.c:1792
expire_timers kernel/time/timer.c:1843 [inline]
__run_timers kernel/time/timer.c:2417 [inline]
__run_timer_base+0x66a/0x8e0 kernel/time/timer.c:2428
run_timer_base kernel/time/timer.c:2437 [inline]
run_timer_softirq+0xb7/0x170 kernel/time/timer.c:2447
handle_softirqs+0x2c4/0x970 kernel/softirq.c:554
__do_softirq kernel/softirq.c:588 [inline]
invoke_softirq kernel/softirq.c:428 [inline]
__irq_exit_rcu+0xf4/0x1c0 kernel/softirq.c:637
irq_exit_rcu+0x9/0x30 kernel/softirq.c:649
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
RIP: 0010:__raw_spin_unlock_irq include/linux/spinlock_api_smp.h:160 [inline]
RIP: 0010:_raw_spin_unlock_irq+0x29/0x50 kernel/locking/spinlock.c:202
Code: 90 f3 0f 1e fa 53 48 89 fb 48 83 c7 18 48 8b 74 24 08 e8 ca ab ee f5 48 89 df e8 e2 ef ef f5 e8 fd 94 19 f6 fb bf 01 00 00 00 <e8> d2 cc e1 f5 65 8b 05 f3 77 80 74 85 c0 74 06 5b c3 cc cc cc cc
RSP: 0018:ffffc90004917cf0 EFLAGS: 00000286
RAX: 8cf5a8119035ed00 RBX: ffff8880275fae40 RCX: ffffffff9477a603
RDX: dffffc0000000000 RSI: ffffffff8bcabc20 RDI: 0000000000000001
RBP: ffffc90004917dd0 R08: ffffffff8fac132f R09: 1ffffffff1f58265
R10: dffffc0000000000 R11: fffffbfff1f58266 R12: 1ffff1100f7b3c63
R13: 00000000000006e0 R14: ffff88807bd9e318 R15: dffffc0000000000
do_sigaction+0x1f3/0x530
__do_sys_rt_sigaction kernel/signal.c:4499 [inline]
__se_sys_rt_sigaction kernel/signal.c:4484 [inline]
__x64_sys_rt_sigaction+0x1b9/0x290 kernel/signal.c:4484
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fc8f813eb57
Code: 4d 85 c0 74 0e 48 8d 54 24 20 31 f6 48 85 c0 75 04 eb 07 31 d2 48 8d 74 24 88 41 ba 08 00 00 00 44 89 cf b8 0d 00 00 00 0f 05 <48> 3d 00 f0 ff ff 76 10 48 8b 15 a2 a2 16 00 f7 d8 64 89 02 48 83
RSP: 002b:00007fc8f805edc0 EFLAGS: 00000246 ORIG_RAX: 000000000000000d
RAX: ffffffffffffffda RBX: 00007ffd0b08de50 RCX: 00007fc8f813eb57
RDX: 00007fc8f805ede0 RSI: 0000000000000000 RDI: 0000000000000038
RBP: 00007fc8f805eff0 R08: 00007fc8f805ef28 R09: 0000000000000038
R10: 0000000000000008 R11: 0000000000000246 R12: 00007ffd0b08e168
R13: 00007fc8f805ef28 R14: 0000000000000000 R15: 0000000000000038
</TASK>

Tested on:

commit: 782471db Merge branch 'xilinx-clock-support'
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git main
console output: https://syzkaller.appspot.com/x/log.txt?x=14a7c3ec980000
kernel config: https://syzkaller.appspot.com/x/.config?x=98a238b2569af6d
dashboard link: https://syzkaller.appspot.com/bug?extid=a7d2b1d5d1af83035567
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=1491d7c6980000

Hillf Danton

May 29, 2024, 7:48:04 PM

to syzbot, edum...@google.com, linux-...@vger.kernel.org, net...@vger.kernel.org, radoslaw...@gmail.com, syzkall...@googlegroups.com, viniciu...@intel.com, vladimi...@nxp.com

On Wed, 29 May 2024 16:10:02 -0700

> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> INFO: rcu detected stall in sctp_addr_wq_timeout_handler

Vlad, feel free to read the root cause [1] again.

[1] https://lore.kernel.org/lkml/20240528122610.21393...@gmail.com/

Adding the tested patch in the net tree now looks like a case of blind landing.

Vladimir Oltean

May 29, 2024, 8:33:33 PM

to Hillf Danton, syzbot, edum...@google.com, linux-...@vger.kernel.org, net...@vger.kernel.org, radoslaw...@gmail.com, syzkall...@googlegroups.com, viniciu...@intel.com

What is the fact that you submitted only my patch 1/2 for syzbot testing
supposed to prove? It is the second patch (2/2) that addresses what has
been reported here; I thought the tags made that clear:
https://lore.kernel.org/netdev/20240527153955.5533...@nxp.com/
Patch 2/2 has patch 1/2 as a dependency, which is why they were
submitted that way.
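One way to read this dependency is sketched below, assuming that the per-byte transmission time is still zero when the schedule is parsed unless it has been set beforehand. The types and function names are hypothetical simplifications, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified qdisc state; the real struct has many more fields. */
struct sim_qdisc {
	uint64_t picos_per_byte;   /* 0 until the link speed has been applied */
};

/* Stand-in for taprio_set_picos_per_byte(): derive the rate from Mbit/s. */
void sim_set_picos_per_byte(struct sim_qdisc *q, uint64_t speed_mbps)
{
	assert(speed_mbps > 0);
	q->picos_per_byte = (1000000ULL * 8) / speed_mbps;
}

/*
 * Stand-in for a parse-time check: reject cycles shorter than one 60-byte
 * frame per entry. With picos_per_byte == 0 the minimum is 0 ns, so the
 * comparison can never reject anything.
 */
bool sim_cycle_rejected(const struct sim_qdisc *q, uint64_t cycle_ns,
			uint64_t num_entries)
{
	uint64_t min_entry_ns = (60 * q->picos_per_byte) / 1000;

	return cycle_ns < num_entries * min_entry_ns;
}
```

In this model the same 3 ns cycle is accepted before sim_set_picos_per_byte() runs and rejected afterwards, which is why the ordering of the two steps matters.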

Hillf Danton

May 30, 2024, 6:34:51 AM

to Vladimir Oltean, syzbot, edum...@google.com, linux-...@vger.kernel.org, net...@vger.kernel.org, radoslaw...@gmail.com, syzkall...@googlegroups.com, viniciu...@intel.com

On Thu, 30 May 2024 03:33:25 +0300 Vladimir Oltean <vladimi...@nxp.com>

>
> What is the fact that you submitted only my patch 1/2 for syzbot testing
> supposed to prove? It is the second patch (2/2) that addresses what has
> been reported here;

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git main

syzbot

May 30, 2024, 7:19:06 AM

to edum...@google.com, hda...@sina.com, linux-...@vger.kernel.org, net...@vger.kernel.org, radoslaw...@gmail.com, syzkall...@googlegroups.com, viniciu...@intel.com, vladimi...@nxp.com

Hello,

syzbot tried to test the proposed patch but the build/boot failed:

lost connection to test machine

syzkaller build log:
go env (err=<nil>)
GO111MODULE='auto'
GOARCH='amd64'
GOBIN=''
GOCACHE='/syzkaller/.cache/go-build'
GOENV='/syzkaller/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/syzkaller/jobs-2/linux/gopath/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/syzkaller/jobs-2/linux/gopath'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.21.4'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/syzkaller/jobs-2/linux/gopath/src/github.com/google/syzkaller/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1459151336=/tmp/go-build -gno-record-gcc-switches'

git status (err=<nil>)
HEAD detached at 4f9530a3b
nothing to commit, working tree clean

tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
Makefile:32: run command via tools/syz-env for best compatibility, see:
Makefile:33: https://github.com/google/syzkaller/blob/master/docs/contributing.md#using-syz-env
go list -f '{{.Stale}}' ./sys/syz-sysgen | grep -q false || go install ./sys/syz-sysgen
make .descriptions
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
bin/syz-sysgen
touch .descriptions
GOOS=linux GOARCH=amd64 go build "-ldflags=-s -w -X github.com/google/syzkaller/prog.GitRevision=4f9530a3b62297342999c9097c77dde726522618 -X 'github.com/google/syzkaller/prog.gitRevisionDate=20231220-163507'" "-tags=syz_target syz_os_linux syz_arch_amd64 " -o ./bin/linux_amd64/syz-fuzzer github.com/google/syzkaller/syz-fuzzer
GOOS=linux GOARCH=amd64 go build "-ldflags=-s -w -X github.com/google/syzkaller/prog.GitRevision=4f9530a3b62297342999c9097c77dde726522618 -X 'github.com/google/syzkaller/prog.gitRevisionDate=20231220-163507'" "-tags=syz_target syz_os_linux syz_arch_amd64 " -o ./bin/linux_amd64/syz-execprog github.com/google/syzkaller/tools/syz-execprog
GOOS=linux GOARCH=amd64 go build "-ldflags=-s -w -X github.com/google/syzkaller/prog.GitRevision=4f9530a3b62297342999c9097c77dde726522618 -X 'github.com/google/syzkaller/prog.gitRevisionDate=20231220-163507'" "-tags=syz_target syz_os_linux syz_arch_amd64 " -o ./bin/linux_amd64/syz-stress github.com/google/syzkaller/tools/syz-stress
mkdir -p ./bin/linux_amd64
gcc -o ./bin/linux_amd64/syz-executor executor/executor.cc \
-m64 -O2 -pthread -Wall -Werror -Wparentheses -Wunused-const-variable -Wframe-larger-than=16384 -Wno-stringop-overflow -Wno-array-bounds -Wno-format-overflow -Wno-unused-but-set-variable -Wno-unused-command-line-argument -static-pie -fpermissive -w -DGOOS_linux=1 -DGOARCH_amd64=1 \
-DHOSTGOOS_linux=1 -DGIT_REVISION=\"4f9530a3b62297342999c9097c77dde726522618\"

Tested on:

commit: 13c7c941 netdev: add qstat for csum complete
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git main
kernel config: https://syzkaller.appspot.com/x/.config?x=98a238b2569af6d
dashboard link: https://syzkaller.appspot.com/bug?extid=a7d2b1d5d1af83035567
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40

Note: no patches were applied.

Hillf Danton

May 30, 2024, 7:57:42 AM

to Vladimir Oltean, syzbot, edum...@google.com, linux-...@vger.kernel.org, net...@vger.kernel.org, radoslaw...@gmail.com, syzkall...@googlegroups.com, viniciu...@intel.com

On Thu, 30 May 2024 03:33:25 +0300 Vladimir Oltean <vladimi...@nxp.com>
>

> What is the fact that you submitted only my patch 1/2 for syzbot testing
> supposed to prove? It is the second patch (2/2) that addresses what has
> been reported here;

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git main

--- x/net/sched/sch_taprio.c
+++ y/net/sched/sch_taprio.c
@@ -1151,11 +1151,6 @@ static int parse_taprio_schedule(struct
 		list_for_each_entry(entry, &new->entries, list)
 			cycle = ktime_add_ns(cycle, entry->interval);

-		if (!cycle) {
-			NL_SET_ERR_MSG(extack, "'cycle_time' can never be 0");
-			return -EINVAL;
-		}
-
 		if (cycle < 0 || cycle > INT_MAX) {
 			NL_SET_ERR_MSG(extack, "'cycle_time' is too big");
 			return -EINVAL;
@@ -1164,6 +1159,11 @@ static int parse_taprio_schedule(struct
 		new->cycle_time = cycle;
 	}

+	if (new->cycle_time < new->num_entries * length_to_duration(q, ETH_ZLEN)) {
+		NL_SET_ERR_MSG(extack, "'cycle_time' is too small");
+		return -EINVAL;
+	}
+
 	taprio_calculate_gate_durations(q, new);

 	return 0;
@@ -1848,6 +1848,9 @@ static int taprio_change(struct Qdisc *s
 	}
 	q->flags = taprio_flags;

+	/* Needed for length_to_duration() during netlink attribute parsing */
+	taprio_set_picos_per_byte(dev, q);
+
 	err = taprio_parse_mqprio_opt(dev, mqprio, extack, q->flags);
 	if (err < 0)
 		return err;
@@ -1907,7 +1910,6 @@ static int taprio_change(struct Qdisc *s

syzbot

May 30, 2024, 8:30:07 AM

to edum...@google.com, hda...@sina.com, linux-...@vger.kernel.org, net...@vger.kernel.org, radoslaw...@gmail.com, syzkall...@googlegroups.com, viniciu...@intel.com, vladimi...@nxp.com

GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build2710826618=/tmp/go-build -gno-record-gcc-switches'

git status (err=<nil>)
HEAD detached at 4f9530a3b
nothing to commit, working tree clean

tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
Makefile:32: run command via tools/syz-env for best compatibility, see:
Makefile:33: https://github.com/google/syzkaller/blob/master/docs/contributing.md#using-syz-env
go list -f '{{.Stale}}' ./sys/syz-sysgen | grep -q false || go install ./sys/syz-sysgen
make .descriptions
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
bin/syz-sysgen
touch .descriptions
GOOS=linux GOARCH=amd64 go build "-ldflags=-s -w -X github.com/google/syzkaller/prog.GitRevision=4f9530a3b62297342999c9097c77dde726522618 -X 'github.com/google/syzkaller/prog.gitRevisionDate=20231220-163507'" "-tags=syz_target syz_os_linux syz_arch_amd64 " -o ./bin/linux_amd64/syz-fuzzer github.com/google/syzkaller/syz-fuzzer
GOOS=linux GOARCH=amd64 go build "-ldflags=-s -w -X github.com/google/syzkaller/prog.GitRevision=4f9530a3b62297342999c9097c77dde726522618 -X 'github.com/google/syzkaller/prog.gitRevisionDate=20231220-163507'" "-tags=syz_target syz_os_linux syz_arch_amd64 " -o ./bin/linux_amd64/syz-execprog github.com/google/syzkaller/tools/syz-execprog
GOOS=linux GOARCH=amd64 go build "-ldflags=-s -w -X github.com/google/syzkaller/prog.GitRevision=4f9530a3b62297342999c9097c77dde726522618 -X 'github.com/google/syzkaller/prog.gitRevisionDate=20231220-163507'" "-tags=syz_target syz_os_linux syz_arch_amd64 " -o ./bin/linux_amd64/syz-stress github.com/google/syzkaller/tools/syz-stress
mkdir -p ./bin/linux_amd64
gcc -o ./bin/linux_amd64/syz-executor executor/executor.cc \
-m64 -O2 -pthread -Wall -Werror -Wparentheses -Wunused-const-variable -Wframe-larger-than=16384 -Wno-stringop-overflow -Wno-array-bounds -Wno-format-overflow -Wno-unused-but-set-variable -Wno-unused-command-line-argument -static-pie -fpermissive -w -DGOOS_linux=1 -DGOARCH_amd64=1 \
-DHOSTGOOS_linux=1 -DGIT_REVISION=\"4f9530a3b62297342999c9097c77dde726522618\"

Tested on:

commit: c53a46b1 net: smc91x: Remove commented out code
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git main
kernel config: https://syzkaller.appspot.com/x/.config?x=98a238b2569af6d
dashboard link: https://syzkaller.appspot.com/bug?extid=a7d2b1d5d1af83035567
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=169aadec980000

Hillf Danton

May 31, 2024, 4:24:27 PM

to Vladimir Oltean, syzbot, edum...@google.com, linux-...@vger.kernel.org, net...@vger.kernel.org, radoslaw...@gmail.com, syzkall...@googlegroups.com, viniciu...@intel.com

On Thu, 30 May 2024 03:33:25 +0300 Vladimir Oltean <vladimi...@nxp.com>
>

> What is the fact that you submitted only my patch 1/2 for syzbot testing
> supposed to prove? It is the second patch (2/2) that addresses what has
> been reported here;

They worked [1]. Sorry for my mess-up.

[1] https://lore.kernel.org/lkml/00000000000060...@google.com/
