INFO: rcu detected stall in __hrtimer_run_queues

101 views
Skip to first unread message

syzbot

unread,
Feb 20, 2021, 4:05:19 PM2/20/21
to fwei...@gmail.com, linux-...@vger.kernel.org, mi...@kernel.org, syzkall...@googlegroups.com, tg...@linutronix.de
Hello,

syzbot found the following issue on:

HEAD commit: f40ddce8 Linux 5.11
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1397f498d00000
kernel config: https://syzkaller.appspot.com/x/.config?x=e53d04227c52a0df
dashboard link: https://syzkaller.appspot.com/bug?extid=de9526ade17c659d8336
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17a81012d00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1282b6d2d00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+de9526...@syzkaller.appspotmail.com

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 1-....: (10153 ticks this GP) idle=f1a/1/0x4000000000000000 softirq=10867/10868 fqs=925
(t=10502 jiffies g=10029 q=19103)
NMI backtrace for cpu 1
CPU: 1 PID: 10530 Comm: syz-executor248 Not tainted 5.11.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x107/0x163 lib/dump_stack.c:120
nmi_cpu_backtrace.cold+0x44/0xd7 lib/nmi_backtrace.c:105
nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
rcu_dump_cpu_stacks+0x1f4/0x230 kernel/rcu/tree_stall.h:337
print_cpu_stall kernel/rcu/tree_stall.h:569 [inline]
check_cpu_stall kernel/rcu/tree_stall.h:643 [inline]
rcu_pending kernel/rcu/tree.c:3751 [inline]
rcu_sched_clock_irq.cold+0x48e/0xedf kernel/rcu/tree.c:2580
update_process_times+0x16d/0x200 kernel/time/timer.c:1782
tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:226
tick_sched_timer+0x1b0/0x2d0 kernel/time/tick-sched.c:1369
__run_hrtimer kernel/time/hrtimer.c:1519 [inline]
__hrtimer_run_queues+0x1c0/0xe40 kernel/time/hrtimer.c:1583
hrtimer_interrupt+0x334/0x940 kernel/time/hrtimer.c:1645
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1089 [inline]
__sysvec_apic_timer_interrupt+0x146/0x540 arch/x86/kernel/apic/apic.c:1106
run_sysvec_on_irqstack_cond arch/x86/include/asm/irq_stack.h:91 [inline]
sysvec_apic_timer_interrupt+0x48/0x100 arch/x86/kernel/apic/apic.c:1100
asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:629
RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:161 [inline]
RIP: 0010:_raw_spin_unlock_irqrestore+0x25/0x50 kernel/locking/spinlock.c:191
Code: f8 5d c3 66 90 55 48 89 fd 48 83 c7 18 53 48 89 f3 48 8b 74 24 10 e8 9a d6 5b f8 48 89 ef e8 42 8b 5c f8 f6 c7 02 75 1a 53 9d <bf> 01 00 00 00 e8 81 92 50 f8 65 8b 05 7a f8 04 77 85 c0 74 0a 5b
RSP: 0018:ffffc90000db0e48 EFLAGS: 00000286
RAX: 0000000000ce281a RBX: 0000000000000286 RCX: 1ffffffff1b46a19
RDX: 0000000000000000 RSI: 0000000000000102 RDI: 0000000000000000
RBP: ffff8880b9d26a00 R08: 0000000000000001 R09: 0000000000000001
R10: ffffffff8178a8b8 R11: 0000000000000000 R12: 000000a80fcc296a
R13: ffff8880b9d26c80 R14: ffff8880b9d26a00 R15: ffffffff851589e0
__run_hrtimer kernel/time/hrtimer.c:1515 [inline]
__hrtimer_run_queues+0x51a/0xe40 kernel/time/hrtimer.c:1583
hrtimer_run_softirq+0x17b/0x360 kernel/time/hrtimer.c:1600
__do_softirq+0x29b/0x9f6 kernel/softirq.c:343
asm_call_irq_on_stack+0xf/0x20
</IRQ>
__run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline]
run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline]
do_softirq_own_stack+0xaa/0xd0 arch/x86/kernel/irq_64.c:77
invoke_softirq kernel/softirq.c:226 [inline]
__irq_exit_rcu kernel/softirq.c:420 [inline]
irq_exit_rcu+0x134/0x200 kernel/softirq.c:432
sysvec_apic_timer_interrupt+0x4d/0x100 arch/x86/kernel/apic/apic.c:1100
asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:629
RIP: 0010:queue_work_on+0x83/0xd0 kernel/workqueue.c:1530
Code: 31 ff 89 ee e8 6e 02 29 00 40 84 ed 74 46 e8 e4 fb 28 00 31 ff 48 89 de e8 ca 03 29 00 48 85 db 75 26 e8 d0 fb 28 00 41 56 9d <48> 83 c4 08 44 89 f8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 e8 b6 fb 28
RSP: 0018:ffffc90002e5fc80 EFLAGS: 00000293
RAX: 0000000000000000 RBX: 0000000000000200 RCX: 0000000000000000
RDX: ffff8880265f5340 RSI: ffffffff8149da20 RDI: 0000000000000000
RBP: ffffc90002e68000 R08: 0000000000000001 R09: 0000000000000001
R10: ffffffff8178a8b8 R11: 0000000000000000 R12: ffff8880b9d31568
R13: ffff888010c64c00 R14: 0000000000000293 R15: 0000000000000001
queue_work include/linux/workqueue.h:507 [inline]
schedule_work include/linux/workqueue.h:568 [inline]
__vfree_deferred mm/vmalloc.c:2307 [inline]
vfree_atomic+0xac/0xe0 mm/vmalloc.c:2325
free_thread_stack kernel/fork.c:291 [inline]
release_task_stack kernel/fork.c:428 [inline]
put_task_stack+0x29c/0x480 kernel/fork.c:439
finish_task_switch.isra.0+0x557/0x7e0 kernel/sched/core.c:4236
context_switch kernel/sched/core.c:4330 [inline]
__schedule+0x914/0x21a0 kernel/sched/core.c:5078
preempt_schedule_common+0x45/0xc0 kernel/sched/core.c:5238
preempt_schedule_thunk+0x16/0x18 arch/x86/entry/thunk_64.S:35
__raw_spin_unlock include/linux/spinlock_api_smp.h:152 [inline]
_raw_spin_unlock+0x36/0x40 kernel/locking/spinlock.c:183
spin_unlock include/linux/spinlock.h:394 [inline]
pick_file+0x129/0x1e0 fs/file.c:613
close_fd+0x44/0x80 fs/file.c:622
__do_sys_close fs/open.c:1299 [inline]
__se_sys_close fs/open.c:1297 [inline]
__x64_sys_close+0x2f/0xa0 fs/open.c:1297
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x403353
Code: c7 c2 c0 ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 45 c3 0f 1f 40 00 48 83 ec 18 89 7c 24 0c e8
RSP: 002b:00007ffdb38739a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 0000000000403353
RDX: 0000000000042000 RSI: 0000000000000004 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000031 R09: 0000000000000031
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000aa47b
R13: 00007ffdb3873a00 R14: 00007ffdb38739f0 R15: 00007ffdb38739c4


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

Hillf Danton

unread,
Feb 20, 2021, 11:11:09 PM2/20/21
to syzbot, fwei...@gmail.com, linux-...@vger.kernel.org, Hillf Danton, Paul E . McKenney, syzkall...@googlegroups.com, tg...@linutronix.de
Sat, 20 Feb 2021 13:05:18 -0800
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: f40ddce8 Linux 5.11
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1397f498d00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=e53d04227c52a0df
> dashboard link: https://syzkaller.appspot.com/bug?extid=de9526ade17c659d8336
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17a81012d00000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1282b6d2d00000
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+de9526...@syzkaller.appspotmail.com
>
> rcu: INFO: rcu_preempt self-detected stall on CPU
> rcu: 1-....: (10153 ticks this GP) idle=f1a/1/0x4000000000000000 softirq=10867/10868 fqs=925
> (t=10502 jiffies g=10029 q=19103)

This looks bogus without capturing a hrtimer hog - we cant fix it
without a target in assumption that it is benign in case of 100+ timers.
Print the hrtimer that spins more than two ticks.

--- x/kernel/time/hrtimer.c
+++ y/kernel/time/hrtimer.c
@@ -1455,6 +1455,21 @@ bool hrtimer_active(const struct hrtimer
}
EXPORT_SYMBOL_GPL(hrtimer_active);

+#ifdef CONFIG_DEBUG_AID_FOR_SYZBOT
+static void before_run_hrtimer(struct hrtimer *timer, unsigned long *tstamp)
+{
+ *tstamp = jiffies;
+}
+static void post_run_hrtimer(struct hrtimer *timer, unsigned long *tstamp)
+{
+ WARN(*tstamp + 2 < jiffies, "hrtimer hog %px spinning %u jiffies\n",
+ timer, (jiffies - (2 + *tstamp)) + 2);
+}
+#else
+static void before_run_hrtimer(struct hrtimer *timer, unsigned long *tstamp) {}
+static void post_run_hrtimer(struct hrtimer *timer, unsigned long *tstamp) {}
+#endif
+
/*
* The write_seqcount_barrier()s in __run_hrtimer() split the thing into 3
* distinct sections:
@@ -1481,6 +1496,7 @@ static void __run_hrtimer(struct hrtimer
enum hrtimer_restart (*fn)(struct hrtimer *);
bool expires_in_hardirq;
int restart;
+ unsigned long tstamp;

lockdep_assert_held(&cpu_base->lock);

@@ -1507,6 +1523,7 @@ static void __run_hrtimer(struct hrtimer
if (IS_ENABLED(CONFIG_TIME_LOW_RES))
timer->is_rel = false;

+ before_run_hrtimer(timer, &tstamp);
/*
* The timer is marked as running in the CPU base, so it is
* protected against migration to a different CPU even if the lock
@@ -1522,6 +1539,7 @@ static void __run_hrtimer(struct hrtimer
trace_hrtimer_expire_exit(timer);
raw_spin_lock_irq(&cpu_base->lock);

+ post_run_hrtimer(timer, &tstamp);
/*
* Note: We clear the running state after enqueue_hrtimer and
* we do not reprogram the event hardware. Happens either in

syzbot

unread,
Nov 16, 2021, 10:41:16 AM11/16/21
to ax...@kernel.dk, fwei...@gmail.com, h...@lst.de, hda...@sina.com, linux...@vger.kernel.org, linux-...@vger.kernel.org, mi...@kernel.org, pau...@kernel.org, syzkall...@googlegroups.com, tg...@linutronix.de
syzbot suspects this issue was fixed by commit:

commit b60876296847e6cd7f1da4b8b7f0f31399d59aa1
Author: Jens Axboe <ax...@kernel.dk>
Date: Fri Oct 15 21:03:52 2021 +0000

block: improve layout of struct request

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=137f2d01b00000
start commit: f40ddce88593 Linux 5.11
git tree: upstream
If the result looks correct, please mark the issue as fixed by replying with:

#syz fix: block: improve layout of struct request

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

Jens Axboe

unread,
Nov 16, 2021, 10:42:42 AM11/16/21
to syzbot, fwei...@gmail.com, h...@lst.de, hda...@sina.com, linux...@vger.kernel.org, linux-...@vger.kernel.org, mi...@kernel.org, pau...@kernel.org, syzkall...@googlegroups.com, tg...@linutronix.de
On 11/16/21 8:41 AM, syzbot wrote:
> syzbot suspects this issue was fixed by commit:
>
> commit b60876296847e6cd7f1da4b8b7f0f31399d59aa1
> Author: Jens Axboe <ax...@kernel.dk>
> Date: Fri Oct 15 21:03:52 2021 +0000
>
> block: improve layout of struct request

No functional changes in that patch, so looks like a fluky bisection.

--
Jens Axboe

Paul E. McKenney

unread,
Nov 17, 2021, 6:59:29 AM11/17/21
to Jens Axboe, syzbot, fwei...@gmail.com, h...@lst.de, hda...@sina.com, linux...@vger.kernel.org, linux-...@vger.kernel.org, mi...@kernel.org, syzkall...@googlegroups.com, tg...@linutronix.de, x...@kernel.org
I am seeing an intermittent (and rather strange) stall warnings on
v5.16-rc1. This is a self-detected stall from the idle loop. The
reason that this is strange is that the usual reason that a CPU stalls
in the idle loop is due to a long-running interrupt, in which case
you would expect other CPUs to detect the stall.

Reproduce using RCU's TRE07 scenario, except that the MTBF looks to be
several hundred hours. But I ran this scenario long enough on v5.15-rc*
to be confident that this stall warning is a regression introduced
recently.

And the reason is that the CPU, despite being in the idle loop, is not
marked as idle from an RCU perspective (see the "idle=d59/0/0x1"):

rcu: 0-...!: (13 ticks this GP) idle=d59/0/0x1 softirq=281261
/281261 fqs=1
(t=2199037 jiffies g=249449 q=5)
NMI backtrace for cpu 0
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc1 #4571
Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.13.0-2.module_el8.5.0+
746+bbd5d70c 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl+0x33/0x42
nmi_cpu_backtrace.cold.6+0x30/0x70
? lapic_can_unplug_cpu+0x70/0x70
nmi_trigger_cpumask_backtrace+0xbf/0xd0
rcu_dump_cpu_stacks+0xc0/0x120
rcu_sched_clock_irq.cold.110+0x15a/0x312
? get_nohz_timer_target+0x60/0x190
? lock_timer_base+0x62/0x80
? account_process_tick+0xd4/0x160
? tick_sched_handle.isra.24+0x40/0x40
update_process_times+0x8e/0xc0
tick_sched_handle.isra.24+0x30/0x40
tick_sched_timer+0x6a/0x80
__hrtimer_run_queues+0xfc/0x2a0
hrtimer_interrupt+0x105/0x220
? resched_curr+0x1e/0xc0
__sysvec_apic_timer_interrupt+0x7a/0x160
sysvec_apic_timer_interrupt+0x85/0xb0
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x12/0x20
RIP: 0010:default_idle+0xb/0x10
Code: ff 48 89 df e8 16 5c 90 ff eb d7 e8 bf 82 ff ff cc cc cc cc
cc cc cc cc cc cc cc cc cc cc cc eb 07 0f 00 2d df 92 41 00 fb f4 <c3> 0f 1f 40
00 65 48 8b 04 25 00 ad 01 00 f0 80 48 02 20 48 8b 10
RSP: 0018:ffffffff9dc03e98 EFLAGS: 00000202
RAX: ffffffff9d3ed200 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff9da3e0a1 RDI: ffffffff9da683ae
RBP: ffffffff9de86050 R08: 00000000e141ad57 R09: ffffa03c5f229d40
R10: 0000000000002400 R11: 0000000000002400 R12: 0000000000000000
R13: 0000000000000000 R14: ffffffffffffffff R15: ffffffff9dc14940
? __cpuidle_text_start+0x8/0x8
? __cpuidle_text_start+0x8/0x8
default_idle_call+0x28/0xd0
do_idle+0x1fb/0x290
cpu_startup_entry+0x14/0x20
start_kernel+0x659/0x680
secondary_startup_64_no_verify+0xc2/0xcb
</TASK>

The usual reason for this odd situation is that someone forgot an
irq_enter() or added an extra irq_exit(). Or likewise for a number of
similar functions that tell RCU to start/stop ignoring the current CPU:
nmi_enter(), nmi_exit(), rcu_*_enter(), rcu_*_exit(), and so on.

Adding the x86 list on CC.

Thanx, Paul

Hillf Danton

unread,
Mar 27, 2022, 5:26:45 AM3/27/22
to syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com, tg...@linutronix.de
On Sat, 20 Feb 2021 13:05:18 -0800
See what will come up 12 months since the report.

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

syzbot

unread,
Mar 27, 2022, 6:59:16 AM3/27/22
to hda...@sina.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com, tg...@linutronix.de
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+de9526...@syzkaller.appspotmail.com

Tested on:

commit: f0228146 Merge tag 'trace-v5.18-1' of git://git.kernel..
git tree: upstream
kernel config: https://syzkaller.appspot.com/x/.config?x=abe260d3ec31020f
dashboard link: https://syzkaller.appspot.com/bug?extid=de9526ade17c659d8336
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

Note: no patches were applied.
Note: testing is done by a robot and is best-effort only.

Hillf Danton

unread,
Jul 31, 2022, 3:25:42 AM7/31/22
to syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On Sat, 20 Feb 2021 13:05:18 -0800
> syzbot found the following issue on:
>
> HEAD commit: f40ddce8 Linux 5.11
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1397f498d00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=e53d04227c52a0df
> dashboard link: https://syzkaller.appspot.com/bug?extid=de9526ade17c659d8336
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17a81012d00000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1282b6d2d00000
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+de9526...@syzkaller.appspotmail.com
>
> rcu: INFO: rcu_preempt self-detected stall on CPU
> rcu: 1-....: (10153 ticks this GP) idle=f1a/1/0x4000000000000000 softirq=10867/10868 fqs=925
> (t=10502 jiffies g=10029 q=19103)
> NMI backtrace for cpu 1
> CPU: 1 PID: 10530 Comm: syz-executor248 Not tainted 5.11.0-syzkaller #0

Print the hrtimer hog.

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git f40ddce8

diff -pur x/kernel/time/hrtimer.c y/kernel/time/hrtimer.c
--- x/kernel/time/hrtimer.c 2022-07-31 15:07:43.454730700 +0800
+++ y/kernel/time/hrtimer.c 2022-07-31 15:13:22.992407100 +0800
@@ -1480,6 +1480,7 @@ static void __run_hrtimer(struct hrtimer
{
enum hrtimer_restart (*fn)(struct hrtimer *);
bool expires_in_hardirq;
+ unsigned long ts;
int restart;

lockdep_assert_held(&cpu_base->lock);
@@ -1515,9 +1516,12 @@ static void __run_hrtimer(struct hrtimer
raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
trace_hrtimer_expire_entry(timer, now);
expires_in_hardirq = lockdep_hrtimer_enter(timer);
+ ts = jiffies + 4;

restart = fn(timer);

+ WARN_ONCE(time_after(jiffies, ts),
+ "On CPU%u hrtimer %ps took more than 4 ticks\n", cpu_base->cpu, fn);
lockdep_hrtimer_exit(expires_in_hardirq);
trace_hrtimer_expire_exit(timer);
raw_spin_lock_irq(&cpu_base->lock);
--

syzbot

unread,
Jul 31, 2022, 3:38:11 AM7/31/22
to hda...@sina.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
WARNING in __hrtimer_run_queues

------------[ cut here ]------------
On CPU1 hrtimer mac80211_hwsim_beacon took more than 4 ticks
WARNING: CPU: 1 PID: 6080 at kernel/time/hrtimer.c:1523 __run_hrtimer kernel/time/hrtimer.c:1523 [inline]
WARNING: CPU: 1 PID: 6080 at kernel/time/hrtimer.c:1523 __hrtimer_run_queues+0xe30/0xff0 kernel/time/hrtimer.c:1587
Modules linked in:
CPU: 1 PID: 6080 Comm: syz-executor.3 Not tainted 5.11.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022
RIP: 0010:__run_hrtimer kernel/time/hrtimer.c:1523 [inline]
RIP: 0010:__hrtimer_run_queues+0xe30/0xff0 kernel/time/hrtimer.c:1587
Code: fa 48 c1 ea 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e b7 01 00 00 41 8b 77 40 48 c7 c7 c0 97 4d 89 48 8b 54 24 08 e8 ca d0 56 07 <0f> 0b e9 6e f4 ff ff e8 d4 c1 52 00 e9 68 f6 ff ff 48 8b 7c 24 18
RSP: 0018:ffffc90000db0e50 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
RDX: ffff88801dcb0000 RSI: ffffffff815b73e5 RDI: fffff520001b61bc
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
R10: ffffffff815b05be R11: 0000000000000000 R12: ffff8880263bd878
R13: 0000000000000004 R14: ffff8880b9f26c80 R15: ffff8880b9f26a00
FS: 00007f0886014700(0000) GS:ffff8880b9f00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000056c000 CR3: 0000000023e2c000 CR4: 0000000000350ee0
Call Trace:
<IRQ>
hrtimer_run_softirq+0x17b/0x360 kernel/time/hrtimer.c:1604
__do_softirq+0x29b/0x9f6 kernel/softirq.c:343
asm_call_irq_on_stack+0xf/0x20
</IRQ>
__run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline]
run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline]
do_softirq_own_stack+0xaa/0xd0 arch/x86/kernel/irq_64.c:77
invoke_softirq kernel/softirq.c:226 [inline]
__irq_exit_rcu kernel/softirq.c:420 [inline]
irq_exit_rcu+0x134/0x200 kernel/softirq.c:432
sysvec_apic_timer_interrupt+0x4d/0x100 arch/x86/kernel/apic/apic.c:1100
asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:629
RIP: 0010:lock_acquire+0x1d2/0x720 kernel/locking/lockdep.c:5410
Code: 48 c7 c7 a0 a9 4b 89 48 83 c4 20 e8 88 b9 a1 07 b8 ff ff ff ff 65 0f c1 05 ab 2a a9 7e 83 f8 01 0f 85 da 03 00 00 ff 34 24 9d <48> b8 00 00 00 00 00 fc ff df 48 01 c3 48 c7 03 00 00 00 00 48 c7
RSP: 0018:ffffc900020ffd28 EFLAGS: 00000246
RAX: 0000000000000001 RBX: 1ffff9200041ffa7 RCX: 0000000000000001
RDX: 1ffff11003b96130 RSI: 0000000000000001 RDI: 0000000000000000
RBP: 0000000000000001 R08: 0000000000000000 R09: ffffffff8ee6683f
R10: fffffbfff1dccd07 R11: 0000000000000001 R12: 0000000000000000
R13: ffff8880185411c8 R14: 0000000000000000 R15: 0000000000000000
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:354 [inline]
setfl fs/fcntl.c:79 [inline]
do_fcntl+0x696/0x1070 fs/fcntl.c:347
__do_sys_fcntl fs/fcntl.c:463 [inline]
__se_sys_fcntl fs/fcntl.c:448 [inline]
__x64_sys_fcntl+0x165/0x1e0 fs/fcntl.c:448
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x465d99
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f0886014188 EFLAGS: 00000246 ORIG_RAX: 0000000000000048
RAX: ffffffffffffffda RBX: 000000000056bf60 RCX: 0000000000465d99
RDX: 0000000000042000 RSI: 0000000000000004 RDI: 0000000000000003
RBP: 00000000004bcf27 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056bf60
R13: 00007ffefcbe216f R14: 00007f0886014300 R15: 0000000000022000
----------------
Code disassembly (best guess):
0: 48 c7 c7 a0 a9 4b 89 mov $0xffffffff894ba9a0,%rdi
7: 48 83 c4 20 add $0x20,%rsp
b: e8 88 b9 a1 07 callq 0x7a1b998
10: b8 ff ff ff ff mov $0xffffffff,%eax
15: 65 0f c1 05 ab 2a a9 xadd %eax,%gs:0x7ea92aab(%rip) # 0x7ea92ac8
1c: 7e
1d: 83 f8 01 cmp $0x1,%eax
20: 0f 85 da 03 00 00 jne 0x400
26: ff 34 24 pushq (%rsp)
29: 9d popfq
* 2a: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax <-- trapping instruction
31: fc ff df
34: 48 01 c3 add %rax,%rbx
37: 48 c7 03 00 00 00 00 movq $0x0,(%rbx)
3e: 48 rex.W
3f: c7 .byte 0xc7


Tested on:

commit: f40ddce8 Linux 5.11
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=1791dcfe080000
kernel config: https://syzkaller.appspot.com/x/.config?x=3565439810a4d2e5
dashboard link: https://syzkaller.appspot.com/bug?extid=de9526ade17c659d8336
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=157e1b36080000

Hillf Danton

unread,
Jul 31, 2022, 4:03:41 AM7/31/22
to syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On Sat, 20 Feb 2021 13:05:18 -0800
> syzbot found the following issue on:
>
> HEAD commit: f40ddce8 Linux 5.11
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1397f498d00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=e53d04227c52a0df
> dashboard link: https://syzkaller.appspot.com/bug?extid=de9526ade17c659d8336
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17a81012d00000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1282b6d2d00000
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+de9526...@syzkaller.appspotmail.com
>
> rcu: INFO: rcu_preempt self-detected stall on CPU
> rcu: 1-....: (10153 ticks this GP) idle=f1a/1/0x4000000000000000 softirq=10867/10868 fqs=925
> (t=10502 jiffies g=10029 q=19103)
> NMI backtrace for cpu 1
> CPU: 1 PID: 10530 Comm: syz-executor248 Not tainted 5.11.0-syzkaller #0

Print the hrtimer hog.

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git f40ddce8

diff -pur x/kernel/time/hrtimer.c y/kernel/time/hrtimer.c
--- x/kernel/time/hrtimer.c 2022-07-31 15:07:43.454730700 +0800
+++ y/kernel/time/hrtimer.c 2022-07-31 16:01:08.254012600 +0800
@@ -1480,6 +1480,7 @@ static void __run_hrtimer(struct hrtimer
{
enum hrtimer_restart (*fn)(struct hrtimer *);
bool expires_in_hardirq;
+ unsigned long ts;
int restart;

lockdep_assert_held(&cpu_base->lock);
@@ -1515,9 +1516,13 @@ static void __run_hrtimer(struct hrtimer
raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
trace_hrtimer_expire_entry(timer, now);
expires_in_hardirq = lockdep_hrtimer_enter(timer);
+ ts = jiffies + 4;

restart = fn(timer);

+ if (time_after(jiffies, ts))
+ pr_warn("On CPU%u hrtimer %ps took 4 ticks\n", cpu_base->cpu, fn);
+

syzbot

unread,
Jul 31, 2022, 4:19:14 AM7/31/22
to hda...@sina.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
WARNING in __cfg80211_ibss_joined

------------[ cut here ]------------
WARNING: CPU: 1 PID: 6061 at net/wireless/ibss.c:36 __cfg80211_ibss_joined+0x487/0x520 net/wireless/ibss.c:36
Modules linked in:
CPU: 1 PID: 6061 Comm: kworker/u4:6 Not tainted 5.11.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022
Workqueue: cfg80211 cfg80211_event_work
RIP: 0010:__cfg80211_ibss_joined+0x487/0x520 net/wireless/ibss.c:36
Code: 0f 0b e9 0c fe ff ff e8 a7 19 75 f9 e9 41 fc ff ff e8 7d 19 75 f9 e9 7d fc ff ff e8 93 19 75 f9 e9 0d ff ff ff e8 99 51 32 f9 <0f> 0b e9 7e fc ff ff e8 8d 51 32 f9 0f 0b e8 86 19 75 f9 e9 e4 fb
RSP: 0018:ffffc90001d5fbd8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff888027c24bd0 RCX: 0000000000000000
RDX: ffff8880285e5340 RSI: ffffffff88408237 RDI: 0000000000000000
RBP: ffff888027c24000 R08: 0000000000000001 R09: ffffffff8ee66807
R10: fffffbfff1dccd00 R11: 0000000000000000 R12: 1ffff920003abf7d
R13: ffff88801486a718 R14: 0000000000000000 R15: 0000000000000006
FS: 0000000000000000(0000) GS:ffff8880b9f00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000053edb8 CR3: 0000000025e9f000 CR4: 0000000000350ee0
Call Trace:
cfg80211_process_wdev_events+0x3de/0x5b0 net/wireless/util.c:977
cfg80211_process_rdev_events+0x6e/0x100 net/wireless/util.c:1003
cfg80211_event_work+0x1a/0x20 net/wireless/core.c:322
process_one_work+0x98d/0x15f0 kernel/workqueue.c:2275
worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
kthread+0x3b1/0x4a0 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296


Tested on:

commit: f40ddce8 Linux 5.11
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=12ecc26a080000
kernel config: https://syzkaller.appspot.com/x/.config?x=3565439810a4d2e5
dashboard link: https://syzkaller.appspot.com/bug?extid=de9526ade17c659d8336
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=171e281e080000

Hillf Danton

unread,
Jul 31, 2022, 4:27:24 AM7/31/22
to syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On Sat, 20 Feb 2021 13:05:18 -0800
> syzbot found the following issue on:
>
> HEAD commit: f40ddce8 Linux 5.11
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1397f498d00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=e53d04227c52a0df
> dashboard link: https://syzkaller.appspot.com/bug?extid=de9526ade17c659d8336
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17a81012d00000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1282b6d2d00000
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+de9526...@syzkaller.appspotmail.com
>
> rcu: INFO: rcu_preempt self-detected stall on CPU
> rcu: 1-....: (10153 ticks this GP) idle=f1a/1/0x4000000000000000 softirq=10867/10868 fqs=925
> (t=10502 jiffies g=10029 q=19103)
> NMI backtrace for cpu 1
> CPU: 1 PID: 10530 Comm: syz-executor248 Not tainted 5.11.0-syzkaller #0

diff -pur x/drivers/net/wireless/mac80211_hwsim.c y/drivers/net/wireless/mac80211_hwsim.c
--- x/drivers/net/wireless/mac80211_hwsim.c 2022-07-31 16:20:11.823504700 +0800
+++ y/drivers/net/wireless/mac80211_hwsim.c 2022-07-31 16:22:14.130567900 +0800
@@ -1788,8 +1788,9 @@ mac80211_hwsim_beacon(struct hrtimer *ti
bcn_int -= data->bcn_delta;
data->bcn_delta = 0;
}
- hrtimer_forward(&data->beacon_timer, hrtimer_get_expires(timer),
- ns_to_ktime(bcn_int * NSEC_PER_USEC));
+
+ hrtimer_forward_now(&data->beacon_timer, ns_to_ktime(bcn_int * NSEC_PER_USEC));
+
return HRTIMER_RESTART;
}

diff -pur x/kernel/time/hrtimer.c y/kernel/time/hrtimer.c
--- x/kernel/time/hrtimer.c 2022-07-31 15:07:43.454730700 +0800
+++ y/kernel/time/hrtimer.c 2022-07-31 16:04:56.920080400 +0800
@@ -1480,6 +1480,7 @@ static void __run_hrtimer(struct hrtimer
{
enum hrtimer_restart (*fn)(struct hrtimer *);
bool expires_in_hardirq;
+ unsigned long ts;
int restart;

lockdep_assert_held(&cpu_base->lock);
@@ -1515,9 +1516,13 @@ static void __run_hrtimer(struct hrtimer
raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
trace_hrtimer_expire_entry(timer, now);
expires_in_hardirq = lockdep_hrtimer_enter(timer);
+ ts = jiffies + 4;

restart = fn(timer);

+ if (time_after(jiffies, ts))
+ pr_warn("On CPU%u hrtimer %ps took more than 4 ticks\n", cpu_base->cpu, fn);

syzbot

unread,
Jul 31, 2022, 4:47:21 AM7/31/22
to hda...@sina.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
WARNING in __cfg80211_ibss_joined

------------[ cut here ]------------
WARNING: CPU: 1 PID: 8 at net/wireless/ibss.c:36 __cfg80211_ibss_joined+0x487/0x520 net/wireless/ibss.c:36
Modules linked in:
CPU: 1 PID: 8 Comm: kworker/u4:0 Not tainted 5.11.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022
Workqueue: cfg80211 cfg80211_event_work
RIP: 0010:__cfg80211_ibss_joined+0x487/0x520 net/wireless/ibss.c:36
Code: 0f 0b e9 0c fe ff ff e8 87 19 75 f9 e9 41 fc ff ff e8 5d 19 75 f9 e9 7d fc ff ff e8 73 19 75 f9 e9 0d ff ff ff e8 79 51 32 f9 <0f> 0b e9 7e fc ff ff e8 6d 51 32 f9 0f 0b e8 66 19 75 f9 e9 e4 fb
RSP: 0018:ffffc90000cd7bd8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff888028288bd0 RCX: 0000000000000000
RDX: ffff888010d25340 RSI: ffffffff88408257 RDI: 0000000000000000
RBP: ffff888028288000 R08: 0000000000000001 R09: ffffffff8ee66807
R10: fffffbfff1dccd00 R11: 0000000000000000 R12: 1ffff9200019af7d
R13: ffff888011916918 R14: 0000000000000000 R15: 0000000000000006
FS: 0000000000000000(0000) GS:ffff8880b9f00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000005680c0 CR3: 000000001aeb4000 CR4: 0000000000350ee0
Call Trace:
cfg80211_process_wdev_events+0x3de/0x5b0 net/wireless/util.c:977
cfg80211_process_rdev_events+0x6e/0x100 net/wireless/util.c:1003
cfg80211_event_work+0x1a/0x20 net/wireless/core.c:322
process_one_work+0x98d/0x15f0 kernel/workqueue.c:2275
worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
kthread+0x3b1/0x4a0 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296


Tested on:

commit: f40ddce8 Linux 5.11
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=16ff72fe080000
kernel config: https://syzkaller.appspot.com/x/.config?x=3565439810a4d2e5
dashboard link: https://syzkaller.appspot.com/bug?extid=de9526ade17c659d8336
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=16978a82080000

Hillf Danton

unread,
Jul 31, 2022, 5:36:44 AM7/31/22
to syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On Sat, 20 Feb 2021 13:05:18 -0800
> syzbot found the following issue on:
>
> HEAD commit: f40ddce8 Linux 5.11
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1397f498d00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=e53d04227c52a0df
> dashboard link: https://syzkaller.appspot.com/bug?extid=de9526ade17c659d8336
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17a81012d00000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1282b6d2d00000
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+de9526...@syzkaller.appspotmail.com
>
> rcu: INFO: rcu_preempt self-detected stall on CPU
> rcu: 1-....: (10153 ticks this GP) idle=f1a/1/0x4000000000000000 softirq=10867/10868 fqs=925
> (t=10502 jiffies g=10029 q=19103)
> NMI backtrace for cpu 1
> CPU: 1 PID: 10530 Comm: syz-executor248 Not tainted 5.11.0-syzkaller #0

diff -pur x/net/wireless/ibss.c y/net/wireless/ibss.c
--- x/net/wireless/ibss.c 2022-07-31 17:31:48.574469700 +0800
+++ y/net/wireless/ibss.c 2022-07-31 17:33:54.724578800 +0800
@@ -33,7 +33,7 @@ void __cfg80211_ibss_joined(struct net_d
bss = cfg80211_get_bss(wdev->wiphy, channel, bssid, NULL, 0,
IEEE80211_BSS_TYPE_IBSS, IEEE80211_PRIVACY_ANY);

- if (WARN_ON(!bss))
+ if (!bss)
return;

if (wdev->current_bss) {
--

syzbot

unread,
Jul 31, 2022, 5:53:14 AM7/31/22
to hda...@sina.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: rcu detected stall in addrconf_rs_timer

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 1-...!: (1 GPs behind) idle=7ee/1/0x4000000000000000 softirq=10587/10593 fqs=0
(t=10501 jiffies g=12037 q=817)
rcu: rcu_preempt kthread starved for 10502 jiffies! g12037 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt state:R running task stack:28888 pid: 13 ppid: 2 flags:0x00004000
Call Trace:
context_switch kernel/sched/core.c:4327 [inline]
__schedule+0x90c/0x21a0 kernel/sched/core.c:5078
schedule+0xcf/0x270 kernel/sched/core.c:5157
schedule_timeout+0x148/0x250 kernel/time/timer.c:1878
rcu_gp_fqs_loop kernel/rcu/tree.c:1940 [inline]
rcu_gp_kthread+0xbbe/0x1d70 kernel/rcu/tree.c:2113
kthread+0x3b1/0x4a0 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
NMI backtrace for cpu 1
CPU: 1 PID: 6327 Comm: syz-executor.5 Not tainted 5.11.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x107/0x163 lib/dump_stack.c:120
nmi_cpu_backtrace.cold+0x44/0xd7 lib/nmi_backtrace.c:105
nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
rcu_dump_cpu_stacks+0x1f4/0x230 kernel/rcu/tree_stall.h:337
print_cpu_stall kernel/rcu/tree_stall.h:569 [inline]
check_cpu_stall kernel/rcu/tree_stall.h:643 [inline]
rcu_pending kernel/rcu/tree.c:3751 [inline]
rcu_sched_clock_irq.cold+0x48e/0xedf kernel/rcu/tree.c:2580
update_process_times+0x16d/0x200 kernel/time/timer.c:1782
tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:226
tick_sched_timer+0x1b0/0x2d0 kernel/time/tick-sched.c:1369
__run_hrtimer kernel/time/hrtimer.c:1521 [inline]
__hrtimer_run_queues+0x68d/0xea0 kernel/time/hrtimer.c:1588
hrtimer_interrupt+0x334/0x940 kernel/time/hrtimer.c:1650
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1089 [inline]
__sysvec_apic_timer_interrupt+0x146/0x540 arch/x86/kernel/apic/apic.c:1106
run_sysvec_on_irqstack_cond arch/x86/include/asm/irq_stack.h:91 [inline]
sysvec_apic_timer_interrupt+0x48/0x100 arch/x86/kernel/apic/apic.c:1100
asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:629
RIP: 0010:ndisc_send_skb+0x150/0x1750 net/ipv6/ndisc.c:475
Code: 00 0f 85 b8 14 00 00 4c 8b b9 98 05 00 00 48 ba 00 00 00 00 00 fc ff df 49 8d bf 08 0d 00 00 48 89 fe 48 c1 ee 03 80 3c 16 00 <0f> 85 77 14 00 00 49 8b 87 08 0d 00 00 48 ba 00 00 00 00 00 fc ff
RSP: 0018:ffffc90000db0aa0 EFLAGS: 00000246
RAX: ffff88803d517790 RBX: ffffc90000db0ce0 RCX: ffff88803acf0000
RDX: dffffc0000000000 RSI: 1ffff11004ff7cc9 RDI: ffff888027fbe648
RBP: ffffc90000db0c40 R08: 0000000000000000 R09: ffff88803c69ac47
R10: ffffffff87b1bfba R11: 0000000000000020 R12: ffff88803d517780
R13: 0000000000000000 R14: 0000000000000000 R15: ffff888027fbd940
ndisc_send_rs+0x12e/0x700 net/ipv6/ndisc.c:702
addrconf_rs_timer+0x3f2/0x820 net/ipv6/addrconf.c:3875
call_timer_fn+0x1a5/0x6b0 kernel/time/timer.c:1417
expire_timers kernel/time/timer.c:1462 [inline]
__run_timers.part.0+0x67c/0xa50 kernel/time/timer.c:1731
__run_timers kernel/time/timer.c:1712 [inline]
run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1744
__do_softirq+0x29b/0x9f6 kernel/softirq.c:343
asm_call_irq_on_stack+0xf/0x20
</IRQ>
__run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline]
run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline]
do_softirq_own_stack+0xaa/0xd0 arch/x86/kernel/irq_64.c:77
invoke_softirq kernel/softirq.c:226 [inline]
__irq_exit_rcu kernel/softirq.c:420 [inline]
irq_exit_rcu+0x134/0x200 kernel/softirq.c:432
sysvec_apic_timer_interrupt+0x4d/0x100 arch/x86/kernel/apic/apic.c:1100
asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:629
RIP: 0010:__raw_spin_unlock_irq include/linux/spinlock_api_smp.h:169 [inline]
RIP: 0010:_raw_spin_unlock_irq+0x25/0x40 kernel/locking/spinlock.c:199
Code: 0f 1f 44 00 00 55 48 8b 74 24 08 48 89 fd 48 83 c7 18 e8 5e e7 5b f8 48 89 ef e8 06 9c 5c f8 e8 f1 bf 7b f8 fb bf 01 00 00 00 <e8> 46 a3 50 f8 65 8b 05 3f 09 05 77 85 c0 74 02 5d c3 e8 cb 5e 03
RSP: 0018:ffffc9000180fc58 EFLAGS: 00000206
RAX: 00000000000000d7 RBX: 0000000000000001 RCX: 1ffffffff1a06a29
RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000001
RBP: ffff8880b9f34c40 R08: 0000000000000001 R09: 0000000000000001
R10: ffffffff8178a698 R11: 0000000000000000 R12: ffff8880b9f34c40
R13: ffff88801cbbb780 R14: 0000000000000000 R15: ffff888021123780
finish_lock_switch kernel/sched/core.c:4079 [inline]
finish_task_switch.isra.0+0x158/0x7e0 kernel/sched/core.c:4196
context_switch kernel/sched/core.c:4330 [inline]
__schedule+0x914/0x21a0 kernel/sched/core.c:5078
preempt_schedule_common+0x45/0xc0 kernel/sched/core.c:5238
preempt_schedule_thunk+0x16/0x18 arch/x86/entry/thunk_64.S:35
__raw_spin_unlock include/linux/spinlock_api_smp.h:152 [inline]
_raw_spin_unlock+0x36/0x40 kernel/locking/spinlock.c:183
spin_unlock include/linux/spinlock.h:394 [inline]
setfl fs/fcntl.c:81 [inline]
do_fcntl+0x6da/0x1070 fs/fcntl.c:347
__do_sys_fcntl fs/fcntl.c:463 [inline]
__se_sys_fcntl fs/fcntl.c:448 [inline]
__x64_sys_fcntl+0x165/0x1e0 fs/fcntl.c:448
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x465d99
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ff3a2a3a188 EFLAGS: 00000246 ORIG_RAX: 0000000000000048
RAX: ffffffffffffffda RBX: 000000000056bf60 RCX: 0000000000465d99
RDX: 0000000000042000 RSI: 0000000000000004 RDI: 0000000000000003
RBP: 00000000004bcf27 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056bf60
R13: 00007fff0360150f R14: 00007ff3a2a3a300 R15: 0000000000022000
On CPU1 hrtimer tick_sched_timer took more than 4 ticks


Tested on:

commit: f40ddce8 Linux 5.11
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=139e281e080000
kernel config: https://syzkaller.appspot.com/x/.config?x=3565439810a4d2e5
dashboard link: https://syzkaller.appspot.com/bug?extid=de9526ade17c659d8336
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=16cd4ed2080000

Hillf Danton

unread,
Jul 31, 2022, 7:01:54 AM7/31/22
to syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On Sat, 20 Feb 2021 13:05:18 -0800
> syzbot found the following issue on:
>
> HEAD commit: f40ddce8 Linux 5.11
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1397f498d00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=e53d04227c52a0df
> dashboard link: https://syzkaller.appspot.com/bug?extid=de9526ade17c659d8336
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17a81012d00000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1282b6d2d00000
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+de9526...@syzkaller.appspotmail.com
>
> rcu: INFO: rcu_preempt self-detected stall on CPU
> rcu: 1-....: (10153 ticks this GP) idle=f1a/1/0x4000000000000000 softirq=10867/10868 fqs=925
> (t=10502 jiffies g=10029 q=19103)
> NMI backtrace for cpu 1
> CPU: 1 PID: 10530 Comm: syz-executor248 Not tainted 5.11.0-syzkaller #0

Print the hrtimer hog.

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git f40ddce8

diff -pur x/drivers/net/wireless/mac80211_hwsim.c y/drivers/net/wireless/mac80211_hwsim.c
--- x/drivers/net/wireless/mac80211_hwsim.c 2022-07-31 16:20:11.823504700 +0800
+++ y/drivers/net/wireless/mac80211_hwsim.c 2022-07-31 16:22:14.130567900 +0800
@@ -1788,8 +1788,9 @@ mac80211_hwsim_beacon(struct hrtimer *ti
bcn_int -= data->bcn_delta;
data->bcn_delta = 0;
}
- hrtimer_forward(&data->beacon_timer, hrtimer_get_expires(timer),
- ns_to_ktime(bcn_int * NSEC_PER_USEC));
+
+ hrtimer_forward_now(&data->beacon_timer, ns_to_ktime(bcn_int * NSEC_PER_USEC));
+
return HRTIMER_RESTART;
}

diff -pur x/kernel/softirq.c y/kernel/softirq.c
--- x/kernel/softirq.c 2022-07-31 18:37:44.062662300 +0800
+++ y/kernel/softirq.c 2022-07-31 18:52:39.657258700 +0800
@@ -300,12 +300,14 @@ static inline void lockdep_softirq_end(b
asmlinkage __visible void __softirq_entry __do_softirq(void)
{
unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
+ unsigned long ts_sum;
unsigned long old_flags = current->flags;
int max_restart = MAX_SOFTIRQ_RESTART;
struct softirq_action *h;
bool in_hardirq;
__u32 pending;
int softirq_bit;
+ int cpu = smp_processor_id();

/*
* Mask out PF_MEMALLOC as the current task context is borrowed for the
@@ -327,8 +329,10 @@ restart:
local_irq_enable();

h = softirq_vec;
+ ts_sum = jiffies + 20;

while ((softirq_bit = ffs(pending))) {
+ unsigned long ts;
unsigned int vec_nr;
int prev_count;

@@ -338,10 +342,19 @@ restart:
prev_count = preempt_count();

kstat_incr_softirqs_this_cpu(vec_nr);
+ ts = jiffies + 10;

trace_softirq_entry(vec_nr);
h->action(h);
trace_softirq_exit(vec_nr);
+
+ WARN_ONCE(time_after(jiffies, ts_sum),
+ "On CPU%d softirq took more than 20 ticks\n", cpu);
+
+ WARN_ONCE(time_after(jiffies, ts),
+ "On CPU%d softirq %s took more than 10 ticks\n",
+ cpu, softirq_to_name[vec_nr]);
+
if (unlikely(prev_count != preempt_count())) {
pr_err("huh, entered softirq %u %s %p with preempt_count %08x, exited with %08x?\n",
vec_nr, softirq_to_name[vec_nr], h->action,
diff -pur x/kernel/time/hrtimer.c y/kernel/time/hrtimer.c
--- x/kernel/time/hrtimer.c 2022-07-31 15:07:43.454730700 +0800
+++ y/kernel/time/hrtimer.c 2022-07-31 18:55:23.426035000 +0800
@@ -1480,6 +1480,7 @@ static void __run_hrtimer(struct hrtimer
{
enum hrtimer_restart (*fn)(struct hrtimer *);
bool expires_in_hardirq;
+ unsigned long ts;
int restart;

lockdep_assert_held(&cpu_base->lock);
@@ -1515,9 +1516,16 @@ static void __run_hrtimer(struct hrtimer
raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
trace_hrtimer_expire_entry(timer, now);
expires_in_hardirq = lockdep_hrtimer_enter(timer);
+ ts = jiffies + 10;

restart = fn(timer);

+ WARN_ONCE(time_after(jiffies, ts),
+ "On CPU%u hrtimer %ps took more than 10 ticks\n", cpu_base->cpu, fn);
+ /*
+ if (time_after(jiffies, ts))
+ pr_warn("On CPU%u hrtimer %ps took more than 4 ticks\n", cpu_base->cpu, fn);
+*/

syzbot

unread,
Jul 31, 2022, 7:13:09 AM7/31/22
to hda...@sina.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
WARNING in __do_softirq

------------[ cut here ]------------
On CPU0 softirq took more than 20 ticks
WARNING: CPU: 0 PID: 6054 at kernel/softirq.c:351 __do_softirq+0x3b8/0xba6 kernel/softirq.c:351
Modules linked in:
CPU: 0 PID: 6054 Comm: syz-executor.0 Not tainted 5.11.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022
RIP: 0010:__do_softirq+0x3b8/0xba6 kernel/softirq.c:351
Code: 0f 89 71 fe ff ff 80 3d 56 69 cf 03 00 0f 85 64 fe ff ff 8b 74 24 40 48 c7 c7 00 54 4a 89 c6 05 3e 69 cf 03 01 e8 52 a8 99 ff <0f> 0b e9 46 fe ff ff 65 8b 15 9a 91 e1 76 83 fa 07 0f 87 ed 04 00
RSP: 0018:ffffc90000007f78 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffffffff8b00a0c8 RCX: 0000000000000000
RDX: ffff888022969bc0 RSI: ffffffff815b73e5 RDI: fffff52000000fe1
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff815b05be R11: 0000000000000000 R12: 0000000000000102
R13: 00000000ffffab06 R14: 0000000000000001 R15: 0000000000000082
FS: 00007f8534135700(0000) GS:ffff8880b9e00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000005680c0 CR3: 000000003141e000 CR4: 0000000000350ef0
Call Trace:
<IRQ>
asm_call_irq_on_stack+0xf/0x20
</IRQ>
__run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline]
run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline]
do_softirq_own_stack+0xaa/0xd0 arch/x86/kernel/irq_64.c:77
invoke_softirq kernel/softirq.c:226 [inline]
__irq_exit_rcu kernel/softirq.c:433 [inline]
irq_exit_rcu+0x134/0x200 kernel/softirq.c:445
sysvec_apic_timer_interrupt+0x4d/0x100 arch/x86/kernel/apic/apic.c:1100
asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:629
RIP: 0010:lock_is_held_type+0x10/0x120 kernel/locking/lockdep.c:5470
Code: c7 c7 a0 a9 4b 89 89 44 24 04 e8 e5 29 bf ff 0f 0b 8b 44 24 04 e9 08 ff ff ff 41 57 41 56 41 55 41 bd 01 00 00 00 41 54 55 53 <48> 83 ec 08 8b 0d d2 3b 09 04 85 c9 0f 84 c6 00 00 00 65 8b 05 e7
RSP: 0018:ffffc900020afd08 EFLAGS: 00000202
RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffffff8158ce61
RDX: 1ffffffff166f836 RSI: 00000000ffffffff RDI: ffffffff8b373de0
RBP: 1ffff92000415fac R08: 0000000000000000 R09: ffffffff8d038c4f
R10: fffffbfff1a07189 R11: 0000000000000001 R12: 0000000000000001
R13: 0000000000000001 R14: 0000000000000000 R15: ffff888029f29980
lock_is_held include/linux/lockdep.h:271 [inline]
rcu_read_lock_sched_held+0x3a/0x70 kernel/rcu/update.c:123
trace_lock_release include/trace/events/lock.h:58 [inline]
lock_release+0x5b7/0x710 kernel/locking/lockdep.c:5453
__raw_spin_unlock include/linux/spinlock_api_smp.h:150 [inline]
_raw_spin_unlock+0x12/0x40 kernel/locking/spinlock.c:183
spin_unlock include/linux/spinlock.h:394 [inline]
setfl fs/fcntl.c:81 [inline]
do_fcntl+0x6da/0x1070 fs/fcntl.c:347
__do_sys_fcntl fs/fcntl.c:463 [inline]
__se_sys_fcntl fs/fcntl.c:448 [inline]
__x64_sys_fcntl+0x165/0x1e0 fs/fcntl.c:448
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x465d99
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f8534135188 EFLAGS: 00000246 ORIG_RAX: 0000000000000048
RAX: ffffffffffffffda RBX: 000000000056bf60 RCX: 0000000000465d99
RDX: 0000000000042000 RSI: 0000000000000004 RDI: 0000000000000003
RBP: 00000000004bcf27 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056bf60
R13: 00007ffee7bd08df R14: 00007f8534135300 R15: 0000000000022000
----------------
Code disassembly (best guess):
0: c7 c7 a0 a9 4b 89 mov $0x894ba9a0,%edi
6: 89 44 24 04 mov %eax,0x4(%rsp)
a: e8 e5 29 bf ff callq 0xffbf29f4
f: 0f 0b ud2
11: 8b 44 24 04 mov 0x4(%rsp),%eax
15: e9 08 ff ff ff jmpq 0xffffff22
1a: 41 57 push %r15
1c: 41 56 push %r14
1e: 41 55 push %r13
20: 41 bd 01 00 00 00 mov $0x1,%r13d
26: 41 54 push %r12
28: 55 push %rbp
29: 53 push %rbx
* 2a: 48 83 ec 08 sub $0x8,%rsp <-- trapping instruction
2e: 8b 0d d2 3b 09 04 mov 0x4093bd2(%rip),%ecx # 0x4093c06
34: 85 c9 test %ecx,%ecx
36: 0f 84 c6 00 00 00 je 0x102
3c: 65 gs
3d: 8b .byte 0x8b
3e: 05 .byte 0x5
3f: e7 .byte 0xe7


Tested on:

commit: f40ddce8 Linux 5.11
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=12c8846a080000
kernel config: https://syzkaller.appspot.com/x/.config?x=3565439810a4d2e5
dashboard link: https://syzkaller.appspot.com/bug?extid=de9526ade17c659d8336
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=16efe82e080000

syzbot

unread,
Dec 18, 2022, 10:45:39 PM12/18/22
to syzkall...@googlegroups.com
Auto-closing this bug as obsolete.
No recent activity, existing reproducers are no longer triggering the issue.
Reply all
Reply to author
Forward
0 new messages