[syzbot] INFO: rcu detected stall in ext4_file_write_iter (4)

syzbot
Dec 22, 2021, 12:33:27 PM
to fwei...@gmail.com, linux-...@vger.kernel.org, mi...@kernel.org, syzkall...@googlegroups.com, tg...@linutronix.de
Hello,

syzbot found the following issue on:

HEAD commit: 9eaa88c7036e Merge tag 'libata-5.16-rc6' of git://git.kern..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=142d4143b00000
kernel config: https://syzkaller.appspot.com/x/.config?x=e3bdfd29b408d1b6
dashboard link: https://syzkaller.appspot.com/bug?extid=03464269af631f4a4bdf
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11b1537db00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=125de549b00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+034642...@syzkaller.appspotmail.com

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
(detected by 0, t=10502 jiffies, g=5509, q=170)
rcu: All QSes seen, last rcu_preempt kthread activity 8642 (4295000735-4294992093), jiffies_till_next_fqs=1, root ->qsmask 0x0
rcu: rcu_preempt kthread starved for 8643 jiffies! g5509 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt state:R running task stack:28208 pid: 14 ppid: 2 flags:0x00004000
Call Trace:
<TASK>
context_switch kernel/sched/core.c:4972 [inline]
__schedule+0xa9a/0x4940 kernel/sched/core.c:6253
schedule+0xd2/0x260 kernel/sched/core.c:6326
schedule_timeout+0x14a/0x2a0 kernel/time/timer.c:1881
rcu_gp_fqs_loop+0x186/0x810 kernel/rcu/tree.c:1955
rcu_gp_kthread+0x1de/0x320 kernel/rcu/tree.c:2128
kthread+0x405/0x4f0 kernel/kthread.c:327
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
</TASK>
rcu: Stack dump where RCU GP kthread last ran:
NMI backtrace for cpu 0
CPU: 0 PID: 3672 Comm: syz-executor549 Not tainted 5.16.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:111
nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
rcu_check_gp_kthread_starvation.cold+0x1fb/0x200 kernel/rcu/tree_stall.h:481
print_other_cpu_stall kernel/rcu/tree_stall.h:586 [inline]
check_cpu_stall kernel/rcu/tree_stall.h:729 [inline]
rcu_pending kernel/rcu/tree.c:3878 [inline]
rcu_sched_clock_irq+0x2125/0x2200 kernel/rcu/tree.c:2597
update_process_times+0x16d/0x200 kernel/time/timer.c:1785
tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:226
tick_sched_timer+0x1b0/0x2d0 kernel/time/tick-sched.c:1428
__run_hrtimer kernel/time/hrtimer.c:1685 [inline]
__hrtimer_run_queues+0x1c0/0xe50 kernel/time/hrtimer.c:1749
hrtimer_interrupt+0x31c/0x790 kernel/time/hrtimer.c:1811
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline]
__sysvec_apic_timer_interrupt+0x146/0x530 arch/x86/kernel/apic/apic.c:1103
sysvec_apic_timer_interrupt+0x8e/0xc0 arch/x86/kernel/apic/apic.c:1097
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
RIP: 0010:owner_on_cpu kernel/locking/rwsem.c:667 [inline]
RIP: 0010:rwsem_spin_on_owner+0x1aa/0x290 kernel/locking/rwsem.c:750
Code: a8 08 75 33 49 8d 7c 24 34 48 89 f8 48 c1 e8 03 42 0f b6 14 28 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 b0 00 00 00 <41> 8b 44 24 34 85 c0 0f 85 14 ff ff ff b8 08 00 00 00 48 83 c4 08
RSP: 0018:ffffc90002817448 EFLAGS: 00000246
RAX: 0000000000000007 RBX: ffff88806da34680 RCX: ffffffff815a9743
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88807d2000b4
RBP: ffff88806da34688 R08: 0000000000000000 R09: ffff888078b322c7
R10: ffffed100f166458 R11: 0000000000000001 R12: ffff88807d200080
R13: dffffc0000000000 R14: ffff888078b322c0 R15: ffffed100f166458
rwsem_optimistic_spin kernel/locking/rwsem.c:808 [inline]
rwsem_down_write_slowpath+0x1a8/0x1110 kernel/locking/rwsem.c:1068
__down_write_common kernel/locking/rwsem.c:1268 [inline]
__down_write_common kernel/locking/rwsem.c:1265 [inline]
__down_write kernel/locking/rwsem.c:1277 [inline]
down_write+0x135/0x150 kernel/locking/rwsem.c:1524
inode_lock include/linux/fs.h:783 [inline]
ext4_buffered_write_iter+0xb6/0x3b0 fs/ext4/file.c:263
ext4_file_write_iter+0x423/0x14e0 fs/ext4/file.c:681
call_write_iter include/linux/fs.h:2162 [inline]
do_iter_readv_writev+0x472/0x750 fs/read_write.c:725
do_iter_write+0x188/0x710 fs/read_write.c:851
vfs_iter_write+0x70/0xa0 fs/read_write.c:892
iter_file_splice_write+0x6fa/0xc10 fs/splice.c:689
do_splice_from fs/splice.c:767 [inline]
direct_splice_actor+0x110/0x180 fs/splice.c:936
splice_direct_to_actor+0x34b/0x8c0 fs/splice.c:891
do_splice_direct+0x1b3/0x280 fs/splice.c:979
do_sendfile+0xaf2/0x1250 fs/read_write.c:1245
__do_sys_sendfile64 fs/read_write.c:1310 [inline]
__se_sys_sendfile64 fs/read_write.c:1296 [inline]
__x64_sys_sendfile64+0x1cc/0x210 fs/read_write.c:1296
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f34d1e3f169
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 b1 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffc49095728 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
RAX: ffffffffffffffda RBX: 00000000000f4240 RCX: 00007f34d1e3f169
RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000004
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
R10: 00008400fffffffb R11: 0000000000000246 R12: 0000000000076d4d
R13: 00007ffc4909573c R14: 00007ffc49095750 R15: 00007ffc49095740
</TASK>
----------------
Code disassembly (best guess):
0: a8 08 test $0x8,%al
2: 75 33 jne 0x37
4: 49 8d 7c 24 34 lea 0x34(%r12),%rdi
9: 48 89 f8 mov %rdi,%rax
c: 48 c1 e8 03 shr $0x3,%rax
10: 42 0f b6 14 28 movzbl (%rax,%r13,1),%edx
15: 48 89 f8 mov %rdi,%rax
18: 83 e0 07 and $0x7,%eax
1b: 83 c0 03 add $0x3,%eax
1e: 38 d0 cmp %dl,%al
20: 7c 08 jl 0x2a
22: 84 d2 test %dl,%dl
24: 0f 85 b0 00 00 00 jne 0xda
* 2a: 41 8b 44 24 34 mov 0x34(%r12),%eax <-- trapping instruction
2f: 85 c0 test %eax,%eax
31: 0f 85 14 ff ff ff jne 0xffffff4b
37: b8 08 00 00 00 mov $0x8,%eax
3c: 48 83 c4 08 add $0x8,%rsp
----------------


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

Hillf Danton

Dec 24, 2021, 7:53:06 PM
to syzbot, linux-...@vger.kernel.org, Peter Zijlstra, Waiman Long, syzkall...@googlegroups.com
On Wed, 22 Dec 2021 09:33:26 -0800, syzbot wrote:
The lock acquirer was spinning long enough to trigger an RCU stall. The
question is: if this is not reproducible, how was the report triggered?
Is there anything special about syzbot's setup in this case?

Thanks
Hillf

Waiman Long

Dec 27, 2021, 10:14:28 PM
to Hillf Danton, syzbot, linux-...@vger.kernel.org, Peter Zijlstra, syzkall...@googlegroups.com
The test was running on a CONFIG_PREEMPT kernel. So if the syzkaller
thread is running at a higher priority than the rcu_preempt kthread, it
is possible for the rcu_preempt kthread to be starved of CPU time. The
rwsem optimistic spinning code will relinquish the CPU if there is a
higher-priority thread runnable. Since the rcu_preempt kthread did not
seem to be able to get the CPU, I suspect that this is probably caused
by the syzkaller thread having a higher priority.

Cheers,
Longman

Hillf Danton

Dec 28, 2021, 1:36:08 AM
to Waiman Long, syzbot, linux-...@vger.kernel.org, Peter Zijlstra, syzkall...@googlegroups.com
Given CONFIG_PREEMPT, can you please explain a bit more how a higher
priority helps the spinner survive the following check in
rwsem_spin_on_owner()? Why did the need_resched() check fail to stop the
spinning within a couple of ticks? And what made the lock owner sit on
the other CPU for more than ten ticks, for example?

Thanks
Hillf


if (need_resched() || !owner_on_cpu(owner)) {
state = OWNER_NONSPINNABLE;
break;
}

Theodore Ts'o

Dec 29, 2021, 4:29:38 PM
to Waiman Long, Hillf Danton, syzbot, linux-...@vger.kernel.org, Peter Zijlstra, syzkall...@googlegroups.com
On Mon, Dec 27, 2021 at 10:14:23PM -0500, Waiman Long wrote:
>
> The test was running on a CONFIG_PREEMPT kernel. So if the syzkaller kthread
> is running at a higher priority than the rcu_preempt kthread, it is possible
> for the rcu_preempt kthread to be starved of cpu time. The rwsem optimistic
> spinning code will relinquish the cpu if there is a higher priority thread
> running. Since rcu_preempt kthread did not seem to be able to get the cpu, I
> suspect that it is probably caused by the syzkaller thread having a higher
> priority.

It's even worse than that. The Syzkaller reproducer is calling
sched_setattr():

*(uint32_t*)0x20000080 = 0x38; // sched_attr.sched_size
*(uint32_t*)0x20000084 = 1; // sched_attr.sched_policy == SCHED_FIFO
*(uint64_t*)0x20000088 = 0; // sched_attr.sched_flags
*(uint32_t*)0x20000090 = 0; // sched_attr.sched_nice
*(uint32_t*)0x20000094 = 1; // sched_attr.sched_priority
*(uint64_t*)0x20000098 = 0; // ...
*(uint64_t*)0x200000a0 = 0;
*(uint64_t*)0x200000a8 = 0;
*(uint32_t*)0x200000b0 = 0;
*(uint32_t*)0x200000b4 = 0;
syscall(__NR_sched_setattr, 0, 0x20000080ul, 0ul);

So one or more of the syzkaller threads is SCHED_FIFO, and SCHED_FIFO
threads will *never* relinquish the CPU in favor of SCHED_OTHER
threads (which in practice includes all kernel threads, unless
special measures are taken by someone who knows what they are doing)
so long as they are runnable.

See the discussion at:

https://lore.kernel.org/all/Yb5RMWRs...@casper.infradead.org/

I recommend that kernel developers simply ignore any syzkaller report
about hung tasks or detected RCU stalls where the reproducer is
trying to set a real-time priority (e.g., a sched_policy of SCHED_FIFO
or SCHED_RR), since the number of ways that sched_setattr can be used
as a foot-gun is near infinite....

Syzkaller reports that include sched_setattr are great for inflating
the "OMG! There are tons of unhandled syzkaller reports; companies
need to fund more engineering headcount to fix syzkaller bugs" slide
decks. But IMHO, they are not good for much else.

- Ted

Hillf Danton

Dec 30, 2021, 7:50:33 AM
to Theodore Ts'o, Waiman Long, Hillf Danton, syzbot, linux-...@vger.kernel.org, Peter Zijlstra, syzkall...@googlegroups.com
On the other hand, this report IMHO suggests the need to set a deadline
for spinners, a couple of ticks by default, to cut the chance for FIFO
tasks to make trouble in scenarios like this one.

Mutexes would need the same mechanism, if that makes sense.

Thanks
Hillf


+++ x/kernel/locking/rwsem.c
@@ -716,6 +716,7 @@ rwsem_spin_on_owner(struct rw_semaphore
struct task_struct *new, *owner;
unsigned long flags, new_flags;
enum owner_state state;
+ unsigned long deadline;

lockdep_assert_preemption_disabled();

@@ -724,6 +725,10 @@ rwsem_spin_on_owner(struct rw_semaphore
if (state != OWNER_WRITER)
return state;

+ /* avoid spinning long enough to make rcu stall
+ * particularly in case of FIFO spinner
+ */
+ deadline = jiffies + 2;
for (;;) {
/*
* When a waiting writer set the handoff flag, it may spin
@@ -747,7 +752,8 @@ rwsem_spin_on_owner(struct rw_semaphore
*/
barrier();

- if (need_resched() || !owner_on_cpu(owner)) {
+ if (need_resched() || !owner_on_cpu(owner) ||
+ time_after(jiffies, deadline)) {
state = OWNER_NONSPINNABLE;
break;
}

Aleksandr Nogikh

Feb 8, 2022, 5:32:19 AM
to Hillf Danton, Theodore Ts'o, Waiman Long, syzbot, LKML, Peter Zijlstra, syzkall...@googlegroups.com
Closing the bug. Syzkaller is now much more careful with sched_setattr
and perf_event_open, so hopefully we'll see fewer such false-positive
reports in the future.

#syz invalid