INFO: rcu detected stall in ext4_file_write_iter


syzbot

Feb 26, 2019, 1:50:06 AM
to adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, ty...@mit.edu
Hello,

syzbot found the following crash on:

HEAD commit: 8a61716ff2ab Merge tag 'ceph-for-5.0-rc8' of git://github...
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=161b71d4c00000
kernel config: https://syzkaller.appspot.com/x/.config?x=7132344728e7ec3f
dashboard link: https://syzkaller.appspot.com/bug?extid=7d19c5fe6a3f1161abb7
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=103908f8c00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=105e5cd0c00000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+7d19c5...@syzkaller.appspotmail.com

audit: type=1400 audit(1550814986.750:36): avc: denied { map } for
pid=8058 comm="syz-executor004" path="/root/syz-executor004991115"
dev="sda1" ino=1426 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023
tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1
hrtimer: interrupt took 42841 ns
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: (detected by 1, t=10502 jiffies, g=5873, q=2)
rcu: All QSes seen, last rcu_preempt kthread activity 10502
(4295059997-4295049495), jiffies_till_next_fqs=1, root ->qsmask 0x0
syz-executor004 R running task 26448 8069 8060 0x00000000
Call Trace:
<IRQ>
sched_show_task kernel/sched/core.c:5306 [inline]
sched_show_task.cold+0x292/0x30b kernel/sched/core.c:5281
print_other_cpu_stall kernel/rcu/tree.c:1301 [inline]
check_cpu_stall kernel/rcu/tree.c:1429 [inline]
rcu_pending kernel/rcu/tree.c:3018 [inline]
rcu_check_callbacks.cold+0xa10/0xa4a kernel/rcu/tree.c:2521
update_process_times+0x32/0x80 kernel/time/timer.c:1635
tick_sched_handle+0xa2/0x190 kernel/time/tick-sched.c:161
tick_sched_timer+0x47/0x130 kernel/time/tick-sched.c:1271
__run_hrtimer kernel/time/hrtimer.c:1389 [inline]
__hrtimer_run_queues+0x33e/0xde0 kernel/time/hrtimer.c:1451
hrtimer_interrupt+0x314/0x770 kernel/time/hrtimer.c:1509
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1035 [inline]
smp_apic_timer_interrupt+0x120/0x570 arch/x86/kernel/apic/apic.c:1060
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807
</IRQ>
RIP: 0010:__sanitizer_cov_trace_pc+0x0/0x50 kernel/kcov.c:94
Code: e5 08 41 bc f4 ff ff ff e8 11 9f ea ff 48 c7 05 12 fb e5 08 00 00 00
00 e9 c8 e9 ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 <55> 48 89 e5 48
8b 75 08 65 48 8b 04 25 40 ee 01 00 65 8b 15 38 0c
RSP: 0018:ffff888084ed72c0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
RAX: dffffc0000000000 RBX: ffff88809a2205c0 RCX: ffffffff81e7c36c
RDX: 1ffff11013444114 RSI: ffffffff81e7c37a RDI: ffff88809a2208a0
RBP: ffff888084ed7300 R08: ffff88808fa28240 R09: ffffed1015d25bd0
R10: ffffed1015d25bcf R11: ffff8880ae92de7b R12: 0000000000000002
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
__ext4_journal_start fs/ext4/ext4_jbd2.h:311 [inline]
ext4_dirty_inode+0x64/0xc0 fs/ext4/inode.c:6081
__mark_inode_dirty+0x919/0x1290 fs/fs-writeback.c:2155
mark_inode_dirty include/linux/fs.h:2125 [inline]
__generic_write_end+0x1b9/0x240 fs/buffer.c:2118
generic_write_end+0x6c/0x90 fs/buffer.c:2163
ext4_da_write_end+0x3c8/0xa50 fs/ext4/inode.c:3204
generic_perform_write+0x2f0/0x530 mm/filemap.c:3219
__generic_file_write_iter+0x25e/0x630 mm/filemap.c:3333
ext4_file_write_iter+0x33f/0x1160 fs/ext4/file.c:266
call_write_iter include/linux/fs.h:1863 [inline]
do_iter_readv_writev+0x5e0/0x8e0 fs/read_write.c:680
do_iter_write fs/read_write.c:956 [inline]
do_iter_write+0x184/0x610 fs/read_write.c:937
vfs_iter_write+0x77/0xb0 fs/read_write.c:969
iter_file_splice_write+0x667/0xbe0 fs/splice.c:749
do_splice_from fs/splice.c:851 [inline]
direct_splice_actor+0x126/0x1a0 fs/splice.c:1023
splice_direct_to_actor+0x369/0x970 fs/splice.c:978
do_splice_direct+0x1da/0x2a0 fs/splice.c:1066
do_sendfile+0x597/0xd00 fs/read_write.c:1436
__do_sys_sendfile64 fs/read_write.c:1491 [inline]
__se_sys_sendfile64 fs/read_write.c:1483 [inline]
__x64_sys_sendfile64+0x15a/0x220 fs/read_write.c:1483
do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x446a59
Code: e8 dc e6 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 4b 07 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fce454e5db8 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
RAX: ffffffffffffffda RBX: 00000000006dcc28 RCX: 0000000000446a59
RDX: 0000000020000000 RSI: 0000000000000003 RDI: 0000000000000003
RBP: 00000000006dcc20 R08: 0000000000000000 R09: 0000000000000000
R10: 00008080fffffffe R11: 0000000000000246 R12: 00000000006dcc2c
R13: 00007ffd7bf04daf R14: 00007fce454e69c0 R15: 20c49ba5e353f7cf
rcu: rcu_preempt kthread starved for 10502 jiffies! g5873 f0x2
RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
rcu: RCU grace-period kthread stack dump:
rcu_preempt R running task 29200 10 2 0x80000000
Call Trace:
context_switch kernel/sched/core.c:2844 [inline]
__schedule+0x817/0x1cc0 kernel/sched/core.c:3485
schedule+0x92/0x180 kernel/sched/core.c:3529
schedule_timeout+0x4db/0xfd0 kernel/time/timer.c:1803
rcu_gp_fqs_loop kernel/rcu/tree.c:1948 [inline]
rcu_gp_kthread+0x956/0x17a0 kernel/rcu/tree.c:2105
kthread+0x357/0x430 kernel/kthread.c:246
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

Theodore Y. Ts'o

Feb 26, 2019, 10:17:50 AM
to syzbot, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo
TL;DR: This doesn't appear to be ext4 specific, and seems to involve
an unholy combination of the perf_event_open(2) and sendfile(2) system
calls.

On Mon, Feb 25, 2019 at 10:50:05PM -0800, syzbot wrote:
> syzbot found the following crash on:
>
> HEAD commit: 8a61716ff2ab Merge tag 'ceph-for-5.0-rc8' of git://github...
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=161b71d4c00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=7132344728e7ec3f
> dashboard link: https://syzkaller.appspot.com/bug?extid=7d19c5fe6a3f1161abb7
> compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=103908f8c00000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=105e5cd0c00000
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+7d19c5...@syzkaller.appspotmail.com
>
> audit: type=1400 audit(1550814986.750:36): avc: denied { map } for
> pid=8058 comm="syz-executor004" path="/root/syz-executor004991115"
> dev="sda1" ino=1426 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023
> tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1
> hrtimer: interrupt took 42841 ns
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu: (detected by 1, t=10502 jiffies, g=5873, q=2)
> rcu: All QSes seen, last rcu_preempt kthread activity 10502
> (4295059997-4295049495), jiffies_till_next_fqs=1, root ->qsmask 0x0
> syz-executor004 R running task 26448 8069 8060 0x00000000

This particular repro seems to induce similar failures when I tried it
with xfs and btrfs as well as ext4.

The repro seems to involve the perf_event_open(2) and sendfile(2)
system calls, and killing the process which is performing the
sendfile(2). The repro also uses the sched_setattr(2) system call,
but when I commented it out, the failure still happened, so this
appears to be another case of "Syzkaller? We don't need to bug
developers with a minimal test case! Open source developers are a
free unlimited resource, after all!" :-)

Commenting out the perf_event_open(2) does seem to make the problem go
away.
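
For reference, here is a rough C sketch of the syscall combination described
above. It is not the actual repro.c (attached to this message and linked in
the original report); the perf event attributes, file names, and sizes are
purely illustrative.

/*
 * Rough sketch only: a high-frequency perf event opened on the task,
 * a sendfile(2) loop, and the task being killed from the parent.
 * Event attributes, file names, and sizes are made up.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/perf_event.h>
#include <signal.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
			    int cpu, int group_fd, unsigned long flags)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
	pid_t child = fork();

	if (child == 0) {
		struct perf_event_attr attr;

		memset(&attr, 0, sizeof(attr));
		attr.size = sizeof(attr);
		attr.type = PERF_TYPE_SOFTWARE;		/* illustrative */
		attr.config = PERF_COUNT_SW_CPU_CLOCK;	/* illustrative */
		attr.sample_period = 1;			/* very high sample rate */
		perf_event_open(&attr, 0, -1, -1, 0);	/* profile this task */

		int in = open("infile", O_RDWR | O_CREAT, 0600);
		int out = open("outfile", O_RDWR | O_CREAT, 0600);
		off_t off;

		pwrite(in, "x", 1, 0);
		for (;;) {				/* hammer sendfile(2) */
			off = 0;
			sendfile(out, in, &off, 0x10000);
		}
	}
	sleep(1);
	kill(child, SIGKILL);		/* kill the task mid-sendfile */
	waitpid(child, NULL, 0);
	return 0;
}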

Since there are zillions of ways to self-DOS a Linux server without
having to resort to exotic combinations of system calls, this isn't
something I'm going to prioritize for myself, but I'm hoping someone
else has time and curiosity.

- Ted
repro.c

Dmitry Vyukov

Feb 27, 2019, 4:59:03 AM
to Theodore Y. Ts'o, syzbot, Andreas Dilger, linux...@vger.kernel.org, LKML, linux-fsdevel, syzkaller-bugs, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo
Peter, Ingo, do you have any updates on the
perf_event_open/sched_setattr stalls? This bug causes assorted hangs
throughout the kernel and so is nasty.

syzkaller tries to remove all syscalls from reproducers one-by-one.
Somehow without sched_setattr the hang did not reproduce (a bunch of
repros have perf_event_open+sched_setattr so somehow they seem to be
related). The kernel is not as simple as a single-threaded, deterministic,
fully reproducible user-space XML parsing library; most (almost all)
aspects are flaky and non-deterministic and thus require more human
intelligence. But even with perfect repros machines still won't be
able to tell in all cases that even though the hang happened in ext4
code, the root cause is actually another scheduler-related system
call. So thanks for looking into this.
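
To illustrate the one-by-one removal, here is a schematic C sketch. It is
not syzkaller's real minimizer (which lives in the Go tooling), and
reproduces() is a toy stand-in for actually running the candidate program
and checking for the hang, which is exactly the step that becomes
unreliable when the bug is flaky.

/*
 * Schematic sketch of one-by-one call removal. reproduces() is a toy
 * oracle: here it pretends the hang needs both perf_event_open and
 * sendfile. The real step runs the candidate program and observes it.
 */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct call { const char *name; };	/* hypothetical representation */

static bool reproduces(const struct call *prog, int n)
{
	bool perf = false, sf = false;

	for (int i = 0; i < n; i++) {
		perf |= !strcmp(prog[i].name, "perf_event_open");
		sf |= !strcmp(prog[i].name, "sendfile");
	}
	return perf && sf;
}

/* Drop each call in turn; keep the drop only if the bug still triggers. */
static int minimize(struct call *prog, int n)
{
	for (int i = 0; i < n; ) {
		struct call saved = prog[i];

		memmove(&prog[i], &prog[i + 1], (n - i - 1) * sizeof(*prog));
		if (reproduces(prog, n - 1)) {
			n--;			/* call was not needed */
		} else {
			memmove(&prog[i + 1], &prog[i], (n - i - 1) * sizeof(*prog));
			prog[i] = saved;	/* restore it and move on */
			i++;
		}
	}
	return n;
}

int main(void)
{
	struct call prog[] = {
		{ "perf_event_open" }, { "sched_setattr" },
		{ "open" }, { "sendfile" },
	};
	int n = minimize(prog, 4);

	for (int i = 0; i < n; i++)
		printf("%s\n", prog[i].name);	/* perf_event_open, sendfile */
	return 0;
}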

Theodore Y. Ts'o

Feb 27, 2019, 4:58:07 PM
to Dmitry Vyukov, syzbot, Andreas Dilger, linux...@vger.kernel.org, LKML, linux-fsdevel, syzkaller-bugs, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo
On Wed, Feb 27, 2019 at 10:58:50AM +0100, Dmitry Vyukov wrote:
> Peter, Ingo, do you have any updates on the
> perf_event_open/sched_setattr stalls? This bug causes assorted hangs
> throughout the kernel and so is nasty.
>
> syzkaller tries to remove all syscalls from reproducers one-by-one.
> Somehow without sched_setattr the hang did not reproduce (a bunch of
> repros have perf_event_open+sched_setattr so somehow they seem to be
> related)

FWIW, at least for me, the repro.c with sched_setattr commented out
(see the repro.c attached to a message[1] earlier in the thread)
was reproducing reliably on a 2 CPU, 2 GB memory KVM using the
ext4.git tree (dev branch, 5.0-rc3 plus ext4 commits for the next
merge window) and a Debian stable-based VM[2].

[1] https://groups.google.com/d/msg/syzkaller-bugs/ByPpM3WZw1s/li7SsaEyAgAJ
[2] https://mirrors.edge.kernel.org/pub/linux/kernel/people/tytso/kvm-xfstests/root_fs.img.amd64

> But even with perfect repros machines still won't be
> able to tell in all cases that even though the hang happened in ext4
> code, the root cause is actually another scheduler-related system
> call. So thanks for looking into this.

To be clear, there was *not* a scheduler-related system call in the
repro.c I was playing with (see [1]); just perf_event_open(2) and
sendfile(2).

Cheers,

- Ted

Dmitry Vyukov

Feb 28, 2019, 4:34:42 AM
to Theodore Y. Ts'o, Dmitry Vyukov, syzbot, Andreas Dilger, linux...@vger.kernel.org, LKML, linux-fsdevel, syzkaller-bugs, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo
Let me correct the statement then:

But even with perfect repros machines still won't be able to tell in
all cases that even though the hang happened in ext4 code, the root
cause is actually another perf-related system call. So thanks for
looking into this.

syzbot

Mar 21, 2019, 1:30:01 PM
to ac...@kernel.org, adilger...@dilger.ca, dvy...@google.com, linux...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, mi...@redhat.com, pet...@infradead.org, syzkall...@googlegroups.com, ty...@mit.edu

syzbot

Sep 11, 2020, 3:33:10 PM
to ac...@kernel.org, adilger...@dilger.ca, ax...@kernel.dk, dvy...@google.com, fwei...@gmail.com, linux...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, ming...@redhat.com, mi...@kernel.org, mi...@redhat.com, penguin...@i-love.sakura.ne.jp, pet...@infradead.org, syzkall...@googlegroups.com, tg...@linutronix.de, ty...@mit.edu
syzbot suspects this issue was fixed by commit:

commit 7e24969022cbd61ddc586f14824fc205661bb124
Author: Ming Lei <ming...@redhat.com>
Date: Mon Aug 17 10:00:55 2020 +0000

block: allow for_each_bvec to support zero len bvec

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1120f201900000
start commit: 92ed3019 Linux 5.8-rc7
git tree: upstream
kernel config: https://syzkaller.appspot.com/x/.config?x=84f076779e989e69
dashboard link: https://syzkaller.appspot.com/bug?extid=7d19c5fe6a3f1161abb7
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=170f7cbc900000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=16a25e38900000

If the result looks correct, please mark the issue as fixed by replying with:

#syz fix: block: allow for_each_bvec to support zero len bvec

Jens Axboe

Sep 11, 2020, 4:03:37 PM
to syzbot, ac...@kernel.org, adilger...@dilger.ca, dvy...@google.com, fwei...@gmail.com, linux...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, ming...@redhat.com, mi...@kernel.org, mi...@redhat.com, penguin...@i-love.sakura.ne.jp, pet...@infradead.org, syzkall...@googlegroups.com, tg...@linutronix.de, ty...@mit.edu
#syz fix: block: allow for_each_bvec to support zero len bvec


--
Jens Axboe
