INFO: task hung in __get_super

syzbot

Apr 1, 2018, 1:08:02 PM
to linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk
Hello,

syzbot hit the following crash on upstream commit
10b84daddbec72c6b440216a69de9a9605127f7a (Sat Mar 31 17:59:00 2018 +0000)
Merge branch 'perf-urgent-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
syzbot dashboard link:
https://syzkaller.appspot.com/bug?extid=10007d66ca02b08f0e60

Unfortunately, I don't have any reproducer for this crash yet.
Raw console output:
https://syzkaller.appspot.com/x/log.txt?id=5899419228569600
Kernel config:
https://syzkaller.appspot.com/x/.config?id=-2760467897697295172
compiler: gcc (GCC) 7.1.1 20170620

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+10007d...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for
details.
If you forward the report, please keep this part and the footer.

IPv6: ADDRCONF(NETDEV_UP): veth0: link is not ready
IPv6: ADDRCONF(NETDEV_UP): veth1: link is not ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
random: crng init done
INFO: task syz-executor3:13421 blocked for more than 120 seconds.
Not tainted 4.16.0-rc7+ #9
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor3 D24672 13421 4481 0x00000004
Call Trace:
context_switch kernel/sched/core.c:2862 [inline]
__schedule+0x8fb/0x1ec0 kernel/sched/core.c:3440
schedule+0xf5/0x430 kernel/sched/core.c:3499
__rwsem_down_read_failed_common kernel/locking/rwsem-xadd.c:269 [inline]
rwsem_down_read_failed+0x401/0x6e0 kernel/locking/rwsem-xadd.c:286
call_rwsem_down_read_failed+0x18/0x30 arch/x86/lib/rwsem.S:94
__down_read arch/x86/include/asm/rwsem.h:83 [inline]
down_read+0xa4/0x150 kernel/locking/rwsem.c:26
__get_super.part.9+0x1d3/0x280 fs/super.c:663
__get_super include/linux/spinlock.h:310 [inline]
get_super+0x2d/0x40 fs/super.c:692
fsync_bdev+0x19/0x80 fs/block_dev.c:468
invalidate_partition+0x35/0x60 block/genhd.c:1566
drop_partitions.isra.12+0xcd/0x1d0 block/partition-generic.c:440
rescan_partitions+0x72/0x900 block/partition-generic.c:513
__blkdev_reread_part+0x15f/0x1e0 block/ioctl.c:173
blkdev_reread_part+0x26/0x40 block/ioctl.c:193
loop_reread_partitions+0x12f/0x1a0 drivers/block/loop.c:619
loop_set_status+0x9bb/0xf60 drivers/block/loop.c:1161
loop_set_status64+0x9d/0x110 drivers/block/loop.c:1271
lo_ioctl+0xd86/0x1b70 drivers/block/loop.c:1381
__blkdev_driver_ioctl block/ioctl.c:303 [inline]
blkdev_ioctl+0x1759/0x1e00 block/ioctl.c:601
block_ioctl+0xde/0x120 fs/block_dev.c:1875
vfs_ioctl fs/ioctl.c:46 [inline]
do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
SYSC_ioctl fs/ioctl.c:701 [inline]
SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x454e79
RSP: 002b:00007fda691eec68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007fda691ef6d4 RCX: 0000000000454e79
RDX: 00000000200001c0 RSI: 0000000000004c04 RDI: 0000000000000013
RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000000287 R14: 00000000006f5d48 R15: 0000000000000000

Showing all locks held in the system:
2 locks held by khungtaskd/878:
#0: (rcu_read_lock){....}, at: [<000000004cf2ddac>]
check_hung_uninterruptible_tasks kernel/hung_task.c:175 [inline]
#0: (rcu_read_lock){....}, at: [<000000004cf2ddac>] watchdog+0x1c5/0xd60
kernel/hung_task.c:249
#1: (tasklist_lock){.+.+}, at: [<00000000fc5e2248>]
debug_show_all_locks+0xd3/0x3d0 kernel/locking/lockdep.c:4470
2 locks held by getty/4404:
#0: (&tty->ldisc_sem){++++}, at: [<00000000c5139392>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<000000003da58a6e>]
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4405:
#0: (&tty->ldisc_sem){++++}, at: [<00000000c5139392>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<000000003da58a6e>]
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4406:
#0: (&tty->ldisc_sem){++++}, at: [<00000000c5139392>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<000000003da58a6e>]
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4407:
#0: (&tty->ldisc_sem){++++}, at: [<00000000c5139392>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<000000003da58a6e>]
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4408:
#0: (&tty->ldisc_sem){++++}, at: [<00000000c5139392>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<000000003da58a6e>]
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4409:
#0: (&tty->ldisc_sem){++++}, at: [<00000000c5139392>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<000000003da58a6e>]
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
2 locks held by getty/4410:
#0: (&tty->ldisc_sem){++++}, at: [<00000000c5139392>]
ldsem_down_read+0x37/0x40 drivers/tty/tty_ldsem.c:365
#1: (&ldata->atomic_read_lock){+.+.}, at: [<000000003da58a6e>]
n_tty_read+0x2ef/0x1a40 drivers/tty/n_tty.c:2131
3 locks held by syz-executor3/13421:
#0: (&lo->lo_ctl_mutex/1){+.+.}, at: [<00000000834f78af>]
lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
#1: (&bdev->bd_mutex){+.+.}, at: [<0000000003605603>]
blkdev_reread_part+0x1e/0x40 block/ioctl.c:192
#2: (&type->s_umount_key#77){.+.+}, at: [<0000000077701649>]
__get_super.part.9+0x1d3/0x280 fs/super.c:663
1 lock held by syz-executor3/13464:
#0: (&bdev->bd_mutex){+.+.}, at: [<00000000c39e77db>]
__blkdev_get+0x176/0x13b0 fs/block_dev.c:1458
1 lock held by syz-executor3/13466:
#0: (&lo->lo_ctl_mutex/1){+.+.}, at: [<00000000834f78af>]
lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
1 lock held by syz-executor2/13423:
#0: (&bdev->bd_mutex){+.+.}, at: [<0000000032c86bf7>]
blkdev_put+0x2a/0x4f0 fs/block_dev.c:1808
2 locks held by syz-executor0/13428:
#0: (&type->s_umount_key#76/1){+.+.}, at: [<00000000d25ba33a>]
alloc_super fs/super.c:211 [inline]
#0: (&type->s_umount_key#76/1){+.+.}, at: [<00000000d25ba33a>]
sget_userns+0x3a1/0xe40 fs/super.c:502
#1: (&lo->lo_ctl_mutex/1){+.+.}, at: [<00000000834f78af>]
lo_ioctl+0x8b/0x1b70 drivers/block/loop.c:1355
1 lock held by syz-executor0/13465:
#0: (&bdev->bd_mutex){+.+.}, at: [<00000000c39e77db>]
__blkdev_get+0x176/0x13b0 fs/block_dev.c:1458
1 lock held by blkid/13434:
#0: (&bdev->bd_mutex){+.+.}, at: [<00000000c39e77db>]
__blkdev_get+0x176/0x13b0 fs/block_dev.c:1458
1 lock held by syz-executor2/13638:
#0: (&bdev->bd_mutex){+.+.}, at: [<00000000c39e77db>]
__blkdev_get+0x176/0x13b0 fs/block_dev.c:1458
1 lock held by syz-executor2/13639:
#0: (&bdev->bd_mutex){+.+.}, at: [<00000000c39e77db>]
__blkdev_get+0x176/0x13b0 fs/block_dev.c:1458

=============================================

NMI backtrace for cpu 0
CPU: 0 PID: 878 Comm: khungtaskd Not tainted 4.16.0-rc7+ #9
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x24d lib/dump_stack.c:53
nmi_cpu_backtrace+0x1d2/0x210 lib/nmi_backtrace.c:103
nmi_trigger_cpumask_backtrace+0x123/0x180 lib/nmi_backtrace.c:62
arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
trigger_all_cpu_backtrace include/linux/nmi.h:138 [inline]
check_hung_task kernel/hung_task.c:132 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:190 [inline]
watchdog+0x90c/0xd60 kernel/hung_task.c:249
kthread+0x33c/0x400 kernel/kthread.c:238
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1 skipped: idling at native_safe_halt+0x6/0x10
arch/x86/include/asm/irqflags.h:54


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzk...@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is
merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug
report.
Note: all commands must start from beginning of the line in the email body.

Tetsuo Handa

Jun 19, 2018, 7:45:08 AM
to syzbot, syzkall...@googlegroups.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, vi...@zeniv.linux.org.uk
This bug report is getting no feedback, but I guess that this bug is in the
block, mm, or locking layer rather than the fs layer.

The NMI backtraces for this bug tend to show that a stalling sb_bread() call,
reached from fill_super() from mount_bdev(), is why s_umount_key is kept held
for more than 120 seconds. What is strange is that the backtraces tend to
point at rcu_read_lock()/pagecache_get_page()/radix_tree_deref_slot()/
rcu_read_unlock(), which is not expected to stall.

Since CONFIG_RCU_CPU_STALL_TIMEOUT is set to 120 (effectively 125 due to
CONFIG_PROVE_RCU=y), which is longer than CONFIG_DEFAULT_HUNG_TASK_TIMEOUT,
maybe setting CONFIG_RCU_CPU_STALL_TIMEOUT to a smaller value (e.g. 25) could
give us some hints...

Dmitry Vyukov

Jun 19, 2018, 7:53:52 AM
to Tetsuo Handa, syzbot, syzkaller-bugs, linux-fsdevel, LKML, Al Viro
If an rcu stall is the true root cause of this, then I guess we would see an
"rcu stall" bug too. An RCU stall is detected after 120 seconds, but a task
hang is detected after 120-240 seconds (the hung task detector only scans
every timeout interval, so a task can be blocked for up to twice the timeout
before it is reported). So an RCU stall has a much higher chance of being
detected. Do you see the corresponding "rcu stall" bug?

But, yes, we need to tune all timeouts. There is
https://github.com/google/syzkaller/issues/516 for this.
We also need "kernel/hung_task.c: allow to set checking interval
separately from timeout" to be merged:
https://groups.google.com/forum/#!topic/syzkaller/rOr3WBE-POY
as currently it's very hard to tune task hung timeout. But maybe we
will need similar patches for other watchdogs too if they have the
same problem.

Tetsuo Handa

Jun 19, 2018, 10:10:43 AM
to Dmitry Vyukov, syzbot, syzkaller-bugs, linux-fsdevel, LKML, Al Viro
On 2018/06/19 20:53, Dmitry Vyukov wrote:
> On Tue, Jun 19, 2018 at 1:44 PM, Tetsuo Handa
> <penguin...@i-love.sakura.ne.jp> wrote:
>> This bug report is getting no feedback, but I guess that this bug is in
>> block or mm or locking layer rather than fs layer.
>>
>> NMI backtrace for this bug tends to report that sb_bread() from fill_super()
>> from mount_bdev() is stalling is the cause of keep holding s_umount_key for
>> more than 120 seconds. What is strange is that NMI backtrace for this bug tends
>> to point at rcu_read_lock()/pagecache_get_page()/radix_tree_deref_slot()/
>> rcu_read_unlock() which is expected not to stall.
>>
>> Since CONFIG_RCU_CPU_STALL_TIMEOUT is set to 120 (and actually +5 due to
>> CONFIG_PROVE_RCU=y) which is longer than CONFIG_DEFAULT_HUNG_TASK_TIMEOUT,
>> maybe setting CONFIG_RCU_CPU_STALL_TIMEOUT to smaller values (e.g. 25) can
>> give us some hints...
>
> If an rcu stall is the true root cause of this, then I guess would see
> "rcu stall" bug too. Rcu stall is detected after 120 seconds, but task
> hang after 120-240 seconds. So rcu stall has much higher chances to be
> detected. Do you see the corresponding "rcu stall" bug?

An RCU stall is detected after 125 seconds due to CONFIG_PROVE_RCU=y
(e.g. https://syzkaller.appspot.com/bug?id=1fac0fd91219f3f2a03d6fa7deafc95fbed79cc2 ).

I didn't find the corresponding "rcu stall" bug. But a single RCU stall does not
have to last longer than 120 seconds; several shorter stalls can add up while
the lock is held, for example:

down(); // Will take 120 seconds due to multiple RCU stalls
rcu_read_lock();
do_something();
rcu_read_unlock(); // Took 30 seconds for unknown reason.
rcu_read_lock();
do_something();
rcu_read_unlock(); // Took 30 seconds for unknown reason.
rcu_read_lock();
do_something();
rcu_read_unlock(); // Took 30 seconds for unknown reason.
rcu_read_lock();
do_something();
rcu_read_unlock(); // Took 30 seconds for unknown reason.
up();

Dmitry Vyukov

Jun 19, 2018, 10:16:05 AM
to Tetsuo Handa, syzbot, syzkaller-bugs, linux-fsdevel, LKML, Al Viro
On Tue, Jun 19, 2018 at 4:10 PM, Tetsuo Handa
You think this is another false positive?
Like this one https://github.com/google/syzkaller/issues/516#issuecomment-395685629
?

Tetsuo Handa

Jun 19, 2018, 9:15:52 PM
to Dmitry Vyukov, syzbot, syzkaller-bugs, linux-fsdevel, LKML, Al Viro
According to https://syzkaller.appspot.com/text?tag=CrashLog&x=11db16c4400000 from
"INFO: rcu detected stall in __process_echoes":

[ 859.630022] INFO: rcu_sched self-detected stall on CPU
[ 859.635509] 0-....: (124999 ticks this GP) idle=30e/1/4611686018427387906 softirq=287964/287964 fqs=31234
[ 859.645716] (t=125000 jiffies g=156333 c=156332 q=555)
(...snipped...)
[ 860.266660] ? process_one_work+0x1ba0/0x1ba0
[ 860.271135] ? kthread_bind+0x40/0x40
[ 860.274927] ret_from_fork+0x3a/0x50
[ 861.152252] INFO: task kworker/u4:2:59 blocked for more than 120 seconds.
[ 861.159245] Not tainted 4.18.0-rc1+ #109
[ 861.163851] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

An RCU stall was reported immediately before khungtaskd fired. Since syzbot
assigns only 2 CPUs, it may not be rare for a hung process to be unable to run
simply because somebody else kept occupying the CPUs.

Well, "BUG: soft lockup in __process_echoes" will be a dup of
"INFO: rcu detected stall in __process_echoes". I wonder why
softlockup detector waited for 135 seconds.

Well, "BUG: soft lockup in shrink_dcache_parent (2)" and
"BUG: soft lockup in snd_virmidi_output_trigger" and
"BUG: soft lockup in smp_call_function_many" and
"BUG: soft lockup in do_raw_spin_unlock (2)" as well waited for 134 seconds
while "BUG: soft lockup in d_walk" waited for only 22 seconds...

Anyway, I think that in some cases RCU stalls/soft lockups are the cause of hung tasks.

syzbot

Apr 28, 2019, 2:14:07 PM
to ax...@kernel.dk, dvy...@google.com, ja...@suse.cz, linux-...@vger.kernel.org, linux-...@vger.kernel.org, penguin...@i-love.sakura.ne.jp, penguin...@i-love.sakura.ne.jp, syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk
syzbot has found a reproducer for the following crash on:

HEAD commit: 037904a2 Merge branch 'x86-urgent-for-linus' of git://git...
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=135ff034a00000
kernel config: https://syzkaller.appspot.com/x/.config?x=a42d110b47dd6b36
dashboard link: https://syzkaller.appspot.com/bug?extid=10007d66ca02b08f0e60
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1291b1f4a00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=135385a8a00000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+10007d...@syzkaller.appspotmail.com

INFO: task syz-executor274:8097 blocked for more than 143 seconds.
Not tainted 5.1.0-rc6+ #89
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor274 D28008 8097 8041 0x00000004
Call Trace:
context_switch kernel/sched/core.c:2877 [inline]
__schedule+0x813/0x1cc0 kernel/sched/core.c:3518
schedule+0x92/0x180 kernel/sched/core.c:3562
__rwsem_down_read_failed_common kernel/locking/rwsem-xadd.c:285 [inline]
rwsem_down_read_failed+0x213/0x420 kernel/locking/rwsem-xadd.c:302
call_rwsem_down_read_failed+0x18/0x30 arch/x86/lib/rwsem.S:94
__down_read arch/x86/include/asm/rwsem.h:83 [inline]
down_read+0x49/0x90 kernel/locking/rwsem.c:26
__get_super.part.0+0x203/0x2e0 fs/super.c:788
__get_super include/linux/spinlock.h:329 [inline]
get_super+0x2e/0x50 fs/super.c:817
fsync_bdev+0x19/0xd0 fs/block_dev.c:525
invalidate_partition+0x36/0x60 block/genhd.c:1581
drop_partitions block/partition-generic.c:443 [inline]
rescan_partitions+0xef/0xa20 block/partition-generic.c:516
__blkdev_reread_part+0x1a2/0x230 block/ioctl.c:173
blkdev_reread_part+0x27/0x40 block/ioctl.c:193
loop_reread_partitions+0x1c/0x40 drivers/block/loop.c:633
loop_set_status+0xe57/0x1380 drivers/block/loop.c:1296
loop_set_status64+0xc2/0x120 drivers/block/loop.c:1416
lo_ioctl+0x8fc/0x2150 drivers/block/loop.c:1559
__blkdev_driver_ioctl block/ioctl.c:303 [inline]
blkdev_ioctl+0x6f2/0x1d10 block/ioctl.c:605
block_ioctl+0xee/0x130 fs/block_dev.c:1933
vfs_ioctl fs/ioctl.c:46 [inline]
file_ioctl fs/ioctl.c:509 [inline]
do_vfs_ioctl+0xd6e/0x1390 fs/ioctl.c:696
ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x441937
Code: 48 83 c4 08 48 89 d8 5b 5d c3 66 0f 1f 84 00 00 00 00 00 48 89 e8 48
f7 d8 48 39 c3 0f 92 c0 eb 92 66 90 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff
ff 0f 83 8d 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffcabed4e08 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000441937
RDX: 00007ffcabed4ea0 RSI: 0000000000004c04 RDI: 0000000000000004
RBP: 0000000000000003 R08: 0000000000000000 R09: 000000000000000a
R10: 0000000000000075 R11: 0000000000000202 R12: 0000000000000001
R13: 0000000000000004 R14: 0000000000000000 R15: 0000000000000000
INFO: task blkid:8099 blocked for more than 143 seconds.
Not tainted 5.1.0-rc6+ #89
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
blkid D27504 8099 8021 0x00000004
Call Trace:
context_switch kernel/sched/core.c:2877 [inline]
__schedule+0x813/0x1cc0 kernel/sched/core.c:3518
schedule+0x92/0x180 kernel/sched/core.c:3562
schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:3620
__mutex_lock_common kernel/locking/mutex.c:1002 [inline]
__mutex_lock+0x726/0x1310 kernel/locking/mutex.c:1072
mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1087
blkdev_put+0x34/0x560 fs/block_dev.c:1866
blkdev_close+0x8b/0xb0 fs/block_dev.c:1915
__fput+0x2e5/0x8d0 fs/file_table.c:278
____fput+0x16/0x20 fs/file_table.c:309
task_work_run+0x14a/0x1c0 kernel/task_work.c:113
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop+0x273/0x2c0 arch/x86/entry/common.c:166
prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
do_syscall_64+0x52d/0x610 arch/x86/entry/common.c:293
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f1ae9c432b0
Code: Bad RIP value.
RSP: 002b:00007ffc29ff6028 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f1ae9c432b0
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000028 R09: 0000000001680000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000b9d030
R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000005

Showing all locks held in the system:
1 lock held by khungtaskd/1041:
#0: 0000000027887009 (rcu_read_lock){....}, at:
debug_show_all_locks+0x5f/0x27e kernel/locking/lockdep.c:5057
1 lock held by rs:main Q:Reg/7921:
#0: 00000000e00580d7 (&rq->lock){-.-.}, at: rq_lock
kernel/sched/sched.h:1168 [inline]
#0: 00000000e00580d7 (&rq->lock){-.-.}, at: __schedule+0x1f8/0x1cc0
kernel/sched/core.c:3456
1 lock held by rsyslogd/7923:
#0: 00000000488dcec4 (&f->f_pos_lock){+.+.}, at: __fdget_pos+0xee/0x110
fs/file.c:801
2 locks held by getty/8014:
#0: 000000001b56f3c3 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 00000000da9faa8c (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/8015:
#0: 00000000e00c81bb (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 000000008d689a2e (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/8016:
#0: 00000000177f6359 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 0000000096437898 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/8017:
#0: 000000002db00e12 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 0000000071f2d88e (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/8018:
#0: 00000000a41b6290 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 00000000c340e26f (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/8019:
#0: 00000000bca104ce (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 000000007e045212 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/8020:
#0: 0000000070fdafae (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 00000000b4e26fa9 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
1 lock held by syz-executor274/8083:
2 locks held by syz-executor274/8097:
#0: 000000007a5ed526 (&bdev->bd_mutex){+.+.}, at:
blkdev_reread_part+0x1f/0x40 block/ioctl.c:192
#1: 0000000067606e21 (&type->s_umount_key#39){.+.+}, at:
__get_super.part.0+0x203/0x2e0 fs/super.c:788
1 lock held by blkid/8099:
#0: 000000007a5ed526 (&bdev->bd_mutex){+.+.}, at: blkdev_put+0x34/0x560
fs/block_dev.c:1866
2 locks held by syz-executor274/11705:
#0: 000000002b6bbb34 (&bdev->bd_mutex){+.+.}, at: __blkdev_put+0xbb/0x810
fs/block_dev.c:1833
#1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_release+0x1f/0x200
drivers/block/loop.c:1755
2 locks held by syz-executor274/11709:
#0: 00000000a45cb906 (&bdev->bd_mutex){+.+.}, at: __blkdev_put+0xbb/0x810
fs/block_dev.c:1833
#1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_release+0x1f/0x200
drivers/block/loop.c:1755
2 locks held by syz-executor274/11716:
#0: 00000000a19e2025 (&type->s_umount_key#38/1){+.+.}, at:
alloc_super+0x158/0x890 fs/super.c:228
#1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_simple_ioctl
drivers/block/loop.c:1514 [inline]
#1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_ioctl+0x266/0x2150
drivers/block/loop.c:1572
2 locks held by syz-executor274/11717:
#0: 00000000e185c083 (&type->s_umount_key#38/1){+.+.}, at:
alloc_super+0x158/0x890 fs/super.c:228
#1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_simple_ioctl
drivers/block/loop.c:1514 [inline]
#1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_ioctl+0x266/0x2150
drivers/block/loop.c:1572
2 locks held by blkid/11718:
#0: 000000003d9e77b2 (&bdev->bd_mutex){+.+.}, at: __blkdev_put+0xbb/0x810
fs/block_dev.c:1833
#1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: __loop_clr_fd+0x88/0xd60
drivers/block/loop.c:1046
2 locks held by blkid/11720:
#0: 000000000c0297bc (&bdev->bd_mutex){+.+.}, at: __blkdev_put+0xbb/0x810
fs/block_dev.c:1833
#1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_release+0x1f/0x200
drivers/block/loop.c:1755

=============================================

NMI backtrace for cpu 0
CPU: 0 PID: 1041 Comm: khungtaskd Not tainted 5.1.0-rc6+ #89
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
nmi_cpu_backtrace.cold+0x63/0xa4 lib/nmi_backtrace.c:101
nmi_trigger_cpumask_backtrace+0x1be/0x236 lib/nmi_backtrace.c:62
arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:204 [inline]
watchdog+0x9b7/0xec0 kernel/hung_task.c:288
kthread+0x357/0x430 kernel/kthread.c:253
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352

Al Viro

Apr 28, 2019, 2:51:20 PM
to syzbot, ax...@kernel.dk, dvy...@google.com, ja...@suse.cz, linux-...@vger.kernel.org, linux-...@vger.kernel.org, penguin...@i-love.sakura.ne.jp, syzkall...@googlegroups.com
ioctl(..., BLKRRPART) blocked on ->s_umount in __get_super().
The trouble is, the only things holding ->s_umount appear to be
these:

> 2 locks held by syz-executor274/11716:
> #0: 00000000a19e2025 (&type->s_umount_key#38/1){+.+.}, at:
> alloc_super+0x158/0x890 fs/super.c:228
> #1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_simple_ioctl
> drivers/block/loop.c:1514 [inline]
> #1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_ioctl+0x266/0x2150
> drivers/block/loop.c:1572

> 2 locks held by syz-executor274/11717:
> #0: 00000000e185c083 (&type->s_umount_key#38/1){+.+.}, at:
> alloc_super+0x158/0x890 fs/super.c:228
> #1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_simple_ioctl
> drivers/block/loop.c:1514 [inline]
> #1: 00000000bde6230e (loop_ctl_mutex){+.+.}, at: lo_ioctl+0x266/0x2150
> drivers/block/loop.c:1572

... and that's bollocks. The ->s_umount held there is the one on a freshly
allocated superblock. It *MUST* be in mount(2); no other syscall should be able
to call alloc_super() in the first place. So what the hell is it doing trying
to call lo_ioctl() inside mount(2)? Something like isofs attempting cdrom
ioctls on the underlying device?

Why do we have loop_func_table->ioctl(), BTW? All in-tree instances are
either NULL or return -EINVAL unconditionally. Considering that the
caller is
err = lo->ioctl ? lo->ioctl(lo, cmd, arg) : -EINVAL;
we could bloody well just get rid of cryptoloop_ioctl() (the only
non-NULL instance) and get rid of calling lo_simple_ioctl() in
lo_ioctl() switch's default.

Something like this:

diff --git a/drivers/block/cryptoloop.c b/drivers/block/cryptoloop.c
index 254ee7d54e91..f16468a562f5 100644
--- a/drivers/block/cryptoloop.c
+++ b/drivers/block/cryptoloop.c
@@ -167,12 +167,6 @@ cryptoloop_transfer(struct loop_device *lo, int cmd,
 }
 
 static int
-cryptoloop_ioctl(struct loop_device *lo, int cmd, unsigned long arg)
-{
-	return -EINVAL;
-}
-
-static int
 cryptoloop_release(struct loop_device *lo)
 {
 	struct crypto_sync_skcipher *tfm = lo->key_data;
@@ -188,7 +182,6 @@ cryptoloop_release(struct loop_device *lo)
 static struct loop_func_table cryptoloop_funcs = {
 	.number = LO_CRYPT_CRYPTOAPI,
 	.init = cryptoloop_init,
-	.ioctl = cryptoloop_ioctl,
 	.transfer = cryptoloop_transfer,
 	.release = cryptoloop_release,
 	.owner = THIS_MODULE
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index bf1c61cab8eb..2ec162b80562 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -955,7 +955,6 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
 	lo->lo_flags = lo_flags;
 	lo->lo_backing_file = file;
 	lo->transfer = NULL;
-	lo->ioctl = NULL;
 	lo->lo_sizelimit = 0;
 	lo->old_gfp_mask = mapping_gfp_mask(mapping);
 	mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));
@@ -1064,7 +1063,6 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
 
 	loop_release_xfer(lo);
 	lo->transfer = NULL;
-	lo->ioctl = NULL;
 	lo->lo_device = NULL;
 	lo->lo_encryption = NULL;
 	lo->lo_offset = 0;
@@ -1262,7 +1260,6 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
 	if (!xfer)
 		xfer = &none_funcs;
 	lo->transfer = xfer->transfer;
-	lo->ioctl = xfer->ioctl;
 
 	if ((lo->lo_flags & LO_FLAGS_AUTOCLEAR) !=
 	    (info->lo_flags & LO_FLAGS_AUTOCLEAR))
@@ -1525,7 +1522,7 @@ static int lo_simple_ioctl(struct loop_device *lo, unsigned int cmd,
 		err = loop_set_block_size(lo, arg);
 		break;
 	default:
-		err = lo->ioctl ? lo->ioctl(lo, cmd, arg) : -EINVAL;
+		err = -EINVAL;
 	}
 	mutex_unlock(&loop_ctl_mutex);
 	return err;
@@ -1567,10 +1564,9 @@ static int lo_ioctl(struct block_device *bdev, fmode_t mode,
 	case LOOP_SET_BLOCK_SIZE:
 		if (!(mode & FMODE_WRITE) && !capable(CAP_SYS_ADMIN))
 			return -EPERM;
-		/* Fall through */
+		return lo_simple_ioctl(lo, cmd, arg);
 	default:
-		err = lo_simple_ioctl(lo, cmd, arg);
-		break;
+		return -EINVAL;
 	}
 
 	return err;
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index af75a5ee4094..56a9a0c161d7 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -84,7 +84,6 @@ struct loop_func_table {
 	int (*init)(struct loop_device *, const struct loop_info64 *);
 	/* release is called from loop_unregister_transfer or clr_fd */
 	int (*release)(struct loop_device *);
-	int (*ioctl)(struct loop_device *, int cmd, unsigned long arg);
 	struct module *owner;
 };

Tetsuo Handa

Apr 28, 2019, 9:38:49 PM
to Al Viro, syzbot, ax...@kernel.dk, dvy...@google.com, ja...@suse.cz, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On 2019/04/29 3:51, Al Viro wrote:
> ioctl(..., BLKRRPART) blocked on ->s_umount in __get_super().
> The trouble is, the only things holding ->s_umount appears to be
> these:

Not always true. lockdep_print_held_locks(), called from debug_show_all_locks(),
cannot report the locks held by TASK_RUNNING threads. Also, because
CONFIG_PRINTK_CALLER=y is enabled, the output of trigger_all_cpu_backtrace() is
no longer included in the report file (i.e. the report is prematurely truncated),
since the NMI backtraces are printed from a different printk() context. If we
check the console output, we can see that

>> 1 lock held by syz-executor274/8083:

was doing mount(2). Since that thread may have been looping for long enough to
trigger the khungtaskd warning on its own, we can't tell whether this is a lock
dependency problem at all.

----------------------------------------
[ 1107.252933][ T1041] NMI backtrace for cpu 0
[ 1107.257402][ T1041] CPU: 0 PID: 1041 Comm: khungtaskd Not tainted 5.1.0-rc6+ #89
[ 1107.264960][ T1041] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 1107.275380][ T1041] Call Trace:
[ 1107.278691][ T1041] dump_stack+0x172/0x1f0
[ 1107.283216][ T1041] nmi_cpu_backtrace.cold+0x63/0xa4
[ 1107.288469][ T1041] ? lapic_can_unplug_cpu.cold+0x38/0x38
[ 1107.294155][ T1041] nmi_trigger_cpumask_backtrace+0x1be/0x236
[ 1107.300256][ T1041] arch_trigger_cpumask_backtrace+0x14/0x20
[ 1107.306174][ T1041] watchdog+0x9b7/0xec0
[ 1107.310362][ T1041] kthread+0x357/0x430
[ 1107.314446][ T1041] ? reset_hung_task_detector+0x30/0x30
[ 1107.320016][ T1041] ? kthread_cancel_delayed_work_sync+0x20/0x20
[ 1107.326280][ T1041] ret_from_fork+0x3a/0x50
[ 1107.331403][ T1041] Sending NMI from CPU 0 to CPUs 1:
[ 1107.337617][ C1] NMI backtrace for cpu 1
[ 1107.337625][ C1] CPU: 1 PID: 8083 Comm: syz-executor274 Not tainted 5.1.0-rc6+ #89
[ 1107.337631][ C1] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 1107.337636][ C1] RIP: 0010:debug_lockdep_rcu_enabled.part.0+0xb/0x60
[ 1107.337648][ C1] Code: 5b 5d c3 e8 67 71 e5 ff 0f 1f 80 00 00 00 00 55 48 89 e5 e8 37 ff ff ff 5d c3 0f 1f 44 00 00 48 b8 00 00 00 00 00 fc ff df 55 <48> 89 e5 53 65 48 8b 1c 25 00 ee 01 00 48 8d bb 7c 08 00 00 48 89
[ 1107.337652][ C1] RSP: 0018:ffff8880a85274c8 EFLAGS: 00000202
[ 1107.337661][ C1] RAX: dffffc0000000000 RBX: ffff8880a85275d8 RCX: 1ffffffff12bcd63
[ 1107.337666][ C1] RDX: 0000000000000000 RSI: ffffffff870d8f3c RDI: ffff8880a85275e0
[ 1107.337672][ C1] RBP: ffff8880a85274d8 R08: ffff888081e68540 R09: ffffed1015d25bc8
[ 1107.337677][ C1] R10: ffffed1015d25bc7 R11: ffff8880ae92de3b R12: 0000000000000000
[ 1107.337683][ C1] R13: ffff8880a694d640 R14: ffff88809541b942 R15: 0000000000000006
[ 1107.337689][ C1] FS: 0000000000e0b880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
[ 1107.337693][ C1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1107.337699][ C1] CR2: ffffffffff600400 CR3: 0000000092d6f000 CR4: 00000000001406e0
[ 1107.337704][ C1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1107.337710][ C1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1107.337713][ C1] Call Trace:
[ 1107.337717][ C1] ? debug_lockdep_rcu_enabled+0x71/0xa0
[ 1107.337721][ C1] xas_descend+0xbf/0x370
[ 1107.337724][ C1] xas_load+0xef/0x150
[ 1107.337728][ C1] find_get_entry+0x13d/0x880
[ 1107.337733][ C1] ? find_get_entries_tag+0xc10/0xc10
[ 1107.337736][ C1] ? mark_held_locks+0xa4/0xf0
[ 1107.337741][ C1] ? pagecache_get_page+0x1a8/0x740
[ 1107.337745][ C1] pagecache_get_page+0x4c/0x740
[ 1107.337749][ C1] __getblk_gfp+0x27e/0x970
[ 1107.337752][ C1] __bread_gfp+0x2f/0x300
[ 1107.337756][ C1] udf_tread+0xf1/0x140
[ 1107.337760][ C1] udf_read_tagged+0x50/0x530
[ 1107.337764][ C1] udf_check_anchor_block+0x1ef/0x680
[ 1107.337768][ C1] ? blkpg_ioctl+0xa90/0xa90
[ 1107.337772][ C1] ? udf_process_sequence+0x35d0/0x35d0
[ 1107.337776][ C1] ? submit_bio+0xba/0x480
[ 1107.337780][ C1] udf_scan_anchors+0x3f4/0x680
[ 1107.337784][ C1] ? udf_check_anchor_block+0x680/0x680
[ 1107.337789][ C1] ? __sanitizer_cov_trace_const_cmp8+0x18/0x20
[ 1107.337793][ C1] ? udf_get_last_session+0x120/0x120
[ 1107.337797][ C1] udf_load_vrs+0x67f/0xc80
[ 1107.337801][ C1] ? udf_scan_anchors+0x680/0x680
[ 1107.337805][ C1] ? udf_bread+0x260/0x260
[ 1107.337809][ C1] ? lockdep_init_map+0x1be/0x6d0
[ 1107.337813][ C1] udf_fill_super+0x7d8/0x16d1
[ 1107.337817][ C1] ? udf_load_vrs+0xc80/0xc80
[ 1107.337820][ C1] ? vsprintf+0x40/0x40
[ 1107.337824][ C1] ? set_blocksize+0x2bf/0x340
[ 1107.337829][ C1] ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[ 1107.337833][ C1] mount_bdev+0x307/0x3c0
[ 1107.337837][ C1] ? udf_load_vrs+0xc80/0xc80
[ 1107.337840][ C1] udf_mount+0x35/0x40
[ 1107.337844][ C1] ? udf_get_pblock_meta25+0x3a0/0x3a0
[ 1107.337848][ C1] legacy_get_tree+0xf2/0x200
[ 1107.337853][ C1] ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[ 1107.337857][ C1] vfs_get_tree+0x123/0x450
[ 1107.337860][ C1] do_mount+0x1436/0x2c40
[ 1107.337864][ C1] ? copy_mount_string+0x40/0x40
[ 1107.337868][ C1] ? _copy_from_user+0xdd/0x150
[ 1107.337873][ C1] ? __sanitizer_cov_trace_const_cmp8+0x18/0x20
[ 1107.337877][ C1] ? copy_mount_options+0x280/0x3a0
[ 1107.337881][ C1] ksys_mount+0xdb/0x150
[ 1107.337885][ C1] __x64_sys_mount+0xbe/0x150
[ 1107.337889][ C1] do_syscall_64+0x103/0x610
[ 1107.337893][ C1] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1107.337897][ C1] RIP: 0033:0x441a49
[ 1107.337909][ C1] Code: ad 07 fc ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b 07 fc ff c3 66 2e 0f 1f 84 00 00 00 00
[ 1107.337913][ C1] RSP: 002b:00007ffcabed5048 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[ 1107.337923][ C1] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000441a49
[ 1107.337929][ C1] RDX: 0000000020000140 RSI: 0000000020000080 RDI: 0000000020000000
[ 1107.337934][ C1] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 1107.337940][ C1] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
[ 1107.337946][ C1] R13: 00000000004026e0 R14: 0000000000000000 R15: 0000000000000000
[ 1107.339387][ T1041] Kernel panic - not syncing: hung_task: blocked tasks
[ 1107.787589][ T1041] CPU: 0 PID: 1041 Comm: khungtaskd Not tainted 5.1.0-rc6+ #89
[ 1107.795319][ T1041] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 1107.806082][ T1041] Call Trace:
[ 1107.809403][ T1041] dump_stack+0x172/0x1f0
[ 1107.813776][ T1041] panic+0x2cb/0x65c
[ 1107.817695][ T1041] ? __warn_printk+0xf3/0xf3
[ 1107.822391][ T1041] ? lapic_can_unplug_cpu.cold+0x38/0x38
[ 1107.828236][ T1041] ? ___preempt_schedule+0x16/0x18
[ 1107.833386][ T1041] ? nmi_trigger_cpumask_backtrace+0x19e/0x236
[ 1107.839739][ T1041] ? nmi_trigger_cpumask_backtrace+0x1fa/0x236
[ 1107.846024][ T1041] ? nmi_trigger_cpumask_backtrace+0x204/0x236
[ 1107.852196][ T1041] ? nmi_trigger_cpumask_backtrace+0x19e/0x236
[ 1107.858482][ T1041] watchdog+0x9c8/0xec0
[ 1107.862858][ T1041] kthread+0x357/0x430
[ 1107.866943][ T1041] ? reset_hung_task_detector+0x30/0x30
[ 1107.872522][ T1041] ? kthread_cancel_delayed_work_sync+0x20/0x20
[ 1107.878799][ T1041] ret_from_fork+0x3a/0x50
[ 1107.884924][ T1041] Kernel Offset: disabled
[ 1107.889301][ T1041] Rebooting in 86400 seconds..
----------------------------------------

I don't know whether "it is not safe to print locks held by TASK_RUNNING threads"
still holds. But since a thread's state can change at any moment, there is no
guarantee that only locks held by !TASK_RUNNING threads will be printed by
debug_show_all_locks() anyway, so I guess that allowing all locks to be printed,
at the reader's own risk, behind some kernel config option would be fine...
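
For reference, the suppression happens in lockdep_print_held_locks(). A simplified
sketch of the check, paraphrased from my reading of kernel/locking/lockdep.c of
that era (not a verbatim copy):

static void lockdep_print_held_locks(struct task_struct *p)
{
	int i, depth = READ_ONCE(p->lockdep_depth);

	if (!depth)
		printk("no locks held by %s/%d.\n", p->comm, task_pid_nr(p));
	else
		printk("%d lock%s held by %s/%d:\n", depth,
		       depth > 1 ? "s" : "", p->comm, task_pid_nr(p));
	/*
	 * Walking p->held_locks is not reliable if p is running and is not
	 * the current task, so only the header above gets printed for it.
	 */
	if (p->state == TASK_RUNNING && p != current)
		return;
	for (i = 0; i < depth; i++) {
		printk(" #%d: ", i);
		print_lock(p->held_locks + i);
	}
}

That is why the report shows "1 lock held by syz-executor274/8083:" with no lock
details underneath.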

Also, we could replace trigger_all_cpu_backtrace() with a new function which scans
all threads and dumps threads with ->on_cpu == 1 so that the output comes from
the same printk() context.
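
A minimal sketch of what such a helper could look like (dump_running_tasks() is a
hypothetical name, not an existing kernel function; ->on_cpu is available on
CONFIG_SMP kernels):

static void dump_running_tasks(void)
{
	struct task_struct *g, *t;

	rcu_read_lock();
	for_each_process_thread(g, t) {
		if (t->on_cpu)			/* currently executing on some CPU */
			sched_show_task(t);	/* prints comm, pid and kernel stack */
	}
	rcu_read_unlock();
}

Because this would run in khungtaskd's own context, its printk() output would not
be separated from the rest of the report by CONFIG_PRINTK_CALLER=y.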

Dmitry Vyukov

Apr 29, 2019, 1:30:16 AM
to Al Viro, syzbot, Jens Axboe, Jan Kara, linux-fsdevel, LKML, Tetsuo Handa, syzkaller-bugs, Peter Zijlstra
How useful would it be to see full stacks in such lockdep reports?
Now that we have lib/stackdepot.c, which is capable of memorizing a large
number of stacks and converting each of them into a single u32 handle, we
could use it in more debug facilities. I remember +Peter mentioned some
problems with interrupts/re-entrancy in stackdepot, but I hope they are
resolvable (at least in some conservative way, since we already call
stackdepot from interrupts).
I think the ODEBUG facility has the same problem of showing only a single
PC in reports for a past stack.
Should I file an issue for this?
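
For illustration, a minimal sketch of the kind of usage being proposed, written
against the current stack_depot_save()/stack_depot_fetch() interface (the
stackdepot API available at the time of this thread was different, so treat this
purely as a sketch rather than as what lockdep/ODEBUG would literally do):

#include <linux/stackdepot.h>
#include <linux/stacktrace.h>

/* At acquisition/allocation time: save the stack once, keep only a u32 handle. */
static depot_stack_handle_t record_stack(void)
{
	unsigned long entries[32];
	unsigned int nr;

	nr = stack_trace_save(entries, ARRAY_SIZE(entries), 1 /* skip self */);
	return stack_depot_save(entries, nr, GFP_NOWAIT);
}

/* At report time: expand the handle back into a full backtrace. */
static void print_saved_stack(depot_stack_handle_t handle)
{
	unsigned long *entries;
	unsigned int nr;

	if (!handle)
		return;
	nr = stack_depot_fetch(handle, &entries);
	stack_trace_print(entries, nr, 0);
}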

syzbot

Apr 29, 2019, 9:10:02 AM
to penguin...@i-love.sakura.ne.jp, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer still triggered
crash:
no output from test machine



Tested on:

commit: 37624b58 Linux 5.1-rc7
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=17ff10dca00000
kernel config: https://syzkaller.appspot.com/x/.config?x=ef1b87b455c397cf
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
patch: https://syzkaller.appspot.com/x/patch.diff?x=101785a8a00000

Tetsuo Handa

Apr 29, 2019, 9:18:57 AM
to Dmitry Vyukov, syzkall...@googlegroups.com
Hello.

The test should have been repeated until at least uptime > 143 seconds.
Why prematurely judged as "no output" as of uptime = 64.772989 seconds ?
Just a temporary infra problem?

Dmitry Vyukov

Apr 29, 2019, 9:32:02 AM
to Tetsuo Handa, syzkaller-bugs
On Mon, Apr 29, 2019 at 3:18 PM Tetsuo Handa
<penguin...@i-love.sakura.ne.jp> wrote:
>
> Hello.
>
> The test should have been repeated until at least uptime > 143 seconds.
> Why prematurely judged as "no output" as of uptime = 64.772989 seconds ?
> Just a temporary infra problem?

It should have been run until uptime 363 seconds. It's just that there was
literally no output. 64 is just the last time the kernel printed
something, not when the "no output" condition was detected.

syzbot

Apr 29, 2019, 6:41:01 PM
to penguin...@i-love.sakura.ne.jp, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer still triggered
crash:
no output from test machine



Tested on:

commit: 83a50840 Merge tag 'seccomp-v5.1-rc8' of git://git.kernel...
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=129340ff200000
kernel config: https://syzkaller.appspot.com/x/.config?x=ef1b87b455c397cf
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
patch: https://syzkaller.appspot.com/x/patch.diff?x=10f00cdca00000

Tetsuo Handa

Apr 29, 2019, 7:03:26 PM
to Dmitry Vyukov, syzkaller-bugs
On 2019/04/29 22:31, Dmitry Vyukov wrote:
> On Mon, Apr 29, 2019 at 3:18 PM Tetsuo Handa
> <penguin...@i-love.sakura.ne.jp> wrote:
>>
>> Hello.
>>
>> The test should have been repeated until at least uptime > 143 seconds.
>> Why prematurely judged as "no output" as of uptime = 64.772989 seconds ?
>> Just a temporary infra problem?
>
> It should have been run until uptime 363 seconds. It's just there were
> literally no output. 64 is just the last time when kernel printed
> something, not when the "no output" condition was detected.

Indeed, with a ping printk() added, it did run until uptime 369 seconds.
https://syzkaller.appspot.com/x/log.txt?x=129340ff200000

2019/04/29 22:35:33 parsed 1 programs
2019/04/29 22:35:34 executed programs: 0
2019/04/29 22:35:39 executed programs: 12

Something made the reproducer stop reproducing... Is it possible to
make syzbot try "echo t > /proc/sysrq-trigger" before giving up?
Maybe we could do it from the kernel side if we track the last printk() time...

Jan Kara

Apr 29, 2019, 10:55:06 PM
to Al Viro, syzbot, ax...@kernel.dk, dvy...@google.com, ja...@suse.cz, linux-...@vger.kernel.org, linux-...@vger.kernel.org, penguin...@i-love.sakura.ne.jp, syzkall...@googlegroups.com
Actually UDF also calls the CDROMMULTISESSION ioctl during mount. So I can
see how we get to lo_simple_ioctl(), and indeed that would acquire
loop_ctl_mutex under s_umount, which is the opposite order from the
BLKRRPART ioctl path.
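
To spell out the two orderings being contrasted, here is a rough sketch pieced
together from the lockdep output earlier in the thread (call sites simplified):

/*
 * mount(2) of UDF on a loop device (tasks 11716/11717 above):
 *   alloc_super()                            -> holds sb->s_umount
 *   udf_get_last_session(): CDROMMULTISESSION on the loop device
 *     -> lo_ioctl() -> lo_simple_ioctl()     -> wants loop_ctl_mutex
 *
 * LOOP_SET_STATUS -> BLKRRPART on a loop device (task 8097 above):
 *   blkdev_reread_part()                     -> holds bdev->bd_mutex
 *   fsync_bdev() -> get_super() -> __get_super()
 *                                            -> wants sb->s_umount
 */

So the mount path takes loop_ctl_mutex while holding s_umount, whereas the
reread-partitions path reached from the loop ioctl code ends up waiting for
s_umount; that is the reversed ordering described above.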

> Why do we have loop_func_table->ioctl(), BTW? All in-tree instances are
> either NULL or return -EINVAL unconditionally. Considering that the
> caller is
> err = lo->ioctl ? lo->ioctl(lo, cmd, arg) : -EINVAL;
> we could bloody well just get rid of cryptoloop_ioctl() (the only
> non-NULL instance) and get rid of calling lo_simple_ioctl() in
> lo_ioctl() switch's default.

Yeah, you're right. And if we push the patch a bit further to not take
loop_ctl_mutex for invalid ioctl number, that would fix the problem. I
can send a fix.

Honza
--
Jan Kara <ja...@suse.com>
SUSE Labs, CR

Al Viro

Apr 29, 2019, 11:11:51 PM
to Jan Kara, syzbot, ax...@kernel.dk, dvy...@google.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, penguin...@i-love.sakura.ne.jp, syzkall...@googlegroups.com
On Tue, Apr 30, 2019 at 04:55:01AM +0200, Jan Kara wrote:

> Yeah, you're right. And if we push the patch a bit further to not take
> loop_ctl_mutex for invalid ioctl number, that would fix the problem. I
> can send a fix.

Huh? We don't take it until we are in lo_simple_ioctl(), and with that patch
we don't get to that call on invalid ioctl numbers. What am I missing here?

syzbot

Apr 30, 2019, 6:35:01 AM
to penguin...@i-love.sakura.ne.jp, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer still triggered
crash:
no output from test machine



Tested on:

commit: 3d17a1de Add linux-next specific files for 20190429
git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
console output: https://syzkaller.appspot.com/x/log.txt?x=16035198a00000
kernel config: https://syzkaller.appspot.com/x/.config?x=a5de954500ed36f7
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
patch: https://syzkaller.appspot.com/x/patch.diff?x=115e7b60a00000

Jan Kara

Apr 30, 2019, 9:07:44 AM
to Al Viro, Jan Kara, syzbot, ax...@kernel.dk, dvy...@google.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, penguin...@i-love.sakura.ne.jp, syzkall...@googlegroups.com
Doesn't it? blkdev_ioctl() calls into __blkdev_driver_ioctl() for
unrecognized ioctl numbers. That calls into lo_ioctl() in case of a loop
device. lo_ioctl() calls into lo_simple_ioctl() for ioctl numbers it
doesn't recognize and lo_simple_ioctl() will lock loop_ctl_mutex as you
say.

Honza

Al Viro

Apr 30, 2019, 9:18:30 AM
to Jan Kara, syzbot, ax...@kernel.dk, dvy...@google.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, penguin...@i-love.sakura.ne.jp, syzkall...@googlegroups.com
Not with the patch upthread. lo_ioctl() part was

@@ -1567,10 +1564,9 @@ static int lo_ioctl(struct block_device *bdev, fmode_t mode,
 	case LOOP_SET_BLOCK_SIZE:
 		if (!(mode & FMODE_WRITE) && !capable(CAP_SYS_ADMIN))
 			return -EPERM;
-		/* Fall through */
+		return lo_simple_ioctl(lo, cmd, arg);
 	default:
-		err = lo_simple_ioctl(lo, cmd, arg);
-		break;
+		return -EINVAL;
 	}
 
 	return err;

so anything unrecognized doesn't make it to lo_simple_ioctl() at all.

Jan Kara

Apr 30, 2019, 11:07:58 AM
to Al Viro, Jan Kara, syzbot, ax...@kernel.dk, dvy...@google.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, penguin...@i-love.sakura.ne.jp, syzkall...@googlegroups.com
Ah, right. I've missed that in your patch. So your patch should be really
fixing the problem. Will you post it officially? Thanks!

Tetsuo Handa

Apr 30, 2019, 11:35:10 AM
to Jan Kara, Al Viro, syzbot, ax...@kernel.dk, dvy...@google.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On 2019/05/01 0:07, Jan Kara wrote:
> Ah, right. I've missed that in your patch. So your patch should be really
> fixing the problem. Will you post it officially? Thanks!

I still cannot understand what the problem is.
According to console output,

----------
INFO: task syz-executor274:8097 blocked for more than 143 seconds.
INFO: task blkid:8099 blocked for more than 143 seconds.

1 lock held by syz-executor274/8083:
2 locks held by syz-executor274/8097:
#0: 000000007a5ed526 (&bdev->bd_mutex){+.+.}, at: blkdev_reread_part+0x1f/0x40 block/ioctl.c:192
#1: 0000000067606e21 (&type->s_umount_key#39){.+.+}, at: __get_super.part.0+0x203/0x2e0 fs/super.c:788
1 lock held by blkid/8099:
#0: 000000007a5ed526 (&bdev->bd_mutex){+.+.}, at: blkdev_put+0x34/0x560 fs/block_dev.c:1866
----------

8099 was blocked for too long waiting for 000000007a5ed526 held by 8097.
8097 was blocked for too long waiting for 0000000067606e21 held by somebody.
Since there is nobody else holding 0000000067606e21,
I guessed that the "somebody" which is holding 0000000067606e21 is 8083.

----------
----------

8083 is doing mount(2) but is not holding 00000000bde6230e (loop_ctl_mutex).
I guessed that something went wrong with 8083 inside __getblk_gfp().
How can loop_ctl_mutex be relevant to this problem?

Tetsuo Handa

May 3, 2019, 6:29:57 AM
to Jan Kara, Al Viro, syzbot, ax...@kernel.dk, dvy...@google.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On 2019/05/01 0:34, Tetsuo Handa wrote:
> I still cannot understand what the problem is.
(...snipped...)
> I guessed that something went wrong with 8083 inside __getblk_gfp().
> How can loop_ctl_mutex be relevant to this problem?
>

syzbot got a similar NMI backtrace. No loop_ctl_mutex is involved.

INFO: task hung in mount_bdev (2)
https://syzkaller.appspot.com/bug?id=d9b9fa1428ff2466de64fc85256e769f516cea58

Dmitry Vyukov

May 8, 2019, 7:16:44 AM
to Tetsuo Handa, syzkaller-bugs, syzkaller
From: Tetsuo Handa <penguin...@i-love.sakura.ne.jp>
Date: Tue, Apr 30, 2019 at 1:03 AM
To: Dmitry Vyukov
Cc: syzkaller-bugs
Will it help to debug any bugs? What percentage of bugs? I am not
sure it will help to debug even this one.
You may try to submit a patch that dumps all tasks after 4 minutes or so.

Tetsuo Handa

May 8, 2019, 9:18:04 AM
to Dmitry Vyukov, syzkaller-bugs, syzkaller
My concern is not the number or percentage of bugs. My concern is how to collect
information for guessing where the problem is when unexpected silence occurs.

As explained at https://akari.osdn.jp/capturing-kernel-messages.html ,
it is important to know whether the situation changed over time. Roughly
speaking, the information I am interested in is the output of /usr/bin/top
(in order to know which threads were busy or sleeping) and the backtraces of
such threads. But since we can't count on userspace tools for getting that
information, triggering SysRq just before giving up is a shorthand alternative.
(Although /usr/bin/top might already be available in the image files, the
output stream currently would not allow intermixing the output of /usr/bin/top,
because the console output carries either kernel messages or syzkaller messages.)

Dmitry Vyukov

May 8, 2019, 9:37:36 AM
to Tetsuo Handa, syzkaller-bugs, syzkaller
From: Tetsuo Handa <penguin...@i-love.sakura.ne.jp>
Date: Wed, May 8, 2019 at 3:18 PM
To: Dmitry Vyukov
Cc: syzkaller-bugs, syzkaller
But if it's not useful for debugging any bugs, we should not add it.
Every feature should cover some sensible percentage of bugs; otherwise
we will just continue adding hundreds of such features every month,
they won't be helpful, and soon they will start being harmful because
there will be too much output. There is some commonly useful info,
and there is an infinite tail of custom debugging relevant only to a
single bug (or maybe not relevant even to a single bug, e.g. "we tried
this but it did not help to debug the bug"). For the infinite tail of
one-off things, it's more reasonable either to debug locally or to
submit custom patches for testing.

What reproducer did you use to get
https://syzkaller.appspot.com/x/log.txt?x=129340ff200000 ?
It would be useful to understand what happens there and make the kernel
self-diagnose this and give the relevant info. The kernel not
self-diagnosing the problem seems to be the root cause of the hard
debugging. Improving the kernel would benefit all kernel testing and
users, rather than just syzbot.
If the kernel is not healthy, it may also not be possible to send a
sysrq from userspace. A kernel watchdog for the bad condition
would be more reliable.

Dmitry Vyukov

May 8, 2019, 9:51:13 AM
to Tetsuo Handa, syzkaller-bugs, syzkaller
From: Dmitry Vyukov <dvy...@google.com>
Date: Wed, May 8, 2019 at 3:37 PM
To: Tetsuo Handa
Cc: syzkaller-bugs, syzkaller
Is it possible to send sysrq's over console?
The part running on the test machine does not do crash detection and
does not know that the machine underneath is bad (that is generally not
possible). The host machine detects crashes, but then the question is how
it can interact with the test machine when it's suspected to be bad.

Tetsuo Handa

May 8, 2019, 10:54:59 AM
to Dmitry Vyukov, syzkaller-bugs, syzkaller
On 2019/05/08 22:51, Dmitry Vyukov wrote:
> What reproducer did you use to get
> https://syzkaller.appspot.com/x/log.txt?x=129340ff200000 ?

I guess the reproducer syzbot has found. That is,
https://syzkaller.appspot.com/text?tag=ReproC&x=135385a8a00000
if "#syz test:" uses C reproducer and
https://syzkaller.appspot.com/text?tag=ReproSyz&x=1291b1f4a00000
if "#syz test:" uses syz reproducer.

Initially I tried "#syz test:" using
https://syzkaller.appspot.com/x/patch.diff?x=101785a8a00000 in order to test whether
https://lkml.kernel.org/r/39601316-2a59-bbd7...@i-love.sakura.ne.jp
can improve the output. But that "#syz test:" request failed to trigger khungtaskd
messages. Since the last printk() was so early, I retried "#syz test:" using
https://syzkaller.appspot.com/x/patch.diff?x=10f00cdca00000 in order to confirm
that the "#syz test:" request gave up after actually waiting for 300 seconds.

> Is it possible to send sysrq's over console?

Yes. Executing "echo t > /proc/sysrq-trigger" from shell will return
SysRq-t output via printk(). That's why I want syzbot to try it when
syzbot gives up due to "unexpected silence".

> The part running on the test machine does not do crash detection and
> does not know that the machine underneath is bad (generally not
> possible). A host machine detects crashes, but then the question how
> it can interact with the test machine when it's suspected to be bad.

Since "printk: kernel is alive." was printed, the kernel was healthy and
console loglevel was not changed. Thus, SysRq-t etc. would tell us why
these "#syz test:" request are resulting in "no output from test machine"
(I mean, "unexpected silence") instead of trigger khungtaskd messages.

Dmitry Vyukov

May 8, 2019, 11:24:01 AM
to Tetsuo Handa, syzkaller-bugs, syzkaller
From: Tetsuo Handa <penguin...@i-love.sakura.ne.jp>
Date: Wed, May 8, 2019 at 4:54 PM
To: Dmitry Vyukov
Cc: syzkaller-bugs, syzkaller

> On 2019/05/08 22:51, Dmitry Vyukov wrote:
> > What reproducer did you use to get
> > https://syzkaller.appspot.com/x/log.txt?x=129340ff200000 ?
>
> I guess the reproducer syzbot has found. That is,
> https://syzkaller.appspot.com/text?tag=ReproC&x=135385a8a00000
> if "#syz test:" uses C reproducer and
> https://syzkaller.appspot.com/text?tag=ReproSyz&x=1291b1f4a00000
> if "#syz test:" uses syz reproducer.


That bug is reliably detected as "task hung":
https://syzkaller.appspot.com/bug?extid=de8b966f09b354eef8dd
I've tried locally with the same result.
There is a proper kernel bug report that already dumps all tasks.
If it did not produce a task list as a one-off flake, one may just
re-run testing to get the task list.



> Initially I tried "#syz test:" using
> https://syzkaller.appspot.com/x/patch.diff?x=101785a8a00000 in order to test whether
> https://lkml.kernel.org/r/39601316-2a59-bbd7...@i-love.sakura.ne.jp
> can improve the output. But that "#syz test:" request failed to trigger khungtaskd
> messages. Since the last printk() was so early, I retried "#syz test:" using
> https://syzkaller.appspot.com/x/patch.diff?x=10f00cdca00000 in order to confirm
> that the "#syz test:" request gave up after actually waiting for 300 seconds.
>
> > Is it possible to send sysrq's over console?
>
> Yes. Executing "echo t > /proc/sysrq-trigger" from shell will return
> SysRq-t output via printk(). That's why I want syzbot to try it when
> syzbot gives up due to "unexpected silence".
>
> > The part running on the test machine does not do crash detection and
> > does not know that the machine underneath is bad (generally not
> > possible). A host machine detects crashes, but then the question how
> > it can interact with the test machine when it's suspected to be bad.
>
> Since "printk: kernel is alive." was printed, the kernel was healthy and
> console loglevel was not changed. Thus, SysRq-t etc. would tell us why
> these "#syz test:" request are resulting in "no output from test machine"
> (I mean, "unexpected silence") instead of trigger khungtaskd messages.


I mean literally over the console connection, without involving a shell and
starting multiple new processes. That may not work. The kernel wasn't
healthy, even if it was able to print something. There is a hung task.

Tetsuo Handa

May 8, 2019, 1:00:13 PM
to Dmitry Vyukov, syzkaller-bugs, syzkaller
On 2019/05/09 0:23, Dmitry Vyukov wrote:
> From: Tetsuo Handa <penguin...@i-love.sakura.ne.jp>
> Date: Wed, May 8, 2019 at 4:54 PM
> To: Dmitry Vyukov
> Cc: syzkaller-bugs, syzkaller
>
>> On 2019/05/08 22:51, Dmitry Vyukov wrote:
>>> What reproducer did you use to get
>>> https://syzkaller.appspot.com/x/log.txt?x=129340ff200000 ?
>>
>> I guess the reproducer syzbot has found. That is,
>> https://syzkaller.appspot.com/text?tag=ReproC&x=135385a8a00000
>> if "#syz test:" uses C reproducer and
>> https://syzkaller.appspot.com/text?tag=ReproSyz&x=1291b1f4a00000
>> if "#syz test:" uses syz reproducer.
>
>
> That bug is reliably detected as "task hung":
> https://syzkaller.appspot.com/bug?extid=de8b966f09b354eef8dd

But "#syz test:" failed to detect as "task hung" two times out of two trials.
The reproducer for this bug which "#syz test:" is using seems unreliable.

> I've tried locally with the same result.
> There is a proper kernel bug report that already dumps all tasks.

As mentioned at https://lkml.kernel.org/r/39601316-2a59-bbd7...@i-love.sakura.ne.jp ,
one task is missing from the report file. That task appears only in the console
output file, which developers won't look at.

> If it did not produce a task list as a one-off flake, one may just
> re-run testing to get the task list.

OK, I will try "#syz test:" using
https://lkml.kernel.org/r/39601316-2a59-bbd7...@i-love.sakura.ne.jp
several more times.

> I mean literally over console connection without involving shell and
> start of multiple new processes. That may not work. The kernel wasn't
> healthy, even if it was able to print something. There is hang task.

Why can't we add a process which does

int fd = open("/proc/sysrq-trigger", O_WRONLY);
wait_for_trigger();
write(fd, "t", 1);
close(fd);

which was launched from an ssh connection? "may not work" can't be a reason.
We can give up if the connection was unexpectedly lost. But unless the
connection was lost, we can try to tell the already started process, which is
sleeping in wait_for_trigger(), to resume. If the kernel was not healthy
(e.g. it had already panic()ed, the console loglevel was set to
CONSOLE_LOGLEVEL_SILENT, or all CPUs were busy looping inside the kernel),
we just won't see the SysRq output. We merely fail to get hints when
"unexpected silence" happens; that's tolerable.

... There is no ssh connection before starting the fuzzing test!?
Then, how can the process which starts fuzzing tests be launched?
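If such a channel does exist, a hypothetical sketch of the helper described
above might look like the following (this is not code from the thread;
wait_for_trigger() is stood in for by a blocking read on stdin of the ssh
session, and the whole thing is only an illustration):

/* Hypothetical helper sketch; not part of any patch in this thread. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        char c;
        /* Open the trigger file while the kernel is still healthy. */
        int fd = open("/proc/sysrq-trigger", O_WRONLY);

        if (fd < 0) {
                perror("open /proc/sysrq-trigger");
                return 1;
        }
        /* "wait_for_trigger()": block until the host sends one byte. */
        if (read(0, &c, 1) == 1)
                write(fd, "t", 1); /* ask the kernel to dump all task states */
        close(fd);
        return 0;
}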

syzbot

May 9, 2019, 9:43:01 AM
to penguin...@i-love.sakura.ne.jp, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer still triggered
crash:
no output from test machine



Tested on:

commit: ea986679 Merge git://git.kernel.org/pub/scm/linux/kernel/g..
git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=137431f0a00000
kernel config: https://syzkaller.appspot.com/x/.config?x=2bd0da4b8de0b004
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
patch: https://syzkaller.appspot.com/x/patch.diff?x=12a8153ca00000

syzbot

May 9, 2019, 10:09:01 AM
to penguin...@i-love.sakura.ne.jp, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer still triggered
crash:
no output from test machine



Tested on:

commit: ea986679 Merge git://git.kernel.org/pub/scm/linux/kernel/g..
git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=12922222a00000
kernel config: https://syzkaller.appspot.com/x/.config?x=2bd0da4b8de0b004
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
patch: https://syzkaller.appspot.com/x/patch.diff?x=17337aaca00000

syzbot

May 9, 2019, 10:34:01 AM
to penguin...@i-love.sakura.ne.jp, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer still triggered
crash:
no output from test machine



Tested on:

commit: ea986679 Merge git://git.kernel.org/pub/scm/linux/kernel/g..
git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=159e41f0a00000
kernel config: https://syzkaller.appspot.com/x/.config?x=2bd0da4b8de0b004
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
patch: https://syzkaller.appspot.com/x/patch.diff?x=1410b5d0a00000

Dmitry Vyukov

May 9, 2019, 10:53:55 AM
to Tetsuo Handa, syzkaller-bugs, syzkaller, syzbot
> On 2019/05/09 0:23, Dmitry Vyukov wrote:
> > From: Tetsuo Handa <penguin...@i-love.sakura.ne.jp>
> > Date: Wed, May 8, 2019 at 4:54 PM
> > To: Dmitry Vyukov
> > Cc: syzkaller-bugs, syzkaller
> >
> >> On 2019/05/08 22:51, Dmitry Vyukov wrote:
> >>> What reproducer did you use to get
> >>> https://syzkaller.appspot.com/x/log.txt?x=129340ff200000 ?
> >>
> >> I guess the reproducer syzbot has found. That is,
> >> https://syzkaller.appspot.com/text?tag=ReproC&x=135385a8a00000
> >> if "#syz test:" uses C reproducer and
> >> https://syzkaller.appspot.com/text?tag=ReproSyz&x=1291b1f4a00000
> >> if "#syz test:" uses syz reproducer.
> >
> >
> > That bug is reliably detected as "task hung":
> > https://syzkaller.appspot.com/bug?extid=de8b966f09b354eef8dd
>
> But "#syz test:" failed to detect as "task hung" two times out of two trials.
> The reproducer for this bug which "#syz test:" is using seems unreliable.
>
> > I've tried locally with the same result.
> > There is a proper kernel bug report that already dumps all tasks.
>
> Like mentioned at https://lkml.kernel.org/r/39601316-2a59-bbd7...@i-love.sakura.ne.jp ,
> one task is missing in the report file. That task is only in the console
> output file where developers won't look at.

Sorry, I am spread across too many things. Which stack is that exactly?
That email just says "the culprit thread" without saying which stack
it refers to.
Please also point to the dashboard bug entry, so that I can see what's in
the report and contrast the log with the report.
Maybe we should fix syzkaller to include that stack then.


> > If it did not produce a task list as a one-off flake, one may just
> > re-run testing to get the task list.
>
> OK, I will try "#syz test:" using
> https://lkml.kernel.org/r/39601316-2a59-bbd7...@i-love.sakura.ne.jp
> for several more times.

Let's try on the original commit too:

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
037904a2



> > I mean literally over console connection without involving shell and
> > start of multiple new processes. That may not work. The kernel wasn't
> > healthy, even if it was able to print something. There is hang task.
>
> Why can't we add a process which does
>
> int fd = open("/proc/sysrq-trigger", O_WRONLY);
> wait_for_trigger();
> write(fd, "t", 1);
> close(fd);
>
> which was launched from ssh connection? "may not work" can't be a reason.
> We can give up if connection was unexpectedly lost. But unless connection
> was unexpectedly lost, we can try to tell the already started process
> which is sleeping at wait_for_trigger() to resume. If the kernel was not
> healthy (e.g. already panic()ed, console loglevel was set to
> CONSOLE_LOGLEVEL_SILENT, all CPUs are busy looping inside kernel),
> we just don't see the SysRq output. We just fail to get hints when
> "unexpected silence" happened; that's tolerable.
>
> ... There is no ssh connection before starting the fuzzing test!?
> Then, how can the process which starts fuzzing tests be launched?

Well, I need to say that one of the reasons is that I don't have time
to carefully think it all out and implement it right now. There is a
large difference between saying that something needs to be done and
actually designing it, thinking out all corner cases, carefully
implementing, testing, and porting it. If somebody sends a pull request
with a prototype, it will be easier to assess the change. What process
exactly does this? How is the signal sent? How does this affect
reproduction? Bisection? Patch testing? When exactly do we send this
signal? How do we know that it reacted? Do we have some timeout
for the reaction? How long?

syzbot

May 9, 2019, 11:11:01 AM
to dvy...@google.com, penguin...@i-love.sakura.ne.jp, syzkall...@googlegroups.com, syzk...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer still triggered
crash:
no output from test machine



Tested on:

commit: 037904a2 Merge branch 'x86-urgent-for-linus' of git://git...
git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=116a9644a00000
kernel config: https://syzkaller.appspot.com/x/.config?x=a42d110b47dd6b36

syzbot

May 9, 2019, 11:26:01 AM
to penguin...@i-love.sakura.ne.jp, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer still triggered
crash:
INFO: task hung in grab_super

INFO: task syz-executor.2:8325 blocked for more than 143 seconds.
Not tainted 5.1.0+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor.2 D29408 8325 8052 0x00004004
Call Trace:
context_switch kernel/sched/core.c:2817 [inline]
__schedule+0x7d1/0x1580 kernel/sched/core.c:3444
schedule+0xa8/0x260 kernel/sched/core.c:3508
__rwsem_down_write_failed_common kernel/locking/rwsem-xadd.c:585 [inline]
rwsem_down_write_failed+0x7a1/0xc60 kernel/locking/rwsem-xadd.c:617
__down_write kernel/locking/rwsem.h:227 [inline]
down_write+0x93/0xa0 kernel/locking/rwsem.c:68
grab_super+0xb4/0x290 fs/super.c:385
sget_userns+0x1ab/0x560 fs/super.c:601
sget+0x10c/0x150 fs/super.c:660
mount_bdev+0xff/0x3c0 fs/super.c:1319
udf_mount+0x35/0x40 fs/udf/super.c:131
legacy_get_tree+0x10a/0x220 fs/fs_context.c:665
vfs_get_tree+0x93/0x3a0 fs/super.c:1476
do_new_mount fs/namespace.c:2790 [inline]
do_mount+0x138c/0x1c00 fs/namespace.c:3110
ksys_mount+0xdb/0x150 fs/namespace.c:3319
__do_sys_mount fs/namespace.c:3333 [inline]
__se_sys_mount fs/namespace.c:3330 [inline]
__x64_sys_mount+0xbe/0x150 fs/namespace.c:3330
do_syscall_64+0x103/0x670 arch/x86/entry/common.c:298
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x458da9
Code: Bad RIP value.
RSP: 002b:00007f561a76dc78 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000458da9
RDX: 0000000020000140 RSI: 0000000020000080 RDI: 0000000020000000
RBP: 000000000073c0e0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f561a76e6d4
R13: 00000000004c4da1 R14: 00000000004d8a08 R15: 00000000ffffffff

Showing all locks held in the system:
1 lock held by khungtaskd/1040:
#0: 00000000efacfa90 (rcu_read_lock){....}, at:
debug_show_all_locks+0x5f/0x27e kernel/locking/lockdep.c:5142
1 lock held by ping_printk/3734:
#0: 00000000efacfa90 (rcu_read_lock){....}, at: rcu_read_lock+0x0/0x70
arch/x86/include/asm/paravirt.h:767
1 lock held by rsyslogd/7854:
#0: 000000002b5d9fd4 (&f->f_pos_lock){+.+.}, at: __fdget_pos+0xee/0x110
fs/file.c:801
2 locks held by getty/7943:
#0: 000000008baa9b1a (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 00000000188b9e62 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/7944:
#0: 00000000ff30ecc9 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 00000000f90058cc (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/7945:
#0: 0000000014472213 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 00000000d3649ce5 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/7946:
#0: 000000000c495207 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 0000000027ec890e (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/7947:
#0: 0000000048eb97c5 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 0000000061e486df (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/7948:
#0: 00000000d6db8815 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 000000002a50408a (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/7949:
#0: 00000000df4cc1fa (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 000000001e0ed4dc (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
1 lock held by syz-executor.2/8291:
#0: 000000000f3c7140 (&type->s_umount_key#46/1){+.+.}, at:
alloc_super+0x158/0x890 fs/super.c:228
2 locks held by syz-executor.2/8325:
#0: 0000000098aea68f (&bdev->bd_fsfreeze_mutex){+.+.}, at:
mount_bdev+0x93/0x3c0 fs/super.c:1313
#1: 000000000f3c7140 (&type->s_umount_key#47){+.+.}, at:
grab_super+0xb4/0x290 fs/super.c:385

=============================================

INFO: Currently running
syz-executor.2 R running task 27216 8291 8052 0x80004006
Call Trace:
context_switch kernel/sched/core.c:2817 [inline]
__schedule+0x7d1/0x1580 kernel/sched/core.c:3444


Tested on:

commit: a2d635de Merge tag 'drm-next-2019-05-09' of git://anongit...
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=10852920a00000
kernel config: https://syzkaller.appspot.com/x/.config?x=ff7efb09061edbfc
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
patch: https://syzkaller.appspot.com/x/patch.diff?x=1138988ca00000

Tetsuo Handa

May 9, 2019, 11:26:45 AM
to Dmitry Vyukov, syzkaller-bugs, syzkaller
On 2019/05/09 23:53, Dmitry Vyukov wrote:
>> Like mentioned at https://lkml.kernel.org/r/39601316-2a59-bbd7...@i-love.sakura.ne.jp ,
>> one task is missing in the report file. That task is only in the console
>> output file where developers won't look at.
>
> Sorry I am spread across too many things. What exactly stack is that?

It is

[ 769.677189][ C0] NMI backtrace for cpu 0
[ 769.677194][ C0] CPU: 0 PID: 7547 Comm: syz-executor013 Not tainted 5.1.0-rc6+ #90
[ 769.677200][ C0] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[ 769.677204][ C0] RIP: 0010:__sanitizer_cov_trace_pc+0x1/0x50
[ 769.677215][ C0] Code: 25 b4 7e ec 08 41 bc f4 ff ff ff e8 cd 5d ea ff 48 c7 05 9e 7e ec 08 00 00 00 00 e9 a4 e9 ff ff 90 90 90 90 90 90 90 90 90 55 <48> 89 e5 48 8b 75 08 65 48 8b 04 25 00 ee 01 00 65 8b 15 c8 60 91
[ 769.677219][ C0] RSP: 0018:ffff8880973ef470 EFLAGS: 00000246
[ 769.677226][ C0] RAX: 0000000000000000 RBX: ffff8880973ef568 RCX: ffffffff870d9383
[ 769.677231][ C0] RDX: 0000000000000002 RSI: 0000000000000002 RDI: 0000000000000007
[ 769.677235][ C0] RBP: ffff8880973ef4b0 R08: ffff88808c9fe080 R09: ffffed1015d05bc8
[ 769.677240][ C0] R10: ffffed1015d05bc7 R11: ffff8880ae82de3b R12: ffff8880a0dc9442
[ 769.677245][ C0] R13: ffff8880973ef580 R14: 0000000000000138 R15: ffff8880a0dc9442
[ 769.677250][ C0] FS: 00007f9541ef1700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
[ 769.677254][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 769.677258][ C0] CR2: 0000000000c02000 CR3: 00000000a9780000 CR4: 00000000001406f0
[ 769.677263][ C0] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 769.677268][ C0] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 769.677270][ C0] Call Trace:
[ 769.677273][ C0] ? xas_start+0x211/0x560
[ 769.677276][ C0] xas_load+0x21/0x150
[ 769.677280][ C0] ? rcu_dynticks_curr_cpu_in_eqs+0x54/0xb0
[ 769.677283][ C0] find_get_entry+0x13d/0x880
[ 769.677286][ C0] ? kasan_check_read+0x11/0x20
[ 769.677290][ C0] ? find_get_entries_tag+0xc10/0xc10
[ 769.677293][ C0] ? mark_held_locks+0xa4/0xf0
[ 769.677296][ C0] pagecache_get_page+0x4c/0x740
[ 769.677299][ C0] __find_get_block+0x501/0xcf0
[ 769.677303][ C0] ? __sanitizer_cov_trace_const_cmp1+0x1a/0x20
[ 769.677306][ C0] __getblk_gfp+0x1b1/0x970
[ 769.677309][ C0] __bread_gfp+0x2f/0x300
[ 769.677312][ C0] udf_tread+0xf1/0x140
[ 769.677315][ C0] udf_read_tagged+0x50/0x530
[ 769.677318][ C0] udf_check_anchor_block+0x1ef/0x680
[ 769.677321][ C0] ? blkpg_ioctl+0xa90/0xa90
[ 769.677325][ C0] ? udf_process_sequence+0x35d0/0x35d0
[ 769.677328][ C0] ? submit_bio+0xba/0x480
[ 769.677331][ C0] udf_scan_anchors+0x1cf/0x680
[ 769.677335][ C0] ? udf_check_anchor_block+0x680/0x680
[ 769.677339][ C0] ? __sanitizer_cov_trace_const_cmp8+0x18/0x20
[ 769.677342][ C0] ? udf_get_last_session+0x120/0x120
[ 769.677346][ C0] udf_load_vrs+0x8bf/0xc80
[ 769.677349][ C0] ? udf_scan_anchors+0x680/0x680
[ 769.677352][ C0] ? udf_bread+0x260/0x260
[ 769.677355][ C0] ? lockdep_init_map+0x1be/0x6d0
[ 769.677358][ C0] udf_fill_super+0x7d8/0x16d1
[ 769.677361][ C0] ? udf_load_vrs+0xc80/0xc80
[ 769.677364][ C0] ? vsprintf+0x40/0x40
[ 769.677367][ C0] ? set_blocksize+0x2bf/0x340
[ 769.677371][ C0] ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[ 769.677374][ C0] mount_bdev+0x307/0x3c0
[ 769.677376][ C0] ? udf_load_vrs+0xc80/0xc80
[ 769.677379][ C0] udf_mount+0x35/0x40
[ 769.677382][ C0] ? udf_get_pblock_meta25+0x3a0/0x3a0
[ 769.677385][ C0] legacy_get_tree+0xf2/0x200
[ 769.677388][ C0] ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[ 769.677391][ C0] vfs_get_tree+0x123/0x450
[ 769.677394][ C0] do_mount+0x1436/0x2c40
[ 769.677397][ C0] ? copy_mount_string+0x40/0x40
[ 769.677400][ C0] ? _copy_from_user+0xdd/0x150
[ 769.677403][ C0] ? __sanitizer_cov_trace_const_cmp8+0x18/0x20
[ 769.677406][ C0] ? copy_mount_options+0x280/0x3a0
[ 769.677409][ C0] ksys_mount+0xdb/0x150
[ 769.677412][ C0] __x64_sys_mount+0xbe/0x150
[ 769.677415][ C0] do_syscall_64+0x103/0x610
[ 769.677418][ C0] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 769.677421][ C0] RIP: 0033:0x44a739
[ 769.677438][ C0] Code: 4d cb fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 1b cb fb ff c3 66 2e 0f 1f 84 00 00 00 00
[ 769.677442][ C0] RSP: 002b:00007f9541ef0db8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[ 769.677454][ C0] RAX: ffffffffffffffda RBX: 00000000006dbc28 RCX: 000000000044a739
[ 769.677458][ C0] RDX: 0000000020000240 RSI: 0000000020000200 RDI: 0000000020000080
[ 769.677462][ C0] RBP: 00000000006dbc20 R08: 0000000000000000 R09: 0000000000000000
[ 769.677467][ C0] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dbc2c
[ 769.677471][ C0] R13: 00007ffc6cf3cddf R14: 00007f9541ef19c0 R15: 0000000000000000

part from "INFO: task hung in mount_bdev (2)"
( https://syzkaller.appspot.com/bug?id=d9b9fa1428ff2466de64fc85256e769f516cea58 ).
I'm suspecting that something went wrong at __getblk_gfp().

> That email just says "the culprit thread" which saying what exactly
> stack is that.

But this stack trace is not included in the dashboard bug entry and its report entry
( https://syzkaller.appspot.com/text?tag=CrashReport&x=14bf4a14a00000 ) because
caller_id of these lines is not "[ T1042]" but "[ C0]".

> Please also point to the dashboard bug entry, so that I see what's in
> the report and can contrast log and the report.
> Maybe we should fix syzkaller to include that stack then.

It might need a tricky change in order to handle "[ T1042]" and "[ C0]" equally.
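For illustration, the prefix matching itself could accept both forms with a
pattern like the hypothetical sketch below (POSIX regex in C; this is not
syzkaller's actual parser, and has_caller_id() is a made-up helper name):

#include <regex.h>
#include <stdio.h>

/* Accept both task ("[ T1042]") and CPU ("[    C0]") caller ids. */
static int has_caller_id(const char *line)
{
        regex_t re;
        int hit;

        if (regcomp(&re, "\\[ *[TC][0-9]+\\]", REG_EXTENDED | REG_NOSUB))
                return 0;
        hit = regexec(&re, line, 0, NULL, 0) == 0;
        regfree(&re);
        return hit;
}

int main(void)
{
        printf("%d\n", has_caller_id("[  769.677189][    C0] NMI backtrace for cpu 0"));
        printf("%d\n", has_caller_id("[  223.262114][ T1040] INFO: Currently running"));
        return 0;
}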

Tetsuo Handa

May 9, 2019, 11:57:32 AM
to Dmitry Vyukov, syzkaller-bugs, syzkaller, syzbot
On 2019/05/09 23:53, Dmitry Vyukov wrote:
>> OK, I will try "#syz test:" using
>> https://lkml.kernel.org/r/39601316-2a59-bbd7...@i-love.sakura.ne.jp
>> for several more times.
>
> Let's try on the original commit too:
>
> #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> 037904a2
>

So far I have succeeded in reproducing it only once out of 7 "#syz test:" requests.

-------- Forwarded Message --------
Date: Thu, 09 May 2019 08:26:00 -0700
In-Reply-To: <de6037e8-d163-e231...@i-love.sakura.ne.jp>
Message-ID: <00000000000043...@google.com>
Subject: Re: INFO: task hung in __get_super
From: syzbot <syzbot+10007d...@syzkaller.appspotmail.com>
To: penguin...@i-love.sakura.ne.jp, syzkall...@googlegroups.com

Hello,

syzbot has tested the proposed patch but the reproducer still triggered crash:
git tree: upstream
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
patch: https://syzkaller.appspot.com/x/patch.diff?x=1138988ca00000

-------- End of Forwarded Message --------

And the patch revealed that PID 8291 (syz-executor.2) was busy looping inside a mount() request.

[ 152.497163][ T3734] syz-executor.2 R running task 27216 8291 8052 0x80004006
[ 152.505079][ T3734] Call Trace:
[ 152.508830][ T3734] ? find_held_lock+0x35/0x130
[ 152.513616][ T3734] ? find_get_entry+0x50e/0x880
[ 152.518892][ T3734] ? find_held_lock+0x35/0x130
[ 152.523684][ T3734] ? rcu_is_watching+0x10/0x30
[ 152.528886][ T3734] ? kasan_check_read+0x11/0x20
[ 152.533770][ T3734] ? find_get_entries_tag+0xc10/0xc10
[ 152.539748][ T3734] ? find_get_entry+0x535/0x880
[ 152.544629][ T3734] ? find_get_entries_tag+0xc10/0xc10
[ 152.550416][ T3734] ? mark_held_locks+0xa4/0xf0
[ 152.555202][ T3734] ? mark_page_accessed+0x676/0x1140
[ 152.560951][ T3734] ? init_page_buffers+0x51/0x470
[ 152.566277][ T3734] ? _raw_spin_lock+0x13/0x40
[ 152.570977][ T3734] ? __find_get_block+0x529/0xcf0
[ 152.576478][ T3734] ? __sanitizer_cov_trace_const_cmp1+0x1a/0x20
[ 152.586453][ T3734] ? pagecache_get_page+0x4c/0x740
[ 152.591590][ T3734] ? __getblk_gfp+0x1b1/0x970
[ 152.597600][ T3734] ? __bread_gfp+0x2f/0x300
[ 152.602210][ T3734] ? udf_tread+0xf1/0x140
[ 152.606888][ T3734] ? udf_read_tagged+0x50/0x530
[ 152.611761][ T3734] ? udf_check_anchor_block+0x1ef/0x680
[ 152.617761][ T3734] ? blkpg_ioctl+0xa90/0xa90
[ 152.622399][ T3734] ? udf_process_sequence+0x35d0/0x35d0
[ 152.628374][ T3734] ? blk_queue_enter+0x700/0xe90
[ 152.633344][ T3734] ? udf_scan_anchors+0x3f4/0x680
[ 152.638848][ T3734] ? udf_check_anchor_block+0x680/0x680
[ 152.644495][ T3734] ? __sanitizer_cov_trace_const_cmp8+0x18/0x20
[ 152.651214][ T3734] ? udf_get_last_session+0x120/0x120
[ 152.658517][ T3734] ? udf_load_vrs+0x67f/0xc80
[ 152.663222][ T3734] ? udf_scan_anchors+0x680/0x680
[ 152.668642][ T3734] ? udf_bread+0x260/0x260
[ 152.673771][ T3734] ? lockdep_init_map+0x1be/0x6d0
[ 152.679333][ T3734] ? udf_fill_super+0x7d8/0x16d1
[ 152.684294][ T3734] ? udf_load_vrs+0xc80/0xc80
[ 152.689323][ T3734] ? vsprintf+0x40/0x40
[ 152.693500][ T3734] ? set_blocksize+0x2bf/0x340
[ 152.698665][ T3734] ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[ 152.704965][ T3734] ? mount_bdev+0x307/0x3c0
[ 152.709913][ T3734] ? udf_load_vrs+0xc80/0xc80
[ 152.714613][ T3734] ? udf_mount+0x35/0x40
[ 152.719347][ T3734] ? udf_get_pblock_meta25+0x3a0/0x3a0
[ 152.724856][ T3734] ? legacy_get_tree+0x10a/0x220
[ 152.731000][ T3734] ? legacy_parse_monolithic+0x11f/0x180
[ 152.737023][ T3734] ? vfs_get_tree+0x93/0x3a0
[ 152.741630][ T3734] ? do_mount+0x138c/0x1c00
[ 152.746426][ T3734] ? copy_mount_string+0x40/0x40
[ 152.751386][ T3734] ? __sanitizer_cov_trace_const_cmp8+0x18/0x20
[ 152.758053][ T3734] ? _copy_from_user+0xdd/0x150
[ 152.762941][ T3734] ? __sanitizer_cov_trace_const_cmp8+0x18/0x20
[ 152.769570][ T3734] ? copy_mount_options+0x280/0x3a0
[ 152.774786][ T3734] ? ksys_mount+0xdb/0x150
[ 152.779575][ T3734] ? __x64_sys_mount+0xbe/0x150
[ 152.785006][ T3734] ? do_syscall_64+0x103/0x670
[ 152.790358][ T3734] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe

[ 222.467245][ T3734] syz-executor.2 R running task 27216 8291 8052 0x80004006
[ 222.480166][ T3734] Call Trace:
[ 222.483487][ T3734] ? __lock_acquire+0x54f/0x5490
[ 222.488551][ T3734] ? lock_downgrade+0x880/0x880
[ 222.493457][ T3734] ? __schedule+0x7d1/0x1580
[ 222.498097][ T3734] ? __sched_text_start+0x8/0x8
[ 222.510067][ T3734] ? preempt_schedule_irq+0x8d/0x140
[ 222.520173][ T3734] ? rcu_dynticks_curr_cpu_in_eqs+0x54/0xb0
[ 222.527314][ T3734] ? __sanitizer_cov_trace_const_cmp1+0x1a/0x20
[ 222.533587][ T3734] ? find_get_entry+0x773/0x880
[ 222.543922][ T3734] ? __sanitizer_cov_trace_const_cmp1+0x1a/0x20
[ 222.550594][ T3734] ? preempt_schedule+0x4b/0x60
[ 222.555472][ T3734] ? ___preempt_schedule+0x16/0x18
[ 222.566306][ T3734] ? kasan_check_read+0x11/0x20
[ 222.571181][ T3734] ? do_raw_spin_unlock+0x57/0x270
[ 222.576724][ T3734] ? pagecache_get_page+0x4c/0x740
[ 222.592910][ T3734] ? __getblk_gfp+0x1b1/0x970
[ 222.597939][ T3734] ? __bread_gfp+0x2f/0x300
[ 222.602461][ T3734] ? udf_tread+0xf1/0x140
[ 222.607194][ T3734] ? udf_read_tagged+0x50/0x530
[ 222.612073][ T3734] ? udf_check_anchor_block+0x1ef/0x680
[ 222.628338][ T3734] ? blkpg_ioctl+0xa90/0xa90
[ 222.632966][ T3734] ? udf_process_sequence+0x35d0/0x35d0
[ 222.652017][ T3734] ? blk_queue_enter+0x700/0xe90
[ 222.666840][ T3734] ? udf_scan_anchors+0x3f4/0x680
[ 222.671894][ T3734] ? udf_check_anchor_block+0x680/0x680
[ 222.703521][ T3734] ? __sanitizer_cov_trace_const_cmp8+0x18/0x20
[ 222.710168][ T3734] ? udf_get_last_session+0x120/0x120
[ 222.715574][ T3734] ? udf_load_vrs+0x67f/0xc80
[ 222.729206][ T3734] ? udf_scan_anchors+0x680/0x680
[ 222.734280][ T3734] ? udf_bread+0x260/0x260
[ 222.747319][ T3734] ? lockdep_init_map+0x1be/0x6d0
[ 222.752475][ T3734] ? udf_fill_super+0x7d8/0x16d1
[ 222.772860][ T3734] ? udf_load_vrs+0xc80/0xc80
[ 222.777979][ T3734] ? vsprintf+0x40/0x40
[ 222.782166][ T3734] ? set_blocksize+0x2bf/0x340
[ 222.802500][ T3734] ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[ 222.809166][ T3734] ? mount_bdev+0x307/0x3c0
[ 222.813690][ T3734] ? udf_load_vrs+0xc80/0xc80
[ 222.818900][ T3734] ? udf_mount+0x35/0x40
[ 222.823164][ T3734] ? udf_get_pblock_meta25+0x3a0/0x3a0
[ 222.829086][ T3734] ? legacy_get_tree+0x10a/0x220
[ 222.834049][ T3734] ? legacy_parse_monolithic+0x11f/0x180
[ 222.845530][ T3734] ? vfs_get_tree+0x93/0x3a0
[ 222.858198][ T3734] ? do_mount+0x138c/0x1c00
[ 222.862729][ T3734] ? copy_mount_string+0x40/0x40
[ 222.868069][ T3734] ? __sanitizer_cov_trace_const_cmp8+0x18/0x20
[ 222.874329][ T3734] ? _copy_from_user+0xdd/0x150
[ 222.879663][ T3734] ? __sanitizer_cov_trace_const_cmp8+0x18/0x20
[ 222.890901][ T3734] ? copy_mount_options+0x280/0x3a0
[ 222.905329][ T3734] ? ksys_mount+0xdb/0x150
[ 222.910121][ T3734] ? __x64_sys_mount+0xbe/0x150
[ 222.915002][ T3734] ? do_syscall_64+0x103/0x670
[ 222.921075][ T3734] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe

[ 223.262114][ T1040] INFO: Currently running
[ 223.266883][ T1040] syz-executor.2 R running task 27216 8291 8052 0x80004006
[ 223.274806][ T1040] Call Trace:
[ 223.278564][ T1040] __schedule+0x7d1/0x1580
[ 223.287855][ T1040] ? __sched_text_start+0x8/0x8
[ 223.296854][ T1040] ? lock_downgrade+0x880/0x880
[ 223.301745][ T1040] ? __find_get_block+0x501/0xcf0
[ 223.307322][ T1040] ? pagecache_get_page+0x4c/0x740
[ 223.312462][ T1040] ? __getblk_gfp+0x27e/0x970
[ 223.323510][ T1040] ? __bread_gfp+0x2f/0x300
[ 223.329202][ T1040] ? udf_tread+0xf1/0x140
[ 223.333553][ T1040] ? udf_read_tagged+0x50/0x530
[ 223.344130][ T1040] ? udf_check_anchor_block+0x1ef/0x680
[ 223.350130][ T1040] ? blkpg_ioctl+0xa90/0xa90
[ 223.354915][ T1040] ? udf_process_sequence+0x35d0/0x35d0
[ 223.366637][ T1040] ? blk_queue_enter+0x700/0xe90
[ 223.371622][ T1040] ? udf_scan_anchors+0x3f4/0x680
[ 223.377030][ T1040] ? udf_check_anchor_block+0x680/0x680
[ 223.382594][ T1040] ? __sanitizer_cov_trace_const_cmp8+0x18/0x20
[ 223.389304][ T1040] ? udf_get_last_session+0x120/0x120
[ 223.394711][ T1040] ? udf_load_vrs+0x67f/0xc80
[ 223.404843][ T1040] ? udf_scan_anchors+0x680/0x680
[ 223.436979][ T1040] ? udf_bread+0x260/0x260
[ 223.441460][ T1040] ? lockdep_init_map+0x1be/0x6d0
[ 223.453058][ T1040] ? udf_fill_super+0x7d8/0x16d1
[ 223.458135][ T1040] ? udf_load_vrs+0xc80/0xc80
[ 223.462854][ T1040] ? vsprintf+0x40/0x40
[ 223.486029][ T1040] ? set_blocksize+0x2bf/0x340
[ 223.493937][ T1040] ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[ 223.500633][ T1040] ? mount_bdev+0x307/0x3c0
[ 223.505162][ T1040] ? udf_load_vrs+0xc80/0xc80
[ 223.510316][ T1040] ? udf_mount+0x35/0x40
[ 223.514598][ T1040] ? udf_get_pblock_meta25+0x3a0/0x3a0
[ 223.525707][ T1040] ? legacy_get_tree+0x10a/0x220
[ 223.531062][ T1040] ? legacy_parse_monolithic+0x11f/0x180
[ 223.537198][ T1040] ? vfs_get_tree+0x93/0x3a0
[ 223.558252][ T1040] ? do_mount+0x138c/0x1c00
[ 223.562823][ T1040] ? copy_mount_string+0x40/0x40
[ 223.598652][ T1040] ? __sanitizer_cov_trace_const_cmp8+0x18/0x20
[ 223.604957][ T1040] ? _copy_from_user+0xdd/0x150
[ 223.612873][ T1040] ? __sanitizer_cov_trace_const_cmp8+0x18/0x20
[ 223.625997][ T1040] ? copy_mount_options+0x280/0x3a0
[ 223.636192][ T1040] ? ksys_mount+0xdb/0x150
[ 223.640630][ T1040] ? __x64_sys_mount+0xbe/0x150
[ 223.645522][ T1040] ? do_syscall_64+0x103/0x670
[ 223.650787][ T1040] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 223.657350][ T1040] Kernel panic - not syncing: hung_task: blocked tasks

Dmitry Vyukov

May 9, 2019, 12:06:51 PM
to Tetsuo Handa, syzkaller-bugs, syzkaller, syzbot
> On 2019/05/09 23:53, Dmitry Vyukov wrote:
> >> OK, I will try "#syz test:" using
> >> https://lkml.kernel.org/r/39601316-2a59-bbd7...@i-love.sakura.ne.jp
> >> for several more times.
> >
> > Let's try on the original commit too:
> >
> > #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> > 037904a2
> >
>
> So far succeeded to reproduce only once out of 7 "#syz test:" requests.

It _did_ reproduce every time. The machine was unresponsive, but the kernel
failed to diagnose it as a hung task or in any other way.

Dmitry Vyukov

May 9, 2019, 12:41:16 PM
to Tetsuo Handa, syzkaller-bugs, syzkaller, syzbot
> > On 2019/05/09 23:53, Dmitry Vyukov wrote:
> > >> OK, I will try "#syz test:" using
> > >> https://lkml.kernel.org/r/39601316-2a59-bbd7...@i-love.sakura.ne.jp
> > >> for several more times.
> > >
> > > Let's try on the original commit too:
> > >
> > > #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> > > 037904a2
> > >
> >
> > So far succeeded to reproduce only once out of 7 "#syz test:" requests.
>
> It _did_ reproduce always. Machine was unresponsive but kernel failed
> to diagnose it as a hung task or in any other way.


There is one potential improvement on the syzbot side, though. We run
3 tests and then choose a random crashed run:
https://github.com/google/syzkaller/blob/214bf476d3ae8eb77b04adbc52baeef09a9d6f31/syz-ci/jobs.go#L476-L508
I think we should prefer the one with a report if there are both
crashes with reports ("hung task") and crashes without reports ("no output").
Assuming there was some mix of "task hung" and "no output" results in
your previous test attempts, this should significantly increase the
probability of getting "task hung".

Dmitry Vyukov

May 10, 2019, 5:10:46 AM
to Tetsuo Handa, syzkaller-bugs, syzkaller, syzbot
From: Dmitry Vyukov <dvy...@google.com>
Date: Thu, May 9, 2019 at 6:41 PM
To: Tetsuo Handa
Cc: syzkaller-bugs, syzkaller, syzbot

> > > On 2019/05/09 23:53, Dmitry Vyukov wrote:
> > > >> OK, I will try "#syz test:" using
> > > >> https://lkml.kernel.org/r/39601316-2a59-bbd7...@i-love.sakura.ne.jp
> > > >> for several more times.
> > > >
> > > > Let's try on the original commit too:
> > > >
> > > > #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> > > > 037904a2
> > > >
> > >
> > > So far succeeded to reproduce only once out of 7 "#syz test:" requests.
> >
> > It _did_ reproduce always. Machine was unresponsive but kernel failed
> > to diagnose it as a hung task or in any other way.
>
>
> There is one potential improvement on the syzbot side, though. We run
> 3 tests and then choose a random crashed run:
> https://github.com/google/syzkaller/blob/214bf476d3ae8eb77b04adbc52baeef09a9d6f31/syz-ci/jobs.go#L476-L508
> I think we should prefer the one with a report, if there are both
> crashes with reports ("hung task") and without report ("no output").
> Assuming there was some mix of "task hung" and "no output" results in
> your previous test attempts, this should increase probability of
> getting "task hung" significantly.


I've implemented these 2 improvements:
1. syzbot will prefer crashes with reports during patch testing (over
"no output"):
https://github.com/google/syzkaller/commit/7143d0bc86ae1927da5e57957350126409321b80
2. hung task/rcu stall reports should include traceback from other CPUs now:
https://github.com/google/syzkaller/commit/4e7ecf0713aa5b07b0e2b9fffb6d6e51f759900d
They should be deployed soon.

syzbot

May 10, 2019, 6:47:01 AM
to penguin...@i-love.sakura.ne.jp, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer still triggered
crash:
INFO: task syz-executor can't die for more than 27 seconds.

INFO: task syz-executor.2:8014 can't die for more than 27 seconds.
syz-executor.2 R running task 27216 8014 7941 0x80004006
Call Trace:


Tested on:

commit: 8ea5b2ab Merge branch 'fixes' of git://git.kernel.org/pub/..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=17dd8144a00000
kernel config: https://syzkaller.appspot.com/x/.config?x=2197616f59b1885d
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
patch: https://syzkaller.appspot.com/x/patch.diff?x=17162ae8a00000

Tetsuo Handa

May 10, 2019, 7:09:59 AM
to Dmitry Vyukov, syzkaller-bugs, syzkaller
On 2019/05/10 18:10, Dmitry Vyukov wrote:
>> So far succeeded to reproduce only once out of 7 "#syz test:" requests.
>
> It _did_ reproduce always. Machine was unresponsive but kernel failed
> to diagnose it as a hung task or in any other way.

Well, what I expected was that the reproducer would generate "hung task"
messages, because I didn't consider "no output" a useful result. If I count
"no output" as a crash, then the reproducer did always reproduce some
unhealthy situation.

I have also come to feel that "no output" is a very poor categorization.
Since each test process should be killed by SIGKILL shortly, it is
possible that the reason for "no output" is that the test process could
not respond to SIGKILL. And I confirmed that using the test patch below.

-------- Forwarded Message --------
Date: Fri, 10 May 2019 03:47:00 -0700
Message-ID: <00000000000050...@google.com>
Subject: Re: INFO: task hung in __get_super
From: syzbot <syzbot+10007d...@syzkaller.appspotmail.com>
To: penguin...@i-love.sakura.ne.jp, syzkall...@googlegroups.com

Hello,

syzbot has tested the proposed patch but the reproducer still triggered crash:
INFO: task syz-executor can't die for more than 27 seconds.

INFO: task syz-executor.2:8014 can't die for more than 27 seconds.
syz-executor.2 R running task 27216 8014 7941 0x80004006
Call Trace:


Tested on:

commit: 8ea5b2ab Merge branch 'fixes' of git://git.kernel.org/pub/..
git tree: upstream
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
patch: https://syzkaller.appspot.com/x/patch.diff?x=17162ae8a00000
-------- End of Forwarded Message --------

Maybe adding a watchdog which monitors threads that are not responding to
SIGKILL can reduce the number of "no output" reports while providing some
hints about what went wrong.

Dmitry Vyukov

May 10, 2019, 7:10:45 AM
to syzbot, Tetsuo Handa, syzkaller-bugs
From: syzbot <syzbot+10007d...@syzkaller.appspotmail.com>
Date: Fri, May 10, 2019 at 12:47 PM
To: <penguin...@i-love.sakura.ne.jp>, <syzkall...@googlegroups.com>
This "can't die for more than" may be a useful debugging check for
kernel. We've seem lots of unkillable tasks in the past. Or all of
them should be detected by other facilities? As far as I remember
frequently they lead to just "no output" with no additional
diagnositcs.

Dmitry Vyukov

May 10, 2019, 7:13:26 AM
to Tetsuo Handa, syzkaller-bugs, syzkaller
From: Tetsuo Handa <penguin...@i-love.sakura.ne.jp>
Date: Fri, May 10, 2019 at 1:09 PM
To: Dmitry Vyukov
Cc: syzkaller-bugs, syzkaller

We obviously raced and I answered on the other thread.
Yes. That would be a very useful check.

Tetsuo Handa

May 10, 2019, 7:32:25 AM
to Dmitry Vyukov, syzkaller-bugs, syzkaller
On 2019/05/10 20:10, Dmitry Vyukov wrote:
> This "can't die for more than" may be a useful debugging check for
> kernel. We've seem lots of unkillable tasks in the past. Or all of
> them should be detected by other facilities? As far as I remember
> frequently they lead to just "no output" with no additional
> diagnositcs.

Not all of them can be detected by other facilities, because this case is
basically doing

take_some_lock();
while (!try_something_and_succeed())
	cond_resched();
release_some_lock();

which triggers none of the softlockup / hardlockup / RCU stall detectors.
This is a healthy situation as long as try_something_and_succeed()
can eventually succeed. And in this case, try_something_and_succeed()
can't succeed for some reason.

On 2019/05/10 20:13, Dmitry Vyukov wrote:
> We obviously raced and I answered on the other thread.
> Yes. That would be a very useful check.

OK. Let's consider how to implement this check.

syzbot

May 11, 2019, 10:18:01 AM
to penguin...@i-love.sakura.ne.jp, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer still triggered
crash:
INFO: task syz-executor can't die for more than 20 seconds.

INFO: task syz-executor.1:8243 can't die for more than 20 seconds.
syz-executor.1 R running task 27664 8243 8094 0x80004006
Call Trace:
INFO: task syz-executor.2:8278 can't die for more than 21 seconds.
syz-executor.2 R running task 27520 8278 8096 0x80004006
Call Trace:
context_switch kernel/sched/core.c:2817 [inline]
__schedule+0x7d1/0x1580 kernel/sched/core.c:3444

Showing all locks held in the system:
1 lock held by khungtaskd/1043:
#0: 00000000422f5bc6 (rcu_read_lock){....}, at:
debug_show_all_locks+0x5f/0x27e kernel/locking/lockdep.c:5148
1 lock held by rsyslogd/7891:
#0: 00000000a0cb712c (&f->f_pos_lock){+.+.}, at: __fdget_pos+0xee/0x110
fs/file.c:801
2 locks held by getty/7981:
#0: 00000000e8d4b83f (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 0000000074572963 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/7982:
#0: 00000000517df272 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 0000000006c9ecea (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/7983:
#0: 00000000f17a98f3 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 000000003ce54205 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/7984:
#0: 000000007a5eb84a (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 000000009323d7f3 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/7985:
#0: 000000008061da89 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 00000000d26688a6 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/7986:
#0: 00000000fea844f1 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 000000002b6a27b2 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/7987:
#0: 000000001e368af1 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 0000000063d48d1e (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
1 lock held by syz-executor.1/8243:
1 lock held by syz-executor.2/8278:

=============================================

NMI backtrace for cpu 0
CPU: 0 PID: 1043 Comm: khungtaskd Not tainted 5.1.0+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
nmi_cpu_backtrace.cold+0x63/0xa4 lib/nmi_backtrace.c:101
nmi_trigger_cpumask_backtrace+0x1be/0x236 lib/nmi_backtrace.c:62
arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:224 [inline]
watchdog+0xa58/0x1020 kernel/hung_task.c:308
kthread+0x357/0x430 kernel/kthread.c:253
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 140 Comm: kworker/u4:3 Not tainted 5.1.0+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Workqueue: bat_events batadv_nc_worker
RIP: 0010:memory_is_nonzero mm/kasan/generic.c:108 [inline]
RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:135 [inline]
RIP: 0010:memory_is_poisoned mm/kasan/generic.c:166 [inline]
RIP: 0010:check_memory_region_inline mm/kasan/generic.c:182 [inline]
RIP: 0010:check_memory_region+0x53/0x190 mm/kasan/generic.c:191
Code: 8d 5c 37 ff 49 89 f8 48 b8 00 00 00 00 00 fc ff df 4d 89 da 49 c1 e8
03 4d 8d 24 00 49 c1 ea 03 49 01 c2 4c 89 e0 49 8d 5a 01 <49> 89 d9 4d 29
e1 49 83 f9 10 0f 8e 99 00 00 00 45 89 e0 41 83 e0
RSP: 0018:ffff8880a94efc70 EFLAGS: 00000282
RAX: ffffed1015d26bdf RBX: ffffed1015d26be0 RCX: ffffffff815ee314
RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff8880ae935ef8
RBP: ffff8880a94efc88 R08: 1ffff11015d26bdf R09: ffff8880a94e2f78
R10: ffffed1015d26bdf R11: ffff8880ae935efb R12: ffffed1015d26bdf
R13: 0000000000000328 R14: ffff88809e413380 R15: dffffc0000000000
FS: 0000000000000000(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000001b10e00 CR3: 000000009862f000 CR4: 00000000001406e0
Call Trace:
kasan_check_read+0x11/0x20 mm/kasan/common.c:94
atomic_read include/asm-generic/atomic-instrumented.h:26 [inline]
rcu_dynticks_curr_cpu_in_eqs+0x54/0xb0 kernel/rcu/tree.c:292
rcu_is_watching+0x10/0x30 kernel/rcu/tree.c:872
rcu_read_lock include/linux/rcupdate.h:596 [inline]
batadv_nc_purge_orig_hash net/batman-adv/network-coding.c:407 [inline]
batadv_nc_worker+0x4cd/0x760 net/batman-adv/network-coding.c:718
process_one_work+0x98e/0x1790 kernel/workqueue.c:2268
worker_thread+0x98/0xe40 kernel/workqueue.c:2414
kthread+0x357/0x430 kernel/kthread.c:253
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352


Tested on:

commit: 1fb3b526 Merge tag 'docs-5.2a' of git://git.lwn.net/linux
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16241df0a00000
kernel config: https://syzkaller.appspot.com/x/.config?x=2197616f59b1885d
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
patch: https://syzkaller.appspot.com/x/patch.diff?x=10241df0a00000

syzbot

May 11, 2019, 10:39:01 PM
to penguin...@i-love.sakura.ne.jp, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer still triggered
crash:
INFO: task syz-executor can't die for more than 143 seconds.

INFO: task syz-executor.2:8116 can't die for more than 143 seconds.
syz-executor.2 R running task 27664 8116 7914 0x80004006
Call Trace:

Showing all locks held in the system:
1 lock held by khungtaskd/1042:
#0: 000000003774dd85 (rcu_read_lock){....}, at:
debug_show_all_locks+0x5f/0x27e kernel/locking/lockdep.c:5148
1 lock held by rsyslogd/7689:
#0: 00000000393606a9 (&f->f_pos_lock){+.+.}, at: __fdget_pos+0xee/0x110
fs/file.c:801
2 locks held by getty/7801:
#0: 000000003757192c (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 00000000b5ad0779 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/7802:
#0: 00000000b833abbf (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 0000000066ecdf5b (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/7803:
#0: 000000001420d139 (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 00000000778a7442 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/7804:
#0: 00000000609a7f4e (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 000000008193b36d (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/7805:
#0: 000000004d70823c (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 00000000bcf29e22 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/7806:
#0: 0000000033ba4f5c (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 00000000ce12d110 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
2 locks held by getty/7807:
#0: 000000000c5e509e (&tty->ldisc_sem){++++}, at:
ldsem_down_read+0x33/0x40 drivers/tty/tty_ldsem.c:341
#1: 00000000d8be932b (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x232/0x1b70 drivers/tty/n_tty.c:2156
1 lock held by syz-executor.2/8116:

=============================================

NMI backtrace for cpu 0
CPU: 0 PID: 1042 Comm: khungtaskd Not tainted 5.1.0+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
nmi_cpu_backtrace.cold+0x63/0xa4 lib/nmi_backtrace.c:101
nmi_trigger_cpumask_backtrace+0x1be/0x236 lib/nmi_backtrace.c:62
arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:243 [inline]
watchdog+0xb85/0x12a0 kernel/hung_task.c:327
kthread+0x357/0x430 kernel/kthread.c:253
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 8116 Comm: syz-executor.2 Not tainted 5.1.0+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:__sanitizer_cov_trace_const_cmp4+0x11/0x20 kernel/kcov.c:189
Code: bf 03 00 00 00 48 89 e5 48 8b 4d 08 e8 d8 fe ff ff 5d c3 66 0f 1f 44
00 00 55 89 f2 89 fe bf 05 00 00 00 48 89 e5 48 8b 4d 08 <e8> ba fe ff ff
5d c3 0f 1f 84 00 00 00 00 00 55 48 89 f2 48 89 fe
RSP: 0018:ffff88808c217588 EFLAGS: 00000246
RAX: 0000000000000007 RBX: 0000000000000002 RCX: ffffffff818c0e71
RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000005
RBP: ffff88808c217588 R08: 1ffffd40003cc646 R09: fffff940003cc647
R10: fffff940003cc646 R11: ffffea0001e63237 R12: ffffea0001e63234
R13: ffffea0001e63200 R14: ffff88808c217668 R15: ffffea0001e63200
FS: 00007fc6730c2700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f5ad32f7140 CR3: 000000009faf7000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
atomic_fetch_add_unless include/linux/atomic-fallback.h:1089 [inline]
atomic_add_unless include/linux/atomic-fallback.h:1111 [inline]
page_ref_add_unless include/linux/page_ref.h:157 [inline]
__page_cache_add_speculative include/linux/pagemap.h:186 [inline]
page_cache_get_speculative include/linux/pagemap.h:202 [inline]
find_get_entry+0x221/0x880 mm/filemap.c:1510
pagecache_get_page+0x4c/0x740 mm/filemap.c:1607
find_or_create_page include/linux/pagemap.h:310 [inline]
grow_dev_page fs/buffer.c:948 [inline]
grow_buffers fs/buffer.c:1017 [inline]
__getblk_slow fs/buffer.c:1044 [inline]
__getblk_gfp+0x27e/0x970 fs/buffer.c:1321
__bread_gfp+0x2f/0x300 fs/buffer.c:1355
sb_bread include/linux/buffer_head.h:307 [inline]
udf_tread+0xf1/0x140 fs/udf/misc.c:44
udf_read_tagged+0x50/0x530 fs/udf/misc.c:210
udf_check_anchor_block+0x1ef/0x680 fs/udf/super.c:1780
udf_scan_anchors+0x3f4/0x680 fs/udf/super.c:1842
udf_find_anchor fs/udf/super.c:1877 [inline]
udf_load_vrs+0x67f/0xc80 fs/udf/super.c:1942
udf_fill_super+0x7d8/0x16d1 fs/udf/super.c:2135
mount_bdev+0x307/0x3c0 fs/super.c:1346
udf_mount+0x35/0x40 fs/udf/super.c:131
legacy_get_tree+0x10a/0x220 fs/fs_context.c:665
vfs_get_tree+0x93/0x3a0 fs/super.c:1476
do_new_mount fs/namespace.c:2790 [inline]
do_mount+0x138c/0x1c00 fs/namespace.c:3110
ksys_mount+0xdb/0x150 fs/namespace.c:3319
__do_sys_mount fs/namespace.c:3333 [inline]
__se_sys_mount fs/namespace.c:3330 [inline]
__x64_sys_mount+0xbe/0x150 fs/namespace.c:3330
do_syscall_64+0x103/0x670 arch/x86/entry/common.c:298
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x458da9
Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fc6730c1c78 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000458da9
RDX: 0000000020000140 RSI: 0000000020000080 RDI: 0000000020000000
RBP: 000000000073bfa0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fc6730c26d4
R13: 00000000004c4da1 R14: 00000000004d8a08 R15: 00000000ffffffff


Tested on:

commit: 8148c17b Merge tag 'gpio-v5.2-1' of git://git.kernel.org/p..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=10ebce10a00000
kernel config: https://syzkaller.appspot.com/x/.config?x=e37010b34967a96e
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
patch: https://syzkaller.appspot.com/x/patch.diff?x=10dfb0f8a00000

Tetsuo Handa

May 12, 2019, 4:21:31 AM
to Dmitry Vyukov, syzkaller-bugs, syzkaller
Here is a patch. What do you think?

From 18bb3a8e0ceacddba99ebb2575d72ad72c8b8554 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
Date: Sun, 12 May 2019 17:17:28 +0900
Subject: [PATCH] kernel/hung_task.c: Monitor killed tasks.

syzbot's second most frequent report is "no output from test machine", where
the userspace process failed to spawn a new test process for 300 seconds
for some reason. One of the reasons which can result in this report is that
an already spawned test process was unable to terminate (e.g. trapped in
an unkillable retry loop due to some bug) after SIGKILL was sent to that
process. Therefore, reporting when a thread fails to terminate even though
a fatal signal is pending would give us more useful information.

This version shares the existing sysctl settings (e.g. check interval,
timeout, whether to panic) used for detecting TASK_UNINTERRUPTIBLE
threads, because I don't know whether people want a new kernel
config option and different sysctl settings for monitoring killed
threads.

Signed-off-by: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
Cc: Dmitry Vyukov <dvy...@google.com>
---
include/linux/sched.h | 1 +
kernel/hung_task.c | 38 ++++++++++++++++++++++++++++++++++++++
2 files changed, 39 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index a2cd1585..d42bdd7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -850,6 +850,7 @@ struct task_struct {
#ifdef CONFIG_DETECT_HUNG_TASK
unsigned long last_switch_count;
unsigned long last_switch_time;
+ unsigned long killed_time;
#endif
/* Filesystem information: */
struct fs_struct *fs;
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index f108a95..a3747d4 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -141,6 +141,41 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
touch_nmi_watchdog();
}

+static void check_killed_task(struct task_struct *t, unsigned long timeout)
+{
+ unsigned long stamp = t->killed_time;
+
+ if (!stamp) {
+ stamp = jiffies;
+ if (!stamp)
+ stamp++;
+ t->killed_time = stamp;
+ return;
+ }
+ if (time_is_after_jiffies(stamp + timeout * HZ))
+ return;
+ /*
+ * Skip threads which are already inside do_exit(), for exit_mm() etc.
+ * might take many seconds.
+ */
+ if (t->flags & PF_EXITING)
+ return;
+ trace_sched_process_hang(t);
+ if (sysctl_hung_task_panic) {
+ console_verbose();
+ hung_task_call_panic = true;
+ }
+ /*
+ * This thread failed to terminate for more than
+ * sysctl_hung_task_timeout_secs seconds, complain:
+ */
+ pr_err("INFO: task %s:%d can't die for more than %ld seconds.\n",
+ t->comm, t->pid, (jiffies - stamp) / HZ);
+ sched_show_task(t);
+ hung_task_show_lock = true;
+ touch_nmi_watchdog();
+}
+
/*
* To avoid extending the RCU grace period for an unbounded amount of time,
* periodically exit the critical section and enter a new one.
@@ -192,6 +227,9 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
goto unlock;
last_break = jiffies;
}
+ /* Check threads which are about to terminate. */
+ if (unlikely(fatal_signal_pending(t)))
+ check_killed_task(t, timeout);
/* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
if (t->state == TASK_UNINTERRUPTIBLE)
check_hung_task(t, timeout);
--
1.8.3.1


Dmitry Vyukov

May 13, 2019, 3:18:51 AM
to Tetsuo Handa, syzkaller-bugs, syzkaller
From: Tetsuo Handa <penguin...@i-love.sakura.ne.jp>
Date: Sun, May 12, 2019 at 10:21 AM
To: Dmitry Vyukov
Cc: syzkaller-bugs, syzkaller

Looks good overall.
I like how this is bundled with hung task detection, because
effectively this is a hung task.

Should we also do a PF_FROZEN check here?
There seems to be some code duplication with check_hung_task when we
detect an actual hang, e.g. the
console_verbose/hung_task_show_lock/hung_task_call_panic logic.
Perhaps it would be better to unify the common pre-checks and the logic
after we have detected a hang, and just have 2 different predicates for
bad tasks: hung tasks and tasks failing to exit. We would need to move the
TASK_UNINTERRUPTIBLE check into the predicate for hung tasks then, but
that looks fine to me.

A question that bothers me is priority: getting more-or-less stable
hung/stall reports out of the kernel is quite tricky, but we seem to
have something more-or-less stable now. We need to think about how this
fits into crash bucketing. Now that we add unkillable tasks, should they
be reported before or after hung tasks? Can a hung task be a result of an
unkillable task? The other way around?

Tetsuo Handa

May 13, 2019, 6:26:26 AM
to Dmitry Vyukov, syzkaller-bugs, syzkaller
On 2019/05/13 16:18, Dmitry Vyukov wrote:
> Looks good overall.
> I like how this is bundled with hung task detection, because
> effectively this is a hung task.

Yes, a SIGKILL'ed thread that can't terminate is effectively a hung task.

>
> Should we also do PF_FROZEN check here?

Probably yes, because only threads killed by the OOM killer get TIF_MEMDIE
at mark_oom_victim() in order to be unfrozen. There might be killed threads
which remain frozen.

> There seems to be some code duplication with check_hung_task when we
> detect an actual hang. E.g. the
> console_verbose/hung_task_show_lock/tung_task_call_panic logic.
> Perhaps it will be better to unify common pre-checks and logic after
> we detected a hang, and just have 2 different predicates for bad
> tasks: hung tasks and tasks failing to exit. We would need to move the
> TASK_UNINTERRUPTIBLE check into the predicate for hung tasks then, but
> it looks fine to me.

That depends on whether we want to allow different timeouts/actions for
killed threads and hung threads. Someone (not limited to syzbot testing)
might want to set a shorter timeout for killed threads and/or just emit
a warning (i.e. not call panic()).

>
> A question that bothers me is priority: getting more-or-less stable
> hung/stall reports out of the kernel is quite tricky, but we seem to
> have something more-or-less stable now. We need to think how this fits
> into crash bucketing. Now when we add unkillable tasks, should they be
> reported before/after hung tasks? Can a hung task be a result of an
> unkillable task? The other way around?

One of "#syz test:" results ( https://syzkaller.appspot.com/x/log.txt?x=10852920a00000 )
is demonstrating that a killed thread being unable to release type->s_umount_key
(although https://syzkaller.appspot.com/x/patch.diff?x=1138988ca00000 used for this
console output was not ready to show killed threads, the reproducer is known to fall
into unkillable retry loop) triggered a hung task.

[ 221.996576][ T1040] INFO: task syz-executor.2:8325 blocked for more than 143 seconds.
[ 222.004590][ T1040] Not tainted 5.1.0+ #1

[ 223.175940][ T1040] 1 lock held by syz-executor.2/8291:
[ 223.183220][ T1040] #0: 000000000f3c7140 (&type->s_umount_key#46/1){+.+.}, at: alloc_super+0x158/0x890
[ 223.215929][ T1040] 2 locks held by syz-executor.2/8325:
[ 223.223980][ T1040] #0: 0000000098aea68f (&bdev->bd_fsfreeze_mutex){+.+.}, at: mount_bdev+0x93/0x3c0
[ 223.233792][ T1040] #1: 000000000f3c7140 (&type->s_umount_key#47){+.+.}, at: grab_super+0xb4/0x290

[ 223.262114][ T1040] INFO: Currently running
[ 223.266883][ T1040] syz-executor.2 R running task 27216 8291 8052 0x80004006

On the other hand, it is also possible that a hung task (or a task doing an
endless retry loop) blocks a killed thread that is waiting for a lock held by
that hung (or endlessly retrying) task...

Dmitry Vyukov

May 13, 2019, 6:34:59 AM
to Tetsuo Handa, syzkaller-bugs, syzkaller
From: Tetsuo Handa <penguin...@i-love.sakura.ne.jp>
Date: Mon, May 13, 2019 at 12:26 PM
To: Dmitry Vyukov
Cc: syzkaller-bugs, syzkaller

Right.
The current way you do it is probably fine. We will see. It is hard to tell
ahead of time how many duplicates and how many of these unkillable tasks we
will see. Hung tasks tend to produce lots of duplicates anyway and
probably require some more fundamental solution.
Please send the patch to the kernel mailing lists for official review.

Tetsuo Handa

May 14, 2019, 1:16:06 AM
to Jan Kara, Jens Axboe, Alexander Viro, syzbot, dvy...@google.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello, Jan.

syzbot is still reporting livelocks inside __getblk_gfp() [1] (similar to
commit 04906b2f542c2362 ("blockdev: Fix livelocks on loop device")).

[1] https://syzkaller.appspot.com/bug?id=835a0b9e75b14b55112661cbc61ca8b8f0edf767

The debug printk() patch shown below revealed that, because bdev->bd_inode->i_blkbits
is unexpectedly modified, __find_get_block_slow() fails to find the buffer pages
created by grow_dev_page(). I guess that opening a loop device (while somebody is
doing a mount operation on that loop device) can trigger set_init_blocksize() from
__blkdev_get().
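
To make the suspected interaction concrete, here is a purely hypothetical
userspace sketch, not a reproducer from this thread (the loop device,
filesystem type and mount point are assumptions): one thread keeps mounting
the filesystem on the loop device while another keeps re-opening the same
device node, which is the path that could reach set_init_blocksize() from
__blkdev_get().

/* Hypothetical race sketch; assumes an image is already attached to /dev/loop0. */
#include <fcntl.h>
#include <pthread.h>
#include <sys/mount.h>
#include <unistd.h>

static void *opener(void *arg)
{
        (void)arg;
        for (;;) {
                /* Repeatedly open and close the loop device node. */
                int fd = open("/dev/loop0", O_RDWR);

                if (fd >= 0)
                        close(fd);
        }
        return NULL;
}

int main(void)
{
        pthread_t th;

        pthread_create(&th, NULL, opener, NULL);
        for (;;) {
                /* Repeatedly mount and unmount the filesystem on the loop device. */
                mount("/dev/loop0", "/mnt", "udf", 0, NULL);
                umount("/mnt");
        }
        return 0;
}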

----------------------------------------
diff --git a/fs/buffer.c b/fs/buffer.c
index 0faa41fb4c88..8e197c771476 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -190,7 +190,7 @@ EXPORT_SYMBOL(end_buffer_write_sync);
* succeeds, there is no need to take private_lock.
*/
static struct buffer_head *
-__find_get_block_slow(struct block_device *bdev, sector_t block)
+__find_get_block_slow(struct block_device *bdev, sector_t block, unsigned size)
{
struct inode *bd_inode = bdev->bd_inode;
struct address_space *bd_mapping = bd_inode->i_mapping;
@@ -204,12 +204,20 @@ __find_get_block_slow(struct block_device *bdev, sector_t block)

index = block >> (PAGE_SHIFT - bd_inode->i_blkbits);
page = find_get_page_flags(bd_mapping, index, FGP_ACCESSED);
- if (!page)
+ if (!page) {
+ if (fatal_signal_pending(current) && __ratelimit(&last_warned))
+ printk("%s:%u block=%llu bd_inode->i_blkbits=%u index=%lu size=%u\n",
+ __FILE__, __LINE__, block, bd_inode->i_blkbits, index, size);
goto out;
+ }

spin_lock(&bd_mapping->private_lock);
- if (!page_has_buffers(page))
+ if (!page_has_buffers(page)) {
+ if (fatal_signal_pending(current) && __ratelimit(&last_warned))
+ printk("%s:%u block=%llu bd_inode->i_blkbits=%u index=%lu size=%u page=%p\n",
+ __FILE__, __LINE__, block, bd_inode->i_blkbits, index, size, page);
goto out_unlock;
+ }
head = page_buffers(page);
bh = head;
do {
@@ -934,6 +942,8 @@ grow_dev_page(struct block_device *bdev, sector_t block,
sector_t end_block;
int ret = 0; /* Will call free_more_memory() */
gfp_t gfp_mask;
+ static DEFINE_RATELIMIT_STATE(last_warned1, HZ, 1);
+ static DEFINE_RATELIMIT_STATE(last_warned2, HZ, 1);

gfp_mask = mapping_gfp_constraint(inode->i_mapping, ~__GFP_FS) | gfp;

@@ -946,11 +956,15 @@ grow_dev_page(struct block_device *bdev, sector_t block,
gfp_mask |= __GFP_NOFAIL;

page = find_or_create_page(inode->i_mapping, index, gfp_mask);
+ if (fatal_signal_pending(current) && __ratelimit(&last_warned1))
+ printk("%s:%u block=%llu inode->i_blkbits=%u index=%lu size=%u\n", __FILE__, __LINE__, block, inode->i_blkbits, index, size);

BUG_ON(!PageLocked(page));

if (page_has_buffers(page)) {
bh = page_buffers(page);
+ if (fatal_signal_pending(current) && __ratelimit(&last_warned2))
+ printk("%s:%u block=%llu index=%lu size=%u bh->b_size=%lu\n", __FILE__, __LINE__, block, index, size, bh->b_size);
if (bh->b_size == size) {
end_block = init_page_buffers(page, bdev,
(sector_t)index << sizebits,
@@ -1292,7 +1306,7 @@ __find_get_block(struct block_device *bdev, sector_t block, unsigned size)

if (bh == NULL) {
/* __find_get_block_slow will mark the page accessed */
- bh = __find_get_block_slow(bdev, block);
+ bh = __find_get_block_slow(bdev, block, size);
if (bh)
bh_lru_install(bh);
} else
----------------------------------------

----------------------------------------
[ 89.805981][ T7388] __find_get_block_slow: 1278192 callbacks suppressed
[ 89.805987][ T7388] fs/buffer.c:218 block=1 bd_inode->i_blkbits=12 index=1 size=512 page=00000000cd89a24d
[ 90.716164][ T7388] grow_dev_page: 1295496 callbacks suppressed
[ 90.716167][ T7388] fs/buffer.c:960 block=1 inode->i_blkbits=12 index=0 size=512
[ 90.735989][ T7388] grow_dev_page: 1296140 callbacks suppressed
[ 90.735991][ T7388] fs/buffer.c:967 block=1 index=0 size=512 bh->b_size=512
[ 90.815859][ T7388] __find_get_block_slow: 1312165 callbacks suppressed
[ 90.815862][ T7388] fs/buffer.c:218 block=1 bd_inode->i_blkbits=12 index=1 size=512 page=00000000cd89a24d
[ 91.726006][ T7388] grow_dev_page: 1306138 callbacks suppressed
[ 91.726022][ T7388] fs/buffer.c:960 block=1 inode->i_blkbits=12 index=0 size=512
[ 91.745856][ T7388] grow_dev_page: 1305423 callbacks suppressed
[ 91.745859][ T7388] fs/buffer.c:967 block=1 index=0 size=512 bh->b_size=512
[ 91.825701][ T7388] __find_get_block_slow: 1305252 callbacks suppressed
[ 91.825719][ T7388] fs/buffer.c:218 block=1 bd_inode->i_blkbits=12 index=1 size=512 page=00000000cd89a24d
[ 92.247976][ T1094] INFO: task a.out:7388 can't die for more than 60 seconds.
[ 92.251681][ T1094] a.out R running task 0 7388 1 0x80004004
[ 92.255528][ T1094] Call Trace:
[ 92.258142][ T1094] __schedule+0x2eb/0x600
[ 92.260950][ T1094] ? xas_start+0xa7/0x110
[ 92.263748][ T1094] ? xas_load+0x3d/0xd0
[ 92.266548][ T1094] ? xas_start+0xa7/0x110
[ 92.269598][ T1094] ? xas_load+0x3d/0xd0
[ 92.272499][ T1094] ? __find_get_block+0xa6/0x610
[ 92.275293][ T1094] ? _raw_spin_trylock+0x12/0x50
[ 92.278173][ T1094] ? ___ratelimit+0x28/0x1a0
[ 92.280865][ T1094] ? __getblk_gfp+0x307/0x490
[ 92.283687][ T1094] ? __bread_gfp+0x2d/0x130
[ 92.286857][ T1094] ? udf_tread+0x48/0x70
[ 92.289344][ T1094] ? udf_read_tagged+0x40/0x1a0
[ 92.292008][ T1094] ? udf_check_anchor_block+0x94/0x190
[ 92.294766][ T1094] ? udf_scan_anchors+0x12b/0x250
[ 92.297524][ T1094] ? udf_load_vrs+0x2b7/0x500
[ 92.300086][ T1094] ? udf_fill_super+0x32e/0x890
[ 92.302667][ T1094] ? snprintf+0x66/0x90
[ 92.305029][ T1094] ? mount_bdev+0x1c7/0x210
[ 92.307570][ T1094] ? udf_load_vrs+0x500/0x500
[ 92.309999][ T1094] ? udf_mount+0x34/0x40
[ 92.312295][ T1094] ? legacy_get_tree+0x2d/0x80
[ 92.314673][ T1094] ? vfs_get_tree+0x30/0x140
[ 92.317041][ T1094] ? do_mount+0x830/0xc30
[ 92.319463][ T1094] ? copy_mount_options+0x152/0x1c0
[ 92.321987][ T1094] ? ksys_mount+0xab/0x120
[ 92.324226][ T1094] ? __x64_sys_mount+0x26/0x30
[ 92.326956][ T1094] ? do_syscall_64+0x7c/0x1a0
[ 92.329278][ T1094] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 92.331880][ T1094] INFO: task a.out:7400 can't die for more than 60 seconds.
[ 92.334877][ T1094] a.out D 0 7400 1 0x00004004
[ 92.337732][ T1094] Call Trace:
[ 92.339595][ T1094] __schedule+0x2e3/0x600
[ 92.341668][ T1094] schedule+0x32/0xc0
[ 92.343578][ T1094] rwsem_down_read_failed+0xf0/0x1a0
[ 92.346142][ T1094] down_read+0x2f/0x40
[ 92.348047][ T1094] ? down_read+0x2f/0x40
[ 92.351045][ T1094] __get_super.part.12+0x113/0x140
[ 92.353673][ T1094] get_super+0x2d/0x40
[ 92.355705][ T1094] fsync_bdev+0x19/0x80
[ 92.357634][ T1094] invalidate_partition+0x35/0x60
[ 92.359829][ T1094] rescan_partitions+0x53/0x4b0
[ 92.361983][ T1094] __blkdev_reread_part+0x6a/0xa0
[ 92.364185][ T1094] blkdev_reread_part+0x24/0x40
[ 92.366804][ T1094] loop_reread_partitions+0x1e/0x60
[ 92.369377][ T1094] loop_set_status+0x4fa/0x5a0
[ 92.371571][ T1094] loop_set_status64+0x55/0x90
[ 92.373722][ T1094] lo_ioctl+0x651/0xc60
[ 92.375750][ T1094] ? loop_queue_work+0xdb0/0xdb0
[ 92.377951][ T1094] blkdev_ioctl+0xb69/0xc10
[ 92.380030][ T1094] block_ioctl+0x56/0x70
[ 92.381994][ T1094] ? blkdev_fallocate+0x230/0x230
[ 92.384240][ T1094] do_vfs_ioctl+0xc1/0x7e0
[ 92.386495][ T1094] ? tomoyo_file_ioctl+0x23/0x30
[ 92.388886][ T1094] ksys_ioctl+0x94/0xb0
[ 92.390889][ T1094] __x64_sys_ioctl+0x1e/0x30
[ 92.392963][ T1094] do_syscall_64+0x7c/0x1a0
[ 92.395057][ T1094] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 92.397695][ T1094] RIP: 0033:0x7fc3dd4195d7
[ 92.399728][ T1094] Code: Bad RIP value.
[ 92.401773][ T1094] RSP: 002b:00007ffd221788b8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[ 92.405194][ T1094] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fc3dd4195d7
[ 92.409004][ T1094] RDX: 00007ffd22178970 RSI: 0000000000004c04 RDI: 0000000000000004
[ 92.412289][ T1094] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
[ 92.415315][ T1094] R10: 0000000000000000 R11: 0000000000000202 R12: 00007fc3dd90e480
[ 92.418396][ T1094] R13: 00007ffd22178a60 R14: 00007ffd22178970 R15: 00007ffd22178b70
[ 92.421509][ T1094] INFO: task a.out:7400 blocked for more than 60 seconds.
[ 92.424267][ T1094] Not tainted 5.1.0+ #193
[ 92.426533][ T1094] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 92.429861][ T1094] a.out D 0 7400 1 0x00004004
[ 92.432589][ T1094] Call Trace:
[ 92.434377][ T1094] __schedule+0x2e3/0x600
[ 92.436637][ T1094] schedule+0x32/0xc0
[ 92.438625][ T1094] rwsem_down_read_failed+0xf0/0x1a0
[ 92.441090][ T1094] down_read+0x2f/0x40
[ 92.443124][ T1094] ? down_read+0x2f/0x40
[ 92.445145][ T1094] __get_super.part.12+0x113/0x140
[ 92.447907][ T1094] get_super+0x2d/0x40
[ 92.449997][ T1094] fsync_bdev+0x19/0x80
[ 92.452049][ T1094] invalidate_partition+0x35/0x60
[ 92.454394][ T1094] rescan_partitions+0x53/0x4b0
[ 92.456792][ T1094] __blkdev_reread_part+0x6a/0xa0
[ 92.459063][ T1094] blkdev_reread_part+0x24/0x40
[ 92.461303][ T1094] loop_reread_partitions+0x1e/0x60
[ 92.463630][ T1094] loop_set_status+0x4fa/0x5a0
[ 92.465990][ T1094] loop_set_status64+0x55/0x90
[ 92.468377][ T1094] lo_ioctl+0x651/0xc60
[ 92.470605][ T1094] ? loop_queue_work+0xdb0/0xdb0
[ 92.472968][ T1094] blkdev_ioctl+0xb69/0xc10
[ 92.475107][ T1094] block_ioctl+0x56/0x70
[ 92.477221][ T1094] ? blkdev_fallocate+0x230/0x230
[ 92.479441][ T1094] do_vfs_ioctl+0xc1/0x7e0
[ 92.481720][ T1094] ? tomoyo_file_ioctl+0x23/0x30
[ 92.483973][ T1094] ksys_ioctl+0x94/0xb0
[ 92.486686][ T1094] __x64_sys_ioctl+0x1e/0x30
[ 92.489081][ T1094] do_syscall_64+0x7c/0x1a0
[ 92.491202][ T1094] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 92.493565][ T1094] RIP: 0033:0x7fc3dd4195d7
[ 92.495690][ T1094] Code: Bad RIP value.
[ 92.497609][ T1094] RSP: 002b:00007ffd221788b8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[ 92.500723][ T1094] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fc3dd4195d7
[ 92.503822][ T1094] RDX: 00007ffd22178970 RSI: 0000000000004c04 RDI: 0000000000000004
[ 92.506967][ T1094] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
[ 92.510022][ T1094] R10: 0000000000000000 R11: 0000000000000202 R12: 00007fc3dd90e480
[ 92.512985][ T1094] R13: 00007ffd22178a60 R14: 00007ffd22178970 R15: 00007ffd22178b70
[ 92.516039][ T1094] INFO: task probe-bcache:7402 blocked for more than 60 seconds.
[ 92.519171][ T1094] Not tainted 5.1.0+ #193
[ 92.521295][ T1094] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 92.524469][ T1094] probe-bcache D 0 7402 7305 0x80000122
[ 92.527614][ T1094] Call Trace:
[ 92.529505][ T1094] __schedule+0x2e3/0x600
[ 92.531494][ T1094] schedule+0x32/0xc0
[ 92.533473][ T1094] schedule_preempt_disabled+0x13/0x20
[ 92.536036][ T1094] __mutex_lock.isra.8+0x16b/0x500
[ 92.538407][ T1094] __mutex_lock_slowpath+0xe/0x10
[ 92.540665][ T1094] ? __mutex_lock_slowpath+0xe/0x10
[ 92.542944][ T1094] mutex_lock+0x1d/0x20
[ 92.544945][ T1094] blkdev_put+0x22/0x190
[ 92.547434][ T1094] ? blkdev_put+0x190/0x190
[ 92.549519][ T1094] blkdev_close+0x26/0x30
[ 92.551667][ T1094] __fput+0xe9/0x300
[ 92.553711][ T1094] ____fput+0x15/0x20
[ 92.555690][ T1094] task_work_run+0xa9/0xd0
[ 92.557821][ T1094] do_exit+0x37a/0xf40
[ 92.559723][ T1094] ? __secure_computing+0x72/0x150
[ 92.562039][ T1094] ? syscall_trace_enter+0x10a/0x3b0
[ 92.564355][ T1094] do_group_exit+0x57/0xe0
[ 92.566780][ T1094] __x64_sys_exit_group+0x1c/0x20
[ 92.569307][ T1094] do_syscall_64+0x7c/0x1a0
[ 92.571461][ T1094] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 92.573874][ T1094] RIP: 0033:0x7f197f9e6e06
[ 92.576039][ T1094] Code: Bad RIP value.
[ 92.578264][ T1094] RSP: 002b:00007ffcfbcfce58 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[ 92.581582][ T1094] RAX: ffffffffffffffda RBX: 00007f197fce9740 RCX: 00007f197f9e6e06
[ 92.584878][ T1094] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
[ 92.588253][ T1094] RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffff30
[ 92.591427][ T1094] R10: 0000000000000004 R11: 0000000000000246 R12: 00007f197fce9740
[ 92.594646][ T1094] R13: 0000000000000001 R14: 00007f197fcf2628 R15: 0000000000000000
----------------------------------------

----------------------------------------
[ 120.248453][ T7298] fs/buffer.c:960 block=7 inode->i_blkbits=12 index=3 size=2048
[ 120.318363][ T7298] grow_dev_page: 1514869 callbacks suppressed
[ 120.318365][ T7298] fs/buffer.c:967 block=7 index=3 size=2048 bh->b_size=2048
[ 120.328408][ T7298] __find_get_block_slow: 1515516 callbacks suppressed
[ 120.328410][ T7298] fs/buffer.c:210 block=7 bd_inode->i_blkbits=12 index=7 size=2048
[ 121.258324][ T7298] grow_dev_page: 1514197 callbacks suppressed
[ 121.258327][ T7298] fs/buffer.c:960 block=7 inode->i_blkbits=12 index=3 size=2048
[ 121.328271][ T7298] grow_dev_page: 1508348 callbacks suppressed
[ 121.328273][ T7298] fs/buffer.c:967 block=7 index=3 size=2048 bh->b_size=2048
[ 121.338232][ T7298] __find_get_block_slow: 1507633 callbacks suppressed
[ 121.338234][ T7298] fs/buffer.c:210 block=7 bd_inode->i_blkbits=12 index=7 size=2048
[ 122.268187][ T7298] grow_dev_page: 1469440 callbacks suppressed
[ 122.268190][ T7298] fs/buffer.c:960 block=7 inode->i_blkbits=12 index=3 size=2048
[ 122.268307][ T1095] INFO: task a.out:7298 can't die for more than 30 seconds.
[ 122.277909][ T1095] a.out R running task 0 7298 1 0x80004004
[ 122.281569][ T1095] Call Trace:
[ 122.283953][ T1095] ? xas_start+0xa7/0x110
[ 122.286658][ T1095] ? xas_load+0x3d/0xd0
[ 122.289248][ T1095] ? xas_start+0xa7/0x110
[ 122.291690][ T1095] ? xas_load+0x3d/0xd0
[ 122.294112][ T1095] ? find_get_entry+0x172/0x220
[ 122.296911][ T1095] ? preempt_count_add+0x80/0xc0
[ 122.299633][ T1095] ? __getblk_gfp+0x439/0x490
[ 122.302126][ T1095] ? __bread_gfp+0x2d/0x130
[ 122.305032][ T1095] ? udf_tread+0x48/0x70
[ 122.308252][ T1095] ? udf_read_tagged+0x40/0x1a0
[ 122.311864][ T1095] ? udf_check_anchor_block+0x94/0x190
[ 122.315502][ T1095] ? udf_scan_anchors+0x12b/0x250
[ 122.318995][ T1095] ? udf_load_vrs+0x2b7/0x500
[ 122.322278][ T1095] ? udf_fill_super+0x32e/0x890
[ 122.325533][ T1095] ? snprintf+0x66/0x90
[ 122.328862][ T1095] ? mount_bdev+0x1c7/0x210
[ 122.332195][ T1095] ? udf_load_vrs+0x500/0x500
[ 122.334499][ T1095] ? udf_mount+0x34/0x40
[ 122.336594][ T1095] ? legacy_get_tree+0x2d/0x80
[ 122.338071][ T7298] grow_dev_page: 1454158 callbacks suppressed
[ 122.338074][ T7298] fs/buffer.c:967 block=7 index=3 size=2048 bh->b_size=2048
[ 122.338945][ T1095] ? vfs_get_tree+0x30/0x140
[ 122.346797][ T1095] ? do_mount+0x830/0xc30
[ 122.348091][ T7298] __find_get_block_slow: 1459190 callbacks suppressed
[ 122.348093][ T7298] fs/buffer.c:210 block=7 bd_inode->i_blkbits=12 index=7 size=2048
[ 122.349493][ T1095] ? copy_mount_options+0x152/0x1c0
[ 122.357382][ T1095] ? ksys_mount+0xab/0x120
[ 122.359788][ T1095] ? __x64_sys_mount+0x26/0x30
[ 122.362247][ T1095] ? do_syscall_64+0x7c/0x1a0
[ 122.364517][ T1095] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 122.367426][ T1095] INFO: task systemd-udevd:7302 blocked for more than 30 seconds.
[ 122.371445][ T1095] Not tainted 5.1.0+ #193
[ 122.373967][ T1095] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 122.377365][ T1095] systemd-udevd D 0 7302 5781 0x00000120
[ 122.380380][ T1095] Call Trace:
[ 122.382268][ T1095] __schedule+0x2e3/0x600
[ 122.384422][ T1095] schedule+0x32/0xc0
[ 122.386491][ T1095] schedule_preempt_disabled+0x13/0x20
[ 122.389047][ T1095] __mutex_lock.isra.8+0x16b/0x500
[ 122.391466][ T1095] ? get_disk_and_module+0x80/0x80
[ 122.393879][ T1095] __mutex_lock_slowpath+0xe/0x10
[ 122.396457][ T1095] ? __mutex_lock_slowpath+0xe/0x10
[ 122.398992][ T1095] mutex_lock+0x1d/0x20
[ 122.401200][ T1095] __blkdev_get+0x9a/0x720
[ 122.403366][ T1095] blkdev_get+0x18c/0x440
[ 122.405591][ T1095] ? preempt_count_add+0x80/0xc0
[ 122.407966][ T1095] blkdev_open+0xbd/0xf0
[ 122.410221][ T1095] do_dentry_open+0x2b2/0x500
[ 122.412756][ T1095] ? bd_acquire+0x120/0x120
[ 122.415044][ T1095] vfs_open+0x38/0x40
[ 122.417109][ T1095] path_openat+0x3c9/0x1a90
[ 122.419546][ T1095] ? ptep_set_access_flags+0x52/0x70
[ 122.422031][ T1095] do_filp_open+0xb9/0x150
[ 122.424222][ T1095] ? strncpy_from_user+0x19f/0x230
[ 122.426670][ T1095] ? _raw_spin_unlock+0x1f/0x40
[ 122.429675][ T1095] do_sys_open+0x2a2/0x380
[ 122.431965][ T1095] ? do_sys_open+0x2a2/0x380
[ 122.434174][ T1095] __x64_sys_openat+0x24/0x30
[ 122.436449][ T1095] do_syscall_64+0x7c/0x1a0
[ 122.438747][ T1095] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 122.441343][ T1095] RIP: 0033:0x7f27aff41c8e
[ 122.443511][ T1095] Code: Bad RIP value.
[ 122.445632][ T1095] RSP: 002b:00007ffcabf214b0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[ 122.449006][ T1095] RAX: ffffffffffffffda RBX: 00005639f5178280 RCX: 00007f27aff41c8e
[ 122.452025][ T1095] RDX: 0000000000080000 RSI: 00005639f5179310 RDI: 00000000ffffff9c
[ 122.455064][ T1095] RBP: 0000000000000001 R08: 00005639f3dba346 R09: 00007f27b021dc40
[ 122.458167][ T1095] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 122.461701][ T1095] R13: 0000000000000000 R14: 00005639f517ab50 R15: 00000000ffffffff
[ 122.465636][ T1095] INFO: task a.out:7319 can't die for more than 30 seconds.
[ 122.469411][ T1095] a.out D 0 7319 1 0x00004004
[ 122.473056][ T1095] Call Trace:
[ 122.475416][ T1095] __schedule+0x2e3/0x600
[ 122.478216][ T1095] ? __switch_to_asm+0x34/0x70
[ 122.480823][ T1095] schedule+0x32/0xc0
[ 122.482688][ T1095] rwsem_down_read_failed+0xf0/0x1a0
[ 122.484951][ T1095] down_read+0x2f/0x40
[ 122.486819][ T1095] ? down_read+0x2f/0x40
[ 122.488965][ T1095] __get_super.part.12+0x113/0x140
[ 122.491230][ T1095] get_super+0x2d/0x40
[ 122.493112][ T1095] fsync_bdev+0x19/0x80
[ 122.495157][ T1095] invalidate_partition+0x35/0x60
[ 122.497444][ T1095] rescan_partitions+0x53/0x4b0
[ 122.499634][ T1095] __blkdev_reread_part+0x6a/0xa0
[ 122.501926][ T1095] blkdev_reread_part+0x24/0x40
[ 122.504018][ T1095] loop_reread_partitions+0x1e/0x60
[ 122.506257][ T1095] loop_set_status+0x4fa/0x5a0
[ 122.508836][ T1095] loop_set_status64+0x55/0x90
[ 122.510962][ T1095] lo_ioctl+0x651/0xc60
[ 122.513809][ T1095] ? loop_queue_work+0xdb0/0xdb0
[ 122.516747][ T1095] blkdev_ioctl+0xb69/0xc10
[ 122.519569][ T1095] block_ioctl+0x56/0x70
[ 122.522421][ T1095] ? blkdev_fallocate+0x230/0x230
[ 122.524723][ T1095] do_vfs_ioctl+0xc1/0x7e0
[ 122.526859][ T1095] ? tomoyo_file_ioctl+0x23/0x30
[ 122.529657][ T1095] ksys_ioctl+0x94/0xb0
[ 122.531694][ T1095] __x64_sys_ioctl+0x1e/0x30
[ 122.533872][ T1095] do_syscall_64+0x7c/0x1a0
[ 122.536038][ T1095] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 122.538589][ T1095] RIP: 0033:0x7fe9a73645d7
[ 122.540708][ T1095] Code: Bad RIP value.
[ 122.542701][ T1095] RSP: 002b:00007ffe29d27c78 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[ 122.546144][ T1095] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fe9a73645d7
[ 122.549752][ T1095] RDX: 00007ffe29d27d30 RSI: 0000000000004c04 RDI: 0000000000000004
[ 122.552970][ T1095] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
[ 122.556034][ T1095] R10: 0000000000000000 R11: 0000000000000202 R12: 00007fe9a7859480
[ 122.559177][ T1095] R13: 00007ffe29d27e20 R14: 00007ffe29d27d30 R15: 00007ffe29d27f30
[ 122.562486][ T1095] INFO: task a.out:7319 blocked for more than 30 seconds.
[ 122.565427][ T1095] Not tainted 5.1.0+ #193
[ 122.567733][ T1095] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 122.571274][ T1095] a.out D 0 7319 1 0x00004004
[ 122.573997][ T1095] Call Trace:
[ 122.575858][ T1095] __schedule+0x2e3/0x600
[ 122.578070][ T1095] ? __switch_to_asm+0x34/0x70
[ 122.580532][ T1095] schedule+0x32/0xc0
[ 122.582647][ T1095] rwsem_down_read_failed+0xf0/0x1a0
[ 122.585117][ T1095] down_read+0x2f/0x40
[ 122.587168][ T1095] ? down_read+0x2f/0x40
[ 122.589792][ T1095] __get_super.part.12+0x113/0x140
[ 122.592223][ T1095] get_super+0x2d/0x40
[ 122.594273][ T1095] fsync_bdev+0x19/0x80
[ 122.596558][ T1095] invalidate_partition+0x35/0x60
[ 122.598959][ T1095] rescan_partitions+0x53/0x4b0
[ 122.601206][ T1095] __blkdev_reread_part+0x6a/0xa0
[ 122.603587][ T1095] blkdev_reread_part+0x24/0x40
[ 122.605875][ T1095] loop_reread_partitions+0x1e/0x60
[ 122.608326][ T1095] loop_set_status+0x4fa/0x5a0
[ 122.610707][ T1095] loop_set_status64+0x55/0x90
[ 122.613168][ T1095] lo_ioctl+0x651/0xc60
[ 122.615247][ T1095] ? loop_queue_work+0xdb0/0xdb0
[ 122.617606][ T1095] blkdev_ioctl+0xb69/0xc10
[ 122.619987][ T1095] block_ioctl+0x56/0x70
[ 122.622083][ T1095] ? blkdev_fallocate+0x230/0x230
[ 122.624626][ T1095] do_vfs_ioctl+0xc1/0x7e0
[ 122.627152][ T1095] ? tomoyo_file_ioctl+0x23/0x30
[ 122.631102][ T1095] ksys_ioctl+0x94/0xb0
[ 122.634123][ T1095] __x64_sys_ioctl+0x1e/0x30
[ 122.636641][ T1095] do_syscall_64+0x7c/0x1a0
[ 122.638877][ T1095] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 122.641414][ T1095] RIP: 0033:0x7fe9a73645d7
[ 122.643578][ T1095] Code: Bad RIP value.
[ 122.645736][ T1095] RSP: 002b:00007ffe29d27c78 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[ 122.649091][ T1095] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fe9a73645d7
[ 122.652392][ T1095] RDX: 00007ffe29d27d30 RSI: 0000000000004c04 RDI: 0000000000000004
[ 122.655581][ T1095] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
[ 122.658769][ T1095] R10: 0000000000000000 R11: 0000000000000202 R12: 00007fe9a7859480
[ 122.662073][ T1095] R13: 00007ffe29d27e20 R14: 00007ffe29d27d30 R15: 00007ffe29d27f30
----------------------------------------

Tetsuo Handa

May 14, 2019, 9:02:58 PM
to Jan Kara, Jens Axboe, Alexander Viro, syzbot, dvy...@google.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Since lo_ioctl() is calling set_blocksize() immediately after udf_load_vrs()
called sb_set_blocksize(), udf_tread() can't use the expected i_blkbits setting...

[ 48.685672][ T7322] fs/block_dev.c:135 bdev=0000000014fa0ec2 12 -> 9
[ 48.694675][ T7322] CPU: 4 PID: 7322 Comm: a.out Not tainted 5.1.0+ #196
[ 48.701321][ T7322] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[ 48.710265][ T7322] Call Trace:
[ 48.710272][ T7322] dump_stack+0xaa/0xd8
[ 48.715633][ T7322] set_blocksize+0xff/0x140
[ 48.822094][ T7322] sb_set_blocksize+0x27/0x70
[ 48.824843][ T7322] udf_load_vrs+0x4b/0x500
[ 48.827322][ T7322] udf_fill_super+0x32e/0x890
[ 48.830125][ T7322] ? snprintf+0x66/0x90
[ 48.832572][ T7322] mount_bdev+0x1c7/0x210
[ 48.835293][ T7322] ? udf_load_vrs+0x500/0x500
[ 48.838009][ T7322] udf_mount+0x34/0x40
[ 48.840504][ T7322] legacy_get_tree+0x2d/0x80
[ 48.843192][ T7322] vfs_get_tree+0x30/0x140
[ 48.845787][ T7322] do_mount+0x830/0xc30
[ 48.848325][ T7322] ? copy_mount_options+0x152/0x1c0
[ 48.851066][ T7322] ksys_mount+0xab/0x120
[ 48.853627][ T7322] __x64_sys_mount+0x26/0x30
[ 48.856168][ T7322] do_syscall_64+0x7c/0x1a0
[ 48.858943][ T7322] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[ 48.978376][ T7332] fs/block_dev.c:135 bdev=0000000014fa0ec2 9 -> 12
[ 49.079394][ T7332] CPU: 6 PID: 7332 Comm: a.out Not tainted 5.1.0+ #196
[ 49.082769][ T7332] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[ 49.089007][ T7332] Call Trace:
[ 49.091410][ T7332] dump_stack+0xaa/0xd8
[ 49.094053][ T7332] set_blocksize+0xff/0x140
[ 49.096734][ T7332] lo_ioctl+0x570/0xc60
[ 49.099366][ T7332] ? loop_queue_work+0xdb0/0xdb0
[ 49.102079][ T7332] blkdev_ioctl+0xb69/0xc10
[ 49.104667][ T7332] block_ioctl+0x56/0x70
[ 49.107267][ T7332] ? blkdev_fallocate+0x230/0x230
[ 49.110035][ T7332] do_vfs_ioctl+0xc1/0x7e0
[ 49.112728][ T7332] ? tomoyo_file_ioctl+0x23/0x30
[ 49.115452][ T7332] ksys_ioctl+0x94/0xb0
[ 49.118008][ T7332] __x64_sys_ioctl+0x1e/0x30
[ 49.120686][ T7332] do_syscall_64+0x7c/0x1a0
[ 49.123470][ T7332] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[ 49.163269][ T7322] UDF-fs: error (device loop2): udf_read_tagged: read failed, block=256, location=256
[ 51.009494][ T7322] grow_dev_page: 5 callbacks suppressed
[ 51.009509][ T7322] fs/buffer.c:960 bdev=0000000014fa0ec2 block=32 inode->i_blkbits=12 index=4 size=512
[ 51.016583][ T7322] fs/buffer.c:967 bdev=0000000014fa0ec2 block=32 index=4 size=512 bh->b_size=512

[ 55.046361][ T7322] grow_dev_page: 1177059 callbacks suppressed
[ 55.046366][ T7322] fs/buffer.c:960 bdev=0000000014fa0ec2 block=32 inode->i_blkbits=12 index=4 size=512
[ 55.054903][ T7322] grow_dev_page: 1177037 callbacks suppressed
[ 55.054906][ T7322] fs/buffer.c:967 bdev=0000000014fa0ec2 block=32 index=4 size=512 bh->b_size=512
[ 56.055187][ T7322] grow_dev_page: 1279065 callbacks suppressed
[ 56.055191][ T7322] fs/buffer.c:960 bdev=0000000014fa0ec2 block=32 inode->i_blkbits=12 index=4 size=512
[ 56.064443][ T7322] grow_dev_page: 1279065 callbacks suppressed
[ 56.064446][ T7322] fs/buffer.c:967 bdev=0000000014fa0ec2 block=32 index=4 size=512 bh->b_size=512

[ 92.282798][ T1097] INFO: task a.out:7322 can't die for more than 30 seconds.
[ 92.285926][ T1097] a.out R running task 0 7322 1 0x80004004
[ 92.289143][ T1097] Call Trace:
[ 92.291175][ T1097] __schedule+0x2eb/0x600
[ 92.293354][ T1097] ? apic_timer_interrupt+0xa/0x20
[ 92.295802][ T1097] ? xas_load+0x13/0xd0
[ 92.298079][ T1097] ? __switch_to_asm+0x40/0x70
[ 92.300466][ T1097] ? __switch_to_asm+0x34/0x70
[ 92.302890][ T1097] ? __switch_to_asm+0x40/0x70
[ 92.305299][ T1097] ? __switch_to_asm+0x34/0x70
[ 92.307621][ T1097] ? __switch_to_asm+0x40/0x70
[ 92.309908][ T1097] ? __switch_to_asm+0x34/0x70
[ 92.312161][ T1097] ? debug_smp_processor_id+0x2b/0x130
[ 92.314532][ T1097] ? xas_start+0xa7/0x110
[ 92.317139][ T1097] ? xas_load+0x3d/0xd0
[ 92.319674][ T1097] ? xas_start+0xa7/0x110
[ 92.321746][ T1097] ? xas_load+0x3d/0xd0
[ 92.323831][ T1097] ? find_get_entry+0x172/0x220
[ 92.326115][ T1097] ? __this_cpu_preempt_check+0x37/0x120
[ 92.328714][ T1097] ? __find_get_block+0x485/0x610
[ 92.331096][ T1097] ? _raw_spin_trylock+0x12/0x50
[ 92.333402][ T1097] ? __getblk_gfp+0xd9/0x4a0
[ 92.335617][ T1097] ? __bread_gfp+0x2d/0x130
[ 92.338410][ T1097] ? udf_tread+0x48/0x70
[ 92.341144][ T1097] ? udf_read_tagged+0x40/0x1a0
[ 92.343537][ T1097] ? udf_check_anchor_block+0x94/0x190
[ 92.346054][ T1097] ? udf_scan_anchors+0x12b/0x250
[ 92.348480][ T1097] ? udf_load_vrs+0x2b7/0x500
[ 92.350732][ T1097] ? udf_fill_super+0x32e/0x890
[ 92.353172][ T1097] ? snprintf+0x66/0x90
[ 92.355256][ T1097] ? mount_bdev+0x1c7/0x210
[ 92.358214][ T1097] ? udf_load_vrs+0x500/0x500
[ 92.361251][ T1097] ? udf_mount+0x34/0x40
[ 92.363661][ T1097] ? legacy_get_tree+0x2d/0x80
[ 92.366298][ T1097] ? vfs_get_tree+0x30/0x140
[ 92.368955][ T1097] ? do_mount+0x830/0xc30
[ 92.371468][ T1097] ? copy_mount_options+0x152/0x1c0
[ 92.374098][ T1097] ? ksys_mount+0xab/0x120
[ 92.376605][ T1097] ? __x64_sys_mount+0x26/0x30
[ 92.379553][ T1097] ? do_syscall_64+0x7c/0x1a0
[ 92.381615][ T1097] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 92.383942][ T1097] INFO: task a.out:7332 can't die for more than 30 seconds.
[ 92.387037][ T1097] a.out D 0 7332 1 0x00004004
[ 92.389621][ T1097] Call Trace:
[ 92.391297][ T1097] __schedule+0x2e3/0x600
[ 92.393313][ T1097] ? __switch_to_asm+0x40/0x70
[ 92.395486][ T1097] schedule+0x32/0xc0
[ 92.404871][ T1097] rwsem_down_read_failed+0xf0/0x1a0
[ 92.407605][ T1097] down_read+0x2f/0x40
[ 92.417196][ T1097] ? down_read+0x2f/0x40
[ 92.419579][ T1097] __get_super.part.12+0x113/0x140
[ 92.422165][ T1097] get_super+0x2d/0x40
[ 92.424559][ T1097] fsync_bdev+0x19/0x80
[ 92.427012][ T1097] invalidate_partition+0x35/0x60
[ 92.429588][ T1097] rescan_partitions+0x53/0x4b0
[ 92.432137][ T1097] __blkdev_reread_part+0x6a/0xa0
[ 92.434856][ T1097] blkdev_reread_part+0x24/0x40
[ 92.438203][ T1097] loop_reread_partitions+0x1e/0x60
[ 92.440909][ T1097] loop_set_status+0x4fa/0x5a0
[ 92.443427][ T1097] loop_set_status64+0x55/0x90
[ 92.445962][ T1097] lo_ioctl+0x651/0xc60
[ 92.448427][ T1097] ? loop_queue_work+0xdb0/0xdb0
[ 92.451260][ T1097] blkdev_ioctl+0xb69/0xc10
[ 92.453771][ T1097] block_ioctl+0x56/0x70
[ 92.456284][ T1097] ? blkdev_fallocate+0x230/0x230
[ 92.458957][ T1097] do_vfs_ioctl+0xc1/0x7e0
[ 92.461639][ T1097] ? tomoyo_file_ioctl+0x23/0x30
[ 92.464071][ T1097] ksys_ioctl+0x94/0xb0
[ 92.466394][ T1097] __x64_sys_ioctl+0x1e/0x30
[ 92.468879][ T1097] do_syscall_64+0x7c/0x1a0
[ 92.471339][ T1097] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Jan Kara

May 15, 2019, 6:21:37 AM
to Tetsuo Handa, Jan Kara, Jens Axboe, Alexander Viro, syzbot, dvy...@google.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, linux...@vger.kernel.org
On Wed 15-05-19 10:02:37, Tetsuo Handa wrote:
> Since lo_ioctl() is calling set_blocksize() immediately after udf_load_vrs()
> called sb_set_blocksize(), udf_tread() can't use the expected i_blkbits setting...

Thanks for debugging this, but this doesn't quite make sense to me. See
below:

> [ 48.685672][ T7322] fs/block_dev.c:135 bdev=0000000014fa0ec2 12 -> 9
> [ 48.694675][ T7322] CPU: 4 PID: 7322 Comm: a.out Not tainted 5.1.0+ #196
> [ 48.701321][ T7322] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
> [ 48.710265][ T7322] Call Trace:
> [ 48.710272][ T7322] dump_stack+0xaa/0xd8
> [ 48.715633][ T7322] set_blocksize+0xff/0x140
> [ 48.822094][ T7322] sb_set_blocksize+0x27/0x70
> [ 48.824843][ T7322] udf_load_vrs+0x4b/0x500
> [ 48.827322][ T7322] udf_fill_super+0x32e/0x890
> [ 48.830125][ T7322] ? snprintf+0x66/0x90
> [ 48.832572][ T7322] mount_bdev+0x1c7/0x210
> [ 48.835293][ T7322] ? udf_load_vrs+0x500/0x500
> [ 48.838009][ T7322] udf_mount+0x34/0x40
> [ 48.840504][ T7322] legacy_get_tree+0x2d/0x80
> [ 48.843192][ T7322] vfs_get_tree+0x30/0x140
> [ 48.845787][ T7322] do_mount+0x830/0xc30
> [ 48.848325][ T7322] ? copy_mount_options+0x152/0x1c0
> [ 48.851066][ T7322] ksys_mount+0xab/0x120
> [ 48.853627][ T7322] __x64_sys_mount+0x26/0x30
> [ 48.856168][ T7322] do_syscall_64+0x7c/0x1a0
> [ 48.858943][ T7322] entry_SYSCALL_64_after_hwframe+0x44/0xa9

So this is normal - UDF sets the block size it wants on the device during
mount. At that point we have the block device exclusively open, so nobody
should be changing it.

> [ 48.978376][ T7332] fs/block_dev.c:135 bdev=0000000014fa0ec2 9 -> 12
> [ 49.079394][ T7332] CPU: 6 PID: 7332 Comm: a.out Not tainted 5.1.0+ #196
> [ 49.082769][ T7332] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
> [ 49.089007][ T7332] Call Trace:
> [ 49.091410][ T7332] dump_stack+0xaa/0xd8
> [ 49.094053][ T7332] set_blocksize+0xff/0x140
> [ 49.096734][ T7332] lo_ioctl+0x570/0xc60
> [ 49.099366][ T7332] ? loop_queue_work+0xdb0/0xdb0
> [ 49.102079][ T7332] blkdev_ioctl+0xb69/0xc10
> [ 49.104667][ T7332] block_ioctl+0x56/0x70
> [ 49.107267][ T7332] ? blkdev_fallocate+0x230/0x230
> [ 49.110035][ T7332] do_vfs_ioctl+0xc1/0x7e0
> [ 49.112728][ T7332] ? tomoyo_file_ioctl+0x23/0x30
> [ 49.115452][ T7332] ksys_ioctl+0x94/0xb0
> [ 49.118008][ T7332] __x64_sys_ioctl+0x1e/0x30
> [ 49.120686][ T7332] do_syscall_64+0x7c/0x1a0
> [ 49.123470][ T7332] entry_SYSCALL_64_after_hwframe+0x44/0xa9

And this is strange. set_blocksize() is only called from loop_set_fd(), but
that means that the loop device must already be in lo->lo_state ==
Lo_unbound. But a loop device that is being used should never be in the
Lo_unbound state... Except if... Oh, I see what the problem is:

UDF opens an unbound loop device (through mount_bdev() ->
blkdev_get_by_path()). That succeeds because loop allows opens of unbound
devices so that ioctls can be run to set them up. UDF sets the block size for
the block device. Someone else comes along and calls LOOP_SET_FD for the
device and, plop, the block device's block size changes under UDF's hands.

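For illustration only, the interleaving described above can be sketched as two
user-space processes racing on the same unbound loop device. This is an editorial
sketch, not the syzkaller reproducer: the device path, backing file, mount point
and filesystem type are made-up placeholders, and the race window is of course not
guaranteed to be hit on any particular run.

----------------------------------------
#include <fcntl.h>
#include <linux/loop.h>
#include <sys/ioctl.h>
#include <sys/mount.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	const char *loopdev = "/dev/loop0";	/* assumed to be unbound */

	if (fork() == 0) {
		/*
		 * Process A: the mount_bdev() path. The kernel opens the
		 * loop device with an exclusive (FMODE_EXCL) reference and
		 * UDF then sets the block size it wants.
		 */
		mount(loopdev, "/mnt", "udf", 0, NULL);
		_exit(0);
	}

	/*
	 * Process B: races with the mount. A plain open() of the unbound
	 * loop device succeeds, and LOOP_SET_FD then binds a backing file
	 * and changes the block size under the filesystem's hands.
	 */
	int loopfd = open(loopdev, O_RDWR);
	int backing = open("/tmp/backing.img", O_RDWR);

	if (loopfd >= 0 && backing >= 0)
		ioctl(loopfd, LOOP_SET_FD, backing);

	wait(NULL);
	return 0;
}
----------------------------------------
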
The question is how to fix this problem. The simplest fix I can see is to
just refuse to do LOOP_SET_FD if someone has the block device
exclusively open, as there is a high chance such a user will be unpleasantly
surprised by the device changing under him. OTOH this has some potential
for userspace-visible regressions. But I guess it's worth a try. Something
like the attached patch?
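
As a rough, hypothetical sketch only (this is not the attached patch, and the
exact field tests and error handling are assumptions), such a check would sit
near the top of loop_set_fd() and simply bail out when another opener already
holds the block device exclusively:

----------------------------------------
	/*
	 * Hypothetical sketch, not the attached patch: refuse to bind a
	 * backing file while somebody (e.g. a mount in progress through
	 * mount_bdev()) holds the block device exclusively. A real patch
	 * would also have to tolerate the caller's own exclusive open.
	 */
	if (bdev->bd_holders > 0)
		return -EBUSY;
----------------------------------------

Whether an unsynchronized test like this is sufficient is exactly what the two
questions in the reply below are getting at.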

Let syzbot test the patch as well:

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git v5.1

Honza

--
Jan Kara <ja...@suse.com>
SUSE Labs, CR
0001-loop-Don-t-change-loop-device-under-exclusive-opener.patch

Tetsuo Handa

May 15, 2019, 7:32:45 AM
to Jan Kara, Jens Axboe, Alexander Viro, syzbot, dvy...@google.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, linux...@vger.kernel.org
On 2019/05/15 19:21, Jan Kara wrote:
> The question is how to fix this problem. The simplest fix I can see is to
> just refuse to do LOOP_SET_FD if someone has the block device
> exclusively open, as there is a high chance such a user will be unpleasantly
> surprised by the device changing under him. OTOH this has some potential
> for userspace-visible regressions. But I guess it's worth a try. Something
> like the attached patch?

(1) If I understand correctly, FMODE_EXCL is set at blkdev_open() only if O_EXCL
is specified. How can we detect it if O_EXCL was not used, given that the
reproducer ( https://syzkaller.appspot.com/text?tag=ReproC&x=135385a8a00000 )
does not use O_EXCL?

(2) There seems to be no serialization. What guarantees that mount_bdev()
cannot start right after the check added by this patch (e.g., because we were
preempted)?

syzbot

May 15, 2019, 7:59:01 AM
to ax...@kernel.dk, dvy...@google.com, ja...@suse.cz, linux...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, penguin...@i-love.sakura.ne.jp, syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger
crash:

Reported-and-tested-by:
syzbot+10007d...@syzkaller.appspotmail.com

Tested on:

commit: e93c9c99 Linux 5.1
git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git v5.1
kernel config: https://syzkaller.appspot.com/x/.config?x=5edd1df52e9bc982
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
patch: https://syzkaller.appspot.com/x/patch.diff?x=133626d8a00000

Note: testing is done by a robot and is best-effort only.

Jan Kara

May 15, 2019, 9:07:35 AM
to Tetsuo Handa, Jan Kara, Jens Axboe, Alexander Viro, syzbot, dvy...@google.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, linux...@vger.kernel.org
On Wed 15-05-19 20:32:27, Tetsuo Handa wrote:
> On 2019/05/15 19:21, Jan Kara wrote:
> > The question is how to fix this problem. The simplest fix I can see is to
> > just refuse to do LOOP_SET_FD if someone has the block device
> > exclusively open, as there is a high chance such a user will be unpleasantly
> > surprised by the device changing under him. OTOH this has some potential
> > for userspace-visible regressions. But I guess it's worth a try. Something
> > like the attached patch?
>
> (1) If I understand correctly, FMODE_EXCL is set at blkdev_open() only if
> O_EXCL is specified.

Yes.

> How can we detect it if O_EXCL was not used, given that the reproducer
> ( https://syzkaller.appspot.com/text?tag=ReproC&x=135385a8a00000 )
> does not use O_EXCL?

mount_bdev() is using O_EXCL and that's what matters.

> (2) There seems to be no serialization. What guarantees that mount_bdev()
> cannot start right after the check added by this patch (e.g., because we were
> preempted)?

That's a good question. lo_ctl_mutex actually synchronizes most of this
(taken in both loop_set_fd() and lo_open()) but you're right that there's
still a small race window where loop_set_fd() need not see bdev->bd_holders
elevated while blkdev_get() will succeed. So I need to think a bit more
about proper synchronization of this.

Jan Kara

May 16, 2019, 7:48:21 AM
to Tetsuo Handa, Jan Kara, Jens Axboe, Alexander Viro, syzbot, dvy...@google.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, linux...@vger.kernel.org
OK, so non-racy fix was a bit more involved and I've ended up just
upgrading the file reference to an exclusive one in loop_set_fd() instead
of trying to hand-craft some locking solution. The result is attached and
it passes blktests.

Let syzbot also test it:
0001-loop-Don-t-change-loop-device-under-exclusive-opener.patch

Tetsuo Handa

May 16, 2019, 8:17:42 AM
to Jan Kara, Jens Axboe, Alexander Viro, syzbot, dvy...@google.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, linux...@vger.kernel.org
On 2019/05/16 20:48, Jan Kara wrote:
> OK, so non-racy fix was a bit more involved and I've ended up just
> upgrading the file reference to an exclusive one in loop_set_fd() instead
> of trying to hand-craft some locking solution. The result is attached and
> it passes blktests.

blkdev_get() has a corresponding blkdev_put().
Doesn't bdgrab() need a corresponding bdput()?

Jan Kara

May 16, 2019, 8:32:03 AM
to Tetsuo Handa, Jan Kara, Jens Axboe, Alexander Viro, syzbot, dvy...@google.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, linux...@vger.kernel.org
Yes, and that's hidden inside blkdev_put() (or a failing blkdev_get()). Don't
get me started on the calling conventions of these functions... I wasted half
an hour trying to figure out where I was leaking inode references in my patch
;).

syzbot

May 16, 2019, 8:33:01 AM
to ax...@kernel.dk, dvy...@google.com, ja...@suse.cz, linux...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, penguin...@i-love.sakura.ne.jp, syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger
crash:

Reported-and-tested-by:
syzbot+10007d...@syzkaller.appspotmail.com

Tested on:

commit: e93c9c99 Linux 5.1
git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git v5.1
kernel config: https://syzkaller.appspot.com/x/.config?x=5edd1df52e9bc982
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
patch: https://syzkaller.appspot.com/x/patch.diff?x=135c5b02a00000

Tetsuo Handa

May 16, 2019, 8:50:59 AM
to Jan Kara, Jens Axboe, Alexander Viro, syzbot, dvy...@google.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, linux...@vger.kernel.org
Ah, I found the tricky comment. Please apply the patch. Thank you.

/**
* blkdev_get - open a block device
* @bdev: block_device to open
* @mode: FMODE_* mask
* @holder: exclusive holder identifier
*
* Open @bdev with @mode. If @mode includes %FMODE_EXCL, @bdev is
* open with exclusive access. Specifying %FMODE_EXCL with %NULL
* @holder is invalid. Exclusive opens may nest for the same @holder.
*
* On success, the reference count of @bdev is unchanged. On failure,
* @bdev is put.
*
* CONTEXT:
* Might sleep.
*
* RETURNS:
* 0 on success, -errno on failure.
*/
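
Put differently (a minimal editorial sketch, with "mode", "holder" and the
surrounding error handling as placeholders rather than code from the patch),
the reference taken by bdgrab() is the one that a failing blkdev_get(), or a
later blkdev_put(), eventually drops:

----------------------------------------
	bdgrab(bdev);				/* +1 bdev reference */
	error = blkdev_get(bdev, mode | FMODE_EXCL, holder);
	if (error)
		return error;			/* blkdev_get() already put bdev */

	/* ... the device is now claimed exclusively ... */

	blkdev_put(bdev, mode | FMODE_EXCL);	/* drops the bdgrab() reference */
----------------------------------------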

Tetsuo Handa

May 31, 2019, 10:18:13 AM
to Dmitry Vyukov, syzkaller-bugs, syzkaller
On 2019/05/13 19:34, Dmitry Vyukov wrote:
> The current way you do it is probably fine. We will see. It is hard to tell
> ahead of time how many duplicates and how many of these unkillable tasks we
> will see. Hung tasks tend to produce lots of duplicates anyway and probably
> require some more fundamental solution.
> Please send the patch to the kernel mailing lists for official review.
>

FYI, the patch has been in linux-next since next-20190529. I think there should
already be reports containing the messages added by this patch...