possible deadlock in generic_file_write_iter


syzbot

Nov 5, 2017, 5:25:02 AM
to ak...@linux-foundation.org, han...@cmpxchg.org, ja...@suse.cz, jla...@redhat.com, linux-...@vger.kernel.org, linu...@kvack.org, npi...@gmail.com, rgol...@suse.com, ross.z...@linux.intel.com, syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk
Hello,

syzkaller hit the following crash on
7a95bdb092c66b6473aa2fc848862ae557ab08f7
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
compiler: gcc (GCC) 7.1.1 20170620
.config is attached.
Raw console output is attached.
C reproducer is attached.
syzkaller reproducer is attached. See https://goo.gl/kgGztJ
for information about syzkaller reproducers.


======================================================
WARNING: possible circular locking dependency detected
4.13.0+ #84 Not tainted
------------------------------------------------------
loop0/2986 is trying to acquire lock:
(&sb->s_type->i_mutex_key#9){++++}, at: [<ffffffff8186f9ec>] inode_lock
include/linux/fs.h:712 [inline]
(&sb->s_type->i_mutex_key#9){++++}, at: [<ffffffff8186f9ec>]
generic_file_write_iter+0xdc/0x7a0 mm/filemap.c:3151

but now in release context of a crosslock acquired at the following:
((complete)&ret.event){+.+.}, at: [<ffffffff822a055e>]
submit_bio_wait+0x15e/0x200 block/bio.c:953

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #4 ((complete)&ret.event){+.+.}:
check_prevs_add kernel/locking/lockdep.c:2020 [inline]
validate_chain kernel/locking/lockdep.c:2469 [inline]
__lock_acquire+0x328f/0x4620 kernel/locking/lockdep.c:3498
lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:4002
complete_acquire include/linux/completion.h:39 [inline]
__wait_for_common kernel/sched/completion.c:108 [inline]
wait_for_common_io kernel/sched/completion.c:128 [inline]
wait_for_completion_io+0xc8/0x770 kernel/sched/completion.c:176
submit_bio_wait+0x15e/0x200 block/bio.c:953
blkdev_issue_zeroout+0x13c/0x1d0 block/blk-lib.c:370
sb_issue_zeroout include/linux/blkdev.h:1367 [inline]
ext4_init_inode_table+0x4fd/0xdb1 fs/ext4/ialloc.c:1447
ext4_run_li_request fs/ext4/super.c:2867 [inline]
ext4_lazyinit_thread+0x81a/0xd40 fs/ext4/super.c:2961
kthread+0x39c/0x470 kernel/kthread.c:231
ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431

-> #3 (&meta_group_info[i]->alloc_sem){++++}:
check_prevs_add kernel/locking/lockdep.c:2020 [inline]
validate_chain kernel/locking/lockdep.c:2469 [inline]
__lock_acquire+0x328f/0x4620 kernel/locking/lockdep.c:3498
lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:4002
down_read+0x96/0x150 kernel/locking/rwsem.c:23
__ext4_new_inode+0x26dc/0x4f00 fs/ext4/ialloc.c:1056
ext4_symlink+0x2d9/0xae0 fs/ext4/namei.c:3118
vfs_symlink+0x323/0x560 fs/namei.c:4116
SYSC_symlinkat fs/namei.c:4143 [inline]
SyS_symlinkat fs/namei.c:4123 [inline]
SYSC_symlink fs/namei.c:4156 [inline]
SyS_symlink+0x134/0x200 fs/namei.c:4154
entry_SYSCALL_64_fastpath+0x1f/0xbe

-> #2 (jbd2_handle){.+.+}:
check_prevs_add kernel/locking/lockdep.c:2020 [inline]
validate_chain kernel/locking/lockdep.c:2469 [inline]
__lock_acquire+0x328f/0x4620 kernel/locking/lockdep.c:3498
lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:4002
start_this_handle+0x4b8/0x1080 fs/jbd2/transaction.c:390
jbd2__journal_start+0x389/0x9f0 fs/jbd2/transaction.c:444
__ext4_journal_start_sb+0x15f/0x550 fs/ext4/ext4_jbd2.c:80
__ext4_journal_start fs/ext4/ext4_jbd2.h:314 [inline]
ext4_dirty_inode+0x56/0xa0 fs/ext4/inode.c:5859
__mark_inode_dirty+0x912/0x1170 fs/fs-writeback.c:2096
generic_update_time+0x1b2/0x270 fs/inode.c:1649
update_time fs/inode.c:1665 [inline]
touch_atime+0x26d/0x2f0 fs/inode.c:1737
file_accessed include/linux/fs.h:2036 [inline]
ext4_file_mmap+0x161/0x1b0 fs/ext4/file.c:350
call_mmap include/linux/fs.h:1748 [inline]
mmap_region+0xa99/0x15a0 mm/mmap.c:1690
do_mmap+0x6a1/0xd50 mm/mmap.c:1468
do_mmap_pgoff include/linux/mm.h:2150 [inline]
vm_mmap_pgoff+0x1de/0x280 mm/util.c:333
SYSC_mmap_pgoff mm/mmap.c:1518 [inline]
SyS_mmap_pgoff+0x462/0x5f0 mm/mmap.c:1476
SYSC_mmap arch/x86/kernel/sys_x86_64.c:99 [inline]
SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:90
entry_SYSCALL_64_fastpath+0x1f/0xbe

-> #1 (&mm->mmap_sem){++++}:
check_prevs_add kernel/locking/lockdep.c:2020 [inline]
validate_chain kernel/locking/lockdep.c:2469 [inline]
__lock_acquire+0x328f/0x4620 kernel/locking/lockdep.c:3498
lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:4002
__might_fault+0x13a/0x1d0 mm/memory.c:4502
_copy_to_user+0x2c/0xc0 lib/usercopy.c:24
copy_to_user include/linux/uaccess.h:154 [inline]
filldir+0x1a7/0x320 fs/readdir.c:196
dir_emit_dot include/linux/fs.h:3314 [inline]
dir_emit_dots include/linux/fs.h:3325 [inline]
dcache_readdir+0x12d/0x5e0 fs/libfs.c:192
iterate_dir+0x4b2/0x5d0 fs/readdir.c:51
SYSC_getdents fs/readdir.c:231 [inline]
SyS_getdents+0x225/0x450 fs/readdir.c:212
entry_SYSCALL_64_fastpath+0x1f/0xbe

-> #0 (&sb->s_type->i_mutex_key#9){++++}:
down_write+0x87/0x120 kernel/locking/rwsem.c:53
inode_lock include/linux/fs.h:712 [inline]
generic_file_write_iter+0xdc/0x7a0 mm/filemap.c:3151
call_write_iter include/linux/fs.h:1743 [inline]
do_iter_readv_writev+0x531/0x7f0 fs/read_write.c:650
do_iter_write+0x15a/0x540 fs/read_write.c:929
vfs_iter_write+0x77/0xb0 fs/read_write.c:942

other info that might help us debug this:

Chain exists of:
&sb->s_type->i_mutex_key#9 --> &meta_group_info[i]->alloc_sem -->
(complete)&ret.event

Possible unsafe locking scenario by crosslock:

       CPU0                    CPU1
       ----                    ----
  lock(&meta_group_info[i]->alloc_sem);
  lock((complete)&ret.event);
                               lock(&sb->s_type->i_mutex_key#9);
                               unlock((complete)&ret.event);

*** DEADLOCK ***

1 lock held by loop0/2986:
#0: (&x->wait#14){..-.}, at: [<ffffffff815263f8>] complete+0x18/0x80
kernel/sched/completion.c:34

stack backtrace:
CPU: 1 PID: 2986 Comm: loop0 Not tainted 4.13.0+ #84
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:52
print_circular_bug+0x503/0x710 kernel/locking/lockdep.c:1259
check_prev_add+0x865/0x1520 kernel/locking/lockdep.c:1894
commit_xhlock kernel/locking/lockdep.c:5015 [inline]
commit_xhlocks kernel/locking/lockdep.c:5059 [inline]
lock_commit_crosslock+0xe73/0x1d10 kernel/locking/lockdep.c:5098
complete_release_commit include/linux/completion.h:49 [inline]
complete+0x24/0x80 kernel/sched/completion.c:39
submit_bio_wait_endio+0x9c/0xd0 block/bio.c:930
bio_endio+0x2f8/0x8d0 block/bio.c:1843
req_bio_endio block/blk-core.c:204 [inline]
blk_update_request+0x2a6/0xe20 block/blk-core.c:2743
blk_mq_end_request+0x54/0x120 block/blk-mq.c:509
lo_complete_rq+0xbe/0x1f0 drivers/block/loop.c:463
__blk_mq_complete_request+0x38f/0x6c0 block/blk-mq.c:550
blk_mq_complete_request+0x4f/0x60 block/blk-mq.c:570
loop_handle_cmd drivers/block/loop.c:1710 [inline]
loop_queue_work+0x26b/0x3900 drivers/block/loop.c:1719
kthread_worker_fn+0x340/0x9b0 kernel/kthread.c:635


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzk...@googlegroups.com.
Please credit me with: Reported-by: syzbot <syzk...@googlegroups.com>

syzbot will keep track of this bug report.
Once a fix for this bug is committed, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug
report.
Note: all commands must start from beginning of the line.

Dmitry Vyukov

Nov 5, 2017, 5:39:54 AM
to syzbot, Andrew Morton, Johannes Weiner, Jan Kara, jla...@redhat.com, LKML, linu...@kvack.org, npi...@gmail.com, rgol...@suse.com, ross.z...@linux.intel.com, syzkall...@googlegroups.com, Al Viro, Theodore Ts'o, linux...@vger.kernel.org
On Sun, Nov 5, 2017 at 1:25 PM, syzbot
<bot+f99f3a0db9007f4f4e...@syzkaller.appspotmail.com>
wrote:
This also happens on more recent commits. E.g. on upstream
42b76d0e6b1fe0fcb90e0ff6b4d053d50597b031:

======================================================
WARNING: possible circular locking dependency detected
4.14.0-rc3+ #116 Not tainted
------------------------------------------------------
loop6/10331 is trying to acquire lock:
(&sb->s_type->i_mutex_key#9){++++}, at: [<ffffffff818756bc>]
inode_lock include/linux/fs.h:712 [inline]
(&sb->s_type->i_mutex_key#9){++++}, at: [<ffffffff818756bc>]
generic_file_write_iter+0xdc/0x7a0 mm/filemap.c:3175

but now in release context of a crosslock acquired at the following:
((complete)&ret.event){+.+.}, at: [<ffffffff822ac9fe>]
submit_bio_wait+0x15e/0x200 block/bio.c:953

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #4 ((complete)&ret.event){+.+.}:
check_prevs_add kernel/locking/lockdep.c:2020 [inline]
validate_chain kernel/locking/lockdep.c:2469 [inline]
__lock_acquire+0x328f/0x4620 kernel/locking/lockdep.c:3498
lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:4002
complete_acquire include/linux/completion.h:39 [inline]
__wait_for_common kernel/sched/completion.c:108 [inline]
wait_for_common_io kernel/sched/completion.c:128 [inline]
wait_for_completion_io+0xcb/0x7b0 kernel/sched/completion.c:176
submit_bio_wait+0x15e/0x200 block/bio.c:953
blkdev_issue_zeroout+0x13c/0x1d0 block/blk-lib.c:370
sb_issue_zeroout include/linux/blkdev.h:1368 [inline]
ext4_init_inode_table+0x4fd/0xdb1 fs/ext4/ialloc.c:1447
ext4_run_li_request fs/ext4/super.c:2866 [inline]
ext4_lazyinit_thread+0x808/0xd30 fs/ext4/super.c:2960
kthread+0x39c/0x470 kernel/kthread.c:231
ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431

-> #3 (&meta_group_info[i]->alloc_sem){++++}:
check_prevs_add kernel/locking/lockdep.c:2020 [inline]
validate_chain kernel/locking/lockdep.c:2469 [inline]
__lock_acquire+0x328f/0x4620 kernel/locking/lockdep.c:3498
lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:4002
down_read+0x96/0x150 kernel/locking/rwsem.c:23
__ext4_new_inode+0x26dc/0x4f00 fs/ext4/ialloc.c:1056
ext4_symlink+0x2d9/0xae0 fs/ext4/namei.c:3118
vfs_symlink+0x323/0x560 fs/namei.c:4115
SYSC_symlinkat fs/namei.c:4142 [inline]
SyS_symlinkat fs/namei.c:4122 [inline]
SYSC_symlink fs/namei.c:4155 [inline]
SyS_symlink+0x134/0x200 fs/namei.c:4153
entry_SYSCALL_64_fastpath+0x1f/0xbe

-> #2 (jbd2_handle){++++}:
check_prevs_add kernel/locking/lockdep.c:2020 [inline]
validate_chain kernel/locking/lockdep.c:2469 [inline]
__lock_acquire+0x328f/0x4620 kernel/locking/lockdep.c:3498
lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:4002
start_this_handle+0x4b8/0x1080 fs/jbd2/transaction.c:390
jbd2__journal_start+0x389/0x9f0 fs/jbd2/transaction.c:444
__ext4_journal_start_sb+0x15f/0x550 fs/ext4/ext4_jbd2.c:80
__ext4_journal_start fs/ext4/ext4_jbd2.h:314 [inline]
ext4_dirty_inode+0x56/0xa0 fs/ext4/inode.c:5859
__mark_inode_dirty+0x912/0x1170 fs/fs-writeback.c:2096
generic_update_time+0x1b2/0x270 fs/inode.c:1649
update_time fs/inode.c:1665 [inline]
touch_atime+0x26d/0x2f0 fs/inode.c:1737
file_accessed include/linux/fs.h:2061 [inline]
ext4_file_mmap+0x161/0x1b0 fs/ext4/file.c:352
call_mmap include/linux/fs.h:1775 [inline]
mmap_region+0xa99/0x15a0 mm/mmap.c:1690
do_mmap+0x6a1/0xd50 mm/mmap.c:1468
do_mmap_pgoff include/linux/mm.h:2150 [inline]
vm_mmap_pgoff+0x1de/0x280 mm/util.c:333
SYSC_mmap_pgoff mm/mmap.c:1518 [inline]
SyS_mmap_pgoff+0x462/0x5f0 mm/mmap.c:1476
SYSC_mmap arch/x86/kernel/sys_x86_64.c:99 [inline]
SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:90
entry_SYSCALL_64_fastpath+0x1f/0xbe

-> #1 (&mm->mmap_sem){++++}:
check_prevs_add kernel/locking/lockdep.c:2020 [inline]
validate_chain kernel/locking/lockdep.c:2469 [inline]
__lock_acquire+0x328f/0x4620 kernel/locking/lockdep.c:3498
lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:4002
__might_fault+0x13a/0x1d0 mm/memory.c:4502
_copy_to_user+0x2c/0xc0 lib/usercopy.c:24
copy_to_user include/linux/uaccess.h:154 [inline]
filldir+0x1a7/0x320 fs/readdir.c:196
dir_emit_dot include/linux/fs.h:3339 [inline]
dir_emit_dots include/linux/fs.h:3350 [inline]
dcache_readdir+0x12d/0x5e0 fs/libfs.c:192
iterate_dir+0x4b2/0x5d0 fs/readdir.c:51
SYSC_getdents fs/readdir.c:231 [inline]
SyS_getdents+0x225/0x450 fs/readdir.c:212
entry_SYSCALL_64_fastpath+0x1f/0xbe

-> #0 (&sb->s_type->i_mutex_key#9){++++}:
down_write+0x87/0x120 kernel/locking/rwsem.c:53
inode_lock include/linux/fs.h:712 [inline]
generic_file_write_iter+0xdc/0x7a0 mm/filemap.c:3175
call_write_iter include/linux/fs.h:1770 [inline]
do_iter_readv_writev+0x531/0x7f0 fs/read_write.c:673
do_iter_write+0x15a/0x540 fs/read_write.c:952
vfs_iter_write+0x77/0xb0 fs/read_write.c:965




Al Viro

Nov 5, 2017, 10:29:58 PM
to syzbot, ak...@linux-foundation.org, han...@cmpxchg.org, ja...@suse.cz, jla...@redhat.com, linux-...@vger.kernel.org, linu...@kvack.org, npi...@gmail.com, rgol...@suse.com, ross.z...@linux.intel.com, syzkall...@googlegroups.com
On Sun, Nov 05, 2017 at 02:25:00AM -0800, syzbot wrote:

> loop0/2986 is trying to acquire lock:
> (&sb->s_type->i_mutex_key#9){++++}, at: [<ffffffff8186f9ec>] inode_lock
> include/linux/fs.h:712 [inline]
> (&sb->s_type->i_mutex_key#9){++++}, at: [<ffffffff8186f9ec>]
> generic_file_write_iter+0xdc/0x7a0 mm/filemap.c:3151
>
> but now in release context of a crosslock acquired at the following:
> ((complete)&ret.event){+.+.}, at: [<ffffffff822a055e>]
> submit_bio_wait+0x15e/0x200 block/bio.c:953
>
> which lock already depends on the new lock.

Almost certainly a false positive... lockdep can't tell apart the ->i_rwsem of
an inode on the filesystem that lives on /dev/loop0 and that of the inode of
the backing file of /dev/loop0.

Try putting them on different filesystem types and see if you can still
reproduce it. We do have a partial ordering between the filesystems,
namely "(parts of) the hosting device of X live in a file on Y". It's
going to be acyclic, or you have a much worse problem, and that's
what really orders things here.

Dmitry Vyukov

Nov 6, 2017, 1:32:56 AM
to Al Viro, syzbot, Andrew Morton, Johannes Weiner, Jan Kara, jla...@redhat.com, LKML, linu...@kvack.org, npi...@gmail.com, rgol...@suse.com, ross.z...@linux.intel.com, syzkall...@googlegroups.com
Should we annotate these inodes with different lock types? Or use
nesting annotations?

Jan Kara

Nov 6, 2017, 8:15:51 AM
to Dmitry Vyukov, Al Viro, syzbot, Andrew Morton, Johannes Weiner, Jan Kara, jla...@redhat.com, LKML, linu...@kvack.org, npi...@gmail.com, rgol...@suse.com, ross.z...@linux.intel.com, syzkall...@googlegroups.com
Well, you'd need a completely separate set of locking classes for each
filesystem to avoid false positives like these, and that would significantly
increase the number of classes lockdep has to handle. So I'm not sure it's
really worth it...

Honza
--
Jan Kara <ja...@suse.com>
SUSE Labs, CR

Al Viro

Nov 6, 2017, 8:33:18 AM
to Jan Kara, Dmitry Vyukov, syzbot, Andrew Morton, Johannes Weiner, jla...@redhat.com, LKML, linu...@kvack.org, npi...@gmail.com, rgol...@suse.com, ross.z...@linux.intel.com, syzkall...@googlegroups.com
On Mon, Nov 06, 2017 at 02:15:44PM +0100, Jan Kara wrote:

> > Should we annotate these inodes with different lock types? Or use
> > nesting annotations?
>
> Well, you'd need to have a completely separate set of locking classes for
> each filesystem to avoid false positives like these. And that would
> increase number of classes lockdep has to handle significantly. So I'm not
> sure it's really worth it...

Especially when you consider that the backing file might be on a filesystem
that lives on another loop device. *All* per-{device,fs} locks involved
would need their classes split that way...

Dmitry Vyukov

Nov 6, 2017, 8:36:05 AM
to Al Viro, Jan Kara, syzbot, Andrew Morton, Johannes Weiner, jla...@redhat.com, LKML, linu...@kvack.org, npi...@gmail.com, rgol...@suse.com, ross.z...@linux.intel.com, syzkall...@googlegroups.com, Ingo Molnar, Peter Zijlstra
This crashes our test machines left and right. We've seen 100,000+ of
these crashes. We need to do at least something. Can we disable all
checking of these mutexes if they inherently produce false positives?

+Ingo, Peter, maybe you have some suggestions for how to fight these
lockdep false positives. The full thread is here:
https://groups.google.com/forum/#!msg/syzkaller-bugs/NJ_4llH84XI/c7M9jNLTAgAJ

Peter Zijlstra

Nov 6, 2017, 11:01:26 AM
to Dmitry Vyukov, Al Viro, Jan Kara, syzbot, Andrew Morton, Johannes Weiner, jla...@redhat.com, LKML, linu...@kvack.org, npi...@gmail.com, rgol...@suse.com, ross.z...@linux.intel.com, syzkall...@googlegroups.com, Ingo Molnar
On Mon, Nov 06, 2017 at 02:35:44PM +0100, Dmitry Vyukov wrote:
> On Mon, Nov 6, 2017 at 2:33 PM, Al Viro <vi...@zeniv.linux.org.uk> wrote:
> > On Mon, Nov 06, 2017 at 02:15:44PM +0100, Jan Kara wrote:
> >
> >> > Should we annotate these inodes with different lock types? Or use
> >> > nesting annotations?
> >>
> >> Well, you'd need to have a completely separate set of locking classes for
> >> each filesystem to avoid false positives like these. And that would
> >> increase number of classes lockdep has to handle significantly. So I'm not
> >> sure it's really worth it...
> >
> > Especially when you consider that backing file might be on a filesystem
> > that lives on another loop device. *All* per-{device,fs} locks involved
> > would need classes split that way...
>
>
> This crashes our test machines left and right. We've seen 100000+ of
> these crashes. We need to do at least something. Can we disable all
> checking of these mutexes if they inherently have positives?

It's not the mutexes that are the problem... It's the completion that
crosses the filesystem layers.

> +Ingo, Peter, maybe you have some suggestions of how to fight this
> lockdep false positives. Full thread is here:
> https://groups.google.com/forum/#!msg/syzkaller-bugs/NJ_4llH84XI/c7M9jNLTAgAJ

The best I could come up with is something like the below; it's not
at all pretty and I can see people objecting, not least myself, to
the __complete() thing, but I ran out of creative naming juice.



---
block/bio.c | 2 +-
block/blk-core.c | 8 ++++++--
drivers/block/loop.c | 9 +++++++++
include/linux/blk_types.h | 1 +
include/linux/blkdev.h | 1 +
include/linux/completion.h | 8 +++++++-
kernel/sched/completion.c | 11 ++++-------
7 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index cc60213e56d8..22bedceb7bae 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -919,7 +919,7 @@ EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);

static void submit_bio_wait_endio(struct bio *bio)
{
- complete(bio->bi_private);
+ __complete(bio->bi_private, !bio_flagged(bio, BIO_STACKED));
}

/**
diff --git a/block/blk-core.c b/block/blk-core.c
index 048be4aa6024..bb4092d716c3 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -194,8 +194,12 @@ static void req_bio_endio(struct request *rq, struct bio *bio,
if (error)
bio->bi_status = error;

- if (unlikely(rq->rq_flags & RQF_QUIET))
- bio_set_flag(bio, BIO_QUIET);
+ if (unlikely(rq->rq_flags & (RQF_QUIET|RQF_STACKED))) {
+ if (rq->rq_flags & RQF_QUIET)
+ bio_set_flag(bio, BIO_QUIET);
+ if (rq->rq_flags & RQF_STACKED)
+ bio_set_flag(bio, BIO_STACKED);
+ }

bio_advance(bio, nbytes);

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 85de67334695..7d702d2c4ade 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -452,6 +452,15 @@ static void lo_complete_rq(struct request *rq)
{
struct loop_cmd *cmd = blk_mq_rq_to_pdu(rq);

+ /*
+ * Assuming loop ensures the associated filesystems form a DAG, this
+ * cross-filesystem release can never form a deadlock.
+ *
+ * Inform the request that the corresponding BIO is of a stacked
+ * device and thereby forgo dependency checking.
+ */
+ rq->rq_flags |= RQF_STACKED;
+
if (unlikely(req_op(cmd->rq) == REQ_OP_READ && cmd->use_aio &&
cmd->ret >= 0 && cmd->ret < blk_rq_bytes(cmd->rq))) {
struct bio *bio = cmd->rq->bio;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 96ac3815542c..bf9b37de7975 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -136,6 +136,7 @@ struct bio {
* throttling rules. Don't do it again. */
#define BIO_TRACE_COMPLETION 10 /* bio_endio() should trace the final completion
* of this bio. */
+#define BIO_STACKED 11
/* See BVEC_POOL_OFFSET below before adding new flags */

/*
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 8da66379f7ea..dcf4b1a70f77 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -121,6 +121,7 @@ typedef __u32 __bitwise req_flags_t;
/* Look at ->special_vec for the actual data payload instead of the
bio chain. */
#define RQF_SPECIAL_PAYLOAD ((__force req_flags_t)(1 << 18))
+#define RQF_STACKED ((__force req_flags_t)(1 << 19))

/* flags that prevent us from merging requests: */
#define RQF_NOMERGE_FLAGS \
diff --git a/include/linux/completion.h b/include/linux/completion.h
index 0662a417febe..a6680197f2af 100644
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -161,7 +161,13 @@ extern long wait_for_completion_killable_timeout(
extern bool try_wait_for_completion(struct completion *x);
extern bool completion_done(struct completion *x);

-extern void complete(struct completion *);
+extern void __complete(struct completion *, bool);
+
+static inline void complete(struct completion *x)
+{
+ __complete(x, true);
+}
+
extern void complete_all(struct completion *);

#endif
diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index 2ddaec40956f..a2071513decf 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -28,23 +28,20 @@
* It may be assumed that this function implies a write memory barrier before
* changing the task state if and only if any tasks are woken up.
*/
-void complete(struct completion *x)
+void __complete(struct completion *x, bool link)
{
unsigned long flags;

spin_lock_irqsave(&x->wait.lock, flags);
-
- /*
- * Perform commit of crossrelease here.
- */
- complete_release_commit(x);
+ if (link)
+ complete_release_commit(x);

if (x->done != UINT_MAX)
x->done++;
__wake_up_locked(&x->wait, TASK_NORMAL, 1);
spin_unlock_irqrestore(&x->wait.lock, flags);
}
-EXPORT_SYMBOL(complete);
+EXPORT_SYMBOL(__complete);

/**
* complete_all: - signals all threads waiting on this completion

Christoph Hellwig

Nov 6, 2017, 12:37:16 PM
to Peter Zijlstra, Dmitry Vyukov, Al Viro, Jan Kara, syzbot, Andrew Morton, Johannes Weiner, jla...@redhat.com, LKML, linu...@kvack.org, npi...@gmail.com, rgol...@suse.com, ross.z...@linux.intel.com, syzkall...@googlegroups.com, Ingo Molnar
How about complete_nodep() as a name? Otherwise this looks OK to me - not
pretty, but not _that_ horrible either...

Matthew Wilcox

Nov 6, 2017, 6:45:45 PM
to Christoph Hellwig, Peter Zijlstra, Dmitry Vyukov, Al Viro, Jan Kara, syzbot, Andrew Morton, Johannes Weiner, jla...@redhat.com, LKML, linu...@kvack.org, npi...@gmail.com, rgol...@suse.com, ross.z...@linux.intel.com, syzkall...@googlegroups.com, Ingo Molnar
On Mon, Nov 06, 2017 at 09:36:25AM -0800, Christoph Hellwig wrote:
> How about complete_nodep() as name? Otherwise this looks ok to me - not
> pretty, but not _that_ horrible either..

I read that as node-p, not no-dep. complete_nocheck()?

Byungchul Park

Nov 6, 2017, 7:54:51 PM
to Peter Zijlstra, Dmitry Vyukov, Al Viro, Jan Kara, syzbot, Andrew Morton, Johannes Weiner, jla...@redhat.com, LKML, linu...@kvack.org, npi...@gmail.com, rgol...@suse.com, ross.z...@linux.intel.com, syzkall...@googlegroups.com, Ingo Molnar, kerne...@lge.com
On Mon, Nov 06, 2017 at 05:01:07PM +0100, Peter Zijlstra wrote:
> On Mon, Nov 06, 2017 at 02:35:44PM +0100, Dmitry Vyukov wrote:
> > On Mon, Nov 6, 2017 at 2:33 PM, Al Viro <vi...@zeniv.linux.org.uk> wrote:
> > > On Mon, Nov 06, 2017 at 02:15:44PM +0100, Jan Kara wrote:
> > >
> > >> > Should we annotate these inodes with different lock types? Or use
> > >> > nesting annotations?
> > >>
> > >> Well, you'd need to have a completely separate set of locking classes for
> > >> each filesystem to avoid false positives like these. And that would
> > >> increase number of classes lockdep has to handle significantly. So I'm not
> > >> sure it's really worth it...
> > >
> > > Especially when you consider that backing file might be on a filesystem
> > > that lives on another loop device. *All* per-{device,fs} locks involved
> > > would need classes split that way...
> >
> >
> > This crashes our test machines left and right. We've seen 100000+ of
> > these crashes. We need to do at least something. Can we disable all
> > checking of these mutexes if they inherently have positives?
>
> Its not the mutexes that's the problem.. Its the completion that crosses
> the filesystem layers.
>
> > +Ingo, Peter, maybe you have some suggestions of how to fight this
> > lockdep false positives. Full thread is here:
> > https://groups.google.com/forum/#!msg/syzkaller-bugs/NJ_4llH84XI/c7M9jNLTAgAJ
>
> The best I could come up with is something like the below; its not
> at all pretty and I could see people objecting; least of all myself for
> the __complete() thing, but I ran out of creative naming juice.

Patches assigning a lock_class per gendisk were already applied in tip.
I believe that solves this.

e319e1fbd9d42420ab6eec0bfd75eb9ad7ca63b1
block, locking/lockdep: Assign a lock_class per gendisk used for
wait_for_completion()

I think this proposal makes the kernel too hacky.

Peter Zijlstra

Nov 7, 2017, 3:12:01 AM
to Byungchul Park, Dmitry Vyukov, Al Viro, Jan Kara, syzbot, Andrew Morton, Johannes Weiner, jla...@redhat.com, LKML, linu...@kvack.org, npi...@gmail.com, rgol...@suse.com, ross.z...@linux.intel.com, syzkall...@googlegroups.com, Ingo Molnar, kerne...@lge.com
On Tue, Nov 07, 2017 at 09:54:42AM +0900, Byungchul Park wrote:
> > The best I could come up with is something like the below; its not
> > at all pretty and I could see people objecting; least of all myself for
> > the __complete() thing, but I ran out of creative naming juice.
>
> Patches assigning a lock_class per gendisk were already applied in tip.
> I believe that solves this.
>
> e319e1fbd9d42420ab6eec0bfd75eb9ad7ca63b1
> block, locking/lockdep: Assign a lock_class per gendisk used for
> wait_for_completion()
>
> I think the following proposal makes kernel too hacky.

Ah, I thought this was with those included...

Dmitry Vyukov

Nov 7, 2017, 3:18:22 AM
to Peter Zijlstra, Byungchul Park, Al Viro, Jan Kara, syzbot, Andrew Morton, Johannes Weiner, jla...@redhat.com, LKML, linu...@kvack.org, npi...@gmail.com, rgol...@suse.com, ross.z...@linux.intel.com, syzkall...@googlegroups.com, Ingo Molnar, kerne...@lge.com
Great. Let's tell the bot when to expect this fixed:

#syz fix: block, locking/lockdep: Assign a lock_class per gendisk used
for wait_for_completion()

Byungchul Park

Nov 7, 2017, 3:30:08 AM
to Peter Zijlstra, Dmitry Vyukov, Al Viro, Jan Kara, syzbot, Andrew Morton, Johannes Weiner, jla...@redhat.com, LKML, linu...@kvack.org, npi...@gmail.com, rgol...@suse.com, ross.z...@linux.intel.com, syzkall...@googlegroups.com, Ingo Molnar, kerne...@lge.com
On 11/7/2017 5:11 PM, Peter Zijlstra wrote:
Please CC me for issues wrt. crossrelease.

--
Thanks,
Byungchul

Dmitry Vyukov

Nov 7, 2017, 3:31:52 AM
to Byungchul Park, Peter Zijlstra, Al Viro, Jan Kara, syzbot, Andrew Morton, Johannes Weiner, jla...@redhat.com, LKML, linu...@kvack.org, npi...@gmail.com, rgol...@suse.com, ross.z...@linux.intel.com, syzkall...@googlegroups.com, Ingo Molnar, kerne...@lge.com
Hi Byungchul,

Whom are you asking? And what is crossrelease?

Byungchul Park

Nov 7, 2017, 3:42:43 AM
to Dmitry Vyukov, Peter Zijlstra, Al Viro, Jan Kara, syzbot, Andrew Morton, Johannes Weiner, jla...@redhat.com, LKML, linu...@kvack.org, npi...@gmail.com, rgol...@suse.com, ross.z...@linux.intel.com, syzkall...@googlegroups.com, Ingo Molnar, kerne...@lge.com
On 11/7/2017 5:31 PM, Dmitry Vyukov wrote:
Hi Dmitry,

Actually, I was asking Peter, since he knows I am the author of
crossrelease, which is a feature enhancing lockdep detection.

It makes deadlocks caused by wait_for_completion() or
lock_page() detectable.

The feature was merged into vanilla but is disabled by default for now
because of a performance regression, though that has been fixed and it
is enabled in the latest tip.

--
Thanks,
Byungchul

syzbot

Nov 17, 2017, 10:16:02 AM
to dvy...@google.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger a
crash:

Tested-by: syzbot <syzk...@googlegroups.com>

Once the fix is committed, please reply to this email with:
#syz fix: exact-commit-title

Tested on commit 450cbdd0125cfa5d7bbf9e2a6b6961cc48d29730
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/locking/core
compiler: gcc (GCC) 7.1.1 20170620
Patch is attached.
Kernel config is attached.


---
There is no WARRANTY for the result, to the extent permitted by applicable
law. Except when otherwise stated in writing, syzbot provides the result
"AS IS" without warranty of any kind, either expressed or implied,
including, but not limited to, the implied warranties of merchantability
and fitness for a particular purpose. The entire risk as to the quality of
the result is with you. Should the result prove defective, you assume the
cost of all necessary servicing, repair or correction.