mm: possible deadlock in mm_take_all_locks

Dmitry Vyukov

Jan 8, 2016, 11:58:53 AM

to Peter Zijlstra, Ingo Molnar, LKML, Andrew Morton, Kirill A. Shutemov, Oleg Nesterov, Chen Gang, linu...@kvack.org, syzkaller, Kostya Serebryany, Alexander Potapenko, Eric Dumazet, Sasha Levin

Hello,

I've hit the following deadlock warning while running the syzkaller fuzzer
on commit b06f3a168cdcd80026276898fd1fee443ef25743. As far as I
understand this is a false positive, because both call stacks are
protected by mm_all_locks_mutex. What would be a way to annotate such a
locking discipline?

======================================================
[ INFO: possible circular locking dependency detected ]
4.4.0-rc8+ #211 Not tainted
-------------------------------------------------------
syz-executor/11520 is trying to acquire lock:
(&mapping->i_mmap_rwsem){++++..}, at: [< inline >]
vm_lock_mapping mm/mmap.c:3159
(&mapping->i_mmap_rwsem){++++..}, at: [<ffffffff816e2e6d>]
mm_take_all_locks+0x1bd/0x5f0 mm/mmap.c:3207

but task is already holding lock:
(&hugetlbfs_i_mmap_rwsem_key){+.+...}, at: [< inline >]
vm_lock_mapping mm/mmap.c:3159
(&hugetlbfs_i_mmap_rwsem_key){+.+...}, at: [<ffffffff816e2e6d>]
mm_take_all_locks+0x1bd/0x5f0 mm/mmap.c:3207

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&hugetlbfs_i_mmap_rwsem_key){+.+...}:
[<ffffffff814472ec>] lock_acquire+0x1dc/0x430
kernel/locking/lockdep.c:3585
[<ffffffff81434989>] _down_write_nest_lock+0x49/0xa0
kernel/locking/rwsem.c:129
[< inline >] vm_lock_mapping mm/mmap.c:3159
[<ffffffff816e2e6d>] mm_take_all_locks+0x1bd/0x5f0 mm/mmap.c:3207
[<ffffffff817295a8>] do_mmu_notifier_register+0x328/0x420
mm/mmu_notifier.c:267
[<ffffffff817296c2>] mmu_notifier_register+0x22/0x30
mm/mmu_notifier.c:317
[< inline >] kvm_init_mmu_notifier
arch/x86/kvm/../../../virt/kvm/kvm_main.c:474
[< inline >] kvm_create_vm
arch/x86/kvm/../../../virt/kvm/kvm_main.c:592
[< inline >] kvm_dev_ioctl_create_vm
arch/x86/kvm/../../../virt/kvm/kvm_main.c:2966
[<ffffffff8101acea>] kvm_dev_ioctl+0x72a/0x920
arch/x86/kvm/../../../virt/kvm/kvm_main.c:2995
[< inline >] vfs_ioctl fs/ioctl.c:43
[<ffffffff817b66f1>] do_vfs_ioctl+0x681/0xe40 fs/ioctl.c:607
[< inline >] SYSC_ioctl fs/ioctl.c:622
[<ffffffff817b6f3f>] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:613
[<ffffffff85e77af6>] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185

-> #0 (&mapping->i_mmap_rwsem){++++..}:
[< inline >] check_prev_add kernel/locking/lockdep.c:1853
[< inline >] check_prevs_add kernel/locking/lockdep.c:1958
[< inline >] validate_chain kernel/locking/lockdep.c:2144
[<ffffffff8144398d>] __lock_acquire+0x320d/0x4720
kernel/locking/lockdep.c:3206
[< inline >] __lock_release kernel/locking/lockdep.c:3432
[<ffffffff81447e17>] lock_release+0x697/0xce0
kernel/locking/lockdep.c:3604
[<ffffffff81434ada>] up_write+0x1a/0x60 kernel/locking/rwsem.c:91
[< inline >] i_mmap_unlock_write include/linux/fs.h:504
[< inline >] vm_unlock_mapping mm/mmap.c:3254
[<ffffffff816e2bf6>] mm_drop_all_locks+0x266/0x320 mm/mmap.c:3278
[<ffffffff81729506>] do_mmu_notifier_register+0x286/0x420
mm/mmu_notifier.c:292
[<ffffffff817296c2>] mmu_notifier_register+0x22/0x30
mm/mmu_notifier.c:317
[< inline >] kvm_init_mmu_notifier
arch/x86/kvm/../../../virt/kvm/kvm_main.c:474
[< inline >] kvm_create_vm
arch/x86/kvm/../../../virt/kvm/kvm_main.c:592
[< inline >] kvm_dev_ioctl_create_vm
arch/x86/kvm/../../../virt/kvm/kvm_main.c:2966
[<ffffffff8101acea>] kvm_dev_ioctl+0x72a/0x920
arch/x86/kvm/../../../virt/kvm/kvm_main.c:2995
[< inline >] vfs_ioctl fs/ioctl.c:43
[<ffffffff817b66f1>] do_vfs_ioctl+0x681/0xe40 fs/ioctl.c:607
[< inline >] SYSC_ioctl fs/ioctl.c:622
[<ffffffff817b6f3f>] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:613
[<ffffffff85e77af6>] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185

other info that might help us debug this:

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&hugetlbfs_i_mmap_rwsem_key);
                               lock(&mapping->i_mmap_rwsem);
                               lock(&hugetlbfs_i_mmap_rwsem_key);
  lock(&mapping->i_mmap_rwsem);

*** DEADLOCK ***

3 locks held by syz-executor/11520:
#0: (&mm->mmap_sem){++++++}, at: [<ffffffff817295a0>]
do_mmu_notifier_register+0x320/0x420 mm/mmu_notifier.c:266
#1: (mm_all_locks_mutex){+.+...}, at: [<ffffffff816e2cf7>]
mm_take_all_locks+0x47/0x5f0 mm/mmap.c:3201
#2: (&hugetlbfs_i_mmap_rwsem_key){+.+...}, at: [< inline >]
vm_lock_mapping mm/mmap.c:3159
#2: (&hugetlbfs_i_mmap_rwsem_key){+.+...}, at: [<ffffffff816e2e6d>]
mm_take_all_locks+0x1bd/0x5f0 mm/mmap.c:3207

stack backtrace:
CPU: 2 PID: 11520 Comm: syz-executor Not tainted 4.4.0-rc8+ #211
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
00000000ffffffff ffff88003613fa10 ffffffff82907ccd ffffffff88911190
ffffffff88911190 ffffffff889321c0 ffff88003613fa60 ffffffff8143cb68
ffff880034bbaf00 ffff880034bbb73a 0000000000000000 ffff880034bbb718
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff82907ccd>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
[<ffffffff8143cb68>] print_circular_bug+0x288/0x340
kernel/locking/lockdep.c:1226
[< inline >] check_prev_add kernel/locking/lockdep.c:1853
[< inline >] check_prevs_add kernel/locking/lockdep.c:1958
[< inline >] validate_chain kernel/locking/lockdep.c:2144
[<ffffffff8144398d>] __lock_acquire+0x320d/0x4720 kernel/locking/lockdep.c:3206
[< inline >] __lock_release kernel/locking/lockdep.c:3432
[<ffffffff81447e17>] lock_release+0x697/0xce0 kernel/locking/lockdep.c:3604
[<ffffffff81434ada>] up_write+0x1a/0x60 kernel/locking/rwsem.c:91
[< inline >] i_mmap_unlock_write include/linux/fs.h:504
[< inline >] vm_unlock_mapping mm/mmap.c:3254
[<ffffffff816e2bf6>] mm_drop_all_locks+0x266/0x320 mm/mmap.c:3278
[<ffffffff81729506>] do_mmu_notifier_register+0x286/0x420 mm/mmu_notifier.c:292
[<ffffffff817296c2>] mmu_notifier_register+0x22/0x30 mm/mmu_notifier.c:317
[< inline >] kvm_init_mmu_notifier
arch/x86/kvm/../../../virt/kvm/kvm_main.c:474
[< inline >] kvm_create_vm
arch/x86/kvm/../../../virt/kvm/kvm_main.c:592
[< inline >] kvm_dev_ioctl_create_vm
arch/x86/kvm/../../../virt/kvm/kvm_main.c:2966
[<ffffffff8101acea>] kvm_dev_ioctl+0x72a/0x920
arch/x86/kvm/../../../virt/kvm/kvm_main.c:2995
[< inline >] vfs_ioctl fs/ioctl.c:43
[<ffffffff817b66f1>] do_vfs_ioctl+0x681/0xe40 fs/ioctl.c:607
[< inline >] SYSC_ioctl fs/ioctl.c:622
[<ffffffff817b6f3f>] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:613
[<ffffffff85e77af6>] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185
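
For readers unfamiliar with this kind of report, the check behind "possible circular locking dependency" can be illustrated with a toy userspace model. This is only a sketch under simplifying assumptions, not lockdep's actual implementation: lock classes are small integers, and every name here (lock_graph, record_acquire, path_exists, MAX_CLASSES) is hypothetical. An edge "held -> next" is recorded whenever a class is acquired while another is held, and an acquisition is rejected if it would close a cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 8   /* hypothetical limit for this toy model */

/* dep[a][b] is true if class b was once acquired while class a was held. */
struct lock_graph {
    bool dep[MAX_CLASSES][MAX_CLASSES];
};

static void graph_init(struct lock_graph *g)
{
    memset(g, 0, sizeof(*g));
}

/* Depth-first search: is there a recorded path from 'from' to 'to'? */
static bool path_exists(const struct lock_graph *g, int from, int to,
                        bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < MAX_CLASSES; next++) {
        if (g->dep[from][next] && !seen[next] &&
            path_exists(g, next, to, seen))
            return true;
    }
    return false;
}

/*
 * Record "acquired class 'next' while holding class 'held'".
 * Returns false, and records nothing, if 'next' already leads back to
 * 'held', i.e. if the new edge would close a cycle. Acquiring a class
 * while holding the same class is also rejected by this toy model.
 */
static bool record_acquire(struct lock_graph *g, int held, int next)
{
    bool seen[MAX_CLASSES] = { false };

    assert(held >= 0 && held < MAX_CLASSES);
    assert(next >= 0 && next < MAX_CLASSES);

    if (path_exists(g, next, held, seen))
        return false;
    g->dep[held][next] = true;
    return true;
}
```

In terms of the report above: once the model has seen the hugetlbfs class acquired under the regular i_mmap_rwsem class, a later attempt to acquire the regular class while holding the hugetlbfs class is refused, regardless of whether the two acquisitions could ever race in practice.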

Kirill A. Shutemov

Jan 8, 2016, 6:23:55 PM

to Dmitry Vyukov, Michal Hocko, Peter Zijlstra, Ingo Molnar, LKML, Andrew Morton, Kirill A. Shutemov, Oleg Nesterov, Chen Gang, linu...@kvack.org, syzkaller, Kostya Serebryany, Alexander Potapenko, Eric Dumazet, Sasha Levin

On Fri, Jan 08, 2016 at 05:58:33PM +0100, Dmitry Vyukov wrote:
> Hello,
>
> I've hit the following deadlock warning while running the syzkaller fuzzer
> on commit b06f3a168cdcd80026276898fd1fee443ef25743. As far as I
> understand this is a false positive, because both call stacks are
> protected by mm_all_locks_mutex.

+Michal

I don't think it's a false positive.

The reason we don't care about the order of taking i_mmap_rwsem is that we
never take one i_mmap_rwsem under another i_mmap_rwsem, but that's not true for
i_mmap_rwsem vs. hugetlbfs_i_mmap_rwsem_key. That's why we have the
annotation in the first place.

See commit b610ded71918 ("hugetlb: fix lockdep splat caused by pmd
sharing").

Consider the totally untested patch below.

diff --git a/mm/mmap.c b/mm/mmap.c
index 2ce04a649f6b..63aefcf409e1 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3203,7 +3203,16 @@ int mm_take_all_locks(struct mm_struct *mm)
 	for (vma = mm->mmap; vma; vma = vma->vm_next) {
 		if (signal_pending(current))
 			goto out_unlock;
-		if (vma->vm_file && vma->vm_file->f_mapping)
+		if (vma->vm_file && vma->vm_file->f_mapping &&
+				!is_vm_hugetlb_page(vma))
+			vm_lock_mapping(mm, vma->vm_file->f_mapping);
+	}
+
+	for (vma = mm->mmap; vma; vma = vma->vm_next) {
+		if (signal_pending(current))
+			goto out_unlock;
+		if (vma->vm_file && vma->vm_file->f_mapping &&
+				is_vm_hugetlb_page(vma))
 			vm_lock_mapping(mm, vma->vm_file->f_mapping);
 	}

--
Kirill A. Shutemov

Dmitry Vyukov

Jan 10, 2016, 3:05:52 AM

to Kirill A. Shutemov, Michal Hocko, Peter Zijlstra, Ingo Molnar, LKML, Andrew Morton, Kirill A. Shutemov, Oleg Nesterov, Chen Gang, linu...@kvack.org, syzkaller, Kostya Serebryany, Alexander Potapenko, Eric Dumazet, Sasha Levin

On Sat, Jan 9, 2016 at 12:23 AM, Kirill A. Shutemov
<kir...@shutemov.name> wrote:
> On Fri, Jan 08, 2016 at 05:58:33PM +0100, Dmitry Vyukov wrote:
>> Hello,
>>
>> I've hit the following deadlock warning while running the syzkaller fuzzer
>> on commit b06f3a168cdcd80026276898fd1fee443ef25743. As far as I
>> understand this is a false positive, because both call stacks are
>> protected by mm_all_locks_mutex.
>
> +Michal
>
> I don't think it's a false positive.
>
> The reason we don't care about the order of taking i_mmap_rwsem is that we
> never take one i_mmap_rwsem under another i_mmap_rwsem, but that's not true for
> i_mmap_rwsem vs. hugetlbfs_i_mmap_rwsem_key. That's why we have the
> annotation in the first place.
>
> See commit b610ded71918 ("hugetlb: fix lockdep splat caused by pmd
> sharing").

The description of b610ded71918 suggests that that code takes the hugetlb
mutex first and then the normal page mutex. In this patch you take them in
the opposite order: normal mutex, then hugetlb mutex. Won't this patch
only increase the probability of deadlocks? Shouldn't you take them in the
opposite order?

Kirill A. Shutemov

Jan 10, 2016, 3:39:08 PM

to Dmitry Vyukov, Michal Hocko, Peter Zijlstra, Ingo Molnar, LKML, Andrew Morton, Kirill A. Shutemov, Oleg Nesterov, Chen Gang, linu...@kvack.org, syzkaller, Kostya Serebryany, Alexander Potapenko, Eric Dumazet, Sasha Levin

On Sun, Jan 10, 2016 at 09:05:32AM +0100, Dmitry Vyukov wrote:
> On Sat, Jan 9, 2016 at 12:23 AM, Kirill A. Shutemov
> <kir...@shutemov.name> wrote:
> > On Fri, Jan 08, 2016 at 05:58:33PM +0100, Dmitry Vyukov wrote:
> >> Hello,
> >>
> >> I've hit the following deadlock warning while running the syzkaller fuzzer
> >> on commit b06f3a168cdcd80026276898fd1fee443ef25743. As far as I
> >> understand this is a false positive, because both call stacks are
> >> protected by mm_all_locks_mutex.
> >
> > +Michal
> >
> > I don't think it's a false positive.
> >
> > The reason we don't care about the order of taking i_mmap_rwsem is that we
> > never take one i_mmap_rwsem under another i_mmap_rwsem, but that's not true for
> > i_mmap_rwsem vs. hugetlbfs_i_mmap_rwsem_key. That's why we have the
> > annotation in the first place.
> >
> > See commit b610ded71918 ("hugetlb: fix lockdep splat caused by pmd
> > sharing").
>
> The description of b610ded71918 suggests that that code takes the hugetlb
> mutex first and then the normal page mutex. In this patch you take them in
> the opposite order: normal mutex, then hugetlb mutex. Won't this patch
> only increase the probability of deadlocks? Shouldn't you take them in the
> opposite order?

You are right. I got it wrong. Conditions should be reversed.

The comment around hugetlbfs_i_mmap_rwsem_key definition is somewhat
confusing:

"This needs an annotation because huge_pmd_share() does an allocation
under i_mmap_rwsem."

I read this as saying that the hugetlb allocation happens with i_mmap_rwsem
already taken, and I ordered the locks accordingly. I guess i_mmap_rwsem should be
replaced with hugetlbfs_i_mmap_rwsem_key in the comment.

--
Kirill A. Shutemov
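
The corrected ordering described in this message (hugetlb mappings first, then all other file mappings) can be sketched as a two-pass walk. This is a userspace illustration only, not the kernel patch: struct toy_vma, its fields, and toy_lock_order() are hypothetical stand-ins, and the sketch records the order in which mappings would be locked rather than taking any real locks. It also leaves out the signal_pending() check from the patch above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA; only the fields the ordering needs. */
struct toy_vma {
    int mapping_id;     /* identifies the file mapping; 0 means "no file" */
    bool is_hugetlb;    /* models is_vm_hugetlb_page(vma) */
};

/*
 * Write the mapping ids into 'order' in the sequence they would be
 * locked: every hugetlb mapping first, then every other file mapping.
 * Returns the number of ids written; 'order' must have room for n ids.
 */
static size_t toy_lock_order(const struct toy_vma *vmas, size_t n, int *order)
{
    size_t out = 0;

    assert(vmas != NULL && order != NULL);

    /* Pass 1: hugetlb mappings only. */
    for (size_t i = 0; i < n; i++)
        if (vmas[i].mapping_id && vmas[i].is_hugetlb)
            order[out++] = vmas[i].mapping_id;

    /* Pass 2: all remaining file mappings. */
    for (size_t i = 0; i < n; i++)
        if (vmas[i].mapping_id && !vmas[i].is_hugetlb)
            order[out++] = vmas[i].mapping_id;

    return out;
}
```

Splitting the walk into two passes means the relative order of the two lock classes no longer depends on the order of VMAs in the address space, which is the property the discussion above is after.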

Dmitry Vyukov

Jan 11, 2016, 4:04:45 AM

to Kirill A. Shutemov, Michal Hocko, Peter Zijlstra, Ingo Molnar, LKML, Andrew Morton, Kirill A. Shutemov, Oleg Nesterov, Chen Gang, linu...@kvack.org, syzkaller, Kostya Serebryany, Alexander Potapenko, Eric Dumazet, Sasha Levin
