Re: [syzbot] possible deadlock in del_gendisk

11 views
Skip to first unread message

Dmitry Vyukov

unread,
Apr 13, 2021, 2:24:14 PM4/13/21
to Steven Rostedt, syzkaller, syzbot, Andrew Morton, Jens Axboe, Borislav Petkov, H. Peter Anvin, Jim Mattson, Joerg Roedel, KVM list, linux-block, LKML, Mark Rutland, masa...@kernel.org, Ingo Molnar, Paolo Bonzini, Peter Zijlstra, rafael.j...@intel.com, Sean Christopherson, syzkaller-bugs, Thomas Gleixner, Vitaly Kuznetsov, Wanpeng Li, Will Deacon, the arch/x86 maintainers
On Tue, Apr 13, 2021 at 7:43 PM Steven Rostedt <ros...@goodmis.org> wrote:
>
> On Tue, 13 Apr 2021 13:41:47 -0400
> Steven Rostedt <ros...@goodmis.org> wrote:
>
> > As the below splats look like it has nothing to do with this patch, and
> > this patch will add a WARN() if there's broken logic somewhere, I bet the
> > bisect got confused (if it is automated and does a panic_on_warning),
> > because it will panic for broken code that his patch detects.
> >
> > That is, the bisect was confused because it was triggering on two different
> > issues. One that triggered the reported splat below, and another that this
> > commit detects and warns on.
>
> Is it possible to update the the bisect to make sure that if it is failing
> on warnings, to make sure the warnings are somewhat related, before decided
> that its the same bug?

It does not seem to be feasible, bugs manifest differently in both
space and time. Also even if we somehow conclude the crash we see is
different, it says nothing about the original bug. For more data see:
https://groups.google.com/g/syzkaller/c/sR8aAXaWEF4/m/tTWYRgvmAwAJ

Steven Rostedt

unread,
Apr 13, 2021, 2:40:14 PM4/13/21
to Dmitry Vyukov, syzkaller, syzbot, Andrew Morton, Jens Axboe, Borislav Petkov, H. Peter Anvin, Jim Mattson, Joerg Roedel, KVM list, linux-block, LKML, Mark Rutland, masa...@kernel.org, Ingo Molnar, Paolo Bonzini, Peter Zijlstra, rafael.j...@intel.com, Sean Christopherson, syzkaller-bugs, Thomas Gleixner, Vitaly Kuznetsov, Wanpeng Li, Will Deacon, the arch/x86 maintainers
Sure, but if you trigger a lockdep bug, lockdep bugs are usually very
similar in all instances. It usually will include the same locks, or at
least be the same type of lockdep bug (where some types are related).

If the bisect saw an issue before and at this commit, I would think it
didn't trigger a lockdep bug at all, and simply triggered a warning.

In fact, according to:

https://syzkaller.appspot.com/x/report.txt?x=15a7e77ed00000

That's exactly what it did.

That is, if the original bug is a lockdep warning, you should only be
interested in lockdep warnings (at a minimum). The final bug here had:

------------[ cut here ]------------
raw_local_irq_restore() called with IRQs enabled
WARNING: CPU: 0 PID: 8777 at kernel/locking/irqflag-debug.c:9 warn_bogus_irq_restore kernel/locking/irqflag-debug.c:9 [inline]
WARNING: CPU: 0 PID: 8777 at kernel/locking/irqflag-debug.c:9 warn_bogus_irq_restore+0x1d/0x20 kernel/locking/irqflag-debug.c:7
Modules linked in:
CPU: 0 PID: 8777 Comm: syz-executor.1 Not tainted 5.11.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:warn_bogus_irq_restore kernel/locking/irqflag-debug.c:9 [inline]
RIP: 0010:warn_bogus_irq_restore+0x1d/0x20 kernel/locking/irqflag-debug.c:7
Code: 51 00 e9 3f fe ff ff cc cc cc cc cc cc 80 3d e0 b4 ce 0a 00 74 01 c3 48 c7 c7 60 f5 8a 88 c6 05 cf b4 ce 0a 01 e8 17 01 a4 06 <0f> 0b c3 48 c7 c0 a0 46 4d 8e 53 48 89 fb 48 ba 00 00 00 00 00 fc
RSP: 0018:ffffc900017bf9f8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8881477b2040 RCX: 0000000000000000
RDX: 0000000000000002 RSI: 0000000000000004 RDI: fffff520002f7f31
RBP: 0000000000000246 R08: 0000000000000001 R09: ffff8880b9e2015b
R10: ffffed10173c402b R11: 0000000000000001 R12: 0000000000000003
R13: ffffed1028ef6408 R14: 0000000000000001 R15: ffff8880b9e359c0
FS: 00000000017b8400(0000) GS:ffff8880b9e00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000017c1848 CR3: 0000000010920000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
kvm_wait arch/x86/kernel/kvm.c:860 [inline]
kvm_wait+0xc3/0xe0 arch/x86/kernel/kvm.c:837
pv_wait arch/x86/include/asm/paravirt.h:564 [inline]
pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:470 [inline]
__pv_queued_spin_lock_slowpath+0x8b8/0xb40 kernel/locking/qspinlock.c:508
pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:554 [inline]
queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline]
queued_spin_lock include/asm-generic/qspinlock.h:85 [inline]
do_raw_spin_lock+0x200/0x2b0 kernel/locking/spinlock_debug.c:113
spin_lock include/linux/spinlock.h:354 [inline]
ext4_lock_group fs/ext4/ext4.h:3379 [inline]
__ext4_new_inode+0x2da2/0x44d0 fs/ext4/ialloc.c:1187
ext4_mkdir+0x298/0x910 fs/ext4/namei.c:2793
vfs_mkdir+0x413/0x660 fs/namei.c:3652
do_mkdirat+0x1eb/0x250 fs/namei.c:3675
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x465567
Code: 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 53 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe00ad69f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000053
RAX: ffffffffffffffda RBX: 00007ffe00ad6a90 RCX: 0000000000465567
RDX: 0000000000000000 RSI: 00000000000001ff RDI: 00007ffe00ad6a90
RBP: 00007ffe00ad6a6c R08: 0000000000000000 R09: 0000000000000006
R10: 00007ffe00ad6794 R11: 0000000000000202 R12: 0000000000000032
R13: 000000000006549a R14: 0000000000000002 R15: 00007ffe00ad6ad0



Which shows a WARN was triggered, and totally unrelated to what you were
bisecting.

Just matching types of warnings (lockdep to lockdep, or WARN_ON to WARN_ON)
would help reduce the number of bogus bisects you are having.

-- Steve

Steven Rostedt

unread,
Apr 13, 2021, 2:43:40 PM4/13/21
to Dmitry Vyukov, syzkaller, syzbot, Andrew Morton, Jens Axboe, Borislav Petkov, H. Peter Anvin, Jim Mattson, Joerg Roedel, KVM list, linux-block, LKML, Mark Rutland, masa...@kernel.org, Ingo Molnar, Paolo Bonzini, Peter Zijlstra, rafael.j...@intel.com, Sean Christopherson, syzkaller-bugs, Thomas Gleixner, Vitaly Kuznetsov, Wanpeng Li, Will Deacon, the arch/x86 maintainers
On Tue, 13 Apr 2021 14:40:09 -0400
Steven Rostedt <ros...@goodmis.org> wrote:

> ------------[ cut here ]------------
> raw_local_irq_restore() called with IRQs enabled
> WARNING: CPU: 0 PID: 8777 at kernel/locking/irqflag-debug.c:9 warn_bogus_irq_restore kernel/locking/irqflag-debug.c:9 [inline]
> WARNING: CPU: 0 PID: 8777 at kernel/locking/irqflag-debug.c:9 warn_bogus_irq_restore+0x1d/0x20 kernel/locking/irqflag-debug.c:7

In fact, when you have the above, which is a WARN() with text:

"raw_local_irq_restore() called with IRQs enabled"

It is pretty much guaranteed that all triggers of this bug will have the
above warning with the same text.

-- Steve

Dmitry Vyukov

unread,
Apr 16, 2021, 3:51:59 AM4/16/21
to Steven Rostedt, syzkaller, syzbot, Andrew Morton, Jens Axboe, Borislav Petkov, H. Peter Anvin, Jim Mattson, Joerg Roedel, KVM list, linux-block, LKML, Mark Rutland, masa...@kernel.org, Ingo Molnar, Paolo Bonzini, Peter Zijlstra, rafael.j...@intel.com, Sean Christopherson, syzkaller-bugs, Thomas Gleixner, Vitaly Kuznetsov, Wanpeng Li, Will Deacon, the arch/x86 maintainers
This looks nice on paper and looking at only 1 bisection log. But
unfortunately in practice theory and practice are different...

This was discussed at length multiple times:
https://groups.google.com/g/syzkaller/search?q=bisection+different+manifestations
https://groups.google.com/g/syzkaller-bugs/c/nFeC8-UG1gg/m/y6gUEsvAAgAJ
https://groups.google.com/g/syzkaller/c/sR8aAXaWEF4/m/tTWYRgvmAwAJ
https://groups.google.com/g/syzkaller/c/9NdprHsGBqo/m/Yj9uWRDgBQAJ

If you look at substantial base of bisection logs, you will find lots
of cases where bug types, functions don't match. Kernel crashes
differently even on the same revision. And obviously things change if
you change revisions. Also if you see presumably a different bug, what
does it say regarding the original bug.

I would very much like to improve automatic bisection quality, but it
does not look trivial at all.

Some random examples where, say, your hypothesis of WARN-to-WARN,
BUG-to-BUG does not hold even on the same kernel revision (add to this
different revisions and the fact that a different bug does not give
info regarding the original bug):

run #0: crashed: KASAN: use-after-free Read in fuse_dev_do_read
run #1: crashed: WARNING in request_end
run #2: crashed: KASAN: use-after-free Read in fuse_dev_do_read
run #3: OK
run #4: OK

run #0: crashed: KASAN: slab-out-of-bounds Read in __ip_append_data
run #1: crashed: inconsistent lock state in rhashtable_walk_enter
run #2: crashed: inconsistent lock state in rhashtable_walk_enter
run #3: crashed: inconsistent lock state in rhashtable_walk_enter
run #4: crashed: inconsistent lock state in rhashtable_walk_enter
run #5: crashed: inconsistent lock state in rhashtable_walk_enter
run #6: crashed: inconsistent lock state in rhashtable_walk_enter
run #7: crashed: kernel BUG at arch/x86/mm/physaddr.c:LINE!
run #8: crashed: inconsistent lock state in rhashtable_walk_enter
run #9: crashed: inconsistent lock state in rhashtable_walk_enter

run #0: crashed: kernel BUG at arch/x86/mm/physaddr.c:LINE!
run #1: crashed: inconsistent lock state in rhashtable_walk_enter
run #2: crashed: inconsistent lock state in rhashtable_walk_enter
run #3: crashed: inconsistent lock state in rhashtable_walk_enter
run #4: crashed: inconsistent lock state in rhashtable_walk_enter
run #5: crashed: kernel BUG at arch/x86/mm/physaddr.c:LINE!
run #6: crashed: inconsistent lock state in rhashtable_walk_enter
run #7: crashed: kernel BUG at arch/x86/mm/physaddr.c:LINE!
run #8: crashed: inconsistent lock state in rhashtable_walk_enter
run #9: crashed: inconsistent lock state in rhashtable_walk_enter

run #0: crashed: KASAN: use-after-free Read in __vb2_perform_fileio
run #1: crashed: KASAN: use-after-free Write in __vb2_cleanup_fileio
run #2: crashed: KASAN: use-after-free Read in __vb2_perform_fileio
run #3: crashed: KASAN: use-after-free Read in __vb2_perform_fileio
run #4: crashed: INFO: task hung in vivid_stop_generating_vid_cap
run #5: crashed: INFO: task hung in vivid_stop_generating_vid_cap
run #6: crashed: INFO: task hung in vivid_stop_generating_vid_cap
run #7: crashed: INFO: task hung in vivid_stop_generating_vid_cap
run #8: crashed: INFO: task hung in vivid_stop_generating_vid_cap
run #9: crashed: INFO: task hung in vivid_stop_generating_vid_cap

run #0: crashed: general protection fault in sctp_assoc_rwnd_increase
run #1: crashed: general protection fault in sctp_ulpevent_free
run #2: crashed: general protection fault in sctp_assoc_rwnd_increase
run #3: crashed: general protection fault in sctp_assoc_rwnd_increase
run #4: crashed: general protection fault in sctp_assoc_rwnd_increase
run #5: crashed: general protection fault in sctp_assoc_rwnd_increase
run #6: crashed: general protection fault in sctp_assoc_rwnd_increase
run #7: crashed: general protection fault in sctp_assoc_rwnd_increase

run #0: crashed: general protection fault in sctp_assoc_rwnd_increase
run #1: crashed: general protection fault in sctp_assoc_rwnd_increase
run #2: crashed: general protection fault in sctp_assoc_rwnd_increase
run #3: crashed: general protection fault in sctp_assoc_rwnd_increase
run #4: crashed: general protection fault in corrupted
run #5: crashed: general protection fault in sctp_assoc_rwnd_increase
run #6: crashed: general protection fault in sctp_assoc_rwnd_increase
run #7: crashed: general protection fault in corrupted

run #0: crashed: INFO: rcu detected stall in corrupted
run #1: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #2: crashed: INFO: rcu detected stall in sys_sendfile64
run #3: crashed: INFO: rcu detected stall in corrupted
run #4: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #5: crashed: INFO: rcu detected stall in corrupted
run #6: crashed: INFO: rcu detected stall in corrupted
run #7: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #8: crashed: INFO: rcu detected stall in sys_sendfile64
run #9: crashed: INFO: rcu detected stall in ext4_file_write_iter

run #0: crashed: INFO: rcu detected stall in sys_sendfile64
run #1: crashed: INFO: rcu detected stall in corrupted
run #2: crashed: INFO: rcu detected stall in corrupted
run #3: crashed: INFO: rcu detected stall in sys_sendfile64
run #4: crashed: INFO: rcu detected stall in corrupted
run #5: crashed: INFO: rcu detected stall in corrupted
run #6: crashed: INFO: rcu detected stall in corrupted
run #7: crashed: INFO: rcu detected stall in corrupted
run #8: crashed: INFO: rcu detected stall in sys_sendfile64
run #9: crashed: INFO: rcu detected stall in corrupted

run #0: crashed: INFO: rcu detected stall in rw_verify_area
run #1: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #2: crashed: INFO: rcu detected stall in corrupted
run #3: crashed: INFO: rcu detected stall in corrupted
run #4: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #5: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #6: crashed: INFO: rcu detected stall in corrupted
run #7: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #8: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #9: crashed: INFO: rcu detected stall in rw_verify_area

run #0: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #1: crashed: INFO: rcu detected stall in corrupted
run #2: crashed: INFO: rcu detected stall in sys_sendfile64
run #3: crashed: INFO: rcu detected stall in sys_sendfile64
run #4: crashed: INFO: rcu detected stall in corrupted
run #5: crashed: INFO: rcu detected stall in sys_sendfile64
run #6: crashed: INFO: rcu detected stall in sys_sendfile64
run #7: crashed: INFO: rcu detected stall in corrupted
run #8: crashed: INFO: rcu detected stall in corrupted
run #9: crashed: INFO: rcu detected stall in sys_sendfile64

run #0: crashed: KASAN: use-after-free Read in link_path_walk
run #1: crashed: KASAN: use-after-free Read in link_path_walk
run #2: crashed: KASAN: use-after-free Read in trailing_symlink
run #3: crashed: KASAN: use-after-free Read in trailing_symlink
run #4: crashed: KASAN: use-after-free Read in trailing_symlink
run #5: crashed: KASAN: use-after-free Read in link_path_walk
run #6: crashed: KASAN: use-after-free Read in link_path_walk
run #7: crashed: KASAN: use-after-free Read in link_path_walk
run #8: crashed: KASAN: use-after-free Read in trailing_symlink
run #9: crashed: KASAN: use-after-free Read in trailing_symlink

run #0: crashed: BUG: unable to handle kernel NULL pointer dereference
in mrvl_setup
run #1: crashed: BUG: unable to handle kernel NULL pointer dereference
in mrvl_setup
run #2: crashed: BUG: unable to handle kernel NULL pointer dereference
in mrvl_setup
run #3: crashed: BUG: unable to handle kernel NULL pointer dereference
in mrvl_setup
run #4: crashed: WARNING: ODEBUG bug in corrupted
run #5: crashed: BUG: unable to handle kernel NULL pointer dereference
in mrvl_setup
run #6: crashed: BUG: unable to handle kernel NULL pointer dereference
in mrvl_setup
run #7: crashed: BUG: unable to handle kernel NULL pointer dereference
in mrvl_setup
run #8: crashed: BUG: unable to handle kernel NULL pointer dereference
in mrvl_setup
run #9: crashed: BUG: unable to handle kernel NULL pointer dereference
in mrvl_setup

run #0: crashed: KASAN: use-after-free Read in delayed_uprobe_remove
run #1: crashed: KASAN: use-after-free Read in delayed_uprobe_remove
run #2: crashed: general protection fault in delayed_uprobe_remove
run #3: crashed: KASAN: use-after-free Read in delayed_uprobe_remove
run #4: crashed: general protection fault in delayed_uprobe_remove
run #5: crashed: KASAN: use-after-free Read in delayed_uprobe_remove
run #6: OK
run #7: OK
run #8: OK
run #9: OK

run #0: crashed: general protection fault in delayed_uprobe_remove
run #1: crashed: KASAN: use-after-free Read in delayed_uprobe_remove
run #2: crashed: KASAN: use-after-free Read in delayed_uprobe_remove
run #3: crashed: KASAN: use-after-free Read in delayed_uprobe_remove
run #4: crashed: general protection fault in delayed_uprobe_remove
run #5: crashed: KASAN: use-after-free Read in delayed_uprobe_remove
run #6: OK
run #7: OK
run #8: OK
run #9: OK

Steven Rostedt

unread,
Apr 16, 2021, 9:13:06 AM4/16/21
to Dmitry Vyukov, syzkaller, syzbot, Andrew Morton, Jens Axboe, Borislav Petkov, H. Peter Anvin, Jim Mattson, Joerg Roedel, KVM list, linux-block, LKML, Mark Rutland, masa...@kernel.org, Ingo Molnar, Paolo Bonzini, Peter Zijlstra, rafael.j...@intel.com, Sean Christopherson, syzkaller-bugs, Thomas Gleixner, Vitaly Kuznetsov, Wanpeng Li, Will Deacon, the arch/x86 maintainers
On Fri, 16 Apr 2021 09:51:45 +0200
Dmitry Vyukov <dvy...@google.com> wrote:
>
> If you look at substantial base of bisection logs, you will find lots
> of cases where bug types, functions don't match. Kernel crashes
> differently even on the same revision. And obviously things change if
> you change revisions. Also if you see presumably a different bug, what
> does it say regarding the original bug.

Yes, but there are also several types of cases where the issue will be the
same. Namely lockdep. I agree that use after free warnings can have a side
effect, and may be more difficult. But there's many other bugs that remain
consistent across kernels. And if you stumble on one of them, look for it
only.

And if you hit another bug, and if it doesn't crash, then ignore it (of
course this could be an issue if you have panic on warning set). But
otherwise, just skip it.

>
> I would very much like to improve automatic bisection quality, but it
> does not look trivial at all.
>
> Some random examples where, say, your hypothesis of WARN-to-WARN,
> BUG-to-BUG does not hold even on the same kernel revision (add to this

At least lockdep to lockdep, as when I do manual bisects, that's exactly
what I look for, and ignore all other warnings. And that has found the
problem commit pretty much every time.

> different revisions and the fact that a different bug does not give
> info regarding the original bug):
>

Can you tell me that all these examples bisected to the commit that caused
the bug? Because if it did not, then you may have just proved my point ;-)

> run #0: crashed: KASAN: use-after-free Read in fuse_dev_do_read
> run #1: crashed: WARNING in request_end
> run #2: crashed: KASAN: use-after-free Read in fuse_dev_do_read
> run #3: OK
> run #4: OK
>

[..]

-- Steve

Dmitry Vyukov

unread,
Apr 16, 2021, 9:26:28 AM4/16/21
to Steven Rostedt, syzkaller, syzbot, Andrew Morton, Jens Axboe, Borislav Petkov, H. Peter Anvin, Jim Mattson, Joerg Roedel, KVM list, linux-block, LKML, Mark Rutland, masa...@kernel.org, Ingo Molnar, Paolo Bonzini, Peter Zijlstra, rafael.j...@intel.com, Sean Christopherson, syzkaller-bugs, Thomas Gleixner, Vitaly Kuznetsov, Wanpeng Li, Will Deacon, the arch/x86 maintainers
On Fri, Apr 16, 2021 at 3:13 PM Steven Rostedt <ros...@goodmis.org> wrote:
>
> On Fri, 16 Apr 2021 09:51:45 +0200
> Dmitry Vyukov <dvy...@google.com> wrote:
> >
> > If you look at substantial base of bisection logs, you will find lots
> > of cases where bug types, functions don't match. Kernel crashes
> > differently even on the same revision. And obviously things change if
> > you change revisions. Also if you see presumably a different bug, what
> > does it say regarding the original bug.
>
> Yes, but there are also several types of cases where the issue will be the
> same. Namely lockdep. I agree that use after free warnings can have a side
> effect, and may be more difficult.

But how do we know it's lockdep, rather than a use-after-free
manifested as lockdep?
A Significant portion of kernel bugs are caused by concurrency and can
manifest in different ways, e.g. these are not lockdep, or WARN, or
use-after-free, but rather a race in nature.

> But there's many other bugs that remain
> consistent across kernels. And if you stumble on one of them, look for it
> only.

For example? Does not look to be true for WARN, BUG, KASAN,
"inconsistent lock state".


> And if you hit another bug, and if it doesn't crash, then ignore it (of
> course this could be an issue if you have panic on warning set). But
> otherwise, just skip it.

It's not possible to skip, say, BUG.
And if we skip, say, a use-after-free, how do we know we are not
making things worse? Because now we are running on corrupted memory,
so anything can happen. Definitely a stray lockdep report can happen,
or other way around not happen when it should...

> > I would very much like to improve automatic bisection quality, but it
> > does not look trivial at all.
> >
> > Some random examples where, say, your hypothesis of WARN-to-WARN,
> > BUG-to-BUG does not hold even on the same kernel revision (add to this
>
> At least lockdep to lockdep, as when I do manual bisects, that's exactly
> what I look for, and ignore all other warnings. And that has found the
> problem commit pretty much every time.

What lockdep bug types do you mean? All?
In the examples above you can see at least "inconsistent lock state"
mixed with 2 other completely different bug types.

> > different revisions and the fact that a different bug does not give
> > info regarding the original bug):
> >
>
> Can you tell me that all these examples bisected to the commit that caused
> the bug? Because if it did not, then you may have just proved my point ;-)

I don't know now what was the result, but for a single run these were
manifestations of the same root bug.
E.g. see below, that's UAF in fuse_dev_do_read vs WARNING in
request_end. request_end is also fuse. And you can see that a memory
corruption causing a random bug type, in this case WARNING, but can as
well be LOCKDEP.

Dmitry Vyukov

unread,
Apr 16, 2021, 9:49:08 AM4/16/21
to Steven Rostedt, syzkaller, syzbot, Andrew Morton, Jens Axboe, Borislav Petkov, H. Peter Anvin, Jim Mattson, Joerg Roedel, KVM list, linux-block, LKML, Mark Rutland, masa...@kernel.org, Ingo Molnar, Paolo Bonzini, Peter Zijlstra, rafael.j...@intel.com, Sean Christopherson, syzkaller-bugs, Thomas Gleixner, Vitaly Kuznetsov, Wanpeng Li, Will Deacon, the arch/x86 maintainers
I've looked for some examples.

Here a reliable deadlock in io_uring changes to reliable task hung in io_uring:
run #6: crashed: possible deadlock in io_poll_double_wake
all runs: crashed: INFO: task hung in io_ring_exit_work
https://syzkaller.appspot.com/text?tag=Log&x=135e4d31d00000

Here deadlock is mixed with "lock held when returning to user space":
run #6: crashed: possible deadlock in inet_stream_connect
run #7: crashed: WARNING: lock held when returning to user space in
inet_stream_connect
https://syzkaller.appspot.com/text?tag=Log&x=109b88f2d00000

Here a reliable deadlock in n_tty_receive_buf_common changes to UAF in
n_tty_receive_buf_common:
all runs: crashed: possible deadlock in n_tty_receive_buf_common
all runs: crashed: KASAN: use-after-free Read in n_tty_receive_buf_common
https://syzkaller.appspot.com/text?tag=Log&x=106596b5e00000

Here deadlock changes to UAF and corrupted list:
run #4: crashed: possible deadlock in neigh_change_state
run #1: crashed: KASAN: use-after-free Read in neigh_mark_dead
run #2: crashed: BUG: corrupted list in ___neigh_create
run #3: crashed: BUG: corrupted list in neigh_mark_dead
https://syzkaller.appspot.com/text?tag=Log&x=1328a1e2e00000

But it also shows another interesting aspect, kernel output is
sometimes corrupted and kernel does not manage to print proper
complete oops message. In these cases we can't reliably say that it's
also a deadlock or not, we can't say anything about the crash, it's
unparsable. This problem also shows up here:
https://syzkaller.appspot.com/text?tag=Log&x=145c3a6d200000
and here
https://syzkaller.appspot.com/text?tag=Log&x=1006f18c600000
and in the second one also these 2 are mixed:
run #1: crashed: WARNING: locking bug in corrupted
run #2: crashed: WARNING: possible circular locking dependency detected

This one shows a related issue. Kernel crash output periodically
changes, and a testing system fails to parse it. This can happen if we
go back to older releases, or when we do fix bisection and test a very
new kernel with changed output:
https://syzkaller.appspot.com/text?tag=Log&x=108f4416e00000


It's also easy to imagine that a program that triggers 2 deadlocks on
different revisions, both are deadlocks, but completely different and
unrelated. And we can't look at function names as they change over
time.
All these things are almost trivial when you do bisection manually.
But for an automated system, they are quite challenging to handle.
Reply all
Reply to author
Forward
0 new messages