kernel panic: corrupted stack end in wb_workfn


syzbot

Dec 30, 2018, 10:41:05 PM
to ak...@linux-foundation.org, arya...@virtuozzo.com, gu...@fb.com, han...@cmpxchg.org, jba...@fb.com, ktk...@virtuozzo.com, linux-...@vger.kernel.org, linu...@kvack.org, mgo...@techsingularity.net, mho...@suse.com, shak...@google.com, syzkall...@googlegroups.com, wi...@infradead.org
Hello,

syzbot found the following crash on:

HEAD commit: 195303136f19 Merge tag 'kconfig-v4.21-2' of git://git.kern..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=176c0ebf400000
kernel config: https://syzkaller.appspot.com/x/.config?x=5e7dc790609552d7
dashboard link: https://syzkaller.appspot.com/bug?extid=ec1b7575afef85a0e5ca
compiler: gcc (GCC) 8.0.1 20180413 (experimental)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16a9a84b400000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=17199bb3400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+ec1b75...@syzkaller.appspotmail.com

Kernel panic - not syncing: corrupted stack end detected inside scheduler
CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.20.0+ #396
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: writeback wb_workfn (flush-8:0)
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1d3/0x2c6 lib/dump_stack.c:113
panic+0x2ad/0x55f kernel/panic.c:189
schedule_debug kernel/sched/core.c:3285 [inline]
__schedule+0x1ec6/0x1ed0 kernel/sched/core.c:3394
preempt_schedule_common+0x1f/0xe0 kernel/sched/core.c:3596
preempt_schedule+0x4d/0x60 kernel/sched/core.c:3622
___preempt_schedule+0x16/0x18
__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:161 [inline]
_raw_spin_unlock_irqrestore+0xbb/0xd0 kernel/locking/spinlock.c:184
spin_unlock_irqrestore include/linux/spinlock.h:384 [inline]
__remove_mapping+0x932/0x1af0 mm/vmscan.c:967
shrink_page_list+0x6610/0xc2e0 mm/vmscan.c:1461
shrink_inactive_list+0x77b/0x1c60 mm/vmscan.c:1961
shrink_list mm/vmscan.c:2273 [inline]
shrink_node_memcg+0x7a8/0x19a0 mm/vmscan.c:2538
shrink_node+0x3e1/0x17f0 mm/vmscan.c:2753
shrink_zones mm/vmscan.c:2987 [inline]
do_try_to_free_pages+0x3df/0x12a0 mm/vmscan.c:3049
try_to_free_pages+0x4d0/0xb90 mm/vmscan.c:3265
__perform_reclaim mm/page_alloc.c:3920 [inline]
__alloc_pages_direct_reclaim mm/page_alloc.c:3942 [inline]
__alloc_pages_slowpath+0xa5a/0x2db0 mm/page_alloc.c:4335
__alloc_pages_nodemask+0xa89/0xde0 mm/page_alloc.c:4549
alloc_pages_current+0x10c/0x210 mm/mempolicy.c:2106
alloc_pages include/linux/gfp.h:509 [inline]
__page_cache_alloc+0x38c/0x5b0 mm/filemap.c:924
pagecache_get_page+0x396/0xf00 mm/filemap.c:1615
find_or_create_page include/linux/pagemap.h:322 [inline]
ext4_mb_load_buddy_gfp+0xddf/0x1e70 fs/ext4/mballoc.c:1158
ext4_mb_load_buddy fs/ext4/mballoc.c:1241 [inline]
ext4_mb_regular_allocator+0x634/0x1590 fs/ext4/mballoc.c:2190
ext4_mb_new_blocks+0x1de3/0x4840 fs/ext4/mballoc.c:4538
ext4_ext_map_blocks+0x2eef/0x6180 fs/ext4/extents.c:4404
ext4_map_blocks+0x8f7/0x1b60 fs/ext4/inode.c:636
mpage_map_one_extent fs/ext4/inode.c:2480 [inline]
mpage_map_and_submit_extent fs/ext4/inode.c:2533 [inline]
ext4_writepages+0x2564/0x4170 fs/ext4/inode.c:2884
do_writepages+0x9a/0x1a0 mm/page-writeback.c:2335
__writeback_single_inode+0x20a/0x1660 fs/fs-writeback.c:1316
writeback_sb_inodes+0x71f/0x1210 fs/fs-writeback.c:1580
__writeback_inodes_wb+0x1b9/0x340 fs/fs-writeback.c:1649
wb_writeback+0xa73/0xfc0 fs/fs-writeback.c:1758
oom_reaper: reaped process 7963 (syz-executor189), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
rsyslogd invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
wb_check_start_all fs/fs-writeback.c:1882 [inline]
wb_do_writeback fs/fs-writeback.c:1908 [inline]
wb_workfn+0xee9/0x1790 fs/fs-writeback.c:1942
process_one_work+0xc90/0x1c40 kernel/workqueue.c:2153
worker_thread+0x17f/0x1390 kernel/workqueue.c:2296
kthread+0x35a/0x440 kernel/kthread.c:246
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
CPU: 1 PID: 7840 Comm: rsyslogd Not tainted 4.20.0+ #396
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1d3/0x2c6 lib/dump_stack.c:113
dump_header+0x253/0x1239 mm/oom_kill.c:451
oom_kill_process.cold.27+0x10/0x903 mm/oom_kill.c:966
out_of_memory+0x8ba/0x1480 mm/oom_kill.c:1133
__alloc_pages_may_oom mm/page_alloc.c:3666 [inline]
__alloc_pages_slowpath+0x230c/0x2db0 mm/page_alloc.c:4379
__alloc_pages_nodemask+0xa89/0xde0 mm/page_alloc.c:4549
alloc_pages_current+0x10c/0x210 mm/mempolicy.c:2106
alloc_pages include/linux/gfp.h:509 [inline]
__page_cache_alloc+0x38c/0x5b0 mm/filemap.c:924
page_cache_read mm/filemap.c:2373 [inline]
filemap_fault+0x1595/0x25f0 mm/filemap.c:2557
ext4_filemap_fault+0x82/0xad fs/ext4/inode.c:6317
__do_fault+0x100/0x6b0 mm/memory.c:2997
do_read_fault mm/memory.c:3409 [inline]
do_fault mm/memory.c:3535 [inline]
handle_pte_fault mm/memory.c:3766 [inline]
__handle_mm_fault+0x392f/0x5630 mm/memory.c:3890
handle_mm_fault+0x54f/0xc70 mm/memory.c:3927
do_user_addr_fault arch/x86/mm/fault.c:1475 [inline]
__do_page_fault+0x5f6/0xd70 arch/x86/mm/fault.c:1541
do_page_fault+0xf2/0x7e0 arch/x86/mm/fault.c:1572
page_fault+0x1e/0x30 arch/x86/entry/entry_64.S:1143
RIP: 0033:0x7f00f990e1fd
Code: Bad RIP value.
RSP: 002b:00007f00f6eade30 EFLAGS: 00010293
RAX: 0000000000000fd2 RBX: 000000000111f170 RCX: 00007f00f990e1fd
RDX: 0000000000000fff RSI: 00007f00f86e25a0 RDI: 0000000000000004
RBP: 0000000000000000 R08: 000000000110a260 R09: 0000000000000000
R10: 74616c7567657227 R11: 0000000000000293 R12: 000000000065e420
R13: 00007f00f6eae9c0 R14: 00007f00f9f53040 R15: 0000000000000003
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

Qian Cai

Dec 30, 2018, 10:47:05 PM
to syzbot, ak...@linux-foundation.org, arya...@virtuozzo.com, gu...@fb.com, han...@cmpxchg.org, jba...@fb.com, ktk...@virtuozzo.com, linux-...@vger.kernel.org, linu...@kvack.org, mgo...@techsingularity.net, mho...@suse.com, shak...@google.com, syzkall...@googlegroups.com, wi...@infradead.org
Ah, it has KASAN_EXTRA enabled, so it needs this patch:

https://lore.kernel.org/lkml/2018122802063...@lca.pw/

Alternatively, use GCC built from HEAD, which is supposed to cut the stack usage in half.

shrink_page_list
shrink_inactive_list

These two functions use about 7 KB of stack each, so a 32 KB stack is soon used up.
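The arithmetic behind that estimate, as a small C sketch. It takes the figures above at face value (about 7 KB per reclaim function, a 32 KB thread stack); both are approximations, and `stack_left()` is a hypothetical helper, not a kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KB(x) ((size_t)(x) * 1024)

/* Figures taken from the message above; both are approximations. */
#define THREAD_STACK_BYTES  KB(32)
#define RECLAIM_FRAME_BYTES KB(7)

/* Hypothetical helper: bytes of stack left for the rest of the call
 * chain after `frames` functions of `frame_bytes` each, or 0 if those
 * frames alone would already overflow the stack. */
static size_t stack_left(size_t stack_bytes, size_t frame_bytes, size_t frames)
{
    assert(frames == 0 || frame_bytes <= SIZE_MAX / frames); /* no overflow */
    size_t used = frame_bytes * frames;
    return used >= stack_bytes ? 0 : stack_bytes - used;
}
```

With shrink_page_list() and shrink_inactive_list() both on the stack, `stack_left(THREAD_STACK_BYTES, RECLAIM_FRAME_BYTES, 2)` leaves 18 KB for everything else in the trace, from wb_workfn through ext4 block allocation and the page allocator; on a 16 KB stack the same two frames would leave only about 2 KB.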

Dmitry Vyukov

Dec 31, 2018, 1:31:28 AM
to Qian Cai, syzbot, Andrew Morton, Andrey Ryabinin, gu...@fb.com, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM, Mel Gorman, Michal Hocko, Shakeel Butt, syzkaller-bugs, Matthew Wilcox
On Mon, Dec 31, 2018 at 4:47 AM Qian Cai <c...@lca.pw> wrote:
>
> Ah, it has KASAN_EXTRA enabled, so it needs this patch:
>
> https://lore.kernel.org/lkml/2018122802063...@lca.pw/
>
> Alternatively, use GCC built from HEAD, which is supposed to cut the stack usage in half.
>
> shrink_page_list
> shrink_inactive_list
>
> These two functions use about 7 KB of stack each, so a 32 KB stack is soon used up.

I am not sure it's just KASAN. I reproduced a stack overflow with this
same stack trace without KASAN:
https://groups.google.com/forum/#!msg/syzkaller-bugs/ZaBzAJbn6i8/Py9FVlAqDQAJ

Note: this was originally reported 5 months ago:
https://groups.google.com/forum/#!msg/syzkaller-bugs/C7d0Hm6YcDM/nQeciKgtCgAJ
so it is now in at least 2 releases and causes a stream of induced crashes
that people have spent time debugging:
https://groups.google.com/forum/#!msg/syzkaller-bugs/ZaBzAJbn6i8/Py9FVlAqDQAJ
https://groups.google.com/forum/#!msg/syzkaller-bugs/GIpnqHiIEQg/5jzwQqqfCwAJ
https://syzkaller.appspot.com/bug?id=26c906d472ea470c2cb58c77f08f964f347cbc68
https://groups.google.com/forum/#!msg/syzkaller-bugs/Ovkbsq5qd84/FHsTYlsfDAAJ
and most likely more of these:
https://syzkaller.appspot.com#upstream

syzbot

Mar 17, 2019, 4:49:02 PM
to ak...@linux-foundation.org, arya...@virtuozzo.com, c...@lca.pw, da...@davemloft.net, dvy...@google.com, gu...@fb.com, han...@cmpxchg.org, jba...@fb.com, ktk...@virtuozzo.com, linux-...@vger.kernel.org, linu...@kvack.org, linux...@vger.kernel.org, mgo...@techsingularity.net, mho...@suse.com, net...@vger.kernel.org, nho...@tuxdriver.com, shak...@google.com, syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk, vyas...@gmail.com, wi...@infradead.org
syzbot has bisected this bug to:

commit c981f254cc82f50f8cb864ce6432097b23195b9c
Author: Al Viro <vi...@zeniv.linux.org.uk>
Date: Sun Jan 7 18:19:09 2018 +0000

sctp: use vmemdup_user() rather than badly open-coding memdup_user()

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=137bcecf200000
start commit: c981f254 sctp: use vmemdup_user() rather than badly open-c..
git tree: upstream
final crash: https://syzkaller.appspot.com/x/report.txt?x=10fbcecf200000
console output: https://syzkaller.appspot.com/x/log.txt?x=177bcecf200000
Reported-by: syzbot+ec1b75...@syzkaller.appspotmail.com
Fixes: c981f254 ("sctp: use vmemdup_user() rather than badly open-coding memdup_user()")

Xin Long

Mar 19, 2019, 2:03:16 PM
to syzbot, ak...@linux-foundation.org, arya...@virtuozzo.com, c...@lca.pw, davem, Dmitry Vyukov, gu...@fb.com, han...@cmpxchg.org, jba...@fb.com, Kirill Tkhai, LKML, linu...@kvack.org, linux...@vger.kernel.org, mgo...@techsingularity.net, mho...@suse.com, network dev, Neil Horman, shak...@google.com, syzkaller-bugs, vi...@zeniv.linux.org.uk, Vlad Yasevich, wi...@infradead.org
On Mon, Mar 18, 2019 at 4:49 AM syzbot
<syzbot+ec1b75...@syzkaller.appspotmail.com> wrote:
>
> syzbot has bisected this bug to:
>
> commit c981f254cc82f50f8cb864ce6432097b23195b9c
> Author: Al Viro <vi...@zeniv.linux.org.uk>
> Date: Sun Jan 7 18:19:09 2018 +0000
>
> sctp: use vmemdup_user() rather than badly open-coding memdup_user()
'addrs_size' is passed in from userspace, so we deliberately used GFP_USER
to put some more restrictions on it in this commit:

commit cacc06215271104b40773c99547c506095db6ad4
Author: Marcelo Ricardo Leitner <marcelo...@gmail.com>
Date: Mon Nov 30 14:32:54 2015 -0200

sctp: use GFP_USER for user-controlled kmalloc

However, vmemdup_user() will 'ignore' this flag when it falls back to
vmalloc_*(), so we should probably fix it by switching to memdup_user(),
which still avoids the open-coded copy:

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index ea95cd4..e5bcade 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -999,7 +999,7 @@ static int sctp_setsockopt_bindx(struct sock *sk,
 	if (unlikely(addrs_size <= 0))
 		return -EINVAL;
 
-	kaddrs = vmemdup_user(addrs, addrs_size);
+	kaddrs = memdup_user(addrs, addrs_size);

Andrey Ryabinin

Mar 20, 2019, 5:56:35 AM
to syzbot, ak...@linux-foundation.org, c...@lca.pw, da...@davemloft.net, dvy...@google.com, gu...@fb.com, han...@cmpxchg.org, jba...@fb.com, ktk...@virtuozzo.com, linux-...@vger.kernel.org, linu...@kvack.org, linux...@vger.kernel.org, mgo...@techsingularity.net, mho...@suse.com, net...@vger.kernel.org, nho...@tuxdriver.com, shak...@google.com, syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk, vyas...@gmail.com, wi...@infradead.org, Xin Long
From the bisection log:

testing release v4.17
testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
run #0: crashed: kernel panic: corrupted stack end in wb_workfn
run #1: crashed: kernel panic: corrupted stack end in worker_thread
run #2: crashed: kernel panic: Out of memory and no killable processes...
run #3: crashed: kernel panic: corrupted stack end in wb_workfn
run #4: crashed: kernel panic: corrupted stack end in wb_workfn
run #5: crashed: kernel panic: corrupted stack end in wb_workfn
run #6: crashed: kernel panic: corrupted stack end in wb_workfn
run #7: crashed: kernel panic: corrupted stack end in wb_workfn
run #8: crashed: kernel panic: Out of memory and no killable processes...
run #9: crashed: kernel panic: corrupted stack end in wb_workfn
testing release v4.16
testing commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda with gcc (GCC) 8.1.0
run #0: OK
run #1: OK
run #2: OK
run #3: OK
run #4: OK
run #5: crashed: kernel panic: Out of memory and no killable processes...
run #6: OK
run #7: crashed: kernel panic: Out of memory and no killable processes...
run #8: OK
run #9: OK
testing release v4.15
testing commit d8a5b80568a9cb66810e75b182018e9edb68e8ff with gcc (GCC) 8.1.0
all runs: OK
# git bisect start v4.16 v4.15

Why did the bisection start between 4.15 and 4.16 instead of between 4.16 and 4.17?


testing commit c14376de3a1befa70d9811ca2872d47367b48767 with gcc (GCC) 8.1.0
run #0: crashed: kernel panic: Out of memory and no killable processes...
run #1: crashed: kernel panic: Out of memory and no killable processes...
run #2: crashed: kernel panic: Out of memory and no killable processes...
run #3: crashed: kernel panic: Out of memory and no killable processes...
run #4: OK
run #5: OK
run #6: crashed: WARNING: ODEBUG bug in netdev_freemem
run #7: crashed: no output from test machine
run #8: OK
run #9: OK
# git bisect bad c14376de3a1befa70d9811ca2872d47367b48767

Why is c14376de3a1befa70d9811ca2872d47367b48767 marked bad? There was no stack corruption.
It looks like syzbot was bisecting a different bug - "kernel panic: Out of memory and no killable processes..."
And the bisection for that bug seems to be correct: kvmalloc() in vmemdup_user() may eat up all memory, unlike kmalloc(), which is limited by KMALLOC_MAX_SIZE (usually 4 MB).
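A toy userspace model of that difference, for illustration only: `model_memdup_user()` and `model_vmemdup_user()` are hypothetical stand-ins rather than the kernel functions, the cap is fixed at the "usually 4 MB" KMALLOC_MAX_SIZE quoted above, and `free_bytes` is an assumed amount of memory the allocator could still obtain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed cap; the real KMALLOC_MAX_SIZE depends on architecture and config. */
#define MODEL_KMALLOC_MAX_SIZE ((size_t)4 << 20)

/* Toy model of memdup_user(): a kmalloc()-backed copy fails for any
 * length above the cap, however much memory is still free. */
static bool model_memdup_user(size_t len, size_t free_bytes)
{
    assert(len > 0); /* sctp_setsockopt_bindx() rejects sizes <= 0 first */
    return len <= MODEL_KMALLOC_MAX_SIZE && len <= free_bytes;
}

/* Toy model of vmemdup_user(): kvmalloc() falls back to vmalloc() for
 * large lengths, so the only limit left is the memory that is free. */
static bool model_vmemdup_user(size_t len, size_t free_bytes)
{
    assert(len > 0);
    return len <= free_bytes;
}
```

Under this model, a 64 MB user-supplied length is rejected up front by the kmalloc path but accepted by the kvmalloc path as long as memory remains, which is consistent with the reading that a reproducer passing large sizes could push the machine into the "Out of memory and no killable processes" panic.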

Dmitry Vyukov

Mar 20, 2019, 5:59:22 AM
to Andrey Ryabinin, syzbot, Andrew Morton, Qian Cai, David Miller, gu...@fb.com, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM, linux...@vger.kernel.org, Mel Gorman, Michal Hocko, netdev, Neil Horman, Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich, Matthew Wilcox, Xin Long
Because 4.16 was still crashing and 4.15 was not crashing. 4.15..4.16
looks like the right range, no?


> testing commit c14376de3a1befa70d9811ca2872d47367b48767 with gcc (GCC) 8.1.0
> run #0: crashed: kernel panic: Out of memory and no killable processes...
> run #1: crashed: kernel panic: Out of memory and no killable processes...
> run #2: crashed: kernel panic: Out of memory and no killable processes...
> run #3: crashed: kernel panic: Out of memory and no killable processes...
> run #4: OK
> run #5: OK
> run #6: crashed: WARNING: ODEBUG bug in netdev_freemem
> run #7: crashed: no output from test machine
> run #8: OK
> run #9: OK
> # git bisect bad c14376de3a1befa70d9811ca2872d47367b48767
>
> Why is c14376de3a1befa70d9811ca2872d47367b48767 marked bad? There was no stack corruption.
> It looks like syzbot was bisecting a different bug - "kernel panic: Out of memory and no killable processes..."
> And the bisection for that bug seems to be correct: kvmalloc() in vmemdup_user() may eat up all memory, unlike kmalloc(), which is limited by KMALLOC_MAX_SIZE (usually 4 MB).

Please see https://github.com/google/syzkaller/blob/master/docs/syzbot.md#bisection
for answer.

Dmitry Vyukov

Mar 20, 2019, 6:38:32 AM
to Tetsuo Handa, Andrey Ryabinin, syzbot, Andrew Morton, Qian Cai, David Miller, gu...@fb.com, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM, linux...@vger.kernel.org, Mel Gorman, Michal Hocko, netdev, Neil Horman, Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich, Matthew Wilcox, Xin Long
On Wed, Mar 20, 2019 at 11:24 AM Tetsuo Handa
<penguin...@i-love.sakura.ne.jp> wrote:
> No, syzbot should bisect between 4.16 and 4.17 regarding this bug, because
> a stack corruption can't manifest as "Out of memory and no killable processes".
>
> "kernel panic: Out of memory and no killable processes..." is completely
> unrelated to "kernel panic: corrupted stack end in wb_workfn".


Do you think this predicate can be coded? Looking at the
examples we have, distinguishing different bugs does not look feasible
to me. If the predicate is not accurate, you just trade one set of
false positives for another set of false positives, and then you are at the
beginning of an infinite slippery slope of refining it.
Also, if we see a different bug (assuming we can distinguish them),
does it mean that the original bug is not present? Or is it also
present, and we just hit the other one first? This also does not look
feasible to answer. And if you give a wrong answer, bisection goes the
wrong way and we are back where we started, just with more complex code and
things being even harder to explain to other people.
I mean, yes, I agree, kernel bug bisection won't be perfect. But do
you see anything actionable here?

Dmitry Vyukov

Mar 20, 2019, 6:42:48 AM
to Tetsuo Handa, Andrey Ryabinin, syzbot, Andrew Morton, Qian Cai, David Miller, gu...@fb.com, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM, linux...@vger.kernel.org, Mel Gorman, Michal Hocko, netdev, Neil Horman, Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich, Matthew Wilcox, Xin Long
I see the larger long-term improvement in bisection quality (for syzbot
and for everybody else) in doing some actual testing of each kernel
commit before it is merged into any kernel tree, so that we have
fewer cases of a single program triggering 3 different bugs, stray
unrelated bugs, broken release boots, etc. I don't see how reliable
bisection is possible without that.

Tetsuo Handa

Mar 20, 2019, 6:59:14 AM
to Dmitry Vyukov, Andrey Ryabinin, syzbot, Andrew Morton, Qian Cai, David Miller, gu...@fb.com, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM, linux...@vger.kernel.org, Mel Gorman, Michal Hocko, netdev, Neil Horman, Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich, Matthew Wilcox, Xin Long
On 2019/03/20 19:42, Dmitry Vyukov wrote:
>> I mean, yes, I agree, kernel bug bisection won't be perfect. But do
>> you see anything actionable here?

Allow users to manually specify the bisection range when
automatic bisection finds a wrong commit.

Also, allow users to specify a reproducer program
when automatic bisection finds a wrong commit.

Yes, this goes against automation. But since automation can't be perfect,
I'm suggesting manual adjustment. Even with manual adjustment,
syzbot's plentiful CPU resources for building/testing kernels are highly
appreciated (compared to doing manual bisection by building/testing kernels
on personal PCs).

>
> I see the larger long-term improvement in bisection quality (for syzbot
> and for everybody else) in doing some actual testing of each kernel
> commit before it is merged into any kernel tree, so that we have
> fewer cases of a single program triggering 3 different bugs, stray
> unrelated bugs, broken release boots, etc. I don't see how reliable
> bisection is possible without that.
>

syzbot currently cannot test kernels with custom patches (except via "#syz test:" requests).
Are you saying that syzbot will become able to test kernels with custom patches?

Tetsuo Handa

Mar 20, 2019, 9:08:04 AM
to Dmitry Vyukov, Andrey Ryabinin, syzbot, Andrew Morton, Qian Cai, David Miller, gu...@fb.com, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM, linux...@vger.kernel.org, Mel Gorman, Michal Hocko, netdev, Neil Horman, Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich, Matthew Wilcox, Xin Long
On 2019/03/20 18:59, Dmitry Vyukov wrote:
No, syzbot should bisect between 4.16 and 4.17 regarding this bug, because
a stack corruption can't manifest as "Out of memory and no killable processes".

"kernel panic: Out of memory and no killable processes..." is completely
unrelated to "kernel panic: corrupted stack end in wb_workfn".

Andrey Ryabinin

Mar 20, 2019, 9:33:57 AM
to Dmitry Vyukov, Tetsuo Handa, syzbot, Andrew Morton, Qian Cai, David Miller, gu...@fb.com, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM, linux...@vger.kernel.org, Mel Gorman, Michal Hocko, netdev, Neil Horman, Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich, Matthew Wilcox, Xin Long
Something like the following would probably work better than the current behavior.

For starters, is_duplicates() might just compare the title of 'crash' with the title of 'target_crash' and the titles of its known duplicates.
syzbot has some knowledge about duplicates with different crash titles when people use the "syz dup" command.
Also, it might be worth experimenting with neural networks to identify duplicates.


target_crash = 'kernel panic: corrupted stack end in wb_workfn'
test commit:
    bad = false;
    skip = true;
    foreach run:
        run_started, crashed, crash := run_repro();

        //kernel built, booted, reproducer launched successfully
        if (run_started)
            skip = false;
        if (crashed && is_duplicates(crash, target_crash))
            bad = true;

    if (skip)
        git bisect skip;
    else if (bad)
        git bisect bad;
    else
        git bisect good;
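A runnable C rendering of that pseudocode, for illustration only. `struct run_result`, `is_duplicate()` and `test_commit()` are hypothetical names; following the "for starters" suggestion, `is_duplicate()` is an exact comparison against the target title and a list of known duplicate titles, and the verdict is returned as an enum rather than by invoking `git bisect`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum verdict { VERDICT_GOOD, VERDICT_BAD, VERDICT_SKIP };

/* Outcome of one reproducer run on the commit under test. */
struct run_result {
    bool started;      /* kernel built, booted, reproducer launched */
    bool crashed;
    const char *title; /* crash title, or NULL if the run did not crash */
};

/* First cut: exact match against the target title or any known dup title. */
static bool is_duplicate(const char *title, const char *target,
                         const char *const *dups, size_t ndups)
{
    assert(target != NULL && (dups != NULL || ndups == 0));
    if (title == NULL)
        return false;
    if (strcmp(title, target) == 0)
        return true;
    for (size_t i = 0; i < ndups; i++)
        if (strcmp(title, dups[i]) == 0)
            return true;
    return false;
}

/* Mirrors the pseudocode: skip if no run even started, bad if any run
 * hit the target bug (or a known duplicate), good otherwise. */
static enum verdict test_commit(const struct run_result *runs, size_t nruns,
                                const char *target,
                                const char *const *dups, size_t ndups)
{
    assert(runs != NULL || nruns == 0);
    bool skip = true, bad = false;

    for (size_t i = 0; i < nruns; i++) {
        if (runs[i].started)
            skip = false;
        if (runs[i].crashed &&
            is_duplicate(runs[i].title, target, dups, ndups))
            bad = true;
    }
    if (skip)
        return VERDICT_SKIP;
    return bad ? VERDICT_BAD : VERDICT_GOOD;
}
```

Given the kinds of runs logged for c14376de earlier (OOM panics, an ODEBUG warning, a silent machine, clean runs), this predicate finds no stack-corruption title and returns VERDICT_GOOD, whereas the current logic marked that commit bad. A "corrupted stack end in worker_thread" run only counts as bad if that title is listed as a known duplicate.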

Dmitry Vyukov

Mar 20, 2019, 9:57:13 AM
to Andrey Ryabinin, Tetsuo Handa, syzbot, Andrew Morton, Qian Cai, David Miller, gu...@fb.com, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM, linux...@vger.kernel.org, Mel Gorman, Michal Hocko, netdev, Neil Horman, Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich, Matthew Wilcox, Xin Long
Lots of bugs (half?) manifest differently. On top of this, titles
change as we go back in history. On top of this, if we see a different
bug, it does not mean that the original bug is not there too.
This will surely solve some subset of cases better than the current
logic. But I feel that this subset is smaller than the one the current
logic solves.

> syzbot has some knowledge about duplicates with different crash titles when people use "syz dup" command.

This is a very limited set of info. And in the end, I think we've seen
every bug type being duped to every other bug type pairwise, and at
the same time we've seen every bug type not being a dup of every other
bug type. So I don't see where this gets us.
And again, as we go back in history, all these titles change.

Dmitry Vyukov

Mar 20, 2019, 9:59:31 AM
to Tetsuo Handa, Andrey Ryabinin, syzbot, Andrew Morton, Qian Cai, David Miller, gu...@fb.com, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM, linux...@vger.kernel.org, Mel Gorman, Michal Hocko, netdev, Neil Horman, Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich, Matthew Wilcox, Xin Long
On Wed, Mar 20, 2019 at 11:59 AM Tetsuo Handa
<penguin...@i-love.sakura.ne.jp> wrote:
>
> On 2019/03/20 19:42, Dmitry Vyukov wrote:
> >> I mean, yes, I agree, kernel bug bisection won't be perfect. But do
> >> you see anything actionable here?
>
> Allow users to manually specify the bisection range when
> automatic bisection finds a wrong commit.
>
> Also, allow users to specify a reproducer program
> when automatic bisection finds a wrong commit.
>
> Yes, this goes against automation. But since automation can't be perfect,
> I'm suggesting manual adjustment. Even with manual adjustment,
> syzbot's plentiful CPU resources for building/testing kernels are highly
> appreciated (compared to doing manual bisection by building/testing kernels
> on personal PCs).

FTR: I provided an extended answer here:
https://groups.google.com/d/msg/syzkaller-bugs/1BSkmb_fawo/DOcDxv_KAgAJ


> > I see the larger long-term improvement in bisection quality (for syzbot
> > and for everybody else) in doing some actual testing of each kernel
> > commit before it is merged into any kernel tree, so that we have
> > fewer cases of a single program triggering 3 different bugs, stray
> > unrelated bugs, broken release boots, etc. I don't see how reliable
> > bisection is possible without that.
> >
>
> syzbot currently cannot test kernels with custom patches (except via "#syz test:" requests).
> Are you saying that syzbot will become able to test kernels with custom patches?

I mean that if we start improving kernel quality over time, so that we have
fewer cases of a single program triggering 3 different bugs, stray
unrelated bugs, broken release boots, etc., it will improve bisection
quality for everybody (besides being hugely useful in itself).

Dmitry Vyukov

Mar 21, 2019, 5:45:58 AM
to Andrey Ryabinin, Tetsuo Handa, syzbot, Andrew Morton, Qian Cai, David Miller, gu...@fb.com, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM, linux...@vger.kernel.org, Mel Gorman, Michal Hocko, netdev, Neil Horman, Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich, Matthew Wilcox, Xin Long
Counter-examples come up in basically every other bisection.
For example:

bisecting cause commit starting from ccda4af0f4b92f7b4c308d3acc262f4a7e3affad
building syzkaller on 5f5f6d14e80b8bd6b42db961118e902387716bcb
testing commit ccda4af0f4b92f7b4c308d3acc262f4a7e3affad with gcc (GCC) 8.1.0
all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test_checked
testing release v4.19
testing commit 84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d with gcc (GCC) 8.1.0
all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test_checked
testing release v4.18
testing commit 94710cac0ef4ee177a63b5227664b38c95bbf703 with gcc (GCC) 8.1.0
all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test
testing release v4.17
testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test

That's a different crash title, unless somebody explicitly codes this case.
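A small sketch of why exact title matching misses this pair, and why a looser rule is only a partial fix. Both functions are hypothetical; `titles_prefix_match()` is an illustrative heuristic, not syzbot's actual logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Exact comparison: the rule that misses the refcount pair above. */
static bool titles_equal(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}

/* Hypothetical looser rule: treat two titles as the same bug when the
 * shorter one is a prefix of the longer one. */
static bool titles_prefix_match(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    size_t la = strlen(a), lb = strlen(b);
    return strncmp(a, b, la < lb ? la : lb) == 0;
}
```

The prefix rule merges the refcount_sub_and_test pair, but it still treats the wb_workfn and worker_thread stack-corruption panics from the v4.17 runs as different bugs, and a title as short as "WARNING" would match every WARNING. Each refinement trades one set of mistakes for another.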

Or, what crash is this?

testing commit 52358cb5a310990ea5069f986bdab3620e01181f with gcc (GCC) 8.1.0
run #1: crashed: general protection fault in cpuacct_charge
run #2: crashed: WARNING: suspicious RCU usage in corrupted
run #3: crashed: general protection fault in cpuacct_charge
run #4: crashed: BUG: unable to handle kernel paging request in ipt_do_table
run #5: crashed: KASAN: stack-out-of-bounds Read in cpuacct_charge
run #6: crashed: WARNING: suspicious RCU usage
run #7: crashed: no output from test machine
run #8: crashed: no output from test machine


Or here, hitting "INFO: trying to register non-static key in can_notifier"
means the commit is not really tested for the original bug: is
"WARNING in dma_buf_vunmap" still there or not?

testing commit 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c with gcc (GCC) 8.1.0
all runs: crashed: WARNING in dma_buf_vunmap
testing release v4.11
testing commit a351e9b9fc24e982ec2f0e76379a49826036da12 with gcc (GCC) 7.3.0
all runs: OK
# git bisect start v4.12 v4.11
Bisecting: 7831 revisions left to test after this (roughly 13 steps)
[2bd80401743568ced7d303b008ae5298ce77e695] Merge tag 'gpio-v4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
testing commit 2bd80401743568ced7d303b008ae5298ce77e695 with gcc (GCC) 7.3.0
all runs: crashed: INFO: trying to register non-static key in can_notifier
# git bisect bad 2bd80401743568ced7d303b008ae5298ce77e695
Bisecting: 3853 revisions left to test after this (roughly 12 steps)
[8d65b08debc7e62b2c6032d7fe7389d895b92cbc] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
testing commit 8d65b08debc7e62b2c6032d7fe7389d895b92cbc with gcc (GCC) 7.3.0
all runs: crashed: INFO: trying to register non-static key in can_notifier
# git bisect bad 8d65b08debc7e62b2c6032d7fe7389d895b92cbc
Bisecting: 2022 revisions left to test after this (roughly 11 steps)
[cec381919818a9a0cb85600b3c82404bdd38cf36] Merge tag 'mac80211-next-for-davem-2017-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
testing commit cec381919818a9a0cb85600b3c82404bdd38cf36 with gcc (GCC) 5.5.0
all runs: crashed: INFO: trying to register non-static key in can_notifier

Dmitry Vyukov

Mar 21, 2019, 5:51:26 AM
to Andrey Ryabinin, Tetsuo Handa, syzbot, Andrew Morton, Qian Cai, David Miller, gu...@fb.com, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM, linux...@vger.kernel.org, Mel Gorman, Michal Hocko, netdev, Neil Horman, Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich, Matthew Wilcox, Xin Long
And to make things even more interesting, this later changes to "BUG:
unable to handle kernel NULL pointer dereference in vb2_vmalloc_put":

testing release v4.12
testing commit 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c with gcc (GCC) 8.1.0
all runs: crashed: general protection fault in refcount_sub_and_test
testing release v4.11
testing commit a351e9b9fc24e982ec2f0e76379a49826036da12 with gcc (GCC) 7.3.0
all runs: crashed: BUG: unable to handle kernel NULL pointer dereference in vb2_vmalloc_put

And since the original bug is in the vb2 subsystem
(https://syzkaller.appspot.com/bug?id=17535f4bf5b322437f7c639b59161ce343fc55a9),
it's actually not clear even to me whether we should treat it as the same
bug or not. It may be a different manifestation of the same root cause, or
a different bug nearby.

Tetsuo Handa

Mar 21, 2019, 7:41:54 AM
to Dmitry Vyukov, Andrey Ryabinin, syzbot, Andrew Morton, Qian Cai, David Miller, gu...@fb.com, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM, linux...@vger.kernel.org, Mel Gorman, Michal Hocko, netdev, Neil Horman, Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich, Matthew Wilcox, Xin Long
Well, maybe we should use reproducers to check whether each not-yet-fixed
problem is reproducible with old kernels, rather than trying to find the specific
commit that causes a specific problem?

I think there are two patterns in which syzbot starts reporting a problem.

(a) a commit which causes one or more problems is merged into a codebase that
syzbot was already testing, because syzbot already knew what to test in
that codebase and how.

(b) a commit which causes one or more problems was already present in a codebase
for which syzbot did not know, until now, what to test or how.

(a) tends to require testing new kernels (i.e. bisection range is narrow) whereas
(b) tends to require testing old kernels (i.e. bisection range is wide).

Regarding case (b), it is difficult for developers to guess when the problem
started, and I think that (b) tends to confuse automatic bisection attempts.
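How much wider a case (b) search is can be estimated from the step counts git prints. A rough C sketch, assuming the number of steps is about ceil(log2(n)) for n remaining candidate commits; `bisect_steps()` is a hypothetical helper, and git computes its own "roughly N steps" figure with its own formula. For the 7831, 3853 and 2022 revisions in the v4.11..v4.12 log earlier in the thread, this gives 13, 12 and 11, the same as git's printed estimates.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Approximate number of bisection steps for `revisions` candidate
 * commits: the smallest k such that 2^k >= revisions, i.e.
 * ceil(log2(revisions)). */
static unsigned bisect_steps(size_t revisions)
{
    assert(revisions > 0); /* a bisection range has at least one commit */
    unsigned k = 0;
    while (k < sizeof(size_t) * CHAR_BIT - 1 && ((size_t)1 << k) < revisions)
        k++;
    return k;
}
```

Since the syzbot logs in this thread run the reproducer up to ten times per tested commit, a release-wide range like that one costs on the order of a hundred or more reproducer runs, compared with a few dozen for a range of a few dozen commits.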

Therefore, instead of trying to find the specific commit for a specific problem with
the "git bisect" approach, try running all reproducers (gathered from all problems)
on each release (e.g. each git tag) and append the reproduced crashes to the

Manager Time Kernel Commit Syzkaller Config Log Report Syz repro C repro Maintainers

table of each not-yet-fixed problem in the dashboard interface. That is, if running
repro1 from problem1 on some old kernel reproduces a crash for problem2, append that
crash to problem2's table. Maybe we want a new table with only the

Kernel Commit Syzkaller Config Log Report Syz repro C repro

entries, because what we want to know is the oldest kernel release, which helps in
guessing when the problem started.
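A minimal sketch of the bookkeeping this proposal implies, assuming every reproducer has already been run on every release and each resulting crash has been attributed to a known problem. `struct observation` and `oldest_release_for()` are hypothetical names, and releases are indexed oldest-first (for example 0 = v4.11, 1 = v4.12); the function returns the oldest release on which any reproducer hit the given problem.

```c
#include <assert.h>
#include <stddef.h>

/* One crash observed when running some reproducer on some release. */
struct observation {
    int release; /* index into an oldest-first list of releases */
    int repro;   /* whose reproducer was run (kept for the table's repro columns) */
    int problem; /* which problem the resulting crash was attributed to */
};

/* Oldest release index on which `problem` was reproduced by any
 * reproducer, or -1 if it was never observed. */
static int oldest_release_for(const struct observation *obs, size_t n,
                              int problem)
{
    assert(obs != NULL || n == 0);
    int oldest = -1;
    for (size_t i = 0; i < n; i++)
        if (obs[i].problem == problem &&
            (oldest < 0 || obs[i].release < oldest))
            oldest = obs[i].release;
    return oldest;
}
```

The crash is credited to the problem it matches, not to the problem whose reproducer was run, as in the repro1/problem2 example. That attribution step assumes each crash can be matched to a known problem, which is the same title-matching question debated earlier in the thread.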