linux-next boot error: WARNING in kmem_cache_free


syzbot

Jun 22, 2020, 1:37:11 AM
to linux-...@vger.kernel.org, linux-...@vger.kernel.org, linux...@vger.kernel.org, s...@canb.auug.org.au, syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk
Hello,

syzbot found the following crash on:

HEAD commit: 5a94f5bc Add linux-next specific files for 20200621
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=12a02c76100000
kernel config: https://syzkaller.appspot.com/x/.config?x=e1788c418b2ddc66
dashboard link: https://syzkaller.appspot.com/bug?extid=95bccd805a4aa06a4b0d
compiler: gcc (GCC) 9.0.0 20181231 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+95bccd...@syzkaller.appspotmail.com

------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at mm/slab.h:232 kmem_cache_free+0x0/0x200 mm/slab.c:2262
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.0-rc1-next-20200621-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x18f/0x20d lib/dump_stack.c:118
panic+0x2e3/0x75c kernel/panic.c:231
__warn.cold+0x2f/0x3a kernel/panic.c:600
report_bug+0x271/0x2f0 lib/bug.c:198
exc_invalid_op+0x1b9/0x370 arch/x86/kernel/traps.c:235
asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:563
RIP: 0010:kmem_cache_debug_flags mm/slab.h:232 [inline]
RIP: 0010:cache_from_obj mm/slab.h:459 [inline]
RIP: 0010:kmem_cache_free+0x0/0x200 mm/slab.c:3678
Code: ff 49 c7 84 24 90 00 00 00 00 00 00 00 83 c3 01 39 1d 2c ec fb 08 77 af 5b 5d 41 5c 41 5d c3 90 66 2e 0f 1f 84 00 00 00 00 00 <0f> 0b 48 85 ff 0f 84 a9 01 00 00 48 83 3d 15 6b 02 08 00 0f 84 9c
RSP: 0000:ffffffff89a07a58 EFLAGS: 00010293
RAX: ffffffff89a86580 RBX: ffff8880aa01f0e8 RCX: ffffffff81a84573
RDX: 0000000000000000 RSI: ffff8880aa01f480 RDI: ffff8880aa00fe00
RBP: ffff8880aa01f4a8 R08: ffffffff89a86580 R09: fffffbfff1340f3f
R10: 0000000000000003 R11: fffffbfff1340f3e R12: ffff8880aa01f4b0
R13: ffff8880aa01f688 R14: ffff8880aa01f480 R15: ffffc90000000000
adjust_va_to_fit_type mm/vmalloc.c:980 [inline]
__alloc_vmap_area mm/vmalloc.c:1096 [inline]
alloc_vmap_area+0x1494/0x1df0 mm/vmalloc.c:1196
__get_vm_area_node+0x178/0x3b0 mm/vmalloc.c:2060
__vmalloc_node_range+0x12c/0x910 mm/vmalloc.c:2484
__vmalloc_node mm/vmalloc.c:2532 [inline]
__vmalloc_area_node mm/vmalloc.c:2404 [inline]
__vmalloc_node_range+0x76c/0x910 mm/vmalloc.c:2489
__vmalloc_node mm/vmalloc.c:2532 [inline]
__vmalloc+0x69/0x80 mm/vmalloc.c:2546
alloc_large_system_hash+0x1c9/0x2e2 mm/page_alloc.c:8181
inode_init+0xab/0xbc fs/inode.c:2099
vfs_caches_init+0x104/0x11e fs/dcache.c:3231
start_kernel+0x978/0x9fb init/main.c:1025
secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
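
Each frame in the trace above has the shape `function+offset/size file:line`, with `[inline]` marking frames that were inlined into their caller. A minimal Python sketch of splitting such lines into fields, assuming that layout holds for every frame; `FRAME_RE`, `Frame`, and `parse_frame` are hypothetical names chosen for illustration and are not part of syzbot or any kernel tooling:

```python
import re
from typing import NamedTuple, Optional

# Assumed frame layout (taken from the trace above, not from any spec):
#   "dump_stack+0x18f/0x20d lib/dump_stack.c:118"
#   "__dump_stack lib/dump_stack.c:77 [inline]"
FRAME_RE = re.compile(
    r"^(?P<func>[\w.]+)"                                      # symbol, e.g. __warn.cold
    r"(?:\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+))?"   # +offset/size, absent on inlined frames
    r"\s+(?P<file>\S+):(?P<line>\d+)"                         # source file and line number
    r"(?P<inline>\s+\[inline\])?$"                            # optional [inline] marker
)


class Frame(NamedTuple):
    func: str
    offset: Optional[int]  # byte offset into the function, if printed
    size: Optional[int]    # total function size in bytes, if printed
    file: str
    line: int
    inlined: bool


def parse_frame(text: str) -> Optional[Frame]:
    """Return a Frame for a trace line, or None if the line is not a frame."""
    m = FRAME_RE.match(text.strip())
    if m is None:
        return None
    offset = int(m["offset"], 16) if m["offset"] else None
    size = int(m["size"], 16) if m["size"] else None
    return Frame(m["func"], offset, size, m["file"], int(m["line"]),
                 m["inline"] is not None)
```

Under these assumptions, `parse_frame("alloc_vmap_area+0x1494/0x1df0 mm/vmalloc.c:1196")` yields a frame pointing at `mm/vmalloc.c` line 1196, while register dumps and `RIP:` lines return `None` because they do not match the pattern.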


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

Qian Cai

Jun 22, 2020, 2:29:38 AM
to syzbot, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linux...@vger.kernel.org, s...@canb.auug.org.au, syzkall...@googlegroups.com, vi...@zeniv.linux.org.uk


> On Jun 22, 2020, at 1:37 AM, syzbot <syzbot+95bccd...@syzkaller.appspotmail.com> wrote:
>
> WARNING: CPU: 0 PID: 0 at mm/slab.h:232 kmem_cache_free+0x0/0x200 mm/slab.c:2262

Is there any particular reason to use CONFIG_SLAB rather than CONFIG_SLUB?

You are really asking for trouble by testing a code path that almost nobody exercises much nowadays.

Anyway, there is a patchset in -mm that might well have introduced this regression, and we could go and confirm that, but I kind of don’t want to spend too much time on SLAB, which is supposed to become obsolete eventually.

Dmitry Vyukov

Jun 22, 2020, 2:42:25 AM
to Qian Cai, syzbot, linux-fsdevel, LKML, Linux-Next Mailing List, Stephen Rothwell, syzkaller-bugs, Al Viro
On Mon, Jun 22, 2020 at 8:29 AM Qian Cai <c...@lca.pw> wrote:
> > On Jun 22, 2020, at 1:37 AM, syzbot <syzbot+95bccd...@syzkaller.appspotmail.com> wrote:
> >
> > WARNING: CPU: 0 PID: 0 at mm/slab.h:232 kmem_cache_free+0x0/0x200 mm/slab.c:2262
>
> Is there any particular reason to use CONFIG_SLAB rather than CONFIG_SLUB?

There is a reason, it's still important for us.
But also it's not our strategy to deal with bugs by not testing
configurations and closing eyes on bugs, right? If it's an official
config in the kernel, it needs to be tested. If SLAB is in the state
that we don't care about any bugs in it, then we need to drop it. It
will automatically remove it from all testing systems out there. Or at
least make it "depends on BROKEN" to slowly phase it out during
several releases.

Qian Cai

Jun 22, 2020, 3:28:11 AM
to Dmitry Vyukov, syzbot, linux-fsdevel, LKML, Linux-Next Mailing List, Stephen Rothwell, syzkaller-bugs, Al Viro


> On Jun 22, 2020, at 2:42 AM, Dmitry Vyukov <dvy...@google.com> wrote:
>
> There is a reason, it's still important for us.
> But also it's not our strategy to deal with bugs by not testing
> configurations and closing eyes on bugs, right? If it's an official
> config in the kernel, it needs to be tested. If SLAB is in the state
> that we don't care about any bugs in it, then we need to drop it. It
> will automatically remove it from all testing systems out there. Or at
> least make it "depends on BROKEN" to slowly phase it out during
> several releases.

Do you mind sharing what your use cases for CONFIG_SLAB are? The only thing that prevents it from being purged early is that it might perform better with certain types of networking workloads, and syzbot should have nothing to gain from that.

I am thinking more about the testing coverage we could gain by having syzbot test SLUB instead of SLAB. I have no objection to syzbot testing SLAB, but in my experience you are probably on your own when debugging those test failures. Until you are able to figure out which patch or patchset introduced the regression, I am afraid not many people will be able to spend much time on SLAB. The developers are already pretty half-hearted about it, only fixing SLAB here and there without runtime testing it.

Eric Biggers

Jun 27, 2020, 7:10:16 PM
to Qian Cai, Dmitry Vyukov, syzbot, LKML, Linux-Next Mailing List, Stephen Rothwell, syzkaller-bugs, linu...@kvack.org
[+Cc linux-mm; +Bcc linux-fsdevel]
This bug also got reported 2 days later by the kernel test robot
(https://lore.kernel.org/lkml/20200623090213.GW5535@shao2-debian/).
Then it was fixed by commit 437edcaafbe3, so telling syzbot:

#syz fix: mm, slab/slub: improve error reporting and overhead of cache_from_obj()-fix

If CONFIG_SLAB is no longer useful and supported then it needs to be removed
from the kernel. Otherwise, it needs to be tested just like all other options.

- Eric

Qian Cai

Jun 27, 2020, 8:49:32 PM
to Eric Biggers, Dmitry Vyukov, syzbot, LKML, Linux-Next Mailing List, Stephen Rothwell, syzkaller-bugs, linu...@kvack.org


> On Jun 27, 2020, at 7:10 PM, Eric Biggers <ebig...@kernel.org> wrote:
>
> This bug also got reported 2 days later by the kernel test robot
> (lore.kernel.org/lkml/20200623090213.GW5535@shao2-debian/).
> Then it was fixed by commit 437edcaafbe3, so telling syzbot:
>
> #syz fix: mm, slab/slub: improve error reporting and overhead of cache_from_obj()-fix
>
> If CONFIG_SLAB is no longer useful and supported then it needs to be removed
> from the kernel. Otherwise, it needs to be tested just like all other options.

It is awesome that the kernel test robot was able to bisect it, which is especially useful when testing legacy options like SLAB.