KASAN: use-after-free Read in unregister_shrinker

12 views
Skip to first unread message

syzbot

unread,
Jun 5, 2019, 2:42:08 PM6/5/19
to ak...@linux-foundation.org, bfi...@fieldses.org, bfi...@redhat.com, ch...@chrisdown.name, daniel....@oracle.com, gu...@fb.com, han...@cmpxchg.org, jla...@kernel.org, ktk...@virtuozzo.com, laoar...@gmail.com, linux-...@vger.kernel.org, linu...@kvack.org, linu...@vger.kernel.org, mgo...@techsingularity.net, mho...@suse.com, s...@canb.auug.org.au, syzkall...@googlegroups.com, yang...@linux.alibaba.com
Hello,

syzbot found the following crash on:

HEAD commit: b2924447 Add linux-next specific files for 20190605
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=17e867eea00000
kernel config: https://syzkaller.appspot.com/x/.config?x=4248d6bc70076f7d
dashboard link: https://syzkaller.appspot.com/bug?extid=83a43746cebef3508b49
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1122965aa00000

The bug was bisected to:

commit db17b61765c2c63b9552d316551550557ff0fcfd
Author: J. Bruce Fields <bfi...@redhat.com>
Date: Fri May 17 13:03:38 2019 +0000

nfsd4: drc containerization

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=110cd22ea00000
final crash: https://syzkaller.appspot.com/x/report.txt?x=130cd22ea00000
console output: https://syzkaller.appspot.com/x/log.txt?x=150cd22ea00000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+83a437...@syzkaller.appspotmail.com
Fixes: db17b61765c2 ("nfsd4: drc containerization")

==================================================================
BUG: KASAN: use-after-free in __list_del_entry_valid+0xe6/0xf5
lib/list_debug.c:51
Read of size 8 at addr ffff88808a5bd128 by task syz-executor.2/12471

CPU: 0 PID: 12471 Comm: syz-executor.2 Not tainted 5.2.0-rc3-next-20190605
#9
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0xd4/0x306 mm/kasan/report.c:351
__kasan_report.cold+0x1b/0x36 mm/kasan/report.c:482
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
__list_del_entry_valid+0xe6/0xf5 lib/list_debug.c:51
__list_del_entry include/linux/list.h:117 [inline]
list_del include/linux/list.h:125 [inline]
unregister_shrinker+0xb2/0x2e0 mm/vmscan.c:443
nfsd_reply_cache_shutdown+0x26/0x360 fs/nfsd/nfscache.c:194
nfsd_exit_net+0x170/0x4b0 fs/nfsd/nfsctl.c:1272
ops_exit_list.isra.0+0xaa/0x150 net/core/net_namespace.c:154
setup_net+0x400/0x740 net/core/net_namespace.c:333
copy_net_ns+0x1df/0x340 net/core/net_namespace.c:439
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x444/0x980 kernel/fork.c:2718
__do_sys_unshare kernel/fork.c:2786 [inline]
__se_sys_unshare kernel/fork.c:2784 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2784
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x459279
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f7ae73e1c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 0000000000459279
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000040000000
RBP: 000000000075bfc0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f7ae73e26d4
R13: 00000000004c84ef R14: 00000000004decb0 R15: 00000000ffffffff

Allocated by task 12460:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_kmalloc mm/kasan/common.c:489 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
kasan_kmalloc+0x9/0x10 mm/kasan/common.c:503
__do_kmalloc mm/slab.c:3654 [inline]
__kmalloc+0x15c/0x740 mm/slab.c:3663
kmalloc include/linux/slab.h:552 [inline]
kzalloc include/linux/slab.h:742 [inline]
ops_init+0xff/0x410 net/core/net_namespace.c:120
setup_net+0x2d3/0x740 net/core/net_namespace.c:316
copy_net_ns+0x1df/0x340 net/core/net_namespace.c:439
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x444/0x980 kernel/fork.c:2718
__do_sys_unshare kernel/fork.c:2786 [inline]
__se_sys_unshare kernel/fork.c:2784 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2784
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 12460:
save_stack+0x23/0x90 mm/kasan/common.c:71
set_track mm/kasan/common.c:79 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
__cache_free mm/slab.c:3426 [inline]
kfree+0x106/0x2a0 mm/slab.c:3753
ops_init+0xd1/0x410 net/core/net_namespace.c:135
setup_net+0x2d3/0x740 net/core/net_namespace.c:316
copy_net_ns+0x1df/0x340 net/core/net_namespace.c:439
create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
ksys_unshare+0x444/0x980 kernel/fork.c:2718
__do_sys_unshare kernel/fork.c:2786 [inline]
__se_sys_unshare kernel/fork.c:2784 [inline]
__x64_sys_unshare+0x31/0x40 kernel/fork.c:2784
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff88808a5bcdc0
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 872 bytes inside of
1024-byte region [ffff88808a5bcdc0, ffff88808a5bd1c0)
The buggy address belongs to the page:
page:ffffea0002296f00 refcount:1 mapcount:0 mapping:ffff8880aa400ac0
index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000249ea08 ffffea000235a588 ffff8880aa400ac0
raw: 0000000000000000 ffff88808a5bc040 0000000100000007 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88808a5bd000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88808a5bd080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff88808a5bd100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88808a5bd180: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
ffff88808a5bd200: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

Kirill Tkhai

unread,
Jun 6, 2019, 3:47:54 AM6/6/19
to syzbot, ak...@linux-foundation.org, bfi...@fieldses.org, bfi...@redhat.com, ch...@chrisdown.name, daniel....@oracle.com, gu...@fb.com, han...@cmpxchg.org, jla...@kernel.org, laoar...@gmail.com, linux-...@vger.kernel.org, linu...@kvack.org, linu...@vger.kernel.org, mgo...@techsingularity.net, mho...@suse.com, s...@canb.auug.org.au, syzkall...@googlegroups.com, yang...@linux.alibaba.com
This may be connected with that shrinker unregistering is forgotten on error path.

---
diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index ea39497205f0..8705e7d09717 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -181,6 +181,7 @@ int nfsd_reply_cache_init(struct nfsd_net *nn)

return 0;
out_nomem:
+ unregister_shrinker(&nn->nfsd_reply_cache_shrinker);
printk(KERN_ERR "nfsd: failed to allocate reply cache\n");
return -ENOMEM;
}

J. Bruce Fields

unread,
Jun 6, 2019, 9:14:05 AM6/6/19
to Kirill Tkhai, syzbot, ak...@linux-foundation.org, bfi...@redhat.com, ch...@chrisdown.name, daniel....@oracle.com, gu...@fb.com, han...@cmpxchg.org, jla...@kernel.org, laoar...@gmail.com, linux-...@vger.kernel.org, linu...@kvack.org, linu...@vger.kernel.org, mgo...@techsingularity.net, mho...@suse.com, s...@canb.auug.org.au, syzkall...@googlegroups.com, yang...@linux.alibaba.com
On Thu, Jun 06, 2019 at 10:47:43AM +0300, Kirill Tkhai wrote:
> This may be connected with that shrinker unregistering is forgotten on error path.

I was wondering about that too. Seems like it would be hard to hit
reproduceably though: one of the later allocations would have to fail,
then later you'd have to create another namespace and this time have a
later module's init fail.

This is the patch I have, which also fixes a (probably less important)
failure to free the slab cache.

--b.

commit 17c869b35dc9
Author: J. Bruce Fields <bfi...@redhat.com>
Date: Wed Jun 5 18:03:52 2019 -0400

nfsd: fix cleanup of nfsd_reply_cache_init on failure

Make sure everything is cleaned up on failure.

Especially important for the shrinker, which will otherwise eventually
be freed while still referred to by global data structures.

Signed-off-by: J. Bruce Fields <bfi...@redhat.com>

diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index ea39497205f0..3dcac164e010 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -157,12 +157,12 @@ int nfsd_reply_cache_init(struct nfsd_net *nn)
nn->nfsd_reply_cache_shrinker.seeks = 1;
status = register_shrinker(&nn->nfsd_reply_cache_shrinker);
if (status)
- return status;
+ goto out_nomem;

nn->drc_slab = kmem_cache_create("nfsd_drc",
sizeof(struct svc_cacherep), 0, 0, NULL);
if (!nn->drc_slab)
- goto out_nomem;
+ goto out_shrinker;

nn->drc_hashtbl = kcalloc(hashsize,
sizeof(*nn->drc_hashtbl), GFP_KERNEL);
@@ -170,7 +170,7 @@ int nfsd_reply_cache_init(struct nfsd_net *nn)
nn->drc_hashtbl = vzalloc(array_size(hashsize,
sizeof(*nn->drc_hashtbl)));
if (!nn->drc_hashtbl)
- goto out_nomem;
+ goto out_slab;
}

for (i = 0; i < hashsize; i++) {
@@ -180,6 +180,10 @@ int nfsd_reply_cache_init(struct nfsd_net *nn)
nn->drc_hashsize = hashsize;

return 0;
+out_slab:
+ kmem_cache_destroy(nn->drc_slab);
+out_shrinker:
+ unregister_shrinker(&nn->nfsd_reply_cache_shrinker);
out_nomem:

Kirill Tkhai

unread,
Jun 6, 2019, 9:43:52 AM6/6/19
to J. Bruce Fields, syzbot, ak...@linux-foundation.org, bfi...@redhat.com, ch...@chrisdown.name, daniel....@oracle.com, gu...@fb.com, han...@cmpxchg.org, jla...@kernel.org, laoar...@gmail.com, linux-...@vger.kernel.org, linu...@kvack.org, linu...@vger.kernel.org, mgo...@techsingularity.net, mho...@suse.com, s...@canb.auug.org.au, syzkall...@googlegroups.com, yang...@linux.alibaba.com
On 06.06.2019 16:13, J. Bruce Fields wrote:
> On Thu, Jun 06, 2019 at 10:47:43AM +0300, Kirill Tkhai wrote:
>> This may be connected with that shrinker unregistering is forgotten on error path.
>
> I was wondering about that too. Seems like it would be hard to hit
> reproduceably though: one of the later allocations would have to fail,
> then later you'd have to create another namespace and this time have a
> later module's init fail.

Yes, it's had to bump into this in real life.

AFAIU, syzbot triggers such the problem by using fault-injections
on allocation places should_failslab()->should_fail(). It's possible
to configure a specific slab, so the allocations will fail with
requested probability.
Looks OK for me. Feel free to add my reviewed-by if you want.

Reviewed-by: Kirill Tkhai <ktk...@virtuozzo.com>

Dmitry Vyukov

unread,
Jun 6, 2019, 10:41:00 AM6/6/19
to Kirill Tkhai, J. Bruce Fields, syzbot, Andrew Morton, bfi...@redhat.com, ch...@chrisdown.name, Daniel Jordan, gu...@fb.com, Johannes Weiner, Jeff Layton, laoar...@gmail.com, LKML, Linux-MM, linu...@vger.kernel.org, Mel Gorman, Michal Hocko, Stephen Rothwell, syzkaller-bugs, yang...@linux.alibaba.com
On Thu, Jun 6, 2019 at 3:43 PM Kirill Tkhai <ktk...@virtuozzo.com> wrote:
>
> On 06.06.2019 16:13, J. Bruce Fields wrote:
> > On Thu, Jun 06, 2019 at 10:47:43AM +0300, Kirill Tkhai wrote:
> >> This may be connected with that shrinker unregistering is forgotten on error path.
> >
> > I was wondering about that too. Seems like it would be hard to hit
> > reproduceably though: one of the later allocations would have to fail,
> > then later you'd have to create another namespace and this time have a
> > later module's init fail.
>
> Yes, it's had to bump into this in real life.
>
> AFAIU, syzbot triggers such the problem by using fault-injections
> on allocation places should_failslab()->should_fail(). It's possible
> to configure a specific slab, so the allocations will fail with
> requested probability.

No fault injection was involved in triggering of this bug.
Fault injection is clearly visible in console log as "INJECTING
FAILURE at this stack track" splats and also for bugs with repros it
would be noted in the syzkaller repro as "fault_call": N. So somehow
this bug was triggered as is.

But overall syzkaller can do better then the old probabilistic
injection. The probabilistic injection tend to both under-test what we
want to test and also crash some system services. syzkaller uses the
new "systematic fault injection" that allows to test specifically each
failure site separately in each syscall separately.
All kernel testing systems should use it. Also in couple with KASAN,
KMEMLEAK, LOCKDEP. It's indispensable in finding kernel bugs.
> --
> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bug...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/275f77ad-1962-6a60-e60b-6b8845f12c34%40virtuozzo.com.
> For more options, visit https://groups.google.com/d/optout.

Kirill Tkhai

unread,
Jun 6, 2019, 10:54:29 AM6/6/19
to Dmitry Vyukov, J. Bruce Fields, syzbot, Andrew Morton, bfi...@redhat.com, ch...@chrisdown.name, Daniel Jordan, gu...@fb.com, Johannes Weiner, Jeff Layton, laoar...@gmail.com, LKML, Linux-MM, linu...@vger.kernel.org, Mel Gorman, Michal Hocko, Stephen Rothwell, syzkaller-bugs, yang...@linux.alibaba.com
On 06.06.2019 17:40, Dmitry Vyukov wrote:
> On Thu, Jun 6, 2019 at 3:43 PM Kirill Tkhai <ktk...@virtuozzo.com> wrote:
>>
>> On 06.06.2019 16:13, J. Bruce Fields wrote:
>>> On Thu, Jun 06, 2019 at 10:47:43AM +0300, Kirill Tkhai wrote:
>>>> This may be connected with that shrinker unregistering is forgotten on error path.
>>>
>>> I was wondering about that too. Seems like it would be hard to hit
>>> reproduceably though: one of the later allocations would have to fail,
>>> then later you'd have to create another namespace and this time have a
>>> later module's init fail.
>>
>> Yes, it's had to bump into this in real life.
>>
>> AFAIU, syzbot triggers such the problem by using fault-injections
>> on allocation places should_failslab()->should_fail(). It's possible
>> to configure a specific slab, so the allocations will fail with
>> requested probability.
>
> No fault injection was involved in triggering of this bug.
> Fault injection is clearly visible in console log as "INJECTING
> FAILURE at this stack track" splats and also for bugs with repros it
> would be noted in the syzkaller repro as "fault_call": N. So somehow
> this bug was triggered as is.
>
> But overall syzkaller can do better then the old probabilistic
> injection. The probabilistic injection tend to both under-test what we
> want to test and also crash some system services. syzkaller uses the
> new "systematic fault injection" that allows to test specifically each
> failure site separately in each syscall separately.

Oho! Interesting.

Dmitry Vyukov

unread,
Jun 6, 2019, 11:18:31 AM6/6/19
to Kirill Tkhai, J. Bruce Fields, syzbot, Andrew Morton, bfi...@redhat.com, Chris Down, Daniel Jordan, gu...@fb.com, Johannes Weiner, Jeff Layton, laoar...@gmail.com, LKML, Linux-MM, linu...@vger.kernel.org, Mel Gorman, Michal Hocko, Stephen Rothwell, syzkaller-bugs, yang...@linux.alibaba.com
If you are interested. You write N into /proc/thread-self/fail-nth
(say, 5) then it will cause failure of the N-th (5-th) failure site in
the next syscall in this task only. And by reading it back after the
syscall you can figure out if the failure was indeed injected or not
(or the syscall had less than 5 failure sites).
Then, for each syscall in a test (or only for one syscall of
interest), we start by writing "1" into /proc/thread-self/fail-nth; if
the failure was injected, write "2" and restart the test; if the
failure was injected, write "3" and restart the test; and so on, until
the failure wasn't injected (tested all failure sites).
This guarantees systematic testing of each error path with minimal
number of runs. This has obvious extensions to "each pair of failure
sites" (to test failures on error paths), but it's not supported atm.

Kirill Tkhai

unread,
Jun 6, 2019, 11:25:22 AM6/6/19
to Dmitry Vyukov, J. Bruce Fields, syzbot, Andrew Morton, bfi...@redhat.com, Chris Down, Daniel Jordan, gu...@fb.com, Johannes Weiner, Jeff Layton, laoar...@gmail.com, LKML, Linux-MM, linu...@vger.kernel.org, Mel Gorman, Michal Hocko, Stephen Rothwell, syzkaller-bugs, yang...@linux.alibaba.com
And what you do in case of a tested syscall has pre-requisites? Say,
you test close(), which requires open() and some IO before. Are such
the dependencies statically declared in some configuration file? Or
you test any repeatable sequence of syscalls?

Kirill

Dmitry Vyukov

unread,
Jun 6, 2019, 12:01:45 PM6/6/19
to Kirill Tkhai, J. Bruce Fields, syzbot, Andrew Morton, bfi...@redhat.com, Chris Down, Daniel Jordan, gu...@fb.com, Johannes Weiner, Jeff Layton, laoar...@gmail.com, LKML, Linux-MM, linu...@vger.kernel.org, Mel Gorman, Michal Hocko, Stephen Rothwell, syzkaller-bugs, yang...@linux.alibaba.com, syzkaller
There are several things at play here.
1. syzkaller has notion of "resources". A resource is something that's
produced by one system call and consumed by another, like a file
descriptor.
E.g. see this for userfault fd:
https://github.com/google/syzkaller/blob/698773cb4fbe8873ee0a2c37b86caef01e2c6159/sys/linux/uffd.txt#L8-L12
This allows syzkaller to understand that there is something called
fd_uffd that is produced by userfaultfd() and then needs to be passed
to ioctl$UFFDIO_API().
So for close it knows that it needs to get the fd somewhere first.

2. For syscalls are not explicitly tied by any resources, it will just
try to combine them randomly.

3. There is coverage-guided reinforcement learning. When it discovers
some sensible combinations of syscalls (as denoted by new kernel code
coverage) it memorizes that program for future mutations to get even
more interesting and more sensible programs. This is allows syzkaller
to build more and more interesting programs by doing small incremental
steps (this is the general idea of coverage-guided fuzzing).
Reply all
Reply to author
Forward
0 new messages