fs: use-after-free in path_lookupat


Dmitry Vyukov

Mar 4, 2017, 9:59:57 AM
to Al Viro, linux-...@vger.kernel.org, LKML, syzkaller
Hello,

I am getting the following use-after-free reports while running
syzkaller fuzzer on 86292b33d4b79ee03e2f43ea0381ef85f077c760 (but also
happened on 6dc39c50e4aeb769c8ae06edf2b1a732f3490913 and
c82be9d2244aacea9851c86f4fb74694c99cd874).

==================================================================
BUG: KASAN: use-after-free in perf_trace_lock_acquire+0x9cf/0xa00 include/trace/events/lock.h:12 at addr ffff88008477c930
Read of size 8 by task syz-executor3/878
CPU: 1 PID: 878 Comm: syz-executor3 Not tainted 4.10.0+ #276
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:331
perf_trace_lock_acquire+0x9cf/0xa00 include/trace/events/lock.h:12
trace_lock_acquire include/trace/events/lock.h:12 [inline]
lock_acquire+0x473/0x630 kernel/locking/lockdep.c:3752
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:299 [inline]
lockref_get_not_dead+0x19/0x80 lib/lockref.c:179
legitimize_path.isra.36+0x7d/0x1a0 fs/namei.c:640
unlazy_walk+0xf2/0x4b0 fs/namei.c:692
complete_walk+0xb2/0x1f0 fs/namei.c:805
path_lookupat+0x1c1/0x400 fs/namei.c:2275
filename_lookup+0x282/0x540 fs/namei.c:2301
user_path_at_empty+0x40/0x50 fs/namei.c:2555
user_path_at include/linux/namei.h:55 [inline]
SYSC_name_to_handle_at fs/fhandle.c:106 [inline]
SyS_name_to_handle_at+0xff/0x720 fs/fhandle.c:92
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x4458d9
RSP: 002b:00007f2162048b58 EFLAGS: 00000286 ORIG_RAX: 000000000000012f
RAX: ffffffffffffffda RBX: 0000000000000053 RCX: 00000000004458d9
RDX: 0000000020002ff3 RSI: 0000000020002ffa RDI: 0000000000000053
RBP: 00000000006e11b0 R08: 0000000000001000 R09: 0000000000000000
R10: 0000000020002000 R11: 0000000000000286 R12: 0000000000708000
R13: 0000000000000005 R14: 0000000000708020 R15: 00007f2162049700
Object at ffff88008477c880, in cache dentry size: 288
Allocated:
PID = 878
kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:544
kmem_cache_alloc+0x102/0x6e0 mm/slab.c:3571
__d_alloc+0xb3/0xbb0 fs/dcache.c:1571
d_alloc_pseudo+0x1d/0x30 fs/dcache.c:1692
__shmem_file_setup+0x20c/0x5a0 mm/shmem.c:4156
shmem_file_setup mm/shmem.c:4211 [inline]
SYSC_memfd_create mm/shmem.c:3671 [inline]
SyS_memfd_create+0x172/0x2c0 mm/shmem.c:3629
entry_SYSCALL_64_fastpath+0x1f/0xc2
Freed:
PID = 887
kmem_cache_free+0x71/0x240 mm/slab.c:3773
__d_free fs/dcache.c:265 [inline]
dentry_free+0xd5/0x150 fs/dcache.c:314
__dentry_kill+0x471/0x6d0 fs/dcache.c:552
dentry_kill fs/dcache.c:579 [inline]
dput.part.26+0x5ce/0x7c0 fs/dcache.c:791
dput+0x1f/0x30 fs/dcache.c:753
__fput+0x527/0x7f0 fs/file_table.c:226
____fput+0x15/0x20 fs/file_table.c:244
task_work_run+0x18a/0x260 kernel/task_work.c:116
tracehook_notify_resume include/linux/tracehook.h:191 [inline]
exit_to_usermode_loop+0x23b/0x2a0 arch/x86/entry/common.c:160
prepare_exit_to_usermode arch/x86/entry/common.c:190 [inline]
syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259
entry_SYSCALL_64_fastpath+0xc0/0xc2
Memory state around the buggy address:
ffff88008477c800: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
ffff88008477c880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88008477c900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88008477c980: fb fb fb fb fc fc fc fc fc fc fc fc fb fb fb fb
ffff88008477ca00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


Here are 3 more, but they look essentially the same:
https://gist.githubusercontent.com/dvyukov/9c50026c9a82f46cfedf871a48b14b63/raw/9a5d03f10e6eeac3e2113a808463796f9e2921a3/gistfile1.txt


This is barely reproducible on large syzkaller programs. Here are 3 of
them. They look very close, so they are probably mutations of the same
program:
https://gist.githubusercontent.com/dvyukov/f0e9ce7798c6003b8bae4ddb925f10f6/raw/faddf1609c0bfe8cf0465e62204050a95a98323f/gistfile1.txt
Running these programs as:
./syz-execprog -repeat=0 -procs=24 -sandbox=namespace prog
reproduces the crash sometimes after a minute, sometimes after half an
hour. Because of that I can't minimize the reproducer. However, from
the crashes it's clear that the syscalls involved are memfd_create and
name_to_handle_at, and all programs contain:

r4 = memfd_create(&(0x7f0000013000)="2f6465762f6877726e6700", 0x0)
name_to_handle_at(r4, &(0x7f0000003000-0x6)="2e2f62757300",
&(0x7f0000003000-0xd)={0xc, 0x0, "cd21"}, &(0x7f0000002000)=0x0,
0x1000)

What's strange is that the dirfd passed to name_to_handle_at is a memfd
handle (sic), and path lookup somehow does not fail early on this.
Does that make any sense?
I don't know whether the crash is specific to these calls, or whether
it's just a common race in the lookup code that happens to trigger on
this program.

Any ideas how such a crash can happen?

Al Viro

Mar 4, 2017, 2:39:14 PM
to Dmitry Vyukov, linux-...@vger.kernel.org, LKML, syzkaller
On Sat, Mar 04, 2017 at 03:59:36PM +0100, Dmitry Vyukov wrote:

> I am getting the following use-after-free reports while running
> syzkaller fuzzer on 86292b33d4b79ee03e2f43ea0381ef85f077c760 (but also
> happened on 6dc39c50e4aeb769c8ae06edf2b1a732f3490913 and
> c82be9d2244aacea9851c86f4fb74694c99cd874).

IOW, it's not fs/namei.c patches from this window...

> unlazy_walk+0xf2/0x4b0 fs/namei.c:692

Could you post disassembly (e.g. from objdump -d) of your unlazy_walk()?
For the kernel the trace is from...

> r4 = memfd_create(&(0x7f0000013000)="2f6465762f6877726e6700", 0x0)
> name_to_handle_at(r4, &(0x7f0000003000-0x6)="2e2f62757300",
> &(0x7f0000003000-0xd)={0xc, 0x0, "cd21"}, &(0x7f0000002000)=0x0,
> 0x1000)
>
> What's strange is that dirfd passed to name_to_handle_at is memfd
> handle (sic). And path lookup somehow does not fail early on this.
> Does it make any sense?

It doesn't, but is that the triggering call of name_to_handle_at(), or do you
have it called elsewhere?

FWIW, no LOOKUP_ROOT in filename_lookup() flags + NULL root + dfd not
equal to AT_FDCWD + non-empty name should've ended up in
	if (!d_can_lookup(dentry)) {
		fdput(f);
		return ERR_PTR(-ENOTDIR);
	}
in path_init() and it shouldn't have progressed any further. And in case
of name_to_handle_at() we have user_path_at(dfd, name, lookup_flags, &path),
i.e. user_path_at_empty(dfd, name, lookup_flags, &path, NULL), i.e.
filename_lookup(dfd, getname_flags(name, lookup_flags, NULL), lookup_flags,
&path, NULL). IOW, filename_lookup() is called with root equal to NULL,
dfd and name coming straight from userland and lookup_flags containing
nothing beyond LOOKUP_EMPTY and LOOKUP_FOLLOW...
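
The expected behaviour described above can be checked from userspace. The following is a minimal sketch, not taken from the thread: it uses fstatat() instead of name_to_handle_at(), on the assumption that both reach the same d_can_lookup() check in path_init(), and the helper name lookup_relative_to_memfd is made up for this example. With a memfd as dirfd and a non-empty relative name, the lookup is expected to fail with ENOTDIR.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Look up a non-empty relative name with a memfd as the directory fd.
 * Returns the errno of the failed lookup, 0 if the lookup unexpectedly
 * succeeded, or -1 if the memfd could not be created.
 * (Hypothetical helper, for illustration only.)
 */
int lookup_relative_to_memfd(const char *name)
{
	struct stat st;
	int fd, err = 0;

	assert(name != NULL && name[0] != '\0');

	/* Raw syscall so the sketch does not depend on the glibc wrapper. */
	fd = (int)syscall(SYS_memfd_create, "demo", 0);
	if (fd < 0)
		return -1;

	/* A memfd is not a directory, so a relative lookup should be refused. */
	if (fstatat(fd, name, &st, 0) < 0)
		err = errno;

	close(fd);
	return err;
}
```

Calling lookup_relative_to_memfd("./bus") should return ENOTDIR on a kernel where the check works as described.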

Dmitry Vyukov

Mar 5, 2017, 6:16:17 AM
to Al Viro, linux-...@vger.kernel.org, LKML, syzkaller
I probably messed something up. Here is a new report on
0710f3ff91ecc4a715db6e4d0690472b13c4dac6. Line numbers seem to match
now.

BUG: KASAN: use-after-free in debug_spin_lock_before kernel/locking/spinlock_debug.c:83 [inline] at addr ffff880059c2ace4
BUG: KASAN: use-after-free in do_raw_spin_lock+0x1bb/0x1f0 kernel/locking/spinlock_debug.c:112 at addr ffff880059c2ace4
Read of size 4 by task syz-executor/21853
CPU: 0 PID: 21853 Comm: syz-executor Not tainted 4.10.0+ #293
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x2fb/0x3fd lib/dump_stack.c:52
kasan_object_err+0x1c/0x90 mm/kasan/report.c:166
print_address_description mm/kasan/report.c:208 [inline]
kasan_report_error mm/kasan/report.c:292 [inline]
kasan_report.part.2+0x1b0/0x460 mm/kasan/report.c:314
kasan_report mm/kasan/report.c:334 [inline]
__asan_report_load4_noabort+0x29/0x30 mm/kasan/report.c:334
debug_spin_lock_before kernel/locking/spinlock_debug.c:83 [inline]
do_raw_spin_lock+0x1bb/0x1f0 kernel/locking/spinlock_debug.c:112
__raw_spin_lock include/linux/spinlock_api_smp.h:143 [inline]
_raw_spin_lock+0x3b/0x50 kernel/locking/spinlock.c:151
spin_lock include/linux/spinlock.h:299 [inline]
lockref_get_not_dead+0x19/0x80 lib/lockref.c:179
legitimize_path.isra.36+0x7d/0x1a0 fs/namei.c:640
unlazy_walk+0xf2/0x4b0 fs/namei.c:692
complete_walk+0xb2/0x1f0 fs/namei.c:805
path_lookupat+0x1c1/0x400 fs/namei.c:2275
filename_lookup+0x282/0x540 fs/namei.c:2301
user_path_at_empty+0x40/0x50 fs/namei.c:2555
user_path_at include/linux/namei.h:55 [inline]
SYSC_name_to_handle_at fs/fhandle.c:106 [inline]
SyS_name_to_handle_at+0xff/0x720 fs/fhandle.c:92
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x4458d9
RSP: 002b:00007f462e243b58 EFLAGS: 00000286 ORIG_RAX: 000000000000012f
RAX: ffffffffffffffda RBX: 0000000000000051 RCX: 00000000004458d9
RDX: 0000000020002ff3 RSI: 0000000020002ffa RDI: 0000000000000051
RBP: 00000000006e11b0 R08: 0000000000001000 R09: 0000000000000000
R10: 0000000020002000 R11: 0000000000000286 R12: 0000000000708000
R13: 0000000020000000 R14: 0000000000013000 R15: 0000000000000003
Object at ffff880059c2ac60, in cache dentry size: 288
Allocated:
PID = 21878
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:513
set_track mm/kasan/kasan.c:525 [inline]
kasan_kmalloc+0xaa/0xd0 mm/kasan/kasan.c:616
kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:555
kmem_cache_alloc+0x102/0x6e0 mm/slab.c:3572
__d_alloc+0xb3/0xbb0 fs/dcache.c:1571
d_alloc_pseudo+0x1d/0x30 fs/dcache.c:1692
__shmem_file_setup+0x20c/0x5a0 mm/shmem.c:4157
shmem_file_setup mm/shmem.c:4212 [inline]
SYSC_memfd_create mm/shmem.c:3672 [inline]
SyS_memfd_create+0x172/0x2c0 mm/shmem.c:3630
entry_SYSCALL_64_fastpath+0x1f/0xc2
Freed:
PID = 21878
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:513
set_track mm/kasan/kasan.c:525 [inline]
kasan_slab_free+0x6f/0xb0 mm/kasan/kasan.c:589
__cache_free mm/slab.c:3514 [inline]
kmem_cache_free+0x71/0x240 mm/slab.c:3774
__d_free fs/dcache.c:265 [inline]
dentry_free+0xd5/0x160 fs/dcache.c:314
__dentry_kill+0x471/0x6d0 fs/dcache.c:552
dentry_kill fs/dcache.c:579 [inline]
dput.part.25+0x5ce/0x7c0 fs/dcache.c:791
dput+0x1f/0x30 fs/dcache.c:753
__fput+0x538/0x800 fs/file_table.c:227
____fput+0x15/0x20 fs/file_table.c:245
task_work_run+0x197/0x260 kernel/task_work.c:116
tracehook_notify_resume include/linux/tracehook.h:191 [inline]
exit_to_usermode_loop+0x23b/0x2a0 arch/x86/entry/common.c:161
prepare_exit_to_usermode arch/x86/entry/common.c:191 [inline]
syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:260
entry_SYSCALL_64_fastpath+0xc0/0xc2
Memory state around the buggy address:
ffff880059c2ab80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff880059c2ac00: 00 00 00 00 fc fc fc fc fc fc fc fc fb fb fb fb
>ffff880059c2ac80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff880059c2ad00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff880059c2ad80: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00
==================================================================

Dmitry Vyukov

Mar 5, 2017, 6:20:26 AM
to Al Viro, linux-...@vger.kernel.org, LKML, syzkaller

Dmitry Vyukov

Mar 5, 2017, 6:25:04 AM
to Al Viro, linux-...@vger.kernel.org, LKML, syzkaller
Yes, but still it somehow happens...

I don't know if it's related or not, but in all cases the path passed
to memfd_create is used in openat (that "2f6465762f6877726e6700" which
stands for "/dev/hwrng"):

r3 = openat$hwrng(0xffffffffffffff9c,
&(0x7f000000f000)="2f6465762f6877726e6700", 0x0, 0x0)
mmap(&(0x7f0000013000/0x1000)=nil, (0x1000), 0x3, 0x32, 0xffffffffffffffff, 0x0)
r4 = memfd_create(&(0x7f0000013000)="2f6465762f6877726e6700", 0x0)
name_to_handle_at(r4, &(0x7f0000003000-0x6)="2e2f62757300",
&(0x7f0000003000-0xd)={0xc, 0x0, "cd21"}, &(0x7f0000002000)=0x0,
0x1000)
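
The hex strings in these syzkaller programs are raw byte sequences, including the terminating NUL. As a minimal sketch (the function names hex_digit and decode_hex are made up for this example, not part of syzkaller), they can be decoded like this:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_digit(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Decode a hex string as printed in a syzkaller program into raw bytes.
 * Returns the number of bytes written, or -1 on malformed input or if
 * the output buffer is too small.
 */
int decode_hex(const char *hex, unsigned char *out, size_t out_len)
{
	size_t n, i;

	assert(hex != NULL && out != NULL);

	n = strlen(hex);
	if (n % 2 != 0 || n / 2 > out_len)
		return -1;

	for (i = 0; i < n; i += 2) {
		int hi = hex_digit(hex[i]);
		int lo = hex_digit(hex[i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		out[i / 2] = (unsigned char)(hi * 16 + lo);
	}
	return (int)(n / 2);
}
```

Decoding "2f6465762f6877726e6700" yields the 11 bytes of "/dev/hwrng" plus its NUL, and "2e2f62757300" yields the 6 bytes of "./bus" plus its NUL.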



Dmitry Vyukov

Mar 5, 2017, 6:37:34 AM
to Al Viro, linux-...@vger.kernel.org, LKML, syzkaller
On Sun, Mar 5, 2017 at 12:24 PM, Dmitry Vyukov <dvy...@google.com> wrote:
>>>> On Sat, Mar 04, 2017 at 03:59:36PM +0100, Dmitry Vyukov wrote:
>>>>
>>>>> I am getting the following use-after-free reports while running
>>>>> syzkaller fuzzer on 86292b33d4b79ee03e2f43ea0381ef85f077c760 (but also
>>>>> happened on 6dc39c50e4aeb769c8ae06edf2b1a732f3490913 and
>>>>> c82be9d2244aacea9851c86f4fb74694c99cd874).
>>>>
>>>> IOW, it's not fs/namei.c patches from this window...
>>>>
>>>>> unlazy_walk+0xf2/0x4b0 fs/namei.c:692
>>>>
>>>> Could you post disassembly (e.g. from objdump -d) of your unlazy_walk()?
>>>> For the kernel the trace is from...
>>>>
>>>>> r4 = memfd_create(&(0x7f0000013000)="2f6465762f6877726e6700", 0x0)
>>>>> name_to_handle_at(r4, &(0x7f0000003000-0x6)="2e2f62757300",
>>>>> &(0x7f0000003000-0xd)={0xc, 0x0, "cd21"}, &(0x7f0000002000)=0x0,
>>>>> 0x1000)
>>>>>
>>>>> What's strange is that dirfd passed to name_to_handle_at is memfd
>>>>> handle (sic). And path lookup somehow does not fail early on this.
>>>>> Does it make any sense?
>>>>
>>>> It doesn't, but is that the triggering call of name_to_handle_at(), or do you
>>>> have it called elsewhere?

I am pretty sure it is that one.
I don't think I ever used name_to_handle_at syscall in my life and I
definitely didn't make it lookup a memfd :)


>>>> FWIW, no LOOKUP_ROOT in filename_lookup() flags + NULL root + dfd not
>>>> equal to AT_FDCWD + non-empty name should've ended up in
>>>> if (!d_can_lookup(dentry)) {
>>>> fdput(f);
>>>> return ERR_PTR(-ENOTDIR);
>>>> }
>>>> in path_init() and it shouldn't have progressed any further. And in case
>>>> of name_to_handle_at() we have user_path_at(dfd, name, lookup_flags, &path),
>>>> i.e. user_path_at_empty(dfd, name, lookup_flags, &path, NULL), i.e.
>>>> filename_lookup(dfd, getname_flags(name, lookup_flags, NULL), lookup_flags,
>>>> &path, NULL). IOW, filename_lookup() is called with root equal to NULL,
>>>> dfd and name coming straight from userland and lookup_flags containing
>>>> nothing beyond LOOKUP_EMPTY and LOOKUP_FOLLOW...
> Yes, but still it somehow happens...


I've added this diff:

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2213,6 +2213,7 @@ static const char *path_init(struct nameidata *nd, unsigned flags)

 		dentry = f.file->f_path.dentry;

+pr_err("%d: path_init: s=%s flags=%d\n", current->pid, s, dentry->d_flags);
 		if (*s) {
 			if (!d_can_lookup(dentry)) {
 				fdput(f);


Most of the time flags are 4194304, but occasionally they are 5243016:

[ 172.559822] 21279: path_init: s= flags=4194304
[ 172.572357] 21275: path_init: s= flags=4194304
[ 172.605964] 21297: path_init: s= flags=4194304
[ 172.609712] 21301: path_init: s= flags=4194304
[ 172.620832] 21287: path_init: s= flags=4194304
[ 172.651228] 21288: path_init: s= flags=4194304
[ 172.660516] 21306: path_init: s= flags=4194304
[ 172.689294] 21308: path_init: s= flags=4194304
[ 172.689743] 21281: path_init: s= flags=4194304
[ 172.705908] 21313: path_init: s= flags=4194304
[ 172.755287] 21297: path_init: s= flags=5243016
[ 172.762358] 21306: path_init: s= flags=4194304
[ 172.763248] 21306: path_init: s= flags=4194304
[ 172.766700] 21317: path_init: s= flags=4194304
[ 172.775612] 21313: path_init: s= flags=4194304
[ 172.797624] 21308: path_init: s= flags=4194304
[ 172.798709] 21328: path_init: s= flags=4194304
[ 172.800170] 21322: path_init: s= flags=4194304
...
[ 179.202077] 22190: path_init: s= flags=4194304
[ 179.209753] 22189: path_init: s= flags=4194304
[ 179.211614] 22187: path_init: s= flags=4194304
[ 179.223048] 22165: path_init: s= flags=4194304
[ 179.271114] 22195: path_init: s= flags=4194304
[ 179.290350] 22182: path_init: s= flags=4194304
[ 179.301246] 22189: path_init: s= flags=4194304
[ 179.325996] 22202: path_init: s= flags=4194304
[ 179.327900] 22203: path_init: s= flags=4194304
[ 179.349044] 22195: path_init: s= flags=4194304
[ 179.363826] 22211: path_init: s= flags=4194304
[ 179.364938] 22207: path_init: s= flags=4194304
[ 179.364985] 22206: path_init: s= flags=4194304
[ 179.415240] 22214: path_init: s= flags=4194304
[ 179.464470] 22219: path_init: s= flags=4194304
[ 179.484437] 22225: path_init: s= flags=4194304
[ 179.489139] 22207: path_init: s= flags=4194304
[ 179.495212] 22206: path_init: s= flags=4194304
[ 179.521143] 22216: path_init: s= flags=4194304
[ 179.526780] 22228: path_init: s= flags=4194304
[ 179.540650] 22227: path_init: s= flags=4194304
[ 179.545824] 22225: path_init: s= flags=4194304
[ 179.574581] 22214: path_init: s= flags=4194304
[ 179.577168] 22236: path_init: s= flags=4194304
[ 179.618489] 22240: path_init: s= flags=4194304
[ 179.644057] 22243: path_init: s= flags=4194304
[ 179.647793] 22228: path_init: s= flags=4194304
[ 179.680428] 22248: path_init: s= flags=4194304
[ 179.716533] 22240: path_init: s= flags=4194304
[ 179.720363] 22227: path_init: s= flags=4194304
[ 179.721421] 22236: path_init: s= flags=4194304
[ 179.722195] 22249: path_init: s= flags=4194304
[ 179.729854] 22252: path_init: s= flags=4194304
[ 179.772353] 22248: path_init: s= flags=5243016
[ 179.778042] 22243: path_init: s= flags=4194304
[ 179.779056] ==================================================================
[ 179.779707] BUG: KASAN: use-after-free in perf_trace_lock_acquire+0x9cf/0xa00 at addr ffff88005c34c930
[ 179.780010] Read of size 8 by task syz-executor/22243
[ 179.780010] CPU: 2 PID: 22243 Comm: syz-executor Not tainted 4.10.0+ #294
[ 179.781396] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 179.782914] Call Trace:
[ 179.782914] dump_stack+0x2fb/0x3fd
[ 179.782914] ? arch_local_irq_restore+0x53/0x53
[ 179.782914] ? vprintk_emit+0x566/0x770
[ 179.782914] ? console_unlock+0xf50/0xf50
[ 179.782914] ? console_unlock+0xf50/0xf50
[ 179.782914] ? lock_set_class+0xc00/0xc00
[ 179.782914] ? __lock_is_held+0xb6/0x140
[ 179.782914] ? check_noncircular+0x20/0x20
[ 179.782914] ? lock_set_class+0xc00/0xc00
[ 179.782914] ? __handle_mm_fault+0x1c84/0x2cd0
[ 179.782914] ? vprintk_default+0x28/0x30
[ 179.789989] ? vprintk_func+0x47/0x90
[ 179.789989] ? printk+0xc8/0xf9
[ 179.789989] ? load_image_and_restore+0x134/0x134
[ 179.789989] kasan_object_err+0x1c/0x90
[ 179.789989] kasan_report.part.2+0x1b0/0x460
[ 179.789989] ? kasan_check_write+0x14/0x20
[ 179.789989] ? do_raw_spin_lock+0xbd/0x1f0
[ 179.789989] ? perf_trace_lock_acquire+0x9cf/0xa00
[ 179.789989] __asan_report_load8_noabort+0x29/0x30
[ 179.789989] perf_trace_lock_acquire+0x9cf/0xa00
[ 179.789989] ? RECLAIM_FS_verbose+0x10/0x10
[ 179.789989] ? mark_held_locks+0x100/0x100
[ 179.789989] ? mark_held_locks+0x100/0x100
[ 179.789989] ? __do_page_fault+0x51b/0xb60
[ 179.789989] ? lock_acquire+0x630/0x630
[ 179.789989] ? memset+0x31/0x40
[ 179.789989] lock_acquire+0x473/0x630
[ 179.789989] ? lockref_get_not_dead+0x19/0x80
[ 179.789989] ? lock_set_class+0xc00/0xc00
[ 179.789989] ? trace_hardirqs_on_caller+0x545/0x6f0
[ 179.789989] ? mark_held_locks+0x100/0x100
[ 179.789989] ? trace_hardirqs_on_caller+0x545/0x6f0
[ 179.789989] ? mark_held_locks+0x100/0x100
[ 179.789989] ? mark_held_locks+0x100/0x100
[ 179.789989] ? retint_kernel+0x10/0x10
[ 179.789989] ? trace_hardirqs_on_caller+0x545/0x6f0
[ 179.789989] ? mark_held_locks+0x100/0x100
[ 179.789989] ? trace_hardirqs_on_caller+0x545/0x6f0
[ 179.789989] ? mark_held_locks+0x100/0x100
[ 179.789989] ? trace_hardirqs_on_thunk+0x1a/0x1c
[ 179.789989] ? retint_kernel+0x10/0x10
[ 179.789989] _raw_spin_lock+0x33/0x50
[ 179.789989] ? lockref_get_not_dead+0x19/0x80
[ 179.789989] lockref_get_not_dead+0x19/0x80
[ 179.789989] legitimize_path.isra.36+0x7d/0x1a0
[ 179.789989] unlazy_walk+0xf2/0x4b0
[ 179.789989] complete_walk+0xb2/0x1f0
[ 179.789989] path_lookupat+0x1c1/0x400
[ 179.789989] filename_lookup+0x282/0x540
[ 179.789989] ? filename_parentat+0x5b0/0x5b0
[ 179.806362] ? kmem_cache_alloc+0x3f5/0x6e0
[ 179.806648] ? getname_flags+0x256/0x580
[ 179.806648] user_path_at_empty+0x40/0x50
[ 179.806648] SyS_name_to_handle_at+0xff/0x720
[ 179.806648] ? vfs_dentry_acceptable+0x10/0x10
[ 179.806648] ? retint_kernel+0x10/0x10
[ 179.806648] entry_SYSCALL_64_fastpath+0x1f/0xc2
[ 179.806648] RIP: 0033:0x4458d9
[ 179.806648] RSP: 002b:00007faa1547cb58 EFLAGS: 00000286 ORIG_RAX: 000000000000012f
[ 179.806648] RAX: ffffffffffffffda RBX: 0000000000000050 RCX: 00000000004458d9
[ 179.806648] RDX: 0000000020002ff3 RSI: 0000000020002ffa RDI: 0000000000000050
[ 179.806648] RBP: 00000000006e11b0 R08: 0000000000001000 R09: 0000000000000000
[ 179.806648] R10: 0000000020002000 R11: 0000000000000286 R12: 0000000000708000
[ 179.806648] R13: 0000000000000000 R14: 00007faa1547d9c0 R15: 00007faa1547d700
[ 179.806648] Object at ffff88005c34c880, in cache dentry size: 288
[ 179.814491] 22252: path_init: s= flags=4194304
[ 179.806648] Allocated:
[ 179.806648] PID = 22260
[ 179.806648] save_stack_trace+0x16/0x20
[ 179.806648] save_stack+0x43/0xd0
[ 179.816189] kasan_kmalloc+0xaa/0xd0
[ 179.816189] kasan_slab_alloc+0x12/0x20
[ 179.816189] kmem_cache_alloc+0x102/0x6e0
[ 179.816189] __d_alloc+0xb3/0xbb0
[ 179.816189] d_alloc_pseudo+0x1d/0x30
[ 179.816189] __shmem_file_setup+0x20c/0x5a0
[ 179.816189] SyS_memfd_create+0x172/0x2c0
[ 179.816189] entry_SYSCALL_64_fastpath+0x1f/0xc2
[ 179.816189] Freed:
[ 179.816189] PID = 22265
[ 179.816189] save_stack_trace+0x16/0x20
[ 179.816189] save_stack+0x43/0xd0
[ 179.816189] kasan_slab_free+0x6f/0xb0
[ 179.816189] kmem_cache_free+0x71/0x240
[ 179.816189] dentry_free+0xd5/0x160
[ 179.816189] __dentry_kill+0x471/0x6d0
[ 179.816189] dput.part.25+0x5ce/0x7c0
[ 179.816189] dput+0x1f/0x30
[ 179.816189] __fput+0x538/0x800
[ 179.816189] ____fput+0x15/0x20
[ 179.816189] task_work_run+0x197/0x260
[ 179.816189] exit_to_usermode_loop+0x23b/0x2a0
[ 179.816189] syscall_return_slowpath+0x4d3/0x570
[ 179.816189] entry_SYSCALL_64_fastpath+0xc0/0xc2
[ 179.816189] Disposed:
[ 179.816189] PID = 21945
[ 179.816189] save_stack_trace+0x16/0x20
[ 179.816189] save_stack+0x43/0xd0
[ 179.816189] kasan_set_rcu_track+0xcf/0xf0
[ 179.816189] __call_rcu.constprop.77+0x1d6/0x15a0
[ 179.816189] call_rcu_sched+0x12/0x20
[ 179.816189] dentry_free+0xb7/0x160
[ 179.816189] __dentry_kill+0x471/0x6d0
[ 179.816189] dput.part.25+0x4fe/0x7c0
[ 179.816189] dput+0x1f/0x30
[ 179.816189] __debugfs_remove.part.10+0xb8/0xf0
[ 179.816189] debugfs_remove+0xea/0x1f0
[ 179.816189] bdi_unregister+0x2f9/0x550
[ 179.816189] bdi_destroy+0x15/0x20
[ 179.816189] v9fs_session_init+0x905/0x1a30
[ 179.816189] v9fs_mount+0x81/0x830
[ 179.816189] mount_fs+0x97/0x2e0
[ 179.816189] vfs_kern_mount.part.23+0xc6/0x490
[ 179.816189] do_mount+0x418/0x2da0
[ 179.816189] SyS_mount+0xab/0x120
[ 179.816189] entry_SYSCALL_64_fastpath+0x1f/0xc2
[ 179.816189] Memory state around the buggy address:
[ 179.816189] ffff88005c34c800: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 179.816189] ffff88005c34c880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 179.816189] >ffff88005c34c900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 179.816189] ^
[ 179.816189] ffff88005c34c980: fb fb fb fb fc fc fc fc fc fc fc fc 00 00 00 00
[ 179.816189] ffff88005c34ca00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 179.816189] ==================================================================
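
The two d_flags values printed by the debugging patch are easier to read in hex: 4194304 is 0x400000 and 5243016 is 0x500088. The sketch below splits them into fields. The three DCACHE_* constants are assumptions copied from memory of include/linux/dcache.h around v4.10 and should be checked against the tree in use; the helper names are made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed values from include/linux/dcache.h (circa v4.10); verify locally. */
#define DCACHE_RCUACCESS    0x00000080u /* entry has ever been RCU-visible */
#define DCACHE_ENTRY_TYPE   0x00700000u /* mask for the entry type bits */
#define DCACHE_REGULAR_TYPE 0x00400000u /* regular file */

/* The common value in the log is exactly "regular file, nothing else". */
static_assert(4194304u == DCACHE_REGULAR_TYPE, "4194304 == 0x400000");

/* Nonzero if the dentry was marked as RCU-visible. */
int has_rcuaccess(unsigned int d_flags)
{
	return (d_flags & DCACHE_RCUACCESS) != 0;
}

/* Extract only the entry type bits. */
unsigned int entry_type(unsigned int d_flags)
{
	return d_flags & DCACHE_ENTRY_TYPE;
}

/* Print one d_flags value the way the pr_err() output can be read. */
void describe_d_flags(unsigned int d_flags)
{
	printf("d_flags=%u (0x%06x): type=0x%06x rcuaccess=%d\n",
	       d_flags, d_flags, entry_type(d_flags), has_rcuaccess(d_flags));
}
```

Under these assumptions, the common value 0x400000 is a regular-file dentry without DCACHE_RCUACCESS, which is consistent with a dentry created by d_alloc_pseudo() for a memfd, while 0x500088 has a different type and does have DCACHE_RCUACCESS set.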

Al Viro

Mar 5, 2017, 10:57:13 AM
to Dmitry Vyukov, linux-...@vger.kernel.org, LKML, syzkaller
On Sun, Mar 05, 2017 at 12:37:13PM +0100, Dmitry Vyukov wrote:

> I am pretty sure it is that one.
> I don't think I ever used name_to_handle_at syscall in my life and I
> definitely didn't make it lookup a memfd :)

So what does it normally return? On the runs where we do not hit that
use-after-free, that is.

What gets triggered there is nd->path.dentry pointing to already freed
dentry. We are in RCU mode, so we are not pinning the dentry and it
might have reached dentry_free(). However, anything with DCACHE_RCUACCESS
set would have freeing RCU-delayed, making that impossible.

memfd stuff does *not* have DCACHE_RCUACCESS, which would've made it
plausible, but... there we really should've been stopped cold by
the d_can_lookup() check - that is done while we are still holding
a reference to struct file, which should've prevented freeing and
reuse. So at the time of that check we have dentry still not reused
by anything, and d_can_lookup() should've failed.

There is a race that could bugger the things up in that area, but it needs
empty name, so this one is something else...

Dmitry Vyukov

Mar 5, 2017, 11:14:44 AM
to Al Viro, linux-...@vger.kernel.org, LKML, syzkaller
You can see from the log above that s is always empty somehow, so the
d_can_lookup check is simply not done. I have not looked at the code,
but it's not racy, so it should follow from the arguments passed to
name_to_handle_at.

Al Viro

Mar 5, 2017, 11:33:06 AM
to Dmitry Vyukov, linux-...@vger.kernel.org, LKML, syzkaller
Umm... name_to_handle_at() in your log:
name_to_handle_at(r4, &(0x7f0000003000-0x6)="2e2f62757300", &(0x7f0000003000-0xd)={0xc, 0x0, "cd21"}, &(0x7f0000002000)=0x0, 0x1000)
and unless I'm misreading what you are printing there, you have "./bus\0"
passed as the second argument. Right? That's pretty much why I asked about
other possible calls triggering it...

If you are somehow getting there with empty name and if there's another
thread closing these memfd descriptors, I understand what's going on there.
It's how we are getting that empty name on your syscall arguments that
looks very odd...

Dmitry Vyukov

Mar 5, 2017, 12:33:39 PM
to Al Viro, linux-...@vger.kernel.org, LKML, syzkaller
Added more debug output.

name_to_handle_at(r4, &(0x7f0000003000-0x6)="2e2f62757300",
&(0x7f0000003000-0xd)={0xc, 0x0, "cd21"}, &(0x7f0000002000)=0x0,
0x1000)

actually passes name="" because of the overlapping addresses. Flags
contain AT_EMPTY_PATH.
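
The overlap can be replayed byte by byte. In the program, the name is placed at 0x...3000-0x6 and the file handle at 0x...3000-0xd, so the 4-byte handle_type field (at offset 4 of the handle) ends exactly on the first byte of the name. The sketch below is an illustration only: it assumes the name is written first and the handle second (argument order), and the function name replay_argument_writes is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets from the syzkaller program, both relative to the same address. */
#define NAME_OFF_FROM_END   0x6
#define HANDLE_OFF_FROM_END 0xd

/*
 * Replay the two argument writes into buf, whose end models address
 * 0x...3000, and return the pointer the kernel would read the name from.
 * Assumption: the name is written before the handle.
 */
const char *replay_argument_writes(unsigned char *buf, size_t len)
{
	static const char name[] = "./bus";              /* "2e2f62757300" */
	static const unsigned char f_handle[] = { 0xcd, 0x21 };
	const uint32_t handle_bytes = 0xc;               /* file_handle.handle_bytes */
	const int32_t handle_type = 0x0;                 /* file_handle.handle_type */
	unsigned char *name_p, *handle_p;

	assert(buf != NULL && len >= HANDLE_OFF_FROM_END);

	memset(buf, 0, len);
	name_p = buf + len - NAME_OFF_FROM_END;
	handle_p = buf + len - HANDLE_OFF_FROM_END;

	/* First write: the 6 bytes of "./bus" including the NUL. */
	memcpy(name_p, name, sizeof(name));

	/* Second write: the handle; handle_type covers name_p[0]. */
	memcpy(handle_p, &handle_bytes, sizeof(handle_bytes));
	memcpy(handle_p + 4, &handle_type, sizeof(handle_type));
	memcpy(handle_p + 8, f_handle, sizeof(f_handle));

	return (const char *)name_p;
}
```

After the second write the first byte of the name is the last byte of handle_type, which is zero, so the kernel sees an empty string. The last argument, 0x1000, is the value of AT_EMPTY_PATH, which makes an empty name acceptable.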

Dmitry Vyukov

Mar 5, 2017, 12:38:47 PM
to Al Viro, linux-...@vger.kernel.org, LKML, syzkaller
The problem can be more general as a bunch of xxxat calls support AT_EMPTY_PATH.
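
As an illustration of another *at call that accepts this combination, the sketch below passes a memfd together with an empty name and AT_EMPTY_PATH to fstatat(). This is only a demonstration that such a call is valid from userspace, not a reproducer for the race; the helper name stat_memfd_via_empty_path is made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallback for headers that do not expose it; 0x1000 as in the thread. */
#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000
#endif

/*
 * stat a freshly created memfd through the "empty path" form of fstatat().
 * Returns 0 on success, -1 on failure (errno is set by the failing call).
 * (Hypothetical helper, for illustration only.)
 */
int stat_memfd_via_empty_path(struct stat *st)
{
	int fd, ret;

	assert(st != NULL);

	/* Raw syscall so the sketch does not depend on the glibc wrapper. */
	fd = (int)syscall(SYS_memfd_create, "demo", 0);
	if (fd < 0)
		return -1;

	/* Empty name + AT_EMPTY_PATH: operate on the object fd refers to. */
	ret = fstatat(fd, "", st, AT_EMPTY_PATH);

	close(fd);
	return ret;
}
```

With an empty name there is nothing to look up relative to the descriptor, so the descriptor does not need to refer to a directory.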

Al Viro

Mar 5, 2017, 2:18:06 PM
to Dmitry Vyukov, linux-...@vger.kernel.org, LKML, syzkaller
On Sun, Mar 05, 2017 at 06:33:18PM +0100, Dmitry Vyukov wrote:

> Added more debug output.
>
> name_to_handle_at(r4, &(0x7f0000003000-0x6)="2e2f62757300",
> &(0x7f0000003000-0xd)={0xc, 0x0, "cd21"}, &(0x7f0000002000)=0x0,
> 0x1000)
>
> actually passes name="" because of the overlapping addresses. Flags
> contain AT_EMPTY_PATH.

Bloody hell... So you end up with name == (char *)&handle->handle_type + 3?
Looks like it would be a lot more useful to dump the actual contents of
those suckers right before the syscall...

Anyway, that explains WTF is going on. The bug is in path_init() and
it triggers when you pass something with dentry allocated by d_alloc_pseudo()
as dfd, combined with empty pathname. You need to have the file closed
by another thread, and have that another thread get out of closing syscall
(close(), dup2(), etc.) before the caller of path_init() gets to
complete_walk(). We need to make sure that this sucker gets DCACHE_RCUACCESS
while it's still guaranteed to be pinned down. Could you try to reproduce
with the patch below applied?

diff --git a/fs/namei.c b/fs/namei.c
index 6f7d96368734..70840281a41c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2226,11 +2226,16 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 		nd->path = f.file->f_path;
 		if (flags & LOOKUP_RCU) {
 			rcu_read_lock();
-			nd->inode = nd->path.dentry->d_inode;
-			nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);
+			if (unlikely(!(dentry->d_flags & DCACHE_RCUACCESS))) {
+				spin_lock(&dentry->d_lock);
+				dentry->d_flags |= DCACHE_RCUACCESS;
+				spin_unlock(&dentry->d_lock);
+			}
+			nd->inode = dentry->d_inode;
+			nd->seq = read_seqcount_begin(&dentry->d_seq);
 		} else {
 			path_get(&nd->path);
-			nd->inode = nd->path.dentry->d_inode;
+			nd->inode = dentry->d_inode;
 		}
 		fdput(f);
 		return s;
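
The ordering problem and the effect of the patch can be modelled in a few lines of single-threaded C. This is a toy model, not kernel code: every toy_* name is made up, the "grace period" is reduced to a reader counter, and the concurrent close is replayed as a plain function call at the point where it would have to happen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_RCUACCESS 0x80u /* stands in for DCACHE_RCUACCESS */

struct toy_dentry {
	unsigned int flags;
	int refcount;
	bool freed;        /* memory has been returned to the allocator */
	bool free_pending; /* free is queued until readers are done */
};

static int toy_readers;                 /* active "RCU read sections" */
static struct toy_dentry *toy_deferred; /* at most one queued object here */

void toy_rcu_read_lock(void)
{
	toy_readers++;
}

void toy_rcu_read_unlock(void)
{
	assert(toy_readers > 0);
	/* The last reader leaving ends the toy grace period. */
	if (--toy_readers == 0 && toy_deferred != NULL) {
		toy_deferred->freed = true;
		toy_deferred->free_pending = false;
		toy_deferred = NULL;
	}
}

/* Drop a reference; free at once unless marked and a reader is active. */
void toy_dput(struct toy_dentry *d)
{
	assert(d != NULL && d->refcount > 0);
	if (--d->refcount > 0)
		return;
	if (!(d->flags & TOY_RCUACCESS) || toy_readers == 0) {
		d->freed = true;
	} else {
		d->free_pending = true;
		toy_deferred = d;
	}
}

/*
 * Replay the sequence from the thread. Returns true if the walker ends
 * up touching an already freed object, i.e. the use-after-free.
 */
bool toy_replay(bool mark_rcuaccess_while_pinned)
{
	/* refcount 1 models the reference held through the open file. */
	struct toy_dentry d = { .flags = 0, .refcount = 1 };
	bool uaf;

	/* path_init() in RCU mode: no reference of its own is taken. */
	toy_rcu_read_lock();
	if (mark_rcuaccess_while_pinned)
		d.flags |= TOY_RCUACCESS; /* what the patch adds */

	/* Another thread closes the memfd: the final dput(). */
	toy_dput(&d);

	/* complete_walk() touches the dentry to legitimize it. */
	uaf = d.freed;

	toy_rcu_read_unlock();
	return uaf;
}
```

In this model toy_replay(false) reports a use-after-free and toy_replay(true) does not, because marking the object while it is still pinned turns the immediate free into a deferred one.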

Dmitry Vyukov

Mar 6, 2017, 4:47:11 AM
to Al Viro, linux-...@vger.kernel.org, LKML, syzkaller
On Sun, Mar 5, 2017 at 8:18 PM, Al Viro <vi...@zeniv.linux.org.uk> wrote:
> On Sun, Mar 05, 2017 at 06:33:18PM +0100, Dmitry Vyukov wrote:
>
>> Added more debug output.
>>
>> name_to_handle_at(r4, &(0x7f0000003000-0x6)="2e2f62757300",
>> &(0x7f0000003000-0xd)={0xc, 0x0, "cd21"}, &(0x7f0000002000)=0x0,
>> 0x1000)
>>
>> actually passes name="" because of the overlapping addresses. Flags
>> contain AT_EMPTY_PATH.
>
> Bloody hell... So you end up with name == (char *)&handle->handle_type + 3?
> Looks like it would be a lot more useful to dump the actual contents of
> those suckers right before the syscall...

We can't do dumping yet: it's the opposite of generation, and we don't
have enough info for it. Strace can do it. But note that it does not
necessarily tell you the truth. First, the kernel can overwrite some of
the inputs with copy_to_user before reading them. Second, racing syscalls
that use the same memory for inputs will lead to non-deterministic inputs,
so what you see from strace is not necessarily what the kernel sees.

The patch seems to fix the crash. The reproducer has survived for an hour,
while it usually crashes within 5 minutes or so.

But we will get back to you with data race reports later. All unprotected
accesses should use READ_ONCE/WRITE_ONCE.

Dmitry Vyukov

Mar 23, 2017, 10:17:43 AM
to Al Viro, linux-...@vger.kernel.org, LKML, syzkaller
Al, please send this patch officially. I have been running with it since
then and have not seen the crashes, nor any other issues that look related.

Thanks!

Dmitry Vyukov

Apr 28, 2017, 2:19:34 AM
to Al Viro, linux-...@vger.kernel.org, LKML, syzkaller
Al, ping. Please send this patch.

Dmitry Vyukov

May 29, 2017, 10:48:38 AM
to Al Viro, linux-...@vger.kernel.org, LKML, syzkaller, Cong Wang
Al, do you want me to mail the patch?
I won't be able to write a super detailed description, but I can do
some format patch.

Al Viro

May 30, 2017, 2:24:57 AM
to Dmitry Vyukov, linux-...@vger.kernel.org, LKML, syzkaller, Cong Wang
On Mon, May 29, 2017 at 04:48:17PM +0200, Dmitry Vyukov wrote:

> Al, do you want me to mail the patch?
> I won't be able to write a super detailed description, but I can do
> some format patch.

It's been fixed by commit c0eb027e5aef7; if you are still able to
trigger it on the current mainline, please yell - that would have
to be something different.

Dmitry Vyukov

May 30, 2017, 4:20:06 AM
to Al Viro, linux-...@vger.kernel.org, LKML, syzkaller, Cong Wang
Thanks for the update!

No, I can't say that I still see this. It happened very infrequently,
so I wanted to make sure that we don't lose it regardless of whether I
see it or not.