KASAN: use-after-free Read in percpu_ref_switch_to_atomic_rcu

20 views
Skip to first unread message

syzbot

unread,
Feb 7, 2020, 3:14:11 AM2/7/20
to b...@alien8.de, h...@zytor.com, linux-...@vger.kernel.org, mi...@redhat.com, pet...@infradead.org, syzkall...@googlegroups.com, tg...@linutronix.de, tony...@intel.com, x...@kernel.org
Hello,

syzbot found the following crash on:

HEAD commit: 4c7d00cc Merge tag 'pwm/for-5.6-rc1' of git://git.kernel.o..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=12fec785e00000
kernel config: https://syzkaller.appspot.com/x/.config?x=e162021ddededa72
dashboard link: https://syzkaller.appspot.com/bug?extid=e017e49c39ab484ac87a
compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+e017e4...@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: use-after-free in percpu_ref_call_confirm_rcu lib/percpu-refcount.c:126 [inline]
BUG: KASAN: use-after-free in percpu_ref_switch_to_atomic_rcu+0x3f7/0x400 lib/percpu-refcount.c:165
Read of size 1 at addr ffff8880a8d91830 by task swapper/0/0

CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.5.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1fb/0x318 lib/dump_stack.c:118
print_address_description+0x74/0x5c0 mm/kasan/report.c:374
__kasan_report+0x149/0x1c0 mm/kasan/report.c:506
kasan_report+0x26/0x50 mm/kasan/common.c:641
__asan_report_load1_noabort+0x14/0x20 mm/kasan/generic_report.c:132
percpu_ref_call_confirm_rcu lib/percpu-refcount.c:126 [inline]
percpu_ref_switch_to_atomic_rcu+0x3f7/0x400 lib/percpu-refcount.c:165
rcu_do_batch kernel/rcu/tree.c:2186 [inline]
rcu_core+0x81b/0x10c0 kernel/rcu/tree.c:2410
rcu_core_si+0x9/0x10 kernel/rcu/tree.c:2419
__do_softirq+0x283/0x7bd kernel/softirq.c:292
invoke_softirq kernel/softirq.c:373 [inline]
irq_exit+0x227/0x230 kernel/softirq.c:413
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x113/0x280 arch/x86/kernel/apic/apic.c:1137
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829
</IRQ>
RIP: 0010:native_safe_halt+0x12/0x20 arch/x86/include/asm/irqflags.h:61
Code: 89 d9 80 e1 07 80 c1 03 38 c1 7c ba 48 89 df e8 e4 5f 9d f9 eb b0 cc cc 55 48 89 e5 e9 07 00 00 00 0f 00 2d 62 17 4c 00 fb f4 <5d> c3 66 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 e9 07 00 00
RSP: 0018:ffffffff89207db8 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
RAX: 1ffffffff1255a25 RBX: ffffffff89275b00 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: ffffffff812c06aa RDI: ffffffff89276344
RBP: ffffffff89207db8 R08: ffffffff89276358 R09: fffffbfff124eb61
R10: fffffbfff124eb61 R11: 0000000000000000 R12: 1ffffffff124eb60
R13: dffffc0000000000 R14: dffffc0000000000 R15: 1ffffffff1255a23
arch_safe_halt arch/x86/include/asm/paravirt.h:144 [inline]
default_idle+0x50/0x70 arch/x86/kernel/process.c:695
arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:686
default_idle_call+0x59/0xa0 kernel/sched/idle.c:94
cpuidle_idle_call kernel/sched/idle.c:154 [inline]
do_idle+0x1ec/0x630 kernel/sched/idle.c:269
cpu_startup_entry+0x25/0x30 kernel/sched/idle.c:361
rest_init+0x29d/0x2b0 init/main.c:450
arch_call_rest_init+0xe/0x10
start_kernel+0x676/0x777 init/main.c:784
x86_64_start_reservations+0x18/0x2e arch/x86/kernel/head64.c:490
x86_64_start_kernel+0x7a/0x7d arch/x86/kernel/head64.c:471
secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:242

Allocated by task 25166:
save_stack mm/kasan/common.c:72 [inline]
set_track mm/kasan/common.c:80 [inline]
__kasan_kmalloc+0x118/0x1c0 mm/kasan/common.c:515
kasan_kmalloc+0x9/0x10 mm/kasan/common.c:529
kmem_cache_alloc_trace+0x221/0x2f0 mm/slab.c:3551
kmalloc include/linux/slab.h:555 [inline]
kzalloc include/linux/slab.h:669 [inline]
io_sqe_files_register fs/io_uring.c:5528 [inline]
__io_uring_register fs/io_uring.c:6875 [inline]
__do_sys_io_uring_register fs/io_uring.c:6955 [inline]
__se_sys_io_uring_register+0x1df4/0x3260 fs/io_uring.c:6937
__x64_sys_io_uring_register+0x9b/0xb0 fs/io_uring.c:6937
do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 25160:
save_stack mm/kasan/common.c:72 [inline]
set_track mm/kasan/common.c:80 [inline]
kasan_set_free_info mm/kasan/common.c:337 [inline]
__kasan_slab_free+0x12e/0x1e0 mm/kasan/common.c:476
kasan_slab_free+0xe/0x10 mm/kasan/common.c:485
__cache_free mm/slab.c:3426 [inline]
kfree+0x10d/0x220 mm/slab.c:3757
io_sqe_files_unregister+0x238/0x2b0 fs/io_uring.c:5250
io_ring_ctx_free fs/io_uring.c:6229 [inline]
io_ring_ctx_wait_and_kill+0x343d/0x3b00 fs/io_uring.c:6310
io_uring_release+0x5d/0x70 fs/io_uring.c:6318
__fput+0x2e4/0x740 fs/file_table.c:280
____fput+0x15/0x20 fs/file_table.c:313
task_work_run+0x176/0x1b0 kernel/task_work.c:113
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:164 [inline]
prepare_exit_to_usermode+0x480/0x5b0 arch/x86/entry/common.c:195
syscall_return_slowpath+0x113/0x4a0 arch/x86/entry/common.c:278
do_syscall_64+0x11f/0x1c0 arch/x86/entry/common.c:304
entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8880a8d91800
which belongs to the cache kmalloc-256 of size 256
The buggy address is located 48 bytes inside of
256-byte region [ffff8880a8d91800, ffff8880a8d91900)
The buggy address belongs to the page:
page:ffffea0002a36440 refcount:1 mapcount:0 mapping:ffff8880aa4008c0 index:0x0
flags: 0xfffe0000000200(slab)
raw: 00fffe0000000200 ffffea000265d2c8 ffffea00022f7948 ffff8880aa4008c0
raw: 0000000000000000 ffff8880a8d91000 0000000100000008 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff8880a8d91700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8880a8d91780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8880a8d91800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8880a8d91880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880a8d91900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

Dmitry Vyukov

unread,
Mar 4, 2020, 2:59:42 AM3/4/20
to syzbot, Al Viro, io-u...@vger.kernel.org, linux-fsdevel, Jens Axboe, Borislav Petkov, H. Peter Anvin, LKML, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, tony...@intel.com, the arch/x86 maintainers
On Fri, Feb 7, 2020 at 9:14 AM syzbot
<syzbot+e017e4...@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit: 4c7d00cc Merge tag 'pwm/for-5.6-rc1' of git://git.kernel.o..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=12fec785e00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=e162021ddededa72
> dashboard link: https://syzkaller.appspot.com/bug?extid=e017e49c39ab484ac87a
> compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
>
> Unfortunately, I don't have any reproducer for this crash yet.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+e017e4...@syzkaller.appspotmail.com

+io_uring maintainers

Here is a repro:
https://gist.githubusercontent.com/dvyukov/6b340beab6483a036f4186e7378882ce/raw/cd1922185516453c201df8eded1d4b006a6d6a3a/gistfile1.txt
> --
> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bug...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/00000000000067c6df059df7f9f5%40google.com.

Jens Axboe

unread,
Mar 4, 2020, 9:40:06 AM3/4/20
to Dmitry Vyukov, syzbot, Al Viro, io-u...@vger.kernel.org, linux-fsdevel, Borislav Petkov, H. Peter Anvin, LKML, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, tony...@intel.com, the arch/x86 maintainers
On 3/4/20 12:59 AM, Dmitry Vyukov wrote:
> On Fri, Feb 7, 2020 at 9:14 AM syzbot
> <syzbot+e017e4...@syzkaller.appspotmail.com> wrote:
>>
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit: 4c7d00cc Merge tag 'pwm/for-5.6-rc1' of git://git.kernel.o..
>> git tree: upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=12fec785e00000
>> kernel config: https://syzkaller.appspot.com/x/.config?x=e162021ddededa72
>> dashboard link: https://syzkaller.appspot.com/bug?extid=e017e49c39ab484ac87a
>> compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
>>
>> Unfortunately, I don't have any reproducer for this crash yet.
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+e017e4...@syzkaller.appspotmail.com
>
> +io_uring maintainers
>
> Here is a repro:
> https://gist.githubusercontent.com/dvyukov/6b340beab6483a036f4186e7378882ce/raw/cd1922185516453c201df8eded1d4b006a6d6a3a/gistfile1.txt

I've queued up a fix for this:

https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.6&id=9875fe3dc4b8cff1f1b440fb925054a5124403c3

--
Jens Axboe

Dan Carpenter

unread,
Mar 6, 2020, 9:36:14 AM3/6/20
to Jens Axboe, Dmitry Vyukov, syzbot, Al Viro, io-u...@vger.kernel.org, linux-fsdevel, Borislav Petkov, H. Peter Anvin, LKML, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, tony...@intel.com, the arch/x86 maintainers

There a bunch of similar bugs. It's seems a common anti-pattern.

block/blk-cgroup.c:85 blkg_free() warn: freeing 'blkg' which has percpu_ref_exit()
block/blk-core.c:558 blk_alloc_queue_node() warn: freeing 'q' which has percpu_ref_exit()
drivers/md/md.c:5528 md_free() warn: freeing 'mddev' which has percpu_ref_exit()
drivers/target/target_core_transport.c:583 transport_free_session() warn: freeing 'se_sess' which has percpu_ref_exit()
fs/aio.c:592 free_ioctx() warn: freeing 'ctx' which has percpu_ref_exit()
fs/aio.c:806 ioctx_alloc() warn: freeing 'ctx' which has percpu_ref_exit()
fs/io_uring.c:6115 io_sqe_files_unregister() warn: freeing 'data' which has percpu_ref_exit()
fs/io_uring.c:6431 io_sqe_files_register() warn: freeing 'ctx->file_data' which has percpu_ref_exit()
fs/io_uring.c:7134 io_ring_ctx_free() warn: freeing 'ctx' which has percpu_ref_exit()
kernel/cgroup/cgroup.c:4948 css_free_rwork_fn() warn: freeing 'css' which has percpu_ref_exit()
mm/backing-dev.c:615 cgwb_create() warn: freeing 'wb' which has percpu_ref_exit()

regards,
dan carpenter

Jens Axboe

unread,
Mar 6, 2020, 9:57:11 AM3/6/20
to Dan Carpenter, Dmitry Vyukov, syzbot, Al Viro, io-u...@vger.kernel.org, linux-fsdevel, Borislav Petkov, H. Peter Anvin, LKML, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, tony...@intel.com, the arch/x86 maintainers
The file table io_uring issue is using the ref in a funky way, switching
in and out of atomic if we need to quiesce it. That's different from
other use cases, that just use it as a "normal" reference. Hence for the
funky use case, you can potentially have a switch in progress when you
exit the ref. You really want to wait for that, the easiest solution is
to punt the exit + free to an RCU callback, if there's nothing else you
need to handle once the switch is done.

So I would not be so quick to assume that similar patterns (exit + free)
have similar issues.

--
Jens Axboe

Jann Horn

unread,
Mar 6, 2020, 9:58:24 AM3/6/20
to Jens Axboe, Paul E . McKenney, Dmitry Vyukov, syzbot, Al Viro, io-uring, linux-fsdevel, Borislav Petkov, H. Peter Anvin, LKML, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, tony...@intel.com, the arch/x86 maintainers, Dan Carpenter
+paulmck
I believe that this fix relies on call_rcu() having FIFO ordering; but
<https://www.kernel.org/doc/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html#Callback%20Registry>
says:

| call_rcu() normally acts only on CPU-local state[...] It simply
enqueues the rcu_head structure on a per-CPU list,

Is this fix really correct?

Jens Axboe

unread,
Mar 6, 2020, 10:34:32 AM3/6/20
to Jann Horn, Paul E . McKenney, Dmitry Vyukov, syzbot, Al Viro, io-uring, linux-fsdevel, Borislav Petkov, H. Peter Anvin, LKML, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, tony...@intel.com, the arch/x86 maintainers, Dan Carpenter
That's a good point, there's a potentially stronger guarantee we need
here that isn't "nobody is inside an RCU critical section", but rather
that we're depending on a previous call_rcu() to have happened. Hence I
think you are right - it'll shrink the window drastically, since the
previous callback is already queued up, but it's not a full close.

Hmm...

--
Jens Axboe

Jann Horn

unread,
Mar 6, 2020, 10:36:48 AM3/6/20
to Jens Axboe, Paul E . McKenney, Dmitry Vyukov, syzbot, Al Viro, io-uring, linux-fsdevel, Borislav Petkov, H. Peter Anvin, LKML, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, tony...@intel.com, the arch/x86 maintainers, Dan Carpenter
You could potentially hack up the semantics you want by doing a
call_rcu() whose callback does another call_rcu(), or something like
that - but I'd like to hear paulmck's opinion on this first.

Paul E. McKenney

unread,
Mar 6, 2020, 11:44:44 AM3/6/20
to Jann Horn, Jens Axboe, Dmitry Vyukov, syzbot, Al Viro, io-uring, linux-fsdevel, Borislav Petkov, H. Peter Anvin, LKML, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, tony...@intel.com, the arch/x86 maintainers, Dan Carpenter
Indeed. For but one example, if there was a CPU-to-CPU migration between
the two call_rcu() invocations, it would not be at all surprising for
the two callbacks to execute out of order.

> > > Is this fix really correct?
> >
> > That's a good point, there's a potentially stronger guarantee we need
> > here that isn't "nobody is inside an RCU critical section", but rather
> > that we're depending on a previous call_rcu() to have happened. Hence I
> > think you are right - it'll shrink the window drastically, since the
> > previous callback is already queued up, but it's not a full close.
> >
> > Hmm...
>
> You could potentially hack up the semantics you want by doing a
> call_rcu() whose callback does another call_rcu(), or something like
> that - but I'd like to hear paulmck's opinion on this first.

That would work!

Or, alternatively, do an rcu_barrier() between the two calls to
call_rcu(), assuming that the use case can tolerate rcu_barrier()
overhead and latency.

Thanx, Paul

Jens Axboe

unread,
Mar 6, 2020, 12:00:23 PM3/6/20
to pau...@kernel.org, Jann Horn, Dmitry Vyukov, syzbot, Al Viro, io-uring, linux-fsdevel, Borislav Petkov, H. Peter Anvin, LKML, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, tony...@intel.com, the arch/x86 maintainers, Dan Carpenter
If the nested call_rcu() works, that seems greatly preferable to needing
the rcu_barrier(), even if that would not be a showstopper for me. The
nested call_rcu() is just a bit odd, but with a comment it should be OK.

Incremental here I'm going to test, would just fold in of course.


diff --git a/fs/io_uring.c b/fs/io_uring.c
index f3218fc81943..95ba95b4d8ec 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5330,7 +5330,7 @@ static void io_file_ref_kill(struct percpu_ref *ref)
complete(&data->done);
}

-static void io_file_ref_exit_and_free(struct rcu_head *rcu)
+static void __io_file_ref_exit_and_free(struct rcu_head *rcu)
{
struct fixed_file_data *data = container_of(rcu, struct fixed_file_data,
rcu);
@@ -5338,6 +5338,18 @@ static void io_file_ref_exit_and_free(struct rcu_head *rcu)
kfree(data);
}

+static void io_file_ref_exit_and_free(struct rcu_head *rcu)
+{
+ /*
+ * We need to order our exit+free call again the potentially
+ * existing call_rcu() for switching to atomic. One way to do that
+ * is to have this rcu callback queue the final put and free, as we
+ * could otherwise a pre-existing atomic switch complete _after_
+ * the free callback we queued.
+ */
+ call_rcu(rcu, __io_file_ref_exit_and_free);
+}
+
static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
{
struct fixed_file_data *data = ctx->file_data;

--
Jens Axboe

Jens Axboe

unread,
Mar 6, 2020, 12:08:16 PM3/6/20
to pau...@kernel.org, Jann Horn, Dmitry Vyukov, syzbot, Al Viro, io-uring, linux-fsdevel, Borislav Petkov, H. Peter Anvin, LKML, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, tony...@intel.com, the arch/x86 maintainers, Dan Carpenter
Been running for a few minutes just fine, I'm going to leave the
reproducer beating on it for a few hours. But here's the folded in
final:

https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.6&id=fae702294a6a0774ceb3cf250be79e7fe207250a

--
Jens Axboe

Paul E. McKenney

unread,
Mar 6, 2020, 12:19:13 PM3/6/20
to Jens Axboe, Jann Horn, Dmitry Vyukov, syzbot, Al Viro, io-uring, linux-fsdevel, Borislav Petkov, H. Peter Anvin, LKML, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, tony...@intel.com, the arch/x86 maintainers, Dan Carpenter
Looks good to me!

Thanx, Paul

Jens Axboe

unread,
Mar 6, 2020, 12:21:34 PM3/6/20
to pau...@kernel.org, Jann Horn, Dmitry Vyukov, syzbot, Al Viro, io-uring, linux-fsdevel, Borislav Petkov, H. Peter Anvin, LKML, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner, tony...@intel.com, the arch/x86 maintainers, Dan Carpenter
Thanks Paul! I talked to Paul in private, but here's the final version
with updated comment and attributions.

https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.6&id=c1e2148f8ecb26863b899d402a823dab8e26efd1

--
Jens Axboe

Reply all
Reply to author
Forward
0 new messages