KASAN: stack-out-of-bounds Read in csd_lock_record


syzbot

Jul 3, 2020, 7:31:23 PM
to big...@linutronix.de, linux-...@vger.kernel.org, mi...@kernel.org, pau...@kernel.org, pet...@infradead.org, syzkall...@googlegroups.com, tg...@linutronix.de
Hello,

syzbot found the following crash on:

HEAD commit: 9e50b94b Add linux-next specific files for 20200703
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1024b405100000
kernel config: https://syzkaller.appspot.com/x/.config?x=f99cc0faa1476ed6
dashboard link: https://syzkaller.appspot.com/bug?extid=0f719294463916a3fc0e
compiler: gcc (GCC) 10.1.0-syz 20200507
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16dc490f100000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+0f7192...@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: stack-out-of-bounds in csd_lock_record+0xcb/0xe0 kernel/smp.c:118
Read of size 8 at addr ffffc90001727710 by task syz-executor.0/10721

CPU: 1 PID: 10721 Comm: syz-executor.0 Not tainted 5.8.0-rc3-next-20200703-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x18f/0x20d lib/dump_stack.c:118
print_address_description.constprop.0.cold+0x5/0x436 mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
csd_lock_record+0xcb/0xe0 kernel/smp.c:118
flush_smp_call_function_queue+0x285/0x730 kernel/smp.c:391
__sysvec_call_function_single+0x98/0x490 arch/x86/kernel/smp.c:248
asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:706
</IRQ>
__run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline]
run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline]
sysvec_call_function_single+0xe0/0x120 arch/x86/kernel/smp.c:243
asm_sysvec_call_function_single+0x12/0x20 arch/x86/include/asm/idtentry.h:604
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:765 [inline]
RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 [inline]
RIP: 0010:_raw_spin_unlock_irqrestore+0x8c/0xe0 kernel/locking/spinlock.c:191
Code: 48 c7 c0 00 ff b4 89 48 ba 00 00 00 00 00 fc ff df 48 c1 e8 03 80 3c 10 00 75 37 48 83 3d 9b 74 c8 01 00 74 22 48 89 df 57 9d <0f> 1f 44 00 00 bf 01 00 00 00 e8 95 fb 62 f9 65 8b 05 fe 73 15 78
RSP: 0018:ffffc900016e7558 EFLAGS: 00000282
RAX: 1ffffffff1369fe0 RBX: 0000000000000282 RCX: 0000000000000000
RDX: dffffc0000000000 RSI: 0000000000000000 RDI: 0000000000000282
RBP: ffffffff8cb02508 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 1ffffffff19604a0
R13: 0000000000000000 R14: dead000000000100 R15: dffffc0000000000
__debug_check_no_obj_freed lib/debugobjects.c:977 [inline]
debug_check_no_obj_freed+0x20c/0x41c lib/debugobjects.c:998
free_pages_prepare mm/page_alloc.c:1219 [inline]
__free_pages_ok+0x20b/0xc90 mm/page_alloc.c:1471
release_pages+0x5ec/0x17a0 mm/swap.c:880
tlb_batch_pages_flush mm/mmu_gather.c:49 [inline]
tlb_flush_mmu_free mm/mmu_gather.c:242 [inline]
tlb_flush_mmu+0xe9/0x6b0 mm/mmu_gather.c:249
zap_pte_range mm/memory.c:1155 [inline]
zap_pmd_range mm/memory.c:1193 [inline]
zap_pud_range mm/memory.c:1222 [inline]
zap_p4d_range mm/memory.c:1243 [inline]
unmap_page_range+0x1e22/0x2b20 mm/memory.c:1264
unmap_single_vma+0x198/0x300 mm/memory.c:1309
unmap_vmas+0x16f/0x2f0 mm/memory.c:1341
exit_mmap+0x2b1/0x530 mm/mmap.c:3165
__mmput+0x122/0x470 kernel/fork.c:1075
mmput+0x53/0x60 kernel/fork.c:1096
exit_mm kernel/exit.c:483 [inline]
do_exit+0xa8f/0x2a40 kernel/exit.c:793
do_group_exit+0x125/0x310 kernel/exit.c:904
get_signal+0x40b/0x1ee0 kernel/signal.c:2743
do_signal+0x82/0x2520 arch/x86/kernel/signal.c:810
exit_to_usermode_loop arch/x86/entry/common.c:218 [inline]
__prepare_exit_to_usermode+0x156/0x1f0 arch/x86/entry/common.c:252
do_syscall_64+0x6c/0xe0 arch/x86/entry/common.c:376
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45cb29
Code: Bad RIP value.
RSP: 002b:00007fb154b96cf8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: 0000000000000001 RBX: 000000000078bf08 RCX: 000000000045cb29
RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 000000000078bf0c
RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000078bf0c
R13: 00007ffd3933f26f R14: 00007fb154b979c0 R15: 000000000078bf0c


Memory state around the buggy address:
ffffc90001727600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffffc90001727680: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00 00
>ffffc90001727700: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
                         ^
ffffc90001727780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffffc90001727800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

syzbot

Jul 3, 2020, 8:48:21 PM
to big...@linutronix.de, linux-...@vger.kernel.org, mi...@kernel.org, pau...@kernel.org, pet...@infradead.org, syzkall...@googlegroups.com, tg...@linutronix.de
syzbot has found a reproducer for the following crash on:

HEAD commit: 9e50b94b Add linux-next specific files for 20200703
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1224dc83100000
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=170442d5100000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=162ef66d100000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+0f7192...@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: stack-out-of-bounds in csd_lock_record+0xd2/0xe0 kernel/smp.c:119
Read of size 8 at addr ffffc900016d75f8 by task swapper/1/0

CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.8.0-rc3-next-20200703-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x18f/0x20d lib/dump_stack.c:118
print_address_description.constprop.0.cold+0x5/0x436 mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
csd_lock_record+0xd2/0xe0 kernel/smp.c:119
flush_smp_call_function_queue+0x285/0x730 kernel/smp.c:391
__sysvec_call_function_single+0x98/0x490 arch/x86/kernel/smp.c:248
asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:706
</IRQ>
__run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline]
run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline]
sysvec_call_function_single+0xe0/0x120 arch/x86/kernel/smp.c:243
asm_sysvec_call_function_single+0x12/0x20 arch/x86/include/asm/idtentry.h:604
RIP: 0010:native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:61
Code: ff 4c 89 ef e8 33 30 c7 f9 e9 8e fe ff ff 48 89 df e8 26 30 c7 f9 eb 8a cc cc cc cc e9 07 00 00 00 0f 00 2d 14 4b 5c 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d 04 4b 5c 00 f4 c3 cc cc 55 53 e8 c9
RSP: 0018:ffffc90000d3fd18 EFLAGS: 00000293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff8880a95f0340 RSI: ffffffff87ec78c8 RDI: ffffffff87ec789e
RBP: ffff88821af4d864 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff88821af4d864
R13: 1ffff920001a7fad R14: ffff88821af4d865 R15: 0000000000000001
arch_safe_halt arch/x86/include/asm/paravirt.h:150 [inline]
acpi_safe_halt+0x8d/0x110 drivers/acpi/processor_idle.c:111
acpi_idle_do_entry+0x15c/0x1b0 drivers/acpi/processor_idle.c:525
acpi_idle_enter+0x3f9/0xab0 drivers/acpi/processor_idle.c:651
cpuidle_enter_state+0xff/0x960 drivers/cpuidle/cpuidle.c:235
cpuidle_enter+0x4a/0xa0 drivers/cpuidle/cpuidle.c:346
call_cpuidle kernel/sched/idle.c:126 [inline]
cpuidle_idle_call kernel/sched/idle.c:214 [inline]
do_idle+0x431/0x6d0 kernel/sched/idle.c:276
cpu_startup_entry+0x14/0x20 kernel/sched/idle.c:372
secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243


Memory state around the buggy address:
ffffc900016d7480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffffc900016d7500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffffc900016d7580: 00 00 00 00 f1 f1 f1 f1 00 00 00 00 f3 f3 f3 f3
                                                                ^
ffffc900016d7600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffffc900016d7680: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
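To read the "Memory state around the buggy address" dumps in these two reports, assume generic KASAN on x86_64 with the default shadow offset 0xdffffc0000000000 (the same constant that appears in RDX and R15 of the first report). Each shadow byte then covers 8 bytes of memory, and each dump row shows 16 shadow bytes, i.e. 128 bytes, labelled with the first address it describes. In generic KASAN, 0xf1 and 0xf3 mark the left and right redzones around a stack variable. The helper names below are hypothetical; this is a decoding sketch, not KASAN's own code.

```c
#include <stdint.h>
#include <string.h>

/* Assumed x86_64 generic-KASAN parameters (default shadow offset). */
#define SHADOW_SCALE_SHIFT 3                         /* 1 shadow byte per 8 bytes */
#define SHADOW_OFFSET      0xdffffc0000000000ULL
#define ROW_SHADOW_BYTES   16                        /* shadow bytes per dump row */
#define ROW_SPAN           (ROW_SHADOW_BYTES << SHADOW_SCALE_SHIFT) /* 128 bytes */

/* Address of the shadow byte that describes addr. */
static uint64_t shadow_addr(uint64_t addr)
{
	return (addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}

/* Dump rows are labelled with the first memory address they cover. */
static uint64_t row_label(uint64_t addr)
{
	return addr & ~(uint64_t)(ROW_SPAN - 1);
}

/* Position (0..15) of addr's shadow byte within its dump row. */
static unsigned row_index(uint64_t addr)
{
	return (unsigned)((addr - row_label(addr)) >> SHADOW_SCALE_SHIFT);
}

/* Rough meaning of the shadow values seen in these reports. */
static const char *shadow_meaning(uint8_t b)
{
	switch (b) {
	case 0x00: return "accessible";
	case 0xf1: return "stack left redzone";
	case 0xf2: return "stack mid redzone";
	case 0xf3: return "stack right redzone";
	default:   return b < 8 ? "partially accessible" : "other poison";
	}
}
```

Applied to the first report, 0xffffc90001727710 falls on index 2 of row ffffc90001727700, an f3 byte; in the second, 0xffffc900016d75f8 falls on index 15 of row ffffc900016d7580, also f3. Both reads therefore land in the right redzone of some stack object, which is consistent with csd_lock_record() reading a CSD whose stack frame had already been reused.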

Paul E. McKenney

Jul 4, 2020, 12:45:24 PM
to syzbot, big...@linutronix.de, linux-...@vger.kernel.org, mi...@kernel.org, pet...@infradead.org, syzkall...@googlegroups.com, tg...@linutronix.de
On Fri, Jul 03, 2020 at 04:31:22PM -0700, syzbot wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit: 9e50b94b Add linux-next specific files for 20200703
> git tree: linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1024b405100000
> kernel config: https://syzkaller.appspot.com/x/.config?x=f99cc0faa1476ed6
> dashboard link: https://syzkaller.appspot.com/bug?extid=0f719294463916a3fc0e
> compiler: gcc (GCC) 10.1.0-syz 20200507
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16dc490f100000
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+0f7192...@syzkaller.appspotmail.com

Good catch! A call to csd_lock_record() was on the wrong side of a
call to csd_unlock().
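To make the ordering concrete, here is a minimal userspace model; it is not the kernel's code. It assumes, as the explanation above implies, that csd_lock_record() dereferences the CSD, and that for a synchronous call the CSD lives on the sender's stack, so that memory can be reused as soon as csd_unlock() lets the sender return. All model_* names and the live flag are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's call_single_data. */
struct model_csd {
	void (*func)(void *info);
	void *info;
	bool locked;  /* models the CSD lock flag the sender spins on */
	bool live;    /* false once the sender's stack frame may be gone */
};

/* Per-CPU diagnostic state, loosely modelled on what csd_lock_record() keeps. */
static void (*recorded_func)(void *info);
static void *recorded_info;

/* Reads the CSD, so it is only safe while the CSD is still locked. */
static void model_lock_record(const struct model_csd *csd)
{
	assert(csd->live && "CSD read after unlock: sender's stack may be reused");
	recorded_func = csd->func;
	recorded_info = csd->info;
}

/* Releasing the lock lets the sender return, so its on-stack CSD dies here. */
static void model_unlock(struct model_csd *csd)
{
	csd->locked = false;
	csd->live = false;
}

/* Safe ordering for a synchronous call: record, run, and only then unlock. */
static void model_handle_sync(struct model_csd *csd)
{
	void (*func)(void *info) = csd->func;
	void *info = csd->info;

	model_lock_record(csd);  /* OK: the sender is still waiting on csd */
	func(info);
	model_unlock(csd);       /* csd must not be dereferenced after this */
	/* Calling model_lock_record(csd) here would be the reported bug. */
}

/* Example callback: increments the int that info points to. */
static void model_count(void *info)
{
	++*(int *)info;
}
```

With int n = 0 and a locked, live CSD whose func is model_count and whose info is &n, model_handle_sync() runs the callback once and leaves recorded_func pointing at model_count. Moving the model_lock_record() call below model_unlock() trips the assertion, the userspace analogue of the KASAN report.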

But it is folded into another commit for bisectability reasons, so
"Reported-by" would not make sense. I have instead added this to the
commit log:

[ paulmck: Fix for syzbot+0f7192...@syzkaller.appspotmail.com ]
Link: https://lore.kernel.org/lkml/00000000000042...@google.com
Link: https://lore.kernel.org/lkml/0000000000002e...@google.com

Thanx, Paul

Dmitry Vyukov

Jul 4, 2020, 2:34:15 PM
to Paul E. McKenney, syzbot, Sebastian Andrzej Siewior, LKML, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner
On Sat, Jul 4, 2020 at 6:45 PM Paul E. McKenney <pau...@kernel.org> wrote:
>
> Good catch! A call to csd_lock_record() was on the wrong side of a
> call to csd_unlock().

Thanks for taking a look.

> But it is folded into another commit for bisectability reasons, so
> "Reported-by" would not make sense. I have instead added this to the
> commit log:
>
> [ paulmck: Fix for syzbot+0f7192...@syzkaller.appspotmail.com ]
> Link: https://lore.kernel.org/lkml/00000000000042...@google.com
> Link: https://lore.kernel.org/lkml/0000000000002e...@google.com

This should work; as far as I remember, syzbot looks for the email+hash
anywhere in the commit.
FWIW, Tested-by can make sense as well.

Dmitry Vyukov

Jul 7, 2020, 11:52:01 AM
to Paul E. McKenney, syzbot, Sebastian Andrzej Siewior, LKML, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner
Paul, there is also a spike of stalls in smp_call_function;
see the top ones at:
https://syzkaller.appspot.com/upstream#open

Could these have the same root cause?
I am not sure which trees the bug was/is present in... This bug seems
to happen only on linux-next and nowhere else, but these stalls happen
equally on mainline...

Paul E. McKenney

Jul 7, 2020, 12:26:11 PM
to Dmitry Vyukov, syzbot, Sebastian Andrzej Siewior, LKML, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner
I would be surprised, given that the csd_unlock() was before the faulting
reference. But then again, I have been surprised before.

You aren't running scftorture with its longwait parameter set to a
non-zero value, are you? In that case, stalls are expected behavior.
This is to support testing of the CSD lock diagnostics in -rcu, which
aren't in mainline yet, so maybe I am asking a stupid question.

If these are repeatable, one thing to try is to build the kernel with
CSD_LOCK_WAIT_DEBUG=y. This requires c6c67d89c059 ("smp: Add source and
destination CPUs to __call_single_data") and 216d15e0d870 ("kernel/smp:
Provide CSD lock timeout diagnostics") from the -rcu tree's "dev" branch.
This will dump out the smp_call_function() function that was to be
invoked, on the off-chance that the problem is something like lock
contention in that function.
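A rough sketch of the idea behind such a timeout diagnostic: the waiter spins on the CSD lock and, if it is held too long, reports which function the target CPU was supposed to run. This is not the code from 216d15e0d870; the model_* names, the spin budget standing in for a real clock, and func_name standing in for symbolizing func are all hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of a CSD; func_name stands in for symbolizing func. */
struct model_csd {
	void (*func)(void *info);
	void *info;
	const char *func_name;
	volatile bool locked;   /* cleared by the target CPU when it is done */
};

/*
 * Spin until the target releases the CSD. After max_spins iterations, give
 * up and describe the function that was to be invoked, so a stall report
 * points at the handler (say, one stuck on lock contention) and not only
 * at the waiting CPU. Returns true if the lock was released in time.
 */
static bool model_csd_lock_wait(const struct model_csd *csd,
				unsigned long max_spins,
				char *report, size_t report_len)
{
	if (report_len > 0)
		report[0] = '\0';
	for (unsigned long i = 0; i < max_spins; i++) {
		if (!csd->locked)
			return true;
		/* Real code would relax the CPU and compare against a clock. */
	}
	if (report_len > 0)
		snprintf(report, report_len,
			 "CSD lock wait timed out: func=%s info=%p",
			 csd->func_name, csd->info);
	return false;
}
```

This sketch returns false after reporting so that it can be tested deterministically; the actual -rcu patches may keep waiting after printing their diagnostic.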

Dmitry Vyukov

Jul 9, 2020, 6:13:57 AM
to Paul E. McKenney, syzbot, Sebastian Andrzej Siewior, LKML, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner
Yes, it seems unrelated.
It looks like something broke in the kernel recently: now, instead
of diagnosing a stall on one CPU, it diagnoses it as a stall in
smp_call_function on another CPU. This produces a large number of
assorted stall reports that are not very actionable...


> You aren't running scftorture with its longwait parameter set to a
> non-zero value, are you? In that case, stalls are expected behavior.
> This is to support testing of the CSD lock diagnostics in -rcu, which
> aren't in mainline yet, so maybe I am asking a stupid question.

Since I don't know what scftorture/longwait is, I guess I am not running it :)

> If these are repeatable, one thing to try is to build the kernel with
> CSD_LOCK_WAIT_DEBUG=y. This requires c6c67d89c059 ("smp: Add source and
> destination CPUs to __call_single_data") and 216d15e0d870 ("kernel/smp:
> Provide CSD lock timeout diagnostics") from the -rcu tree's "dev" branch.
> This will dump out the smp_call_function() function that was to be
> invoked, on the off-chance that the problem is something like lock
> contention in that function.

Here are some with reproducers:
https://syzkaller.appspot.com/bug?id=8a1e95291152ce5afea43c103a1fd62a257fcf4b
https://syzkaller.appspot.com/bug?id=5e3ac329b6304aacc6304cfaab1a514bca12ce82
https://syzkaller.appspot.com/bug?id=a01b4478f89e19cee91531f7c2b7751f0caf8c0c
https://syzkaller.appspot.com/bug?id=e4caef9fc41d0c019c532a4257faec129699a42e

But the question is whether CSD_LOCK_WAIT_DEBUG=y is useful in
general. Should we enable it all the time?

Paul E. McKenney

Jul 9, 2020, 12:45:32 PM
to Dmitry Vyukov, syzbot, Sebastian Andrzej Siewior, LKML, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner
The CSD_LOCK_WAIT_DEBUG functionality is quite new, so it is quite
possible that it is causing rather than detecting problems. ;-)

But once it is stable, then yes, it might be quite generally useful.

Thanx, Paul