Re: [syzbot] [kernel?] KASAN: stack-out-of-bounds Read in __show_regs (2)

6 views
Skip to first unread message

Tetsuo Handa

unread,
Jun 24, 2024, 9:23:02 PMJun 24
to syzbot, b...@alien8.de, dave....@linux.intel.com, h...@zytor.com, linux-...@vger.kernel.org, mi...@redhat.com, syzkall...@googlegroups.com, tg...@linutronix.de, x...@kernel.org, kasan-dev, linux-mm
Hello.

This report is triggered by my debug printk() patch at
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/net/core/rtnetlink.c?id=5210cbe9a47fc5c1f43ba16d481e6335f3e2f345
but I can't find where the bug is (x86 bug or mm bug or kasan bug or my bug).

On 2024/06/15 16:06, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: a957267fa7e9 Add linux-next specific files for 20240611
> git tree: linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=171e6e56980000
> kernel config: https://syzkaller.appspot.com/x/.config?x=9a880e96898e79f8
> dashboard link: https://syzkaller.appspot.com/bug?extid=e9be5674af5e3a0b9ecc
> compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40

Quoting from https://syzkaller.appspot.com/text?tag=CrashReport&x=17786fb1980000
and https://syzkaller.appspot.com/text?tag=CrashLog&x=15e0202a980000 :

----------------------------------------
BUG: KASAN: stack-out-of-bounds in __show_regs+0xa6/0x610 arch/x86/kernel/process_64.c:83
Read of size 8 at addr ffffc90008807618 by task syz.0.1430/9588

CPU: 0 UID: 0 PID: 9588 Comm: syz.0.1430 Not tainted 6.10.0-rc5-next-20240624-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:377 [inline]
print_report+0x169/0x550 mm/kasan/report.c:488
kasan_report+0x143/0x180 mm/kasan/report.c:601
__show_regs+0xa6/0x610 arch/x86/kernel/process_64.c:83
show_trace_log_lvl+0x3d4/0x520 arch/x86/kernel/dumpstack.c:301
sched_show_task+0x578/0x740 kernel/sched/core.c:7506
report_rtnl_holders+0x1ba/0x2d0 net/core/rtnetlink.c:104
call_timer_fn+0x18e/0x650 kernel/time/timer.c:1792
expire_timers kernel/time/timer.c:1843 [inline]
__run_timers kernel/time/timer.c:2417 [inline]
__run_timer_base+0x66a/0x8e0 kernel/time/timer.c:2428
run_timer_base kernel/time/timer.c:2437 [inline]
run_timer_softirq+0xb7/0x170 kernel/time/timer.c:2447
handle_softirqs+0x2c4/0x970 kernel/softirq.c:554
__do_softirq kernel/softirq.c:588 [inline]
invoke_softirq kernel/softirq.c:428 [inline]
__irq_exit_rcu+0xf4/0x1c0 kernel/softirq.c:637
irq_exit_rcu+0x9/0x30 kernel/softirq.c:649
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:152 [inline]
RIP: 0010:_raw_spin_unlock_irqrestore+0xd8/0x140 kernel/locking/spinlock.c:194
Code: 9c 8f 44 24 20 42 80 3c 23 00 74 08 4c 89 f7 e8 0e 94 61 f6 f6 44 24 21 02 75 52 41 f7 c7 00 02 00 00 74 01 fb bf 01 00 00 00 <e8> c3 10 ca f5 65 8b 05 b4 54 6b 74 85 c0 74 43 48 c7 04 24 0e 36
RSP: 0018:ffffc9000407f600 EFLAGS: 00000206
RAX: 13958dc9d919f000 RBX: 1ffff9200080fec4 RCX: ffffffff816fd2da
RDX: dffffc0000000000 RSI: ffffffff8bcac820 RDI: 0000000000000001
RBP: ffffc9000407f690 R08: ffffffff92fe47ef R09: 1ffffffff25fc8fd
R10: dffffc0000000000 R11: fffffbfff25fc8fe R12: dffffc0000000000
R13: 1ffff9200080fec0 R14: ffffc9000407f620 R15: 0000000000000246
spin_unlock_irqrestore include/linux/spinlock.h:406 [inline]
__wake_up_common_lock+0x18c/0x1e0 kernel/sched/wait.c:108
__unix_dgram_recvmsg+0x5f4/0x12f0 net/unix/af_unix.c:2415
sock_recvmsg_nosec+0x18e/0x1d0 net/socket.c:1046
____sys_recvmsg+0x3c0/0x470 net/socket.c:2814
___sys_recvmsg net/socket.c:2858 [inline]
do_recvmmsg+0x474/0xae0 net/socket.c:2952
__sys_recvmmsg net/socket.c:3031 [inline]
__do_sys_recvmmsg net/socket.c:3054 [inline]
__se_sys_recvmmsg net/socket.c:3047 [inline]
__x64_sys_recvmmsg+0x199/0x250 net/socket.c:3047
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fdfbaf75d39
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fdfbbc7e048 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
RAX: ffffffffffffffda RBX: 00007fdfbb104070 RCX: 00007fdfbaf75d39
RDX: 0000000000010106 RSI: 00000000200000c0 RDI: 0000000000000003
RBP: 00007fdfbaff6766 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000002 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000006e R14: 00007fdfbb104070 R15: 00007ffeafeb36a8
</TASK>

The buggy address belongs to the virtual mapping at
[ffffc90008800000, ffffc90008809000) created by:
copy_process+0x5d1/0x3d90 kernel/fork.c:2206

The buggy address belongs to the physical page:
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x295f2
flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000000000 0000000000000000 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x2dc2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_NOWARN|__GFP_ZERO), pid 1052, tgid 1052 (kworker/u8:5), ts 20453244600, free_ts 0
set_page_owner include/linux/page_owner.h:32 [inline]
post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1500
prep_new_page mm/page_alloc.c:1508 [inline]
get_page_from_freelist+0x2ccb/0x2d80 mm/page_alloc.c:3487
__alloc_pages_noprof+0x256/0x6c0 mm/page_alloc.c:4745
alloc_pages_mpol_noprof+0x3e8/0x680 mm/mempolicy.c:2263
vm_area_alloc_pages mm/vmalloc.c:3576 [inline]
__vmalloc_area_node mm/vmalloc.c:3652 [inline]
__vmalloc_node_range_noprof+0x971/0x1460 mm/vmalloc.c:3833
alloc_thread_stack_node kernel/fork.c:313 [inline]
dup_task_struct+0x444/0x8c0 kernel/fork.c:1114
copy_process+0x5d1/0x3d90 kernel/fork.c:2206
kernel_clone+0x226/0x8f0 kernel/fork.c:2788
user_mode_thread+0x132/0x1a0 kernel/fork.c:2866
call_usermodehelper_exec_work+0x5c/0x230 kernel/umh.c:172
process_one_work kernel/workqueue.c:3224 [inline]
process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3305
worker_thread+0x86d/0xd40 kernel/workqueue.c:3383
kthread+0x2f0/0x390 kernel/kthread.c:389
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:144
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
page_owner free stack trace missing

Memory state around the buggy address:
ffffc90008807500: 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3 00 00 00 00
ffffc90008807580: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f2 f2 f2
>ffffc90008807600: 00 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
^
ffffc90008807680: 00 00 00 00 f1 f1 f1 f1 00 f2 f2 f2 00 f3 f3 f3
ffffc90008807700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
----------------------------------------

----------------------------------------
[ 560.831831][ C0] DEBUG: holding rtnl_mutex for 937 jiffies.
[ 560.838015][ C0] task:kworker/u8:9 state:R running task stack:20216 pid:2460 tgid:2460 ppid:2 flags:0x00004000
[ 560.849882][ C0] Workqueue: netns cleanup_net
[ 560.854770][ C0] Call Trace:
[ 560.854789][ C0] <TASK>
[ 560.872376][ C0] __schedule+0x17e8/0x4a20
[ 560.877336][ C0] ? mark_lock+0x9a/0x360
[ 560.881823][ C0] ? lockdep_hardirqs_on_prepare+0x43d/0x780
[ 560.887887][ C0] ? __virt_addr_valid+0x183/0x520
[ 560.893171][ C0] ? __pfx_lockdep_hardirqs_on_prepare+0x10/0x10
[ 560.899593][ C0] ? lock_release+0xbf/0x9f0
[ 560.904330][ C0] ? __pfx___schedule+0x10/0x10
[ 560.909271][ C0] ? lockdep_hardirqs_on+0x99/0x150
[ 560.914617][ C0] ? mark_lock+0x9a/0x360
[ 560.919119][ C0] preempt_schedule_irq+0xfb/0x1c0
[ 560.924392][ C0] ? __pfx_preempt_schedule_irq+0x10/0x10
[ 560.931783][ C0] irqentry_exit+0x5e/0x90
[ 560.936590][ C0] asm_sysvec_reschedule_ipi+0x1a/0x20
[ 560.942783][ C0] RIP: 0010:synchronize_rcu+0x0/0x360
[ 560.948403][ C0] Code: e1 07 80 c1 03 38 c1 0f 8c 97 fe ff ff 4c 89 f7 e8 15 50 80 00 e9 8a fe ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <f3> 0f 1e fa 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 e4 e0 48
[ 560.968242][ C0] RSP: 76c0:0000000000000a06 EFLAGS: 1ffff92001100ed4
[ 560.975129][ C0] ==================================================================
[ 560.994479][ C0] BUG: KASAN: stack-out-of-bounds in __show_regs+0xa6/0x610
[ 561.002642][ C0] Read of size 8 at addr ffffc90008807618 by task syz.0.1430/9588
[ 561.014598][ C0]
[ 561.017321][ C0] CPU: 0 UID: 0 PID: 9588 Comm: syz.0.1430 Not tainted 6.10.0-rc5-next-20240624-syzkaller #0
[ 561.028952][ C0] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
[ 561.043847][ C0] Call Trace:
[ 561.047213][ C0] <IRQ>
[ 561.050101][ C0] dump_stack_lvl+0x241/0x360
[ 561.054963][ C0] ? __pfx_dump_stack_lvl+0x10/0x10
[ 561.073046][ C0] ? __pfx__printk+0x10/0x10
[ 561.077786][ C0] ? _printk+0xd5/0x120
[ 561.082123][ C0] print_report+0x169/0x550
[ 561.086775][ C0] ? __virt_addr_valid+0xbd/0x520
[ 561.091947][ C0] ? __show_regs+0xa6/0x610
[ 561.096544][ C0] kasan_report+0x143/0x180
[ 561.101170][ C0] ? show_opcodes+0x148/0x170
[ 561.105909][ C0] ? __show_regs+0xa6/0x610
[ 561.110457][ C0] __show_regs+0xa6/0x610
[ 561.114858][ C0] ? asm_sysvec_reschedule_ipi+0x1a/0x20
[ 561.120539][ C0] ? asm_sysvec_reschedule_ipi+0x1a/0x20
[ 561.126227][ C0] show_trace_log_lvl+0x3d4/0x520
[ 561.131292][ C0] ? __pfx_synchronize_rcu+0x10/0x10
[ 561.136630][ C0] sched_show_task+0x578/0x740
[ 561.141466][ C0] ? report_rtnl_holders+0x183/0x2d0
[ 561.147055][ C0] ? __pfx__printk+0x10/0x10
[ 561.151699][ C0] ? __pfx_sched_show_task+0x10/0x10
[ 561.157153][ C0] report_rtnl_holders+0x1ba/0x2d0
[ 561.162519][ C0] ? report_rtnl_holders+0x20/0x2d0
[ 561.167755][ C0] call_timer_fn+0x18e/0x650
[ 561.172361][ C0] ? call_timer_fn+0xc0/0x650
[ 561.177086][ C0] ? __pfx_report_rtnl_holders+0x10/0x10
[ 561.182785][ C0] ? __pfx_call_timer_fn+0x10/0x10
[ 561.187939][ C0] ? __pfx_report_rtnl_holders+0x10/0x10
[ 561.193631][ C0] ? __pfx_report_rtnl_holders+0x10/0x10
[ 561.199303][ C0] ? __pfx_report_rtnl_holders+0x10/0x10
[ 561.204994][ C0] ? _raw_spin_unlock_irq+0x23/0x50
[ 561.210231][ C0] ? lockdep_hardirqs_on+0x99/0x150
[ 561.215469][ C0] ? __pfx_report_rtnl_holders+0x10/0x10
[ 561.221120][ C0] __run_timer_base+0x66a/0x8e0
[ 561.226093][ C0] ? __pfx___run_timer_base+0x10/0x10
[ 561.231493][ C0] ? __pfx_lockdep_hardirqs_on_prepare+0x10/0x10
[ 561.237874][ C0] run_timer_softirq+0xb7/0x170
[ 561.242832][ C0] handle_softirqs+0x2c4/0x970
[ 561.247626][ C0] ? __irq_exit_rcu+0xf4/0x1c0
[ 561.252429][ C0] ? __pfx_handle_softirqs+0x10/0x10
[ 561.257856][ C0] ? irqtime_account_irq+0xd4/0x1e0
[ 561.263090][ C0] __irq_exit_rcu+0xf4/0x1c0
[ 561.267711][ C0] ? __pfx___irq_exit_rcu+0x10/0x10
[ 561.272931][ C0] irq_exit_rcu+0x9/0x30
[ 561.277231][ C0] sysvec_apic_timer_interrupt+0xa6/0xc0
[ 561.283185][ C0] </IRQ>
[ 561.286769][ C0] <TASK>
[ 561.289972][ C0] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 561.297172][ C0] RIP: 0010:_raw_spin_unlock_irqrestore+0xd8/0x140
[ 561.307112][ C0] Code: 9c 8f 44 24 20 42 80 3c 23 00 74 08 4c 89 f7 e8 0e 94 61 f6 f6 44 24 21 02 75 52 41 f7 c7 00 02 00 00 74 01 fb bf 01 00 00 00 <e8> c3 10 ca f5 65 8b 05 b4 54 6b 74 85 c0 74 43 48 c7 04 24 0e 36
[ 561.327228][ C0] RSP: 0018:ffffc9000407f600 EFLAGS: 00000206
[ 561.333355][ C0] RAX: 13958dc9d919f000 RBX: 1ffff9200080fec4 RCX: ffffffff816fd2da
[ 561.341352][ C0] RDX: dffffc0000000000 RSI: ffffffff8bcac820 RDI: 0000000000000001
[ 561.349458][ C0] RBP: ffffc9000407f690 R08: ffffffff92fe47ef R09: 1ffffffff25fc8fd
[ 561.357460][ C0] R10: dffffc0000000000 R11: fffffbfff25fc8fe R12: dffffc0000000000
[ 561.365478][ C0] R13: 1ffff9200080fec0 R14: ffffc9000407f620 R15: 0000000000000246
[ 561.373533][ C0] ? mark_lock+0x9a/0x360
[ 561.378221][ C0] ? __pfx__raw_spin_unlock_irqrestore+0x10/0x10
[ 561.385142][ C0] ? autoremove_wake_function+0x37/0x110
[ 561.391145][ C0] __wake_up_common_lock+0x18c/0x1e0
[ 561.396936][ C0] __unix_dgram_recvmsg+0x5f4/0x12f0
[ 561.403018][ C0] ? __pfx___unix_dgram_recvmsg+0x10/0x10
[ 561.409788][ C0] ? __pfx___might_resched+0x10/0x10
[ 561.415745][ C0] ? iovec_from_user+0x61/0x240
[ 561.421927][ C0] ? unix_dgram_recvmsg+0xb6/0xe0
[ 561.427965][ C0] ? __pfx_unix_dgram_recvmsg+0x10/0x10
[ 561.435584][ C0] sock_recvmsg_nosec+0x18e/0x1d0
[ 561.441322][ C0] ____sys_recvmsg+0x3c0/0x470
[ 561.446583][ C0] ? __pfx_____sys_recvmsg+0x10/0x10
[ 561.455788][ C0] ? __might_fault+0xaa/0x120
[ 561.460634][ C0] do_recvmmsg+0x474/0xae0
[ 561.465088][ C0] ? __pfx___futex_wait+0x10/0x10
[ 561.470148][ C0] ? __pfx_do_recvmmsg+0x10/0x10
[ 561.475130][ C0] ? __pfx_futex_wake_mark+0x10/0x10
[ 561.480509][ C0] ? futex_wait+0x285/0x360
[ 561.485124][ C0] ? __pfx_futex_wait+0x10/0x10
[ 561.490014][ C0] ? fd_install+0x9c/0x5d0
[ 561.494459][ C0] ? __pfx_lock_release+0x10/0x10
[ 561.499504][ C0] ? __pfx_do_futex+0x10/0x10
[ 561.504229][ C0] __x64_sys_recvmmsg+0x199/0x250
[ 561.510481][ C0] ? __pfx___x64_sys_recvmmsg+0x10/0x10
[ 561.517399][ C0] ? do_syscall_64+0x100/0x230
[ 561.522660][ C0] ? do_syscall_64+0xb6/0x230
[ 561.529823][ C0] do_syscall_64+0xf3/0x230
[ 561.534742][ C0] ? clear_bhb_loop+0x35/0x90
[ 561.540096][ C0] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 561.546133][ C0] RIP: 0033:0x7fdfbaf75d39
[ 561.550744][ C0] Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
[ 561.571064][ C0] RSP: 002b:00007fdfbbc7e048 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
[ 561.580376][ C0] RAX: ffffffffffffffda RBX: 00007fdfbb104070 RCX: 00007fdfbaf75d39
[ 561.588397][ C0] RDX: 0000000000010106 RSI: 00000000200000c0 RDI: 0000000000000003
[ 561.596400][ C0] RBP: 00007fdfbaff6766 R08: 0000000000000000 R09: 0000000000000000
[ 561.604404][ C0] R10: 0000000000000002 R11: 0000000000000246 R12: 0000000000000000
[ 561.612415][ C0] R13: 000000000000006e R14: 00007fdfbb104070 R15: 00007ffeafeb36a8
[ 561.620458][ C0] </TASK>
[ 561.623517][ C0]
[ 561.625876][ C0] The buggy address belongs to the virtual mapping at
[ 561.625876][ C0] [ffffc90008800000, ffffc90008809000) created by:
[ 561.625876][ C0] copy_process+0x5d1/0x3d90
[ 561.643549][ C0]
[ 561.645879][ C0] The buggy address belongs to the physical page:
[ 561.652306][ C0] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x295f2
[ 561.661135][ C0] flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
[ 561.668346][ C0] raw: 00fff00000000000 0000000000000000 dead000000000122 0000000000000000
[ 561.677050][ C0] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 561.685659][ C0] page dumped because: kasan: bad access detected
[ 561.692219][ C0] page_owner tracks the page as allocated
[ 561.697979][ C0] page last allocated via order 0, migratetype Unmovable, gfp_mask 0x2dc2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_NOWARN|__GFP_ZERO), pid 1052, tgid 1052 (kworker/u8:5), ts 20453244600, free_ts 0
[ 561.716523][ C0] post_alloc_hook+0x1f3/0x230
[ 561.721344][ C0] get_page_from_freelist+0x2ccb/0x2d80
[ 561.727009][ C0] __alloc_pages_noprof+0x256/0x6c0
[ 561.732233][ C0] alloc_pages_mpol_noprof+0x3e8/0x680
[ 561.737727][ C0] __vmalloc_node_range_noprof+0x971/0x1460
[ 561.743664][ C0] dup_task_struct+0x444/0x8c0
[ 561.748479][ C0] copy_process+0x5d1/0x3d90
[ 561.753128][ C0] kernel_clone+0x226/0x8f0
[ 561.757766][ C0] user_mode_thread+0x132/0x1a0
[ 561.762660][ C0] call_usermodehelper_exec_work+0x5c/0x230
[ 561.768674][ C0] process_scheduled_works+0xa2c/0x1830
[ 561.774240][ C0] worker_thread+0x86d/0xd40
[ 561.778849][ C0] kthread+0x2f0/0x390
[ 561.782979][ C0] ret_from_fork+0x4b/0x80
[ 561.787453][ C0] ret_from_fork_asm+0x1a/0x30
[ 561.792332][ C0] page_owner free stack trace missing
[ 561.797698][ C0]
[ 561.800029][ C0] Memory state around the buggy address:
[ 561.805664][ C0] ffffc90008807500: 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3 00 00 00 00
[ 561.813728][ C0] ffffc90008807580: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f2 f2 f2
[ 561.821814][ C0] >ffffc90008807600: 00 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
[ 561.829912][ C0] ^
[ 561.834781][ C0] ffffc90008807680: 00 00 00 00 f1 f1 f1 f1 00 f2 f2 f2 00 f3 f3 f3
[ 561.842858][ C0] ffffc90008807700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 561.851024][ C0] ==================================================================
[ 561.859206][ C0] Kernel panic - not syncing: KASAN: panic_on_warn set ...
[ 561.866452][ C0] CPU: 0 UID: 0 PID: 9588 Comm: syz.0.1430 Not tainted 6.10.0-rc5-next-20240624-syzkaller #0
----------------------------------------

arch/x86/kernel/process_64.c:83 is

printk("%sRAX: %016lx RBX: %016lx RCX: %016lx\n",
log_lvl, regs->ax, regs->bx, regs->cx);

(which looks nothing special), and kernel stack area [ffffc90008800000, ffffc90008809000) is
32768 bytes + 4096 bytes (which looks sane to me), and ffffc90008807618 is within the 32768
bytes (which looks sane to me).

Kernel config is https://syzkaller.appspot.com/text?tag=KernelConfig&x=6221d1071c39b052 .
Can somebody find what is wrong?

Tetsuo Handa

unread,
Jul 1, 2024, 8:43:48 AMJul 1
to syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com, kasan-dev, linux-mm, b...@alien8.de, dave....@linux.intel.com, h...@zytor.com, mi...@redhat.com, tg...@linutronix.de, x...@kernel.org
Hello, KASAN people.

I suspect that KASAN's metadata for kernel stack memory got out of sync for
unknown reason, for the stack trace of PID=7558 was successfully printed for
two times before KASAN complains upon trying to print for the the third time.
Would you decode what is this KASAN message saying?

Quoting from https://syzkaller.appspot.com/text?tag=CrashLog&x=119fd081980000 :

[ 226.304791][ C0] DEBUG: holding rtnl_mutex for 555 jiffies.
[ 226.310883][ C0] task:syz.1.750 state:R running task stack:24672 pid:7558 tgid:7558 ppid:5945 flags:0x00004006
[ 226.322705][ C0] Call Trace:
[ 226.326074][ C0] <TASK>
[ 226.329029][ C0] __schedule+0x17e8/0x4a20
[ 226.333672][ C0] ? __pfx___schedule+0x10/0x10
[ 226.338599][ C0] ? __pfx_lock_release+0x10/0x10
[ 226.343744][ C0] ? _raw_spin_unlock_irqrestore+0xdd/0x140
[ 226.349775][ C0] ? schedule+0x90/0x320
[ 226.354042][ C0] schedule+0x14b/0x320
[ 226.358254][ C0] synchronize_rcu_expedited+0x684/0x830
[ 226.364012][ C0] ? __pfx_synchronize_rcu_expedited+0x10/0x10
[ 226.370239][ C0] ? __pfx_wait_rcu_exp_gp+0x10/0x10
[ 226.375579][ C0] ? __pfx___might_resched+0x10/0x10
[ 226.380877][ C0] ? lockdep_hardirqs_on_prepare+0x43d/0x780
[ 226.386897][ C0] ? __pfx_autoremove_wake_function+0x10/0x10
[ 226.392985][ C0] ? __pfx_lockdep_hardirqs_on_prepare+0x10/0x10
[ 226.399381][ C0] synchronize_rcu+0x11b/0x360
[ 226.404181][ C0] ? __pfx_synchronize_rcu+0x10/0x10
[ 226.409555][ C0] lockdep_unregister_key+0x556/0x610
[ 226.414991][ C0] ? __pfx_lockdep_unregister_key+0x10/0x10
[ 226.420908][ C0] ? lockdep_hardirqs_on+0x99/0x150
[ 226.426160][ C0] ? __qdisc_destroy+0x150/0x410
[ 226.431211][ C0] ? kfree+0x149/0x360
[ 226.435414][ C0] ? __pfx_pfifo_fast_destroy+0x10/0x10
[ 226.441013][ C0] __qdisc_destroy+0x165/0x410
[ 226.445888][ C0] dev_shutdown+0x9b/0x440
[ 226.450334][ C0] unregister_netdevice_many_notify+0x9c7/0x1d20
[ 226.456717][ C0] ? __lock_acquire+0x1359/0x2000
[ 226.461786][ C0] ? __pfx_unregister_netdevice_many_notify+0x10/0x10
[ 226.468602][ C0] ? __asan_memset+0x23/0x50
[ 226.473208][ C0] ? skb_queue_purge_reason+0x2de/0x500
[ 226.478815][ C0] ? __asan_memset+0x23/0x50
[ 226.483423][ C0] ? skb_queue_purge_reason+0x2de/0x500
[ 226.489025][ C0] ? do_raw_spin_unlock+0x13c/0x8b0
[ 226.494244][ C0] unregister_netdevice_queue+0x303/0x370
[ 226.500013][ C0] ? __pfx_unregister_netdevice_queue+0x10/0x10
[ 226.506317][ C0] __tun_detach+0x6b6/0x1600
[ 226.510942][ C0] tun_chr_close+0x108/0x1b0
[ 226.515574][ C0] ? __pfx_tun_chr_close+0x10/0x10
[ 226.520705][ C0] __fput+0x24a/0x8a0
[ 226.524742][ C0] task_work_run+0x24f/0x310
[ 226.529352][ C0] ? __pfx_task_work_run+0x10/0x10
[ 226.534482][ C0] ? syscall_exit_to_user_mode+0xa3/0x370
[ 226.540265][ C0] syscall_exit_to_user_mode+0x168/0x370
[ 226.545967][ C0] do_syscall_64+0x100/0x230
[ 226.550580][ C0] ? clear_bhb_loop+0x35/0x90
[ 226.555306][ C0] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 226.561215][ C0] RIP: 0033:0x7f2470375b99
[ 226.565668][ C0] RSP: 002b:00007ffd71072888 EFLAGS: 00000246 ORIG_RAX: 00000000000001b4
[ 226.574109][ C0] RAX: 0000000000000000 RBX: 00007f2470505aa0 RCX: 00007f2470375b99
[ 226.582119][ C0] RDX: 0000000000000000 RSI: 000000000000001e RDI: 0000000000000003
[ 226.590129][ C0] RBP: 00007f2470505aa0 R08: 0000000000000008 R09: 0000000871072b9f
[ 226.598153][ C0] R10: 00000000005f3974 R11: 0000000000000246 R12: 000000000003603f
[ 226.606256][ C0] R13: 0000000000000032 R14: 00007f2470505aa0 R15: 00007f2470504078
[ 226.614265][ C0] </TASK>
[ 226.617357][ C0]
[ 226.617357][ C0] Showing all locks held in the system:
[ 226.625107][ C0] 3 locks held by kworker/u8:0/11:
[ 226.630259][ C0] 2 locks held by getty/4850:
[ 226.634966][ C0] #0: ffff88802a6860a0 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x25/0x70
[ 226.644790][ C0] #1: ffffc9000312b2f0 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x6b5/0x1e10
[ 226.654971][ C0] 2 locks held by syz.4.185/5862:
[ 226.660012][ C0] 2 locks held by syz.3.241/6079:
[ 226.665073][ C0] 1 lock held by syz-executor/7179:
[ 226.670288][ C0] 2 locks held by syz.1.750/7558:
[ 226.675345][ C0] #0: ffffffff8f5feac8 (rtnl_mutex){+.+.}-{3:3}, at: tun_chr_close+0x3e/0x1b0
[ 226.684379][ C0] #1: ffffffff8e33abf8 (rcu_state.exp_mutex){+.+.}-{3:3}, at: synchronize_rcu_expedited+0x381/0x830
[ 226.695348][ C0] 2 locks held by syz.0.757/7594:
[ 226.700384][ C0] #0: ffffffff8f6682f0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40
[ 226.708731][ C0] #1: ffffffff8f5feac8 (rtnl_mutex){+.+.}-{3:3}, at: nl80211_pre_doit+0x5f/0x8b0
[ 226.718045][ C0] 1 lock held by syz.3.764/7609:
[ 226.723082][ C0] 2 locks held by syz.2.767/7615:
[ 226.728152][ C0] #0: ffffc90000007c00 (net/core/rtnetlink.c:83){+.-.}-{0:0}, at: call_timer_fn+0xc0/0x650
[ 226.738338][ C0] #1: ffffffff8e335820 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x55/0x2a0
[ 226.748268][ C0] 3 locks held by syz-executor/7617:
[ 226.753568][ C0]
[ 226.755932][ C0] =============================================
[ 226.755932][ C0]
[ 227.343278][ T5144] usb 3-1: new high-speed USB device number 9 using dummy_hcd
[ 227.536711][ T5144] usb 3-1: config 0 interface 0 altsetting 0 endpoint 0x81 has an invalid bInterval 0, changing to 7
[ 227.568067][ T5144] usb 3-1: config 0 interface 0 altsetting 0 endpoint 0x81 has invalid wMaxPacketSize 0
[ 227.598514][ T5144] usb 3-1: New USB device found, idVendor=05ac, idProduct=022a, bcdDevice= 0.00
[ 227.621562][ T5144] usb 3-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[ 227.651457][ T5144] usb 3-1: config 0 descriptor??
[ 227.814998][ C0] DEBUG: holding rtnl_mutex for 706 jiffies.
[ 227.821075][ C0] task:syz.1.750 state:D stack:24672 pid:7558 tgid:7558 ppid:5945 flags:0x00004006
[ 227.831413][ C0] Call Trace:
[ 227.834732][ C0] <TASK>
[ 227.837686][ C0] __schedule+0x17e8/0x4a20
[ 227.842246][ C0] ? __pfx___schedule+0x10/0x10
[ 227.847165][ C0] ? __pfx_lock_release+0x10/0x10
[ 227.852388][ C0] ? _raw_spin_unlock_irqrestore+0xdd/0x140
[ 227.858350][ C0] ? schedule+0x90/0x320
[ 227.862622][ C0] schedule+0x14b/0x320
[ 227.866849][ C0] synchronize_rcu_expedited+0x684/0x830
[ 227.872517][ C0] ? __pfx_synchronize_rcu_expedited+0x10/0x10
[ 227.878773][ C0] ? __pfx_wait_rcu_exp_gp+0x10/0x10
[ 227.884098][ C0] ? __pfx___might_resched+0x10/0x10
[ 227.889440][ C0] ? lockdep_hardirqs_on_prepare+0x43d/0x780
[ 227.895470][ C0] ? __pfx_autoremove_wake_function+0x10/0x10
[ 227.901561][ C0] ? __pfx_lockdep_hardirqs_on_prepare+0x10/0x10
[ 227.907960][ C0] synchronize_rcu+0x11b/0x360
[ 227.912758][ C0] ? __pfx_synchronize_rcu+0x10/0x10
[ 227.918117][ C0] lockdep_unregister_key+0x556/0x610
[ 227.923521][ C0] ? __pfx_lockdep_unregister_key+0x10/0x10
[ 227.929465][ C0] ? lockdep_hardirqs_on+0x99/0x150
[ 227.934718][ C0] ? __qdisc_destroy+0x150/0x410
[ 227.939964][ C0] ? kfree+0x149/0x360
[ 227.944062][ C0] ? __pfx_pfifo_fast_destroy+0x10/0x10
[ 227.949668][ C0] __qdisc_destroy+0x165/0x410
[ 227.954466][ C0] dev_shutdown+0x9b/0x440
[ 227.958940][ C0] unregister_netdevice_many_notify+0x9c7/0x1d20
[ 227.965321][ C0] ? __lock_acquire+0x1359/0x2000
[ 227.970390][ C0] ? __pfx_unregister_netdevice_many_notify+0x10/0x10
[ 227.977386][ C0] ? __asan_memset+0x23/0x50
[ 227.982011][ C0] ? skb_queue_purge_reason+0x2de/0x500
[ 227.987621][ C0] ? __asan_memset+0x23/0x50
[ 227.992237][ C0] ? skb_queue_purge_reason+0x2de/0x500
[ 227.997854][ C0] ? do_raw_spin_unlock+0x13c/0x8b0
[ 228.003082][ C0] unregister_netdevice_queue+0x303/0x370
[ 228.008865][ C0] ? __pfx_unregister_netdevice_queue+0x10/0x10
[ 228.015177][ C0] __tun_detach+0x6b6/0x1600
[ 228.019812][ C0] tun_chr_close+0x108/0x1b0
[ 228.024437][ C0] ? __pfx_tun_chr_close+0x10/0x10
[ 228.029623][ C0] __fput+0x24a/0x8a0
[ 228.033651][ C0] task_work_run+0x24f/0x310
[ 228.038319][ C0] ? __pfx_task_work_run+0x10/0x10
[ 228.043546][ C0] ? syscall_exit_to_user_mode+0xa3/0x370
[ 228.049341][ C0] syscall_exit_to_user_mode+0x168/0x370
[ 228.055042][ C0] do_syscall_64+0x100/0x230
[ 228.059667][ C0] ? clear_bhb_loop+0x35/0x90
[ 228.064374][ C0] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 228.070336][ C0] RIP: 0033:0x7f2470375b99
[ 228.074815][ C0] RSP: 002b:00007ffd71072888 EFLAGS: 00000246 ORIG_RAX: 00000000000001b4
[ 228.083252][ C0] RAX: 0000000000000000 RBX: 00007f2470505aa0 RCX: 00007f2470375b99
[ 228.091289][ C0] RDX: 0000000000000000 RSI: 000000000000001e RDI: 0000000000000003
[ 228.091369][ T5144] appletouch 3-1:0.0: Geyser mode initialized.
[ 228.099291][ C0] RBP: 00007f2470505aa0 R08: 0000000000000008 R09: 0000000871072b9f
[ 228.099348][ C0] R10: 00000000005f3974 R11: 0000000000000246 R12: 000000000003603f
[ 228.099364][ C0] R13: 0000000000000032 R14: 00007f2470505aa0 R15: 00007f2470504078
[ 228.099396][ C0] </TASK>
[ 228.099408][ C0]
[ 228.099408][ C0] Showing all locks held in the system:
[ 228.099452][ C0] 5 locks held by kworker/u8:7/2895:
[ 228.136372][ T5144] input: appletouch as /devices/platform/dummy_hcd.2/usb3/3-1/3-1:0.0/input/input11
[ 228.140345][ C0] 2 locks held by getty/4850:
[ 228.140365][ C0] #0: ffff88802a6860a0 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x25/0x70
[ 228.140438][ C0] #1: ffffc9000312b2f0 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x6b5/0x1e10
[ 228.140505][ C0] 6 locks held by kworker/1:5/5144:
[ 228.140523][ C0] 2 locks held by syz.4.185/5862:
[ 228.140539][ C0] 4 locks held by syz.3.241/6079:
[ 228.140557][ C0] 2 locks held by syz.1.750/7558:
[ 228.140571][ C0] #0: ffffffff8f5feac8 (rtnl_mutex){+.+.}-{3:3}, at: tun_chr_close+0x3e/0x1b0
[ 228.140639][ C0] #1: ffffffff8e33abf8 (rcu_state.exp_mutex){+.+.}-{3:3}, at: synchronize_rcu_expedited+0x381/0x830
[ 228.140716][ C0] 2 locks held by syz.0.757/7594:
[ 228.140730][ C0] #0: ffffffff8f6682f0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40
[ 228.140791][ C0] #1: ffffffff8f5feac8 (rtnl_mutex){+.+.}-{3:3}, at: nl80211_pre_doit+0x5f/0x8b0
[ 228.140857][ C0] 2 locks held by syz.3.775/7633:
[ 228.140871][ C0] #0: ffffc9000499fb40 (net/core/rtnetlink.c:83){+.-.}-{0:0}, at: call_timer_fn+0xc0/0x650
[ 228.140931][ C0] #1: ffffffff8e335820 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x55/0x2a0
[ 228.140997][ C0]
[ 228.141004][ C0] =============================================
[ 228.141004][ C0]
[ 229.184903][ C0] DEBUG: holding rtnl_mutex for 843 jiffies.
[ 229.190994][ C0] task:syz.1.750 state:R running task stack:24672 pid:7558 tgid:7558 ppid:5945 flags:0x00004006
[ 229.202817][ C0] Call Trace:
[ 229.206142][ C0] <TASK>
[ 229.209088][ C0] __schedule+0x17e8/0x4a20
[ 229.213626][ C0] ? mark_lock+0x9a/0x360
[ 229.218009][ C0] ? lockdep_hardirqs_on_prepare+0x43d/0x780
[ 229.224013][ C0] ? __pfx_lockdep_hardirqs_on_prepare+0x10/0x10
[ 229.230388][ C0] ? __pfx___schedule+0x10/0x10
[ 229.235286][ C0] ? lockdep_hardirqs_on+0x99/0x150
[ 229.240775][ C0] ? mark_lock+0x9a/0x360
[ 229.245156][ C0] preempt_schedule_irq+0xfb/0x1c0
[ 229.250289][ C0] ? __pfx_preempt_schedule_irq+0x10/0x10
[ 229.256053][ C0] ? __pfx_pfifo_fast_destroy+0x10/0x10
[ 229.261637][ C0] irqentry_exit+0x5e/0x90
[ 229.266098][ C0] asm_sysvec_reschedule_ipi+0x1a/0x20
[ 229.271585][ C0] RIP: 0010:lockdep_unregister_key+0x56d/0x610
[ 229.277796][ C0] Code: ff 92 48 c7 c6 d0 b7 6f 81 e8 7f 04 0a 00 e8 ea 18 0a 00 e9 e5 fb ff ff e8 c0 06 21 0a 41 f7 c7 00 02 00 00 74 d0 fb 45 84 f6 <75> cf eb e0 90 0f 0b 90 45 31 f6 e9 62 ff ff ff 90 0f 0b 90 e9 a1
[ 229.297447][ C0] RSP: 0018:ffffc90003c4f820 EFLAGS: 00000246
[ 229.303630][ C0] RAX: dffffc0000000000 RBX: 1ffff92000789f0c RCX: ffffffff947f3803
[ 229.311661][ C0] RDX: 0000000000000001 RSI: ffffffff8bcad5e0 RDI: 1ffff92000789f0c
[ 229.319713][ C0] ==================================================================
[ 229.327779][ C0] BUG: KASAN: stack-out-of-bounds in __show_regs+0x172/0x610
[ 229.335174][ C0] Read of size 8 at addr ffffc90003c4f798 by task kworker/u8:5/234
[ 229.343506][ C0]
[ 229.345910][ C0] CPU: 0 UID: 0 PID: 234 Comm: kworker/u8:5 Not tainted 6.10.0-rc6-next-20240701-syzkaller #0
[ 229.356229][ C0] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
[ 229.366301][ C0] Workqueue: bat_events batadv_nc_worker
[ 229.371979][ C0] Call Trace:
[ 229.375266][ C0] <IRQ>
[ 229.378119][ C0] dump_stack_lvl+0x241/0x360
[ 229.382825][ C0] ? __pfx_dump_stack_lvl+0x10/0x10
[ 229.388052][ C0] ? __pfx__printk+0x10/0x10
[ 229.392661][ C0] ? _printk+0xd5/0x120
[ 229.396840][ C0] print_report+0x169/0x550
[ 229.401361][ C0] ? __virt_addr_valid+0xbd/0x530
[ 229.406407][ C0] ? __show_regs+0x172/0x610
[ 229.411022][ C0] kasan_report+0x143/0x180
[ 229.415549][ C0] ? show_opcodes+0x148/0x170
[ 229.420251][ C0] ? __show_regs+0x172/0x610
[ 229.424865][ C0] __show_regs+0x172/0x610
[ 229.429297][ C0] ? asm_sysvec_reschedule_ipi+0x1a/0x20
[ 229.434949][ C0] ? wq_watchdog_touch+0xef/0x180
[ 229.439991][ C0] ? asm_sysvec_reschedule_ipi+0x1a/0x20
[ 229.445645][ C0] show_trace_log_lvl+0x3d4/0x520
[ 229.450700][ C0] ? lockdep_unregister_key+0x56d/0x610
[ 229.456267][ C0] sched_show_task+0x506/0x6d0
[ 229.461051][ C0] ? report_rtnl_holders+0x29e/0x3f0
[ 229.466361][ C0] ? __pfx__printk+0x10/0x10
[ 229.470970][ C0] ? __pfx_sched_show_task+0x10/0x10
[ 229.476270][ C0] ? _raw_spin_unlock_irqrestore+0xdd/0x140
[ 229.482180][ C0] ? __pfx__raw_spin_unlock_irqrestore+0x10/0x10
[ 229.488536][ C0] report_rtnl_holders+0x320/0x3f0
[ 229.493671][ C0] call_timer_fn+0x18e/0x650
[ 229.498279][ C0] ? call_timer_fn+0xc0/0x650
[ 229.502974][ C0] ? __pfx_report_rtnl_holders+0x10/0x10
[ 229.508624][ C0] ? __pfx_call_timer_fn+0x10/0x10
[ 229.513749][ C0] ? __pfx_report_rtnl_holders+0x10/0x10
[ 229.519403][ C0] ? __pfx_report_rtnl_holders+0x10/0x10
[ 229.525056][ C0] ? __pfx_report_rtnl_holders+0x10/0x10
[ 229.530709][ C0] ? _raw_spin_unlock_irq+0x23/0x50
[ 229.535920][ C0] ? lockdep_hardirqs_on+0x99/0x150
[ 229.541145][ C0] ? __pfx_report_rtnl_holders+0x10/0x10
[ 229.546796][ C0] __run_timer_base+0x66a/0x8e0
[ 229.551946][ C0] ? __pfx___run_timer_base+0x10/0x10
[ 229.557433][ C0] run_timer_softirq+0xb7/0x170
[ 229.562312][ C0] handle_softirqs+0x2c4/0x970
[ 229.567095][ C0] ? __irq_exit_rcu+0xf4/0x1c0
[ 229.571877][ C0] ? __pfx_handle_softirqs+0x10/0x10
[ 229.577184][ C0] ? irqtime_account_irq+0xd4/0x1e0
[ 229.582405][ C0] __irq_exit_rcu+0xf4/0x1c0
[ 229.587017][ C0] ? __pfx___irq_exit_rcu+0x10/0x10
[ 229.592330][ C0] irq_exit_rcu+0x9/0x30
[ 229.596591][ C0] sysvec_apic_timer_interrupt+0xa6/0xc0
[ 229.602250][ C0] </IRQ>
[ 229.605190][ C0] <TASK>
[ 229.608128][ C0] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 229.614125][ C0] RIP: 0010:rcu_is_watching+0x3a/0xb0
[ 229.619525][ C0] Code: e8 4b 3b 17 0a 89 c3 83 f8 08 73 7a 49 bf 00 00 00 00 00 fc ff df 4c 8d 34 dd 40 6a de 8d 4c 89 f0 48 c1 e8 03 42 80 3c 38 00 <74> 08 4c 89 f7 e8 8c ac 80 00 48 c7 c3 98 7c 03 00 49 03 1e 48 89
[ 229.639321][ C0] RSP: 0018:ffffc90002fe7a00 EFLAGS: 00000246
[ 229.645414][ C0] RAX: 1ffffffff1bbcd48 RBX: 0000000000000000 RCX: ffffffff816f8930
[ 229.653403][ C0] RDX: 0000000000000000 RSI: ffffffff8c207ae0 RDI: ffffffff8c207aa0
[ 229.661389][ C0] RBP: ffffc90002fe7b48 R08: ffffffff8faf796f R09: 1ffffffff1f5ef2d
[ 229.669374][ C0] R10: dffffc0000000000 R11: fffffbfff1f5ef2e R12: 1ffff920005fcf50
[ 229.677448][ C0] R13: ffffffff8b5b3d7b R14: ffffffff8dde6a40 R15: dffffc0000000000
[ 229.685523][ C0] ? batadv_nc_worker+0xcb/0x610
[ 229.690482][ C0] ? lock_release+0xb0/0x9f0
[ 229.695095][ C0] lock_release+0xbf/0x9f0
[ 229.699531][ C0] ? __pfx_lock_acquire+0x10/0x10
[ 229.704573][ C0] ? batadv_nc_worker+0xcb/0x610
[ 229.709525][ C0] ? __pfx_lock_release+0x10/0x10
[ 229.714577][ C0] ? batadv_nc_worker+0xcb/0x610
[ 229.719556][ C0] batadv_nc_worker+0x28b/0x610
[ 229.724427][ C0] ? batadv_nc_worker+0xcb/0x610
[ 229.729383][ C0] ? process_scheduled_works+0x945/0x1830
[ 229.735214][ C0] process_scheduled_works+0xa2c/0x1830
[ 229.741055][ C0] ? __pfx_process_scheduled_works+0x10/0x10
[ 229.747056][ C0] ? assign_work+0x364/0x3d0
[ 229.751662][ C0] worker_thread+0x86d/0xd40
[ 229.756278][ C0] ? _raw_spin_unlock_irqrestore+0xdd/0x140
[ 229.762193][ C0] ? __kthread_parkme+0x169/0x1d0
[ 229.767245][ C0] ? __pfx_worker_thread+0x10/0x10
[ 229.772373][ C0] kthread+0x2f0/0x390
[ 229.776460][ C0] ? __pfx_worker_thread+0x10/0x10
[ 229.781595][ C0] ? __pfx_kthread+0x10/0x10
[ 229.786210][ C0] ret_from_fork+0x4b/0x80
[ 229.790645][ C0] ? __pfx_kthread+0x10/0x10
[ 229.795257][ C0] ret_from_fork_asm+0x1a/0x30
[ 229.800050][ C0] </TASK>
[ 229.803075][ C0]
[ 229.805402][ C0] The buggy address belongs to the virtual mapping at
[ 229.805402][ C0] [ffffc90003c48000, ffffc90003c51000) created by:
[ 229.805402][ C0] copy_process+0x5d1/0x3d70
[ 229.823040][ C0]
[ 229.825368][ C0] The buggy address belongs to the physical page:
[ 229.831782][ C0] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x8 pfn:0x60439
[ 229.840815][ C0] memcg:ffff88801ae90502
[ 229.845059][ C0] flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
[ 229.852190][ C0] raw: 00fff00000000000 0000000000000000 dead000000000122 0000000000000000
[ 229.860795][ C0] raw: 0000000000000008 0000000000000000 00000001ffffffff ffff88801ae90502
[ 229.869388][ C0] page dumped because: kasan: bad access detected
[ 229.875810][ C0] page_owner tracks the page as allocated
[ 229.881541][ C0] page last allocated via order 0, migratetype Unmovable, gfp_mask 0x102dc2(GFP_HIGHUSER|__GFP_NOWARN|__GFP_ZERO), pid 5945, tgid 5945 (syz-executor), ts 214193789047, free_ts 213702691290
[ 229.900048][ C0] post_alloc_hook+0x1f3/0x230
[ 229.904836][ C0] get_page_from_freelist+0x2ccb/0x2d80
[ 229.910401][ C0] __alloc_pages_noprof+0x256/0x6c0
[ 229.915617][ C0] alloc_pages_mpol_noprof+0x3e8/0x680
[ 229.921098][ C0] __vmalloc_node_range_noprof+0x971/0x1460
[ 229.927002][ C0] dup_task_struct+0x444/0x8c0
[ 229.931782][ C0] copy_process+0x5d1/0x3d70
[ 229.936391][ C0] kernel_clone+0x226/0x8f0
[ 229.940907][ C0] __x64_sys_clone+0x258/0x2a0
[ 229.945685][ C0] do_syscall_64+0xf3/0x230
[ 229.950289][ C0] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 229.956189][ C0] page last free pid 6079 tgid 6077 stack trace:
[ 229.962513][ C0] free_unref_page+0xd22/0xea0
[ 229.967289][ C0] __put_partials+0xeb/0x130
[ 229.971890][ C0] put_cpu_partial+0x17c/0x250
[ 229.976665][ C0] __slab_free+0x2ea/0x3d0
[ 229.981093][ C0] qlist_free_all+0x9e/0x140
[ 229.985702][ C0] kasan_quarantine_reduce+0x14f/0x170
[ 229.991174][ C0] __kasan_slab_alloc+0x23/0x80
[ 229.996036][ C0] kmem_cache_alloc_noprof+0x135/0x2a0
[ 230.001536][ C0] alloc_buffer_head+0x2a/0x290
[ 230.006397][ C0] folio_alloc_buffers+0x241/0x5b0
[ 230.011522][ C0] create_empty_buffers+0x3a/0x740
[ 230.016644][ C0] block_read_full_folio+0x25c/0xcd0
[ 230.021940][ C0] filemap_read_folio+0x1a0/0x790
[ 230.026979][ C0] do_read_cache_folio+0x134/0x820
[ 230.032104][ C0] do_read_cache_page+0x30/0x200
[ 230.037057][ C0] sysv_find_entry+0x1af/0x410
[ 230.041836][ C0]
[ 230.044183][ C0] Memory state around the buggy address:
[ 230.049816][ C0] ffffc90003c4f680: f2 f2 f2 f2 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
[ 230.057889][ C0] ffffc90003c4f700: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
[ 230.065961][ C0] >ffffc90003c4f780: 00 f2 f2 f2 00 f3 f3 f3 00 00 00 00 00 00 00 00
[ 230.074059][ C0] ^
[ 230.078915][ C0] ffffc90003c4f800: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f2 f2 f2
[ 230.086983][ C0] ffffc90003c4f880: 00 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
[ 230.095056][ C0] ==================================================================
[ 230.103224][ C0] Kernel panic - not syncing: KASAN: panic_on_warn set ...

Andrey Konovalov

unread,
Jul 1, 2024, 11:10:50 AMJul 1
to Tetsuo Handa, syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com, kasan-dev, linux-mm, b...@alien8.de, dave....@linux.intel.com, h...@zytor.com, mi...@redhat.com, tg...@linutronix.de, x...@kernel.org
On Mon, Jul 1, 2024 at 2:43 PM Tetsuo Handa
<penguin...@i-love.sakura.ne.jp> wrote:
>
> Hello, KASAN people.
>
> I suspect that KASAN's metadata for kernel stack memory got out of sync for
> unknown reason, for the stack trace of PID=7558 was successfully printed for
> two times before KASAN complains upon trying to print for the the third time.
> Would you decode what is this KASAN message saying?
>
> Quoting from https://syzkaller.appspot.com/text?tag=CrashLog&x=119fd081980000 :

[...]

> [ 229.319713][ C0] ==================================================================
> [ 229.327779][ C0] BUG: KASAN: stack-out-of-bounds in __show_regs+0x172/0x610
> [ 229.335174][ C0] Read of size 8 at addr ffffc90003c4f798 by task kworker/u8:5/234

[...]

> [ 230.044183][ C0] Memory state around the buggy address:
> [ 230.049816][ C0] ffffc90003c4f680: f2 f2 f2 f2 00 00 00 00 00 f3 f3 f3 f3 f3 f3 f3
> [ 230.057889][ C0] ffffc90003c4f700: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
> [ 230.065961][ C0] >ffffc90003c4f780: 00 f2 f2 f2 00 f3 f3 f3 00 00 00 00 00 00 00 00
> [ 230.074059][ C0] ^
> [ 230.078915][ C0] ffffc90003c4f800: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f2 f2 f2
> [ 230.086983][ C0] ffffc90003c4f880: 00 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
> [ 230.095056][ C0] ==================================================================

I checked some of the other syzbot reports for this bug, and this
memory state part in some of them looks different.

Specifically, for
https://syzkaller.appspot.com/text?tag=CrashLog&x=14293f0e980000:

[ 1558.929174][ C1] Memory state around the buggy address:
[ 1558.934796][ C1] ffffc9000b8bf400: 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00
[ 1558.942852][ C1] ffffc9000b8bf480: 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00
[ 1558.950897][ C1] >ffffc9000b8bf500: 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00
[ 1558.958943][ C1] ^
[ 1558.964569][ C1] ffffc9000b8bf580: 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00
[ 1558.972613][ C1] ffffc9000b8bf600: 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00

This is weird, because if the metadata is 00, then the memory should
be accessible and there should be no KASAN report.

Which makes me believe you have some kind of a race in your patch (or
there's a race in the kernel that your patch somehow exposes). At
least between the moment KASAN detected the issue and the moment the
reporting procedure got to printing the memory state, the memory state
changed. As this is stack memory that comes from a vmalloc allocation,
I suspect the task whose stack had been at that location died, and
something else got mapped there.

This is my best guess, I hope it's helpful.

Tetsuo Handa

unread,
Jul 2, 2024, 2:11:24 AMJul 2
to Andrey Konovalov, syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com, kasan-dev, linux-mm, b...@alien8.de, dave....@linux.intel.com, h...@zytor.com, mi...@redhat.com, tg...@linutronix.de, x...@kernel.org
On 2024/07/02 0:10, Andrey Konovalov wrote:
> This is weird, because if the metadata is 00, then the memory should
> be accessible and there should be no KASAN report.
>
> Which makes me believe you have some kind of a race in your patch (or
> there's a race in the kernel that your patch somehow exposes).

Yes, I consider that my patch is exposing an existing race, for I can't
find a race in my patch. (Since
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=b96342141183ffa62bfed5998f9b808c84042322
calls get_task_struct() when recording in-use state, report_rtnl_holders()
can't trigger use-after-free even if the thread died. Also, since previous version
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=5210cbe9a47fc5c1f43ba16d481e6335f3e2f345
synchronously calls synchronize_rcu() when clearing in-use state,
report_rtnl_holders() can't trigger use-after-free because the thread
can't die before calling put_rtnl_holder(). The variable "now" cannot be
0, and !cmpxchg(&rtnl_started[idx], 0, now) must serve as a serialization
lock when recording in-use state.)

> At
> least between the moment KASAN detected the issue and the moment the
> reporting procedure got to printing the memory state, the memory state
> changed.

Indeed, the exact line KASAN complained at varies suggests that the
memory state is modified by somebody else.

> As this is stack memory that comes from a vmalloc allocation,
> I suspect the task whose stack had been at that location died, and
> something else got mapped there.

I consider that the task can't die while calling __show_regs() from
report_rtnl_holders().

>
> This is my best guess, I hope it's helpful.

Well, KASAN says "out-of-bounds". But the reported address

BUG: KASAN: stack-out-of-bounds in __show_regs+0x172/0x610
Read of size 8 at addr ffffc90003c4f798 by task kworker/u8:5/234

is within the kernel stack memory mapping

The buggy address belongs to the virtual mapping at
[ffffc90003c48000, ffffc90003c51000) created by:
copy_process+0x5d1/0x3d7

. Why is this "out-of-bounds" ? What boundary did KASAN compare with?
Is this just a race of KASAN detecting a problem and KASAN reporting
that problem? (But as I explained above, it is unlikely that the thread
to be reported can die while processing report_rtnl_holders()...)

Tetsuo Handa

unread,
Jul 2, 2024, 3:05:29 AMJul 2
to Andrey Konovalov, syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com, kasan-dev, linux-mm, b...@alien8.de, dave....@linux.intel.com, h...@zytor.com, mi...@redhat.com, tg...@linutronix.de, x...@kernel.org
On 2024/07/02 15:11, Tetsuo Handa wrote:
> Well, KASAN says "out-of-bounds". But the reported address
>
> BUG: KASAN: stack-out-of-bounds in __show_regs+0x172/0x610
> Read of size 8 at addr ffffc90003c4f798 by task kworker/u8:5/234
>
> is within the kernel stack memory mapping
>
> The buggy address belongs to the virtual mapping at
> [ffffc90003c48000, ffffc90003c51000) created by:
> copy_process+0x5d1/0x3d7
>
> . Why is this "out-of-bounds" ? What boundary did KASAN compare with?

I think I found a hint. The KASAN message is printed when the call trace
starts with

__schedule()
preempt_schedule_irq()
irqentry_exit()

. That is, when preemption happens, KASAN by error tries to compare with
unintended stack boundary?

[ 504.507489][ C0] DEBUG: holding rtnl_mutex for 3212 jiffies.
[ 504.513708][ C0] task:kworker/u8:5 state:R running task stack:19992 pid:340 tgid:340 ppid:2 flags:0x00004000
[ 504.525827][ C0] Workqueue: netns cleanup_net
[ 504.530890][ C0] Call Trace:
[ 504.534213][ C0] <TASK>
[ 504.537244][ C0] __schedule+0x17e8/0x4a20
[ 504.541874][ C0] ? mark_lock+0x9a/0x360
[ 504.546279][ C0] ? lockdep_hardirqs_on_prepare+0x43d/0x780
[ 504.552396][ C0] ? __virt_addr_valid+0x183/0x520
[ 504.557711][ C0] ? __pfx_lockdep_hardirqs_on_prepare+0x10/0x10
[ 504.564121][ C0] ? lock_release+0xbf/0x9f0
[ 504.568918][ C0] ? __pfx___schedule+0x10/0x10
[ 504.573835][ C0] ? lockdep_hardirqs_on+0x99/0x150
[ 504.579189][ C0] ? mark_lock+0x9a/0x360
[ 504.583592][ C0] preempt_schedule_irq+0xfb/0x1c0
[ 504.588984][ C0] ? __pfx_preempt_schedule_irq+0x10/0x10
[ 504.594785][ C0] irqentry_exit+0x5e/0x90
[ 504.599421][ C0] asm_sysvec_reschedule_ipi+0x1a/0x20

[ 463.514954][ C1] DEBUG: holding rtnl_mutex for 993 jiffies.
[ 463.528845][ C1] task:kworker/u8:10 state:R running task stack:19856 pid:5725 tgid:5725 ppid:2 flags:0x00004000
[ 463.536743][ T9938] rock: corrupted directory entry. extent=41, offset=65536, size=8
[ 463.540652][ C1] Workqueue: netns cleanup_net
[ 463.553421][ C1] Call Trace:
[ 463.556740][ C1] <TASK>
[ 463.559706][ C1] __schedule+0x17e8/0x4a20
[ 463.564304][ C1] ? __pfx_validate_chain+0x10/0x10
[ 463.569611][ C1] ? __pfx___schedule+0x10/0x10
[ 463.574628][ C1] ? lockdep_hardirqs_on_prepare+0x43d/0x780
[ 463.580760][ C1] ? preempt_schedule_irq+0xf0/0x1c0
[ 463.586149][ C1] preempt_schedule_irq+0xfb/0x1c0
[ 463.591401][ C1] ? __pfx_preempt_schedule_irq+0x10/0x10
[ 463.597269][ C1] irqentry_exit+0x5e/0x90
[ 463.601834][ C1] asm_sysvec_apic_timer_interrupt+0x1a/0x20

[ 1558.178669][ C1] DEBUG: holding rtnl_mutex for 536 jiffies.
[ 1558.184806][ C1] task:syz-executor.3 state:R running task stack:25968 pid:6351 tgid:6345 ppid:6200 flags:0x00004006
[ 1558.196699][ C1] Call Trace:
[ 1558.200068][ C1] <TASK>
[ 1558.203055][ C1] __schedule+0x17e8/0x4a20
[ 1558.207638][ C1] ? __pfx___schedule+0x10/0x10
[ 1558.212585][ C1] ? lockdep_hardirqs_on_prepare+0x43d/0x780
[ 1558.218675][ C1] ? preempt_schedule_irq+0xf0/0x1c0
[ 1558.224004][ C1] preempt_schedule_irq+0xfb/0x1c0
[ 1558.229196][ C1] ? __pfx_preempt_schedule_irq+0x10/0x10
[ 1558.234986][ C1] irqentry_exit+0x5e/0x90
[ 1558.239503][ C1] asm_sysvec_reschedule_ipi+0x1a/0x20

[ 1104.439430][ C0] DEBUG: holding rtnl_mutex for 578 jiffies.
[ 1104.445729][ C0] task:kworker/u8:3 state:R running task stack:18544 pid:53 tgid:53 ppid:2 flags:0x00004000
[ 1104.459070][ C0] Workqueue: netns cleanup_net
[ 1104.464170][ C0] Call Trace:
[ 1104.467478][ C0] <TASK>
[ 1104.470481][ C0] __schedule+0x17e8/0x4a20
[ 1104.476080][ C0] ? mark_lock+0x9a/0x360
[ 1104.480776][ C0] ? __lock_acquire+0x1359/0x2000
[ 1104.486043][ C0] ? __pfx___schedule+0x10/0x10
[ 1104.490987][ C0] ? lockdep_hardirqs_on_prepare+0x43d/0x780
[ 1104.497017][ C0] ? preempt_schedule_irq+0xf0/0x1c0
[ 1104.502486][ C0] preempt_schedule_irq+0xfb/0x1c0
[ 1104.507809][ C0] ? __pfx_preempt_schedule_irq+0x10/0x10
[ 1104.514030][ C0] irqentry_exit+0x5e/0x90
[ 1104.518689][ C0] asm_sysvec_reschedule_ipi+0x1a/0x20

[ 926.207053][ C1] DEBUG: holding rtnl_mutex for 517 jiffies.
[ 926.213142][ C1] task:syz.1.1365 state:R running task stack:24672 pid:11152 tgid:11152 ppid:10992 flags:0x00004006
[ 926.225053][ C1] Call Trace:
[ 926.228434][ C1] <TASK>
[ 926.231441][ C1] __schedule+0x17e8/0x4a20
[ 926.236054][ C1] ? __pfx___schedule+0x10/0x10
[ 926.241130][ C1] ? lockdep_hardirqs_on_prepare+0x43d/0x780
[ 926.247265][ C1] ? kasan_save_track+0x51/0x80
[ 926.252225][ C1] ? preempt_schedule_irq+0xf0/0x1c0
[ 926.257705][ C1] preempt_schedule_irq+0xfb/0x1c0
[ 926.262899][ C1] ? __pfx_preempt_schedule_irq+0x10/0x10
[ 926.268725][ C1] ? __pfx_pfifo_fast_destroy+0x10/0x10
[ 926.274379][ C1] irqentry_exit+0x5e/0x90
[ 926.278903][ C1] asm_sysvec_apic_timer_interrupt+0x1a/0x20

[ 940.917894][ C0] DEBUG: holding rtnl_mutex for 1611 jiffies.
[ 940.924066][ C0] task:syz.2.2274 state:R running task stack:24336 pid:15954 tgid:15954 ppid:14850 flags:0x00004006
[ 940.936192][ C0] Call Trace:
[ 940.939550][ C0] <TASK>
[ 940.942540][ C0] __schedule+0x17e8/0x4a20
[ 940.947134][ C0] ? __pfx___schedule+0x10/0x10
[ 940.952070][ C0] ? lockdep_hardirqs_on_prepare+0x43d/0x780
[ 940.958362][ C0] ? kasan_save_track+0x51/0x80
[ 940.963266][ C0] ? preempt_schedule_irq+0xf0/0x1c0
[ 940.968628][ C0] preempt_schedule_irq+0xfb/0x1c0
[ 940.973790][ C0] ? __pfx_preempt_schedule_irq+0x10/0x10
[ 940.979610][ C0] ? __pfx_pfifo_fast_destroy+0x10/0x10
[ 940.985227][ C0] irqentry_exit+0x5e/0x90
[ 940.989731][ C0] asm_sysvec_apic_timer_interrupt+0x1a/0x20

[ 2120.744289][ C1] DEBUG: holding rtnl_mutex for 1675 jiffies.
[ 2120.750440][ C1] task:syz-executor state:R running task stack:20288 pid:2431 tgid:2431 ppid:1 flags:0x00004006
[ 2120.762291][ C1] Call Trace:
[ 2120.765647][ C1] <TASK>
[ 2120.768615][ C1] __schedule+0x17e8/0x4a20
[ 2120.773210][ C1] ? __pfx___schedule+0x10/0x10
[ 2120.778152][ C1] ? lockdep_hardirqs_on_prepare+0x43d/0x780
[ 2120.784188][ C1] ? kasan_save_track+0x51/0x80
[ 2120.789118][ C1] ? preempt_schedule_irq+0xf0/0x1c0
[ 2120.794445][ C1] preempt_schedule_irq+0xfb/0x1c0
[ 2120.799621][ C1] ? __pfx_preempt_schedule_irq+0x10/0x10
[ 2120.805378][ C1] ? kvm_kick_cpu+0x26/0xb0
[ 2120.809965][ C1] irqentry_exit+0x5e/0x90
[ 2120.814423][ C1] asm_sysvec_apic_timer_interrupt+0x1a/0x20

[ 1465.514982][ C1] DEBUG: holding rtnl_mutex for 583 jiffies.
[ 1465.521071][ C1] task:kworker/u8:2 state:R running task stack:20232 pid:35 tgid:35 ppid:2 flags:0x00004000
[ 1465.532945][ C1] Workqueue: netns cleanup_net
[ 1465.537846][ C1] Call Trace:
[ 1465.541164][ C1] <TASK>
[ 1465.544132][ C1] __schedule+0x17e8/0x4a20
[ 1465.548730][ C1] ? mark_lock+0x9a/0x360
[ 1465.553148][ C1] ? lockdep_hardirqs_on_prepare+0x43d/0x780
[ 1465.559257][ C1] ? __pfx_lockdep_hardirqs_on_prepare+0x10/0x10
[ 1465.565697][ C1] ? __pfx___schedule+0x10/0x10
[ 1465.570636][ C1] ? lockdep_hardirqs_on+0x99/0x150
[ 1465.575968][ C1] ? mark_lock+0x9a/0x360
[ 1465.580381][ C1] preempt_schedule_irq+0xfb/0x1c0
[ 1465.585599][ C1] ? __pfx_preempt_schedule_irq+0x10/0x10
[ 1465.591383][ C1] irqentry_exit+0x5e/0x90
[ 1465.595895][ C1] asm_sysvec_reschedule_ipi+0x1a/0x20

Andrey Konovalov

unread,
Jul 2, 2024, 10:29:29 AMJul 2
to Tetsuo Handa, syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com, kasan-dev, linux-mm, b...@alien8.de, dave....@linux.intel.com, h...@zytor.com, mi...@redhat.com, tg...@linutronix.de, x...@kernel.org
On Tue, Jul 2, 2024 at 8:11 AM Tetsuo Handa
<penguin...@i-love.sakura.ne.jp> wrote:
>
> Well, KASAN says "out-of-bounds". But the reported address
>
> BUG: KASAN: stack-out-of-bounds in __show_regs+0x172/0x610
> Read of size 8 at addr ffffc90003c4f798 by task kworker/u8:5/234
>
> is within the kernel stack memory mapping
>
> The buggy address belongs to the virtual mapping at
> [ffffc90003c48000, ffffc90003c51000) created by:
> copy_process+0x5d1/0x3d7
>
> . Why is this "out-of-bounds" ? What boundary did KASAN compare with?
> Is this just a race of KASAN detecting a problem and KASAN reporting
> that problem? (But as I explained above, it is unlikely that the thread
> to be reported can die while processing report_rtnl_holders()...)

KASAN creates the bug title based on the memory state byte, it doesn't
compare any boundaries. So if the memory state got
corrupted/out-of-sync, the title can end up arbitrary. In this case,
the bad access likely touched a redzone between stack variables, so
KASAN reported out-of-bounds.

Since syzbot managed to find a reproducer for this bug, you can ask it
to run the reproducer with additional testing patches to check various
hypotheses [1]. Perhaps, you can write a magic value into task_struct
(or into pt_regs of the task?) in get_rtnl_holder() before saving the
task_struct reference, and then check that marker value in
report_rtnl_holders() before doing sched_show_task(). Depending of
whether the check succeeds, this will give additional information.

One other thing that comes to mind with regards to your patch: if the
task is still executing, the location of things on its stack might
change due to CONFIG_RANDOMIZE_KSTACK_OFFSET while you're printing the
task info. However, if the task is sleeping on a lock, this shouldn't
happen... But maybe a task can wake up during sched_show_task() and
start handling a new syscall? Just some guesses.

[1] https://github.com/google/syzkaller/blob/master/docs/syzbot.md#testing-patches

Tetsuo Handa

unread,
Jul 2, 2024, 11:21:20 AMJul 2
to Andrey Konovalov, syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com, kasan-dev, linux-mm, b...@alien8.de, dave....@linux.intel.com, h...@zytor.com, mi...@redhat.com, tg...@linutronix.de, x...@kernel.org
On 2024/07/02 23:29, Andrey Konovalov wrote:
> One other thing that comes to mind with regards to your patch: if the
> task is still executing, the location of things on its stack might
> change due to CONFIG_RANDOMIZE_KSTACK_OFFSET while you're printing the
> task info. However, if the task is sleeping on a lock, this shouldn't
> happen... But maybe a task can wake up during sched_show_task() and
> start handling a new syscall? Just some guesses.

https://syzkaller.appspot.com/bug?extid=d7491e9e156404745fbb says that
this bug happens without my patch. It seems that this bug happens when
printing registers of a preempted thread. 5.15 kernel does not have
CONFIG_RANDOMIZE_KSTACK_OFFSET config option, but

__schedule()
preempt_schedule_irq()
irqentry_exit_cond_resched()
irqentry_exit()

pattern in 5.15 resembles

__schedule()
preempt_schedule_irq()
irqentry_exit()

pattern in linux-next.

[ 1008.224617][T14487] task:syz-executor.1 state:R running task stack:22256 pid:14483 ppid: 434 flags:0x00004000
[ 1008.224656][T14487] Call Trace:
[ 1008.224661][T14487] <TASK>
[ 1008.224669][T14487] __schedule+0xcbe/0x1580
[ 1008.224689][T14487] ? __sched_text_start+0x8/0x8
[ 1008.224709][T14487] ? ttwu_do_activate+0x15d/0x280
[ 1008.224732][T14487] ? _raw_spin_unlock_irqrestore+0x5c/0x80
[ 1008.224758][T14487] preempt_schedule_irq+0xc7/0x140
[ 1008.224781][T14487] ? __cond_resched+0x20/0x20
[ 1008.224802][T14487] ? try_invoke_on_locked_down_task+0x2a0/0x2a0
[ 1008.224829][T14487] irqentry_exit_cond_resched+0x2a/0x30
[ 1008.224851][T14487] irqentry_exit+0x30/0x40
[ 1008.224874][T14487] sysvec_apic_timer_interrupt+0x55/0xc0
[ 1008.224900][T14487] asm_sysvec_apic_timer_interrupt+0x1b/0x20
[ 1008.224923][T14487] RIP: 0010:preempt_schedule_thunk+0x5/0x18
[ 1008.224950][T14487] Code: fd 85 db 0f 84 98 00 00 00 44 8d 73 01 44 89 f6 09 de bf ff ff ff ff e8 47 e4 8f fd 41 09 de 0f 88 88 00 00 00 e8 89 e0 8f fd <4c> 89 e0 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df 0f b6 04 08 84
[ 1008.224970][T14487] RSP: 0000:0000000000000001 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
[ 1008.224991][T14487] RAX: ffff88811532d948 RBX: ffffc900072ef560 RCX: ffffc900077e7680
[ 1008.225009][T14487] RDX: ffffc900072ef5b0 RSI: ffffffff8100817a RDI: dffffc0000000001
[ 1008.225027][T14487] RBP: 0000000000000001 R08: ffff88811532d948 R09: ffffc900077e7690
[ 1008.225043][T14487] R10: 1ffff92000efced2 R11: ffffffff84bfe126 R12: ffffc900077e7680
[ 1008.225062][T14487] ==================================================================
[ 1008.225071][T14487] BUG: KASAN: stack-out-of-bounds in __show_regs+0x252/0x4d0
[ 1008.225098][T14487] Read of size 8 at addr ffffc900072ef4f8 by task syz-executor.3/14487
[ 1008.225117][T14487]
[ 1008.225123][T14487] CPU: 0 PID: 14487 Comm: syz-executor.3 Not tainted 5.15.118-syzkaller-01748-g241da2ad5601 #0

Tetsuo Handa

unread,
Jul 7, 2024, 6:32:24 AM (12 days ago) Jul 7
to Alexander Potapenko, Marco Elver, Dmitry Vyukov, syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com, kasan-dev, linux-mm, b...@alien8.de, dave....@linux.intel.com, h...@zytor.com, mi...@redhat.com, tg...@linutronix.de, x...@kernel.org, Andrey Konovalov
Guessing from IRQ entry hint, I found
commit 37ad4ee83642 ("x86: kmsan: don't instrument stack walking functions") and
commit 6cae637fa26d ("entry: kmsan: introduce kmsan_unpoison_entry_regs()").
I guess that we need to disable KASAN for the same reason as well as KMSAN.
Alexander, can you write a patch description for below change?

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index a7d562697e50..01edff4a9d6b 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -192,6 +192,9 @@ static void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
int graph_idx = 0;
bool partial = false;

+ /* As with KMSAN, disable KASAN for the same reason. */
+ kasan_disable_current();
+
printk("%sCall Trace:\n", log_lvl);

unwind_start(&state, task, regs, stack);
@@ -304,6 +307,8 @@ static void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
if (stack_name)
printk("%s </%s>\n", log_lvl, stack_name);
}
+
+ kasan_enable_current();
}

void show_stack(struct task_struct *task, unsigned long *sp,
Reply all
Reply to author
Forward
0 new messages