[syzbot] [net?] KASAN: slab-use-after-free Write in mini_qdisc_pair

syzbot

unread,

Mar 23, 2023, 8:52:41 PM3/23/23

to da...@davemloft.net, edum...@google.com, j...@mojatatu.com, ji...@resnulli.us, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, xiyou.w...@gmail.com

Hello,

syzbot found the following issue on:

HEAD commit: fff5a5e7f528 Merge tag 'for-linus' of git://git.armlinux.o..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=11884731c80000
kernel config: https://syzkaller.appspot.com/x/.config?x=aaa4b45720ca0519
dashboard link: https://syzkaller.appspot.com/bug?extid=b53a9c0d1ea4ad62da8b
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15d1497ac80000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11eed636c80000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/33a184f98b9d/disk-fff5a5e7.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/3d75f967571e/vmlinux-fff5a5e7.xz
kernel image: https://storage.googleapis.com/syzbot-assets/4eeb8edbdc7e/bzImage-fff5a5e7.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+b53a9c...@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: slab-use-after-free in mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
Write of size 8 at addr ffff888045b31308 by task syz-executor690/14901

CPU: 0 PID: 14901 Comm: syz-executor690 Not tainted 6.3.0-rc3-syzkaller-00026-gfff5a5e7f528 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:319
print_report mm/kasan/report.c:430 [inline]
kasan_report+0x11c/0x130 mm/kasan/report.c:536
mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
tcf_chain_head_change_item net/sched/cls_api.c:495 [inline]
tcf_chain0_head_change.isra.0+0xb9/0x120 net/sched/cls_api.c:509
tcf_chain_tp_insert net/sched/cls_api.c:1826 [inline]
tcf_chain_tp_insert_unique net/sched/cls_api.c:1875 [inline]
tc_new_tfilter+0x1de6/0x2290 net/sched/cls_api.c:2266
rtnetlink_rcv_msg+0x996/0xd50 net/core/rtnetlink.c:6165
netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2574
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x925/0xe30 net/netlink/af_netlink.c:1942
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0xde/0x190 net/socket.c:747
____sys_sendmsg+0x334/0x900 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmmsg+0x18f/0x460 net/socket.c:2641
__do_sys_sendmmsg net/socket.c:2670 [inline]
__se_sys_sendmmsg net/socket.c:2667 [inline]
__x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2667
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f4f11222579
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 31 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f4f11184208 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00007f4f112ab2a8 RCX: 00007f4f11222579
RDX: 040000000000009f RSI: 00000000200002c0 RDI: 0000000000000007
RBP: 00007f4f112ab2a0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f4f112ab2ac
R13: 00007fffc8e5214f R14: 00007f4f11184300 R15: 0000000000022000
</TASK>

Allocated by task 14898:
kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
____kasan_kmalloc mm/kasan/common.c:333 [inline]
__kasan_kmalloc+0xa2/0xb0 mm/kasan/common.c:383
kasan_kmalloc include/linux/kasan.h:196 [inline]
__do_kmalloc_node mm/slab_common.c:967 [inline]
__kmalloc_node+0x61/0x1a0 mm/slab_common.c:974
kmalloc_node include/linux/slab.h:610 [inline]
kzalloc_node include/linux/slab.h:731 [inline]
qdisc_alloc+0xb0/0xb30 net/sched/sch_generic.c:938
qdisc_create+0xce/0x1040 net/sched/sch_api.c:1244
tc_modify_qdisc+0x488/0x1a40 net/sched/sch_api.c:1680
rtnetlink_rcv_msg+0x43d/0xd50 net/core/rtnetlink.c:6174
netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2574
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x925/0xe30 net/netlink/af_netlink.c:1942
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0xde/0x190 net/socket.c:747
____sys_sendmsg+0x334/0x900 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmmsg+0x18f/0x460 net/socket.c:2641
__do_sys_sendmmsg net/socket.c:2670 [inline]
__se_sys_sendmmsg net/socket.c:2667 [inline]
__x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2667
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 21:
kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x40 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free+0x160/0x1c0 mm/kasan/common.c:200
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1807
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0xaf/0x2d0 mm/slub.c:3800
rcu_do_batch kernel/rcu/tree.c:2112 [inline]
rcu_core+0x814/0x1960 kernel/rcu/tree.c:2372
__do_softirq+0x1d4/0x905 kernel/softirq.c:571

Last potentially related work creation:
kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
__kasan_record_aux_stack+0xbc/0xd0 mm/kasan/generic.c:491
__call_rcu_common.constprop.0+0x99/0x7e0 kernel/rcu/tree.c:2622
qdisc_put_unlocked+0x73/0x90 net/sched/sch_generic.c:1097
tcf_block_release+0x86/0x90 net/sched/cls_api.c:1362
tc_new_tfilter+0xa35/0x2290 net/sched/cls_api.c:2331
rtnetlink_rcv_msg+0x996/0xd50 net/core/rtnetlink.c:6165
netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2574
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x925/0xe30 net/netlink/af_netlink.c:1942
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0xde/0x190 net/socket.c:747
____sys_sendmsg+0x334/0x900 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmmsg+0x18f/0x460 net/socket.c:2641
__do_sys_sendmmsg net/socket.c:2670 [inline]
__se_sys_sendmmsg net/socket.c:2667 [inline]
__x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2667
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

The buggy address belongs to the object at ffff888045b31000
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 776 bytes inside of
freed 1024-byte region [ffff888045b31000, ffff888045b31400)

The buggy address belongs to the physical page:
page:ffffea000116cc00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888045b37000 pfn:0x45b30
head:ffffea000116cc00 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
anon flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000010200 ffff888012441dc0 0000000000000000 dead000000000001
raw: ffff888045b37000 000000008010000d 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0x1d20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL), pid 14184, tgid 14168 (syz-executor690), ts 420322459064, free_ts 15493742100
prep_new_page mm/page_alloc.c:2552 [inline]
get_page_from_freelist+0x1190/0x2e20 mm/page_alloc.c:4325
__alloc_pages+0x1cb/0x4a0 mm/page_alloc.c:5591
alloc_pages+0x1aa/0x270 mm/mempolicy.c:2283
alloc_slab_page mm/slub.c:1851 [inline]
allocate_slab+0x25f/0x390 mm/slub.c:1998
new_slab mm/slub.c:2051 [inline]
___slab_alloc+0xa91/0x1400 mm/slub.c:3193
__slab_alloc.constprop.0+0x56/0xa0 mm/slub.c:3292
__slab_alloc_node mm/slub.c:3345 [inline]
slab_alloc_node mm/slub.c:3442 [inline]
__kmem_cache_alloc_node+0x136/0x320 mm/slub.c:3491
kmalloc_trace+0x26/0xe0 mm/slab_common.c:1061
kmalloc include/linux/slab.h:580 [inline]
kmalloc_array include/linux/slab.h:635 [inline]
kcalloc include/linux/slab.h:667 [inline]
fl_change+0x1cf/0x4ac0 net/sched/cls_flower.c:2175
tc_new_tfilter+0x97c/0x2290 net/sched/cls_api.c:2310
rtnetlink_rcv_msg+0x996/0xd50 net/core/rtnetlink.c:6165
netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2574
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x925/0xe30 net/netlink/af_netlink.c:1942
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0xde/0x190 net/socket.c:747
____sys_sendmsg+0x334/0x900 net/socket.c:2501
page last free stack trace:
reset_page_owner include/linux/page_owner.h:24 [inline]
free_pages_prepare mm/page_alloc.c:1453 [inline]
free_pcp_prepare+0x5d5/0xa50 mm/page_alloc.c:1503
free_unref_page_prepare mm/page_alloc.c:3387 [inline]
free_unref_page+0x1d/0x490 mm/page_alloc.c:3482
free_contig_range+0xb5/0x180 mm/page_alloc.c:9531
destroy_args+0x6c4/0x920 mm/debug_vm_pgtable.c:1023
debug_vm_pgtable+0x242a/0x4640 mm/debug_vm_pgtable.c:1403
do_one_initcall+0x102/0x540 init/main.c:1310
do_initcall_level init/main.c:1383 [inline]
do_initcalls init/main.c:1399 [inline]
do_basic_setup init/main.c:1418 [inline]
kernel_init_freeable+0x696/0xc00 init/main.c:1638
kernel_init+0x1e/0x2c0 init/main.c:1526
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308

Memory state around the buggy address:
ffff888045b31200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888045b31280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff888045b31300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888045b31380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888045b31400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================

---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

Jakub Kicinski

unread,

Mar 28, 2023, 9:47:37 PM3/28/23

to syzbot, Seth Forshee, da...@davemloft.net, edum...@google.com, j...@mojatatu.com, ji...@resnulli.us, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, xiyou.w...@gmail.com

Seth, does this looks related to commit 267463823adb ("net: sch:
eliminate unnecessary RCU waits in mini_qdisc_pair_swap()")
by any chance?

Pedro Tammela

unread,

Mar 29, 2023, 6:17:22 PM3/29/23

to Seth Forshee, Jakub Kicinski, syzbot, da...@davemloft.net, edum...@google.com, j...@mojatatu.com, ji...@resnulli.us, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, xiyou.w...@gmail.com

On 29/03/2023 00:37, Seth Forshee wrote:

> On Tue, Mar 28, 2023 at 06:47:33PM -0700, Jakub Kicinski wrote:
>> Seth, does this looks related to commit 267463823adb ("net: sch:
>> eliminate unnecessary RCU waits in mini_qdisc_pair_swap()")
>> by any chance?
>

> I don't see how it could be. The memory being written is part of the
> qdisc private memory, and tc_new_tfilter() takes a reference to the
> qdisc. If that memory has been freed doesn't it mean that something has
> done an unbalanced qdisc_put()?
>

Reverting Seth's patches (85c0c3eb9a66 and 267463823adb) leads to these
traces with the reproducer:
[ 52.704956][ C0] ------------[ cut here ]------------
[ 52.705568][ C0] ODEBUG: free active (active state 1) object:0
[ 52.706542][ C0] WARNING: CPU: 0 PID: 0 at lib/debugobjects.c0
[ 52.707283][ C0] Modules linked in:
[ 52.707602][ C0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.0
[ 52.708304][ C0] Hardware name: QEMU Standard PC (i440FX + PI4
[ 52.709032][ C0] RIP: 0010:debug_print_object+0x196/0x290
[ 52.709509][ C0] Code: 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85e
[ 52.711011][ C0] RSP: 0018:ffffc90000007cd0 EFLAGS: 00010282
[ 52.711510][ C0] RAX: 0000000000000000 RBX: 0000000000000003 0
[ 52.712125][ C0] RDX: ffffffff8c495800 RSI: ffffffff814b96d7 1
[ 52.712748][ C0] RBP: 0000000000000001 R08: 0000000000000001 0
[ 52.713370][ C0] R10: 0000000000000000 R11: 203a47554245444f 0
[ 52.713983][ C0] R13: ffffffff8aa6e960 R14: 0000000000000000 8
[ 52.714609][ C0] FS: 0000000000000000(0000) GS:ffff8881f5a000
[ 52.715356][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008003
[ 52.715863][ C0] CR2: 000055914686f020 CR3: 000000011e856000 0
[ 52.716486][ C0] Call Trace:
[ 52.716742][ C0] <IRQ>
[ 52.716969][ C0] debug_check_no_obj_freed+0x302/0x420
[ 52.717423][ C0] slab_free_freelist_hook+0xec/0x1c0
[ 52.717848][ C0] ? rcu_core+0x818/0x1930
[ 52.718204][ C0] __kmem_cache_free+0xaf/0x2e0
[ 52.718590][ C0] rcu_core+0x818/0x1930
[ 52.718938][ C0] ? rcu_report_dead+0x610/0x610
[ 52.719328][ C0] __do_softirq+0x1d4/0x8ef
[ 52.719689][ C0] __irq_exit_rcu+0x11d/0x190
[ 52.720062][ C0] irq_exit_rcu+0x9/0x20
[ 52.720402][ C0] sysvec_apic_timer_interrupt+0x97/0xc0
[ 52.720842][ C0] </IRQ>
[ 52.721070][ C0] <TASK>
[ 52.721300][ C0] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 52.721779][ C0] RIP: 0010:default_idle+0xf/0x20
[ 52.722172][ C0] Code: 89 07 49 c7 c0 08 00 00 00 4d 29 c8 4c5
[ 52.723631][ C0] RSP: 0018:ffffffff8c407e30 EFLAGS: 00000202
[ 52.724096][ C0] RAX: 000000000007897f RBX: 0000000000000000 6
[ 52.724702][ C0] RDX: 0000000000000000 RSI: 0000000000000001 0
[ 52.725335][ C0] RBP: ffffffff8c495800 R08: 0000000000000001 b
[ 52.725957][ C0] R10: ffffed103eb46d95 R11: 0000000000000000 0
[ 52.726550][ C0] R13: 0000000000000000 R14: ffffffff8e7834d0 0
[ 52.727162][ C0] ? ct_kernel_exit+0x1d6/0x240
[ 52.727542][ C0] default_idle_call+0x67/0xa0
[ 52.727912][ C0] do_idle+0x31e/0x3e0
[ 52.728241][ C0] ? arch_cpu_idle_exit+0x30/0x30
[ 52.728635][ C0] cpu_startup_entry+0x18/0x20
[ 52.729006][ C0] rest_init+0x16d/0x2b0
[ 52.729338][ C0] ? regulator_has_full_constraints+0x9/0x20
[ 52.729815][ C0] ? trace_init_perf_perm_irq_work_exit+0x20/00
[ 52.730309][ C0] arch_call_rest_init+0x13/0x30
[ 52.730703][ C0] start_kernel+0x352/0x4c0
[ 52.731087][ C0] secondary_startup_64_no_verify+0xce/0xdb
[ 52.731611][ C0] </TASK>
[ 52.731870][ C0] Kernel panic - not syncing: kernel: panic_on.
[ 52.732445][ C0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.0
[ 52.733140][ C0] Hardware name: QEMU Standard PC (i440FX + PI4
[ 52.733867][ C0] Call Trace:
[ 52.734143][ C0] <IRQ>
[ 52.734380][ C0] dump_stack_lvl+0xd9/0x150
[ 52.734769][ C0] panic+0x684/0x730
[ 52.735082][ C0] ? panic_smp_self_stop+0x90/0x90
[ 52.735322][ C0] ? show_trace_log_lvl+0x285/0x390
[ 52.735322][ C0] ? debug_print_object+0x196/0x290
[ 52.735322][ C0] check_panic_on_warn+0xb1/0xc0
[ 52.735322][ C0] __warn+0xf2/0x390
[ 52.735322][ C0] ? debug_print_object+0x196/0x290
[ 52.735322][ C0] report_bug+0x2dd/0x500
[ 52.735322][ C0] handle_bug+0x3c/0x70
[ 52.735322][ C0] exc_invalid_op+0x18/0x50
[ 52.735322][ C0] asm_exc_invalid_op+0x1a/0x20
[ 52.735322][ C0] RIP: 0010:debug_print_object+0x196/0x290
[ 52.735322][ C0] Code: 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85e
[ 52.735322][ C0] RSP: 0018:ffffc90000007cd0 EFLAGS: 00010282
[ 52.735322][ C0] RAX: 0000000000000000 RBX: 0000000000000003 0
[ 52.735322][ C0] RDX: ffffffff8c495800 RSI: ffffffff814b96d7 1
[ 52.735322][ C0] RBP: 0000000000000001 R08: 0000000000000001 0
[ 52.735322][ C0] R10: 0000000000000000 R11: 203a47554245444f 0
[ 52.735322][ C0] R13: ffffffff8aa6e960 R14: 0000000000000000 8
[ 52.735322][ C0] ? __warn_printk+0x187/0x310
[ 52.735322][ C0] debug_check_no_obj_freed+0x302/0x420
[ 52.735322][ C0] slab_free_freelist_hook+0xec/0x1c0
[ 52.735322][ C0] ? rcu_core+0x818/0x1930
[ 52.735322][ C0] __kmem_cache_free+0xaf/0x2e0
[ 52.735322][ C0] rcu_core+0x818/0x1930
[ 52.735322][ C0] ? rcu_report_dead+0x610/0x610
[ 52.735322][ C0] __do_softirq+0x1d4/0x8ef
[ 52.735322][ C0] __irq_exit_rcu+0x11d/0x190
[ 52.735322][ C0] irq_exit_rcu+0x9/0x20
[ 52.735322][ C0] sysvec_apic_timer_interrupt+0x97/0xc0
[ 52.735322][ C0] </IRQ>
[ 52.735322][ C0] <TASK>
[ 52.735322][ C0] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 52.735322][ C0] RIP: 0010:default_idle+0xf/0x20
[ 52.735322][ C0] Code: 89 07 49 c7 c0 08 00 00 00 4d 29 c8 4c5
[ 52.735322][ C0] RSP: 0018:ffffffff8c407e30 EFLAGS: 00000202
[ 52.735322][ C0] RAX: 000000000007897f RBX: 0000000000000000 6
[ 52.735322][ C0] RDX: 0000000000000000 RSI: 0000000000000001 0
[ 52.735322][ C0] RBP: ffffffff8c495800 R08: 0000000000000001 b
[ 52.735322][ C0] R10: ffffed103eb46d95 R11: 0000000000000000 0
[ 52.735322][ C0] R13: 0000000000000000 R14: ffffffff8e7834d0 0
[ 52.735322][ C0] ? ct_kernel_exit+0x1d6/0x240
[ 52.735322][ C0] default_idle_call+0x67/0xa0
[ 52.735322][ C0] do_idle+0x31e/0x3e0
[ 52.735322][ C0] ? arch_cpu_idle_exit+0x30/0x30
[ 52.735322][ C0] cpu_startup_entry+0x18/0x20
[ 52.735322][ C0] rest_init+0x16d/0x2b0
[ 52.735322][ C0] ? regulator_has_full_constraints+0x9/0x20
[ 52.735322][ C0] ? trace_init_perf_perm_irq_work_exit+0x20/00
[ 52.735322][ C0] arch_call_rest_init+0x13/0x30
[ 52.735322][ C0] start_kernel+0x352/0x4c0
[ 52.735322][ C0] secondary_startup_64_no_verify+0xce/0xdb
[ 52.735322][ C0] </TASK>
[ 52.735322][ C0] Kernel Offset: disabled
[ 52.735322][ C0] Rebooting in 86400 seconds..

Jamal Hadi Salim

unread,

Apr 3, 2023, 12:00:47 PM4/3/23

to Pedro Tammela, Seth Forshee, Jakub Kicinski, syzbot, da...@davemloft.net, edum...@google.com, ji...@resnulli.us, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, xiyou.w...@gmail.com

To provide more update:
Happens on single processor before Seth's patches; and only on
multi-processor after Seth's patches.
Theory is: there is a logic bug in the miniqdisc rcu visibility. Feels
like the freeing of the structure is done without rcu involvement.
Jiri/Cong maybe you can take a look since youve been dabbling in
miniqdisc? The reproducer worked for me and Pedro 100% of the time.

cheers,
jamal

Hillf Danton

unread,

Apr 17, 2023, 10:26:14 PM4/17/23

to syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com

On 23 Mar 2023 17:52:40 -0700

> HEAD commit: fff5a5e7f528 Merge tag 'for-linus' of git://git.armlinux.o..
> git tree: upstream

> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11eed636c80000

Bail out if no Qdisc is found.

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git fff5a5e7f528

--- x/net/sched/cls_api.c
+++ y/net/sched/cls_api.c
@@ -2154,7 +2154,6 @@ replay:
tp = NULL;
cl = 0;
block = NULL;
- q = NULL;
chain = NULL;
flags = 0;

@@ -2173,9 +2172,12 @@ replay:

/* Find head of filter chain. */

+ q = NULL;
err = __tcf_qdisc_find(net, &q, &parent, t->tcm_ifindex, false, extack);
if (err)
return err;
+ if (!q)
+ return -ENOENT;

if (tcf_proto_check_kind(tca[TCA_KIND], name)) {
NL_SET_ERR_MSG(extack, "Specified TC filter name too long");
--

syzbot

unread,

Apr 17, 2023, 11:07:19 PM4/17/23

to hda...@sina.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
KASAN: slab-use-after-free Write in mini_qdisc_pair_swap

==================================================================
BUG: KASAN: slab-use-after-free in mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573

Write of size 8 at addr ffff88806348a308 by task syz-executor.0/8529

CPU: 1 PID: 8529 Comm: syz-executor.0 Not tainted 6.3.0-rc3-syzkaller-00026-gfff5a5e7f528-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/30/2023

Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:319
print_report mm/kasan/report.c:430 [inline]
kasan_report+0x11c/0x130 mm/kasan/report.c:536
mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
tcf_chain_head_change_item net/sched/cls_api.c:495 [inline]
tcf_chain0_head_change.isra.0+0xb9/0x120 net/sched/cls_api.c:509
tcf_chain_tp_insert net/sched/cls_api.c:1826 [inline]
tcf_chain_tp_insert_unique net/sched/cls_api.c:1875 [inline]

tc_new_tfilter+0x1d77/0x2200 net/sched/cls_api.c:2268

rtnetlink_rcv_msg+0x996/0xd50 net/core/rtnetlink.c:6165
netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2574
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x925/0xe30 net/netlink/af_netlink.c:1942
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0xde/0x190 net/socket.c:747
____sys_sendmsg+0x334/0x900 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmmsg+0x18f/0x460 net/socket.c:2641
__do_sys_sendmmsg net/socket.c:2670 [inline]
__se_sys_sendmmsg net/socket.c:2667 [inline]
__x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2667
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

RIP: 0033:0x7ffaaee8c0f9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffaafc5e168 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00007ffaaefac120 RCX: 00007ffaaee8c0f9

RDX: 040000000000009f RSI: 00000000200002c0 RDI: 0000000000000007

RBP: 00007ffaaeee7b39 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe97997fbf R14: 00007ffaafc5e300 R15: 0000000000022000
</TASK>

Allocated by task 8524:

tc_new_tfilter+0xa2b/0x2200 net/sched/cls_api.c:2333

rtnetlink_rcv_msg+0x996/0xd50 net/core/rtnetlink.c:6165
netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2574
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x925/0xe30 net/netlink/af_netlink.c:1942
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0xde/0x190 net/socket.c:747
____sys_sendmsg+0x334/0x900 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmmsg+0x18f/0x460 net/socket.c:2641
__do_sys_sendmmsg net/socket.c:2670 [inline]
__se_sys_sendmmsg net/socket.c:2667 [inline]
__x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2667
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Second to last potentially related work creation:

kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
__kasan_record_aux_stack+0xbc/0xd0 mm/kasan/generic.c:491
__call_rcu_common.constprop.0+0x99/0x7e0 kernel/rcu/tree.c:2622

rhashtable_rehash_table lib/rhashtable.c:348 [inline]
rht_deferred_worker+0xb24/0x1ce0 lib/rhashtable.c:432
process_one_work+0x991/0x15c0 kernel/workqueue.c:2390
worker_thread+0x669/0x1090 kernel/workqueue.c:2537
kthread+0x2e8/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308

The buggy address belongs to the object at ffff88806348a000

which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 776 bytes inside of

freed 1024-byte region [ffff88806348a000, ffff88806348a400)

The buggy address belongs to the physical page:

page:ffffea00018d2200 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x63488
head:ffffea00018d2200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0

anon flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000010200 ffff888012441dc0 0000000000000000 dead000000000001

raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000

page dumped because: kasan: bad access detected
page_owner tracks the page as allocated

prep_new_page mm/page_alloc.c:2552 [inline]
get_page_from_freelist+0x1190/0x2e20 mm/page_alloc.c:4325
__alloc_pages+0x1cb/0x4a0 mm/page_alloc.c:5591
alloc_pages+0x1aa/0x270 mm/mempolicy.c:2283
alloc_slab_page mm/slub.c:1851 [inline]
allocate_slab+0x25f/0x390 mm/slub.c:1998
new_slab mm/slub.c:2051 [inline]
___slab_alloc+0xa91/0x1400 mm/slub.c:3193
__slab_alloc.constprop.0+0x56/0xa0 mm/slub.c:3292
__slab_alloc_node mm/slub.c:3345 [inline]
slab_alloc_node mm/slub.c:3442 [inline]
__kmem_cache_alloc_node+0x136/0x320 mm/slub.c:3491
kmalloc_trace+0x26/0xe0 mm/slab_common.c:1061
kmalloc include/linux/slab.h:580 [inline]

kzalloc include/linux/slab.h:720 [inline]
fl_init+0x45/0x2c0 net/sched/cls_flower.c:351
tcf_proto_create net/sched/cls_api.c:398 [inline]
tc_new_tfilter+0xecf/0x2200 net/sched/cls_api.c:2260

ffff88806348a200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88806348a280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88806348a300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88806348a380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88806348a400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================

Tested on:

commit: fff5a5e7 Merge tag 'for-linus' of git://git.armlinux.o..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=10cf71c0280000
kernel config: https://syzkaller.appspot.com/x/.config?x=ea09b0836073ee4

dashboard link: https://syzkaller.appspot.com/bug?extid=b53a9c0d1ea4ad62da8b
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

patch: https://syzkaller.appspot.com/x/patch.diff?x=16fa5f23c80000

Peilin Ye

unread,

Apr 18, 2023, 2:46:44 AM4/18/23

to Jamal Hadi Salim, Pedro Tammela, Seth Forshee, Jakub Kicinski, syzbot, da...@davemloft.net, edum...@google.com, ji...@resnulli.us, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, xiyou.w...@gmail.com, peil...@bytedance.com

Hi all,

On Mon, Apr 03, 2023 at 11:58:44AM -0400, Jamal Hadi Salim wrote:
> To provide more update:
> Happens on single processor before Seth's patches; and only on
> multi-processor after Seth's patches.
> Theory is: there is a logic bug in the miniqdisc rcu visibility. Feels
> like the freeing of the structure is done without rcu involvement.
> Jiri/Cong maybe you can take a look since youve been dabbling in
> miniqdisc? The reproducer worked for me and Pedro 100% of the time.

I also reproduced this UAF using the syzkaller reproducer in the report
(the C reproducer did not work for me for unknown reasons). I will look
into this.

Thanks,
Peilin Ye

Hillf Danton

unread,

Apr 18, 2023, 5:22:03 AM4/18/23

to syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com

On 23 Mar 2023 17:52:40 -0700

> HEAD commit: fff5a5e7f528 Merge tag 'for-linus' of git://git.armlinux.o..
> git tree: upstream

> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11eed636c80000

Roll back to the locked qdisc ops.

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git fff5a5e7f528

--- x/net/sched/sch_ingress.c
+++ y/net/sched/sch_ingress.c
@@ -121,7 +121,6 @@ nla_put_failure:
}

static const struct Qdisc_class_ops ingress_class_ops = {
- .flags = QDISC_CLASS_OPS_DOIT_UNLOCKED,
.leaf = ingress_leaf,
.find = ingress_find,
.walk = ingress_walk,
--

syzbot

unread,

Apr 18, 2023, 5:44:27 AM4/18/23

to hda...@sina.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+b53a9c...@syzkaller.appspotmail.com

Tested on:

commit: fff5a5e7 Merge tag 'for-linus' of git://git.armlinux.o..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

console output: https://syzkaller.appspot.com/x/log.txt?x=1658d89fc80000

kernel config: https://syzkaller.appspot.com/x/.config?x=ea09b0836073ee4
dashboard link: https://syzkaller.appspot.com/bug?extid=b53a9c0d1ea4ad62da8b
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

patch: https://syzkaller.appspot.com/x/patch.diff?x=10d7cd5bc80000

Note: testing is done by a robot and is best-effort only.

Hillf Danton

unread,

Apr 18, 2023, 9:03:53 AM4/18/23

to Jamal Hadi Salim, Pedro Tammela, Seth Forshee, syzbot, edum...@google.com, ji...@resnulli.us, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com

On Mon, 3 Apr 2023 11:58:44 -0400 Jamal Hadi Salim <j...@mojatatu.com>

> To provide more update:
> Happens on single processor before Seth's patches; and only on
> multi-processor after Seth's patches.
> Theory is: there is a logic bug in the miniqdisc rcu visibility. Feels
> like the freeing of the structure is done without rcu involvement.
> Jiri/Cong maybe you can take a look since youve been dabbling in
> miniqdisc? The reproducer worked for me and Pedro 100% of the time.

JFYI reverting 7a096d579e8e ("net: sched: ingress: set 'unlocked' flag for
Qdisc ops") survived the syzbot test [1].

[1] https://lore.kernel.org/lkml/00000000000038...@google.com/

Peilin Ye

unread,

Apr 26, 2023, 7:42:07 PM4/26/23

to Jamal Hadi Salim, Pedro Tammela, Seth Forshee, Jakub Kicinski, syzbot, da...@davemloft.net, edum...@google.com, ji...@resnulli.us, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, xiyou.w...@gmail.com, peil...@bytedance.com, yepei...@gmail.com, vla...@nvidia.com, hda...@sina.com

+Cc: Vlad Buslov, Hillf Danton

Hi all,

On Mon, Apr 17, 2023 at 04:00:11PM -0700, Peilin Ye wrote:
> I also reproduced this UAF using the syzkaller reproducer in the report
> (the C reproducer did not work for me for unknown reasons). I will look
> into this.

Currently, multiple ingress (clsact) Qdiscs can access the per-netdev
*miniq_ingress (*miniq_egress) pointer concurrently. This is
unfortunately true in two senses:

1. We allow adding ingress (clsact) Qdiscs under parents other than
TC_H_INGRESS (TC_H_CLSACT):

$ ip link add ifb0 numtxqueues 8 type ifb
$ echo clsact > /proc/sys/net/core/default_qdisc
$ tc qdisc add dev ifb0 handle 1: root mq
$ tc qdisc show dev ifb0
qdisc mq 1: root
qdisc clsact 0: parent 1:8
qdisc clsact 0: parent 1:7
qdisc clsact 0: parent 1:6
qdisc clsact 0: parent 1:5
qdisc clsact 0: parent 1:4
qdisc clsact 0: parent 1:3
qdisc clsact 0: parent 1:2
qdisc clsact 0: parent 1:1

This is obviously racy and should be prohibited. I've started working
on patches to fix this. The syz repro for this UAF adds ingress Qdiscs
under TC_H_ROOT, by the way.

2. After introducing RTNL-lockless RTM_{NEW,DEL,GET}TFILTER requests
[1], it is possible that, when replacing ingress (clsact) Qdiscs, the
old one can access *miniq_{in,e}gress concurrently with the new one. For
example, the syz repro does something like the following:

Thread 1 creates sch_ingress Qdisc A (containing mini Qdisc a1 and a2),
then adds a cls_flower filter X to Qdisc A.

Thread 2 creates sch_ingress Qdisc B (containing mini Qdisc b1 and b2)
to replace Qdisc A, then adds a cls_flower filter Y to Qdisc B.

Device has 8 TXQs.

Thread 1 A's refcnt Thread 2
RTM_NEWQDISC (A, locked)
qdisc_create(A) 1
qdisc_graft(A) 9

RTM_NEWTFILTER (X, lockless)
__tcf_qdisc_find(A) 10
tcf_chain0_head_change(A)
! mini_qdisc_pair_swap(A)
| RTM_NEWQDISC (B, locked)
| 2 qdisc_graft(B)
| 1 notify_and_destroy(A)
|
| RTM_NEWTFILTER (Y, lockless)
| tcf_chain0_head_change(B)
| ! mini_qdisc_pair_swap(B)
tcf_block_release(A) 0 |
qdisc_destroy(A) |
tcf_chain0_head_change_cb_del(A) |
! mini_qdisc_pair_swap(A) |
| |
... ...

As we can see there're interleaving mini_qdisc_pair_swap() calls between
Qdisc A and B, causing all kinds of troubles, including the UAF (thread
2 writing to mini Qdisc a1's rcu_state after Qdisc A has already been
freed) reported by syzbot.

To fix this, I'm cooking a patch that, when replacing ingress (clsact)
Qdiscs, in qdisc_graft():

I. We should make sure there's no on-the-fly lockless filter requests
for the old Qdisc, and return -EBUSY if there's any (or can/should
we wait in RTM_NEWQDISC handler?)

II. We should destory the old Qdisc before publishing the new one
(i.e. setting it to dev_ingress_queue(dev)->qdisc_sleeping, so
that subsequent filter requests can see it), because
{ingress,clsact}_destroy() also call mini_qdisc_pair_swap(), which
sets *miniq_{in,e}gress to NULL

Future Qdiscs that support RTNL-lockless cls_ops, if any, won't need
this fix, as long as their ->chain_head_change() don't access
out-of-Qdisc-scope data, like pointers in struct net_device.

Do you think this is the right way to go? Thanks!

[1] Thanks Hillf Danton for the hint:
https://syzkaller.appspot.com/text?tag=Patch&x=10d7cd5bc80000

Thanks,
Peilin Ye

Pedro Tammela

unread,

Apr 26, 2023, 10:31:15 PM4/26/23

to Peilin Ye, Jamal Hadi Salim, Seth Forshee, Jakub Kicinski, syzbot, da...@davemloft.net, edum...@google.com, ji...@resnulli.us, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, xiyou.w...@gmail.com, peil...@bytedance.com, vla...@nvidia.com, hda...@sina.com

Thanks for the analysis. It makes total sense.
After going through the call chains, please correct me if my ELI5 is wrong:

'clsact_init()' is called for B when dev has miniq_ingress set to 'A',
therefore copying a pointer to the miniq_qdisc with lifetime bound to
'A' in a miniq_qdisc_pair with lifetime bound to 'B' therefore raising
an UAF after A is destroyed and B is manipulated.

>
> To fix this, I'm cooking a patch that, when replacing ingress (clsact)
> Qdiscs, in qdisc_graft():
>
> I. We should make sure there's no on-the-fly lockless filter requests
> for the old Qdisc, and return -EBUSY if there's any (or can/should
> we wait in RTM_NEWQDISC handler?

Makes sense.

>
> II. We should destory the old Qdisc before publishing the new one
> (i.e. setting it to dev_ingress_queue(dev)->qdisc_sleeping, so
> that subsequent filter requests can see it), because
> {ingress,clsact}_destroy() also call mini_qdisc_pair_swap(), which
> sets *miniq_{in,e}gress to NULL >
> Future Qdiscs that support RTNL-lockless cls_ops, if any, won't need
> this fix, as long as their ->chain_head_change() don't access
> out-of-Qdisc-scope data, like pointers in struct net_device.

Probably worth a comment somewhere in the code

Vlad Buslov

unread,

Apr 27, 2023, 8:50:55 AM4/27/23

to Peilin Ye, Jamal Hadi Salim, Pedro Tammela, Seth Forshee, Jakub Kicinski, syzbot, da...@davemloft.net, edum...@google.com, ji...@resnulli.us, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, xiyou.w...@gmail.com, peil...@bytedance.com, hda...@sina.com

Hi Peilin,

Hmm, didn't realize it was the case.

Great analysis! However, it is still not quite clear to me how threads 1
and 2 access each other RCU state when q->miniqp is a private memory of
the Qdisc, so 1 should only see A->miniqp and 2 only B->miniqp. And both
miniqps should be protected from deallocation by reference that lockless
RTM_NEWTFILTER obtains.

>
> To fix this, I'm cooking a patch that, when replacing ingress (clsact)
> Qdiscs, in qdisc_graft():
>
> I. We should make sure there's no on-the-fly lockless filter requests
> for the old Qdisc, and return -EBUSY if there's any (or can/should
> we wait in RTM_NEWQDISC handler?)
>
> II. We should destory the old Qdisc before publishing the new one
> (i.e. setting it to dev_ingress_queue(dev)->qdisc_sleeping, so
> that subsequent filter requests can see it), because
> {ingress,clsact}_destroy() also call mini_qdisc_pair_swap(), which
> sets *miniq_{in,e}gress to NULL

Another approach would be to somehow detect concurrent Qdisc replace and
return -EAGAIN from tcf_chain_tp_insert() before calling
tcf_chain0_head_change(). This would leverage existing cls_api
functionality that automatically retries after releasing all references
to chain/tp and obtaining them again instead of messing with qdisc api.
However, since I still didn't fully grasp the issue it is hard for me to
reason whether such approach would be possible to implement in this
case.

Peilin Ye

unread,

Apr 27, 2023, 1:35:59 PM4/27/23

to Vlad Buslov, Jamal Hadi Salim, Pedro Tammela, Seth Forshee, Jakub Kicinski, syzbot, da...@davemloft.net, edum...@google.com, ji...@resnulli.us, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, xiyou.w...@gmail.com, peil...@bytedance.com, hda...@sina.com

Hi Pedro, Vlad,

On Thu, Apr 27, 2023 at 03:26:03PM +0300, Vlad Buslov wrote:
> On Wed 26 Apr 2023 at 16:42, Peilin Ye <yepei...@gmail.com> wrote:
> > As we can see there're interleaving mini_qdisc_pair_swap() calls between
> > Qdisc A and B, causing all kinds of troubles, including the UAF (thread
> > 2 writing to mini Qdisc a1's rcu_state after Qdisc A has already been
> > freed) reported by syzbot.
>
> Great analysis! However, it is still not quite clear to me how threads 1
> and 2 access each other RCU state when q->miniqp is a private memory of
> the Qdisc, so 1 should only see A->miniqp and 2 only B->miniqp. And both
> miniqps should be protected from deallocation by reference that lockless
> RTM_NEWTFILTER obtains.

Thanks for taking a look!

To elaborate, p_miniq is a pointer of pointer of struct mini_Qdisc,
initialized in ingress_init() to point to eth0->miniq_ingress, which
isn't private to A or B.

In other words, both A->miniqp->p_miniq and B->miniqp->p_miniq point to
eth0->miniq_ingress.

For your reference, roughly speaking, mini_qdisc_pair_swap() does this:

miniq_old = dev->miniq_ingress;

if (destroying) {
dev->miniq_ingress = NULL;
} else {
rcu_wait();
dev->miniq_ingress = miniq_new;
}

if (miniq_old)
miniq_old->rcu_state = ...

On Wed 26 Apr 2023 at 16:42, Peilin Ye <yepei...@gmail.com> wrote:
> Thread 1 A's refcnt Thread 2
> RTM_NEWQDISC (A, locked)
> qdisc_create(A) 1
> qdisc_graft(A) 9
>
> RTM_NEWTFILTER (X, lockless)
> __tcf_qdisc_find(A) 10
> tcf_chain0_head_change(A)
> ! mini_qdisc_pair_swap(A)

1. A adds its first filter,
miniq_old (eth0->miniq_ingress) is NULL,
RCU wait starts,
RCU wait ends,
change eth0->miniq_ingress to A's mini Qdisc.

2. B adds its first filter,
miniq_old (eth0->miniq_ingress) is A's mini Qdisc,
RCU wait starts,

> tcf_block_release(A) 0 |
> qdisc_destroy(A) |
> tcf_chain0_head_change_cb_del(A) |
> ! mini_qdisc_pair_swap(A) |

3. A destroys itself,
miniq_old (eth0->miniq_ingress) is A's mini Qdisc,
(destroying, so no RCU wait)
change eth0->miniq_ingress to NULL,
update miniq_old, or A's mini Qdisc's RCU state,
A is freed.

2. RCU wait ends,
change eth0->miniq_ingress to B's mini Qdisc,
use-after-free: update miniq_old, or A's mini Qdisc's RCU state.

I hope this helps. Sorry I didn't go into details; this UAF isn't the
only thing that is unacceptable here:

Consider B. We add a filter Y to B, expecting ingress packets on eth0
to go through Y. Then all of a sudden, A sets eth0->miniq_ingress to
NULL during its destruction, so packets will not find Y at all on
datapath (sch_handle_ingress()). New filter becomes invisible - this is
already buggy enough :-/

So I think B's first call to mini_qdisc_pair_swap() should happen after
A's last call (in ingress_destroy()), which is what I am trying to
achieve here.

Thanks,
Peilin Ye

Vlad Buslov

unread,

Apr 28, 2023, 8:44:40 AM4/28/23

to Peilin Ye, Jamal Hadi Salim, Pedro Tammela, Seth Forshee, Jakub Kicinski, syzbot, da...@davemloft.net, edum...@google.com, ji...@resnulli.us, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, xiyou.w...@gmail.com, peil...@bytedance.com, hda...@sina.com

Thanks for the clarification.

>
> I hope this helps. Sorry I didn't go into details; this UAF isn't the
> only thing that is unacceptable here:
>
> Consider B. We add a filter Y to B, expecting ingress packets on eth0
> to go through Y. Then all of a sudden, A sets eth0->miniq_ingress to
> NULL during its destruction, so packets will not find Y at all on
> datapath (sch_handle_ingress()). New filter becomes invisible - this is
> already buggy enough :-/
>
> So I think B's first call to mini_qdisc_pair_swap() should happen after
> A's last call (in ingress_destroy()), which is what I am trying to
> achieve here.

Makes sense to me.

Pedro Tammela

unread,

May 24, 2023, 10:32:04 AM5/24/23

to syzbot+b53a9c...@syzkaller.appspotmail.com, da...@davemloft.net, edum...@google.com, j...@mojatatu.com, ji...@resnulli.us, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, xiyou.w...@gmail.com

#syz test: git://gitlab.com/tammela/net.git peilin-patches

Double checking with syzbot

syzbot

unread,

May 24, 2023, 11:02:37 AM5/24/23

to da...@davemloft.net, edum...@google.com, j...@mojatatu.com, ji...@resnulli.us, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, pcta...@mojatatu.com, syzkall...@googlegroups.com, xiyou.w...@gmail.com

Hello,

syzbot tried to test the proposed patch but the build/boot failed:

failed to checkout kernel repo git://gitlab.com/tammela/net.git/peilin-patches: failed to run ["git" "fetch" "--force" "71d757925c19d8f23c660d1e07af98f28b9c6977" "peilin-patches"]: exit status 128
fatal: read error: Connection reset by peer

Tested on:

commit: [unknown
git tree: git://gitlab.com/tammela/net.git peilin-patches

dashboard link: https://syzkaller.appspot.com/bug?extid=b53a9c0d1ea4ad62da8b
compiler:

Note: no patches were applied.

Pedro Tammela

unread,

May 24, 2023, 11:05:49 AM5/24/23

to syzbot+b53a9c...@syzkaller.appspotmail.com, da...@davemloft.net, edum...@google.com, j...@mojatatu.com, ji...@resnulli.us, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, xiyou.w...@gmail.com

#syz test: https://gitlab.com/tammela/net.git peilin-patches
Let's try with https then...

syzbot

unread,

May 24, 2023, 11:34:28 AM5/24/23

to da...@davemloft.net, edum...@google.com, j...@mojatatu.com, ji...@resnulli.us, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, pcta...@mojatatu.com, syzkall...@googlegroups.com, xiyou.w...@gmail.com

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+b53a9c...@syzkaller.appspotmail.com

Tested on:

commit: 6078d01d net/sched: qdisc_destroy() old ingress and cl..
git tree: https://gitlab.com/tammela/net.git peilin-patches
console output: https://syzkaller.appspot.com/x/log.txt?x=111cf24d280000
kernel config: https://syzkaller.appspot.com/x/.config?x=b22b5699e8595bcd
dashboard link: https://syzkaller.appspot.com/bug?extid=b53a9c0d1ea4ad62da8b

compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

Note: no patches were applied.

Reply all

Reply to author

Forward

[syzbot] [net?] KASAN: slab-use-after-free Write in mini_qdisc_pair_swap

syzbot

Jakub Kicinski

Pedro Tammela

Jamal Hadi Salim

Hillf Danton

syzbot

Peilin Ye

Hillf Danton

syzbot

Hillf Danton

Peilin Ye

Pedro Tammela

Vlad Buslov

Peilin Ye

Vlad Buslov

Pedro Tammela

syzbot

Pedro Tammela

syzbot