[syzbot] [net?] KASAN: slab-use-after-free Write in mini_qdisc_pair_swap

64 views
Skip to first unread message

syzbot

unread,
Mar 23, 2023, 8:52:41 PM3/23/23
to da...@davemloft.net, edum...@google.com, j...@mojatatu.com, ji...@resnulli.us, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, xiyou.w...@gmail.com
Hello,

syzbot found the following issue on:

HEAD commit: fff5a5e7f528 Merge tag 'for-linus' of git://git.armlinux.o..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=11884731c80000
kernel config: https://syzkaller.appspot.com/x/.config?x=aaa4b45720ca0519
dashboard link: https://syzkaller.appspot.com/bug?extid=b53a9c0d1ea4ad62da8b
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15d1497ac80000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11eed636c80000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/33a184f98b9d/disk-fff5a5e7.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/3d75f967571e/vmlinux-fff5a5e7.xz
kernel image: https://storage.googleapis.com/syzbot-assets/4eeb8edbdc7e/bzImage-fff5a5e7.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+b53a9c...@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: slab-use-after-free in mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
Write of size 8 at addr ffff888045b31308 by task syz-executor690/14901

CPU: 0 PID: 14901 Comm: syz-executor690 Not tainted 6.3.0-rc3-syzkaller-00026-gfff5a5e7f528 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:319
print_report mm/kasan/report.c:430 [inline]
kasan_report+0x11c/0x130 mm/kasan/report.c:536
mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
tcf_chain_head_change_item net/sched/cls_api.c:495 [inline]
tcf_chain0_head_change.isra.0+0xb9/0x120 net/sched/cls_api.c:509
tcf_chain_tp_insert net/sched/cls_api.c:1826 [inline]
tcf_chain_tp_insert_unique net/sched/cls_api.c:1875 [inline]
tc_new_tfilter+0x1de6/0x2290 net/sched/cls_api.c:2266
rtnetlink_rcv_msg+0x996/0xd50 net/core/rtnetlink.c:6165
netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2574
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x925/0xe30 net/netlink/af_netlink.c:1942
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0xde/0x190 net/socket.c:747
____sys_sendmsg+0x334/0x900 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmmsg+0x18f/0x460 net/socket.c:2641
__do_sys_sendmmsg net/socket.c:2670 [inline]
__se_sys_sendmmsg net/socket.c:2667 [inline]
__x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2667
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f4f11222579
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 31 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f4f11184208 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00007f4f112ab2a8 RCX: 00007f4f11222579
RDX: 040000000000009f RSI: 00000000200002c0 RDI: 0000000000000007
RBP: 00007f4f112ab2a0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f4f112ab2ac
R13: 00007fffc8e5214f R14: 00007f4f11184300 R15: 0000000000022000
</TASK>

Allocated by task 14898:
kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
____kasan_kmalloc mm/kasan/common.c:333 [inline]
__kasan_kmalloc+0xa2/0xb0 mm/kasan/common.c:383
kasan_kmalloc include/linux/kasan.h:196 [inline]
__do_kmalloc_node mm/slab_common.c:967 [inline]
__kmalloc_node+0x61/0x1a0 mm/slab_common.c:974
kmalloc_node include/linux/slab.h:610 [inline]
kzalloc_node include/linux/slab.h:731 [inline]
qdisc_alloc+0xb0/0xb30 net/sched/sch_generic.c:938
qdisc_create+0xce/0x1040 net/sched/sch_api.c:1244
tc_modify_qdisc+0x488/0x1a40 net/sched/sch_api.c:1680
rtnetlink_rcv_msg+0x43d/0xd50 net/core/rtnetlink.c:6174
netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2574
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x925/0xe30 net/netlink/af_netlink.c:1942
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0xde/0x190 net/socket.c:747
____sys_sendmsg+0x334/0x900 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmmsg+0x18f/0x460 net/socket.c:2641
__do_sys_sendmmsg net/socket.c:2670 [inline]
__se_sys_sendmmsg net/socket.c:2667 [inline]
__x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2667
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 21:
kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x40 mm/kasan/generic.c:521
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free+0x160/0x1c0 mm/kasan/common.c:200
kasan_slab_free include/linux/kasan.h:162 [inline]
slab_free_hook mm/slub.c:1781 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1807
slab_free mm/slub.c:3787 [inline]
__kmem_cache_free+0xaf/0x2d0 mm/slub.c:3800
rcu_do_batch kernel/rcu/tree.c:2112 [inline]
rcu_core+0x814/0x1960 kernel/rcu/tree.c:2372
__do_softirq+0x1d4/0x905 kernel/softirq.c:571

Last potentially related work creation:
kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
__kasan_record_aux_stack+0xbc/0xd0 mm/kasan/generic.c:491
__call_rcu_common.constprop.0+0x99/0x7e0 kernel/rcu/tree.c:2622
qdisc_put_unlocked+0x73/0x90 net/sched/sch_generic.c:1097
tcf_block_release+0x86/0x90 net/sched/cls_api.c:1362
tc_new_tfilter+0xa35/0x2290 net/sched/cls_api.c:2331
rtnetlink_rcv_msg+0x996/0xd50 net/core/rtnetlink.c:6165
netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2574
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x925/0xe30 net/netlink/af_netlink.c:1942
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0xde/0x190 net/socket.c:747
____sys_sendmsg+0x334/0x900 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmmsg+0x18f/0x460 net/socket.c:2641
__do_sys_sendmmsg net/socket.c:2670 [inline]
__se_sys_sendmmsg net/socket.c:2667 [inline]
__x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2667
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

The buggy address belongs to the object at ffff888045b31000
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 776 bytes inside of
freed 1024-byte region [ffff888045b31000, ffff888045b31400)

The buggy address belongs to the physical page:
page:ffffea000116cc00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888045b37000 pfn:0x45b30
head:ffffea000116cc00 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
anon flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000010200 ffff888012441dc0 0000000000000000 dead000000000001
raw: ffff888045b37000 000000008010000d 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0x1d20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL), pid 14184, tgid 14168 (syz-executor690), ts 420322459064, free_ts 15493742100
prep_new_page mm/page_alloc.c:2552 [inline]
get_page_from_freelist+0x1190/0x2e20 mm/page_alloc.c:4325
__alloc_pages+0x1cb/0x4a0 mm/page_alloc.c:5591
alloc_pages+0x1aa/0x270 mm/mempolicy.c:2283
alloc_slab_page mm/slub.c:1851 [inline]
allocate_slab+0x25f/0x390 mm/slub.c:1998
new_slab mm/slub.c:2051 [inline]
___slab_alloc+0xa91/0x1400 mm/slub.c:3193
__slab_alloc.constprop.0+0x56/0xa0 mm/slub.c:3292
__slab_alloc_node mm/slub.c:3345 [inline]
slab_alloc_node mm/slub.c:3442 [inline]
__kmem_cache_alloc_node+0x136/0x320 mm/slub.c:3491
kmalloc_trace+0x26/0xe0 mm/slab_common.c:1061
kmalloc include/linux/slab.h:580 [inline]
kmalloc_array include/linux/slab.h:635 [inline]
kcalloc include/linux/slab.h:667 [inline]
fl_change+0x1cf/0x4ac0 net/sched/cls_flower.c:2175
tc_new_tfilter+0x97c/0x2290 net/sched/cls_api.c:2310
rtnetlink_rcv_msg+0x996/0xd50 net/core/rtnetlink.c:6165
netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2574
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x925/0xe30 net/netlink/af_netlink.c:1942
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0xde/0x190 net/socket.c:747
____sys_sendmsg+0x334/0x900 net/socket.c:2501
page last free stack trace:
reset_page_owner include/linux/page_owner.h:24 [inline]
free_pages_prepare mm/page_alloc.c:1453 [inline]
free_pcp_prepare+0x5d5/0xa50 mm/page_alloc.c:1503
free_unref_page_prepare mm/page_alloc.c:3387 [inline]
free_unref_page+0x1d/0x490 mm/page_alloc.c:3482
free_contig_range+0xb5/0x180 mm/page_alloc.c:9531
destroy_args+0x6c4/0x920 mm/debug_vm_pgtable.c:1023
debug_vm_pgtable+0x242a/0x4640 mm/debug_vm_pgtable.c:1403
do_one_initcall+0x102/0x540 init/main.c:1310
do_initcall_level init/main.c:1383 [inline]
do_initcalls init/main.c:1399 [inline]
do_basic_setup init/main.c:1418 [inline]
kernel_init_freeable+0x696/0xc00 init/main.c:1638
kernel_init+0x1e/0x2c0 init/main.c:1526
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308

Memory state around the buggy address:
ffff888045b31200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888045b31280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff888045b31300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888045b31380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888045b31400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

Jakub Kicinski

unread,
Mar 28, 2023, 9:47:37 PM3/28/23
to syzbot, Seth Forshee, da...@davemloft.net, edum...@google.com, j...@mojatatu.com, ji...@resnulli.us, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, xiyou.w...@gmail.com
Seth, does this looks related to commit 267463823adb ("net: sch:
eliminate unnecessary RCU waits in mini_qdisc_pair_swap()")
by any chance?

Pedro Tammela

unread,
Mar 29, 2023, 6:17:22 PM3/29/23
to Seth Forshee, Jakub Kicinski, syzbot, da...@davemloft.net, edum...@google.com, j...@mojatatu.com, ji...@resnulli.us, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, xiyou.w...@gmail.com
On 29/03/2023 00:37, Seth Forshee wrote:
> On Tue, Mar 28, 2023 at 06:47:33PM -0700, Jakub Kicinski wrote:
>> Seth, does this looks related to commit 267463823adb ("net: sch:
>> eliminate unnecessary RCU waits in mini_qdisc_pair_swap()")
>> by any chance?
>
> I don't see how it could be. The memory being written is part of the
> qdisc private memory, and tc_new_tfilter() takes a reference to the
> qdisc. If that memory has been freed doesn't it mean that something has
> done an unbalanced qdisc_put()?
>

Reverting Seth's patches (85c0c3eb9a66 and 267463823adb) leads to these
traces with the reproducer:
[ 52.704956][ C0] ------------[ cut here ]------------
[ 52.705568][ C0] ODEBUG: free active (active state 1) object:0
[ 52.706542][ C0] WARNING: CPU: 0 PID: 0 at lib/debugobjects.c0
[ 52.707283][ C0] Modules linked in:
[ 52.707602][ C0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.0
[ 52.708304][ C0] Hardware name: QEMU Standard PC (i440FX + PI4
[ 52.709032][ C0] RIP: 0010:debug_print_object+0x196/0x290
[ 52.709509][ C0] Code: 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85e
[ 52.711011][ C0] RSP: 0018:ffffc90000007cd0 EFLAGS: 00010282
[ 52.711510][ C0] RAX: 0000000000000000 RBX: 0000000000000003 0
[ 52.712125][ C0] RDX: ffffffff8c495800 RSI: ffffffff814b96d7 1
[ 52.712748][ C0] RBP: 0000000000000001 R08: 0000000000000001 0
[ 52.713370][ C0] R10: 0000000000000000 R11: 203a47554245444f 0
[ 52.713983][ C0] R13: ffffffff8aa6e960 R14: 0000000000000000 8
[ 52.714609][ C0] FS: 0000000000000000(0000) GS:ffff8881f5a000
[ 52.715356][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008003
[ 52.715863][ C0] CR2: 000055914686f020 CR3: 000000011e856000 0
[ 52.716486][ C0] Call Trace:
[ 52.716742][ C0] <IRQ>
[ 52.716969][ C0] debug_check_no_obj_freed+0x302/0x420
[ 52.717423][ C0] slab_free_freelist_hook+0xec/0x1c0
[ 52.717848][ C0] ? rcu_core+0x818/0x1930
[ 52.718204][ C0] __kmem_cache_free+0xaf/0x2e0
[ 52.718590][ C0] rcu_core+0x818/0x1930
[ 52.718938][ C0] ? rcu_report_dead+0x610/0x610
[ 52.719328][ C0] __do_softirq+0x1d4/0x8ef
[ 52.719689][ C0] __irq_exit_rcu+0x11d/0x190
[ 52.720062][ C0] irq_exit_rcu+0x9/0x20
[ 52.720402][ C0] sysvec_apic_timer_interrupt+0x97/0xc0
[ 52.720842][ C0] </IRQ>
[ 52.721070][ C0] <TASK>
[ 52.721300][ C0] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 52.721779][ C0] RIP: 0010:default_idle+0xf/0x20
[ 52.722172][ C0] Code: 89 07 49 c7 c0 08 00 00 00 4d 29 c8 4c5
[ 52.723631][ C0] RSP: 0018:ffffffff8c407e30 EFLAGS: 00000202
[ 52.724096][ C0] RAX: 000000000007897f RBX: 0000000000000000 6
[ 52.724702][ C0] RDX: 0000000000000000 RSI: 0000000000000001 0
[ 52.725335][ C0] RBP: ffffffff8c495800 R08: 0000000000000001 b
[ 52.725957][ C0] R10: ffffed103eb46d95 R11: 0000000000000000 0
[ 52.726550][ C0] R13: 0000000000000000 R14: ffffffff8e7834d0 0
[ 52.727162][ C0] ? ct_kernel_exit+0x1d6/0x240
[ 52.727542][ C0] default_idle_call+0x67/0xa0
[ 52.727912][ C0] do_idle+0x31e/0x3e0
[ 52.728241][ C0] ? arch_cpu_idle_exit+0x30/0x30
[ 52.728635][ C0] cpu_startup_entry+0x18/0x20
[ 52.729006][ C0] rest_init+0x16d/0x2b0
[ 52.729338][ C0] ? regulator_has_full_constraints+0x9/0x20
[ 52.729815][ C0] ? trace_init_perf_perm_irq_work_exit+0x20/00
[ 52.730309][ C0] arch_call_rest_init+0x13/0x30
[ 52.730703][ C0] start_kernel+0x352/0x4c0
[ 52.731087][ C0] secondary_startup_64_no_verify+0xce/0xdb
[ 52.731611][ C0] </TASK>
[ 52.731870][ C0] Kernel panic - not syncing: kernel: panic_on.
[ 52.732445][ C0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.0
[ 52.733140][ C0] Hardware name: QEMU Standard PC (i440FX + PI4
[ 52.733867][ C0] Call Trace:
[ 52.734143][ C0] <IRQ>
[ 52.734380][ C0] dump_stack_lvl+0xd9/0x150
[ 52.734769][ C0] panic+0x684/0x730
[ 52.735082][ C0] ? panic_smp_self_stop+0x90/0x90
[ 52.735322][ C0] ? show_trace_log_lvl+0x285/0x390
[ 52.735322][ C0] ? debug_print_object+0x196/0x290
[ 52.735322][ C0] check_panic_on_warn+0xb1/0xc0
[ 52.735322][ C0] __warn+0xf2/0x390
[ 52.735322][ C0] ? debug_print_object+0x196/0x290
[ 52.735322][ C0] report_bug+0x2dd/0x500
[ 52.735322][ C0] handle_bug+0x3c/0x70
[ 52.735322][ C0] exc_invalid_op+0x18/0x50
[ 52.735322][ C0] asm_exc_invalid_op+0x1a/0x20
[ 52.735322][ C0] RIP: 0010:debug_print_object+0x196/0x290
[ 52.735322][ C0] Code: 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85e
[ 52.735322][ C0] RSP: 0018:ffffc90000007cd0 EFLAGS: 00010282
[ 52.735322][ C0] RAX: 0000000000000000 RBX: 0000000000000003 0
[ 52.735322][ C0] RDX: ffffffff8c495800 RSI: ffffffff814b96d7 1
[ 52.735322][ C0] RBP: 0000000000000001 R08: 0000000000000001 0
[ 52.735322][ C0] R10: 0000000000000000 R11: 203a47554245444f 0
[ 52.735322][ C0] R13: ffffffff8aa6e960 R14: 0000000000000000 8
[ 52.735322][ C0] ? __warn_printk+0x187/0x310
[ 52.735322][ C0] debug_check_no_obj_freed+0x302/0x420
[ 52.735322][ C0] slab_free_freelist_hook+0xec/0x1c0
[ 52.735322][ C0] ? rcu_core+0x818/0x1930
[ 52.735322][ C0] __kmem_cache_free+0xaf/0x2e0
[ 52.735322][ C0] rcu_core+0x818/0x1930
[ 52.735322][ C0] ? rcu_report_dead+0x610/0x610
[ 52.735322][ C0] __do_softirq+0x1d4/0x8ef
[ 52.735322][ C0] __irq_exit_rcu+0x11d/0x190
[ 52.735322][ C0] irq_exit_rcu+0x9/0x20
[ 52.735322][ C0] sysvec_apic_timer_interrupt+0x97/0xc0
[ 52.735322][ C0] </IRQ>
[ 52.735322][ C0] <TASK>
[ 52.735322][ C0] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 52.735322][ C0] RIP: 0010:default_idle+0xf/0x20
[ 52.735322][ C0] Code: 89 07 49 c7 c0 08 00 00 00 4d 29 c8 4c5
[ 52.735322][ C0] RSP: 0018:ffffffff8c407e30 EFLAGS: 00000202
[ 52.735322][ C0] RAX: 000000000007897f RBX: 0000000000000000 6
[ 52.735322][ C0] RDX: 0000000000000000 RSI: 0000000000000001 0
[ 52.735322][ C0] RBP: ffffffff8c495800 R08: 0000000000000001 b
[ 52.735322][ C0] R10: ffffed103eb46d95 R11: 0000000000000000 0
[ 52.735322][ C0] R13: 0000000000000000 R14: ffffffff8e7834d0 0
[ 52.735322][ C0] ? ct_kernel_exit+0x1d6/0x240
[ 52.735322][ C0] default_idle_call+0x67/0xa0
[ 52.735322][ C0] do_idle+0x31e/0x3e0
[ 52.735322][ C0] ? arch_cpu_idle_exit+0x30/0x30
[ 52.735322][ C0] cpu_startup_entry+0x18/0x20
[ 52.735322][ C0] rest_init+0x16d/0x2b0
[ 52.735322][ C0] ? regulator_has_full_constraints+0x9/0x20
[ 52.735322][ C0] ? trace_init_perf_perm_irq_work_exit+0x20/00
[ 52.735322][ C0] arch_call_rest_init+0x13/0x30
[ 52.735322][ C0] start_kernel+0x352/0x4c0
[ 52.735322][ C0] secondary_startup_64_no_verify+0xce/0xdb
[ 52.735322][ C0] </TASK>
[ 52.735322][ C0] Kernel Offset: disabled
[ 52.735322][ C0] Rebooting in 86400 seconds..


Jamal Hadi Salim

unread,
Apr 3, 2023, 12:00:47 PM4/3/23
to Pedro Tammela, Seth Forshee, Jakub Kicinski, syzbot, da...@davemloft.net, edum...@google.com, ji...@resnulli.us, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, xiyou.w...@gmail.com
To provide more update:
Happens on single processor before Seth's patches; and only on
multi-processor after Seth's patches.
Theory is: there is a logic bug in the miniqdisc rcu visibility. Feels
like the freeing of the structure is done without rcu involvement.
Jiri/Cong maybe you can take a look since youve been dabbling in
miniqdisc? The reproducer worked for me and Pedro 100% of the time.

cheers,
jamal

Hillf Danton

unread,
Apr 17, 2023, 10:26:14 PM4/17/23
to syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On 23 Mar 2023 17:52:40 -0700
> HEAD commit: fff5a5e7f528 Merge tag 'for-linus' of git://git.armlinux.o..
> git tree: upstream
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11eed636c80000

Bail out if no Qdisc is found.

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git fff5a5e7f528

--- x/net/sched/cls_api.c
+++ y/net/sched/cls_api.c
@@ -2154,7 +2154,6 @@ replay:
tp = NULL;
cl = 0;
block = NULL;
- q = NULL;
chain = NULL;
flags = 0;

@@ -2173,9 +2172,12 @@ replay:

/* Find head of filter chain. */

+ q = NULL;
err = __tcf_qdisc_find(net, &q, &parent, t->tcm_ifindex, false, extack);
if (err)
return err;
+ if (!q)
+ return -ENOENT;

if (tcf_proto_check_kind(tca[TCA_KIND], name)) {
NL_SET_ERR_MSG(extack, "Specified TC filter name too long");
--

syzbot

unread,
Apr 17, 2023, 11:07:19 PM4/17/23
to hda...@sina.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
KASAN: slab-use-after-free Write in mini_qdisc_pair_swap

==================================================================
BUG: KASAN: slab-use-after-free in mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
Write of size 8 at addr ffff88806348a308 by task syz-executor.0/8529

CPU: 1 PID: 8529 Comm: syz-executor.0 Not tainted 6.3.0-rc3-syzkaller-00026-gfff5a5e7f528-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/30/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
print_address_description.constprop.0+0x2c/0x3c0 mm/kasan/report.c:319
print_report mm/kasan/report.c:430 [inline]
kasan_report+0x11c/0x130 mm/kasan/report.c:536
mini_qdisc_pair_swap+0x1c2/0x1f0 net/sched/sch_generic.c:1573
tcf_chain_head_change_item net/sched/cls_api.c:495 [inline]
tcf_chain0_head_change.isra.0+0xb9/0x120 net/sched/cls_api.c:509
tcf_chain_tp_insert net/sched/cls_api.c:1826 [inline]
tcf_chain_tp_insert_unique net/sched/cls_api.c:1875 [inline]
tc_new_tfilter+0x1d77/0x2200 net/sched/cls_api.c:2268
rtnetlink_rcv_msg+0x996/0xd50 net/core/rtnetlink.c:6165
netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2574
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x925/0xe30 net/netlink/af_netlink.c:1942
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0xde/0x190 net/socket.c:747
____sys_sendmsg+0x334/0x900 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmmsg+0x18f/0x460 net/socket.c:2641
__do_sys_sendmmsg net/socket.c:2670 [inline]
__se_sys_sendmmsg net/socket.c:2667 [inline]
__x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2667
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7ffaaee8c0f9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffaafc5e168 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00007ffaaefac120 RCX: 00007ffaaee8c0f9
RDX: 040000000000009f RSI: 00000000200002c0 RDI: 0000000000000007
RBP: 00007ffaaeee7b39 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe97997fbf R14: 00007ffaafc5e300 R15: 0000000000022000
</TASK>

Allocated by task 8524:
tc_new_tfilter+0xa2b/0x2200 net/sched/cls_api.c:2333
rtnetlink_rcv_msg+0x996/0xd50 net/core/rtnetlink.c:6165
netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2574
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x925/0xe30 net/netlink/af_netlink.c:1942
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg+0xde/0x190 net/socket.c:747
____sys_sendmsg+0x334/0x900 net/socket.c:2501
___sys_sendmsg+0x110/0x1b0 net/socket.c:2555
__sys_sendmmsg+0x18f/0x460 net/socket.c:2641
__do_sys_sendmmsg net/socket.c:2670 [inline]
__se_sys_sendmmsg net/socket.c:2667 [inline]
__x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2667
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Second to last potentially related work creation:
kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
__kasan_record_aux_stack+0xbc/0xd0 mm/kasan/generic.c:491
__call_rcu_common.constprop.0+0x99/0x7e0 kernel/rcu/tree.c:2622
rhashtable_rehash_table lib/rhashtable.c:348 [inline]
rht_deferred_worker+0xb24/0x1ce0 lib/rhashtable.c:432
process_one_work+0x991/0x15c0 kernel/workqueue.c:2390
worker_thread+0x669/0x1090 kernel/workqueue.c:2537
kthread+0x2e8/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308

The buggy address belongs to the object at ffff88806348a000
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 776 bytes inside of
freed 1024-byte region [ffff88806348a000, ffff88806348a400)

The buggy address belongs to the physical page:
page:ffffea00018d2200 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x63488
head:ffffea00018d2200 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
anon flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000010200 ffff888012441dc0 0000000000000000 dead000000000001
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0x1d20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL), pid 8117, tgid 8111 (syz-executor.3), ts 126042830210, free_ts 15956815362
prep_new_page mm/page_alloc.c:2552 [inline]
get_page_from_freelist+0x1190/0x2e20 mm/page_alloc.c:4325
__alloc_pages+0x1cb/0x4a0 mm/page_alloc.c:5591
alloc_pages+0x1aa/0x270 mm/mempolicy.c:2283
alloc_slab_page mm/slub.c:1851 [inline]
allocate_slab+0x25f/0x390 mm/slub.c:1998
new_slab mm/slub.c:2051 [inline]
___slab_alloc+0xa91/0x1400 mm/slub.c:3193
__slab_alloc.constprop.0+0x56/0xa0 mm/slub.c:3292
__slab_alloc_node mm/slub.c:3345 [inline]
slab_alloc_node mm/slub.c:3442 [inline]
__kmem_cache_alloc_node+0x136/0x320 mm/slub.c:3491
kmalloc_trace+0x26/0xe0 mm/slab_common.c:1061
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
fl_init+0x45/0x2c0 net/sched/cls_flower.c:351
tcf_proto_create net/sched/cls_api.c:398 [inline]
tc_new_tfilter+0xecf/0x2200 net/sched/cls_api.c:2260
ffff88806348a200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88806348a280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88806348a300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88806348a380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88806348a400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================


Tested on:

commit: fff5a5e7 Merge tag 'for-linus' of git://git.armlinux.o..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=10cf71c0280000
kernel config: https://syzkaller.appspot.com/x/.config?x=ea09b0836073ee4
dashboard link: https://syzkaller.appspot.com/bug?extid=b53a9c0d1ea4ad62da8b
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=16fa5f23c80000

Peilin Ye

unread,
Apr 18, 2023, 2:46:44 AM4/18/23
to Jamal Hadi Salim, Pedro Tammela, Seth Forshee, Jakub Kicinski, syzbot, da...@davemloft.net, edum...@google.com, ji...@resnulli.us, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, xiyou.w...@gmail.com, peil...@bytedance.com
Hi all,

On Mon, Apr 03, 2023 at 11:58:44AM -0400, Jamal Hadi Salim wrote:
> To provide more update:
> Happens on single processor before Seth's patches; and only on
> multi-processor after Seth's patches.
> Theory is: there is a logic bug in the miniqdisc rcu visibility. Feels
> like the freeing of the structure is done without rcu involvement.
> Jiri/Cong maybe you can take a look since youve been dabbling in
> miniqdisc? The reproducer worked for me and Pedro 100% of the time.

I also reproduced this UAF using the syzkaller reproducer in the report
(the C reproducer did not work for me for unknown reasons). I will look
into this.

Thanks,
Peilin Ye

Hillf Danton

unread,
Apr 18, 2023, 5:22:03 AM4/18/23
to syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On 23 Mar 2023 17:52:40 -0700
> HEAD commit: fff5a5e7f528 Merge tag 'for-linus' of git://git.armlinux.o..
> git tree: upstream
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=11eed636c80000

Roll back to the locked qdisc ops.
--- x/net/sched/sch_ingress.c
+++ y/net/sched/sch_ingress.c
@@ -121,7 +121,6 @@ nla_put_failure:
}

static const struct Qdisc_class_ops ingress_class_ops = {
- .flags = QDISC_CLASS_OPS_DOIT_UNLOCKED,
.leaf = ingress_leaf,
.find = ingress_find,
.walk = ingress_walk,
--

syzbot

unread,
Apr 18, 2023, 5:44:27 AM4/18/23
to hda...@sina.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+b53a9c...@syzkaller.appspotmail.com

Tested on:

commit: fff5a5e7 Merge tag 'for-linus' of git://git.armlinux.o..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=1658d89fc80000
kernel config: https://syzkaller.appspot.com/x/.config?x=ea09b0836073ee4
dashboard link: https://syzkaller.appspot.com/bug?extid=b53a9c0d1ea4ad62da8b
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=10d7cd5bc80000

Note: testing is done by a robot and is best-effort only.

Hillf Danton

unread,
Apr 18, 2023, 9:03:53 AM4/18/23
to Jamal Hadi Salim, Pedro Tammela, Seth Forshee, syzbot, edum...@google.com, ji...@resnulli.us, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com
On Mon, 3 Apr 2023 11:58:44 -0400 Jamal Hadi Salim <j...@mojatatu.com>
> To provide more update:
> Happens on single processor before Seth's patches; and only on
> multi-processor after Seth's patches.
> Theory is: there is a logic bug in the miniqdisc rcu visibility. Feels
> like the freeing of the structure is done without rcu involvement.
> Jiri/Cong maybe you can take a look since youve been dabbling in
> miniqdisc? The reproducer worked for me and Pedro 100% of the time.

JFYI reverting 7a096d579e8e ("net: sched: ingress: set 'unlocked' flag for
Qdisc ops") survived the syzbot test [1].

[1] https://lore.kernel.org/lkml/00000000000038...@google.com/

Peilin Ye

unread,
Apr 26, 2023, 7:42:07 PM4/26/23
to Jamal Hadi Salim, Pedro Tammela, Seth Forshee, Jakub Kicinski, syzbot, da...@davemloft.net, edum...@google.com, ji...@resnulli.us, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, xiyou.w...@gmail.com, peil...@bytedance.com, yepei...@gmail.com, vla...@nvidia.com, hda...@sina.com
+Cc: Vlad Buslov, Hillf Danton

Hi all,

On Mon, Apr 17, 2023 at 04:00:11PM -0700, Peilin Ye wrote:
> I also reproduced this UAF using the syzkaller reproducer in the report
> (the C reproducer did not work for me for unknown reasons). I will look
> into this.

Currently, multiple ingress (clsact) Qdiscs can access the per-netdev
*miniq_ingress (*miniq_egress) pointer concurrently. This is
unfortunately true in two senses:

1. We allow adding ingress (clsact) Qdiscs under parents other than
TC_H_INGRESS (TC_H_CLSACT):

$ ip link add ifb0 numtxqueues 8 type ifb
$ echo clsact > /proc/sys/net/core/default_qdisc
$ tc qdisc add dev ifb0 handle 1: root mq
$ tc qdisc show dev ifb0
qdisc mq 1: root
qdisc clsact 0: parent 1:8
qdisc clsact 0: parent 1:7
qdisc clsact 0: parent 1:6
qdisc clsact 0: parent 1:5
qdisc clsact 0: parent 1:4
qdisc clsact 0: parent 1:3
qdisc clsact 0: parent 1:2
qdisc clsact 0: parent 1:1

This is obviously racy and should be prohibited. I've started working
on patches to fix this. The syz repro for this UAF adds ingress Qdiscs
under TC_H_ROOT, by the way.

2. After introducing RTNL-lockless RTM_{NEW,DEL,GET}TFILTER requests
[1], it is possible that, when replacing ingress (clsact) Qdiscs, the
old one can access *miniq_{in,e}gress concurrently with the new one. For
example, the syz repro does something like the following:

Thread 1 creates sch_ingress Qdisc A (containing mini Qdisc a1 and a2),
then adds a cls_flower filter X to Qdisc A.

Thread 2 creates sch_ingress Qdisc B (containing mini Qdisc b1 and b2)
to replace Qdisc A, then adds a cls_flower filter Y to Qdisc B.

Device has 8 TXQs.

Thread 1 A's refcnt Thread 2
RTM_NEWQDISC (A, locked)
qdisc_create(A) 1
qdisc_graft(A) 9

RTM_NEWTFILTER (X, lockless)
__tcf_qdisc_find(A) 10
tcf_chain0_head_change(A)
! mini_qdisc_pair_swap(A)
| RTM_NEWQDISC (B, locked)
| 2 qdisc_graft(B)
| 1 notify_and_destroy(A)
|
| RTM_NEWTFILTER (Y, lockless)
| tcf_chain0_head_change(B)
| ! mini_qdisc_pair_swap(B)
tcf_block_release(A) 0 |
qdisc_destroy(A) |
tcf_chain0_head_change_cb_del(A) |
! mini_qdisc_pair_swap(A) |
| |
... ...

As we can see there're interleaving mini_qdisc_pair_swap() calls between
Qdisc A and B, causing all kinds of troubles, including the UAF (thread
2 writing to mini Qdisc a1's rcu_state after Qdisc A has already been
freed) reported by syzbot.

To fix this, I'm cooking a patch that, when replacing ingress (clsact)
Qdiscs, in qdisc_graft():

I. We should make sure there's no on-the-fly lockless filter requests
for the old Qdisc, and return -EBUSY if there's any (or can/should
we wait in RTM_NEWQDISC handler?)

II. We should destory the old Qdisc before publishing the new one
(i.e. setting it to dev_ingress_queue(dev)->qdisc_sleeping, so
that subsequent filter requests can see it), because
{ingress,clsact}_destroy() also call mini_qdisc_pair_swap(), which
sets *miniq_{in,e}gress to NULL

Future Qdiscs that support RTNL-lockless cls_ops, if any, won't need
this fix, as long as their ->chain_head_change() don't access
out-of-Qdisc-scope data, like pointers in struct net_device.

Do you think this is the right way to go? Thanks!

[1] Thanks Hillf Danton for the hint:
https://syzkaller.appspot.com/text?tag=Patch&x=10d7cd5bc80000

Thanks,
Peilin Ye

Pedro Tammela

unread,
Apr 26, 2023, 10:31:15 PM4/26/23
to Peilin Ye, Jamal Hadi Salim, Seth Forshee, Jakub Kicinski, syzbot, da...@davemloft.net, edum...@google.com, ji...@resnulli.us, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, xiyou.w...@gmail.com, peil...@bytedance.com, vla...@nvidia.com, hda...@sina.com
Thanks for the analysis. It makes total sense.
After going through the call chains, please correct me if my ELI5 is wrong:

'clsact_init()' is called for B when dev has miniq_ingress set to 'A',
therefore copying a pointer to the miniq_qdisc with lifetime bound to
'A' in a miniq_qdisc_pair with lifetime bound to 'B' therefore raising
an UAF after A is destroyed and B is manipulated.

>
> To fix this, I'm cooking a patch that, when replacing ingress (clsact)
> Qdiscs, in qdisc_graft():
>
> I. We should make sure there's no on-the-fly lockless filter requests
> for the old Qdisc, and return -EBUSY if there's any (or can/should
> we wait in RTM_NEWQDISC handler?
Makes sense.

>
> II. We should destory the old Qdisc before publishing the new one
> (i.e. setting it to dev_ingress_queue(dev)->qdisc_sleeping, so
> that subsequent filter requests can see it), because
> {ingress,clsact}_destroy() also call mini_qdisc_pair_swap(), which
> sets *miniq_{in,e}gress to NULL >
> Future Qdiscs that support RTNL-lockless cls_ops, if any, won't need
> this fix, as long as their ->chain_head_change() don't access
> out-of-Qdisc-scope data, like pointers in struct net_device.

Probably worth a comment somewhere in the code

Vlad Buslov

unread,
Apr 27, 2023, 8:50:55 AM4/27/23
to Peilin Ye, Jamal Hadi Salim, Pedro Tammela, Seth Forshee, Jakub Kicinski, syzbot, da...@davemloft.net, edum...@google.com, ji...@resnulli.us, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, xiyou.w...@gmail.com, peil...@bytedance.com, hda...@sina.com
Hi Peilin,
Hmm, didn't realize it was the case.
Great analysis! However, it is still not quite clear to me how threads 1
and 2 access each other RCU state when q->miniqp is a private memory of
the Qdisc, so 1 should only see A->miniqp and 2 only B->miniqp. And both
miniqps should be protected from deallocation by reference that lockless
RTM_NEWTFILTER obtains.

>
> To fix this, I'm cooking a patch that, when replacing ingress (clsact)
> Qdiscs, in qdisc_graft():
>
> I. We should make sure there's no on-the-fly lockless filter requests
> for the old Qdisc, and return -EBUSY if there's any (or can/should
> we wait in RTM_NEWQDISC handler?)
>
> II. We should destory the old Qdisc before publishing the new one
> (i.e. setting it to dev_ingress_queue(dev)->qdisc_sleeping, so
> that subsequent filter requests can see it), because
> {ingress,clsact}_destroy() also call mini_qdisc_pair_swap(), which
> sets *miniq_{in,e}gress to NULL

Another approach would be to somehow detect concurrent Qdisc replace and
return -EAGAIN from tcf_chain_tp_insert() before calling
tcf_chain0_head_change(). This would leverage existing cls_api
functionality that automatically retries after releasing all references
to chain/tp and obtaining them again instead of messing with qdisc api.
However, since I still didn't fully grasp the issue it is hard for me to
reason whether such approach would be possible to implement in this
case.

Peilin Ye

unread,
Apr 27, 2023, 1:35:59 PM4/27/23
to Vlad Buslov, Jamal Hadi Salim, Pedro Tammela, Seth Forshee, Jakub Kicinski, syzbot, da...@davemloft.net, edum...@google.com, ji...@resnulli.us, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, xiyou.w...@gmail.com, peil...@bytedance.com, hda...@sina.com
Hi Pedro, Vlad,

On Thu, Apr 27, 2023 at 03:26:03PM +0300, Vlad Buslov wrote:
> On Wed 26 Apr 2023 at 16:42, Peilin Ye <yepei...@gmail.com> wrote:
> > As we can see there're interleaving mini_qdisc_pair_swap() calls between
> > Qdisc A and B, causing all kinds of troubles, including the UAF (thread
> > 2 writing to mini Qdisc a1's rcu_state after Qdisc A has already been
> > freed) reported by syzbot.
>
> Great analysis! However, it is still not quite clear to me how threads 1
> and 2 access each other RCU state when q->miniqp is a private memory of
> the Qdisc, so 1 should only see A->miniqp and 2 only B->miniqp. And both
> miniqps should be protected from deallocation by reference that lockless
> RTM_NEWTFILTER obtains.

Thanks for taking a look!

To elaborate, p_miniq is a pointer of pointer of struct mini_Qdisc,
initialized in ingress_init() to point to eth0->miniq_ingress, which
isn't private to A or B.

In other words, both A->miniqp->p_miniq and B->miniqp->p_miniq point to
eth0->miniq_ingress.

For your reference, roughly speaking, mini_qdisc_pair_swap() does this:

miniq_old = dev->miniq_ingress;

if (destroying) {
dev->miniq_ingress = NULL;
} else {
rcu_wait();
dev->miniq_ingress = miniq_new;
}

if (miniq_old)
miniq_old->rcu_state = ...

On Wed 26 Apr 2023 at 16:42, Peilin Ye <yepei...@gmail.com> wrote:
> Thread 1 A's refcnt Thread 2
> RTM_NEWQDISC (A, locked)
> qdisc_create(A) 1
> qdisc_graft(A) 9
>
> RTM_NEWTFILTER (X, lockless)
> __tcf_qdisc_find(A) 10
> tcf_chain0_head_change(A)
> ! mini_qdisc_pair_swap(A)

1. A adds its first filter,
miniq_old (eth0->miniq_ingress) is NULL,
RCU wait starts,
RCU wait ends,
change eth0->miniq_ingress to A's mini Qdisc.

> | RTM_NEWQDISC (B, locked)
> | 2 qdisc_graft(B)
> | 1 notify_and_destroy(A)
> |
> | RTM_NEWTFILTER (Y, lockless)
> | tcf_chain0_head_change(B)
> | ! mini_qdisc_pair_swap(B)

2. B adds its first filter,
miniq_old (eth0->miniq_ingress) is A's mini Qdisc,
RCU wait starts,

> tcf_block_release(A) 0 |
> qdisc_destroy(A) |
> tcf_chain0_head_change_cb_del(A) |
> ! mini_qdisc_pair_swap(A) |

3. A destroys itself,
miniq_old (eth0->miniq_ingress) is A's mini Qdisc,
(destroying, so no RCU wait)
change eth0->miniq_ingress to NULL,
update miniq_old, or A's mini Qdisc's RCU state,
A is freed.

2. RCU wait ends,
change eth0->miniq_ingress to B's mini Qdisc,
use-after-free: update miniq_old, or A's mini Qdisc's RCU state.

I hope this helps. Sorry I didn't go into details; this UAF isn't the
only thing that is unacceptable here:

Consider B. We add a filter Y to B, expecting ingress packets on eth0
to go through Y. Then all of a sudden, A sets eth0->miniq_ingress to
NULL during its destruction, so packets will not find Y at all on
datapath (sch_handle_ingress()). New filter becomes invisible - this is
already buggy enough :-/

So I think B's first call to mini_qdisc_pair_swap() should happen after
A's last call (in ingress_destroy()), which is what I am trying to
achieve here.

Thanks,
Peilin Ye

Vlad Buslov

unread,
Apr 28, 2023, 8:44:40 AM4/28/23
to Peilin Ye, Jamal Hadi Salim, Pedro Tammela, Seth Forshee, Jakub Kicinski, syzbot, da...@davemloft.net, edum...@google.com, ji...@resnulli.us, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, xiyou.w...@gmail.com, peil...@bytedance.com, hda...@sina.com
Thanks for the clarification.

>
> I hope this helps. Sorry I didn't go into details; this UAF isn't the
> only thing that is unacceptable here:
>
> Consider B. We add a filter Y to B, expecting ingress packets on eth0
> to go through Y. Then all of a sudden, A sets eth0->miniq_ingress to
> NULL during its destruction, so packets will not find Y at all on
> datapath (sch_handle_ingress()). New filter becomes invisible - this is
> already buggy enough :-/
>
> So I think B's first call to mini_qdisc_pair_swap() should happen after
> A's last call (in ingress_destroy()), which is what I am trying to
> achieve here.

Makes sense to me.

Pedro Tammela

unread,
May 24, 2023, 10:32:04 AM5/24/23
to syzbot+b53a9c...@syzkaller.appspotmail.com, da...@davemloft.net, edum...@google.com, j...@mojatatu.com, ji...@resnulli.us, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, xiyou.w...@gmail.com
#syz test: git://gitlab.com/tammela/net.git peilin-patches

Double checking with syzbot

syzbot

unread,
May 24, 2023, 11:02:37 AM5/24/23
to da...@davemloft.net, edum...@google.com, j...@mojatatu.com, ji...@resnulli.us, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, pcta...@mojatatu.com, syzkall...@googlegroups.com, xiyou.w...@gmail.com
Hello,

syzbot tried to test the proposed patch but the build/boot failed:

failed to checkout kernel repo git://gitlab.com/tammela/net.git/peilin-patches: failed to run ["git" "fetch" "--force" "71d757925c19d8f23c660d1e07af98f28b9c6977" "peilin-patches"]: exit status 128
fatal: read error: Connection reset by peer



Tested on:

commit: [unknown
git tree: git://gitlab.com/tammela/net.git peilin-patches
Note: no patches were applied.

Pedro Tammela

unread,
May 24, 2023, 11:05:49 AM5/24/23
to syzbot+b53a9c...@syzkaller.appspotmail.com, da...@davemloft.net, edum...@google.com, j...@mojatatu.com, ji...@resnulli.us, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, syzkall...@googlegroups.com, xiyou.w...@gmail.com
#syz test: https://gitlab.com/tammela/net.git peilin-patches
Let's try with https then...

syzbot

unread,
May 24, 2023, 11:34:28 AM5/24/23
to da...@davemloft.net, edum...@google.com, j...@mojatatu.com, ji...@resnulli.us, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, pab...@redhat.com, pcta...@mojatatu.com, syzkall...@googlegroups.com, xiyou.w...@gmail.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+b53a9c...@syzkaller.appspotmail.com

Tested on:

commit: 6078d01d net/sched: qdisc_destroy() old ingress and cl..
git tree: https://gitlab.com/tammela/net.git peilin-patches
console output: https://syzkaller.appspot.com/x/log.txt?x=111cf24d280000
kernel config: https://syzkaller.appspot.com/x/.config?x=b22b5699e8595bcd
dashboard link: https://syzkaller.appspot.com/bug?extid=b53a9c0d1ea4ad62da8b
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

Note: no patches were applied.
Reply all
Reply to author
Forward
0 new messages