[syzbot] KASAN: use-after-free Read in addr_handler (4)


syzbot

Sep 15, 2021, 8:41:23 AM
to dled...@redhat.com, j...@ziepe.ca, le...@kernel.org, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: 926de8c4326c Merge tag 'acpi-5.15-rc1-3' of git://git.kern..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=11fd67ed300000
kernel config: https://syzkaller.appspot.com/x/.config?x=37df9ef5660a8387
dashboard link: https://syzkaller.appspot.com/bug?extid=dc3dfba010d7671e05f5
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+dc3dfb...@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: use-after-free in __mutex_lock_common kernel/locking/mutex.c:575 [inline]
BUG: KASAN: use-after-free in __mutex_lock+0x105b/0x12f0 kernel/locking/mutex.c:729
Read of size 8 at addr ffff88803991f3b0 by task kworker/u4:3/158

CPU: 0 PID: 158 Comm: kworker/u4:3 Not tainted 5.14.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: ib_addr process_one_req
Call Trace:
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description.constprop.0.cold+0x6c/0x309 mm/kasan/report.c:256
__kasan_report mm/kasan/report.c:442 [inline]
kasan_report.cold+0x83/0xdf mm/kasan/report.c:459
__mutex_lock_common kernel/locking/mutex.c:575 [inline]
__mutex_lock+0x105b/0x12f0 kernel/locking/mutex.c:729
addr_handler+0xac/0x470 drivers/infiniband/core/cma.c:3247
process_one_req+0xfa/0x680 drivers/infiniband/core/addr.c:647
process_one_work+0x9bf/0x16b0 kernel/workqueue.c:2297
worker_thread+0x658/0x11f0 kernel/workqueue.c:2444
kthread+0x3e5/0x4d0 kernel/kthread.c:319
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

Allocated by task 2916:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:46 [inline]
set_alloc_info mm/kasan/common.c:434 [inline]
____kasan_kmalloc mm/kasan/common.c:513 [inline]
____kasan_kmalloc mm/kasan/common.c:472 [inline]
__kasan_kmalloc+0xa4/0xd0 mm/kasan/common.c:522
kmalloc include/linux/slab.h:591 [inline]
kzalloc include/linux/slab.h:721 [inline]
__rdma_create_id+0x5b/0x550 drivers/infiniband/core/cma.c:839
rdma_create_user_id+0x79/0xd0 drivers/infiniband/core/cma.c:893
ucma_create_id+0x162/0x360 drivers/infiniband/core/ucma.c:461
ucma_write+0x25c/0x350 drivers/infiniband/core/ucma.c:1732
vfs_write+0x28e/0xae0 fs/read_write.c:592
ksys_write+0x1ee/0x250 fs/read_write.c:647
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae

Freed by task 2916:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_set_track+0x1c/0x30 mm/kasan/common.c:46
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:360
____kasan_slab_free mm/kasan/common.c:366 [inline]
____kasan_slab_free mm/kasan/common.c:328 [inline]
__kasan_slab_free+0xff/0x130 mm/kasan/common.c:374
kasan_slab_free include/linux/kasan.h:230 [inline]
slab_free_hook mm/slub.c:1700 [inline]
slab_free_freelist_hook+0x81/0x190 mm/slub.c:1725
slab_free mm/slub.c:3483 [inline]
kfree+0xe4/0x530 mm/slub.c:4543
ucma_close_id drivers/infiniband/core/ucma.c:185 [inline]
ucma_destroy_private_ctx+0x8b3/0xb70 drivers/infiniband/core/ucma.c:576
ucma_destroy_id+0x1e6/0x280 drivers/infiniband/core/ucma.c:614
ucma_write+0x25c/0x350 drivers/infiniband/core/ucma.c:1732
vfs_write+0x28e/0xae0 fs/read_write.c:592
ksys_write+0x1ee/0x250 fs/read_write.c:647
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae

Last potentially related work creation:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_record_aux_stack+0xe9/0x110 mm/kasan/generic.c:348
__call_rcu kernel/rcu/tree.c:2987 [inline]
call_rcu+0xb1/0x750 kernel/rcu/tree.c:3067
netlink_release+0xdd4/0x1dd0 net/netlink/af_netlink.c:812
__sock_release+0xcd/0x280 net/socket.c:649
sock_close+0x18/0x20 net/socket.c:1314
__fput+0x288/0x9f0 fs/file_table.c:280
task_work_run+0xdd/0x1a0 kernel/task_work.c:164
exit_task_work include/linux/task_work.h:32 [inline]
do_exit+0xbae/0x2a30 kernel/exit.c:825
do_group_exit+0x125/0x310 kernel/exit.c:922
__do_sys_exit_group kernel/exit.c:933 [inline]
__se_sys_exit_group kernel/exit.c:931 [inline]
__x64_sys_exit_group+0x3a/0x50 kernel/exit.c:931
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae

The buggy address belongs to the object at ffff88803991f000
which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 944 bytes inside of
2048-byte region [ffff88803991f000, ffff88803991f800)
The buggy address belongs to the page:
page:ffffea0000e64600 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x39918
head:ffffea0000e64600 order:3 compound_mapcount:0 compound_pincount:0
flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000010200 dead000000000100 dead000000000122 ffff888010c42000
raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 6632, ts 274175306325, free_ts 0
prep_new_page mm/page_alloc.c:2424 [inline]
get_page_from_freelist+0xa72/0x2f80 mm/page_alloc.c:4153
__alloc_pages+0x1b2/0x500 mm/page_alloc.c:5375
alloc_pages+0x1a7/0x300 mm/mempolicy.c:2197
alloc_slab_page mm/slub.c:1763 [inline]
allocate_slab mm/slub.c:1900 [inline]
new_slab+0x319/0x490 mm/slub.c:1963
___slab_alloc+0x921/0xfe0 mm/slub.c:2994
__slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3081
slab_alloc_node mm/slub.c:3172 [inline]
slab_alloc mm/slub.c:3214 [inline]
__kmalloc+0x305/0x320 mm/slub.c:4387
kmalloc_array include/linux/slab.h:631 [inline]
kcalloc include/linux/slab.h:660 [inline]
veth_alloc_queues drivers/net/veth.c:1314 [inline]
veth_dev_init+0x114/0x2e0 drivers/net/veth.c:1341
register_netdevice+0x51e/0x1500 net/core/dev.c:10225
veth_newlink+0x58c/0xb20 drivers/net/veth.c:1726
__rtnl_newlink+0x106d/0x1750 net/core/rtnetlink.c:3458
rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3506
rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5572
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929
page_owner free stack trace missing

Memory state around the buggy address:
ffff88803991f280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88803991f300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88803991f380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88803991f400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88803991f480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

Jason Gunthorpe

Sep 15, 2021, 3:36:04 PM
to syzbot, dled...@redhat.com, le...@kernel.org, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com
On Wed, Sep 15, 2021 at 05:41:22AM -0700, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 926de8c4326c Merge tag 'acpi-5.15-rc1-3' of git://git.kern..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=11fd67ed300000
> kernel config: https://syzkaller.appspot.com/x/.config?x=37df9ef5660a8387
> dashboard link: https://syzkaller.appspot.com/bug?extid=dc3dfba010d7671e05f5
> compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+dc3dfb...@syzkaller.appspotmail.com

#syz dup: KASAN: use-after-free Write in addr_resolve (2)

Frankly, I still can't figure out how this is happening.

RDMA_USER_CM_CMD_RESOLVE_IP triggers a background work and
RDMA_USER_CM_CMD_DESTROY_ID triggers destruction of the memory the
work touches.

rdma_addr_cancel() is supposed to ensure that the work isn't running
and won't run.

So to hit this we either have to fail to call rdma_addr_cancel() when
it is needed, or rdma_addr_cancel() has to be broken and continue to
allow the work to run.

I could find nothing wrong along either path, though rdma_addr_cancel()
relies on some complicated properties of the workqueues that I'm not
entirely positive about.
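
For reference, this is roughly what the cancel path looks like (a
paraphrase of this era's drivers/infiniband/core/addr.c, not a
verbatim copy):

/*
 * Sketch of rdma_addr_cancel(), paraphrased from the 5.14-era
 * drivers/infiniband/core/addr.c (approximate, not verbatim).
 */
void rdma_addr_cancel(struct rdma_dev_addr *addr)
{
	struct addr_req *req, *temp_req;
	struct addr_req *found = NULL;

	spin_lock_bh(&lock);
	list_for_each_entry_safe(req, temp_req, &req_list, list) {
		if (req->addr == addr) {
			/* Removing from the list means we take ownership
			 * of the req. */
			list_del_init(&req->list);
			found = req;
			break;
		}
	}
	spin_unlock_bh(&lock);

	if (!found)
		return;

	/* Sync canceling the work after removing it from req_list is
	 * what is supposed to guarantee no work is running and none
	 * will be started. */
	cancel_delayed_work_sync(&found->work);
	kfree(found);
}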

Jason

Dmitry Vyukov

Sep 16, 2021, 3:43:31 AM
to Jason Gunthorpe, syzbot, dled...@redhat.com, le...@kernel.org, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com
I stared at the code, but it's too complex to grasp entirely. There
are definitely lots of tricky concurrent state transitions and
potential for unexpected interleavings. My bet would be on some
tricky, hard-to-trigger thread interleaving.

The only thing I can think of is adding more WARNINGs to the code to
check more of these assumptions. But I don't know if there are any
useful testable assumptions...
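
One hypothetical example of such an assertion (illustrative only, not
a real patch; it assumes the req_list/lock naming in addr.c):

/*
 * Hypothetical debugging check (illustration only): warn if the same
 * dev_addr handle is already on req_list when a new resolve request
 * is about to be queued.  Assumes addr.c's req_list and lock.
 */
static void addr_warn_on_duplicate(struct rdma_dev_addr *addr)
{
	struct addr_req *req;

	lockdep_assert_held(&lock);
	list_for_each_entry(req, &req_list, list)
		WARN_ONCE(req->addr == addr,
			  "dev_addr queued on req_list twice\n");
}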

Jason Gunthorpe

Sep 16, 2021, 9:05:02 AM
to Dmitry Vyukov, syzbot, dled...@redhat.com, le...@kernel.org, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com
From a uapi perspective the entire thing is serialized with a mutex.

> The only thing I can think of is adding more WARNINGs to the code to
> check more of these assumptions. But I don't know if there are any
> useful testable assumptions...

Do you have any idea why we can't get a reproduction out of syzkaller
here?

I'm not that comfortable with syzkaller's debug output; can you give
some idea of what it might have been doing concurrently?

Jason

Dmitry Vyukov

Sep 16, 2021, 10:45:40 AM
to Jason Gunthorpe, syzbot, dled...@redhat.com, le...@kernel.org, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com, Aleksandr Nogikh
It looks like a very hard-to-trigger race (few crashes, no reproducer,
but the KASAN reports look sensible). That's probably the reason
syzkaller can't create a reproducer.
From the log it looks like it was triggered by one of the programs
below, but I tried to reproduce it manually and had no success.
We are currently making some improvements to the race-triggering code
in syzkaller, and may try to use this as a litmus test to see if
syzkaller does any better:
https://github.com/google/syzkaller/issues/612#issuecomment-920961538

Answering your question about what was running concurrently with
what: each of the syscalls in these programs can run up to 2 times,
and ultimately any of these calls can race with any other. syzkaller
can also potentially predict values the kernel will return (e.g. IDs)
before the kernel actually returns them. I guess this does not narrow
the search area for the bug a lot...


11:16:53 executing program 3:
write$RDMA_USER_CM_CMD_CONNECT(0xffffffffffffffff,
&(0x7f0000000280)={0x6, 0x118, 0xfa00, {{0xfffffff7, 0x6a492eae,
"e0e55819482a40c1c535b72b0bc0bc5e4478995957e1d0fe2311a39ee3960d3488407d52fbef30809118fcbaef590c27d04918aa1348b409d45ba277d9f73bd18868a9c4fde7560288298bde7e9a96c1ef280ca62f4a6f591a2181f2e3d3cf52212fa5ae101aa1bf975763cef32e3a2c73b79d0af1d2e58b82243731e6082cab1cb1c643b7bbec2e6d45bca8a6980f148aaefb71f1933ffa50534b83267139b2324e51ffecb57959bf7e98b60516cebc8f05838a7976cef33b64410626c14dca7dcb22f0902aeb045b88656268a6dd922d6a0e7b7002e8ea90020650dced319050db3130089e5011994d90340a93088e0a8b03ea61ac3f53312342b3d6e038ae",
0xfc, 0xe1, 0xb2, 0xd0, 0x7, 0x40, 0x0, 0x1}}}, 0x120)
r0 = openat$pfkey(0xffffffffffffff9c, &(0x7f00000001c0), 0x80800, 0x0)
r1 = syz_io_uring_setup(0x1c7, &(0x7f0000000080)={0x0, 0x0, 0x0, 0x0,
0x7f, 0x0, r0}, &(0x7f00007b6000/0x2000)=nil,
&(0x7f0000ffd000/0x3000)=nil, &(0x7f0000000140)=<r2=>0x0,
&(0x7f0000000100)=<r3=>0x0)
socketpair$unix(0x1, 0x5, 0x0, &(0x7f0000000040)={0xffffffffffffffff,
<r4=>0xffffffffffffffff})
r5 = openat(0xffffffffffffff9c,
&(0x7f0000000000)='/proc/self/exe\x00', 0x0, 0x0)
mmap(&(0x7f0000000000/0x800000)=nil, 0x800000, 0x0, 0x12, r5, 0x0)
r6 = openat$rdma_cm(0xffffffffffffff9c, &(0x7f0000000540), 0x2, 0x0)
write$RDMA_USER_CM_CMD_CREATE_ID(r6, &(0x7f0000000080)={0x0, 0x18,
0xfa00, {0xffffffffffffffff,
&(0x7f0000000000)={<r7=>0xffffffffffffffff}, 0x13f}}, 0x20)
write$RDMA_USER_CM_CMD_RESOLVE_IP(r6, &(0x7f0000000100)={0x3, 0x40,
0xfa00, {{}, {0xa, 0x0, 0x0, @mcast2}, r7}}, 0x48)
write$RDMA_USER_CM_CMD_RESOLVE_IP(r6, &(0x7f0000000180)={0x3, 0x40,
0xfa00, {{0xa, 0x0, 0x0, @local}, {0xa, 0x0, 0x0, @mcast1}, r7}},
0x48)
write$RDMA_USER_CM_CMD_DESTROY_ID(r6, &(0x7f0000000280)={0x1, 0x10,
0xfa00, {&(0x7f0000000240), r7}}, 0x18)
write$RDMA_USER_CM_CMD_CONNECT(r5, &(0x7f00000003c0)={0x6, 0x118,
0xfa00, {{0x9, 0x9,
"f703ff619e427c1d7d50fc023c22feb64ea5083376891585a4a8b539bead7f61210a9010d88379b67ebe7a1fc77fbdd4dccaec4b498eafe4b08e7e5b28e9fe54606f87e9618b9ade4e28b66e04c73fe4660de33c075bb9b1a43c59e485dcc259fb21fed21380f9ec2c61e8d29b6069786e8bc3da0f3bded0acd13548d2d76af6e701a258307fbce30c0f452b6a25f39209c830fe557de6f1fb3fdfe4347be3a9fdfeaca47b97e333a266013beef7cb7d7ea746bca1d3a929747a269df24d019e3e413309e58095182dd5dc3c8a088e94abf8d5cd389749cc80e4e452c8dabe7eaadd8144e2c4392e35c1b5ad3369ee7b2f855e5ebe9bdc0e8a464e8a9e4f54c0",
0x2, 0xff, 0x1, 0x8f, 0x6, 0x3, 0x6}, r7}}, 0x120)
syz_io_uring_submit(r2, r3,
&(0x7f0000000180)=@IORING_OP_READ=@pass_buffer={0x16, 0x4, 0x0,
@fd=r4, 0x0, &(0x7f0000000000)=""/7, 0x7}, 0x0)
syz_io_uring_submit(r2, r3,
&(0x7f0000002f80)=@IORING_OP_LINK_TIMEOUT={0xf, 0x0, 0x0, 0x0, 0x0,
&(0x7f0000000240)={0x0, 0x3938700}}, 0x10000007)
io_uring_enter(r1, 0x45f5, 0x0, 0x0, 0x0, 0xf5ff)



11:16:55 executing program 4:
r0 = openat$rdma_cm(0xffffffffffffff9c, &(0x7f0000000540), 0x2, 0x0)
write$RDMA_USER_CM_CMD_CREATE_ID(r0, &(0x7f0000000080)={0x0, 0x18,
0xfa00, {0xffffffffffffffff,
&(0x7f0000000000)={<r1=>0xffffffffffffffff}, 0x13f}}, 0x20)
write$RDMA_USER_CM_CMD_RESOLVE_IP(r0, &(0x7f0000000100)={0x3, 0x40,
0xfa00, {{0xa, 0xfffd}, {0xa, 0x0, 0x10000000, @ipv4={'\x00',
'\xff\xff', @broadcast}, 0x3}, r1, 0xfffffffe}}, 0x48)
write$RDMA_USER_CM_CMD_RESOLVE_IP(r0, &(0x7f0000000180)={0x3, 0x40,
0xfa00, {{0xa, 0x2, 0x0, @ipv4={'\x00', '\xff\xff', @multicast2}},
{0xa, 0x0, 0x0, @initdev={0xfe, 0x88, '\x00', 0x1, 0x0}}, r1}}, 0xd5)
write$RDMA_USER_CM_CMD_DESTROY_ID(r0, &(0x7f0000000280)={0x1, 0x10,
0xfa00, {&(0x7f0000000240), r1}}, 0x18)
write$RDMA_USER_CM_CMD_BIND_IP(0xffffffffffffffff,
&(0x7f0000000000)={0x2, 0x28, 0xfa00, {0x0, {0xa, 0x4e23, 0x9,
@ipv4={'\x00', '\xff\xff', @rand_addr=0x64010102}, 0x100}, r1}}, 0x30)
openat$qrtrtun(0xffffffffffffff9c, &(0x7f0000002740), 0x101002)
io_setup(0x8, &(0x7f0000000600)=<r2=>0x0)
clock_getres(0xfffffffffffffffd, 0x0)
r3 = openat$hwrng(0xffffffffffffff9c, &(0x7f0000000040), 0x400, 0x0)
r4 = openat$vcsa(0xffffffffffffff9c, &(0x7f00000002c0), 0x8000, 0x0)
r5 = openat$rdma_cm(0xffffffffffffff9c, &(0x7f0000000540), 0x2, 0x0)
write$RDMA_USER_CM_CMD_CREATE_ID(r5, &(0x7f0000000080)={0x0, 0x18,
0xfa00, {0xffffffffffffffff,
&(0x7f0000000000)={<r6=>0xffffffffffffffff}, 0x13f}}, 0x20)
write$RDMA_USER_CM_CMD_RESOLVE_IP(r5, &(0x7f0000000100)={0x3, 0x40,
0xfa00, {{}, {0xa, 0x0, 0x0, @mcast2}, r6}}, 0x48)
write$RDMA_USER_CM_CMD_RESOLVE_IP(r5, &(0x7f0000000180)={0x3, 0x40,
0xfa00, {{0xa, 0x0, 0x0, @local}, {0xa, 0x0, 0x0, @mcast1}, r6}},
0x48)
write$RDMA_USER_CM_CMD_DESTROY_ID(r5, &(0x7f0000000280)={0x1, 0x10,
0xfa00, {&(0x7f0000000240), r6}}, 0x18)
pipe2(&(0x7f0000000340)={<r7=>0xffffffffffffffff}, 0x800)
write$RDMA_USER_CM_CMD_RESOLVE_IP(r7, &(0x7f0000000380)={0x3, 0x40,
0xfa00, {{0xa, 0x4e23, 0x8, @remote, 0x3ff}, {0xa, 0x4e22, 0x8000,
@initdev={0xfe, 0x88, '\x00', 0x1, 0x0}, 0x8}, r1, 0x1f}}, 0x48)
write$RDMA_USER_CM_CMD_LISTEN(r4, &(0x7f0000000300)={0x7, 0x8, 0xfa00,
{r6, 0x8}}, 0x10)
io_submit(r2, 0x1, &(0x7f0000000200)=[&(0x7f00000000c0)={0x0, 0x0,
0x0, 0x1, 0x0, 0xffffffffffffffff,
&(0x7f0000000400)="03a0a445bc5d7a9d6c", 0x9, 0x7fffffff, 0x0, 0x0,
r3}])

Dmitry Vyukov

Sep 16, 2021, 10:47:16 AM
to Jason Gunthorpe, syzbot, dled...@redhat.com, le...@kernel.org, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com, Aleksandr Nogikh
I noticed we also had 2 KCSAN reports that mention rdma_resolve_addr.

On commit 1df0d896:
==================================================================
BUG: KCSAN: data-race in addr_handler / cma_check_port

write to 0xffff88809fa40a1c of 4 bytes by task 21 on cpu 1:
cma_comp_exch drivers/infiniband/core/cma.c:426 [inline]
addr_handler+0x9f/0x2b0 drivers/infiniband/core/cma.c:3141
process_one_req+0x22f/0x300 drivers/infiniband/core/addr.c:645
process_one_work+0x3e1/0x9a0 kernel/workqueue.c:2269
worker_thread+0x665/0xbe0 kernel/workqueue.c:2415
kthread+0x20d/0x230 kernel/kthread.c:291
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293

read to 0xffff88809fa40a1c of 4 bytes by task 11997 on cpu 0:
cma_check_port+0xbd/0x700 drivers/infiniband/core/cma.c:3506
cma_use_port drivers/infiniband/core/cma.c:3541 [inline]
cma_get_port drivers/infiniband/core/cma.c:3623 [inline]
rdma_bind_addr+0x1639/0x1910 drivers/infiniband/core/cma.c:3741
cma_bind_addr drivers/infiniband/core/cma.c:3252 [inline]
rdma_resolve_addr+0x486/0x1240 drivers/infiniband/core/cma.c:3264
ucma_resolve_ip+0x121/0x1b0 drivers/infiniband/core/ucma.c:722
ucma_write+0x229/0x250 drivers/infiniband/core/ucma.c:1764
vfs_write+0x1d6/0x690 fs/read_write.c:576
ksys_write+0xce/0x180 fs/read_write.c:631
__do_sys_write fs/read_write.c:643 [inline]
__se_sys_write fs/read_write.c:640 [inline]
__x64_sys_write+0x3e/0x50 fs/read_write.c:640
do_syscall_64+0x51/0xb0 arch/x86/entry/common.c:384
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 11997 Comm: syz-executor.4 Not tainted 5.8.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
==================================================================

and on commit 5863cc79:
==================================================================
BUG: KCSAN: data-race in cma_comp_exch / rdma_resolve_addr

write to 0xffff8880a73bda1c of 4 bytes by task 7740 on cpu 0:
cma_comp_exch+0x84/0xc0 drivers/infiniband/core/cma.c:441
addr_handler+0x80/0x2f0 drivers/infiniband/core/cma.c:3033
process_one_req+0xc2/0x3a0 drivers/infiniband/core/addr.c:644
process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
worker_thread+0xa0/0x800 kernel/workqueue.c:2415
kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352

read to 0xffff8880a73bda1c of 4 bytes by task 30237 on cpu 1:
rdma_resolve_addr+0x56/0x10e0 drivers/infiniband/core/cma.c:3156
ucma_resolve_ip+0x105/0x180 drivers/infiniband/core/ucma.c:708
ucma_write+0x1fe/0x2a0 drivers/infiniband/core/ucma.c:1684
__vfs_write+0x67/0xc0 fs/read_write.c:494
vfs_write fs/read_write.c:558 [inline]
vfs_write+0x18a/0x390 fs/read_write.c:542
ksys_write+0x17b/0x1b0 fs/read_write.c:611
__do_sys_write fs/read_write.c:623 [inline]
__se_sys_write fs/read_write.c:620 [inline]
__x64_sys_write+0x4c/0x60 fs/read_write.c:620
do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 30237 Comm: syz-executor.0 Not tainted 5.4.0-rc7+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
==================================================================

Dmitry Vyukov

Sep 16, 2021, 10:55:28 AM
to Jason Gunthorpe, syzbot, dled...@redhat.com, le...@kernel.org, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com, Aleksandr Nogikh
This does not immediately explain the use-after-free to me, but these
races suggest that not everything is protected by a single mutex and
that there may be some surprising interleavings.
E.g. rdma_resolve_addr() checks the state and then conditionally
executes cma_bind_addr(), but the state can change concurrently, as
sketched below.
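
A minimal sketch of that check-then-act pattern (paraphrasing the
cma.c of that era; names approximate, not verbatim):

/*
 * Sketch of the racy pattern in rdma_resolve_addr(), paraphrased
 * from the 5.8-era drivers/infiniband/core/cma.c (abridged).
 */
int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
		      const struct sockaddr *dst_addr,
		      unsigned long timeout_ms)
{
	struct rdma_id_private *id_priv =
		container_of(id, struct rdma_id_private, id);
	int ret;

	/* Plain read of the state... */
	if (id_priv->state == RDMA_CM_IDLE) {
		/* ...then an action conditional on that read; nothing
		 * stops the state from changing in between. */
		ret = cma_bind_addr(id, src_addr, dst_addr);
		if (ret)
			return ret;
	}

	if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND, RDMA_CM_ADDR_QUERY))
		return -EINVAL;

	/* ...queue the background address resolution... */
	return 0;
}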

Jason Gunthorpe

Sep 16, 2021, 11:08:54 AM
to Dmitry Vyukov, syzbot, dled...@redhat.com, le...@kernel.org, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com, Aleksandr Nogikh
On Thu, Sep 16, 2021 at 04:55:16PM +0200, Dmitry Vyukov wrote:

> > I noticed we also had 2 KCSAN reports that mention rdma_resolve_addr.
> >
> > On commit 1df0d896:
> > ==================================================================
> > BUG: KCSAN: data-race in addr_handler / cma_check_port
> >
> > write to 0xffff88809fa40a1c of 4 bytes by task 21 on cpu 1:
> > cma_comp_exch drivers/infiniband/core/cma.c:426 [inline]
> > addr_handler+0x9f/0x2b0 drivers/infiniband/core/cma.c:3141
> > process_one_req+0x22f/0x300 drivers/infiniband/core/addr.c:645
> > process_one_work+0x3e1/0x9a0 kernel/workqueue.c:2269
> > worker_thread+0x665/0xbe0 kernel/workqueue.c:2415
> > kthread+0x20d/0x230 kernel/kthread.c:291
> > ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293
> >
> > read to 0xffff88809fa40a1c of 4 bytes by task 11997 on cpu 0:
> > cma_check_port+0xbd/0x700 drivers/infiniband/core/cma.c:3506

This has since been fixed; cma_check_port() no longer reads the state.

> > and on commit 5863cc79:

I can't find this commit? The current rdma_resolve_addr() should not
trigger this KCSAN report.

> This does not immediately explain the use-after-free for me, but these
> races suggest that everything is not protected by a single mutex and
> that there may be some surprising interleavings.
> E.g. rdma_resolve_addr checks status, and then conditionally executes
> cma_bind_addr, but the status can change concurrently.

It is true, they weren't; however, I've since fixed them all. These
hits all look like they are from before it got fixed up.

Jason

Dmitry Vyukov

Sep 16, 2021, 11:17:40 AM
to Jason Gunthorpe, syzbot, dled...@redhat.com, le...@kernel.org, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com, Aleksandr Nogikh
Then sorry for the false leads.
The second commit was from the kcsan branch of
https://github.com/google/ktsan.git. I am not sure if it's still
present or was rebased away. But either way it's even older than the
first report on upstream (we used the ktsan tree before we switched
to upstream).

Jason Gunthorpe

Sep 16, 2021, 12:02:26 PM
to Dmitry Vyukov, syzbot, dled...@redhat.com, le...@kernel.org, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com, Aleksandr Nogikh
On Thu, Sep 16, 2021 at 04:45:27PM +0200, Dmitry Vyukov wrote:

> It looks like a very hard to trigger race (few crashes, no reproducer,
> but KASAN reports look sensible). That's probably the reason syzkaller
> can't create a reproducer.
> From the log it looks like it was triggered by one of these programs
> below. But I tried to reproduce manually and had no success.
> We are currently doing some improvements to race triggering code in
> syzkaller, and may try to use this as a litmus test to see if
> syzkaller will do any better:
> https://github.com/google/syzkaller/issues/612#issuecomment-920961538

I would suggest looking at this:

https://patchwork.kernel.org/project/linux-rdma/patch/0-v1-9fbb33f5e201...@nvidia.com/

I think that one should be completely deterministic (just do the
RDMA_CM ops in the right order), but syzbot didn't find a reproducer.

The "healer" fork did however:

https://lore.kernel.org/all/CACkBjsY-CNzO74XGo0uJrcaZ...@mail.gmail.com/#r

> Answering your question re what was running concurrently with what.
> Each of the syscalls in these programs can run up to 2 times and
> ultimately any of these calls can race with any. Potentially syzkaller
> can predict values kernel will return (e.g. id's) before kernel
> actually returned them. I guess this does not restrict search area for
> the bug a lot...

Well, it does help if it is only those system calls.

And I think I can discount the workqueue itself as the problem, as
I'd expect a KASAN hit on the 'req' allocation if the workqueue were
malfunctioning. Thus I must conclude we are not calling the work
cancellation for some reason.
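
For context, the destroy path only issues that cancellation when the
id is still in the address-query state (a sketch of
cma_cancel_operation(), paraphrased from this era's cma.c, abridged
and not verbatim):

/*
 * Sketch of cma_cancel_operation(), paraphrased from this era's
 * drivers/infiniband/core/cma.c (abridged): the rdma_addr_cancel()
 * call is only made when the id is in RDMA_CM_ADDR_QUERY.
 */
static void cma_cancel_operation(struct rdma_id_private *id_priv,
				 enum rdma_cm_state state)
{
	switch (state) {
	case RDMA_CM_ADDR_QUERY:
		rdma_addr_cancel(&id_priv->id.route.addr.dev_addr);
		break;
	case RDMA_CM_ROUTE_QUERY:
		cma_cancel_route(id_priv);
		break;
	case RDMA_CM_LISTEN:
		cma_cancel_listens(id_priv);
		break;
	default:
		break;
	}
}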

Jason

Jason Gunthorpe

Sep 16, 2021, 12:28:54 PM
to Dmitry Vyukov, syzbot, dled...@redhat.com, le...@kernel.org, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com, Aleksandr Nogikh
On Thu, Sep 16, 2021 at 04:45:27PM +0200, Dmitry Vyukov wrote:

> Answering your question re what was running concurrently with what.
> Each of the syscalls in these programs can run up to 2 times and
> ultimately any of these calls can race with any. Potentially syzkaller
> can predict values kernel will return (e.g. id's) before kernel
> actually returned them. I guess this does not restrict search area for
> the bug a lot...

I have a reasonable theory now.

Based on the ops you provided, this FSM sequence is possible:

RDMA_USER_CM_CMD_RESOLVE_IP
RDMA_CM_IDLE -> RDMA_CM_ADDR_QUERY
does rdma_resolve_ip(addr_handler)

addr_handler
RDMA_CM_ADDR_QUERY -> RDMA_CM_ADDR_BOUND
[.. handler still running ..]

RDMA_USER_CM_CMD_RESOLVE_IP
RDMA_CM_ADDR_BOUND -> RDMA_CM_ADDR_QUERY
does rdma_resolve_ip(addr_handler)

RDMA_DESTROY_ID
rdma_addr_cancel()

Which, if it happens fast enough, could trigger a situation where the
'&id_priv->id.route.addr.dev_addr' "handle" is in the req_list twice,
because the addr_handler work queue hasn't yet got to the point of
deleting it from the req_list before the 2nd one is added.

The issue is that rdma_addr_cancel() has to be called before
rdma_resolve_ip() can be called again.

Skipping it leaves 'req_list' with two items in the internal linked
list that have the same key, and the cancel will not remove the
newest one with the active timer. This would cause the use-after-free
syndrome this trace is showing, as sketched below.
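
Concretely (an illustration, assuming the lookup loop sketched earlier
in the thread): the cancel loop matches the first entry with that key
and stops, so the newer req survives:

	/* Illustration only: two reqs on req_list with the same ->addr. */
	list_for_each_entry_safe(req, temp_req, &req_list, list) {
		if (req->addr == addr) {	/* matches the OLD req first */
			list_del_init(&req->list);
			found = req;
			break;			/* the NEW req stays queued */
		}
	}
	/*
	 * cancel_delayed_work_sync()/kfree() then act on the old req only,
	 * while the new req's timer later fires addr_handler() against an
	 * id_priv that RDMA_USER_CM_CMD_DESTROY_ID has already freed.
	 */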

I can make a patch, but have no way to know if it is any good :\

Jason

Dmitry Vyukov

Sep 20, 2021, 4:13:22 AM
to Jason Gunthorpe, syzbot, dled...@redhat.com, le...@kernel.org, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com, Aleksandr Nogikh
Good detective work!

But if you have a theory of what happens, it's usually easy to write
a reproducer that aims at triggering this exact scenario. Isn't that
the case here? I would assume you need one to add as a test anyway.

Hillf Danton

Oct 4, 2021, 11:29:15 PM
to Dmitry Vyukov, Jason Gunthorpe, syzbot, dled...@redhat.com, le...@kernel.org, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com, Aleksandr Nogikh
Great to know about the gadgets on the syzkaller side!

In the scenario derived from the log of 2ee9bf346fbf
("RDMA/addr: Fix race with netevent_callback()/rdma_addr_cancel()"),

CPU1                          CPU2                                  CPU3
----                          ----                                  ----
netevent_callback()           rdma_addr_cancel()                    process_one_req()

spin_lock_bh()
  set_timeout()                                                     req->callback()
  mod_delayed_work(addr_wq,
      &req->work, delay);
spin_unlock_bh()
                              spin_lock_bh()
                              list_del_init(&req->list)
                              spin_unlock_bh()
                              cancel_delayed_work_sync(&req->work)
                              kfree(req)
                                                                    req->callback = NULL

the chance of a use-after-free on CPU3 is not zero, given that
canceling the requeued work will not wait for the worker that is
running the callback to complete.

Hillf

Jason Gunthorpe

Oct 5, 2021, 8:23:44 AM
to Hillf Danton, Dmitry Vyukov, syzbot, dled...@redhat.com, le...@kernel.org, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com, Aleksandr Nogikh
> netevent_callback() rdma_addr_cancel() process_one_req()
>
> spin_lock_bh()
> set_timeout() req->callback()
> mod_delayed_work(addr_wq,
> &req->work, delay);
> spin_unlock_bh()
> spin_lock_bh()
> list_del_init(&req->list)
> spin_unlock_bh()
> cancel_delayed_work_sync(&req->work)
> kfree(req)
> req->callback = NULL
>
> the chance for uaf on CPU3 is not zero, given that canceling of the requeued
> work will not wait for the worker running the callback to complete.

I'm not sure what you are trying to say.

Jason

Hillf Danton

Oct 5, 2021, 11:18:21 PM
to Jason Gunthorpe, Dmitry Vyukov, syzbot, dled...@redhat.com, le...@kernel.org, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com, Aleksandr Nogikh
No reproducer, no test.

Hillf

+++ b/drivers/infiniband/core/addr.c
@@ -795,6 +795,11 @@ void rdma_addr_cancel(struct rdma_dev_ad
* guarentees no work is running and none will be started.
*/
cancel_delayed_work_sync(&found->work);
+ /*
+ * flush is needed if the work is queued again while it is running,
+ * as cancel does not wait for it.
+ */
+ flush_work(&found->work);
kfree(found);
}
EXPORT_SYMBOL(rdma_addr_cancel);

Jason Gunthorpe

Oct 6, 2021, 7:42:03 AM
to Hillf Danton, Dmitry Vyukov, syzbot, dled...@redhat.com, le...@kernel.org, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com, Aleksandr Nogikh
On Wed, Oct 06, 2021 at 11:18:00AM +0800, Hillf Danton wrote:
> +++ b/drivers/infiniband/core/addr.c
> @@ -795,6 +795,11 @@ void rdma_addr_cancel(struct rdma_dev_ad
> * guarentees no work is running and none will be started.
> */
> cancel_delayed_work_sync(&found->work);
> + /*
> + * flush is needed if the work is queued again while it is running,
> + * as cancel does not wait for it.
> + */
> + flush_work(&found->work);

The _sync() above already does that; cancel doesn't return while the
work is running.
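
For reference, this is the guarantee in question (paraphrasing the
workqueue kernel-doc, not quoting it verbatim):

/*
 * cancel_delayed_work_sync(dwork), paraphrased from the kernel-doc
 * in kernel/workqueue.c:
 *   - cancels dwork and waits for any in-flight execution of its
 *     callback to finish, even if the work re-queues itself;
 *   - on return, dwork is guaranteed to be neither pending nor
 *     executing on any CPU.
 */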

Jason

Hillf Danton

Oct 7, 2021, 7:14:22 AM
to Jason Gunthorpe, Dmitry Vyukov, syzbot, dled...@redhat.com, le...@kernel.org, linux-...@vger.kernel.org, linux...@vger.kernel.org, syzkall...@googlegroups.com, Aleksandr Nogikh
You are right. Thanks for your light in the dark.

Hillf