WARNING: refcount bug in qrtr_node_lookup

56 views
Skip to first unread message

syzbot

unread,
Sep 7, 2020, 5:18:16 PM9/7/20
to bjorn.a...@linaro.org, da...@davemloft.net, ku...@kernel.org, linux-...@vger.kernel.org, manivannan...@linaro.org, net...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: 9322c47b Merge tag 'xfs-5.9-fixes-2' of git://git.kernel.o..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16cf729e900000
kernel config: https://syzkaller.appspot.com/x/.config?x=e1c560d0f4e121c9
dashboard link: https://syzkaller.appspot.com/bug?extid=c613e88b3093ebf3686e
compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=170c51a5900000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=151a4245900000

The issue was bisected to:

commit e42671084361302141a09284fde9bbc14fdd16bf
Author: Manivannan Sadhasivam <manivannan...@linaro.org>
Date: Thu May 7 12:53:06 2020 +0000

net: qrtr: Do not depend on ARCH_QCOM

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=10fd1a9e900000
final oops: https://syzkaller.appspot.com/x/report.txt?x=12fd1a9e900000
console output: https://syzkaller.appspot.com/x/log.txt?x=14fd1a9e900000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+c613e8...@syzkaller.appspotmail.com
Fixes: e42671084361 ("net: qrtr: Do not depend on ARCH_QCOM")

------------[ cut here ]------------
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 1 PID: 21 at lib/refcount.c:25 refcount_warn_saturate+0x13d/0x1a0 lib/refcount.c:25
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 21 Comm: kworker/u4:1 Not tainted 5.9.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: qrtr_ns_handler qrtr_ns_worker
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1d6/0x29e lib/dump_stack.c:118
panic+0x2c0/0x800 kernel/panic.c:231
__warn+0x227/0x250 kernel/panic.c:600
report_bug+0x1b1/0x2e0 lib/bug.c:198
handle_bug+0x42/0x80 arch/x86/kernel/traps.c:234
exc_invalid_op+0x16/0x40 arch/x86/kernel/traps.c:254
asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:536
RIP: 0010:refcount_warn_saturate+0x13d/0x1a0 lib/refcount.c:25
Code: c7 83 96 15 89 31 c0 e8 b1 34 a6 fd 0f 0b eb a3 e8 a8 94 d4 fd c6 05 04 8e ea 05 01 48 c7 c7 ba 96 15 89 31 c0 e8 93 34 a6 fd <0f> 0b eb 85 e8 8a 94 d4 fd c6 05 e7 8d ea 05 01 48 c7 c7 e6 96 15
RSP: 0018:ffffc90000dd79c0 EFLAGS: 00010046
RAX: eb7b51afe96d3100 RBX: 0000000000000002 RCX: ffff8880a9bf6580
RDX: 0000000000000000 RSI: 0000000080000001 RDI: 0000000000000000
RBP: 0000000000000002 R08: ffffffff815e27d0 R09: ffffed1015d241c3
R10: ffffed1015d241c3 R11: 0000000000000000 R12: ffff8880a761a898
R13: 1ffff1101517fa56 R14: 0000000000000286 R15: ffff8880a761a800
refcount_add include/linux/refcount.h:206 [inline]
refcount_inc include/linux/refcount.h:241 [inline]
kref_get include/linux/kref.h:45 [inline]
qrtr_node_acquire net/qrtr/qrtr.c:196 [inline]
qrtr_node_lookup+0xc0/0xd0 net/qrtr/qrtr.c:388
qrtr_send_resume_tx net/qrtr/qrtr.c:980 [inline]
qrtr_recvmsg+0x429/0xa80 net/qrtr/qrtr.c:1043
qrtr_ns_worker+0x176/0x45f0 net/qrtr/ns.c:624
process_one_work+0x789/0xfc0 kernel/workqueue.c:2269
worker_thread+0xaa4/0x1460 kernel/workqueue.c:2415
kthread+0x37e/0x3a0 drivers/block/aoe/aoecmd.c:1234
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
Shutting down cpus with NMI
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

Hillf Danton

unread,
Sep 8, 2020, 12:50:24 AM9/8/20
to syzbot, bjorn.a...@linaro.org, da...@davemloft.net, ku...@kernel.org, linux-...@vger.kernel.org, manivannan...@linaro.org, net...@vger.kernel.org, hda...@sina.com, syzkall...@googlegroups.com

Mon, 07 Sep 2020 14:18:15 -0700
Fix 0a7e0d0ef054 ("net: qrtr: Migrate node lookup tree to spinlock")
by detecting the race between the acquirer and holder of qrtr_nodes_lock.

--- a/net/qrtr/qrtr.c
+++ b/net/qrtr/qrtr.c
@@ -189,14 +189,6 @@ static void __qrtr_node_release(struct k
kfree(node);
}

-/* Increment reference to node. */
-static struct qrtr_node *qrtr_node_acquire(struct qrtr_node *node)
-{
- if (node)
- kref_get(&node->ref);
- return node;
-}
-
/* Decrement reference to node and release as necessary. */
static void qrtr_node_release(struct qrtr_node *node)
{
@@ -385,7 +377,17 @@ static struct qrtr_node *qrtr_node_looku

spin_lock_irqsave(&qrtr_nodes_lock, flags);
node = radix_tree_lookup(&qrtr_nodes, nid);
- node = qrtr_node_acquire(node);
+ if (node && !kref_get_unless_zero(&node->ref)) {
+ /*
+ * see a node with zero ref in the radix tree, and it
+ * hints a race between lookuper and deleter. But it
+ * seems benign because the race only exists if the
+ * deleter is busy acquiring qrtr_nodes_lock at the
+ * moment, so do nothing and let the peer take care
+ * of the rest.
+ */
+ node = NULL;
+ }
spin_unlock_irqrestore(&qrtr_nodes_lock, flags);

return node;

syzbot

unread,
Sep 8, 2020, 2:30:06 PM9/8/20
to drago...@gmail.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+c613e8...@syzkaller.appspotmail.com

Tested on:

commit: f4d51dff Linux 5.9-rc4
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
kernel config: https://syzkaller.appspot.com/x/.config?x=a49094677fcb4545
dashboard link: https://syzkaller.appspot.com/bug?extid=c613e88b3093ebf3686e
compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
patch: https://syzkaller.appspot.com/x/patch.diff?x=120255f9900000

Note: testing is done by a robot and is best-effort only.

syzbot

unread,
Sep 8, 2020, 4:42:05 PM9/8/20
to drago...@gmail.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+c613e8...@syzkaller.appspotmail.com

Tested on:

commit: 34d4ddd3 Merge tag 'linux-kselftest-5.9-rc5' of git://git...
patch: https://syzkaller.appspot.com/x/patch.diff?x=130c5e31900000

syzbot

unread,
Sep 8, 2020, 4:59:05 PM9/8/20
to drago...@gmail.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+c613e8...@syzkaller.appspotmail.com

Tested on:

commit: 34d4ddd3 Merge tag 'linux-kselftest-5.9-rc5' of git://git...
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
kernel config: https://syzkaller.appspot.com/x/.config?x=a49094677fcb4545
dashboard link: https://syzkaller.appspot.com/bug?extid=c613e88b3093ebf3686e
compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
patch: https://syzkaller.appspot.com/x/patch.diff?x=133b7335900000

syzbot

unread,
Sep 8, 2020, 6:32:10 PM9/8/20
to anant.th...@gmail.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+c613e8...@syzkaller.appspotmail.com

Tested on:

commit: 34d4ddd3 Merge tag 'linux-kselftest-5.9-rc5' of git://git...
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
kernel config: https://syzkaller.appspot.com/x/.config?x=8f5c353182ed6199
dashboard link: https://syzkaller.appspot.com/bug?extid=c613e88b3093ebf3686e
compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
patch: https://syzkaller.appspot.com/x/patch.diff?x=1468a335900000

syzbot

unread,
Aug 28, 2021, 2:32:11 PM8/28/21
to anant.th...@gmail.com, bjorn.a...@linaro.org, butterfl...@gmail.com, da...@davemloft.net, drago...@gmail.com, hda...@sina.com, ku...@kernel.org, linux-...@vger.kernel.org, linux-kern...@lists.linuxfoundation.org, linux-...@vger.kernel.org, ma...@kernel.org, manivannan...@linaro.org, masa...@kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com
syzbot suspects this issue was fixed by commit:

commit 7e78c597c3ebfd0cb329aa09a838734147e4f117
Author: Xiaolong Huang <butterfl...@gmail.com>
Date: Thu Aug 19 19:50:34 2021 +0000

net: qrtr: fix another OOB Read in qrtr_endpoint_post

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=11279a4d300000
start commit: ba4f184e126b Linux 5.9-rc6
git tree: upstream
kernel config: https://syzkaller.appspot.com/x/.config?x=af502ec9a451c9fc
dashboard link: https://syzkaller.appspot.com/bug?extid=c613e88b3093ebf3686e
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12263dd9900000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=13d77603900000

If the result looks correct, please mark the issue as fixed by replying with:

#syz fix: net: qrtr: fix another OOB Read in qrtr_endpoint_post

Dmitry Vyukov

unread,
Aug 30, 2021, 4:39:31 AM8/30/21
to syzbot, anant.th...@gmail.com, bjorn.a...@linaro.org, butterfl...@gmail.com, da...@davemloft.net, drago...@gmail.com, hda...@sina.com, ku...@kernel.org, linux-...@vger.kernel.org, linux-kern...@lists.linuxfoundation.org, linux-...@vger.kernel.org, ma...@kernel.org, manivannan...@linaro.org, masa...@kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com
Hi Hillf,

You posted some patch related to refcounts. Do you think "net: qrtr:
fix another OOB Read in qrtr_endpoint_post" is a plausible fix? Or is
there still something wrong with refcounts?

Hillf Danton

unread,
Aug 30, 2021, 9:08:16 AM8/30/21
to Dmitry Vyukov, syzbot, butterfl...@gmail.com, linux-...@vger.kernel.org, manivannan...@linaro.org, Bjorn Andersson, net...@vger.kernel.org, syzkall...@googlegroups.com
On Mon, 30 Aug 2021 10:39:18 +0200 Dmitry Vyukov wrote:
>On Sat, 28 Aug 2021 at 20:32, syzbot
>>
>> syzbot suspects this issue was fixed by commit:
>>
>> commit 7e78c597c3ebfd0cb329aa09a838734147e4f117
>> Author: Xiaolong Huang <butterfl...@gmail.com>
>> Date: Thu Aug 19 19:50:34 2021 +0000
>>
>> net: qrtr: fix another OOB Read in qrtr_endpoint_post
>>
>> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=11279a4d300000
>> start commit: ba4f184e126b Linux 5.9-rc6
>> git tree: upstream
>> kernel config: https://syzkaller.appspot.com/x/.config?x=af502ec9a451c9fc
>> dashboard link: https://syzkaller.appspot.com/bug?extid=c613e88b3093ebf3686e
>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=12263dd9900000
>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=13d77603900000
>>
>> If the result looks correct, please mark the issue as fixed by replying with:
>>
>> #syz fix: net: qrtr: fix another OOB Read in qrtr_endpoint_post
>>
>> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
>
>Hi Hillf,

Hi Dmitry,

Thanks for sharing this.
>
>You posted some patch related to refcounts. Do you think "net: qrtr:
>fix another OOB Read in qrtr_endpoint_post" is a plausible fix? Or is
>there still something wrong with refcounts?

I think 7e78c597c3eb fixed different issue to the refcount one reported.

I see the chance for the following race in the mainline tree

CPU0 CPU1
---- ----
qrtr_node_lookup qrtr_node_release
kref_put_mutex(&node->ref, __qrtr_node_release,
&qrtr_node_lock);
refcount_dec_and_mutex_lock
__qrtr_node_release

spin_lock_irqsave(&qrtr_nodes_lock, flags);
node = radix_tree_lookup(&qrtr_nodes, nid);
node = qrtr_node_acquire(node);
if (node)
kref_get(&node->ref);
spin_unlock_irqrestore(&qrtr_nodes_lock, flags);

spin_lock_irqsave(&qrtr_nodes_lock, flags);


and one of the quick fixes is s/kref_get/kref_get_unless_zero/ with count
underflow detected.

Hillf

Hillf Danton

unread,
Aug 31, 2021, 11:06:57 PM8/31/21
to syzbot, bjorn.a...@linaro.org, Dan Carpenter, linux-...@vger.kernel.org, manivannan...@linaro.org, net...@vger.kernel.org, syzkall...@googlegroups.com
On Mon, 07 Sep 2020 14:18:15 -0700
Fix 0a7e0d0ef054 ("net: qrtr: Migrate node lookup tree to spinlock")
by replacing kref_get() with kref_get_unless_zero() to close the race window
between lookup and release pathes.

lookup release
------ -------
kref_put_mutex(&node->ref,
__qrtr_node_release,
&qrtr_node_lock);
__qrtr_node_release

qrtr_nodes_lock
node = radix_tree_lookup(&qrtr_nodes, nid);
node = qrtr_node_acquire(node);
qrtr_nodes_lock

qrtr_nodes_lock
radix_tree_for_each_slot

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

--- a/net/qrtr/qrtr.c
+++ b/net/qrtr/qrtr.c
@@ -198,9 +198,10 @@ static void __qrtr_node_release(struct k
/* Increment reference to node. */
static struct qrtr_node *qrtr_node_acquire(struct qrtr_node *node)
{
- if (node)
- kref_get(&node->ref);
- return node;
+ if (node && kref_get_unless_zero(&node->ref))
+ return node;
+ else
+ return NULL;
}

/* Decrement reference to node and release as necessary. */
--

syzbot

unread,
Aug 31, 2021, 11:41:11 PM8/31/21
to bjorn.a...@linaro.org, dan.ca...@oracle.com, hda...@sina.com, linux-...@vger.kernel.org, manivannan...@linaro.org, net...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot tried to test the proposed patch but the build/boot failed:

arch/x86/kernel/setup.c:916:6: error: implicit declaration of function 'acpi_mps_check' [-Werror,-Wimplicit-function-declaration]
arch/x86/kernel/setup.c:1110:2: error: implicit declaration of function 'acpi_table_upgrade' [-Werror,-Wimplicit-function-declaration]
arch/x86/kernel/setup.c:1112:2: error: implicit declaration of function 'acpi_boot_table_init' [-Werror,-Wimplicit-function-declaration]
arch/x86/kernel/setup.c:1120:2: error: implicit declaration of function 'early_acpi_boot_init' [-Werror,-Wimplicit-function-declaration]
arch/x86/kernel/setup.c:1162:2: error: implicit declaration of function 'acpi_boot_init' [-Werror,-Wimplicit-function-declaration]


Tested on:

commit: 9e9fb765 Merge tag 'net-next-5.15' of git://git.kernel..
git tree: upstream
patch: https://syzkaller.appspot.com/x/patch.diff?x=1413e2f5300000

Eric Dumazet

unread,
Sep 1, 2021, 12:26:37 AM9/1/21
to syzbot, bjorn.a...@linaro.org, dan.ca...@oracle.com, hda...@sina.com, linux-...@vger.kernel.org, manivannan...@linaro.org, net...@vger.kernel.org, syzkall...@googlegroups.com
Tree seems broken, no idea why the following is needed.

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 63b20536c8d236083336c2b50dc5f54225a80eab..6edec9a28293ea3241bd7842ab5555a1691e6cea 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -22,6 +22,7 @@
#include <linux/usb/xhci-dbgp.h>
#include <linux/static_call.h>
#include <linux/swiotlb.h>
+#include <linux/acpi.h>

#include <uapi/linux/mount.h>

Hillf Danton

unread,
Sep 1, 2021, 8:53:10 PM9/1/21
to syzbot, bjorn.a...@linaro.org, Dan Carpenter, linux-...@vger.kernel.org, manivannan...@linaro.org, Eric Dumazet, net...@vger.kernel.org, syzkall...@googlegroups.com
On Mon, 07 Sep 2020 14:18:15 -0700
Fix 0a7e0d0ef054 ("net: qrtr: Migrate node lookup tree to spinlock")
by replacing kref_get() with kref_get_unless_zero() to close the race window
between lookup and release pathes.

lookup release
------ -------
kref_put_mutex(&node->ref,
__qrtr_node_release,
&qrtr_node_lock);
__qrtr_node_release

qrtr_nodes_lock
node = radix_tree_lookup(&qrtr_nodes, nid);
node = qrtr_node_acquire(node);
qrtr_nodes_lock

qrtr_nodes_lock
radix_tree_for_each_slot

#syz test git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git b91db6a0b52e

syzbot

unread,
Sep 1, 2021, 10:32:07 PM9/1/21
to bjorn.a...@linaro.org, dan.ca...@oracle.com, eric.d...@gmail.com, hda...@sina.com, linux-...@vger.kernel.org, manivannan...@linaro.org, net...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
UBSAN: object-size-mismatch in send4

================================================================================
UBSAN: object-size-mismatch in ./include/net/flow.h:197:33
member access within address 000000001597b753 with insufficient space
for an object of type 'struct flowi'
CPU: 1 PID: 231 Comm: kworker/u4:4 Not tainted 5.14.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: wg-kex-wg0 wg_packet_handshake_send_worker
Call Trace:
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x15e/0x1d3 lib/dump_stack.c:105
ubsan_epilogue lib/ubsan.c:148 [inline]
handle_object_size_mismatch lib/ubsan.c:229 [inline]
ubsan_type_mismatch_common+0x1de/0x390 lib/ubsan.c:242
__ubsan_handle_type_mismatch_v1+0x41/0x50 lib/ubsan.c:271
flowi4_to_flowi_common include/net/flow.h:197 [inline]
send4+0x39b/0xdd0 drivers/net/wireguard/socket.c:52
wg_socket_send_skb_to_peer+0xc7/0x200 drivers/net/wireguard/socket.c:174
wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline]
wg_packet_handshake_send_worker+0x14a/0x190 drivers/net/wireguard/send.c:51
process_one_work+0x471/0x840 kernel/workqueue.c:2276
worker_thread+0x686/0x9e0 kernel/workqueue.c:2422
kthread+0x3ca/0x3f0 kernel/kthread.c:319
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
================================================================================
================================================================================
UBSAN: object-size-mismatch in ./include/net/flow.h:197:33
member access within address 000000001597b753 with insufficient space
for an object of type 'union (anonymous union at ./include/net/flow.h:172:2)'
CPU: 1 PID: 231 Comm: kworker/u4:4 Not tainted 5.14.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: wg-kex-wg0 wg_packet_handshake_send_worker
Call Trace:
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x15e/0x1d3 lib/dump_stack.c:105
ubsan_epilogue lib/ubsan.c:148 [inline]
handle_object_size_mismatch lib/ubsan.c:229 [inline]
ubsan_type_mismatch_common+0x1de/0x390 lib/ubsan.c:242
__ubsan_handle_type_mismatch_v1+0x41/0x50 lib/ubsan.c:271
flowi4_to_flowi_common include/net/flow.h:197 [inline]
send4+0x3aa/0xdd0 drivers/net/wireguard/socket.c:52
wg_socket_send_skb_to_peer+0xc7/0x200 drivers/net/wireguard/socket.c:174
wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline]
wg_packet_handshake_send_worker+0x14a/0x190 drivers/net/wireguard/send.c:51
process_one_work+0x471/0x840 kernel/workqueue.c:2276
worker_thread+0x686/0x9e0 kernel/workqueue.c:2422
kthread+0x3ca/0x3f0 kernel/kthread.c:319
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
================================================================================


Tested on:

commit: b91db6a0 Merge tag 'for-5.15/io_uring-vfs-2021-08-30' ..
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=11740133300000
kernel config: https://syzkaller.appspot.com/x/.config?x=7d1b73d0f1d597e4
dashboard link: https://syzkaller.appspot.com/bug?extid=c613e88b3093ebf3686e
compiler: Debian clang version 11.0.1-2, GNU ld (GNU Binutils for Debian) 2.35.1
patch: https://syzkaller.appspot.com/x/patch.diff?x=13a44435300000

Hillf Danton

unread,
Sep 2, 2021, 12:13:21 AM9/2/21
to Paul Moore, syzbot, bjorn.a...@linaro.org, dan.ca...@oracle.com, eric.d...@gmail.com, linux-...@vger.kernel.org, manivannan...@linaro.org, net...@vger.kernel.org, syzkall...@googlegroups.com
On Wed, 01 Sep 2021 19:32:06 -0700
>
> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> UBSAN: object-size-mismatch in send4
>
> ================================================================================
> UBSAN: object-size-mismatch in ./include/net/flow.h:197:33
> member access within address 000000001597b753 with insufficient space
> for an object of type 'struct flowi'
> CPU: 1 PID: 231 Comm: kworker/u4:4 Not tainted 5.14.0-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Workqueue: wg-kex-wg0 wg_packet_handshake_send_worker
> Call Trace:
> __dump_stack lib/dump_stack.c:88 [inline]
> dump_stack_lvl+0x15e/0x1d3 lib/dump_stack.c:105
> ubsan_epilogue lib/ubsan.c:148 [inline]
> handle_object_size_mismatch lib/ubsan.c:229 [inline]
> ubsan_type_mismatch_common+0x1de/0x390 lib/ubsan.c:242
> __ubsan_handle_type_mismatch_v1+0x41/0x50 lib/ubsan.c:271
> flowi4_to_flowi_common include/net/flow.h:197 [inline]

This was added in 3df98d79215a ("lsm,selinux: pass flowi_common instead of
flowi to the LSM hooks"), could you take a look at the UBSAN report, Paul?

Thanks
Hillf

Paul Moore

unread,
Sep 2, 2021, 9:58:44 AM9/2/21
to Hillf Danton, syzbot, bjorn.a...@linaro.org, dan.ca...@oracle.com, eric.d...@gmail.com, linux-...@vger.kernel.org, manivannan...@linaro.org, net...@vger.kernel.org, syzkall...@googlegroups.com
On Thu, Sep 2, 2021 at 12:13 AM Hillf Danton <hda...@sina.com> wrote:
> On Wed, 01 Sep 2021 19:32:06 -0700
> >
> > syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> > UBSAN: object-size-mismatch in send4
> >
> > ================================================================================
> > UBSAN: object-size-mismatch in ./include/net/flow.h:197:33
> > member access within address 000000001597b753 with insufficient space
> > for an object of type 'struct flowi'
> > CPU: 1 PID: 231 Comm: kworker/u4:4 Not tainted 5.14.0-syzkaller #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > Workqueue: wg-kex-wg0 wg_packet_handshake_send_worker
> > Call Trace:
> > __dump_stack lib/dump_stack.c:88 [inline]
> > dump_stack_lvl+0x15e/0x1d3 lib/dump_stack.c:105
> > ubsan_epilogue lib/ubsan.c:148 [inline]
> > handle_object_size_mismatch lib/ubsan.c:229 [inline]
> > ubsan_type_mismatch_common+0x1de/0x390 lib/ubsan.c:242
> > __ubsan_handle_type_mismatch_v1+0x41/0x50 lib/ubsan.c:271
> > flowi4_to_flowi_common include/net/flow.h:197 [inline]
>
> This was added in 3df98d79215a ("lsm,selinux: pass flowi_common instead of
> flowi to the LSM hooks"), could you take a look at the UBSAN report, Paul?

Sure, although due to some flooding here at home it might take a day
(two?) before I have any real comments on this.

--
paul moore
www.paul-moore.com

Paul Moore

unread,
Sep 2, 2021, 10:40:44 PM9/2/21
to Hillf Danton, syzbot, bjorn.a...@linaro.org, dan.ca...@oracle.com, eric.d...@gmail.com, linux-...@vger.kernel.org, manivannan...@linaro.org, net...@vger.kernel.org, syzkall...@googlegroups.com
I'm looking quickly at this tonight after a long day so it's possible
I'm missing something, but in the original report if you look one step
before the backtrace above you see the caller is send4 in the
wireguard code, which starts off by creating it's own flowi4 variable
on the stack and then eventually passing it down to
flowi4_to_flowi_common() for use as the second parameter to
security_sk_classify_flow(). Because the wireguard code only
allocates a flowi4 and not a full flowi struct it seems like this
would explain the size mismatch warning (flowi is larger than flowi4
due to the union containing flowi6 as well as flowi4).

Off the top of my head it isn't clear to me if it is considered "safe"
to allocate just a flowi4 in this way, you may need to allocate a full
flowi; if none of the netdev folks reply we could always check the
other flowi4 users in the kernel.

Depending on the above, the fix may be either to adjust send4() to use
a full flowi, or to adjust flowi4_to_flowi_common() to use the
flowi_common struct at the top of the flowi4 struct instead of first
looking for the flowi struct and going from there.

--
paul moore
www.paul-moore.com

Hillf Danton

unread,
Sep 3, 2021, 12:28:41 AM9/3/21
to Paul Moore, syzbot, bjorn.a...@linaro.org, dan.ca...@oracle.com, eric.d...@gmail.com, linux-...@vger.kernel.org, manivannan...@linaro.org, net...@vger.kernel.org, syzkall...@googlegroups.com
On Thu, 2 Sep 2021 22:40:31 -0400 Paul Moore wrote:
>
>I'm looking quickly at this tonight after a long day so it's possible

Hi Paul,

Thanks for taking a look.

>I'm missing something, but in the original report if you look one step
>before the backtrace above you see the caller is send4 in the
>wireguard code, which starts off by creating it's own flowi4 variable
>on the stack and then eventually passing it down to
>flowi4_to_flowi_common() for use as the second parameter to
>security_sk_classify_flow(). Because the wireguard code only
>allocates a flowi4 and not a full flowi struct it seems like this
>would explain the size mismatch warning (flowi is larger than flowi4
>due to the union containing flowi6 as well as flowi4).
>
>Off the top of my head it isn't clear to me if it is considered "safe"
>to allocate just a flowi4 in this way, you may need to allocate a full
>flowi; if none of the netdev folks reply we could always check the
>other flowi4 users in the kernel.
>
>Depending on the above, the fix may be either to adjust send4() to use
>a full flowi, or to adjust flowi4_to_flowi_common() to use the
>flowi_common struct at the top of the flowi4 struct instead of first
>looking for the flowi struct and going from there.

Spin with changes to flowi included.
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -194,7 +194,7 @@ static inline struct flowi *flowi4_to_fl

static inline struct flowi_common *flowi4_to_flowi_common(struct flowi4 *fl4)
{
- return &(flowi4_to_flowi(fl4)->u.__fl_common);
+ return &fl4->__fl_common;
}

static inline struct flowi *flowi6_to_flowi(struct flowi6 *fl6)
@@ -204,7 +204,7 @@ static inline struct flowi *flowi6_to_fl

static inline struct flowi_common *flowi6_to_flowi_common(struct flowi6 *fl6)
{
- return &(flowi6_to_flowi(fl6)->u.__fl_common);
+ return &fl6->__fl_common;
}

static inline struct flowi *flowidn_to_flowi(struct flowidn *fldn)

syzbot

unread,
Sep 3, 2021, 7:04:09 AM9/3/21
to bjorn.a...@linaro.org, dan.ca...@oracle.com, eric.d...@gmail.com, hda...@sina.com, linux-...@vger.kernel.org, manivannan...@linaro.org, net...@vger.kernel.org, pa...@paul-moore.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
UBSAN: object-size-mismatch in wg_xmit

================================================================================
UBSAN: object-size-mismatch in ./include/linux/skbuff.h:2048:28
member access within address 0000000096a277f4 with insufficient space
for an object of type 'struct sk_buff'
CPU: 0 PID: 3568 Comm: kworker/0:5 Not tainted 5.14.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
Call Trace:
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x15e/0x1d3 lib/dump_stack.c:105
ubsan_epilogue lib/ubsan.c:151 [inline]
handle_object_size_mismatch lib/ubsan.c:232 [inline]
ubsan_type_mismatch_common+0x1de/0x390 lib/ubsan.c:245
__ubsan_handle_type_mismatch_v1+0x41/0x50 lib/ubsan.c:274
__skb_queue_before include/linux/skbuff.h:2048 [inline]
__skb_queue_tail include/linux/skbuff.h:2081 [inline]
wg_xmit+0x4da/0xa60 drivers/net/wireguard/device.c:182
__netdev_start_xmit include/linux/netdevice.h:4970 [inline]
netdev_start_xmit+0x7b/0x140 include/linux/netdevice.h:4984
xmit_one net/core/dev.c:3576 [inline]
dev_hard_start_xmit+0x182/0x2e0 net/core/dev.c:3592
__dev_queue_xmit+0x13b0/0x21a0 net/core/dev.c:4202
neigh_output include/net/neighbour.h:510 [inline]
ip6_finish_output2+0xc51/0x11b0 net/ipv6/ip6_output.c:126
dst_output include/net/dst.h:450 [inline]
NF_HOOK include/linux/netfilter.h:307 [inline]
ndisc_send_skb+0x835/0xcf0 net/ipv6/ndisc.c:508
addrconf_dad_completed+0x6c5/0xa70 net/ipv6/addrconf.c:4203
addrconf_dad_work+0xba5/0x1510 net/ipv6/addrconf.c:3970
process_one_work+0x4b5/0x8d0 kernel/workqueue.c:2297
worker_thread+0x686/0x9e0 kernel/workqueue.c:2444
kthread+0x3ca/0x3f0 kernel/kthread.c:319
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
================================================================================
================================================================================
UBSAN: object-size-mismatch in ./include/linux/skbuff.h:1941:2
member access within address 0000000096a277f4 with insufficient space
for an object of type 'struct sk_buff'
CPU: 0 PID: 3568 Comm: kworker/0:5 Not tainted 5.14.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
Call Trace:
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x15e/0x1d3 lib/dump_stack.c:105
ubsan_epilogue lib/ubsan.c:151 [inline]
handle_object_size_mismatch lib/ubsan.c:232 [inline]
ubsan_type_mismatch_common+0x1de/0x390 lib/ubsan.c:245
__ubsan_handle_type_mismatch_v1+0x41/0x50 lib/ubsan.c:274
__skb_insert include/linux/skbuff.h:1941 [inline]
__skb_queue_before include/linux/skbuff.h:2048 [inline]
__skb_queue_tail include/linux/skbuff.h:2081 [inline]
wg_xmit+0x53c/0xa60 drivers/net/wireguard/device.c:182
__netdev_start_xmit include/linux/netdevice.h:4970 [inline]
netdev_start_xmit+0x7b/0x140 include/linux/netdevice.h:4984
xmit_one net/core/dev.c:3576 [inline]
dev_hard_start_xmit+0x182/0x2e0 net/core/dev.c:3592
__dev_queue_xmit+0x13b0/0x21a0 net/core/dev.c:4202
neigh_output include/net/neighbour.h:510 [inline]
ip6_finish_output2+0xc51/0x11b0 net/ipv6/ip6_output.c:126
dst_output include/net/dst.h:450 [inline]
NF_HOOK include/linux/netfilter.h:307 [inline]
ndisc_send_skb+0x835/0xcf0 net/ipv6/ndisc.c:508
addrconf_dad_completed+0x6c5/0xa70 net/ipv6/addrconf.c:4203
addrconf_dad_work+0xba5/0x1510 net/ipv6/addrconf.c:3970
process_one_work+0x4b5/0x8d0 kernel/workqueue.c:2297
worker_thread+0x686/0x9e0 kernel/workqueue.c:2444
kthread+0x3ca/0x3f0 kernel/kthread.c:319
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
================================================================================
IPv6: ADDRCONF(NETDEV_CHANGE): macsec0: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): macsec0: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): batadv_slave_0: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth0_to_batadv: link becomes ready


Tested on:

commit: a9c9a6f7 Merge tag 'scsi-misc' of git://git.kernel.org..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=134c5b6d300000
kernel config: https://syzkaller.appspot.com/x/.config?x=5f845e3d82a95a0e
dashboard link: https://syzkaller.appspot.com/bug?extid=c613e88b3093ebf3686e
compiler: Debian clang version 11.0.1-2, GNU ld (GNU Binutils for Debian) 2.35.1
patch: https://syzkaller.appspot.com/x/patch.diff?x=14673df5300000

syzbot

unread,
Sep 28, 2022, 5:58:30 PM9/28/22
to syzkall...@googlegroups.com
Auto-closing this bug as obsolete.
No recent activity, existing reproducers are no longer triggering the issue.
Reply all
Reply to author
Forward
0 new messages