UBSAN: object-size-mismatch in wg_xmit

49 views
Skip to first unread message

syzbot

unread,
Dec 20, 2020, 11:54:13 AM12/20/20
to Ja...@zx2c4.com, da...@davemloft.net, ku...@kernel.org, linux-...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com, wire...@lists.zx2c4.com
Hello,

syzbot found the following issue on:

HEAD commit: 5e60366d Merge tag 'fallthrough-fixes-clang-5.11-rc1' of g..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=12b12c13500000
kernel config: https://syzkaller.appspot.com/x/.config?x=267a60b188ded8ed
dashboard link: https://syzkaller.appspot.com/bug?extid=8f90d005ab2d22342b6d
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project.git ca2dcbd030eadbf0aa9b660efe864ff08af6e18b)

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+8f90d0...@syzkaller.appspotmail.com

================================================================================
UBSAN: object-size-mismatch in ./include/linux/skbuff.h:2021:28
member access within address 0000000085889cc2 with insufficient space
for an object of type 'struct sk_buff'
CPU: 1 PID: 2998 Comm: kworker/1:2 Not tainted 5.10.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: ipv6_addrconf addrconf_dad_work
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x137/0x1be lib/dump_stack.c:120
ubsan_epilogue lib/ubsan.c:148 [inline]
handle_object_size_mismatch lib/ubsan.c:297 [inline]
ubsan_type_mismatch_common+0x1e2/0x390 lib/ubsan.c:310
__ubsan_handle_type_mismatch_v1+0x41/0x50 lib/ubsan.c:339
__skb_queue_before include/linux/skbuff.h:2021 [inline]
__skb_queue_tail include/linux/skbuff.h:2054 [inline]
wg_xmit+0x45d/0xdf0 drivers/net/wireguard/device.c:182
__netdev_start_xmit include/linux/netdevice.h:4775 [inline]
netdev_start_xmit+0x7b/0x140 include/linux/netdevice.h:4789
xmit_one net/core/dev.c:3556 [inline]
dev_hard_start_xmit+0x182/0x2e0 net/core/dev.c:3572
__dev_queue_xmit+0x1229/0x1e60 net/core/dev.c:4133
neigh_output include/net/neighbour.h:510 [inline]
ip6_finish_output2+0xe8d/0x11e0 net/ipv6/ip6_output.c:117
dst_output include/net/dst.h:441 [inline]
NF_HOOK include/linux/netfilter.h:301 [inline]
ndisc_send_skb+0x85b/0xc70 net/ipv6/ndisc.c:508
addrconf_dad_completed+0x5ef/0x990 net/ipv6/addrconf.c:4192
addrconf_dad_work+0xb92/0x1480 net/ipv6/addrconf.c:3959
process_one_work+0x471/0x830 kernel/workqueue.c:2275
worker_thread+0x757/0xb10 kernel/workqueue.c:2421
kthread+0x39a/0x3c0 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
================================================================================


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

Jason A. Donenfeld

unread,
Dec 20, 2020, 4:11:12 PM12/20/20
to Netdev, syzkall...@googlegroups.com, WireGuard mailing list, Eric Dumazet
Hmm, on first glance, I'm not sure I'm seeing the bug:

On Sun, Dec 20, 2020 at 5:54 PM syzbot
<syzbot+8f90d0...@syzkaller.appspotmail.com> wrote:
> UBSAN: object-size-mismatch in ./include/linux/skbuff.h:2021:28
> member access within address 0000000085889cc2 with insufficient space
> for an object of type 'struct sk_buff'
> __skb_queue_before include/linux/skbuff.h:2021 [inline]
> __skb_queue_tail include/linux/skbuff.h:2054 [inline]
> wg_xmit+0x45d/0xdf0 drivers/net/wireguard/device.c:182

The code in question is:

struct sk_buff_head packets;
__skb_queue_head_init(&packets);
...
skb_list_walk_safe(skb, skb, next) {
skb_mark_not_on_list(skb);

skb = skb_share_check(skb, GFP_ATOMIC);
if (unlikely(!skb))
continue;
...
__skb_queue_tail(&packets, skb);
}

We're in a netdev's xmit function, so nothing else should have skb at
that point. Given the warning is about "member access", I assume it's
the next->prev dereference here:

static inline void __skb_queue_before(struct sk_buff_head *list,
struct sk_buff *next,
struct sk_buff *newsk)
{
__skb_insert(newsk, next->prev, next, list);
}

So where is "next" coming from that UBSAN would complain about
object-size-mismatch?

static inline void __skb_queue_tail(struct sk_buff_head *list,
struct sk_buff *newsk)
{
__skb_queue_before(list, (struct sk_buff *)list, newsk);
}

It comes from casting "list" into an sk_buff. While this might be some
CFI-violating polymorphism, I can't see why this cast would actually
be a problem in practice. The top of sk_buff is intentionally the same
as sk_buff_head:

struct sk_buff_head {
struct sk_buff *next;
struct sk_buff *prev;
...
struct sk_buff {
union {
struct {
struct sk_buff *next;
struct sk_buff *prev;
...

I'd suspect, "oh maybe it's just a clang 11 bug", but syzbot says it
can't reproduce. So that makes me a little more nervous.

Does anybody see something I've missed?

Jason

Dmitry Vyukov

unread,
Dec 21, 2020, 4:14:48 AM12/21/20
to Jason A. Donenfeld, Kees Cook, Netdev, syzkaller-bugs, WireGuard mailing list, Eric Dumazet
+Kees for UBSAN report questions

Hi Jason,

Thanks for looking into this.

Reading clang docs for ubsan:

https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html
-fsanitize=object-size: An attempt to potentially use bytes which the
optimizer can determine are not part of the object being accessed.
This will also detect some types of undefined behavior that may not
directly access memory, but are provably incorrect given the size of
the objects involved, such as invalid downcasts and calling methods on
invalid pointers. These checks are made in terms of
__builtin_object_size, and consequently may be able to detect more
problems at higher optimization levels.

From skimming though your description this seems to fall into
"provably incorrect given the size of the objects involved".
I guess it's one of these cases which trigger undefined behavior and
compiler can e.g. remove all of this code assuming it will be never
called at runtime and any branches leading to it will always branch in
other directions, or something.

Jason A. Donenfeld

unread,
Dec 21, 2020, 6:23:18 AM12/21/20
to Dmitry Vyukov, Kees Cook, Netdev, syzkaller-bugs, WireGuard mailing list, Eric Dumazet
Hi Dmitry,

On Mon, Dec 21, 2020 at 10:14 AM Dmitry Vyukov <dvy...@google.com> wrote:
> Hi Jason,
>
> Thanks for looking into this.
>
> Reading clang docs for ubsan:
>
> https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html
> -fsanitize=object-size: An attempt to potentially use bytes which the
> optimizer can determine are not part of the object being accessed.
> This will also detect some types of undefined behavior that may not
> directly access memory, but are provably incorrect given the size of
> the objects involved, such as invalid downcasts and calling methods on
> invalid pointers. These checks are made in terms of
> __builtin_object_size, and consequently may be able to detect more
> problems at higher optimization levels.
>
> From skimming though your description this seems to fall into
> "provably incorrect given the size of the objects involved".
> I guess it's one of these cases which trigger undefined behavior and
> compiler can e.g. remove all of this code assuming it will be never
> called at runtime and any branches leading to it will always branch in
> other directions, or something.

Right that sort of makes sense, and I can imagine that in more general
cases the struct casting could lead to UB. But what has me scratching
my head is that syzbot couldn't reproduce. The cast happens every
time. What about that one time was special? Did the address happen to
fall on the border of a mapping? Is UBSAN non-deterministic as an
optimization? Or is there actually some mysterious UaF happening with
my usage of skbs that I shouldn't overlook?

Jason

Dmitry Vyukov

unread,
Jan 7, 2021, 7:22:40 AM1/7/21
to Jason A. Donenfeld, Kees Cook, Netdev, syzkaller-bugs, WireGuard mailing list, Eric Dumazet
These UBSAN checks were just enabled recently.
It's indeed super easy to trigger: 133083 VMs were crashed on this already:
https://syzkaller.appspot.com/bug?extid=8f90d005ab2d22342b6d
So it's one of the top crashers by now.

Jeffrey Walton

unread,
Jan 7, 2021, 7:53:50 AM1/7/21
to Jason A. Donenfeld, syzkaller-bugs, WireGuard mailing list, Eric Dumazet
The object size checker depends upon compiler analysis. If the
compiler can determine the destination buffer size, then the compiler
can insert a call to a safer function, like a safer memcpy.

If the compiler cannot determine the destination buffer size, then the
compiler will not insert a call to a safer function. (And Wireguard
won't see the crash).

Note... The object size checker and use of safer functions when the
compiler can determine destination buffer sizes is quasi-automatic use
of the safer memory functions from TR 24731-1. They are the ones the
Glibc folks refuse to provide to developers.

Jeff

Julian Wiedmann

unread,
Jan 7, 2021, 12:01:43 PM1/7/21
to Jason A. Donenfeld, Dmitry Vyukov, Kees Cook, Netdev, syzkaller-bugs, WireGuard mailing list, Eric Dumazet
On 21.12.20 12:23, Jason A. Donenfeld wrote:
> Hi Dmitry,
>

...

> fall on the border of a mapping? Is UBSAN non-deterministic as an
> optimization? Or is there actually some mysterious UaF happening with
> my usage of skbs that I shouldn't overlook?
>

One oddity is that wg_xmit() returns negative errnos, rather than a
netdev_tx_t (ie. NETDEV_TX_OK or NETDEV_TX_BUSY).

Any chance that the stack mis-interprets one of those custom errnos
as NETDEV_TX_BUSY, and thus believes that it still owns the skb?

> Jason
>

Jason A. Donenfeld

unread,
Jan 7, 2021, 1:58:51 PM1/7/21
to Julian Wiedmann, Dmitry Vyukov, Kees Cook, Netdev, syzkaller-bugs, WireGuard mailing list, Eric Dumazet
The stack trace shows the splat happening as a result of
__skb_queue_tail, called from wg_xmit, not something that happens
after wg_xmit returns.

Jason A. Donenfeld

unread,
Jan 7, 2021, 2:00:29 PM1/7/21
to Dmitry Vyukov, Kees Cook, Netdev, syzkaller-bugs, WireGuard mailing list, Eric Dumazet
Ahh, makes sense. So it is easily reproducible after all.

You're still of the opinion that it's a false positive, right? I
shouldn't spend more cycles on this?


Jason

Jeffrey Walton

unread,
Jan 7, 2021, 2:06:58 PM1/7/21
to Jason A. Donenfeld, Netdev, syzkaller-bugs, WireGuard mailing list
On Thu, Jan 7, 2021 at 2:03 PM Jason A. Donenfeld <Ja...@zx2c4.com> wrote:
>
> On Thu, Jan 7, 2021 at 1:22 PM Dmitry Vyukov <dvy...@google.com> wrote:
> >
> > On Mon, Dec 21, 2020 at 12:23 PM Jason A. Donenfeld <Ja...@zx2c4.com> wrote:
> > >
> > > ...
> >
> > These UBSAN checks were just enabled recently.
> > It's indeed super easy to trigger: 133083 VMs were crashed on this already:
> > https://syzkaller.appspot.com/bug?extid=8f90d005ab2d22342b6d
> > So it's one of the top crashers by now.
>
> Ahh, makes sense. So it is easily reproducible after all.
>
> You're still of the opinion that it's a false positive, right? I
> shouldn't spend more cycles on this?

You might consider making a test build with -fno-lto in case LTO is
mucking things up.

Google Posts Patches So The Linux Kernel Can Be LTO-Optimized By
Clang, https://www.phoronix.com/scan.php?page=news_item&px=Linux-Kernel-Clang-LTO-Patches

Jeff

Corey Costello

unread,
Jan 7, 2021, 7:34:33 PM1/7/21
to nolo...@gmail.com, Jason A. Donenfeld, Netdev, syzkaller-bugs, WireGuard mailing list
Get me off this fucking list ffs.



> On Jan 7, 2021, at 2:06 PM, Jeffrey Walton <nolo...@gmail.com> wrote:
>
> On Thu, Jan 7, 2021 at 2:03 PM Jason A. Donenfeld <Ja...@zx2c4.com> wrote:
>>
>> On Thu, Jan 7, 2021 at 1:22 PM Dmitry Vyukov <dvy...@google.com> wrote:
>>>
>>> On Mon, Dec 21, 2020 at 12:23 PM Jason A. Donenfeld <Ja...@zx2c4.com> wrote:
>>>>
>>>> ...
>>>
>>> These UBSAN checks were just enabled recently.
>>> It's indeed super easy to trigger: 133083 VMs were crashed on this already:
>>> https://linkprotect.cudasvc.com/url?a=https%3a%2f%2fsyzkaller.appspot.com%2fbug%3fextid%3d8f90d005ab2d22342b6d&c=E,1,RVpgZsRUCGs2jKlumiMAMnpeOF4QdiW5h8GDIsBJPz-orFNwvwCXnceC9n5Bhr1h-G2EsU0tlC7N4QUpHuF6tIMI7tTnBoRjAo5tT-Bk9-Fhe8CppuOL4mqdkA,,&typo=1
>>> So it's one of the top crashers by now.
>>
>> Ahh, makes sense. So it is easily reproducible after all.
>>
>> You're still of the opinion that it's a false positive, right? I
>> shouldn't spend more cycles on this?
>
> You might consider making a test build with -fno-lto in case LTO is
> mucking things up.
>
> Google Posts Patches So The Linux Kernel Can Be LTO-Optimized By
> Clang, https://linkprotect.cudasvc.com/url?a=https%3a%2f%2fwww.phoronix.com%2fscan.php%3fpage%3dnews_item%26px%3dLinux-Kernel-Clang-LTO-Patches&c=E,1,7u3-jWadklYo8ai_XrPNvjnu46LLAyg0hqsGIaMPaoQ5UxtcNM84jrHUgSg4VciXKk9XVpwgyBwD85LbbW5_j195jSH6RrAej45I1kr_XfQ,&typo=1
>
> Jeff

Dmitry Vyukov

unread,
Jan 8, 2021, 4:31:12 AM1/8/21
to Jason A. Donenfeld, Kees Cook, Netdev, syzkaller-bugs, WireGuard mailing list, Eric Dumazet
No, I am not saying this is a false positive. I think it's an
undefined behavior.
Either way, we need to resolve this one way or another to unblock
testing the rest of the kernel, if not with a fix to wg, then with a
fix to ubsan, or disable this check for kernel if kernel community
decides we want to use and keep this type of C undefined behavior in
the code base intentionally.
So far I see only 2 "UBSAN: object-size-mismatch" reports on the dashboard:
https://syzkaller.appspot.com/upstream
So cleaning them up looks doable. Is there a way to restructure the
code to not invoke undefined behavior?
Kees, do you have any suggestions on how to proceed? This seems to
stop testing of the whole kernel at the moment.

Dmitry Vyukov

unread,
Jan 8, 2021, 4:33:31 AM1/8/21
to nolo...@gmail.com, Jason A. Donenfeld, Netdev, syzkaller-bugs, WireGuard mailing list
Hi Jeff,

Are these patches upstream now?
syzbot doesn't enable LTO intentionally, nor I see CONFIG_LTO in the
provided config.

Kees Cook

unread,
Jan 8, 2021, 3:26:25 PM1/8/21
to Dmitry Vyukov, Jason A. Donenfeld, Netdev, syzkaller-bugs, WireGuard mailing list, Eric Dumazet
Right; that's my question as well.
 
Kees, do you have any suggestions on how to proceed? This seems to
stop testing of the whole kernel at the moment.

If it's blocking other stuff and there isn't a path to fixing it soon, then I think we'll need to disable this check (and open an issue to track it).

--
Kees Cook

Nathan Chancellor

unread,
Jan 8, 2021, 3:54:32 PM1/8/21
to Dmitry Vyukov, nolo...@gmail.com, Jason A. Donenfeld, Netdev, syzkaller-bugs, WireGuard mailing list
Those patches are not upstream yet and LTO will have to be explicitly
enabled via config.

Cheers,
Nathan

Dmitry Vyukov

unread,
Jan 9, 2021, 4:46:15 AM1/9/21
to Kees Cook, Jason A. Donenfeld, Netdev, syzkaller-bugs, WireGuard mailing list, Eric Dumazet
Oh, I see, the code is actually in skbuff.h:

static inline void __skb_queue_tail(struct sk_buff_head *list, struct
sk_buff *newsk)
{
__skb_queue_before(list, (struct sk_buff *)list, newsk);
}

It casts sk_buff_head to sk_buff relying on equal layout of some
prefix of these structs.
Is it really UB in C? UBSAN docs say:
"An attempt to potentially use bytes which the optimizer can determine
are not part of the object being accessed".
But C has POD layout for structs, right? These next/prev fields are
within sk_buff_head (otherwise things would explode).
I can imagine this may be not valid in C++, can this UBSAN check be
C++-specific? Or at least some subset of this check, I can imagine it
can detect bad bugs in C as well where things go really wrong.

If there is no quick solution proposed, I tend to disable this check
in syzbot for now. We need to clean at least common things like
sk_buff first.

Dmitry Vyukov

unread,
Jan 11, 2021, 12:17:29 PM1/11/21
to Kees Cook, Jason A. Donenfeld, Netdev, syzkaller-bugs, WireGuard mailing list, Eric Dumazet
FTR, I've disabled the following UBSAN configs:
UBSAN_MISC
UBSAN_DIV_ZERO
UBSAN_BOOL
UBSAN_OBJECT_SIZE
UBSAN_SIGNED_OVERFLOW
UBSAN_UNSIGNED_OVERFLOW
UBSAN_ENUM
UBSAN_ALIGNMENT
UBSAN_UNREACHABLE

Only these are enabled now:
UBSAN_BOUNDS
UBSAN_SHIFT

This is commit:
https://github.com/google/syzkaller/commit/2c1f2513486f21d26b1942ce77ffc782677fbf4e

Jeffrey Walton

unread,
Jan 11, 2021, 12:35:47 PM1/11/21
to Dmitry Vyukov, Netdev, syzkaller-bugs, WireGuard mailing list
On Mon, Jan 11, 2021 at 12:20 PM Dmitry Vyukov <dvy...@google.com> wrote:
> ...
> FTR, I've disabled the following UBSAN configs:
> UBSAN_MISC
> UBSAN_DIV_ZERO
> UBSAN_BOOL
> UBSAN_OBJECT_SIZE
> UBSAN_SIGNED_OVERFLOW
> UBSAN_UNSIGNED_OVERFLOW
> UBSAN_ENUM
> UBSAN_ALIGNMENT
> UBSAN_UNREACHABLE
>
> Only these are enabled now:
> UBSAN_BOUNDS
> UBSAN_SHIFT
>
> This is commit:
> https://github.com/google/syzkaller/commit/2c1f2513486f21d26b1942ce77ffc782677fbf4e

I think the commit cut too deep.

The overflows are important if folks are building with compilers other than GCC.

The aligned data accesses are important on platforms like MIPS64 and Sparc64.

Object size is important because it catches destination buffer overflows.

I don't know what's in miscellaneous. There may be something useful in there.

Jeff

Dmitry Vyukov

unread,
Jan 11, 2021, 12:58:47 PM1/11/21
to nolo...@gmail.com, Netdev, syzkaller-bugs, WireGuard mailing list
Hi Jeff,

See the commit for reasons why each of these is disabled.
E.g. object size, somebody first needs to fix bugs like this one.
While things like skbuff have these UBs on trivial workloads, there is
no point in involving fuzzing and making it crash on this trivial bug
all the time and stopping doing any other kernel testing as the
result.

Jeffrey Walton

unread,
Jan 11, 2021, 1:15:07 PM1/11/21
to Dmitry Vyukov, Netdev, syzkaller-bugs, WireGuard mailing list
Going off-topic a bit, what would you suggest for UBSAN_OBJECT_SIZE?

It seems to me object size checking is being conflated with object
type. It seems to me they need to be split: UBSAN_OBJECT_SIZE for
actual object sizes, and UBSAN_OBJECT_TYPE for the casts.

I still have a bitter taste in my mouth from
https://www.cvedetails.com/bugtraq-bid/57602/libupnp-Multiple-Buffer-Overflow-Vulnerabilities.html.
I hate to see buffer checks go away. (And I realize the kernel folks
are more skilled than the guy who wrote libupnp).

Jeff

Dmitry Vyukov

unread,
Jan 12, 2021, 4:54:33 AM1/12/21
to nolo...@gmail.com, Netdev, syzkaller-bugs, WireGuard mailing list
I've filed https://bugs.llvm.org/show_bug.cgi?id=48726 for this. Does
it capture what you are asking? Let's move the discussion re ubsan
there.

However, in the first place I am suggesting fixing the code. E.g. for
sk_buff I would assume it's relatively easily fixable. A
formally legal fix I think should put sk_buff_head into sk_buff and
use it, no downsides and will eliminate the confusing "should go
first" comments.
Or as an workaround maybe we could make __skb_queue_before accept
sk_buff_head and cast the other way around.

syzbot

unread,
Apr 13, 2021, 7:53:12 PM4/13/21
to syzkall...@googlegroups.com
Auto-closing this bug as obsolete.
Crashes did not happen for a while, no reproducer and no activity.
Reply all
Reply to author
Forward
0 new messages