net/rds: use-after-free in inet_create

Dmitry Vyukov

Feb 28, 2017, 9:22:57 AM
to santosh....@oracle.com, David Miller, netdev, linux...@vger.kernel.org, rds-...@oss.oracle.com, LKML, Eric Dumazet, syzkaller
Hello,

I've got the following report while running syzkaller fuzzer on
linux-next/8d01c069486aca75b8f6018a759215b0ed0c91f0. So far it
happened only once. net was somehow deleted from underneath
inet_create. I've noticed that rds uses sock_create_kern, which does
not take a net reference. What is it that must keep net alive then?
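
The concern above can be illustrated with a toy reference-count model. This is a minimal sketch in Python, not kernel code: the `Net` and `Sock` classes and the `namespace_survives_exit` helper are hypothetical names, and the rule that only non-kernel sockets take a namespace reference is an assumption based on the statement that sock_create_kern does not take a net reference.

```python
class Net:
    """Toy stand-in for a network namespace with a reference count."""

    def __init__(self):
        self.refcount = 1      # reference held by the process that owns the namespace
        self.freed = False

    def get(self):
        self.refcount += 1

    def put(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True  # models the namespace being freed


class Sock:
    """Toy socket: takes a namespace reference only when it is not a kernel socket."""

    def __init__(self, net, kern):
        self.net = net
        self.holds_net_ref = not kern  # assumption: kernel sockets skip the reference
        if self.holds_net_ref:
            net.get()


def namespace_survives_exit(kern):
    """Return True if the namespace is still alive after its last process exits."""
    net = Net()
    sock = Sock(net, kern)
    net.put()                  # the last process in the namespace goes away
    return not sock.net.freed


print(namespace_survives_exit(kern=False))  # True: the socket pins the namespace
print(namespace_survives_exit(kern=True))   # False: the socket now points at freed memory
```

In this model a kernel socket must rely on something else (for example, explicit teardown before the namespace is released) to avoid touching a freed namespace.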

==================================================================
BUG: KASAN: use-after-free in inet_create+0xdf5/0xf60
net/ipv4/af_inet.c:337 at addr ffff880150898704
Read of size 4 by task kworker/u4:6/3522
CPU: 0 PID: 3522 Comm: kworker/u4:6 Not tainted 4.10.0-next-20170228+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
Workqueue: krdsd rds_connect_worker
Call Trace:
__asan_report_load4_noabort+0x29/0x30 mm/kasan/report.c:331
inet_create+0xdf5/0xf60 net/ipv4/af_inet.c:337
__sock_create+0x4e4/0x870 net/socket.c:1197
sock_create_kern+0x3f/0x50 net/socket.c:1243
rds_tcp_conn_path_connect+0x29b/0x9d0 net/rds/tcp_connect.c:108
rds_connect_worker+0x158/0x1e0 net/rds/threads.c:164
process_one_work+0xbd0/0x1c10 kernel/workqueue.c:2096
worker_thread+0x223/0x1990 kernel/workqueue.c:2230
kthread+0x326/0x3f0 kernel/kthread.c:227
ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
Object at ffff880150898200, in cache net_namespace size: 6784
Allocated:
PID = 3243
kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:546
kmem_cache_alloc+0x102/0x680 mm/slab.c:3568
kmem_cache_zalloc include/linux/slab.h:653 [inline]
net_alloc net/core/net_namespace.c:339 [inline]
copy_net_ns+0x196/0x530 net/core/net_namespace.c:379
create_new_namespaces+0x409/0x860 kernel/nsproxy.c:106
copy_namespaces+0x34d/0x420 kernel/nsproxy.c:164
copy_process.part.42+0x223b/0x4d50 kernel/fork.c:1675
copy_process kernel/fork.c:1497 [inline]
_do_fork+0x200/0xff0 kernel/fork.c:1960
SYSC_clone kernel/fork.c:2070 [inline]
SyS_clone+0x37/0x50 kernel/fork.c:2064
do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:280
return_from_SYSCALL_64+0x0/0x7a
Freed:
PID = 3544
__cache_free mm/slab.c:3510 [inline]
kmem_cache_free+0x71/0x240 mm/slab.c:3770
net_free+0xd7/0x110 net/core/net_namespace.c:355
net_drop_ns+0x31/0x40 net/core/net_namespace.c:362
cleanup_net+0x7f4/0xa90 net/core/net_namespace.c:479
process_one_work+0xbd0/0x1c10 kernel/workqueue.c:2096
worker_thread+0x223/0x1990 kernel/workqueue.c:2230
kthread+0x326/0x3f0 kernel/kthread.c:227
ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
Memory state around the buggy address:
 ffff880150898600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff880150898680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff880150898700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                   ^
 ffff880150898780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff880150898800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
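
As a reading aid for the report above, the faulting address can be located relative to the freed object with simple arithmetic. This is a sketch only: the helper name `describe_access` is hypothetical, and it assumes generic KASAN's layout of one shadow byte per 8 bytes of memory (with `fb` understood to mark freed slab memory).

```python
KASAN_SHADOW_SCALE = 8  # assumption: one shadow byte describes 8 bytes of memory


def describe_access(fault_addr, object_addr, object_size, row_addr):
    """Locate a faulting address relative to its object and its shadow dump row."""
    offset = fault_addr - object_addr
    return {
        "offset": offset,
        "inside_object": 0 <= offset < object_size,
        # index of the shadow byte within the row marked with '>'
        "shadow_index": (fault_addr - row_addr) // KASAN_SHADOW_SCALE,
    }


# Values copied from the report: "at addr", "Object at ... size", and the '>' row.
info = describe_access(
    fault_addr=0xFFFF880150898704,
    object_addr=0xFFFF880150898200,
    object_size=6784,
    row_addr=0xFFFF880150898700,
)
print(info)  # {'offset': 1284, 'inside_object': True, 'shadow_index': 0}
```

The read lands 1284 bytes into the 6784-byte net_namespace object, which is consistent with inet_create dereferencing a field of an already freed struct net.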

Sowmini Varadhan

Feb 28, 2017, 10:37:47 AM
to Dmitry Vyukov, santosh....@oracle.com, David Miller, netdev, linux...@vger.kernel.org, rds-...@oss.oracle.com, LKML, Eric Dumazet, syzkaller
On (02/28/17 15:22), Dmitry Vyukov wrote:
>
> Hello,
>
> I've got the following report while running syzkaller fuzzer on
> linux-next/8d01c069486aca75b8f6018a759215b0ed0c91f0. So far it
> happened only once. net was somehow deleted from underneath
> inet_create. I've noticed that rds uses sock_create_kern, which does
> not take a net reference. What is it that must keep net alive then?

The rds_connection (which is where the net pointer is being obtained from)
should keep the connection alive. Did you have the rds[_tcp] modules
loaded at the time of failure? Were there kernel tcp sockets to/from
the 16385 port? any hints on what else the test was doing (was it
running a userspace RDS application that triggered the kernel TCP
connection attempt in the first place)?

--Sowmini

Dmitry Vyukov

Feb 28, 2017, 10:50:06 AM
to Sowmini Varadhan, santosh....@oracle.com, David Miller, netdev, linux...@vger.kernel.org, rds-...@oss.oracle.com, LKML, Eric Dumazet, syzkaller
Here is syzkaller log before the crash:
https://gist.githubusercontent.com/dvyukov/8bb6a4c6543597c9598d5771258889fe/raw/08bd950bb69071a260046b0bcc5ab85701aea8e7/gistfile1.txt
Separate tests are delimited by "executing program" lines. If a crash
happens within a user process context, it's possible to figure out
exactly which program triggered the bug. But this happened in a kernel
thread context, so I have no clues so far.

Grepping "socket" there, it was doing lots of things with sockets. Are
we looking for some particular socket type? If there are few programs
that create sockets of that type, then we can narrow down the set:

r1 = socket(0x11, 0x5, 0xa)
socket(0x4, 0xffffffffffffffff, 0x0)
socketpair(0x7, 0x805, 0x6,
&(0x7f0000fd0000-0x8)={<r0=>0xffffffffffffffff, 0xffffffffffffffff})
socketpair(0x2, 0x80a, 0x8001,
&(0x7f0000fd1000-0x8)={0xffffffffffffffff, <r1=>0xffffffffffffffff})
socket$alg(0x26, 0x5, 0x0)
socket$sctp6(0xa, 0x8000000001, 0x84)
r10 = socket(0x10, 0x802, 0x0)
socketpair(0x10, 0x0, 0x3,
&(0x7f0000e54000)={<r16=>0xffffffffffffffff, 0xffffffffffffffff})
socket(0x2002, 0x1, 0x7f)
r8 = socket$sctp6(0xa, 0x1, 0x84)
socket(0x0, 0xa, 0x0)
socket(0x0, 0x0, 0x1)
socketpair$unix(0x1, 0x1, 0x0,
&(0x7f0000995000-0x8)={<r14=>0xffffffffffffffff,
<r15=>0xffffffffffffffff})
r1 = socket(0x2, 0x2, 0x0)
r5 = socket$alg(0x26, 0x5, 0x0)
r6 = socket$kcm(0x29, 0x2, 0x0)
r7 = socket$netlink(0x10, 0x3, 0x0)
r10 = socket(0x10, 0x3, 0x0)
r1 = socket(0x4, 0xffffffffffffffff, 0x0)
r2 = socket(0xa, 0x6, 0x0)
r6 = socket(0x2, 0x5, 0x0)
r11 = socket(0xa, 0x2, 0x0)
r12 = socket(0xa, 0x2, 0x0)
socket(0x1, 0x80007, 0xfffffffffffffffd)
socketpair$sctp(0x2, 0x1, 0x84,
&(0x7f0000000000)={<r14=>0xffffffffffffffff,
<r15=>0xffffffffffffffff})
r16 = socket$bt_hci(0x1f, 0x3, 0x1)
r18 = socket(0x10000000a, 0x80001, 0x0)
socket$sctp6(0xa, 0x1, 0x84)
socket$alg(0x26, 0x5, 0x0)
socketpair$unix(0x1, 0x4000000000000003, 0x0,
&(0x7f0000fc1000-0x8)={0xffffffffffffffff, 0xffffffffffffffff})
socketpair$unix(0x1, 0x4000000000001, 0x0,
&(0x7f0000194000)={<r22=>0xffffffffffffffff,
<r23=>0xffffffffffffffff})
socket$bt_bnep(0x1f, 0x3, 0x4)
r0 = socket(0x10, 0x7, 0x8)
r2 = socket$alg(0x26, 0x5, 0x0)
r1 = socket$tcp(0x2, 0x1, 0x0)
r1 = socket(0x0, 0x2, 0x0)
r2 = socket$alg(0x26, 0x5, 0x0)
r4 = socket(0xa, 0x0, 0x40)
r8 = socket$bt_sco(0x1f, 0x5, 0x2)
socketpair$unix(0x1, 0x0, 0x0,
&(0x7f0000024000-0x8)={<r11=>0xffffffffffffffff, 0xffffffffffffffff})
socket$nfc_raw(0x27, 0x3, 0x0)
r15 = socket(0xb, 0x6, 0x0)
socketpair$unix(0x1, 0x5, 0x0,
&(0x7f000002f000-0x8)={0xffffffffffffffff, 0xffffffffffffffff})
r16 = socket(0x10, 0x802, 0x800000010)
socket$sctp6(0xa, 0x1, 0x84)
socket$alg(0x26, 0x5, 0x0)
r3 = socket(0xa, 0x1, 0x0)
r13 = socket(0x10, 0x802, 0x0)
r0 = socket$netlink(0x10, 0x3, 0x10)
socketpair(0x1, 0x80f, 0x7,
&(0x7f0000b67000)={<r0=>0xffffffffffffffff, 0xffffffffffffffff})
r2 = socket$alg(0x26, 0x5, 0x0)
socket$bt_hidp(0x1f, 0x3, 0x6)
socket$bt_bnep(0x1f, 0x3, 0x4)
socket$sctp(0x2, 0x1, 0x84)
r2 = socket(0x2, 0x3, 0x6)
r4 = socket(0x11, 0x802, 0x300)
r0 = socket$kcm(0x29, 0x5, 0x0)
r3 = socket$alg(0x26, 0x5, 0x0)
socketpair$unix(0x1, 0x5, 0x0,
&(0x7f0000510000)={<r8=>0xffffffffffffffff, <r9=>0xffffffffffffffff})
r1 = socket$alg(0x26, 0x5, 0x0)
r0 = socket$bt_cmtp(0x1f, 0x3, 0x5)
socket$unix(0x1, 0x80000000000200, 0x0)
socketpair$unix(0x1, 0x5, 0x0,
&(0x7f0000b30000)={<r6=>0xffffffffffffffff, <r7=>0xffffffffffffffff})
r0 = socket(0xa, 0x1, 0x0)
r7 = socket(0xa, 0x2, 0x41)
r5 = socket(0xa, 0x2, 0x88)
r4 = socket(0xa, 0x2, 0x88)
r0 = socket$icmp6_raw(0xa, 0x3, 0x3a)
r1 = socket(0xa, 0x5, 0x0)
socket$icmp6(0xa, 0x2, 0x3a)
socket$icmp6_raw(0xa, 0x3, 0x3a)

Sowmini Varadhan

Feb 28, 2017, 11:15:54 AM
to Dmitry Vyukov, santosh....@oracle.com, David Miller, netdev, linux...@vger.kernel.org, rds-...@oss.oracle.com, LKML, Eric Dumazet, syzkaller
On (02/28/17 16:49), Dmitry Vyukov wrote:
>
> Grepping "socket" there, it was doing lots of things with sockets. Are
> we looking for some particular socket type? If there are few programs
> that create sockets of that type, then we can narrow down the set:

Yes, we are looking for PF_RDS/AF_RDS - this should be
#define AF_RDS 21 /* RDS sockets */

I see PF_KCM there (value 41) but no instances of 0x15.. how did
the rds_connect_worker thread get kicked off at all?
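
The search for a particular socket family in the fuzzer log can be automated. The following is a minimal sketch, assuming the log lines look like the `socket(...)` and `socketpair(...)` calls listed earlier in the thread; the function name `socket_calls_with_domain` and the regular expression are illustrative, and masking the value to 32 bits assumes that only the low bits of the fuzzer's 64-bit argument reach the kernel's `int` family parameter.

```python
import re

# Matches e.g. "r1 = socket(0x11, 0x5, 0xa)" or "socketpair$unix(0x1, 0x5, 0x0,".
SOCKET_CALL = re.compile(r"\bsocket(?:pair)?(?:\$\w+)?\((0x[0-9a-fA-F]+),")

AF_RDS = 21  # 0x15, per the #define quoted above


def socket_calls_with_domain(log_lines, domain):
    """Return the socket/socketpair lines whose first argument equals `domain`."""
    hits = []
    for line in log_lines:
        match = SOCKET_CALL.search(line)
        # Assumption: the family is an int, so only the low 32 bits matter.
        if match and (int(match.group(1), 16) & 0xFFFFFFFF) == domain:
            hits.append(line.strip())
    return hits


sample = [
    "r1 = socket(0x11, 0x5, 0xa)",
    "r6 = socket$kcm(0x29, 0x2, 0x0)",
    "socketpair$unix(0x1, 0x5, 0x0,",
]
print(socket_calls_with_domain(sample, AF_RDS))  # []
print(socket_calls_with_domain(sample, 0x29))    # ['r6 = socket$kcm(0x29, 0x2, 0x0)']
```

Running the same filter over the full log with `AF_RDS` would show directly whether any program in the captured window created an RDS socket.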

the way this is supposed to work is
1. someone modprobes rds-tcp
2. app tries to do rds_sendmsg to some ip address in a netns - this triggers the
creation of an rds_connection, and subsequent kernel socket TCP connection
threads (i.e., rds_connect_worker) for that netns
3. if you unload rds-tcp, the module_unload should do all the cleanup
needed via rds_tcp_conn_paths_destroy. This is done
It's not clear to me that the test is doing any of this...
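
For reference, step 2 from userspace might look roughly like the sketch below. It is an illustration under stated assumptions, not the test that syzkaller ran: the addresses and port numbers are hypothetical, the helper name `try_rds_send` is made up, and whether the send actually starts a kernel TCP connection attempt depends on the kernel configuration and on which RDS transport serves the destination address. On systems without RDS support the function simply reports that.

```python
import socket

# Python exposes AF_RDS on Linux; fall back to 21, the value quoted above.
AF_RDS = getattr(socket, "AF_RDS", 21)


def try_rds_send(local_ip="127.0.0.1", local_port=4000,
                 peer_ip="127.0.0.1", peer_port=4001):
    """Attempt one RDS datagram send; return a short status string."""
    try:
        with socket.socket(AF_RDS, socket.SOCK_SEQPACKET, 0) as sock:
            sock.settimeout(1.0)
            # An RDS socket is bound to a local address before sending.
            sock.bind((local_ip, local_port))
            sock.sendto(b"ping", (peer_ip, peer_port))
            return "sent"
    except OSError as exc:
        # Raised when RDS is not available, not permitted, or the send fails.
        return f"unavailable: {exc}"


print(try_rds_send())
```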

is this reproducible? let me check if there is some race window where
we can restart a connection attempt when rds_tcp_kill_sock assumes
that the connect worker has been quiesced..

--Sowmini

Dmitry Vyukov

Feb 28, 2017, 11:33:19 AM
to Sowmini Varadhan, santosh....@oracle.com, David Miller, netdev, linux...@vger.kernel.org, rds-...@oss.oracle.com, LKML, Eric Dumazet, syzkaller
Not reproducible so far.

rds is compiled into kernel (no modules):
CONFIG_RDS=y
CONFIG_RDS_TCP=y

Also, the fuzzer actively creates and destroys namespaces.

Yes, I don't see socket(0x15) in the log. Probably it was truncated.

Sowmini Varadhan

Feb 28, 2017, 11:38:45 AM
to Dmitry Vyukov, santosh....@oracle.com, David Miller, netdev, linux...@vger.kernel.org, rds-...@oss.oracle.com, LKML, Eric Dumazet, syzkaller
On (02/28/17 17:32), Dmitry Vyukov wrote:
> Not reproducible so far.
>
> rds is compiled into kernel (no modules):
> CONFIG_RDS=y
> CONFIG_RDS_TCP=y

I see. So if it never gets unloaded, the rds_connections "should"
be around forever.. let me inspect code and see if I spot some
race-window..

> Also fuzzer actively creates and destroys namespaces.
> Yes, I don't see socket(0x15) in the log. Probably it was truncated.

I see. It may be useful if we could get a crash dump to see what
other threads were running (that might give a hint about which threads
were racing). I'll try reproducing this at my end too.

--Sowmini

Dmitry Vyukov

Feb 28, 2017, 11:51:23 AM
to Sowmini Varadhan, santosh....@oracle.com, David Miller, netdev, linux...@vger.kernel.org, rds-...@oss.oracle.com, LKML, Eric Dumazet, syzkaller
Searching other crashes for "net/rds" I found 2 more crashes that may
be related. They suggest that the delayed works are not properly
stopped when the socket is destroyed. That would explain how
rds_connect_worker accesses freed net, right?


BUG: KASAN: use-after-free in memcmp+0xe3/0x160 lib/string.c:768 at
addr ffff88018d49cb20
Read of size 1 by task kworker/u4:4/3546
CPU: 1 PID: 3546 Comm: kworker/u4:4 Not tainted 4.9.0 #7
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
Workqueue: krdsd rds_send_worker
ffff8801ccd46628 ffffffff8234ce1f ffffffff00000001 1ffff100399a8c58
ffffed00399a8c50 0000000041b58ab3 ffffffff84b38258 ffffffff8234cb31
0000000000000000 00000000000010bf 000000008156afb0 ffffffff858c8e58
Call Trace:
[<ffffffff8234ce1f>] __dump_stack lib/dump_stack.c:15 [inline]
[<ffffffff8234ce1f>] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
[<ffffffff819e242c>] kasan_object_err+0x1c/0x70 mm/kasan/report.c:162
[<ffffffff819e26c5>] print_address_description mm/kasan/report.c:200 [inline]
[<ffffffff819e26c5>] kasan_report_error mm/kasan/report.c:289 [inline]
[<ffffffff819e26c5>] kasan_report.part.2+0x1e5/0x4b0 mm/kasan/report.c:311
[<ffffffff819e29b9>] kasan_report mm/kasan/report.c:329 [inline]
[<ffffffff819e29b9>] __asan_report_load1_noabort+0x29/0x30
mm/kasan/report.c:329
[<ffffffff82377e13>] memcmp+0xe3/0x160 lib/string.c:768
[<ffffffff83e8febe>] rhashtable_compare include/linux/rhashtable.h:556 [inline]
[<ffffffff83e8febe>] __rhashtable_lookup
include/linux/rhashtable.h:578 [inline]
[<ffffffff83e8febe>] rhashtable_lookup include/linux/rhashtable.h:610 [inline]
[<ffffffff83e8febe>] rhashtable_lookup_fast
include/linux/rhashtable.h:636 [inline]
[<ffffffff83e8febe>] rds_find_bound+0x4fe/0x8a0 net/rds/bind.c:63
[<ffffffff83e9d03c>] rds_recv_incoming+0x5fc/0x1300 net/rds/recv.c:313
[<ffffffff83eac385>] rds_loop_xmit+0x1c5/0x480 net/rds/loop.c:82
[<ffffffff83ea468a>] rds_send_xmit+0x104a/0x2420 net/rds/send.c:348
[<ffffffff83eab602>] rds_send_worker+0x122/0x2a0 net/rds/threads.c:189
[<ffffffff81492c00>] process_one_work+0xbd0/0x1c10 kernel/workqueue.c:2096
[<ffffffff81493e63>] worker_thread+0x223/0x1990 kernel/workqueue.c:2230
[<ffffffff814abd53>] kthread+0x323/0x3e0 kernel/kthread.c:209
[<ffffffff84377b2a>] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433
Object at ffff88018d49c6c0, in cache RDS size: 1464
Allocated:
PID = 5431
[ 40.943107] [<ffffffff8129c696>] save_stack_trace+0x16/0x20
arch/x86/kernel/stacktrace.c:57
[ 40.950346] [<ffffffff819e16c3>] save_stack+0x43/0xd0 mm/kasan/kasan.c:495
[ 40.957064] [<ffffffff819e194a>] set_track mm/kasan/kasan.c:507 [inline]
[ 40.957064] [<ffffffff819e194a>] kasan_kmalloc+0xaa/0xd0
mm/kasan/kasan.c:598
[ 40.964040] [<ffffffff819e1f42>] kasan_slab_alloc+0x12/0x20
mm/kasan/kasan.c:537
[ 40.971282] [<ffffffff819dd592>] kmem_cache_alloc+0x102/0x680 mm/slab.c:3573
[ 40.978696] [<ffffffff835017e5>] sk_prot_alloc+0x65/0x2a0
net/core/sock.c:1327
[ 40.985766] [<ffffffff8350a20c>] sk_alloc+0x8c/0x460 net/core/sock.c:1389
[ 40.992398] [<ffffffff83e8c90c>] rds_create+0x11c/0x5e0 net/rds/af_rds.c:504
[ 40.999296] [<ffffffff834f9f24>] __sock_create+0x4e4/0x870 net/socket.c:1168
[ 41.006446] [<ffffffff834fa4e9>] sock_create net/socket.c:1208 [inline]
[ 41.006446] [<ffffffff834fa4e9>] SYSC_socket net/socket.c:1238 [inline]
[ 41.006446] [<ffffffff834fa4e9>] SyS_socket+0xf9/0x230 net/socket.c:1218
[ 41.013251] [<ffffffff843778c1>] entry_SYSCALL_64_fastpath+0x1f/0xc2
Freed:
PID = 5431
[ 41.025881] [<ffffffff8129c696>] save_stack_trace+0x16/0x20
arch/x86/kernel/stacktrace.c:57
[ 41.033124] [<ffffffff819e16c3>] save_stack+0x43/0xd0 mm/kasan/kasan.c:495
[ 41.039840] [<ffffffff819e1fbf>] set_track mm/kasan/kasan.c:507 [inline]
[ 41.039840] [<ffffffff819e1fbf>] kasan_slab_free+0x6f/0xb0
mm/kasan/kasan.c:571
[ 41.046992] [<ffffffff819df361>] __cache_free mm/slab.c:3515 [inline]
[ 41.046992] [<ffffffff819df361>] kmem_cache_free+0x71/0x240 mm/slab.c:3775
[ 41.054232] [<ffffffff835054ed>] sk_prot_free net/core/sock.c:1370 [inline]
[ 41.054232] [<ffffffff835054ed>] __sk_destruct+0x47d/0x6a0
net/core/sock.c:1445
[ 41.061383] [<ffffffff8350fa77>] sk_destruct+0x47/0x80 net/core/sock.c:1453
[ 41.068199] [<ffffffff8350fb07>] __sk_free+0x57/0x230 net/core/sock.c:1461
[ 41.074921] [<ffffffff8350fd03>] sk_free+0x23/0x30 net/core/sock.c:1472
[ 41.081398] [<ffffffff83e8c488>] sock_put include/net/sock.h:1591 [inline]
[ 41.081398] [<ffffffff83e8c488>] rds_release+0x358/0x500 net/rds/af_rds.c:89
[ 41.088376] [<ffffffff834f258d>] sock_release+0x8d/0x1e0 net/socket.c:585
[ 41.095358] [<ffffffff834f26f6>] sock_close+0x16/0x20 net/socket.c:1032
[ 41.102083] [<ffffffff81a34772>] __fput+0x332/0x7f0 fs/file_table.c:208
[ 41.108628] [<ffffffff81a34cb5>] ____fput+0x15/0x20 fs/file_table.c:244
[ 41.115184] [<ffffffff814a58ca>] task_work_run+0x18a/0x260
kernel/task_work.c:116
[ 41.122337] [<ffffffff8100793b>] tracehook_notify_resume
include/linux/tracehook.h:191 [inline]
[ 41.122337] [<ffffffff8100793b>] exit_to_usermode_loop+0x23b/0x2a0
arch/x86/entry/common.c:160
[ 41.130193] [<ffffffff81009413>] prepare_exit_to_usermode
arch/x86/entry/common.c:190 [inline]
[ 41.130193] [<ffffffff81009413>]
syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259
[ 41.138220] [<ffffffff84377962>] entry_SYSCALL_64_fastpath+0xc0/0xc2
Memory state around the buggy address:
 ffff88018d49ca00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88018d49ca80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88018d49cb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                               ^
 ffff88018d49cb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88018d49cc00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc
==================================================================


BUG: KASAN: use-after-free in memcmp+0xe3/0x160 lib/string.c:768 at
addr ffff88006a2b84b0
Read of size 1 by task kworker/u8:0/5
CPU: 0 PID: 5 Comm: kworker/u8:0 Not tainted 4.10.0-rc8+ #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: krdsd rds_send_worker
Call Trace:
__dump_stack lib/dump_stack.c:15 [inline]
dump_stack+0x292/0x398 lib/dump_stack.c:51
kasan_object_err+0x1c/0x70 mm/kasan/report.c:162
print_address_description mm/kasan/report.c:200 [inline]
kasan_report_error mm/kasan/report.c:289 [inline]
kasan_report.part.1+0x20e/0x4e0 mm/kasan/report.c:311
kasan_report mm/kasan/report.c:329 [inline]
__asan_report_load1_noabort+0x29/0x30 mm/kasan/report.c:329
memcmp+0xe3/0x160 lib/string.c:768
rhashtable_compare include/linux/rhashtable.h:556 [inline]
__rhashtable_lookup include/linux/rhashtable.h:578 [inline]
rhashtable_lookup include/linux/rhashtable.h:610 [inline]
rhashtable_lookup_fast include/linux/rhashtable.h:636 [inline]
rds_find_bound+0x4fe/0x8a0 net/rds/bind.c:63
rds_recv_incoming+0x5f3/0x12c0 net/rds/recv.c:349
rds_loop_xmit+0x1c5/0x490 net/rds/loop.c:82
rds_send_xmit+0x1170/0x24a0 net/rds/send.c:349
rds_send_worker+0x12b/0x2b0 net/rds/threads.c:188
process_one_work+0xc06/0x1c20 kernel/workqueue.c:2098
worker_thread+0x223/0x19c0 kernel/workqueue.c:2232
hrtimer: interrupt took 2979772 ns
kthread+0x326/0x3f0 kernel/kthread.c:227
ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
Object at ffff88006a2b8040, in cache RDS size: 1480
Allocated:
PID = 5235
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
save_stack+0x43/0xd0 mm/kasan/kasan.c:502
set_track mm/kasan/kasan.c:514 [inline]
kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:605
kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:544
slab_post_alloc_hook mm/slab.h:432 [inline]
slab_alloc_node mm/slub.c:2715 [inline]
slab_alloc mm/slub.c:2723 [inline]
kmem_cache_alloc+0x1af/0x250 mm/slub.c:2728
sk_prot_alloc+0x65/0x2a0 net/core/sock.c:1334
sk_alloc+0x105/0x1010 net/core/sock.c:1396
rds_create+0x11c/0x600 net/rds/af_rds.c:504
__sock_create+0x4f6/0x880 net/socket.c:1199
sock_create net/socket.c:1239 [inline]
SYSC_socket net/socket.c:1269 [inline]
SyS_socket+0xf9/0x230 net/socket.c:1249
entry_SYSCALL_64_fastpath+0x1f/0xc2
Freed:
PID = 5235
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
save_stack+0x43/0xd0 mm/kasan/kasan.c:502
set_track mm/kasan/kasan.c:514 [inline]
kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:578
slab_free_hook mm/slub.c:1355 [inline]
slab_free_freelist_hook mm/slub.c:1377 [inline]
slab_free mm/slub.c:2958 [inline]
kmem_cache_free+0xb2/0x2c0 mm/slub.c:2980
sk_prot_free net/core/sock.c:1377 [inline]
__sk_destruct+0x49c/0x6e0 net/core/sock.c:1452
sk_destruct+0x47/0x80 net/core/sock.c:1460
__sk_free+0x57/0x230 net/core/sock.c:1468
sk_free+0x23/0x30 net/core/sock.c:1479
sock_put include/net/sock.h:1638 [inline]
rds_release+0x3a1/0x4d0 net/rds/af_rds.c:89
sock_release+0x8d/0x1e0 net/socket.c:599
sock_close+0x16/0x20 net/socket.c:1063
__fput+0x332/0x7f0 fs/file_table.c:208
____fput+0x15/0x20 fs/file_table.c:244
task_work_run+0x19b/0x270 kernel/task_work.c:116
tracehook_notify_resume include/linux/tracehook.h:191 [inline]
exit_to_usermode_loop+0x1c2/0x200 arch/x86/entry/common.c:160
prepare_exit_to_usermode arch/x86/entry/common.c:190 [inline]
syscall_return_slowpath+0x3d3/0x420 arch/x86/entry/common.c:259
entry_SYSCALL_64_fastpath+0xc0/0xc2
Memory state around the buggy address:
 ffff88006a2b8380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88006a2b8400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88006a2b8480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                     ^
 ffff88006a2b8500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88006a2b8580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

Sowmini Varadhan

Feb 28, 2017, 12:33:39 PM
to Dmitry Vyukov, santosh....@oracle.com, David Miller, netdev, linux...@vger.kernel.org, rds-...@oss.oracle.com, LKML, Eric Dumazet, syzkaller
On (02/28/17 17:51), Dmitry Vyukov wrote:
> Searching other crashes for "net/rds" I found 2 more crashes that may
> be related. They suggest that the delayed works are not properly
> stopped when the socket is destroyed. That would explain how
> rds_connect_worker accesses freed net, right?

Yes, I think we may want to explicitly cancel this workq, and do
this in rds_conn_destroy().

I'm trying to build/sanity-test (if lucky, reproduce the bug)
as I send this out.. let me get back to you..

If I have a patch against net-next, would you be willing/able to
try it out? given that this does not show up on demand, I'm not
sure how we can check that "the fix worked"..

--Sowmini

Dmitry Vyukov

Feb 28, 2017, 12:46:03 PM
to Sowmini Varadhan, santosh....@oracle.com, David Miller, netdev, linux...@vger.kernel.org, rds-...@oss.oracle.com, LKML, Eric Dumazet, syzkaller
Yes, I can now apply custom patches to the bots. However, it fired
only 3 times, so it will give a weak signal. But at least it will test
that the patch does not cause other bad things.

Sowmini Varadhan

Feb 28, 2017, 12:48:49 PM
to Dmitry Vyukov, santosh....@oracle.com, David Miller, netdev, linux...@vger.kernel.org, rds-...@oss.oracle.com, LKML, Eric Dumazet, syzkaller
On (02/28/17 18:45), Dmitry Vyukov wrote:
>
> Yes, I can now apply custom patches to the bots. However, it fired
> only 3 times, so it will give weak signal. But at least it will test
> that the patch does not cause other bad things.

Ok, let me do my bit of homework on this one and get back to you
(probably by tomorrow)..


Sowmini Varadhan

Feb 28, 2017, 4:06:35 PM
to Dmitry Vyukov, syzkaller, net...@vger.kernel.org
Just posted an RFC patch that I'm also testing here..
hopefully we'll see the pr_info light up, and know that the problematic
situation actually happened (I'll remove the pr_info if/when this
gets submitted as a non-RFC patch).. thanks for helping with testing
this..

--Sowmini

Dmitry Vyukov

Feb 28, 2017, 4:15:01 PM
to Sowmini Varadhan, syzkaller, netdev
But the other 2 use-after-frees happened on cp->cp_send_w. Shouldn't
we cancel it as well? And cp_recv_w?

Sowmini Varadhan

Feb 28, 2017, 4:37:44 PM
to Dmitry Vyukov, syzkaller, netdev
On (03/01/17 00:14), Dmitry Vyukov wrote:
>
> But the other 2 use-after-frees happened on cp->cp_send_w. Shouldn't
> we cancel it as well? And cp_recv_w?

yes, good point, I missed that. let me see if I can refactor the code
to release the netns as the last thing before free..

Sowmini Varadhan

Feb 28, 2017, 5:24:22 PM
to Dmitry Vyukov, santosh....@oracle.com, David Miller, netdev, linux...@vger.kernel.org, rds-...@oss.oracle.com, LKML, Eric Dumazet, syzkaller

Actually, I'm not sure if I can assert that these are all manifestations
of the same bug- was a netns-delete involved in this one as well?

I see:

> BUG: KASAN: use-after-free in memcmp+0xe3/0x160 lib/string.c:768 at
:
> memcmp+0xe3/0x160 lib/string.c:768
:
> rds_find_bound+0x4fe/0x8a0 net/rds/bind.c:63
> rds_recv_incoming+0x5f3/0x12c0 net/rds/recv.c:349
> rds_loop_xmit+0x1c5/0x490 net/rds/loop.c:82
:
This appears to be for a looped back packet, and looks like there
are problems with some rds_sock that got removed from the bind_hash_table..

According to the report, socket was created at
> Allocated:
> PID = 5235
:
> sk_prot_alloc+0x65/0x2a0 net/core/sock.c:1334
> sk_alloc+0x105/0x1010 net/core/sock.c:1396
> rds_create+0x11c/0x600 net/rds/af_rds.c:504

and closed at some point:
> Freed:
> PID = 5235
:
> rds_release+0x3a1/0x4d0 net/rds/af_rds.c:89
> sock_release+0x8d/0x1e0 net/socket.c:599

These are all userspace-created rds sockets, and while there may be an
unrelated bug here, I'm not sure I see the netns/kernel-socket
connection.. can you please clarify if this was also seen in some netns
context?

--Sowmini





Dmitry Vyukov

Mar 1, 2017, 4:47:47 AM
to Sowmini Varadhan, santosh....@oracle.com, David Miller, netdev, linux...@vger.kernel.org, rds-...@oss.oracle.com, LKML, Eric Dumazet, syzkaller
Yes, these test processes run in private net namespaces.