BUG: unable to handle kernel NULL pointer dereference in rds_send_xmit


syzbot

Dec 18, 2017, 3:43:04 AM
to da...@davemloft.net, linux-...@vger.kernel.org, linux...@vger.kernel.org, net...@vger.kernel.org, rds-...@oss.oracle.com, santosh....@oracle.com, syzkall...@googlegroups.com
Hello,

syzkaller hit the following crash on
6084b576dca2e898f5c101baef151f7bfdbb606d
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
compiler: gcc (GCC) 7.1.1 20170620
.config is attached
Raw console output is attached.

Unfortunately, I don't have any reproducer for this bug yet.


BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
program syz-executor6 is using a deprecated SCSI ioctl, please convert it
to SG_IO
IP: rds_send_xmit+0x80/0x930 net/rds/send.c:186
PGD 20e367067 P4D 20e367067 PUD 2118c1067 PMD 0
Oops: 0000 [#1] SMP
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 4514 Comm: kworker/u4:4 Not tainted 4.15.0-rc3-next-20171214+
#67
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Workqueue: krdsd rds_send_worker
RIP: 0010:rds_send_xmit+0x80/0x930 net/rds/send.c:186
RSP: 0018:ffffc90000f6fdc0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88020e3234c0 RCX: ffffffff8241b25c
RDX: 0000000000000000 RSI: ffffffff83080700 RDI: ffff88020e323400
RBP: ffffc90000f6fe28 R08: 0000000000000001 R09: 0000000000000004
R10: ffffc90000f6fde0 R11: 0000000000000004 R12: ffff88020e3234c0
R13: ffff88020e323400 R14: ffff88021780d800 R15: ffff8802150fc600
FS: 0000000000000000(0000) GS:ffff88021fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000028 CR3: 00000002119da000 CR4: 00000000001406f0
DR0: 0000000020000000 DR1: 0000000020001008 DR2: 0000000020001010
DR3: 0000000020000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Call Trace:
rds_send_worker+0x37/0x100 net/rds/threads.c:189
process_one_work+0x288/0x7a0 kernel/workqueue.c:2112
worker_thread+0x43/0x4d0 kernel/workqueue.c:2246
kthread+0x149/0x170 kernel/kthread.c:238
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:524
Code: 00 00 00 48 83 c0 01 48 89 45 a0 49 89 85 a8 00 00 00 41 8b 85 a0 00
00 00 83 f8 03 0f 85 4f 08 00 00 e8 e4 f0 e9 fe 48 8b 45 b8 <48> 8b 40 28
48 8b 58 58 48 85 db 74 0a e8 ce f0 e9 fe 4c 89 ef
RIP: rds_send_xmit+0x80/0x930 net/rds/send.c:186 RSP: ffffc90000f6fdc0
CR2: 0000000000000028
---[ end trace 1bd85784f8eb115b ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
(ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzk...@googlegroups.com.
Please credit me with: Reported-by: syzbot <syzk...@googlegroups.com>

syzbot will keep track of this bug report.
Once a fix for this bug is merged into any tree, reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug
report.
Note: all commands must start from beginning of the line in the email body.
config.txt
raw.log

Sowmini Varadhan

Dec 18, 2017, 8:55:35 AM
to syzbot, net...@vger.kernel.org, rds-...@oss.oracle.com, syzkall...@googlegroups.com
On (12/18/17 00:43), syzbot wrote:
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> program syz-executor6 is using a deprecated SCSI ioctl, please convert it to
> SG_IO
> IP: rds_send_xmit+0x80/0x930 net/rds/send.c:186

conn->c_trans is at offset 0x28.
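
A minimal sketch of why the faulting address equals the member offset. In the Code: line of the report, the trapping bytes <48> 8b 40 28 decode to a load from 0x28(%rax), and RAX is 0 in the register dump, which is consistent with CR2 being 0x28. The struct below is a hypothetical stand-in (the real struct rds_connection layout is not shown in this thread); its padding is chosen only so that c_trans lands at 0x28.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rds_connection. The padding exists
 * only to place c_trans at offset 0x28, as stated in the thread. */
struct demo_conn {
	unsigned char pad[0x28];  /* placeholder for the preceding members */
	void *c_trans;            /* member read by the faulting load */
};

static_assert(offsetof(struct demo_conn, c_trans) == 0x28,
	      "c_trans must sit at offset 0x28 in this sketch");

/* Address that a load of base->c_trans would touch, computed without
 * dereferencing base. With a NULL base the result is just the offset. */
static uintptr_t demo_fault_address(const struct demo_conn *base)
{
	return (uintptr_t)base + offsetof(struct demo_conn, c_trans);
}
```

Calling demo_fault_address(NULL) yields 0x28 on the usual flat-address platforms, matching the address in the BUG line.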

Both this and https://marc.info/?l=linux-netdev&m=151360062922798&w=2
are manifestations of the same bug: somehow the cp_send_w is still
getting queued incorrectly after the conn destroy is initiated (commit
681648e67d fixes one such window, maybe there are others).
Let me look at how this slipped through the cracks.

--Sowmini


Santosh Shilimkar

Dec 18, 2017, 11:28:15 AM
to syzbot, da...@davemloft.net, linux-...@vger.kernel.org, linux...@vger.kernel.org, net...@vger.kernel.org, rds-...@oss.oracle.com, syzkall...@googlegroups.com
On 12/18/2017 12:43 AM, syzbot wrote:
> Hello,
>
> syzkaller hit the following crash on
> 6084b576dca2e898f5c101baef151f7bfdbb606d
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
>
> Unfortunately, I don't have any reproducer for this bug yet.
>
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> program syz-executor6 is using a deprecated SCSI ioctl, please convert
> it to SG_IO
> IP: rds_send_xmit+0x80/0x930 net/rds/send.c:186

Looks like another one tripping on an empty transport. The change below
should most likely address it, but we will test to confirm.

diff --git a/net/rds/send.c b/net/rds/send.c
index 7244d2e..e2d0eaa 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -183,7 +183,7 @@ int rds_send_xmit(struct rds_conn_path *cp)
 		goto out;
 	}
 
-	if (conn->c_trans->xmit_path_prepare)
+	if (conn->c_trans && conn->c_trans->xmit_path_prepare)
 		conn->c_trans->xmit_path_prepare(cp);



David Miller

Dec 18, 2017, 12:12:17 PM
to santosh....@oracle.com, bot+aaf54a8c644d559d34...@syzkaller.appspotmail.com, linux-...@vger.kernel.org, linux...@vger.kernel.org, net...@vger.kernel.org, rds-...@oss.oracle.com, syzkall...@googlegroups.com
From: Santosh Shilimkar <santosh....@oracle.com>
Date: Mon, 18 Dec 2017 08:28:05 -0800
We're seeming to accumulate a lot of checks like this, maybe there
is a more general way to deal with this problem?

Santosh Shilimkar

Dec 18, 2017, 12:16:13 PM
to David Miller, bot+aaf54a8c644d559d34...@syzkaller.appspotmail.com, linux-...@vger.kernel.org, linux...@vger.kernel.org, net...@vger.kernel.org, rds-...@oss.oracle.com, syzkall...@googlegroups.com
Agree. Some of these additional transport hooks were added later
for the specific transports that need them. Will review this overall
and see if it can be addressed generically.
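
One possible reading of "generically" is to route every optional transport hook through a single helper, so the NULL checks live in one place instead of at each call site. The sketch below is only an illustration of that idea: all demo_ names and the struct layouts are hypothetical, modeled loosely on the identifiers in this thread, and it is not the approach that was eventually merged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types; these are not the kernel definitions. */
struct demo_path;

struct demo_transport {
	void (*xmit_path_prepare)(struct demo_path *cp);  /* optional hook */
};

struct demo_conn {
	const struct demo_transport *c_trans;  /* may be NULL */
};

struct demo_path {
	struct demo_conn *cp_conn;
	int prepared;  /* counts hook invocations, for demonstration only */
};

/* Single place that tolerates a missing conn, transport, or hook.
 * Returns 1 if the hook ran, 0 if it was skipped. */
static int demo_xmit_path_prepare(struct demo_path *cp)
{
	const struct demo_transport *t;

	assert(cp != NULL);
	t = cp->cp_conn ? cp->cp_conn->c_trans : NULL;
	if (!t || !t->xmit_path_prepare)
		return 0;
	t->xmit_path_prepare(cp);
	return 1;
}

/* Example hook used only by this sketch. */
static void demo_prepare_hook(struct demo_path *cp)
{
	cp->prepared++;
}
```

A helper like this hides the symptom but does not explain how the transport pointer came to be missing, so it would not by itself rule out a teardown race.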

Regards,
Santosh

Sowmini Varadhan

Dec 18, 2017, 12:22:59 PM
to David Miller, santosh....@oracle.com, rds-...@oss.oracle.com, bot+aaf54a8c644d559d34...@syzkaller.appspotmail.com, linux...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com, linux-...@vger.kernel.org
> From: Santosh Shilimkar <santosh....@oracle.com>
> Date: Mon, 18 Dec 2017 08:28:05 -0800
:
> > Looks like another one tripping on empty transport. Mostly below
> > should
> > address it but we will test it if it does.

that was my first thought, but it cannot be the case here: rds_sendmsg
etc itself would have bombed if that were the case, and the packet
would never have gotten queued.

This is unlike f3069c6d33, where an application skips the transport
binding (either misses the explicit bind, or gets the wrong transport
due to an implicit bind) before it triggers the setsockopt.

I suspect that the problem is that the conn (and thus c_trans)
has been destroyed, but the cp_send_w work got incorrectly
re-queued. For example, rds_cong_queue_updates() (because the
peer sent a congestion update) can happen in softirq context,
and would end up requeuing work in the middle of rds_conn_destroy,
after we have assumed that everything is quiesced.
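
The suspected ordering problem can be modeled in user space as a "teardown has begun" flag that every enqueue path checks before queuing send work. This is a rough, single-threaded sketch using C11 atomics; the demo_ names are hypothetical and none of them are kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of one connection path. */
struct demo_path {
	atomic_bool destroy_pending;  /* set once teardown has begun */
	int queued;                   /* times send work was queued */
};

static void demo_path_init(struct demo_path *p)
{
	assert(p != NULL);
	atomic_init(&p->destroy_pending, false);
	p->queued = 0;
}

/* Enqueue side (think: a congestion update arriving in softirq).
 * Returns true if work was queued, false if teardown already began. */
static bool demo_queue_send_work(struct demo_path *p)
{
	if (atomic_load(&p->destroy_pending))
		return false;  /* teardown started: do not requeue */
	p->queued++;
	return true;
}

/* Teardown side: mark the path before cancelling and freeing. */
static void demo_begin_destroy(struct demo_path *p)
{
	atomic_store(&p->destroy_pending, true);
}
```

The flag alone is not sufficient under real concurrency: an enqueuer can pass the check just before the flag is set and queue work afterwards. Teardown also has to wait for such in-flight checkers to finish before freeing anything, which is the kind of synchronization RCU provides in the kernel.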

On (12/18/17 12:12), David Miller wrote:
>
> We're seeming to accumulate a lot of checks like this, maybe there
> is a more general way to deal with this problem?

Yeah, I was thinking about this. Let me try to reproduce this in-house
and get back with a patchset.

--Sowmini


Eric Biggers

Jan 30, 2018, 5:22:31 PM
to Sowmini Varadhan, David Miller, santosh....@oracle.com, rds-...@oss.oracle.com, bot+aaf54a8c644d559d34...@syzkaller.appspotmail.com, linux...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com, linux-...@vger.kernel.org
I assume you weren't able to reproduce this? This crash hasn't been seen again,
and it was reported while KASAN was accidentally disabled in the syzbot kconfig
due to a change to the kconfig menus in linux-next. So this crash was possibly
caused by slab corruption elsewhere.

I am invalidating the bug for syzbot so it will report the same crash signature
again if it occurs, but if you think there is a real bug feel free to keep
looking into it.

#syz invalid

Thanks,

Eric

Sowmini Varadhan

Jan 30, 2018, 5:28:38 PM
to Eric Biggers, David Miller, santosh....@oracle.com, rds-...@oss.oracle.com, bot+aaf54a8c644d559d34...@syzkaller.appspotmail.com, linux...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com, linux-...@vger.kernel.org
On (01/30/18 14:22), Eric Biggers wrote:
>
> I assume you weren't able to reproduce this? This crash hasn't been
> seen again,
:
> I am invalidating the bug for syzbot so it will report the same crash
> signature
> again if it occurs, but if you think there is a real bug feel free to keep
> looking into it.

Correct, I was not able to reproduce this. However, based on code
inspection, I came up with

commit 3db6e0d172c94bd9953a1347c55ffb64b1d2e74f
rds: use RCU to synchronize work-enqueue with connection teardown

Marking it invalid sounds good to me.

--Sowmini