general protection fault in encode_rpcb_string

24 views
Skip to first unread message

syzbot

unread,
Apr 17, 2018, 12:02:02ā€ÆAM4/17/18
to anna.sc...@netapp.com, bfi...@fieldses.org, da...@davemloft.net, jla...@kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com, trond.m...@primarydata.com
Hello,

syzbot hit the following crash on bpf-next commit
5d1365940a68dd57b031b6e3c07d7d451cd69daf (Thu Apr 12 18:09:05 2018 +0000)
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
syzbot dashboard link:
https://syzkaller.appspot.com/bug?extid=4b98281f2401ab849f4b

So far this crash happened 2 times on bpf-next.
C reproducer: https://syzkaller.appspot.com/x/repro.c?id=6433835633868800
syzkaller reproducer:
https://syzkaller.appspot.com/x/repro.syz?id=6407311794896896
Raw console output:
https://syzkaller.appspot.com/x/log.txt?id=5861511176126464
Kernel config:
https://syzkaller.appspot.com/x/.config?id=-5947642240294114534
compiler: gcc (GCC) 8.0.1 20180413 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+4b9828...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for
details.
If you forward the report, please keep this part and the footer.

RBP: 00000000006dbc50 R08: 000000002000a000 R09: 0000000000003437
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe464ffed80
R13: 0030656c69662f2e R14: ffffffffffffffff R15: 0000000000000006
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 1861 Comm: kworker/u4:4 Not tainted 4.16.0+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Workqueue: rpciod rpc_async_schedule
RIP: 0010:strlen+0x1f/0xa0 lib/string.c:479
RSP: 0018:ffff8801cf75f318 EFLAGS: 00010296
RAX: dffffc0000000000 RBX: ffff8801cf68f200 RCX: ffffffff86a8c407
RDX: 0000000000000000 RSI: ffffffff86a84d7b RDI: 0000000000000000
RBP: ffff8801cf75f330 R08: ffff8801cf7de080 R09: ffffed0039ea3d43
R10: ffffed0039ea3d43 R11: ffff8801cf51ea1f R12: 0000000000000000
R13: 0000000002000000 R14: 0000000000000000 R15: ffff8801cf75f3e0
FS: 0000000000000000(0000) GS:ffff8801db000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f64808a4000 CR3: 00000001b566a000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
strlen include/linux/string.h:267 [inline]
encode_rpcb_string+0x23/0x70 net/sunrpc/rpcb_clnt.c:914
rpcb_enc_getaddr+0x146/0x1f0 net/sunrpc/rpcb_clnt.c:940
rpcauth_wrap_req_encode net/sunrpc/auth.c:777 [inline]
rpcauth_wrap_req+0x1a8/0x230 net/sunrpc/auth.c:791
rpc_xdr_encode net/sunrpc/clnt.c:1754 [inline]
call_transmit+0x8a9/0xfe0 net/sunrpc/clnt.c:1949
__rpc_execute+0x28a/0xfe0 net/sunrpc/sched.c:784
rpc_async_schedule+0x16/0x20 net/sunrpc/sched.c:857
process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
kthread+0x345/0x410 kernel/kthread.c:238
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411
Code: 37 ff ff ff 0f 1f 84 00 00 00 00 00 48 b8 00 00 00 00 00 fc ff df 55
48 89 fa 48 c1 ea 03 48 89 e5 41 54 49 89 fc 53 48 83 ec 08 <0f> b6 04 02
48 89 fa 83 e2 07 38 d0 7f 04 84 c0 75 4d 41 80 3c
RIP: strlen+0x1f/0xa0 lib/string.c:479 RSP: ffff8801cf75f318
---[ end trace bd76ed0378a56845 ]---


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzk...@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is
merged
into any tree, please reply to this email with:
#syz fix: exact-commit-title
If you want to test a patch for this bug, please reply with:
#syz test: git://repo/address.git branch
and provide the patch inline or as an attachment.
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug
report.
Note: all commands must start from beginning of the line in the email body.

J. Bruce Fields

unread,
Apr 17, 2018, 5:33:39ā€ÆPM4/17/18
to syzbot, anna.sc...@netapp.com, da...@davemloft.net, jla...@kernel.org, linux-...@vger.kernel.org, linu...@vger.kernel.org, net...@vger.kernel.org, syzkall...@googlegroups.com, trond.m...@primarydata.com
On Mon, Apr 16, 2018 at 09:02:01PM -0700, syzbot wrote:
> syzbot hit the following crash on bpf-next commit
> 5d1365940a68dd57b031b6e3c07d7d451cd69daf (Thu Apr 12 18:09:05 2018 +0000)
> Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=4b98281f2401ab849f4b
>
> So far this crash happened 2 times on bpf-next.
> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=6433835633868800
> syzkaller reproducer:
> https://syzkaller.appspot.com/x/repro.syz?id=6407311794896896
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=5861511176126464

Based on that, looks like it's attempting an nfs mount while causing
kmalloc failures?

Probably one of rpcb->r_netid, r_addr, or r_owner was bad in
rpcb_enc_getaddr.

Hm, and previous log makes it look like it was an rpc_sockaddr2uaddr()
in rpcb_getport_async() that was made to fail. Do we need to check for
failure of:

map->r_addr = rpc_sockaddr2uaddr(sap, GFP_ATOMIC);

?

--b.

Trond Myklebust

unread,
Apr 20, 2018, 3:15:25ā€ÆAM4/20/18
to bfi...@fieldses.org, syzbot+4b9828...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, anna.sc...@netapp.com, da...@davemloft.net, linux-...@vger.kernel.org, linu...@vger.kernel.org, jla...@kernel.org, net...@vger.kernel.org
Yes, and we can probably convert it, and the other GFP_ATOMIC
allocations in the rpcbind client to use GFP_NOFS in order to improve
reliability.

Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.m...@hammer.space

bfi...@fieldses.org

unread,
May 8, 2018, 12:09:53ā€ÆPM5/8/18
to Trond Myklebust, syzbot+4b9828...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, anna.sc...@netapp.com, da...@davemloft.net, linux-...@vger.kernel.org, linu...@vger.kernel.org, jla...@kernel.org, net...@vger.kernel.org
From: "J. Bruce Fields" <bfi...@redhat.com>

If we ignore the error we'll hit a null dereference a little later.

Reported-by: syzbot+4b9828...@syzkaller.appspotmail.com
Signed-off-by: J. Bruce Fields <bfi...@redhat.com>
---
net/sunrpc/rpcb_clnt.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index c526f8fb37c9..82c120e51d64 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -771,6 +771,12 @@ void rpcb_getport_async(struct rpc_task *task)
case RPCBVERS_3:
map->r_netid = xprt->address_strings[RPC_DISPLAY_NETID];
map->r_addr = rpc_sockaddr2uaddr(sap, GFP_ATOMIC);
+ if (!map->r_addr) {
+ status = -ENOMEM;
+ dprintk("RPC: %5u %s: no memory available\n",
+ task->tk_pid, __func__);
+ goto bailout_free_args;
+ }
map->r_owner = "";
break;
case RPCBVERS_2:
@@ -793,6 +799,8 @@ void rpcb_getport_async(struct rpc_task *task)
rpc_put_task(child);
return;

+bailout_free_args:
+ kfree(map);
bailout_release_client:
rpc_release_client(rpcb_clnt);
bailout_nofree:
--
2.17.0

bfi...@fieldses.org

unread,
May 8, 2018, 12:11:30ā€ÆPM5/8/18
to Trond Myklebust, syzbot+4b9828...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, anna.sc...@netapp.com, da...@davemloft.net, linux-...@vger.kernel.org, linu...@vger.kernel.org, jla...@kernel.org, net...@vger.kernel.org
From: "J. Bruce Fields" <bfi...@redhat.com>
Date: Tue, 8 May 2018 11:47:03 -0400
Subject: [PATCH 2/2] sunrpc: convert unnecessary GFP_ATOMIC to GFP_NOFS

It's OK to sleep here, we just don't want to recurse into the filesystem
as this writeout could be waiting on this.

As a next step: the documentation for GFP_NOFS says "Please try to avoid
using this flag directly and instead use memalloc_nofs_{save,restore} to
mark the whole scope which cannot/shouldn't recurse into the FS layer
with a short explanation why. All allocation requests will inherit
GFP_NOFS implicitly."

But I'm not sure where to do this. Should the workqueue could be
arranging that for us in the case of workqueues created with
WQ_MEM_RECLAIM?

Reported-by: Trond Myklebust <tro...@hammer.space>
Signed-off-by: J. Bruce Fields <bfi...@redhat.com>
---
net/sunrpc/rpcb_clnt.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

On Tue, Apr 17, 2018 at 09:54:36PM +0000, Trond Myklebust wrote:
> Yes, and we can probably convert it, and the other GFP_ATOMIC
> allocations in the rpcbind client to use GFP_NOFS in order to improve
> reliability.

diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 82c120e51d64..576e84a1adee 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -752,7 +752,7 @@ void rpcb_getport_async(struct rpc_task *task)
goto bailout_nofree;
}

- map = kzalloc(sizeof(struct rpcbind_args), GFP_ATOMIC);
+ map = kzalloc(sizeof(struct rpcbind_args), GFP_NOFS);
if (!map) {
status = -ENOMEM;
dprintk("RPC: %5u %s: no memory available\n",
@@ -770,7 +770,7 @@ void rpcb_getport_async(struct rpc_task *task)
case RPCBVERS_4:
case RPCBVERS_3:
map->r_netid = xprt->address_strings[RPC_DISPLAY_NETID];
- map->r_addr = rpc_sockaddr2uaddr(sap, GFP_ATOMIC);
+ map->r_addr = rpc_sockaddr2uaddr(sap, GFP_NOFS);
if (!map->r_addr) {
status = -ENOMEM;
dprintk("RPC: %5u %s: no memory available\n",
--
2.17.0

bfi...@fieldses.org

unread,
May 8, 2018, 12:15:30ā€ÆPM5/8/18
to Trond Myklebust, syzbot+4b9828...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, anna.sc...@netapp.com, da...@davemloft.net, linux-...@vger.kernel.org, linu...@vger.kernel.org, jla...@kernel.org, net...@vger.kernel.org, Chuck Lever
On Tue, Apr 17, 2018 at 09:54:36PM +0000, Trond Myklebust wrote:
> Yes, and we can probably convert it, and the other GFP_ATOMIC
> allocations in the rpcbind client to use GFP_NOFS in order to improve
> reliability.

Chuck, I think the GFP_ATOMIC is unnecessary here as well?

--b.

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index e8adad33d0bb..de90c6c90cde 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -228,7 +228,7 @@ rpcrdma_convert_iovs(struct rpcrdma_xprt *r_xprt, struct xdr_buf *xdrbuf,
/* XXX: Certain upper layer operations do
* not provide receive buffer pages.
*/
- *ppages = alloc_page(GFP_ATOMIC);
+ *ppages = alloc_page(GFP_NOFS);
if (!*ppages)
return -EAGAIN;
}

Chuck Lever

unread,
May 8, 2018, 12:34:50ā€ÆPM5/8/18
to Bruce Fields, Trond Myklebust, syzbot+4b9828...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Anna Schumaker, da...@davemloft.net, linux-...@vger.kernel.org, Linux NFS Mailing List, jla...@kernel.org, net...@vger.kernel.org
This code can't sleep, as I understand it. Caller is holding
the transport write lock. This logic was copied from
xdr_partial_copy_from_skb, which uses GFP_ATOMIC.

Recall that this is here because of GETACL. As I've stated in
the past, the correct solution is to ensure that these pages
are provided in every case by the upper layer, making this
alloc_page call site unnecessary.


--
Chuck Lever
chuck...@gmail.com



Bruce Fields

unread,
May 8, 2018, 1:44:33ā€ÆPM5/8/18
to Chuck Lever, Trond Myklebust, syzbot+4b9828...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, Anna Schumaker, da...@davemloft.net, linux-...@vger.kernel.org, Linux NFS Mailing List, jla...@kernel.org, net...@vger.kernel.org
On Tue, May 08, 2018 at 12:34:48PM -0400, Chuck Lever wrote:
>
>
> > On May 8, 2018, at 12:15 PM, bfi...@fieldses.org wrote:
> >
> > On Tue, Apr 17, 2018 at 09:54:36PM +0000, Trond Myklebust wrote:
> >> Yes, and we can probably convert it, and the other GFP_ATOMIC
> >> allocations in the rpcbind client to use GFP_NOFS in order to improve
> >> reliability.
> >
> > Chuck, I think the GFP_ATOMIC is unnecessary here as well?
> >
> > --b.
> >
> > diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
> > index e8adad33d0bb..de90c6c90cde 100644
> > --- a/net/sunrpc/xprtrdma/rpc_rdma.c
> > +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
> > @@ -228,7 +228,7 @@ rpcrdma_convert_iovs(struct rpcrdma_xprt *r_xprt, struct xdr_buf *xdrbuf,
> > /* XXX: Certain upper layer operations do
> > * not provide receive buffer pages.
> > */
> > - *ppages = alloc_page(GFP_ATOMIC);
> > + *ppages = alloc_page(GFP_NOFS);
> > if (!*ppages)
> > return -EAGAIN;
> > }
>
> This code can't sleep, as I understand it. Caller is holding
> the transport write lock. This logic was copied from
> xdr_partial_copy_from_skb, which uses GFP_ATOMIC.

OK.

> Recall that this is here because of GETACL. As I've stated in
> the past, the correct solution is to ensure that these pages
> are provided in every case by the upper layer, making this
> alloc_page call site unnecessary.

Got it.

--b.
Reply all
Reply to author
Forward
0 new messages