general protection fault in mm_update_next_owner

syzbot

Jun 8, 2019, 3:13:08 PM

to aarc...@redhat.com, ak...@linux-foundation.org, andrea...@amarulasolutions.com, ava...@gmail.com, dbu...@suse.de, ebie...@xmission.com, linux-...@vger.kernel.org, net...@vger.kernel.org, ol...@redhat.com, prs...@codeaurora.org, syzkall...@googlegroups.com

Hello,

syzbot found the following crash on:

HEAD commit: 38e406f6 Merge git://git.kernel.org/pub/scm/linux/kernel/g..
git tree: net
console output: https://syzkaller.appspot.com/x/log.txt?x=10c90fbaa00000
kernel config: https://syzkaller.appspot.com/x/.config?x=60564cb52ab29d5b
dashboard link: https://syzkaller.appspot.com/bug?extid=f625baafb9a1c4bfc3f6
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1193d81ea00000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+f625ba...@syzkaller.appspotmail.com

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 8869 Comm: syz-executor.5 Not tainted 5.2.0-rc3+ #45
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__read_once_size include/linux/compiler.h:194 [inline]
RIP: 0010:mm_update_next_owner+0x3c4/0x640 kernel/exit.c:453
Code: 30 03 00 00 48 89 f8 48 c1 e8 03 80 3c 18 00 0f 85 48 02 00 00 4d 8b
a4 24 30 03 00 00 49 8d 44 24 10 48 89 45 d0 48 c1 e8 03 <80> 3c 18 00 0f
85 1b 02 00 00 49 8b 44 24 10 48 39 45 d0 4c 8d a0
RSP: 0018:ffff88808ff0fd18 EFLAGS: 00010206
RAX: 00000000000825ee RBX: dffffc0000000000 RCX: ffffffff814411a8
RDX: 0000000000000000 RSI: ffffffff814411b6 RDI: ffff88807a8b7fb0
RBP: ffff88808ff0fd78 R08: ffff88809069e300 R09: fffffbfff1141219
R10: fffffbfff1141218 R11: ffffffff88a090c3 R12: 0000000000412f61
R13: ffff88808fe32d80 R14: 0000000000000000 R15: ffff88809069e300
FS: 0000000000000000(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000077fffb CR3: 00000000993de000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
exit_mm kernel/exit.c:546 [inline]
do_exit+0x80e/0x2fa0 kernel/exit.c:864
do_group_exit+0x135/0x370 kernel/exit.c:981
__do_sys_exit_group kernel/exit.c:992 [inline]
__se_sys_exit_group kernel/exit.c:990 [inline]
__x64_sys_exit_group+0x44/0x50 kernel/exit.c:990
do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x459279
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffc3b89b6a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 000000000000001e RCX: 0000000000459279
RDX: 0000000000412f61 RSI: fffffffffffffff7 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffffffffffffffff R09: 00007ffc3b89b700
R10: ffffffffffffffff R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffc3b89b700 R14: 0000000000000000 R15: 00007ffc3b89b710
Modules linked in:

======================================================
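
The register dump above can be checked by hand against KASAN's inline shadow lookup. The sketch below is a userspace illustration, not kernel code: it assumes the x86-64 KASAN shadow offset 0xdffffc0000000000 (the value in RBX) and a scale shift of 3, and it assumes my reading of the Code bytes is right, namely that the faulting instruction is the shadow-byte compare for the pointer R12 + 0x10. The helper names are hypothetical.

```c
#include <stdint.h>

/* x86-64 KASAN constants: one shadow byte covers 8 bytes of memory. */
#define KASAN_SHADOW_OFFSET      0xdffffc0000000000ULL
#define KASAN_SHADOW_SCALE_SHIFT 3

/* Address of the shadow byte that an inline KASAN check reads for addr. */
static uint64_t kasan_shadow_addr(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/* True if bits 63..47 are all equal, i.e. addr is canonical with 48-bit VAs. */
static int is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;

	return top == 0 || top == 0x1ffff;
}
```

With R12 = 0x412f61, the checked pointer is 0x412f71, its shifted value is 0x825ee (the RAX value in the dump), and kasan_shadow_addr() gives 0xdffffc00000825ee. That address is non-canonical, so the shadow read itself raises the general protection fault. This matches the "NULL-ptr deref or user memory access" hint: the pointer being followed looks like a small user-space value rather than a kernel address.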

---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

syzbot

Jun 8, 2019, 5:17:01 PM

to aarc...@redhat.com, ak...@linux-foundation.org, andrea...@amarulasolutions.com, a...@kernel.org, ava...@gmail.com, dan...@iogearbox.net, dbu...@suse.de, ebie...@xmission.com, john.fa...@gmail.com, linux-...@vger.kernel.org, net...@vger.kernel.org, ol...@redhat.com, prs...@codeaurora.org, syzkall...@googlegroups.com

syzbot has bisected this bug to:

commit e9db4ef6bf4ca9894bb324c76e01b8f1a16b2650
Author: John Fastabend <john.fa...@gmail.com>
Date: Sat Jun 30 13:17:47 2018 +0000

bpf: sockhash fix omitted bucket lock in sock_close

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15e978e1a00000
start commit: 38e406f6 Merge git://git.kernel.org/pub/scm/linux/kernel/g..
git tree: net
final crash: https://syzkaller.appspot.com/x/report.txt?x=17e978e1a00000
console output: https://syzkaller.appspot.com/x/log.txt?x=13e978e1a00000

kernel config: https://syzkaller.appspot.com/x/.config?x=60564cb52ab29d5b
dashboard link: https://syzkaller.appspot.com/bug?extid=f625baafb9a1c4bfc3f6

syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1193d81ea00000

Reported-by: syzbot+f625ba...@syzkaller.appspotmail.com
Fixes: e9db4ef6bf4c ("bpf: sockhash fix omitted bucket lock in sock_close")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

Hillf Danton

Jun 9, 2019, 8:58:19 AM

to syzbot, aarc...@redhat.com, ak...@linux-foundation.org, andrea...@amarulasolutions.com, ava...@gmail.com, dbu...@suse.de, ebie...@xmission.com, linux-...@vger.kernel.org, net...@vger.kernel.org, ol...@redhat.com, prs...@codeaurora.org, syzkall...@googlegroups.com

Hi

On Sat, 08 Jun 2019 12:13:06 -0700 (PDT) syzbot wrote:
> Hello,
>
> syzbot found the following crash on:
>

> HEAD commit: 38e406f6 Merge git://git.kernel.org/pub/scm/linux/kernel/g..
> git tree: net
> console output: https://syzkaller.appspot.com/x/log.txt?x=10c90fbaa00000

> kernel config: https://syzkaller.appspot.com/x/.config?x=60564cb52ab29d5b
> dashboard link: https://syzkaller.appspot.com/bug?extid=f625baafb9a1c4bfc3f6

Ignore my noise if you have no interest in seeing the syzbot report.

The following four tiny diffs, offered in the hope that they help you handle
the report, fix a reference count mismatch and do some cleanup in the sock map
update path.

Thanks
Hillf

[1/4] Remove old map entry before adding new one

This is a simple code move, in a bid to make a clean start for adding the new
sock map entry.
---
kernel/bpf/sockmap.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 0a0f2ec..46cf204 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -2018,6 +2018,11 @@ static int sock_map_ctx_update_elem(struct bpf_sock_ops_kern *skops,
 		err = -ENOENT;
 		goto out_unlock;
 	}
+	if (osock) {
+		psock = smap_psock_sk(osock);
+		smap_list_map_remove(psock, &stab->sock_map[i]);
+		smap_release_sock(psock, osock);
+	}

 	e->entry = &stab->sock_map[i];
 	e->map = map;
@@ -2026,11 +2031,6 @@ static int sock_map_ctx_update_elem(struct bpf_sock_ops_kern *skops,
 	spin_unlock_bh(&psock->maps_lock);

 	stab->sock_map[i] = sock;
-	if (osock) {
-		psock = smap_psock_sk(osock);
-		smap_list_map_remove(psock, &stab->sock_map[i]);
-		smap_release_sock(psock, osock);
-	}
 	raw_spin_unlock_bh(&stab->lock);
 	return 0;
 out_unlock:
--

[2/4] Pump psock's refcnt up

Successfully incrementing psock's refcnt makes sure that it is valid and will
not go to Mars under our feet, which is what may make the bot uneasy and a bit
grumpy. As a bonus, the increment now pairs with the helper function, since
refcnt is decremented in smap_release_sock().
---
kernel/bpf/sockmap.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 46cf204..a987522 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -2020,6 +2020,10 @@ static int sock_map_ctx_update_elem(struct bpf_sock_ops_kern *skops,
 	}
 	if (osock) {
 		psock = smap_psock_sk(osock);
+		if (!psock || !refcount_inc_not_zero(&psock->refcnt)) {
+			err = -ENOENT;
+			goto out_unlock;
+		}
 		smap_list_map_remove(psock, &stab->sock_map[i]);
 		smap_release_sock(psock, osock);
 	}
--
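
The increment-unless-zero idea behind the patch above can be sketched in userspace with C11 atomics. This is an illustration under stated assumptions, not the kernel's refcount_t implementation (which also handles saturation): struct obj, ref_get_not_zero() and ref_put() are hypothetical names standing in for the psock, refcount_inc_not_zero() and the decrement done inside smap_release_sock().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object; stands in for the psock. */
struct obj {
	atomic_uint refcnt;
};

/*
 * Take a reference only if the count is still non-zero. A count of zero
 * means the object is already being torn down and must not be revived.
 */
static bool ref_get_not_zero(struct obj *o)
{
	unsigned int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool ref_put(struct obj *o)
{
	unsigned int old = atomic_fetch_sub(&o->refcnt, 1);

	assert(old != 0); /* a put without a matching get is a bug */
	return old == 1;
}
```

A caller that gets false from ref_get_not_zero() has to back out without touching the object, which is what the -ENOENT path in the patch does. Every successful get is then balanced by exactly one put.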

[3/4] Add new map entry

First, psock is unbound from osock. Then the refcnt pump-up pairs with the release.
---
kernel/bpf/sockmap.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index a987522..346156d 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -2028,11 +2028,17 @@ static int sock_map_ctx_update_elem(struct bpf_sock_ops_kern *skops,
 		smap_release_sock(psock, osock);
 	}

+	psock = smap_psock_sk(sock);
+	if (!psock || !refcount_inc_not_zero(&psock->refcnt)) {
+		err = -ENOENT;
+		goto out_unlock;
+	}
 	e->entry = &stab->sock_map[i];
 	e->map = map;
 	spin_lock_bh(&psock->maps_lock);
 	list_add_tail(&e->list, &psock->maps);
 	spin_unlock_bh(&psock->maps_lock);
+	smap_release_sock(psock, sock);

 	stab->sock_map[i] = sock;
 	raw_spin_unlock_bh(&stab->lock);
--

[4/4] Make some code cleanup

We no longer need to dereference psock before taking the lock. More
importantly, there is a psock release that is currently unpaired; time to
delete it.
---
kernel/bpf/sockmap.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 346156d..d925372 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -2006,8 +2006,6 @@ static int sock_map_ctx_update_elem(struct bpf_sock_ops_kern *skops,
 	if (err)
 		goto out;

-	/* psock guaranteed to be present. */
-	psock = smap_psock_sk(sock);
 	raw_spin_lock_bh(&stab->lock);
 	osock = stab->sock_map[i];
 	if (osock && flags == BPF_NOEXIST) {
@@ -2044,7 +2042,6 @@ static int sock_map_ctx_update_elem(struct bpf_sock_ops_kern *skops,
 	raw_spin_unlock_bh(&stab->lock);
 	return 0;
 out_unlock:
-	smap_release_sock(psock, sock);
 	raw_spin_unlock_bh(&stab->lock);
 out:
 	kfree(e);
--

Eric W. Biederman

Jun 10, 2019, 5:27:35 PM

to syzbot, aarc...@redhat.com, ak...@linux-foundation.org, andrea...@amarulasolutions.com, a...@kernel.org, ava...@gmail.com, dan...@iogearbox.net, dbu...@suse.de, john.fa...@gmail.com, linux-...@vger.kernel.org, net...@vger.kernel.org, ol...@redhat.com, prs...@codeaurora.org, syzkall...@googlegroups.com

How is mm_update_next_owner connected to bpf?

Eric

Dmitry Vyukov

Jun 11, 2019, 3:00:22 AM

to Eric W. Biederman, syzbot, Andrea Arcangeli, Andrew Morton, Andrea Parri, Alexei Starovoitov, ava...@gmail.com, Daniel Borkmann, dbu...@suse.de, John Fastabend, LKML, netdev, Oleg Nesterov, prs...@codeaurora.org, syzkaller-bugs, bpf

There seems to be a nasty bug in bpf that has been causing assorted crashes
throughout the kernel for some time. I've seen a bunch of reproducers
that do something with bpf and then cause a random crash. The more
unpleasant ones are the bugs without reproducers, because for these we
don't have a way to link them back to the bpf bug, yet they are still
hanging there without a good explanation, e.g. perhaps some of the one-off
crashes in moderation:
https://syzkaller.appspot.com/upstream#moderation2

It would be good to fix such bugs ASAP so that they stop producing more and
more random crash reports.

Hillf, did you understand the mechanics of this bug and the memory
corruption? A good question is why this went unnoticed by KASAN. If we
could make it catch it at the point of occurrence, then it would be a
single bug report clearly attributed to bpf rather than dozens of
assorted crashes.

Eric Biggers

Aug 22, 2019, 11:58:28 AM

to syzbot, syzkaller-bugs

#syz fix: bpf: sockmap/tls, close can race with map free

Hillf Danton

Oct 24, 2021, 1:25:37 AM

to Dmitry Vyukov, syzbot, LKML, linu...@kvack.org, syzkaller-bugs

On Tue, 11 Jun 2019 09:00:09 +0200 Dmitry Vyukov wrote:

Sorry for the late reply; I only read this message on lore today because it
did not land in my inbox in June 2019.

A couple of days ago, I saw an offline linux-4.18 page fault Oops report
that could trigger the check for X86_PF_USER and X86_PF_INSTR added in

03c81ea33316 ("x86/fault: Improve kernel-executing-user-memory handling")

and, given that the reported CPU is an Intel Atom, any light on how to
reproduce it would be highly appreciated.

Hillf
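
For readers unfamiliar with the X86_PF_USER and X86_PF_INSTR check mentioned above, here is a minimal userspace sketch of how such a test on the page fault error code can look. The bit positions are the architectural x86 page fault error code bits. The predicate is my reading of the kind of check that commit adds, not a copy of the kernel code, and kernel_exec_fault() is a hypothetical name; the real code also takes the fault address into account, which is omitted here.

```c
#include <stdbool.h>

/* x86 page fault error code bits (architectural definitions). */
enum x86_pf_error_code {
	X86_PF_PROT  = 1 << 0, /* 0: page not present, 1: protection fault */
	X86_PF_WRITE = 1 << 1, /* 0: read access, 1: write access */
	X86_PF_USER  = 1 << 2, /* 0: kernel-mode access, 1: user-mode access */
	X86_PF_RSVD  = 1 << 3, /* reserved bit set in a page table entry */
	X86_PF_INSTR = 1 << 4, /* fault was an instruction fetch */
	X86_PF_PK    = 1 << 5, /* protection keys block access */
};

/*
 * True when the fault is an instruction fetch made in kernel mode:
 * X86_PF_INSTR is set while X86_PF_USER is clear.
 */
static bool kernel_exec_fault(unsigned long error_code)
{
	return (error_code & (X86_PF_USER | X86_PF_INSTR)) == X86_PF_INSTR;
}
```

An error code of 0x10 or 0x11 satisfies the predicate, while 0x14 (a user-mode instruction fetch) does not.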
