BUG: unable to handle kernel NULL pointer dereference in rb_insert_color

syzbot

Dec 19, 2017, 3:41:04 AM
to adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, ty...@mit.edu
Hello,

syzkaller hit the following crash on
6084b576dca2e898f5c101baef151f7bfdbb606d
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
compiler: gcc (GCC) 7.1.1 20170620
.config is attached
Raw console output is attached.

Unfortunately, I don't have any reproducer for this bug yet.


sctp: [Deprecated]: syz-executor6 (pid 4202) Use of int in max_burst socket
option.
Use struct sctp_assoc_value instead
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
sctp: [Deprecated]: syz-executor4 (pid 4240) Use of int in max_burst socket
option.
Use struct sctp_assoc_value instead
sctp: [Deprecated]: syz-executor4 (pid 4240) Use of int in max_burst socket
option.
Use struct sctp_assoc_value instead
IP: __rb_insert lib/rbtree.c:126 [inline]
IP: rb_insert_color+0x17/0x190 lib/rbtree.c:452
PGD 0 P4D 0
Oops: 0000 [#1] SMP
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 4244 Comm: modprobe Not tainted 4.15.0-rc3-next-20171214+ #67
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:__rb_insert lib/rbtree.c:126 [inline]
RIP: 0010:rb_insert_color+0x17/0x190 lib/rbtree.c:452
RSP: 0018:ffffc900010a7c08 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff814ddcb9
RDX: ffff8801ebedf988 RSI: ffff8801ebfd6400 RDI: ffff88021413a408
RBP: ffffc900010a7c08 R08: 000000000002bcf8 R09: ffff88021413a400
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88021413a400
R13: ffff8801ebedf990 R14: 00000000a34fc52a R15: ffff8801ebedf988
FS: 00007f85a5155700(0000) GS:ffff88021fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 00000001eaccd006 CR4: 00000000001606f0
DR0: 0000000020000000 DR1: 0000000020000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Call Trace:
ext4_htree_store_dirent+0x122/0x160 fs/ext4/dir.c:488
htree_dirblock_to_tree+0x112/0x300 fs/ext4/namei.c:1019
ext4_htree_fill_tree+0xdf/0x410 fs/ext4/namei.c:1096
ext4_dx_readdir fs/ext4/dir.c:575 [inline]
ext4_readdir+0x8cf/0xd70 fs/ext4/dir.c:122
iterate_dir+0xb8/0x200 fs/readdir.c:51
SYSC_getdents fs/readdir.c:231 [inline]
SyS_getdents+0xcc/0x1b0 fs/readdir.c:212
entry_SYSCALL_64_fastpath+0x1f/0x96
RIP: 0033:0x7f85a4a45575
RSP: 002b:00007ffc9b5be120 EFLAGS: 00000246 ORIG_RAX: 000000000000004e
RAX: ffffffffffffffda RBX: 00007f85a4d23e98 RCX: 00007f85a4a45575
RDX: 0000000000008000 RSI: 00005633094701e0 RDI: 0000000000000000
RBP: 00007f85a4d23e40 R08: 00005633094701e0 R09: 00007f85a4d23e90
R10: 0000000000000000 R11: 0000000000000246 R12: 00005633094701b0
R13: 0000000000018e21 R14: 0000000000000000 R15: 0000000000000004
Code: 48 85 d2 75 eb 5d c3 31 c0 5d c3 66 0f 1f 84 00 00 00 00 00 55 48 8b
17 48 89 e5 48 85 d2 0f 84 4c 01 00 00 48 8b 02 a8 01 75 5e <48> 8b 48 08
49 89 c0 48 39 d1 74 54 48 85 c9 74 09 f6 01 01 0f
RIP: __rb_insert lib/rbtree.c:126 [inline] RSP: ffffc900010a7c08
RIP: rb_insert_color+0x17/0x190 lib/rbtree.c:452 RSP: ffffc900010a7c08
CR2: 0000000000000008
BUG: unable to handle kernel paging request at 0000000100000001
---[ end trace c403bd3ebad2ccb0 ]---


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzk...@googlegroups.com.
Please credit me with: Reported-by: syzbot <syzk...@googlegroups.com>

syzbot will keep track of this bug report.
Once a fix for this bug is merged into any tree, reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug
report.
Note: all commands must start from beginning of the line in the email body.
config.txt
raw.log

Eric Biggers

Dec 19, 2017, 4:59:10 PM
to syzbot, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, ty...@mit.edu
The line number in lib/rbtree.c seems to be slightly off. Looking at the
disassembly:

ffffffff825b5ea0 <rb_insert_color>:
ffffffff825b5ea0: 55 push %rbp
ffffffff825b5ea1: 48 8b 17 mov (%rdi),%rdx
ffffffff825b5ea4: 48 89 e5 mov %rsp,%rbp
ffffffff825b5ea7: 48 85 d2 test %rdx,%rdx
ffffffff825b5eaa: 0f 84 4c 01 00 00 je ffffffff825b5ffc <rb_insert_color+0x15c>
ffffffff825b5eb0: 48 8b 02 mov (%rdx),%rax
ffffffff825b5eb3: a8 01 test $0x1,%al
ffffffff825b5eb5: 75 5e jne ffffffff825b5f15 <rb_insert_color+0x75>
ffffffff825b5eb7: 48 8b 48 08 mov 0x8(%rax),%rcx

It crashed on 'mov 0x8(%rax),%rcx' which corresponds to
'tmp = gparent->rb_right;' at lib/rbtree.c:131. So 'parent' was the root node,
but its color was red, while it is supposed to be black.
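
To make that reading of the disassembly concrete, here is a minimal userspace sketch. It assumes the usual struct rb_node layout (parent pointer and color packed into the first word, rb_right as the second field); the `_model` names are hypothetical and this is not the kernel code itself. It only computes the address that the first load from 'gparent' would touch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of the rbtree node layout (assumption:
 * first word holds the parent pointer with the color in bit 0). */
struct rb_node_model {
	uintptr_t parent_color;          /* parent pointer | color bit */
	struct rb_node_model *rb_right;
	struct rb_node_model *rb_left;
};

#define MODEL_RED   0u
#define MODEL_BLACK 1u

static int model_is_black(const struct rb_node_model *n)
{
	return (int)(n->parent_color & 1u);
}

/* Only valid for red nodes: the color bit is 0, so the word is the pointer. */
static struct rb_node_model *model_red_parent(const struct rb_node_model *n)
{
	return (struct rb_node_model *)n->parent_color;
}

/*
 * Address the first fix-up iteration would load 'gparent->rb_right' from,
 * or 0 if that iteration never dereferences a grandparent.
 */
static uintptr_t model_first_gparent_load(const struct rb_node_model *node)
{
	assert(!model_is_black(node));   /* freshly inserted nodes are red */

	const struct rb_node_model *parent = model_red_parent(node);

	if (!parent)                     /* node is the root */
		return 0;
	if (model_is_black(parent))      /* black parent: nothing to fix */
		return 0;

	/* A red parent is assumed to have a parent of its own. */
	uintptr_t gparent = parent->parent_color;

	return gparent + offsetof(struct rb_node_model, rb_right);
}
```

With a root whose first word is 0 (red, no parent) and a new red child pointing at it, model_first_gparent_load() returns offsetof(struct rb_node_model, rb_right), which is 8 on a 64-bit build. That matches both the faulting address in CR2 and RAX being 0 in the register dump. With a black root it returns 0, i.e. no grandparent is ever touched.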

No idea how that happened, but it's almost certainly not an ext4 bug. In fact
there is another report of this same crash that has a different call trace:

Call Trace:
key_alloc_serial security/keys/key.c:170 [inline]
key_alloc+0x54c/0x5b0 security/keys/key.c:319
keyring_alloc+0x4d/0xb0 security/keys/keyring.c:503
install_process_keyring_to_cred.part.3+0x38/0x80 security/keys/process_keys.c:192
install_process_keyring_to_cred security/keys/process_keys.c:634 [inline]
install_process_keyring security/keys/process_keys.c:217 [inline]
lookup_user_key+0x4ed/0x7c0 security/keys/process_keys.c:574
SYSC_add_key security/keys/keyctl.c:114 [inline]
SyS_add_key+0xec/0x260 security/keys/keyctl.c:62
entry_SYSCALL_64_fastpath+0x1f/0x96

Dmitry Vyukov

Dec 20, 2017, 2:51:02 AM
to Eric Biggers, syzbot, Andreas Dilger, linux...@vger.kernel.org, LKML, syzkall...@googlegroups.com, Theodore Ts'o
My first hypothesis for a non-explainable, non-reproducible
corruption would be a data race. Is all the locking in place?

Eric Biggers

Dec 20, 2017, 2:59:51 AM
to Dmitry Vyukov, syzbot, Andreas Dilger, linux...@vger.kernel.org, LKML, syzkall...@googlegroups.com, Theodore Ts'o
It doesn't seem to be a locking problem. In the ext4 case the rbtree is
associated with a struct file's dir_private_info, which is protected by
->f_pos_lock (taken early in sys_getdents()). And in the keyrings case, the
rbtree is protected by key_serial_lock.

Eric

Dmitry Vyukov

Dec 20, 2017, 3:06:01 AM
to Eric Biggers, syzbot, Andreas Dilger, linux...@vger.kernel.org, LKML, syzkall...@googlegroups.com, Theodore Ts'o
But this won't prevent somebody else from messing with the struct without
taking the lock.

Eric Biggers

Jan 30, 2018, 4:43:32 PM
to Dmitry Vyukov, syzbot, Andreas Dilger, linux...@vger.kernel.org, LKML, syzkall...@googlegroups.com, Theodore Ts'o
Invalidating this bug since it hasn't been seen again, and it was reported while
KASAN was accidentally disabled in the syzbot config due to a change to the
kconfig menus in linux-next (so this crash was probably caused by slab
corruption elsewhere).

#syz invalid

Rafael David Tinoco

Dec 9, 2019, 8:29:23 AM
to ebig...@gmail.com, adilger...@dilger.ca, bot+eb13811afcefe99cfe...@syzkaller.appspotmail.com, linux...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, ty...@mit.edu
It looks like the same stacktrace that was reported in this thread. This has
been reported to ppc64el AND we got a reproducer (ocfs2-tools autopkgtests).

[ 85.605850] Faulting instruction address: 0xc000000000e81168
[ 85.605901] Oops: Kernel access of bad area, sig: 11 [#1]
[ 85.605970] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[ 85.606029] Modules linked in: ocfs2 quota_tree ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue iptable_mangle xt_TCPMSS xt_tcpudp bpfilter dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua vmx_crypto crct10dif_vpmsum sch_fq_codel ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq libcrc32c crc32c_vpmsum virtio_net virtio_blk net_failover failover
[ 85.606291] CPU: 0 PID: 1 Comm: systemd Not tainted 5.3.0-18-generic #19-Ubuntu
[ 85.606350] NIP: c000000000e81168 LR: c00000000054f240 CTR: 0000000000000000
[ 85.606410] REGS: c00000005a3e3700 TRAP: 0300 Not tainted (5.3.0-18-generic)
[ 85.606469] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 28024448 XER: 00000000
[ 85.606531] CFAR: 0000701f9806f638 DAR: 0000000001744098 DSISR: 40000000 IRQMASK: 0
[ 85.606531] GPR00: 0000000000007374 c00000005a3e3990 c0000000019c9100 c00000004fe462a8
[ 85.606531] GPR04: c00000005856d840 000000000000000e 0000000074656772 c00000004fe4a568
[ 85.606531] GPR08: 0000000000000000 c000000058568004 0000000001744090 0000000000000000
[ 85.606531] GPR12: 00000000e8086002 c000000001d60000 00007fffddd522d0 0000000000000000
[ 85.606531] GPR16: 0000000000000000 0000000000000000 0000000000000000 c00000000755e07c
[ 85.606531] GPR20: c0000000598caca8 c00000005a3e3a58 0000000000000000 c000000058292f00
[ 85.606531] GPR24: c000000000eea710 0000000000000000 c00000005856d840 c00000000755e074
[ 85.606531] GPR28: 000000006518907d c00000005a3e3a68 c00000004fe4b160 00000000027c47b6
[ 85.607079] NIP [c000000000e81168] rb_insert_color+0x18/0x1c0
[ 85.607137] LR [c00000000054f240] ext4_htree_store_dirent+0x140/0x1c0
[ 85.607186] Call Trace:
[ 85.607208] [c00000005a3e3990] [c00000000054f158] ext4_htree_store_dirent+0x58/0x1c0 (unreliable)
[ 85.607279] [c00000005a3e39e0] [c000000000594cd8] htree_dirblock_to_tree+0x1b8/0x380
[ 85.607340] [c00000005a3e3b00] [c0000000005962c0] ext4_htree_fill_tree+0xc0/0x3f0
[ 85.607401] [c00000005a3e3c00] [c00000000054ebe4] ext4_readdir+0x814/0xce0
[ 85.607459] [c00000005a3e3d40] [c000000000472d6c] iterate_dir+0x1fc/0x280
[ 85.607511] [c00000005a3e3d90] [c0000000004746f0] ksys_getdents64+0xa0/0x1f0
[ 85.607572] [c00000005a3e3e00] [c000000000474868] sys_getdents64+0x28/0x130
[ 85.607622] [c00000005a3e3e20] [c00000000000b388] system_call+0x5c/0x70
[ 85.607672] Instruction dump:
[ 85.607703] 4082ffe8 4e800020 38600000 4e800020 60000000 60000000 e9230000 2c290000
[ 85.607764] 4182018c e9490000 71480001 4c820020 <e90a0008> 7c284840 2fa80000 4182006c
[ 85.607827] ---[ end trace cfc53af0f8d62cef ]---
[ 85.610600]
[ 86.611522] BUG: Unable to handle kernel data access at 0xc000030058567eff
[ 86.611604] Faulting instruction address: 0xc000000000403aa8
[ 86.611656] Oops: Kernel access of bad area, sig: 11 [#2]
[ 86.611697] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[ 86.611748] Modules linked in: ocfs2 quota_tr

The thread is from the beginning of 2018, so I guess this issue is pretty
intermittent but might exist, and perhaps it's related to specific arches/machines?

Dmitry Vyukov

Dec 9, 2019, 8:46:41 AM
to Rafael David Tinoco, Eric Biggers, Andreas Dilger, syzbot, linux...@vger.kernel.org, LKML, syzkaller-bugs, Theodore Ts'o

Theodore Y. Ts'o

Dec 9, 2019, 9:01:36 PM
to Rafael David Tinoco, ebig...@gmail.com, adilger...@dilger.ca, bot+eb13811afcefe99cfe...@syzkaller.appspotmail.com, linux...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On Mon, Dec 09, 2019 at 10:29:14AM -0300, Rafael David Tinoco wrote:
> It looks like the same stacktrace that was reported in this thread. This has
> been reported to ppc64el AND we got a reproducer (ocfs2-tools autopkgtests).

Can you share your reproducer? Is it a super-simple reproducer that
doesn't require a complex setup and which can be triggered in some
kind of virtual machine (under KVM, etc.)?

> The thread is from the beginning of 2018, so I guess this issue is pretty
> intermittent but might exist, and perhaps it's related to specific arches/machines?

What syzbot reported (a) had no reproducer, (b) only reproduced twice
on linux-next in 2017, and never since. So if you're seeing something
in 2019 in ppc64el, it may not be the same issue.

- Ted

Rafael David Tinoco

Dec 12, 2019, 7:25:36 AM
to Theodore Y. Ts'o, ebig...@gmail.com, adilger...@dilger.ca, bot+eb13811afcefe99cfe...@syzkaller.appspotmail.com, linux...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com


>> It looks like the same stacktrace that was reported in this thread. This has
>> been reported to ppc64el AND we got a reproducer (ocfs2-tools autopkgtests).
> Can you share your reproducer? Is it a super-simple reproducer that
> doesn't require a complex setup and which can be triggered in some
> kind of virtual machine (under KVM, etc.)?

Yep, it's the autopkgtests (debian/tests/*) from ocfs2-tools on bare-metal
ppc64el: a bunch of "mkfs.ocfs2, fsck.ocfs2, debugfs.ocfs2, mount.ocfs2"
commands that test the package. I have access to the same HW that
generated the trace; I'll generate a kdump and share more data soon.
