kernel BUG at fs/btrfs/volumes.c:LINE!

syzbot

Jun 6, 2018, 9:31:03 AM

to c...@fb.com, dst...@suse.com, jba...@fb.com, linux...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com

Hello,

syzbot found the following crash on:

HEAD commit: af6c5d5e01ad Merge branch 'for-4.18' of git://git.kernel.o..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=15f700af800000
kernel config: https://syzkaller.appspot.com/x/.config?x=12ff770540994680
dashboard link: https://syzkaller.appspot.com/bug?extid=5b658d997a83984507a6
compiler: gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+5b658d...@syzkaller.appspotmail.com

RDX: 0000000020000080 RSI: 0000000020000040 RDI: 00007f787067fbf0
RBP: 0000000000000001 R08: 00000000200000c0 R09: 0000000020000080
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000014
R13: 0000000000000001 R14: 0000000000700008 R15: 0000000000000043
------------[ cut here ]------------
kernel BUG at fs/btrfs/volumes.c:1032!
invalid opcode: 0000 [#1] SMP KASAN
CPU: 1 PID: 22303 Comm: syz-executor1 Not tainted 4.17.0+ #86
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:btrfs_prepare_close_one_device fs/btrfs/volumes.c:1032 [inline]
RIP: 0010:close_fs_devices+0xba7/0xfa0 fs/btrfs/volumes.c:1052
Code: 56 18 48 89 f8 48 c1 e8 03 80 3c 18 00 0f 85 2b 03 00 00 49 83 6c 24
30 01 e9 25 f8 ff ff e8 90 f4 b3 fe 0f 0b e8 89 f4 b3 fe <0f> 0b 48 89 f7
e8 ef 64 f0 fe e9 f6 f5 ff ff e8 75 f4 b3 fe 0f 0b
RSP: 0018:ffff8801af6ff050 EFLAGS: 00010246
RAX: 0000000000040000 RBX: dffffc0000000000 RCX: ffffc9000c70c000
RDX: 0000000000040000 RSI: ffffffff82c56437 RDI: 0000000000000286
RBP: ffff8801af6ff350 R08: ffffed003b5e46d7 R09: ffffed003b5e46d6
R10: ffffed003b5e46d6 R11: ffff8801daf236b3 R12: ffff8801c58ac190
R13: 0000000000000000 R14: ffff8801b1a6a940 R15: ffff8801b4d7d680
FS: 00007f7870680700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000704094 CR3: 00000001c51e8000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
btrfs_close_devices+0x29/0x150 fs/btrfs/volumes.c:1085
btrfs_mount_root+0x1419/0x1e70 fs/btrfs/super.c:1610
mount_fs+0xae/0x328 fs/super.c:1277
vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037
vfs_kern_mount+0x40/0x60 fs/namespace.c:1027
btrfs_mount+0x4a1/0x213e fs/btrfs/super.c:1661
mount_fs+0xae/0x328 fs/super.c:1277
vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037
vfs_kern_mount fs/namespace.c:1027 [inline]
do_new_mount fs/namespace.c:2518 [inline]
do_mount+0x564/0x30b0 fs/namespace.c:2848
ksys_mount+0x12d/0x140 fs/namespace.c:3064
__do_sys_mount fs/namespace.c:3078 [inline]
__se_sys_mount fs/namespace.c:3075 [inline]
__x64_sys_mount+0xbe/0x150 fs/namespace.c:3075
do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x45843a
Code: b8 a6 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 dd 8f fb ff c3 66 2e 0f
1f 84 00 00 00 00 00 66 90 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff
ff 0f 83 ba 8f fb ff c3 66 0f 1f 84 00 00 00 00 00
RSP: 002b:00007f787067fba8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 0000000020000080 RCX: 000000000045843a
RDX: 0000000020000080 RSI: 0000000020000040 RDI: 00007f787067fbf0
RBP: 0000000000000001 R08: 00000000200000c0 R09: 0000000020000080
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000014
R13: 0000000000000001 R14: 0000000000700008 R15: 0000000000000043
Modules linked in:
Dumping ftrace buffer:
(ftrace buffer empty)
---[ end trace 383b0406a01f2edd ]---
RIP: 0010:btrfs_prepare_close_one_device fs/btrfs/volumes.c:1032 [inline]
RIP: 0010:close_fs_devices+0xba7/0xfa0 fs/btrfs/volumes.c:1052
Code: 56 18 48 89 f8 48 c1 e8 03 80 3c 18 00 0f 85 2b 03 00 00 49 83 6c 24
30 01 e9 25 f8 ff ff e8 90 f4 b3 fe 0f 0b e8 89 f4 b3 fe <0f> 0b 48 89 f7
e8 ef 64 f0 fe e9 f6 f5 ff ff e8 75 f4 b3 fe 0f 0b
RSP: 0018:ffff8801af6ff050 EFLAGS: 00010246
RAX: 0000000000040000 RBX: dffffc0000000000 RCX: ffffc9000c70c000
RDX: 0000000000040000 RSI: ffffffff82c56437 RDI: 0000000000000286
RBP: ffff8801af6ff350 R08: ffffed003b5e46d7 R09: ffffed003b5e46d6
R10: ffffed003b5e46d6 R11: ffff8801daf236b3 R12: ffff8801c58ac190
R13: 0000000000000000 R14: ffff8801b1a6a940 R15: ffff8801b4d7d680
FS: 00007f7870680700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000704094 CR3: 00000001c51e8000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
syzbot.

Anand Jain

Jun 6, 2018, 12:12:26 PM

to syzbot, c...@fb.com, dst...@suse.com, jba...@fb.com, linux...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com

btrfs_prepare_close_one_device()
::
1031 name = rcu_string_strdup(device->name->str, GFP_NOFS);
1032 BUG_ON(!name); /* -ENOMEM */

The way we close our devices requires new memory allocations
at device close time. Apart from the BUG_ON reported here, this
approach _had_ other complications, like managing the sysfs
links and moving them to the newly allocated btrfs_fs_devices.
Some time back I attempted to replace this approach with a simple
device close that needs no fresh allocation, but it wasn't successful.
I am going to try that again, but it's not P1.
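
For illustration only, here is a minimal userspace sketch of what the
BUG_ON could look like as an ordinary error return. The struct and
function names (demo_named_device, demo_copy_name) are hypothetical and
this is not the btrfs code. It assumes the caller is able to handle
-ENOMEM, which is exactly the hard part in a close path.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct btrfs_device; only the name matters here. */
struct demo_named_device {
	char *name;
};

/*
 * Duplicate src's name into dst.
 * Returns 0 on success, or -ENOMEM if the copy could not be allocated,
 * in which case dst is left untouched.
 */
static int demo_copy_name(struct demo_named_device *dst,
			  const struct demo_named_device *src)
{
	char *name = NULL;

	if (src->name) {
		size_t len = strlen(src->name) + 1;  /* include the NUL */

		name = malloc(len);
		if (!name)
			return -ENOMEM;  /* report the failure instead of crashing */
		memcpy(name, src->name, len);
	}
	dst->name = name;
	return 0;
}
```

The caller would check the return value and unwind. In the close path
there is nothing sensible to unwind to, which is why removing the
allocation altogether looks more attractive than handling its failure.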

Thanks, Anand

David Sterba

Jun 7, 2018, 11:37:38 AM

to Anand Jain, syzbot, c...@fb.com, dst...@suse.com, jba...@fb.com, linux...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com

Yeah, getting rid of the allocations while freeing the device would be
great, but unfortunately it is not simple.

Normally the GFP_NOFS allocations do not fail so I think the fuzzer
environment is tuned to allow that, which is fine for coverage but does
not happen in practice. This will be fixed eventually.

Dmitry Vyukov

Jun 7, 2018, 12:28:23 PM

to dst...@suse.cz, Anand Jain, syzbot, c...@fb.com, dst...@suse.com, Josef Bacik, linux...@vger.kernel.org, LKML, syzkaller-bugs

Isn't GFP_NOFS more restricted than normal allocations? Are these
allocations accounted against memcg? It's easy to fail any allocation
within a memory container.

David Sterba

Jun 7, 2018, 12:55:00 PM

to Dmitry Vyukov, dst...@suse.cz, Anand Jain, syzbot, c...@fb.com, dst...@suse.com, Josef Bacik, linux...@vger.kernel.org, LKML, syzkaller-bugs

On Thu, Jun 07, 2018 at 06:28:02PM +0200, Dmitry Vyukov wrote:
> > Normally the GFP_NOFS allocations do not fail so I think the fuzzer
> > environment is tuned to allow that, which is fine for coverage but does
> > not happen in practice. This will be fixed eventually.
>
> Isn't GFP_NOFS more restricted than normal allocations? Are these
> allocations accounted against memcg? It's easy to fail any allocation
> within a memory container.

See https://lwn.net/Articles/723317/ for 'too small to fail' and some
unwritten semantics of GFP_NOFS, but I think you're right that the
memory controller can fail any allocation.

Error handling is being improved over time. Memory allocation
failures are in some cases hard to handle, and this one would need
some logic updated, so it's not a one-liner.
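
As a rough illustration of how a test environment can force such a
failure, here is a minimal userspace sketch of a fault-injecting
allocator that fails the Nth allocation. All names (demo_alloc,
demo_fault_inject_reset) are hypothetical; this is not the kernel's
fault-injection framework and not what syzkaller runs.

```c
#include <stddef.h>
#include <stdlib.h>

/* Fail the Nth call to demo_alloc() (1-based); 0 means never fail. */
static unsigned long demo_fail_nth;
static unsigned long demo_alloc_count;

/* Arm the injector and restart the call counter. */
static void demo_fault_inject_reset(unsigned long nth)
{
	demo_fail_nth = nth;
	demo_alloc_count = 0;
}

/* Allocation wrapper: behaves like malloc() except for the armed call. */
static void *demo_alloc(size_t size)
{
	demo_alloc_count++;
	if (demo_fail_nth && demo_alloc_count == demo_fail_nth)
		return NULL;  /* injected failure */
	return malloc(size);
}
```

Running the same code path once for each value of N walks through every
allocation site in turn, so an unchecked or BUG_ON-guarded allocation is
hit deterministically rather than only under real memory pressure.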

Eric Biggers

Jun 10, 2019, 7:14:07 PM

to Chris Mason, Josef Bacik, David Sterba, linux...@vger.kernel.org, Dmitry Vyukov, Anand Jain, syzbot, LKML, syzkaller-bugs

This bug is still there. In btrfs_close_one_device():

	if (device->name) {
		name = rcu_string_strdup(device->name->str, GFP_NOFS);
		BUG_ON(!name); /* -ENOMEM */
		rcu_assign_pointer(new_device->name, name);
	}

It assumes that the memory allocation succeeded.

See syzbot report from v5.2-rc3 here: https://syzkaller.appspot.com/text?tag=CrashReport&x=16c839c1a00000

Is there any plan to fix this?

- Eric

David Sterba

Jun 11, 2019, 6:02:13 AM

to Eric Biggers, Chris Mason, Josef Bacik, David Sterba, linux...@vger.kernel.org, Dmitry Vyukov, Anand Jain, syzbot, LKML, syzkaller-bugs

Yes, there is: avoid allocations when closing the device and track
the state in another way. As this has never been reported in practice,
the priority to fix it is rather low, so I can't give you an ETA.
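
To make the idea concrete, here is a minimal userspace sketch of closing
a device by resetting its runtime state in place instead of allocating a
replacement object. The struct layout and names (demo_device,
demo_device_reset) are hypothetical and do not reflect struct
btrfs_device or any actual patch.

```c
#include <string.h>

#define DEMO_NAME_MAX 64

/* Hypothetical device record; the real structure is far larger. */
struct demo_device {
	char name[DEMO_NAME_MAX];   /* identity: must survive close */
	int fd;                     /* open handle, -1 when closed */
	int writeable;              /* runtime state */
	unsigned long generation;   /* runtime state */
};

/*
 * Close by resetting the runtime fields in place. Nothing is
 * allocated, so this path has no -ENOMEM case to handle.
 */
static void demo_device_reset(struct demo_device *dev)
{
	dev->fd = -1;
	dev->writeable = 0;
	dev->generation = 0;
	/* dev->name is deliberately left alone. */
}
```

The trade-off is that any reader still holding a pointer to the object
sees it change underneath, so a real implementation would need its own
answer for concurrent access, which a fresh copy would otherwise provide.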

Johannes Thumshirn

Dec 4, 2019, 9:59:04 AM

to syzbot+5b658d...@syzkaller.appspotmail.com, c...@fb.com, dst...@suse.com, jthum...@suse.de, linux...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com

#syz-test git://git.kernel.org/pub/scm/linux/kernel/git/jth/linux.git
close_fs_devices

Johannes Thumshirn

Dec 5, 2019, 5:00:55 AM

to Johannes Thumshirn, syzbot+5b658d...@syzkaller.appspotmail.com, c...@fb.com, dst...@suse.com, linux...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com

On Wed, Dec 04, 2019 at 03:59:01PM +0100, Johannes Thumshirn wrote:
> #syz-test git://git.kernel.org/pub/scm/linux/kernel/git/jth/linux.git
> close_fs_devices

Ok this doesn't look like it worked, let's retry w/o line wrapping

#syz-test git://git.kernel.org/pub/scm/linux/kernel/git/jth/linux.git close_fs_devices

Dmitry Vyukov

Dec 5, 2019, 5:07:39 AM

to Johannes Thumshirn, Johannes Thumshirn, syzbot, Chris Mason, dst...@suse.com, linux...@vger.kernel.org, LKML, syzkaller-bugs

The correct syntax would be (no dash + colon):

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/jth/linux.git
close_fs_devices

syzbot

Dec 5, 2019, 5:07:40 AM

to Dmitry Vyukov, c...@fb.com, dst...@suse.com, dvy...@google.com, j...@kernel.org, jthum...@suse.de, linux...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com

This crash does not have a reproducer. I cannot test it.

> close_fs_devices

Johannes Thumshirn

Dec 5, 2019, 6:38:42 AM

to Dmitry Vyukov, Johannes Thumshirn, syzbot, Chris Mason, dst...@suse.com, linux...@vger.kernel.org, LKML, syzkaller-bugs

On Thu, Dec 05, 2019 at 11:07:27AM +0100, Dmitry Vyukov wrote:
> The correct syntax would be (no dash + colon):
>
> #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/jth/linux.git
> close_fs_devices

Ah ok, thanks.

Although syzbot already said it can't test because it has no reproducer.
Anyways good to know for future reports.

Byte,
Johannes

David Sterba

Dec 5, 2019, 6:50:40 AM

to Johannes Thumshirn, Dmitry Vyukov, Johannes Thumshirn, syzbot, Chris Mason, dst...@suse.com, linux...@vger.kernel.org, LKML, syzkaller-bugs

According to

https://syzkaller.appspot.com/bug?id=d50670eeb21302915bde3f25871dfb7ea43db1e4

there is a way to test it: there are many reports, the last one about
a week old. Is there a way to instruct syzbot to run the same tests on
a given branch?

(The reproducer is basically setting up an environment with a limited
amount of memory available for allocation, and this hits the BUG_ON.)

Dmitry Vyukov

Dec 5, 2019, 7:06:36 AM

to dst...@suse.cz, Johannes Thumshirn, Dmitry Vyukov, Johannes Thumshirn, syzbot, Chris Mason, dst...@suse.com, linux...@vger.kernel.org, LKML, syzkaller-bugs

syzkaller does this ("rerun the same tests") for every bug always. If
it succeeds (kernel crashes again), it results in a reproducer, that
can later be used for cause/fix bisection and patch testing. In this
case it does not reproduce, so rerunning the same tests will not lead
to anything useful (at most to a false confirmation that a patch fixes
the crash).

There is a large number of reasons why a kernel crash may not
reproduce. It may be global accumulated state, non-hermetic tests,
poor syzkaller btrfs descriptions (most likely true) and others.

I need to take a closer look; at first sight it looks like something
that should be reproducible...

Dmitry Vyukov

Dec 10, 2019, 10:12:08 AM

to dst...@suse.cz, Johannes Thumshirn, Dmitry Vyukov, Johannes Thumshirn, syzbot, Chris Mason, dst...@suse.com, linux...@vger.kernel.org, LKML, syzkaller-bugs

Yes, there was a bug around image mount reproduction. Should be fixed
now by https://github.com/google/syzkaller/commit/cb704a294c54aed90281c016a6dc0c40ae295601

Eric Biggers

Mar 7, 2020, 4:53:35 PM

to syzbot, Johannes Thumshirn, linux...@vger.kernel.org, syzkaller-bugs

On Thu, Dec 05, 2019 at 12:38:38PM +0100, Johannes Thumshirn wrote:

Looks like there was a fix for this merged:

commit 321f69f86a0fc40203b43659c3a39464f15c2ee9
Author: Johannes Thumshirn <jthum...@suse.de>
Date: Wed Dec 4 14:36:39 2019 +0100

btrfs: reset device back to allocation state when removing

So telling syzbot:

#syz fix: btrfs: reset device back to allocation state when removing

In the future, please use the Reported-by line that syzbot suggested in its
original mail, so that bugs get automatically closed.
