[syzbot] [ext4?] KASAN: slab-out-of-bounds Read in ext4_group_desc

syzbot

Feb 2, 2023, 1:54:42 AM

to adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, ty...@mit.edu

Hello,

syzbot found the following issue on:

HEAD commit: c96618275234 Fix up more non-executable files marked execu..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=14287dc1480000
kernel config: https://syzkaller.appspot.com/x/.config?x=c8d5c2ee6c2bd4b8
dashboard link: https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
compiler: Debian clang version 13.0.1-6~deb11u1, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/a829cd39e940/disk-c9661827.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/abbc86f52a98/vmlinux-c9661827.xz
kernel image: https://storage.googleapis.com/syzbot-assets/ab0970dd4f84/bzImage-c9661827.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+8785e4...@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: slab-out-of-bounds in crc16+0x206/0x280 lib/crc16.c:58
Read of size 1 at addr ffff888075f5c0a8 by task syz-executor.2/15586

CPU: 1 PID: 15586 Comm: syz-executor.2 Not tainted 6.2.0-rc5-syzkaller-00205-gc96618275234 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1b1/0x290 lib/dump_stack.c:106
print_address_description+0x74/0x340 mm/kasan/report.c:306
print_report+0x107/0x1f0 mm/kasan/report.c:417
kasan_report+0xcd/0x100 mm/kasan/report.c:517
crc16+0x206/0x280 lib/crc16.c:58
ext4_group_desc_csum+0x81b/0xb20 fs/ext4/super.c:3187
ext4_group_desc_csum_set+0x195/0x230 fs/ext4/super.c:3210
ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
ext4_free_blocks+0x191a/0x2810 fs/ext4/mballoc.c:6173
ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
ext4_ext_remove_space+0x24ef/0x46a0 fs/ext4/extents.c:2958
ext4_ext_truncate+0x177/0x220 fs/ext4/extents.c:4416
ext4_truncate+0xa6a/0xea0 fs/ext4/inode.c:4342
ext4_setattr+0x10c8/0x1930 fs/ext4/inode.c:5622
notify_change+0xe50/0x1100 fs/attr.c:482
do_truncate+0x200/0x2f0 fs/open.c:65
handle_truncate fs/namei.c:3216 [inline]
do_open fs/namei.c:3561 [inline]
path_openat+0x272b/0x2dd0 fs/namei.c:3714
do_filp_open+0x264/0x4f0 fs/namei.c:3741
do_sys_openat2+0x124/0x4e0 fs/open.c:1310
do_sys_open fs/open.c:1326 [inline]
__do_sys_creat fs/open.c:1402 [inline]
__se_sys_creat fs/open.c:1396 [inline]
__x64_sys_creat+0x11f/0x160 fs/open.c:1396
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f72f8a8c0c9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f72f97e3168 EFLAGS: 00000246 ORIG_RAX: 0000000000000055
RAX: ffffffffffffffda RBX: 00007f72f8bac050 RCX: 00007f72f8a8c0c9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000280
RBP: 00007f72f8ae7ae9 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffd165348bf R14: 00007f72f97e3300 R15: 0000000000022000
</TASK>

Allocated by task 5119:
kasan_save_stack mm/kasan/common.c:45 [inline]
kasan_set_track+0x3d/0x60 mm/kasan/common.c:52
__kasan_slab_alloc+0x65/0x70 mm/kasan/common.c:325
kasan_slab_alloc include/linux/kasan.h:201 [inline]
slab_post_alloc_hook mm/slab.h:761 [inline]
slab_alloc_node mm/slub.c:3452 [inline]
slab_alloc mm/slub.c:3460 [inline]
__kmem_cache_alloc_lru mm/slub.c:3467 [inline]
kmem_cache_alloc+0x1b3/0x350 mm/slub.c:3476
kmem_cache_zalloc include/linux/slab.h:710 [inline]
__kernfs_new_node+0xdb/0x730 fs/kernfs/dir.c:614
kernfs_new_node+0x95/0x160 fs/kernfs/dir.c:676
__kernfs_create_file+0x45/0x2e0 fs/kernfs/file.c:1047
sysfs_add_file_mode_ns+0x21d/0x330 fs/sysfs/file.c:294
create_files fs/sysfs/group.c:64 [inline]
internal_create_group+0x508/0xde0 fs/sysfs/group.c:148
internal_create_groups fs/sysfs/group.c:188 [inline]
sysfs_create_groups+0x5d/0x130 fs/sysfs/group.c:214
create_dir lib/kobject.c:68 [inline]
kobject_add_internal+0x723/0xd10 lib/kobject.c:223
kobject_add_varg lib/kobject.c:358 [inline]
kobject_init_and_add+0x104/0x160 lib/kobject.c:441
netdev_queue_add_kobject net/core/net-sysfs.c:1666 [inline]
netdev_queue_update_kobjects+0x20c/0x4c0 net/core/net-sysfs.c:1718
register_queue_kobjects net/core/net-sysfs.c:1779 [inline]
netdev_register_kobject+0x263/0x310 net/core/net-sysfs.c:2019
register_netdevice+0x1043/0x17a0 net/core/dev.c:10045
bond_newlink+0x3f/0x90 drivers/net/bonding/bond_netlink.c:560
rtnl_newlink_create net/core/rtnetlink.c:3407 [inline]
__rtnl_newlink net/core/rtnetlink.c:3624 [inline]
rtnl_newlink+0x14b3/0x2020 net/core/rtnetlink.c:3637
rtnetlink_rcv_msg+0x822/0xf10 net/core/rtnetlink.c:6141
netlink_rcv_skb+0x1f0/0x470 net/netlink/af_netlink.c:2574
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x7e7/0x9c0 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x9b3/0xcd0 net/netlink/af_netlink.c:1942
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg net/socket.c:734 [inline]
__sys_sendto+0x46e/0x5f0 net/socket.c:2117
__do_sys_sendto net/socket.c:2129 [inline]
__se_sys_sendto net/socket.c:2125 [inline]
__x64_sys_sendto+0xda/0xf0 net/socket.c:2125
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

The buggy address belongs to the object at ffff888075f5c000
which belongs to the cache kernfs_node_cache of size 168
The buggy address is located 0 bytes to the right of
168-byte region [ffff888075f5c000, ffff888075f5c0a8)

The buggy address belongs to the physical page:
page:ffffea0001d7d700 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x75f5c
flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000000200 ffff8880129ebc80 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000110011 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 5119, tgid 5119 (syz-executor.3), ts 232703738304, free_ts 232703424583
prep_new_page mm/page_alloc.c:2531 [inline]
get_page_from_freelist+0x742/0x7c0 mm/page_alloc.c:4283
__alloc_pages+0x259/0x560 mm/page_alloc.c:5549
alloc_slab_page+0xbd/0x190 mm/slub.c:1851
allocate_slab+0x5e/0x3c0 mm/slub.c:1998
new_slab mm/slub.c:2051 [inline]
___slab_alloc+0x782/0xe20 mm/slub.c:3193
__slab_alloc mm/slub.c:3292 [inline]
__slab_alloc_node mm/slub.c:3345 [inline]
slab_alloc_node mm/slub.c:3442 [inline]
slab_alloc mm/slub.c:3460 [inline]
__kmem_cache_alloc_lru mm/slub.c:3467 [inline]
kmem_cache_alloc+0x268/0x350 mm/slub.c:3476
kmem_cache_zalloc include/linux/slab.h:710 [inline]
__kernfs_new_node+0xdb/0x730 fs/kernfs/dir.c:614
kernfs_new_node+0x95/0x160 fs/kernfs/dir.c:676
__kernfs_create_file+0x45/0x2e0 fs/kernfs/file.c:1047
sysfs_add_file_mode_ns+0x21d/0x330 fs/sysfs/file.c:294
create_files fs/sysfs/group.c:64 [inline]
internal_create_group+0x508/0xde0 fs/sysfs/group.c:148
internal_create_groups fs/sysfs/group.c:188 [inline]
sysfs_create_groups+0x5d/0x130 fs/sysfs/group.c:214
create_dir lib/kobject.c:68 [inline]
kobject_add_internal+0x723/0xd10 lib/kobject.c:223
kobject_add_varg lib/kobject.c:358 [inline]
kobject_init_and_add+0x104/0x160 lib/kobject.c:441
netdev_queue_add_kobject net/core/net-sysfs.c:1666 [inline]
netdev_queue_update_kobjects+0x20c/0x4c0 net/core/net-sysfs.c:1718
register_queue_kobjects net/core/net-sysfs.c:1779 [inline]
netdev_register_kobject+0x263/0x310 net/core/net-sysfs.c:2019
page last free stack trace:
reset_page_owner include/linux/page_owner.h:24 [inline]
free_pages_prepare mm/page_alloc.c:1446 [inline]
free_pcp_prepare+0x751/0x780 mm/page_alloc.c:1496
free_unref_page_prepare mm/page_alloc.c:3369 [inline]
free_unref_page+0x19/0x4c0 mm/page_alloc.c:3464
qlist_free_all+0x2b/0x70 mm/kasan/quarantine.c:187
kasan_quarantine_reduce+0x156/0x170 mm/kasan/quarantine.c:294
__kasan_slab_alloc+0x1f/0x70 mm/kasan/common.c:302
kasan_slab_alloc include/linux/kasan.h:201 [inline]
slab_post_alloc_hook mm/slab.h:761 [inline]
slab_alloc_node mm/slub.c:3452 [inline]
__kmem_cache_alloc_node+0x1e0/0x340 mm/slub.c:3491
kmalloc_trace+0x26/0x60 mm/slab_common.c:1062
kmalloc include/linux/slab.h:580 [inline]
kzalloc include/linux/slab.h:720 [inline]
ref_tracker_alloc+0x128/0x440 lib/ref_tracker.c:85
__netdev_tracker_alloc include/linux/netdevice.h:4020 [inline]
netdev_hold include/linux/netdevice.h:4049 [inline]
rx_queue_add_kobject net/core/net-sysfs.c:1060 [inline]
net_rx_queue_update_kobjects+0x15d/0x4c0 net/core/net-sysfs.c:1114
register_queue_kobjects net/core/net-sysfs.c:1774 [inline]
netdev_register_kobject+0x222/0x310 net/core/net-sysfs.c:2019
register_netdevice+0x1043/0x17a0 net/core/dev.c:10045
bond_newlink+0x3f/0x90 drivers/net/bonding/bond_netlink.c:560
rtnl_newlink_create net/core/rtnetlink.c:3407 [inline]
__rtnl_newlink net/core/rtnetlink.c:3624 [inline]
rtnl_newlink+0x14b3/0x2020 net/core/rtnetlink.c:3637
rtnetlink_rcv_msg+0x822/0xf10 net/core/rtnetlink.c:6141
netlink_rcv_skb+0x1f0/0x470 net/netlink/af_netlink.c:2574
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x7e7/0x9c0 net/netlink/af_netlink.c:1365

Memory state around the buggy address:
ffff888075f5bf80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888075f5c000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff888075f5c080: 00 00 00 00 00 fc fc fc fc fc fc fc fc 00 00 00
^
ffff888075f5c100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff888075f5c180: 00 00 fc fc fc fc fc fc fc fc 00 00 00 00 00 00
==================================================================
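
For readers decoding the report above, the address arithmetic can be checked with a small sketch. It assumes generic KASAN, where one shadow byte covers an 8-byte granule and each row of the "Memory state" dump shows 16 shadow bytes. The function names are hypothetical and are not part of the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Generic KASAN: one shadow byte describes one 8-byte granule of memory. */
#define KASAN_GRANULE_SIZE 8u

/* Offset of the faulting address from the start of the slab object. */
uint64_t offset_in_object(uint64_t fault_addr, uint64_t object_start)
{
	assert(fault_addr >= object_start);
	return fault_addr - object_start;
}

/* Zero-based index of the shadow byte, within one dump row, covering addr. */
unsigned int shadow_index_in_row(uint64_t addr, uint64_t row_start)
{
	assert(addr >= row_start);
	return (unsigned int)((addr - row_start) / KASAN_GRANULE_SIZE);
}
```

With the values from the report, offset_in_object(0xffff888075f5c0a8, 0xffff888075f5c000) is 168, which equals the kernfs_node_cache object size, so the read touched the first byte past the object. shadow_index_in_row(0xffff888075f5c0a8, 0xffff888075f5c080) is 5, which is the first "fc" (redzone) byte in the marked row.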

---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

syzbot

Feb 13, 2023, 10:56:47 AM

to adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, ty...@mit.edu

syzbot has found a reproducer for the following issue on:

HEAD commit: ceaa837f96ad Linux 6.2-rc8
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=11727cc7480000
kernel config: https://syzkaller.appspot.com/x/.config?x=42ba4da8e1e6af9f
dashboard link: https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=14392a4f480000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/88042f9b5fc8/disk-ceaa837f.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/9945b57ec9ee/vmlinux-ceaa837f.xz
kernel image: https://storage.googleapis.com/syzbot-assets/72ff118ed96b/bzImage-ceaa837f.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/dabec17b2679/mount_0.gz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+8785e4...@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: use-after-free in crc16+0x1fb/0x280 lib/crc16.c:58
Read of size 1 at addr ffff88807de00000 by task syz-executor.1/5339

CPU: 1 PID: 5339 Comm: syz-executor.1 Not tainted 6.2.0-rc8-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:306 [inline]
print_report+0x163/0x4f0 mm/kasan/report.c:417
kasan_report+0x13a/0x170 mm/kasan/report.c:517
crc16+0x1fb/0x280 lib/crc16.c:58
ext4_group_desc_csum+0x90f/0xc50 fs/ext4/super.c:3187
ext4_group_desc_csum_set+0x19b/0x240 fs/ext4/super.c:3210
ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
ext4_free_blocks+0x1c57/0x3010 fs/ext4/mballoc.c:6173
ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
ext4_ext_remove_space+0x289e/0x5270 fs/ext4/extents.c:2958
ext4_ext_truncate+0x176/0x210 fs/ext4/extents.c:4416
ext4_truncate+0xafa/0x1450 fs/ext4/inode.c:4342
ext4_evict_inode+0xc40/0x1230 fs/ext4/inode.c:286
evict+0x2a4/0x620 fs/inode.c:664
do_unlinkat+0x4f1/0x930 fs/namei.c:4327
__do_sys_unlink fs/namei.c:4368 [inline]
__se_sys_unlink fs/namei.c:4366 [inline]
__x64_sys_unlink+0x49/0x50 fs/namei.c:4366
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fbc85a8c0f9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fbc86838168 EFLAGS: 00000246 ORIG_RAX: 0000000000000057
RAX: ffffffffffffffda RBX: 00007fbc85babf80 RCX: 00007fbc85a8c0f9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000000
RBP: 00007fbc85ae7ae9 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffd5743beaf R14: 00007fbc86838300 R15: 0000000000022000
</TASK>

The buggy address belongs to the physical page:
page:ffffea0001f78000 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x7de00
flags: 0xfff00000000000(node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000000000 ffffea0001f86008 ffffea0001db2a08 0000000000000000
raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as freed
page last allocated via order 1, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 4855, tgid 4855 (sshd), ts 43553490210, free_ts 58249059760
prep_new_page mm/page_alloc.c:2531 [inline]
get_page_from_freelist+0x3449/0x35c0 mm/page_alloc.c:4283
__alloc_pages+0x291/0x7e0 mm/page_alloc.c:5549
alloc_slab_page+0x6a/0x160 mm/slub.c:1851
allocate_slab mm/slub.c:1998 [inline]
new_slab+0x84/0x2f0 mm/slub.c:2051
___slab_alloc+0xa85/0x10a0 mm/slub.c:3193
__kmem_cache_alloc_bulk mm/slub.c:3951 [inline]
kmem_cache_alloc_bulk+0x160/0x430 mm/slub.c:4026
mt_alloc_bulk lib/maple_tree.c:157 [inline]
mas_alloc_nodes+0x381/0x640 lib/maple_tree.c:1257
mas_node_count_gfp lib/maple_tree.c:1316 [inline]
mas_preallocate+0x131/0x350 lib/maple_tree.c:5724
vma_expand+0x277/0x850 mm/mmap.c:541
mmap_region+0xc43/0x1fb0 mm/mmap.c:2592
do_mmap+0x8c9/0xf70 mm/mmap.c:1411
vm_mmap_pgoff+0x1ce/0x2e0 mm/util.c:520
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

page last free stack trace:
reset_page_owner include/linux/page_owner.h:24 [inline]
free_pages_prepare mm/page_alloc.c:1446 [inline]
free_pcp_prepare mm/page_alloc.c:1496 [inline]
free_unref_page_prepare+0xf3a/0x1040 mm/page_alloc.c:3369
free_unref_page+0x37/0x3f0 mm/page_alloc.c:3464
qlist_free_all+0x22/0x60 mm/kasan/quarantine.c:187
kasan_quarantine_reduce+0x15a/0x170 mm/kasan/quarantine.c:294
__kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:302
kasan_slab_alloc include/linux/kasan.h:201 [inline]
slab_post_alloc_hook+0x68/0x390 mm/slab.h:761
slab_alloc_node mm/slub.c:3452 [inline]
kmem_cache_alloc_node+0x158/0x2c0 mm/slub.c:3497
__alloc_skb+0xd6/0x2d0 net/core/skbuff.c:552
alloc_skb include/linux/skbuff.h:1270 [inline]
alloc_skb_with_frags+0xa8/0x750 net/core/skbuff.c:6194
sock_alloc_send_pskb+0x919/0xa50 net/core/sock.c:2743
unix_dgram_sendmsg+0x5b5/0x2050 net/unix/af_unix.c:1943
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg net/socket.c:734 [inline]
__sys_sendto+0x475/0x5f0 net/socket.c:2117
__do_sys_sendto net/socket.c:2129 [inline]
__se_sys_sendto net/socket.c:2125 [inline]
__x64_sys_sendto+0xde/0xf0 net/socket.c:2125
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Memory state around the buggy address:
ffff88807ddfff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff88807ddfff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88807de00000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
^
ffff88807de00080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff88807de00100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================

Tudor Ambarus

Mar 1, 2023, 7:13:55 AM

to syzbot, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, ty...@mit.edu, Lee Jones

Hi!

I think the patch from below should fix it.

I printed le16_to_cpu(sbi->s_es->s_desc_size) and it was greater than
EXT4_MAX_DESC_SIZE. What I think happens is that the contents of the
super block in the buffer get corrupted sometime after .get_tree
(which eventually calls __ext4_fill_super()) is called. So instead of
relying on the contents of the buffer, we should rely on the
s_desc_size initialized at __ext4_fill_super() time.

If someone finds this good (or bad), or has a more in-depth explanation,
please let me know; it will help me better understand the subsystem. In
the meantime I'll continue to investigate this and prepare a patch for
it.

Cheers,
ta

index 260c1b3e3ef2..91d41e84da32 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3182,11 +3182,9 @@ static __le16 ext4_group_desc_csum(struct super_block *sb, __u32 block_group,
 	crc = crc16(crc, (__u8 *)gdp, offset);
 	offset += sizeof(gdp->bg_checksum); /* skip checksum */
 	/* for checksum of struct ext4_group_desc do the rest...*/
-	if (ext4_has_feature_64bit(sb) &&
-	    offset < le16_to_cpu(sbi->s_es->s_desc_size))
+	if (ext4_has_feature_64bit(sb) && offset < sbi->s_desc_size)
 		crc = crc16(crc, (__u8 *)gdp + offset,
-			    le16_to_cpu(sbi->s_es->s_desc_size) -
-				offset);
+			    sbi->s_desc_size - offset);

 out:
 	return cpu_to_le16(crc);
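
The distinction the patch above relies on can be shown with a small user-space sketch. This is only a toy model under stated assumptions: the struct, the function names and the size bounds are hypothetical stand-ins, and endianness conversion (le16_to_cpu) is left out. It shows that a length re-read from a shared buffer can change after validation, while a copy cached at mount time cannot. Whether caching is the right fix for ext4 is a separate question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy bounds for the sketch; stand-ins for ext4's descriptor size limits. */
#define TOY_MIN_DESC_SIZE 32u
#define TOY_MAX_DESC_SIZE 1024u

/* Hypothetical model of the two places a descriptor size can come from. */
struct toy_sb_info {
	const uint16_t *raw_desc_size; /* field inside the shared superblock buffer */
	uint16_t cached_desc_size;     /* private copy taken once validation passed */
};

/* Validate the raw field once, as a mount path would, and cache it. */
int toy_mount(struct toy_sb_info *sbi, const uint16_t *raw)
{
	assert(sbi != NULL && raw != NULL);
	if (*raw < TOY_MIN_DESC_SIZE || *raw > TOY_MAX_DESC_SIZE)
		return -1;
	sbi->raw_desc_size = raw;
	sbi->cached_desc_size = *raw;
	return 0;
}

/* Length a checksum routine would cover if it re-reads the shared buffer. */
size_t toy_csum_len_from_buffer(const struct toy_sb_info *sbi)
{
	return *sbi->raw_desc_size;
}

/* Length it would cover if it uses the value validated at mount time. */
size_t toy_csum_len_from_cache(const struct toy_sb_info *sbi)
{
	return sbi->cached_desc_size;
}
```

If the shared field is overwritten with 0xffff after toy_mount() succeeded with 64, toy_csum_len_from_buffer() reports 65535 while toy_csum_len_from_cache() still reports 64.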

syzbot

Mar 3, 2023, 4:43:38 PM

to adilger...@dilger.ca, jone...@google.com, linux...@vger.kernel.org, linux-...@vger.kernel.org, linux-...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, tudor....@linaro.org, ty...@mit.edu

syzbot has found a reproducer for the following issue on:

HEAD commit: 596b6b709632 Merge branch 'for-next/core' into for-kernelci
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
console output: https://syzkaller.appspot.com/x/log.txt?x=1151054cc80000
kernel config: https://syzkaller.appspot.com/x/.config?x=3519974f3f27816d
dashboard link: https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2
userspace arch: arm64
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16ce3de4c80000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=16b02598c80000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/06e2210b88a3/disk-596b6b70.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/79e6930ab577/vmlinux-596b6b70.xz
kernel image: https://storage.googleapis.com/syzbot-assets/56b95e6bcb5c/Image-596b6b70.gz.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/a765d6554060/mount_0.gz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+8785e4...@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: slab-out-of-bounds in crc16+0xc0/0x104 lib/crc16.c:58
Read of size 1 at addr ffff0000d5eff0a8 by task syz-executor175/8245

CPU: 1 PID: 8245 Comm: syz-executor175 Not tainted 6.2.0-syzkaller-18302-g596b6b709632 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023
Call trace:
dump_backtrace+0x1c8/0x1f4 arch/arm64/kernel/stacktrace.c:158
show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:165
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd0/0x124 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:306 [inline]
print_report+0x174/0x4c0 mm/kasan/report.c:417
kasan_report+0xd4/0x130 mm/kasan/report.c:517
__asan_report_load1_noabort+0x2c/0x38 mm/kasan/report_generic.c:348
crc16+0xc0/0x104 lib/crc16.c:58
ext4_group_desc_csum+0x6a8/0x99c fs/ext4/super.c:3187
ext4_group_desc_csum_set+0x17c/0x210 fs/ext4/super.c:3210
__ext4_new_inode+0x20dc/0x3acc fs/ext4/ialloc.c:1227
ext4_create+0x234/0x480 fs/ext4/namei.c:2809
lookup_open fs/namei.c:3413 [inline]
open_last_lookups fs/namei.c:3481 [inline]
path_openat+0xe6c/0x2578 fs/namei.c:3711
do_filp_open+0x1bc/0x3cc fs/namei.c:3741
do_sys_openat2+0x128/0x3d8 fs/open.c:1310
do_sys_open fs/open.c:1326 [inline]
__do_sys_openat fs/open.c:1342 [inline]
__se_sys_openat fs/open.c:1337 [inline]
__arm64_sys_openat+0x1f0/0x240 fs/open.c:1337
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall+0x98/0x2c0 arch/arm64/kernel/syscall.c:52
el0_svc_common+0x138/0x258 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:193
el0_svc+0x58/0x168 arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591

Allocated by task 5961:
kasan_save_stack mm/kasan/common.c:45 [inline]
kasan_set_track+0x4c/0x80 mm/kasan/common.c:52
kasan_save_alloc_info+0x24/0x30 mm/kasan/generic.c:512
__kasan_slab_alloc+0x74/0x8c mm/kasan/common.c:328
kasan_slab_alloc include/linux/kasan.h:201 [inline]
slab_post_alloc_hook+0x80/0x478 mm/slab.h:761
slab_alloc_node mm/slub.c:3452 [inline]
slab_alloc mm/slub.c:3460 [inline]
__kmem_cache_alloc_lru mm/slub.c:3467 [inline]
kmem_cache_alloc+0x288/0x37c mm/slub.c:3476
kmem_cache_zalloc include/linux/slab.h:710 [inline]
__kernfs_new_node+0xe4/0x66c fs/kernfs/dir.c:614
kernfs_new_node+0x98/0x184 fs/kernfs/dir.c:676
__kernfs_create_file+0x60/0x2d4 fs/kernfs/file.c:1047
sysfs_add_file_mode_ns+0x1dc/0x298 fs/sysfs/file.c:294
create_files fs/sysfs/group.c:64 [inline]
internal_create_group+0x428/0xbec fs/sysfs/group.c:148
internal_create_groups fs/sysfs/group.c:188 [inline]
sysfs_create_groups+0x60/0x130 fs/sysfs/group.c:214
create_dir lib/kobject.c:68 [inline]
kobject_add_internal+0x5d4/0xb14 lib/kobject.c:223
kobject_add_varg lib/kobject.c:358 [inline]
kobject_init_and_add+0x130/0x1a0 lib/kobject.c:441
netdev_queue_add_kobject net/core/net-sysfs.c:1666 [inline]
netdev_queue_update_kobjects+0x1d8/0x470 net/core/net-sysfs.c:1718
register_queue_kobjects net/core/net-sysfs.c:1779 [inline]
netdev_register_kobject+0x22c/0x2d8 net/core/net-sysfs.c:2019
register_netdevice+0xcb8/0x1270 net/core/dev.c:10037
bond_newlink+0x50/0xa8 drivers/net/bonding/bond_netlink.c:560
rtnl_newlink_create net/core/rtnetlink.c:3407 [inline]
__rtnl_newlink net/core/rtnetlink.c:3624 [inline]
rtnl_newlink+0x1174/0x1b1c net/core/rtnetlink.c:3637
rtnetlink_rcv_msg+0x6ec/0xc8c net/core/rtnetlink.c:6141
netlink_rcv_skb+0x214/0x3c4 net/netlink/af_netlink.c:2574
rtnetlink_rcv+0x28/0x38 net/core/rtnetlink.c:6159
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x660/0x8d4 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x800/0xae0 net/netlink/af_netlink.c:1942
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg net/socket.c:734 [inline]
__sys_sendto+0x3b4/0x504 net/socket.c:2120
__do_sys_sendto net/socket.c:2132 [inline]
__se_sys_sendto net/socket.c:2128 [inline]
__arm64_sys_sendto+0xd8/0xf8 net/socket.c:2128
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall+0x98/0x2c0 arch/arm64/kernel/syscall.c:52
el0_svc_common+0x138/0x258 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:193
el0_svc+0x58/0x168 arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591

The buggy address belongs to the object at ffff0000d5eff000
which belongs to the cache kernfs_node_cache of size 168
The buggy address is located 0 bytes to the right of
168-byte region [ffff0000d5eff000, ffff0000d5eff0a8)

The buggy address belongs to the physical page:
page:0000000016584f53 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x115eff
flags: 0x5ffc00000000200(slab|node=0|zone=2|lastcpupid=0x7ff)
raw: 05ffc00000000200 ffff0000c0844c00 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000110011 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff0000d5efef80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff0000d5eff000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff0000d5eff080: 00 00 00 00 00 fc fc fc fc fc fc fc fc 00 00 00
^
ffff0000d5eff100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff0000d5eff180: 00 00 fc fc fc fc fc fc fc fc 00 00 00 00 00 00
==================================================================
EXT4-fs error (device loop3): __ext4_get_inode_loc:4560: comm syz-executor175: Invalid inode table block 4 in block_group 0
EXT4-fs error (device loop3) in ext4_reserve_inode_write:5906: Corrupt filesystem
EXT4-fs error (device loop3): __ext4_get_inode_loc:4560: comm syz-executor175: Invalid inode table block 4 in block_group 0
EXT4-fs error (device loop3) in ext4_reserve_inode_write:5906: Corrupt filesystem
EXT4-fs error (device loop3): ext4_evict_inode:279: inode #18: comm syz-executor175: mark_inode_dirty error
EXT4-fs warning (device loop3): ext4_evict_inode:282: couldn't mark inode dirty (err -117)

Jan Kara

Mar 7, 2023, 5:40:00 AM

to Tudor Ambarus, syzbot, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, ty...@mit.edu, Lee Jones

Hi!

If there's something corrupting the superblock while the filesystem is
mounted, we need to find what is corrupting the SB and fix *that*. Not try
to paper over the problem by not using the on-disk data... Maybe journal
replay is corrupting the value or something like that?

Honza

> index 260c1b3e3ef2..91d41e84da32 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -3182,11 +3182,9 @@ static __le16 ext4_group_desc_csum(struct super_block *sb, __u32 block_group,
> crc = crc16(crc, (__u8 *)gdp, offset);
> offset += sizeof(gdp->bg_checksum); /* skip checksum */
> /* for checksum of struct ext4_group_desc do the rest...*/
> - if (ext4_has_feature_64bit(sb) &&
> - offset < le16_to_cpu(sbi->s_es->s_desc_size))
> + if (ext4_has_feature_64bit(sb) && offset < sbi->s_desc_size)
> crc = crc16(crc, (__u8 *)gdp + offset,
> - le16_to_cpu(sbi->s_es->s_desc_size) -
> - offset);
> + sbi->s_desc_size - offset);
>
> out:
> return cpu_to_le16(crc);

--
Jan Kara <ja...@suse.com>
SUSE Labs, CR

Tudor Ambarus

Mar 7, 2023, 6:02:53 AM

to Jan Kara, syzbot, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, ty...@mit.edu, Lee Jones

On 3/7/23 10:39, Jan Kara wrote:
> Hi!

Hi!

Thanks for taking the time to review the proposal!

Ok, I agree. First thing would be to understand the reproducer and to
simplify it if possible. I haven't yet decoded what the syz repro is
doing at
https://syzkaller.appspot.com/text?tag=ReproSyz&x=16ce3de4c80000
Will reply to this email thread once I understand what's happening. If
you or someone else can decode the syz repro faster than me, shoot.

Cheers,
ta

Tudor Ambarus

Mar 13, 2023, 7:11:36 AM

to Jan Kara, syzbot, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, ty...@mit.edu, Lee Jones

Hi, Jan,

I can now explain how the contents of the super block in the buffer get
corrupted. After the ext4 fs is mounted to the target ("./bus"), the
reproducer maps 6MB of data starting at offset 0 in the target's file
("./bus"), then it starts overwriting the data with something else,
using memcpy, memset, and individual byte writes. Does that mean that we
shouldn't rely on the contents of the super block in the buffer after we
mount the file system? If so, then my patch stands. I'll be happy to
extend it if needed. Below is a step-by-step interpretation of
the reproducer.

We have a strace log for the same bug, but on Android 5.15:
https://syzkaller.appspot.com/text?tag=CrashLog&x=14ecec8cc80000

Look for pid 328. You'll notice that the bpf() syscalls return errors, so
I commented them out in the C repro to confirm that they are not the
cause. The bug reproduced without the bpf() calls. The C repro is at:
https://syzkaller.appspot.com/text?tag=ReproC&x=17c5fc50c80000

Let's look at these calls, just before the bug was hit:
[pid 328] open("./bus", O_RDWR|O_CREAT|O_TRUNC|O_NONBLOCK|O_SYNC|O_DIRECT|O_LARGEFILE|O_NOATIME, 000) = 4
[pid 328] mount("/dev/loop0", "./bus", NULL, MS_BIND, NULL) = 0
[pid 328] open("./bus", O_RDWR|O_SYNC|O_NOATIME|0x3c) = 5
[pid 328] mmap(0x20000000, 6291456, PROT_READ|PROT_WRITE|PROT_EXEC|PROT_SEM|0x47ffff0, MAP_SHARED|MAP_FIXED, 5, 0) = 0x20000000

- ./bus is created (if it does not exist), fd 4 is returned.
- /dev/loop0 is mounted to ./bus
- then it creates a new file descriptor (5) for the same ./bus
- then it creates a mapping for ./bus starting at offset zero. The
mapped area is at 0x20000000 and is 0x600000 bytes (6 MiB) long.

Now look again at the C reproducer. You'll see that after the mapping,
lots of bytes are overwritten starting at 0x20000300. If I comment out
all those byte modifications after the mmap, the bug no longer reproduces.
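
The mechanism described above, a store through a MAP_SHARED mapping becoming visible to anyone reading the same file, can be sketched safely in user space. This is an illustration only: it uses an ordinary temporary file instead of a mounted block device, and the function name and the /tmp path template are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map a fresh temporary file of `len` bytes with MAP_SHARED, store `value`
 * at offset `off` through the mapping, then read the same offset back
 * through the file descriptor. Returns the byte read, or -1 on error.
 */
int overwrite_via_mapping(size_t len, size_t off, unsigned char value)
{
	char path[] = "/tmp/mmap-sketch-XXXXXX";
	unsigned char byte = 0;
	int result = -1;
	int fd;

	assert(off < len);
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path); /* the file disappears once fd is closed */

	if (ftruncate(fd, (off_t)len) == 0) {
		unsigned char *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
					  MAP_SHARED, fd, 0);
		if (map != MAP_FAILED) {
			map[off] = value; /* plain store, no write() call */
			if (pread(fd, &byte, 1, (off_t)off) == 1)
				result = byte;
			munmap(map, len);
		}
	}
	close(fd);
	return result;
}
```

For example, overwrite_via_mapping(4096, 0x300, 0xAB) returns 0xAB on Linux: the reader sees the byte that was stored through the mapping. In the reproducer the mapped file is the block device backing the mounted filesystem, so the "reader" is ext4 itself.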

Jan Kara

Mar 13, 2023, 7:57:30 AM

to Tudor Ambarus, Jan Kara, syzbot, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, ty...@mit.edu, Lee Jones

Hi Tudor!

Yeah. Looking at the reproducer, before this it also mounts
/dev/loop0 as an ext4 filesystem.

> - ./bus is created (if it does not exist), fd 4 is returned.
> - /dev/loop0 is mounted to ./bus
> - then it creates a new file descriptor (5) for the same ./bus
> - then it creates a mapping for ./bus starting at offset zero. The
> mapped area is at 0x20000000 and is 0x600000 bytes (6 MiB) long.

So the result is that the reproducer modifies the block device while it is
mounted by the filesystem. We know cases like this can crash the kernel and
they are inherently difficult to fix. We have to trust the buffer cache
contents as otherwise the performance will be unacceptable. For historical
reasons we also have to allow modifications of the buffer cache while ext4
is mounted because tune2fs uses this to e.g. update the label of a mounted
filesystem.

Long-term we are moving ext4 in a direction where we can disallow block
device modifications while the fs is mounted but we are not there yet. I've
discussed a shorter-term solution to avoid such known problems with syzbot
developers and what seems plausible would be a kconfig option to disallow
writing to a block device when it is exclusively open by someone else.
But so far I haven't gotten to trying whether this would reasonably work.
Would you be interested in having a look into this?

Honza

yebin

Mar 13, 2023, 8:33:05 AM

to Jan Kara, Tudor Ambarus, syzbot, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, ty...@mit.edu, Lee Jones

I am interested in this job. The file system is often damaged by writes to
the block device, which is a headache. I have always wanted to eradicate
this kind of problem. A few months ago, I tried to add a mount parameter to
prohibit modification after the block device is mounted. But I encountered
several problems that led to the termination of my attempt. First of all,
the 32-bit super block flags have been used up and need to be extended.
Secondly, I don't know how to handle the read-only flag in the case of
multiple mount points.

"disallow writing to a block device when it is exclusively open by
someone else."

-> Perhaps we can add a new ioctl command to control whether write
operations are allowed after the block device has been exclusively opened.
I don't know if this is feasible. Do you have any good suggestions?
> Honza

Jan Kara

Mar 13, 2023, 9:01:55 AM

to yebin, Jan Kara, Tudor Ambarus, syzbot, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, ty...@mit.edu, Lee Jones

Well, ioctl() for syzbot would be possible as well but for a start I'd try
whether the idea with the kconfig option will work. Then it will be enough to
just make sure all kernels used for fuzzing are built with this option set.
Thanks for having a look into this!

yebin (H)

Mar 13, 2023, 9:20:36 AM

to Jan Kara, yebin, Tudor Ambarus, syzbot, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, ty...@mit.edu, Lee Jones

In fact, I also want to solve the problem of file system damage caused by
writes to raw disks in the production environment. Controlling this
directly with kconfig loses flexibility in the production environment.
>
> Honza

Tudor Ambarus

Mar 13, 2023, 10:43:43 AM

to Jan Kara, yebin, syzbot, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, ty...@mit.edu, Lee Jones

Hi, Jan, Ye,

sounds good.

>>> discussed some shorter-term solution to avoid such known problems with syzbot
>>> developers and what seems plausible would be a kconfig option to disallow
>>> writing to a block device when it is exclusively open by someone else.

How do we determine when a block device is exclusively open by someone else?

>>> But so far I didn't get to trying whether this would reasonably work. Would
>>> you be interested in having a look into this?
>>
>> I am interested in this job. The file system is often damaged by writing

I'm fine with Ye handling this. If that's not the case I can take a look
too, but I need more pointers than the ones already provided, as I've
recently started skimming over ext4.

>> block devices, which is a headache. I have always wanted to eradicate
>> this kind of problem. A few months ago, I tried to add a mount parameter
>> to prohibit modification after the block device is mounted. But I
>> encountered several problems that led to the termination of my attempt.
>> First of all, the 32-bit super block flags have been used up and need to
>> be extended. Secondly, I don't know how to handle read-only flag in the
>> case of multiple mount points.
>> "disallow writing to a block device when it is exclusively open by someone
>> else. "
>> -> Perhaps we can add a new IOCTL command to control whether write
>> operations are allowed after the block device has been exclusively
>> opened. I don't know if this is feasible? Do you have any good
>> suggestions?
>
> Well, ioctl() for syzbot would be possible as well but for start I'd try
> whether the idea with kconfig option will work. Then it will be enough to
> just make sure all kernels used for fuzzing are built with this option set.

How should we treat such bugs until the kconfig option is introduced? Do
we leave them open, or do we mark them as won't fix? The kconfig solution
feels a bit like a workaround; the bugs will still be hit by anyone not
selecting that config option.

Cheers,
ta

Dmitry Vyukov

Mar 13, 2023, 10:54:12 AM

to Jan Kara, Tudor Ambarus, syzbot, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, ty...@mit.edu, Lee Jones

Hi Jan,

Does this affect only the loop device or also USB storage devices?
Say, if the USB device returns different contents during mount and on
subsequent reads?

Theodore Ts'o

Mar 13, 2023, 10:26:59 PM

to Dmitry Vyukov, Jan Kara, Tudor Ambarus, syzbot, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, Lee Jones

On Mon, Mar 13, 2023 at 03:53:57PM +0100, Dmitry Vyukov wrote:
> > Long-term we are moving ext4 in a direction where we can disallow block
> > device modifications while the fs is mounted but we are not there yet. I've
> > discussed some shorter-term solution to avoid such known problems with syzbot
> > developers and what seems plausible would be a kconfig option to disallow
> > writing to a block device when it is exclusively open by someone else.
> > But so far I didn't get to trying whether this would reasonably work. Would
> > you be interested in having a look into this?
>

> Does this affect only the loop device or also USB storage devices?
> Say, if the USB device returns different contents during mount and on
> subsequent reads?

Modifying the block device while the file system is mounted is
something that we have to allow for now because tune2fs uses it to
modify the superblock. It has historically also been used (rarely) by
people who know what they are doing to do surgery on a mounted file
system. If we create a way for tune2fs to be able to update the
superblock via some kind of ioctl, we could disallow modifying the
block device while the file system is mounted. Of course, it would
require waiting at least 5-6 years since sometimes people will update
the kernel without updating userspace. We'd also need to check to
make sure there aren't boot loader installers (such as grub-install)
that depend on being able to modify the block device while the root
file system is mounted, at least in some rare cases.

The "how" to exclude mounted file systems is relatively easy. The
kernel already knows when the file system is mounted, and it is
already a supported feature that a userspace application that wants to
be careful can open a block device with O_EXCL, and if it is in use by
the kernel --- mounted by a file system, being used by dm-thin, et al.
--- the open(2) system call will fail. From the open(2) man page:

In general, the behavior of O_EXCL is undefined if it is used without
O_CREAT. There is one exception: on Linux 2.6 and later, O_EXCL can
be used without O_CREAT if pathname refers to a block device. If the
block device is in use by the system (e.g., mounted), open() fails
with the error EBUSY.

Something which syzbot could do today is to simply use O_EXCL
whenever trying to open a block device. This would avoid a class of
syzbot false positives, since normally it requires root privileges
and/or an experienced sysadmin to try to modify a block device while
it is mounted and/or in use by LVM.
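
As a rough sketch of what "use O_EXCL when opening a block device" could
look like from user space: the helper name open_blockdev_exclusive is
hypothetical, and how syzkaller would actually wire this in is not
described in this thread. The only behavior relied on is the one quoted
from the open(2) man page above, namely that open() with O_EXCL fails with
EBUSY when the block device is in use by the system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to open a block device read-write only if the system is not
 * already using it (for example, as a mounted file system).
 * Returns a file descriptor on success, or a negative errno value on
 * failure; -EBUSY means the device is in use.
 */
int open_blockdev_exclusive(const char *path)
{
	int fd = open(path, O_RDWR | O_EXCL);

	if (fd < 0)
		return -errno;
	return fd;
}
```

A caller would treat -EBUSY as "leave this device alone" and close() any
descriptor it does get. Note that this only protects openers that opt in;
a plain open() without O_EXCL is still allowed, which is the gap the
kconfig idea discussed earlier is meant to close.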

- Ted

P.S. Trivia note: Approximately a month after I started work at VA Linux
Systems, a sysadmin intern who was given the root password to
sourceforge.net, while trying to fix a disk-to-disk backup, ran
mkfs.ext3 on /dev/hdXX, which was also being used as one-half of a
RAID 0 setup on which open source code critical to the community
(including, for example, OpenGL) was mounted and serving. The intern
got about 50% of the way through zeroing the inode table on /dev/hdXX
before the file system noticed and threw an error, at which point
wiser heads stopped what the intern was doing and tried to clean up
the mess. Of course, there were no backups, since that was what the
intern was trying to fix!

There are a couple of things that we could learn from this incident.
One was that giving the root password to an untrained intern not
familiar with the setup on the serving system was... an unfortunate
choice. Another was that adding the above-mentioned O_EXCL feature
and teaching mkfs to use it was an obvious post-mortem action item to
prevent this kind of problem in the future...

Jan Kara

Mar 14, 2023, 4:50:00 AM

to Dmitry Vyukov, Jan Kara, Tudor Ambarus, syzbot, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, ty...@mit.edu, Lee Jones

> Does this affect only the loop device or also USB storage devices?
> Say, if the USB device returns different contents during mount and on
> subsequent reads?

So if USB returns different content, we are fine because we verify the
content each time we load it into the buffer cache. But if something
in software opens the block device and modifies it, it directly modifies
the buffer cache and thus bypasses any checks we do when loading
data from the storage.
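
The two paths can be pictured with a toy model. Everything here is
hypothetical (the struct, the function names, and the bound of 1024 are
made up for illustration and are not kernel code): one path validates
data as it is loaded from storage, the other models a write through the
block device node that lands in the already-cached copy with no check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_DESC_SIZE 1024u	/* illustrative bound, not from the thread */

/* Toy stand-in for one cached metadata buffer. */
struct toy_buffer {
	bool uptodate;		/* set once the content was loaded and checked */
	uint16_t desc_size;	/* one metadata field held in the cache */
};

/* Path 1: data arrives from storage and is checked before it is cached. */
int toy_load_from_storage(struct toy_buffer *bh, uint16_t from_disk)
{
	if (from_disk == 0 || from_disk > TOY_MAX_DESC_SIZE)
		return -1;	/* corrupted content is rejected here */
	bh->desc_size = from_disk;
	bh->uptodate = true;
	return 0;
}

/* Path 2: a write via the block device node changes the cached copy. */
void toy_write_via_blockdev(struct toy_buffer *bh, uint16_t value)
{
	bh->desc_size = value;	/* no validation runs on this path */
}
```

In this model a misbehaving device is caught by toy_load_from_storage(),
while toy_write_via_blockdev() can leave an out-of-range value in a buffer
that is still marked up to date.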

Dmitry Vyukov

Mar 14, 2023, 5:33:41 AM

to Jan Kara, Tudor Ambarus, syzbot, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, ty...@mit.edu, Lee Jones, syzkaller

Thanks, I see. This is good.

Dmitry Vyukov

Mar 14, 2023, 5:45:47 AM

to Theodore Ts'o, Jan Kara, Tudor Ambarus, syzbot, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, Lee Jones

I am struggling to make up my mind about how to think about this case.

"root" is very overloaded, but generally it does not mean "randomly
corrupting memory". Normally it gives access to system-wide changes
but with the same protection/consistency guarantees as for
unprivileged system calls.

There are, of course, things like /dev/{mem,kmem}. But at the same
time there is also lockdown LSM and more distros today enable it.

Btw, should this "prohibit writes to mounted device" be part of
LOCKDOWN_INTEGRITY? It looks like it gives capabilities similar to
/dev/{mem,kmem}.

Disabling in testing something that's enabled in production is
generally not very useful.
So one option is to do nothing about this for now.
If it's a true recognized issue that is in the process of being fixed,
syzbot will just show that it's still present. One of the goals of
syzbot is to show the current state of things in an objective manner.
If some kernel developers are aware of an issue, it does not mean that
most distros/users are aware.

It makes sense to disable in testing things that are also recommended
to be disabled in production settings.
And LOCKDOWN_INTEGRITY may play such a role: we include this
restriction into LOCKDOWN_INTEGRITY and enable it on syzbot.
Though, unfortunately, we still don't enable it because it prohibits
access to debugfs, which is required for fuzzing. Need to ask lockdown
maintainers what they think about
LOCKDOWN_TEST_ONLY_DONT_ENABLE_IN_PROD_INTEGRITY which would whitelist
debugfs.

Dmitry Vyukov

Mar 14, 2023, 6:05:37 AM

to Theodore Ts'o, Jan Kara, Tudor Ambarus, syzbot, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, Lee Jones

Asked lockdown maintainers about adding this to lockdown and adding a
special mode for fuzzing:
https://lore.kernel.org/all/CACT4Y+Z-9KCgKwkktvdJwNJZxxeA1f74zkP7KD6c=OmKX...@mail.gmail.com/

Jan Kara

Mar 14, 2023, 7:19:22 AM

to yebin (H), Jan Kara, yebin, Tudor Ambarus, syzbot, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, ty...@mit.edu, Lee Jones

I see. But which protections exactly do you want in production? Since you
need to add the call to ioctl(2) somewhere to write-protect the device, you
could just as well "chmod ugo-w <device>" instead, couldn't you? And the
level of protection would be similar.

Theodore Ts'o

Apr 29, 2023, 10:55:22 PM

to Jan Kara, Tudor Ambarus, syzbot, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, nat...@kernel.org, ndesau...@google.com, syzkall...@googlegroups.com, tr...@redhat.com, Lee Jones, syzbot+1966db...@syzkaller.appspotmail.com, syzbot+db6caa...@syzkaller.appspotmail.com, syzbot+e2efa3...@syzkaller.appspotmail.com

On Mon, Mar 13, 2023 at 12:57:28PM +0100, Jan Kara wrote:
> >
> > I can now explain how the contents of the super block of the buffer get
> > corrupted. After the ext4 fs is mounted to the target ("./bus"), the
> > reproducer maps 6MB of data starting at offset 0 in the target's file
> > ("./bus"), then it starts overwriting the data with something else, by
> > using memcpy, memset, individual byte inits. Does that mean that we
> > shouldn't rely on the contents of the super block in the buffer after we
> > mount the file system?

It's not reasonable to avoid relying on the contents of the superblock
under all cases. HOWEVER, sometimes it might make sense. See below...

> So the result is that the reproducer modified the block device while it is
> mounted by the filesystem. We know cases like this can crash the kernel and
> it is inherently difficult to fix. We have to trust the buffer cache
> contents as otherwise the performance will be unacceptable. For historical
> reasons we also have to allow modifications of buffer cache while ext4 is
> mounted because tune2fs uses this to e.g. update the label of a mounted
> filesystem.

I've been taking a look at some of the syzkaller reports for ext4, and
there are a number of syzbot reports which are caused by the
reproducer messing with the block device while the file system is
mounted, including:

KASAN: slab-out-of-bounds Read in get_max_inline_xattr_value_size
https://syzkaller.appspot.com/bug?id=731e35eeed762019e385baa96953d9ec8eb63c10
syzbot+1966db...@syzkaller.appspotmail.com

KASAN: slab-use-after-free Read in ext4_convert_inline_data_nolock
https://syzkaller.appspot.com/bug?id=434a92f091e845da1ba387fb93f186412e30e35c
syzbot+db6caa...@syzkaller.appspotmail.com

kernel BUG in ext4_get_group_info
https://syzkaller.appspot.com/bug?id=69b28112e098b070f639efb356393af3ffec4220
syzbot+e2efa3...@syzkaller.appspotmail.com

(The easiest way to find them is to look at the Syzkaller reproducer,
and look for bind mounts of /dev/loopN to "./bus". It's much less
painful than trying to find it in the C reproducer text file.)

As Jan has pointed out, we can't disable writing to the block device,
because this would break real-world system administrator workloads,
including the ability to set the label and uuid, use tune2fs to set
various parameters on the file system, etc. We do have ioctls that
allow for setting the label and uuid, and in maybe ten years we should
be able to get to the point where all of the enterprise kernels still
supported by Red Hat, SuSE, etc. can be guaranteed to support all of
the necessary ioctls --- some of which still need to be implemented.

So this will take a *while*, and especially while senior management
types at many companies are announcing layoffs, cutting travel, and
talking about "year of efficiency" and "sharpening focus"[1], I don't
think we'll have much luck getting funded head count to implement
missing ioctls, other than slowly, on volunteer time, and maybe as
intern projects. So what should we do in the intervening
year(s)/decade? I'd propose the following priorities.

[1] while simultaneously whining about "kernel (security) disasters"
and blaming the upstream developers. Sigh...

From a quality of implementation (QoI) perspective, once we've
determined that it's caused by "messing with the block device while it
is mounted", if it just causes a denial of service attack, these should
be the lowest priority. However, if there is an easy way to fix it,
AND if it fixes other issues OR makes the kernel smaller and/or more
efficient, I won't turn away those kinds of proposed patches.

For example, in the case of the syzkaller report discussed in this
thread ("KASAN: slab-out-of-bounds Read in ext4_group_desc_csum"),
Tudor's proposed change of replacing

	le16_to_cpu(sbi->s_es->s_desc_size)

with

	sbi->s_desc_size

will actually reduce ext4's compiled text size, and make the code more
efficient (we remove an extra indirect reference and a potential byte
swap on big endian systems), and there is no downside. In fact, in
many places we use sbi->s_desc_size in preference to accessing the
s_es variable; that's why we put it in the ext4_sb_info structure
in the first place! So sure, we should make this change, and if it
avoids a potential KASAN / syzkaller failure, that's a bonus.
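
To make the difference concrete, here is a small user-space model of the
two patterns. It is only a sketch under stated assumptions: the toy_*
types and functions are hypothetical stand-ins, not ext4 code, the
descriptor size of 64 and the offset of 32 used below are illustrative
values, and byte-order conversion is left out. The point is that a length
re-read from the writable buffer can change after mount, while a copy
validated once at mount time cannot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_DESC_SIZE 64u	/* illustrative descriptor size */

/* Hypothetical stand-in for the on-disk superblock in the buffer cache. */
struct toy_super_block {
	uint16_t s_desc_size;	/* byte-order handling omitted for brevity */
};

/* Hypothetical stand-in for the in-memory per-mount info (sbi). */
struct toy_sb_info {
	struct toy_super_block *s_es;	/* points into the writable buffer */
	uint16_t s_desc_size;		/* copy validated once at mount time */
};

/* Validate the on-disk value once and cache it. */
int toy_mount(struct toy_sb_info *sbi, struct toy_super_block *es)
{
	if (es->s_desc_size != TOY_DESC_SIZE)
		return -1;
	sbi->s_es = es;
	sbi->s_desc_size = es->s_desc_size;
	return 0;
}

/* Old pattern: re-read the buffer, which may have changed since mount. */
size_t tail_len_from_buffer(const struct toy_sb_info *sbi, size_t offset)
{
	size_t size = sbi->s_es->s_desc_size;

	return offset < size ? size - offset : 0;
}

/* New pattern: use the cached copy that was validated at mount time. */
size_t tail_len_from_cache(const struct toy_sb_info *sbi, size_t offset)
{
	size_t size = sbi->s_desc_size;

	return offset < size ? size - offset : 0;
}
```

If the buffer's s_desc_size is overwritten with 0xffff after toy_mount()
succeeded, tail_len_from_buffer(&sbi, 32) asks for 65503 bytes, which for
a 64-byte descriptor would run far past its end, whereas
tail_len_from_cache(&sbi, 32) still returns 32.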

Slightly higher in priority are those bugs which might allow kernel
state to be leaked ("kernel confidentiality"). Of course, if the
process with root access can write to the block device, it can almost
certainly read that block device as well; but there might be critical
bits of kernel state (for example, an RSA private key) in kernel
memory which, if leaked, would be sad.

The highest priority would go to those where root access might be
leveraged to allow arbitrary code to be executed in kernel mode
("kernel integrity") --- which is unfortunate because it allows root
access to breach lockdown security.

Of course, since many of the people working syzbot reports for ext4
are volunteers and/or company engineers working on their own unfunded
personal time, we still can't *guarantee* anything. In addition, I'd
still reject a patch which had an overly expensive CPU or memory
overhead with a "try harder". So it would still be on a case-by-case
basis whether such patches would be accepted. After all, some
business leaders have elected to disable some mitigations for
Spectre/Meltdown and related attacks because they were Too Damn
Expensive. I reserve the right as upstream maintainer to make similar
judgement calls.

- Ted

P.S. As another example, over the weekend, I've been working on some
patches to address the third syzbot report listed above
("kernel BUG in ext4_get_group_info"). When I evaluated these
patches, I found that they increased the compiled text size by 2k when
I added the additional checks, none of which were in hot paths. But
after I un-inlined ext4_get_group_info(), the compiled text size
shrank by 4k, for a net 2k byte *savings* in compiled kernel text
memory.

We already had similar checks and calls to ext4_error() in
ext4_get_group_desc(); this patch just adds a similar conditional
call to ext4_error() to ext4_get_group_info() --- and changes the
callers of that function to check for a NULL return from that
function. While this change only prevents a denial of service attack,
in my judgement the QoI benefits outweigh the costs.
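
The shape of that change, reporting an error and returning NULL instead
of hitting a BUG, and having callers check for NULL, can be sketched as
follows. This is a user-space toy, not the actual patch: the toy_* names
are hypothetical, toy_error() merely stands in for the role ext4_error()
plays, and the real function's arguments and checks are not reproduced
here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct toy_group_info {
	unsigned int free_blocks;
};

struct toy_fs {
	unsigned int groups_count;
	struct toy_group_info *groups;	/* array of groups_count entries */
	unsigned int errors;		/* counts reported inconsistencies */
};

/* Stand-in for an error report; the real code would call ext4_error(). */
static void toy_error(struct toy_fs *fs, const char *msg, unsigned int group)
{
	fs->errors++;
	fprintf(stderr, "toy_fs error: %s (group %u)\n", msg, group);
}

/* Out-of-line lookup that reports and returns NULL on a bad group number. */
struct toy_group_info *toy_get_group_info(struct toy_fs *fs, unsigned int group)
{
	if (group >= fs->groups_count) {
		toy_error(fs, "invalid group", group);
		return NULL;
	}
	return &fs->groups[group];
}

/* Example caller: checks for NULL and propagates an error. */
int toy_free_blocks(struct toy_fs *fs, unsigned int group, unsigned int *out)
{
	struct toy_group_info *grp = toy_get_group_info(fs, group);

	if (!grp)
		return -1;
	*out = grp->free_blocks;
	return 0;
}
```

With two groups configured, asking for group 7 makes toy_free_blocks()
return -1 and bumps the error counter, rather than dereferencing memory
past the end of the array.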

Theodore Ts'o

May 6, 2023, 11:54:16 PM

to syzbot, syzkall...@googlegroups.com

#syz test git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git tt/next

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index d39f386e9baf..e3d0d3c04785 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3238,11 +3238,9 @@ static __le16 ext4_group_desc_csum(struct super_block *sb, __u32 block_group,

 	crc = crc16(crc, (__u8 *)gdp, offset);
 	offset += sizeof(gdp->bg_checksum); /* skip checksum */
 	/* for checksum of struct ext4_group_desc do the rest...*/
-	if (ext4_has_feature_64bit(sb) &&
-	    offset < le16_to_cpu(sbi->s_es->s_desc_size))
+	if (ext4_has_feature_64bit(sb) && offset < sbi->s_desc_size)
 		crc = crc16(crc, (__u8 *)gdp + offset,
-			    le16_to_cpu(sbi->s_es->s_desc_size) -
-				offset);
+			    sbi->s_desc_size - offset);
 
 out:
 	return cpu_to_le16(crc);

--
2.40.1.495.gc816e09b53d-goog

syzbot

May 7, 2023, 12:19:27 AM

to syzkall...@googlegroups.com, ty...@mit.edu

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+8785e4...@syzkaller.appspotmail.com

Tested on:

commit: 0e65baba ext4: fix deadlock when converting an inline ..
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git tt/next
console output: https://syzkaller.appspot.com/x/log.txt?x=15f72a6c280000
kernel config: https://syzkaller.appspot.com/x/.config?x=8ded603951470459

dashboard link: https://syzkaller.appspot.com/bug?extid=8785e41224a3afd04321
compiler: Debian clang version 15.0.7, GNU ld (GNU Binutils for Debian) 2.35.2

userspace arch: arm64
patch: https://syzkaller.appspot.com/x/patch.diff?x=15f33522280000

Note: testing is done by a robot and is best-effort only.
