[syzbot] [ext4?] BUG: unable to handle kernel paging request in do_split

34 views
Skip to first unread message

syzbot

unread,
Jun 29, 2024, 6:05:22 AM6/29/24
to adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, ty...@mit.edu
Hello,

syzbot found the following issue on:

HEAD commit: 55027e689933 Merge tag 'input-for-v6.10-rc5' of git://git...
git tree: upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=100ec271980000
kernel config: https://syzkaller.appspot.com/x/.config?x=67463c0717b8d4ca
dashboard link: https://syzkaller.appspot.com/bug?extid=ae688d469e36fb5138d0
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=14296bb6980000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=10a53e3e980000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/5a4561e75890/disk-55027e68.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/40e478722974/vmlinux-55027e68.xz
kernel image: https://storage.googleapis.com/syzbot-assets/d3bbbd2462f2/bzImage-55027e68.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/451986899a3c/mount_0.gz

Bisection is inconclusive: the first bad commit could be any of:

3b51788a2d5f IB/hfi1: use new function dev_fetch_sw_netstats
44fa32f008ab net: add function dev_fetch_sw_netstats for fetching pcpu_sw_netstats
3618ad2a7c0e virtio-net: ethtool configurable RXCSUM
9d0151673e70 net: macsec: use new function dev_fetch_sw_netstats
c9bf52a173c7 net/af_unix: Remove unused old_pid variable
ec173778e96e net: usb: qmi_wwan: use new function dev_fetch_sw_netstats
0403a2b53c29 net/tls: use semicolons rather than commas to separate statements
ab2b3ff21b9f net: usbnet: use new function dev_fetch_sw_netstats
1f68b2096f65 qtnfmac: use new function dev_fetch_sw_netstats
6159e9633f17 net/ipv6: use semicolons rather than commas to separate statements
44797589c20e tcp: use semicolons rather than commas to separate statements
f3f04f0f3ab9 net: bridge: use new function dev_fetch_sw_netstats
7e38b03f0fe7 net: mscc: ocelot: remove duplicate ocelot_port_dev_check
a0d269810185 net: dsa: use new function dev_fetch_sw_netstats
c93c5482c7d4 Merge branch 'macb-support-the-2-deep-Tx-queue-on-at91'
cf89f18fa407 iptunnel: use new function dev_fetch_sw_netstats
0a4e9ce17ba7 macb: support the two tx descriptors on at91rm9200
6401297e7610 mac80211: use new function dev_fetch_sw_netstats
3569939a811e net: openvswitch: use new function dev_fetch_sw_netstats
73d742281383 macb: prepare at91 to use a 2-frame TX queue
5fc3594d36d1 xfrm: use new function dev_fetch_sw_netstats
fa6031df12fc macb: add RM9200's interrupt flag TBRE
a003ec1f47bc Merge branch 'net-add-and-use-function-dev_fetch_sw_netstats-for-fetching-pcpu_sw_netstats'
ccdf7fae3afa Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
30cf856a691f i40e: Allow changing FEC settings on X722 if supported by FW
a308283fdbf7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
793d5d612426 netfilter: flowtable: reduce calls to pskb_may_pull()
f2bf814a27c5 e1000: remove unused and incorrect code
d3519cb89f6d netfilter: nf_tables: add inet ingress support
d5e6f064ac66 Merge branch '40GbE-Intel-Wired-LAN-Driver-Updates-2020-10-12'
50172733d01c Merge tag 'mlx5-updates-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
60a3815da702 netfilter: add inet ingress support
d25e2e9388ed netfilter: restore NF_INET_NUMHOOKS

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=14154fda980000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+ae688d...@syzkaller.appspotmail.com

EXT4-fs error (device loop0): ext4_orphan_get:1399: comm syz-executor306: couldn't read orphan inode 15 (err -117)
EXT4-fs (loop0): mounted filesystem 00000000-0000-0000-0000-000000000000 r/w without journal. Quota mode: none.
BUG: unable to handle page fault for address: ffffed11022e24fe
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 23ffee067 P4D 23ffee067 PUD 0
Oops: Oops: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 0 PID: 5079 Comm: syz-executor306 Not tainted 6.10.0-rc5-syzkaller-00018-g55027e689933 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
RIP: 0010:do_split+0x150b/0x2490 fs/ext4/namei.c:2046
Code: 89 f8 48 c1 e8 03 0f b6 04 10 84 c0 48 89 74 24 18 0f 85 f5 0c 00 00 46 8b 3c f6 41 8d 46 ff 48 8d 1c c6 48 89 d8 48 c1 e8 03 <0f> b6 04 10 84 c0 4c 8b a4 24 98 00 00 00 0f 85 f7 0c 00 00 8b 1b
RSP: 0018:ffffc9000327f060 EFLAGS: 00010a02
RAX: 1ffff111022e24fe RBX: ffff8888117127f0 RCX: ffff8880237e1e00
RDX: dffffc0000000000 RSI: ffff8880117127f8 RDI: ffff8880117127f8
RBP: ffffc9000327f250 R08: ffffffff825fc2ad R09: ffffffff82541cf8
R10: 0000000000000007 R11: ffffffff825435f0 R12: 0000000000000000
R13: 0000000000000400 R14: 0000000000000000 R15: 000000002b74e18c
FS: 00005555787b4380(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffed11022e24fe CR3: 0000000043598000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
make_indexed_dir+0xdaf/0x13c0 fs/ext4/namei.c:2341
ext4_add_entry+0x222a/0x25d0 fs/ext4/namei.c:2451
ext4_rename fs/ext4/namei.c:3936 [inline]
ext4_rename2+0x26e5/0x4370 fs/ext4/namei.c:4214
vfs_rename+0xbdb/0xf00 fs/namei.c:4887
do_renameat2+0xd94/0x13f0 fs/namei.c:5044
__do_sys_rename fs/namei.c:5091 [inline]
__se_sys_rename fs/namei.c:5089 [inline]
__x64_sys_rename+0x86/0xa0 fs/namei.c:5089
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fe5c7dcdb59
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 17 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffcd201c228 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
RAX: ffffffffffffffda RBX: 00007fe5c7e16568 RCX: 00007fe5c7dcdb59
RDX: 0000000000000000 RSI: 0000000020000f40 RDI: 00000000200003c0
RBP: 00007fe5c7e16668 R08: 00005555787b54c0 R09: 00005555787b54c0
R10: 00005555787b54c0 R11: 0000000000000246 R12: 00007ffcd201c250
R13: 00007ffcd201c478 R14: 431bde82d7b634db R15: 00007fe5c7e1603b
</TASK>
Modules linked in:
CR2: ffffed11022e24fe
---[ end trace 0000000000000000 ]---
RIP: 0010:do_split+0x150b/0x2490 fs/ext4/namei.c:2046
Code: 89 f8 48 c1 e8 03 0f b6 04 10 84 c0 48 89 74 24 18 0f 85 f5 0c 00 00 46 8b 3c f6 41 8d 46 ff 48 8d 1c c6 48 89 d8 48 c1 e8 03 <0f> b6 04 10 84 c0 4c 8b a4 24 98 00 00 00 0f 85 f7 0c 00 00 8b 1b
RSP: 0018:ffffc9000327f060 EFLAGS: 00010a02
RAX: 1ffff111022e24fe RBX: ffff8888117127f0 RCX: ffff8880237e1e00
RDX: dffffc0000000000 RSI: ffff8880117127f8 RDI: ffff8880117127f8
RBP: ffffc9000327f250 R08: ffffffff825fc2ad R09: ffffffff82541cf8
R10: 0000000000000007 R11: ffffffff825435f0 R12: 0000000000000000
R13: 0000000000000400 R14: 0000000000000000 R15: 000000002b74e18c
FS: 00005555787b4380(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffed11022e24fe CR3: 0000000043598000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess):
0: 89 f8 mov %edi,%eax
2: 48 c1 e8 03 shr $0x3,%rax
6: 0f b6 04 10 movzbl (%rax,%rdx,1),%eax
a: 84 c0 test %al,%al
c: 48 89 74 24 18 mov %rsi,0x18(%rsp)
11: 0f 85 f5 0c 00 00 jne 0xd0c
17: 46 8b 3c f6 mov (%rsi,%r14,8),%r15d
1b: 41 8d 46 ff lea -0x1(%r14),%eax
1f: 48 8d 1c c6 lea (%rsi,%rax,8),%rbx
23: 48 89 d8 mov %rbx,%rax
26: 48 c1 e8 03 shr $0x3,%rax
* 2a: 0f b6 04 10 movzbl (%rax,%rdx,1),%eax <-- trapping instruction
2e: 84 c0 test %al,%al
30: 4c 8b a4 24 98 00 00 mov 0x98(%rsp),%r12
37: 00
38: 0f 85 f7 0c 00 00 jne 0xd35
3e: 8b 1b mov (%rbx),%ebx


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Edward Adam Davis

unread,
Jun 30, 2024, 5:09:53 AM6/30/24
to syzbot+ae688d...@syzkaller.appspotmail.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
if split is too small, such as 0, use it to calculate continued will out of bound map

#syz test: upstream 55027e689933

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index a630b27a4cc6..0a111274dc4a 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2043,7 +2043,7 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
split = count/2;

hash2 = map[split].hash;
- continued = hash2 == map[split - 1].hash;
+ continued = split > 0 ? hash2 == map[split - 1].hash : 0;
dxtrace(printk(KERN_INFO "Split block %lu at %x, %i/%i\n",
(unsigned long)dx_get_block(frame->at),
hash2, split, count-split));

syzbot

unread,
Jun 30, 2024, 5:33:04 AM6/30/24
to ead...@qq.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+ae688d...@syzkaller.appspotmail.com

Tested on:

commit: 55027e68 Merge tag 'input-for-v6.10-rc5' of git://git...
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=16bb8761980000
kernel config: https://syzkaller.appspot.com/x/.config?x=67463c0717b8d4ca
dashboard link: https://syzkaller.appspot.com/bug?extid=ae688d469e36fb5138d0
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=10754c81980000

Note: testing is done by a robot and is best-effort only.

Edward Adam Davis

unread,
Jul 1, 2024, 10:25:18 AM7/1/24
to syzbot+ae688d...@syzkaller.appspotmail.com, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, ty...@mit.edu
When the number of entries mapped is 1, there is no need to split it.

Fixes: ac27a0ec112a ("[PATCH] ext4: initial copy of files from ext3")
Reported-by: syzbot+ae688d...@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=ae688d469e36fb5138d0
Signed-off-by: Edward Adam Davis <ead...@qq.com>
---
fs/ext4/namei.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index a630b27a4cc6..0a111274dc4a 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2043,7 +2043,7 @@ static struct ext4_dir_entry_2 *do_split(handle_t *handle, struct inode *dir,
split = count/2;

hash2 = map[split].hash;
- continued = hash2 == map[split - 1].hash;
+ continued = split > 0 ? hash2 == map[split - 1].hash : 0;
dxtrace(printk(KERN_INFO "Split block %lu at %x, %i/%i\n",
(unsigned long)dx_get_block(frame->at),
hash2, split, count-split));
--
2.43.0

Baokun Li

unread,
Jul 1, 2024, 10:36:12 AM7/1/24
to syzbot, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, ty...@mit.edu
The immediate cause of this problem is that there is only one valid
dentry for the block to be split during do_split, so split==0 results
in out-of-bounds accesses to the map triggering the problem.

     do_split
       unsigned split
       dx_make_map
        count = 1
       split = count/2 = 0;
       continued = hash2 == map[split - 1].hash;
        ---> map[4294967295]

The root cause is that syzbot constructs a directory that is not inline
but does not have a dirblock, and we don't check for it when we create
files under the folder.

    ext4_mknod
      ext4_add_entry
        // Read block 0
        ext4_read_dirblock(dir, block, DIRENT)
          bh = ext4_bread(NULL, inode, block, 0)
          if (!bh && (type == INDEX || type == DIRENT_HTREE))
          // The first directory block is a hole
          // But type == DIRENT, so no error is reported.

Therefore, reporting error when ext4_read_dirblock() finds the first
directory block is a hole to avoid error spreading leading to
something bad.

Here's the patch in testing, I'll send it out officially after it is
tested.

Regards,
Baokun


From: Baokun Li <liba...@huawei.com>
Date: Mon, 1 Jul 2024 20:23:59 +0800
Subject: [PATCH] ext4: make sure the first directory block is not a hole

Syzbot reports a issue as follows:

============================================
BUG: unable to handle page fault for address: ffffed11022e24fe
PGD 23ffee067 P4D 23ffee067 PUD 0
Oops: Oops: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 0 PID: 5079 Comm: syz-executor306 Not tainted
6.10.0-rc5-g55027e689933 #0
Call Trace:
 <TASK>
 make_indexed_dir+0xdaf/0x13c0 fs/ext4/namei.c:2341
 ext4_add_entry+0x222a/0x25d0 fs/ext4/namei.c:2451
 ext4_rename fs/ext4/namei.c:3936 [inline]
 ext4_rename2+0x26e5/0x4370 fs/ext4/namei.c:4214
[...]
============================================

The immediate cause of this problem is that there is only one valid
dentry for
the block to be split during do_split, so split==0 results in out-of-bounds
accesses to the map triggering the problem.

    do_split
      unsigned split
      dx_make_map
       count = 1
      split = count/2 = 0;
      continued = hash2 == map[split - 1].hash;
       ---> map[4294967295]

The root cause is that the syzbot constructs a directory that is not
inline but
does not have a dirblock, and we don't check for it when we create files
under
the folder.

    ext4_mknod
      ext4_add_entry
        // Read block 0
        ext4_read_dirblock(dir, block, DIRENT)
          bh = ext4_bread(NULL, inode, block, 0)
          if (!bh && (type == INDEX || type == DIRENT_HTREE))
          // The first directory block is a hole
          // But type == DIRENT, so no error is reported.

Therefore, report that the filesystem is corrupted when ext4_read_dirblock()
finds the first directory block to be a hole, to avoid spreading the
error to
cause something bad.

Fixes: 4e19d6b65fb4 ("ext4: allow directory holes")
Signed-off-by: Baokun Li <liba...@huawei.com>
---
 fs/ext4/namei.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index a630b27a4cc6..facdf0e97a48 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -151,7 +151,8 @@ static struct buffer_head
*__ext4_read_dirblock(struct inode *inode,

         return bh;
     }
-    if (!bh && (type == INDEX || type == DIRENT_HTREE)) {
+    /* The first directory block must not be a hole. */
+    if (!bh && (type == INDEX || type == DIRENT_HTREE || block == 0)) {
         ext4_error_inode(inode, func, line, block,
                  "Directory hole found for htree %s block",
                  (type == INDEX) ? "index" : "leaf");
@@ -3083,10 +3084,7 @@ bool ext4_empty_dir(struct inode *inode)
         EXT4_ERROR_INODE(inode, "invalid size");
         return false;
     }
-    /* The first directory block must not be a hole,
-     * so treat it as DIRENT_HTREE
-     */
-    bh = ext4_read_dirblock(inode, 0, DIRENT_HTREE);
+    bh = ext4_read_dirblock(inode, 0, EITHER)
     if (IS_ERR(bh))
         return false;

@@ -3531,10 +3529,7 @@ static struct buffer_head
*ext4_get_first_dir_block(handle_t *handle,
         struct ext4_dir_entry_2 *de;
         unsigned int offset;

-        /* The first directory block must not be a hole, so
-         * treat it as DIRENT_HTREE
-         */
-        bh = ext4_read_dirblock(inode, 0, DIRENT_HTREE);
+        bh = ext4_read_dirblock(inode, 0, EITHER);
         if (IS_ERR(bh)) {
             *retval = PTR_ERR(bh);
             return NULL;
--
2.31.1

Theodore Ts'o

unread,
Aug 22, 2024, 11:00:31 AM8/22/24
to syzbot+ae688d...@syzkaller.appspotmail.com, Edward Adam Davis, Theodore Ts'o, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com

On Mon, 01 Jul 2024 22:25:03 +0800, Edward Adam Davis wrote:
> When the number of entries mapped is 1, there is no need to split it.
>
>

Applied, thanks!

[1/1] ext4: No need to continue when the number of entries is 1
commit: b2b81e122b5616890ba6657adeb8aa5ca1f05fe2

Best regards,
--
Theodore Ts'o <ty...@mit.edu>

Baokun Li

unread,
Aug 22, 2024, 10:22:25 PM8/22/24
to Theodore Ts'o, Edward Adam Davis, syzbot+ae688d...@syzkaller.appspotmail.com, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On 2024/8/22 23:00, Theodore Ts'o wrote:
> On Mon, 01 Jul 2024 22:25:03 +0800, Edward Adam Davis wrote:
>> When the number of entries mapped is 1, there is no need to split it.
>>
>>
> Applied, thanks!
>
> [1/1] ext4: No need to continue when the number of entries is 1
> commit: b2b81e122b5616890ba6657adeb8aa5ca1f05fe2
>
> Best regards,

Hi Ted,

I think this patch is wrong and it will hide the real problem.

The maximum length of a filename is 255 and the minimum block size is 1024,
so it is always guaranteed that the number of entries is greater than or
equal to 2 when do_split() is called.

The problem reported by syzbot was actually caused by a missing check in
make_indexed_dir(). The issue has been fixed:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=50ea741def58

So unless ext4_dx_add_entry() and make_indexed_dir(), or some other function
has a bug, 'split == 0' will not occur.

If we want to defend against future changes that introduce bugs, I think
it's better to add a WARN_ON_ONCE to make sure that the problem isn't hidden
and that it doesn't trigger serious bugs like out-of-bounds access.

continued = WARN_ON_ONCE(split == 0) ? 0 : hash2 == map[split - 1].hash;

--
With Best Regards,
Baokun Li

Theodore Ts'o

unread,
Aug 23, 2024, 12:05:26 PM8/23/24
to Baokun Li, Edward Adam Davis, syzbot+ae688d...@syzkaller.appspotmail.com, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On Fri, Aug 23, 2024 at 10:22:19AM +0800, Baokun Li wrote:
>
> I think this patch is wrong and it will hide the real problem.
>
> The maximum length of a filename is 255 and the minimum block size is 1024,
> so it is always guaranteed that the number of entries is greater than or
> equal to 2 when do_split() is called.
>
> The problem reported by syzbot was actually caused by a missing check in
> make_indexed_dir(). The issue has been fixed:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=50ea741def58
>
> So unless ext4_dx_add_entry() and make_indexed_dir(), or some other function
> has a bug, 'split == 0' will not occur.
>
> If we want to defend against future changes that introduce bugs, I think
> it's better to add a WARN_ON_ONCE to make sure that the problem isn't hidden
> and that it doesn't trigger serious bugs like out-of-bounds access.

I agree that given your patch (50ea741def58: "ext4: check dot and
dotdot of dx_root before making dir indexed") split should never be
zero. (Although there are two ways this could happen --- either count
could be 0, or count == max). But this patch isn't wrong per se
because in the case where split == 0, we do want to prevent the
out-of-bounds memory access bug.

That being said; adding a WARN_ON_ONCE(split == 0) might be a good
idea, although I'd probably also print more debugging information so
we can take a look at the file system and understand what might have
happened. Maybe something like this?

if (WARN_ON_ONCE(split == 0)) {
/* should never happen, but... */
ext4_error_inode_block(dir, (*bh)->b_blocknr, 0,
"bad indexed directory? hash=%08x:%08x "
"count=%d move=%u", hinfo->hash, hinfo->minor_hash,
count, move);
brelse(*bh);
brelse(bh2);
*bh = 0;
return ERR_PTR(-EFSCORRUPTED);
}

I haven't checked to make sure all of the error code paths / error
handling right, but something like this might be useful for debugging
purposes --- if the file system developer could get access to the file
system moment the error is logged. If the data center automation
causes the file system to get fsck'ed or reformatted right away (which
is the only scalable thing to do if there are millions of file systems
in production :-), something like this is probably not going to help
all that much. Still, it certainly wouldn't hurt.

If someone does think this would be helpful for them, I wouldn't
object to adding a patch something like this.

Cheers,

- Ted

Baokun Li

unread,
Aug 24, 2024, 12:20:32 AM8/24/24
to Theodore Ts'o, Edward Adam Davis, syzbot+ae688d...@syzkaller.appspotmail.com, adilger...@dilger.ca, linux...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On 2024/8/24 0:05, Theodore Ts'o wrote:
> On Fri, Aug 23, 2024 at 10:22:19AM +0800, Baokun Li wrote:
>> I think this patch is wrong and it will hide the real problem.
>>
>> The maximum length of a filename is 255 and the minimum block size is 1024,
>> so it is always guaranteed that the number of entries is greater than or
>> equal to 2 when do_split() is called.
>>
>> The problem reported by syzbot was actually caused by a missing check in
>> make_indexed_dir(). The issue has been fixed:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=50ea741def58
>>
>> So unless ext4_dx_add_entry() and make_indexed_dir(), or some other function
>> has a bug, 'split == 0' will not occur.
>>
>> If we want to defend against future changes that introduce bugs, I think
>> it's better to add a WARN_ON_ONCE to make sure that the problem isn't hidden
>> and that it doesn't trigger serious bugs like out-of-bounds access.
> I agree that given your patch (50ea741def58: "ext4: check dot and
> dotdot of dx_root before making dir indexed") split should never be
> zero. (Although there are two ways this could happen --- either count
> could be 0, or count == max). But this patch isn't wrong per se
> because in the case where split == 0, we do want to prevent the
> out-of-bounds memory access bug.


Agreed, it is correct to avoid serious problems by judging the split,

I was thinking that it is wrong to report no error or hint when split == 0.

> That being said; adding a WARN_ON_ONCE(split == 0) might be a good
> idea, although I'd probably also print more debugging information so
> we can take a look at the file system and understand what might have
> happened. Maybe something like this?
>
> if (WARN_ON_ONCE(split == 0)) {
> /* should never happen, but... */
> ext4_error_inode_block(dir, (*bh)->b_blocknr, 0,
> "bad indexed directory? hash=%08x:%08x "
> "count=%d move=%u", hinfo->hash, hinfo->minor_hash,
> count, move);
> brelse(*bh);
> brelse(bh2);
> *bh = 0;
> return ERR_PTR(-EFSCORRUPTED);
> }
>
> I haven't checked to make sure all of the error code paths / error
> handling right, but something like this might be useful for debugging
> purposes --- if the file system developer could get access to the file
> system moment the error is logged. If the data center automation
> causes the file system to get fsck'ed or reformatted right away (which
> is the only scalable thing to do if there are millions of file systems
> in production :-), something like this is probably not going to help
> all that much. Still, it certainly wouldn't hurt.

Totally agree! These printouts are very useful for debugging.

The modification above looks good. I tested it and it works fine.

But I think we could reuse the error handling code like this:


diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index e6769b97a970..0187910108c4 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1997,6 +1997,15 @@ static struct ext4_dir_entry_2 *do_split(handle_t
*handle, struct inode *dir,
        else
                split = count/2;

+       if (WARN_ON_ONCE(split == 0)) {
+               /* should never happen, but... */
+               ext4_error_inode_block(dir, (*bh)->b_blocknr, 0,
+                               "bad indexed directory? hash=%08x:%08x
count=%d move=%u",
+                               hinfo->hash, hinfo->minor_hash, count,
move);
+               err = -EFSCORRUPTED;
+               goto out;
+       }
+
        hash2 = map[split].hash;
        continued = hash2 == map[split - 1].hash;
        dxtrace(printk(KERN_INFO "Split block %lu at %x, %i/%i\n",
@@ -2040,10 +2049,11 @@ static struct ext4_dir_entry_2
*do_split(handle_t *handle, struct inode *dir,
        return de;

 journal_error:
+       ext4_std_error(dir->i_sb, err);
+out:
        brelse(*bh);
        brelse(bh2);
        *bh = NULL;
-       ext4_std_error(dir->i_sb, err);
        return ERR_PTR(err);
 }

>
> If someone does think this would be helpful for them, I wouldn't
> object to adding a patch something like this.
>
> Cheers,
>
> - Ted
>
I think it's very helpful.

Thank you very much for your detailed explanation!


Cheers,
Baokun
Reply all
Reply to author
Forward
0 new messages