[PATCH 00/20] ext4: some bugfixes and cleanups for ext4 extents path

0 views
Skip to first unread message

liba...@huaweicloud.com

unread,
Jul 10, 2024, 12:11:30 AMJul 10
to linux...@vger.kernel.org, ty...@mit.edu, adilger...@dilger.ca, ja...@suse.cz, rites...@gmail.com, linux-...@vger.kernel.org, yi.z...@huawei.com, yang...@huawei.com, liba...@huaweicloud.com, Baokun Li
From: Baokun Li <liba...@huawei.com>

Hi all!

This patch series is a hardening of ext4 extents path related code.
The following is a brief overview of the patches, see the patches for
more details.

Patch 1-2: Refactor ext4_ext_rm_idx() as suggested by Jan, and add
appropriate error handling branches to ext4_ext_rm_idx() and
ext4_ext_correct_indexes() to avoid inconsistent extents tree.
PS: This comes from the previous work of my colleague zhanchengbin
(see link), who is no longer in charge of these and I have taken over.
Link: https://lore.kernel.org/r/20230213080514.535...@huawei.com/

Patch 3-4: Fix an issue that caused p_bh to be released twice if it wasn't
set to NULL after path->p_bh was released. And add a helper function after
the quick fix to prevent this from happening again.

Patch 5-7: Quick fixes for use-after-free and double-free problems caused
by mixing path(pointer to an extent path) and ppath(pointer to an extent
path pointer).

Patch 8-19: Now the use of path and ppath is so confusing that we can
trigger use-after-free or double-free by accessing a stale pointer, or
we can get a memory leak by forgetting to update ppath. And it's very
difficult to read the code. So to make the code more readable, get rid
of ppath and pass path between functions uniformly to avoid these risks.

Patch 20: Reduces the consumption of unnecessary memory operations by
avoiding repetitive allocation and release paths.

"kvm-xfstests -c ext4/all -g auto" has been executed with no new failures.

Comments and questions are, as always, welcome.
Please let me know what you think.

Thanks,
Baokun

Baokun Li (20):
ext4: refactor ext4_ext_rm_idx() to index 'path'
ext4: prevent partial update of the extents path
ext4: fix double brelse() the buffer of the extents path
ext4: add new ext4_ext_path_brelse() helper
ext4: fix slab-use-after-free in ext4_split_extent_at()
ext4: avoid use-after-free in ext4_ext_show_leaf()
ext4: drop ppath from ext4_ext_replay_update_ex() to avoid double-free
ext4: get rid of ppath in ext4_find_extent()
ext4: get rid of ppath in get_ext_path()
ext4: get rid of ppath in ext4_ext_create_new_leaf()
ext4: get rid of ppath in ext4_ext_insert_extent()
ext4: get rid of ppath in ext4_split_extent_at()
ext4: get rid of ppath in ext4_force_split_extent_at()
ext4: get rid of ppath in ext4_split_extent()
ext4: get rid of ppath in ext4_split_convert_extents()
ext4: get rid of ppath in ext4_convert_unwritten_extents_endio()
ext4: get rid of ppath in ext4_ext_convert_to_initialized()
ext4: get rid of ppath in ext4_ext_handle_unwritten_extents()
ext4: get rid of ppath in convert_initialized_extent()
ext4: avoid unnecessary extent path frees and allocations

fs/ext4/ext4.h | 9 +-
fs/ext4/extents.c | 746 +++++++++++++++++++++++-------------------
fs/ext4/fast_commit.c | 17 +-
fs/ext4/migrate.c | 5 +-
fs/ext4/move_extent.c | 36 +-
5 files changed, 439 insertions(+), 374 deletions(-)

--
2.39.2


liba...@huaweicloud.com

unread,
Jul 10, 2024, 12:11:32 AMJul 10
to linux...@vger.kernel.org, ty...@mit.edu, adilger...@dilger.ca, ja...@suse.cz, rites...@gmail.com, linux-...@vger.kernel.org, yi.z...@huawei.com, yang...@huawei.com, liba...@huaweicloud.com, Baokun Li
From: Baokun Li <liba...@huawei.com>

Add ext4_ext_path_brelse() helper function to reduce duplicate code
and ensure that path->p_bh is set to NULL after it is released.

Signed-off-by: Baokun Li <liba...@huawei.com>
---
fs/ext4/extents.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 657baf3991c1..6e5b5baf3aa6 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -106,6 +106,12 @@ static int ext4_ext_trunc_restart_fn(struct inode *inode, int *dropped)
return 0;
}

+static inline void ext4_ext_path_brelse(struct ext4_ext_path *path)
+{
+ brelse(path->p_bh);
+ path->p_bh = NULL;
+}
+
static void ext4_ext_drop_refs(struct ext4_ext_path *path)
{
int depth, i;
@@ -113,10 +119,8 @@ static void ext4_ext_drop_refs(struct ext4_ext_path *path)
if (!path)
return;
depth = path->p_depth;
- for (i = 0; i <= depth; i++, path++) {
- brelse(path->p_bh);
- path->p_bh = NULL;
- }
+ for (i = 0; i <= depth; i++, path++)
+ ext4_ext_path_brelse(path);
}

void ext4_free_ext_path(struct ext4_ext_path *path)
@@ -635,8 +639,7 @@ int ext4_ext_precache(struct inode *inode)
*/
if ((i == depth) ||
path[i].p_idx > EXT_LAST_INDEX(path[i].p_hdr)) {
- brelse(path[i].p_bh);
- path[i].p_bh = NULL;
+ ext4_ext_path_brelse(path + i);
i--;
continue;
}
@@ -1887,8 +1890,7 @@ static void ext4_ext_try_to_merge_up(handle_t *handle,
(path[1].p_ext - EXT_FIRST_EXTENT(path[1].p_hdr));
path[0].p_hdr->eh_max = cpu_to_le16(max_root);

- brelse(path[1].p_bh);
- path[1].p_bh = NULL;
+ ext4_ext_path_brelse(path + 1);
ext4_free_blocks(handle, inode, NULL, blk, 1,
EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET);
}
@@ -2956,8 +2958,7 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
err = ext4_ext_rm_leaf(handle, inode, path,
&partial, start, end);
/* root level has p_bh == NULL, brelse() eats this */
- brelse(path[i].p_bh);
- path[i].p_bh = NULL;
+ ext4_ext_path_brelse(path + i);
i--;
continue;
}
@@ -3019,8 +3020,7 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
err = ext4_ext_rm_idx(handle, inode, path, i);
}
/* root level has p_bh == NULL, brelse() eats this */
- brelse(path[i].p_bh);
- path[i].p_bh = NULL;
+ ext4_ext_path_brelse(path + i);
i--;
ext_debug(inode, "return to level %d\n", i);
}
--
2.39.2


liba...@huaweicloud.com

unread,
Jul 10, 2024, 12:11:34 AMJul 10
to linux...@vger.kernel.org, ty...@mit.edu, adilger...@dilger.ca, ja...@suse.cz, rites...@gmail.com, linux-...@vger.kernel.org, yi.z...@huawei.com, yang...@huawei.com, liba...@huaweicloud.com, Baokun Li, sta...@kernel.org
From: Baokun Li <liba...@huawei.com>

In ext4_ext_try_to_merge_up(), set path[1].p_bh to NULL after it has been
released, otherwise it may be released twice.

An example of what triggers this is as follows:

split2 map split1
|--------|-------|--------|

ext4_ext_map_blocks
ext4_ext_handle_unwritten_extents
ext4_split_convert_extents
// path->p_depth == 0
ext4_split_extent
// 1. do split1
ext4_split_extent_at
ext4_ext_insert_extent
ext4_ext_create_new_leaf
ext4_ext_grow_indepth
le16_add_cpu(&neh->eh_depth, 1)
ext4_find_extent
path->p_depth = 1
ext4_ext_try_to_merge
ext4_ext_try_to_merge_up
path->p_depth = 0
brelse(path[1].p_bh) ---> not set to NULL here
// 2. update path
ext4_find_extent
// 3. do split2
ext4_split_extent_at
ext4_ext_insert_extent
ext4_ext_create_new_leaf
ext4_ext_grow_indepth
le16_add_cpu(&neh->eh_depth, 1)
ext4_find_extent
path[0].p_bh = NULL;
path->p_depth = 1
read_extent_tree_block ---> return err
// path[1].p_bh is still the old value
ext4_free_ext_path
ext4_ext_drop_refs
// path->p_depth == 1
brelse(path[1].p_bh) ---> brelse a buffer twice

Finally got the following WARRNING when removing the buffer from lru:

============================================
VFS: brelse: Trying to free free buffer
WARNING: CPU: 2 PID: 72 at fs/buffer.c:1241 __brelse+0x58/0x90
CPU: 2 PID: 72 Comm: kworker/u19:1 Not tainted 6.9.0-dirty #716
RIP: 0010:__brelse+0x58/0x90
Call Trace:
<TASK>
__find_get_block+0x6e7/0x810
bdev_getblk+0x2b/0x480
__ext4_get_inode_loc+0x48a/0x1240
ext4_get_inode_loc+0xb2/0x150
ext4_reserve_inode_write+0xb7/0x230
__ext4_mark_inode_dirty+0x144/0x6a0
ext4_ext_insert_extent+0x9c8/0x3230
ext4_ext_map_blocks+0xf45/0x2dc0
ext4_map_blocks+0x724/0x1700
ext4_do_writepages+0x12d6/0x2a70
[...]
============================================

Fixes: ecb94f5fdf4b ("ext4: collapse a single extent tree block into the inode if possible")
Cc: sta...@kernel.org
Signed-off-by: Baokun Li <liba...@huawei.com>
---
fs/ext4/extents.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 4d589d34b30e..657baf3991c1 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1888,6 +1888,7 @@ static void ext4_ext_try_to_merge_up(handle_t *handle,
path[0].p_hdr->eh_max = cpu_to_le16(max_root);

brelse(path[1].p_bh);
+ path[1].p_bh = NULL;
ext4_free_blocks(handle, inode, NULL, blk, 1,
EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET);
}
--
2.39.2


liba...@huaweicloud.com

unread,
Jul 10, 2024, 12:11:42 AMJul 10
to linux...@vger.kernel.org, ty...@mit.edu, adilger...@dilger.ca, ja...@suse.cz, rites...@gmail.com, linux-...@vger.kernel.org, yi.z...@huawei.com, yang...@huawei.com, liba...@huaweicloud.com, Baokun Li, sta...@kernel.org
From: Baokun Li <liba...@huawei.com>

We hit the following use-after-free:

==================================================================
BUG: KASAN: slab-use-after-free in ext4_split_extent_at+0xba8/0xcc0
Read of size 2 at addr ffff88810548ed08 by task kworker/u20:0/40
CPU: 0 PID: 40 Comm: kworker/u20:0 Not tainted 6.9.0-dirty #724
Call Trace:
<TASK>
kasan_report+0x93/0xc0
ext4_split_extent_at+0xba8/0xcc0
ext4_split_extent.isra.0+0x18f/0x500
ext4_split_convert_extents+0x275/0x750
ext4_ext_handle_unwritten_extents+0x73e/0x1580
ext4_ext_map_blocks+0xe20/0x2dc0
ext4_map_blocks+0x724/0x1700
ext4_do_writepages+0x12d6/0x2a70
[...]

Allocated by task 40:
__kmalloc_noprof+0x1ac/0x480
ext4_find_extent+0xf3b/0x1e70
ext4_ext_map_blocks+0x188/0x2dc0
ext4_map_blocks+0x724/0x1700
ext4_do_writepages+0x12d6/0x2a70
[...]

Freed by task 40:
kfree+0xf1/0x2b0
ext4_find_extent+0xa71/0x1e70
ext4_ext_insert_extent+0xa22/0x3260
ext4_split_extent_at+0x3ef/0xcc0
ext4_split_extent.isra.0+0x18f/0x500
ext4_split_convert_extents+0x275/0x750
ext4_ext_handle_unwritten_extents+0x73e/0x1580
ext4_ext_map_blocks+0xe20/0x2dc0
ext4_map_blocks+0x724/0x1700
ext4_do_writepages+0x12d6/0x2a70
[...]
==================================================================

The flow of issue triggering is as follows:

ext4_split_extent_at
path = *ppath
ext4_ext_insert_extent(ppath)
ext4_ext_create_new_leaf(ppath)
ext4_find_extent(orig_path)
path = *orig_path
read_extent_tree_block
// return -ENOMEM or -EIO
ext4_free_ext_path(path)
kfree(path)
*orig_path = NULL
a. If err is -ENOMEM:
ext4_ext_dirty(path + path->p_depth)
// path use-after-free !!!
b. If err is -EIO and we have EXT_DEBUG defined:
ext4_ext_show_leaf(path)
eh = path[depth].p_hdr
// path also use-after-free !!!

So when trying to zeroout or fix the extent length, call ext4_find_extent()
to update the path.

In addition we use *ppath directly as an ext4_ext_show_leaf() input to
avoid possible use-after-free when EXT_DEBUG is defined, and to avoid
unnecessary path updates.

Fixes: dfe5080939ea ("ext4: drop EXT4_EX_NOFREE_ON_ERR from rest of extents handling code")
Cc: sta...@kernel.org
Signed-off-by: Baokun Li <liba...@huawei.com>
---
fs/ext4/extents.c | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 6e5b5baf3aa6..3a70a0739af8 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3252,6 +3252,25 @@ static int ext4_split_extent_at(handle_t *handle,
if (err != -ENOSPC && err != -EDQUOT && err != -ENOMEM)
goto out;

+ /*
+ * Update path is required because previous ext4_ext_insert_extent()
+ * may have freed or reallocated the path. Using EXT4_EX_NOFAIL
+ * guarantees that ext4_find_extent() will not return -ENOMEM,
+ * otherwise -ENOMEM will cause a retry in do_writepages(), and a
+ * WARN_ON may be triggered in ext4_da_update_reserve_space() due to
+ * an incorrect ee_len causing the i_reserved_data_blocks exception.
+ */
+ path = ext4_find_extent(inode, ee_block, ppath,
+ flags | EXT4_EX_NOFAIL);
+ if (IS_ERR(path)) {
+ EXT4_ERROR_INODE(inode, "Failed split extent on %u, err %ld",
+ split, PTR_ERR(path));
+ return PTR_ERR(path);
+ }
+ depth = ext_depth(inode);
+ ex = path[depth].p_ext;
+ *ppath = path;
+
if (EXT4_EXT_MAY_ZEROOUT & split_flag) {
if (split_flag & (EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) {
if (split_flag & EXT4_EXT_DATA_VALID1) {
@@ -3304,7 +3323,7 @@ static int ext4_split_extent_at(handle_t *handle,
ext4_ext_dirty(handle, inode, path + path->p_depth);
return err;
out:
- ext4_ext_show_leaf(inode, path);
+ ext4_ext_show_leaf(inode, *ppath);
return err;
}

--
2.39.2


liba...@huaweicloud.com

unread,
Jul 10, 2024, 12:13:57 AMJul 10
to linux...@vger.kernel.org, ty...@mit.edu, adilger...@dilger.ca, ja...@suse.cz, rites...@gmail.com, linux-...@vger.kernel.org, yi.z...@huawei.com, yang...@huawei.com, liba...@huaweicloud.com, Baokun Li
From: Baokun Li <liba...@huawei.com>

The ext4_find_extent() can update the extent path so that it does not have
to allocate and free the path repeatedly, thus reducing the consumption of
memory allocation and freeing in the following functions:

ext4_ext_clear_bb
ext4_ext_replay_set_iblocks
ext4_fc_replay_add_range
ext4_fc_set_bitmaps_and_counters

No functional changes. Note that ext4_find_extent() does not support error
pointers, so in this case set path to NULL first.

Signed-off-by: Baokun Li <liba...@huawei.com>
---
fs/ext4/extents.c | 51 +++++++++++++++++++------------------------
fs/ext4/fast_commit.c | 11 ++++++----
2 files changed, 29 insertions(+), 33 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 737432bb316e..5ff92cd50dc8 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -6068,12 +6068,9 @@ int ext4_ext_replay_set_iblocks(struct inode *inode)
if (IS_ERR(path))
return PTR_ERR(path);
ex = path[path->p_depth].p_ext;
- if (!ex) {
- ext4_free_ext_path(path);
+ if (!ex)
goto out;
- }
end = le32_to_cpu(ex->ee_block) + ext4_ext_get_actual_len(ex);
- ext4_free_ext_path(path);

/* Count the number of data blocks */
cur = 0;
@@ -6099,32 +6096,28 @@ int ext4_ext_replay_set_iblocks(struct inode *inode)
ret = skip_hole(inode, &cur);
if (ret < 0)
goto out;
- path = ext4_find_extent(inode, cur, NULL, 0);
+ path = ext4_find_extent(inode, cur, path, 0);
if (IS_ERR(path))
goto out;
numblks += path->p_depth;
- ext4_free_ext_path(path);
while (cur < end) {
- path = ext4_find_extent(inode, cur, NULL, 0);
+ path = ext4_find_extent(inode, cur, path, 0);
if (IS_ERR(path))
break;
ex = path[path->p_depth].p_ext;
- if (!ex) {
- ext4_free_ext_path(path);
- return 0;
- }
+ if (!ex)
+ goto cleanup;
+
cur = max(cur + 1, le32_to_cpu(ex->ee_block) +
ext4_ext_get_actual_len(ex));
ret = skip_hole(inode, &cur);
- if (ret < 0) {
- ext4_free_ext_path(path);
+ if (ret < 0)
break;
- }
- path2 = ext4_find_extent(inode, cur, NULL, 0);
- if (IS_ERR(path2)) {
- ext4_free_ext_path(path);
+
+ path2 = ext4_find_extent(inode, cur, path2, 0);
+ if (IS_ERR(path2))
break;
- }
+
for (i = 0; i <= max(path->p_depth, path2->p_depth); i++) {
cmp1 = cmp2 = 0;
if (i <= path->p_depth)
@@ -6136,13 +6129,14 @@ int ext4_ext_replay_set_iblocks(struct inode *inode)
if (cmp1 != cmp2 && cmp2 != 0)
numblks++;
}
- ext4_free_ext_path(path);
- ext4_free_ext_path(path2);
}

out:
inode->i_blocks = numblks << (inode->i_sb->s_blocksize_bits - 9);
ext4_mark_inode_dirty(NULL, inode);
+cleanup:
+ ext4_free_ext_path(path);
+ ext4_free_ext_path(path2);
return 0;
}

@@ -6163,12 +6157,9 @@ int ext4_ext_clear_bb(struct inode *inode)
if (IS_ERR(path))
return PTR_ERR(path);
ex = path[path->p_depth].p_ext;
- if (!ex) {
- ext4_free_ext_path(path);
- return 0;
- }
+ if (!ex)
+ goto out;
end = le32_to_cpu(ex->ee_block) + ext4_ext_get_actual_len(ex);
- ext4_free_ext_path(path);

cur = 0;
while (cur < end) {
@@ -6178,16 +6169,16 @@ int ext4_ext_clear_bb(struct inode *inode)
if (ret < 0)
break;
if (ret > 0) {
- path = ext4_find_extent(inode, map.m_lblk, NULL, 0);
- if (!IS_ERR_OR_NULL(path)) {
+ path = ext4_find_extent(inode, map.m_lblk, path, 0);
+ if (!IS_ERR(path)) {
for (j = 0; j < path->p_depth; j++) {
-
ext4_mb_mark_bb(inode->i_sb,
path[j].p_block, 1, false);
ext4_fc_record_regions(inode->i_sb, inode->i_ino,
0, path[j].p_block, 1, 1);
}
- ext4_free_ext_path(path);
+ } else {
+ path = NULL;
}
ext4_mb_mark_bb(inode->i_sb, map.m_pblk, map.m_len, false);
ext4_fc_record_regions(inode->i_sb, inode->i_ino,
@@ -6196,5 +6187,7 @@ int ext4_ext_clear_bb(struct inode *inode)
cur = cur + map.m_len;
}

+out:
+ ext4_free_ext_path(path);
return 0;
}
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 1dee40477727..1426d595fab7 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -1766,7 +1766,7 @@ static int ext4_fc_replay_add_range(struct super_block *sb,

if (ret == 0) {
/* Range is not mapped */
- path = ext4_find_extent(inode, cur, NULL, 0);
+ path = ext4_find_extent(inode, cur, path, 0);
if (IS_ERR(path))
goto out;
memset(&newex, 0, sizeof(newex));
@@ -1782,7 +1782,6 @@ static int ext4_fc_replay_add_range(struct super_block *sb,
up_write((&EXT4_I(inode)->i_data_sem));
if (IS_ERR(path))
goto out;
- ext4_free_ext_path(path);
goto next;
}

@@ -1830,6 +1829,7 @@ static int ext4_fc_replay_add_range(struct super_block *sb,
ext4_ext_replay_shrink_inode(inode, i_size_read(inode) >>
sb->s_blocksize_bits);
out:
+ ext4_free_ext_path(path);
iput(inode);
return 0;
}
@@ -1930,12 +1930,13 @@ static void ext4_fc_set_bitmaps_and_counters(struct super_block *sb)
break;

if (ret > 0) {
- path = ext4_find_extent(inode, map.m_lblk, NULL, 0);
+ path = ext4_find_extent(inode, map.m_lblk, path, 0);
if (!IS_ERR(path)) {
for (j = 0; j < path->p_depth; j++)
ext4_mb_mark_bb(inode->i_sb,
path[j].p_block, 1, true);
- ext4_free_ext_path(path);
+ } else {
+ path = NULL;
}
cur += ret;
ext4_mb_mark_bb(inode->i_sb, map.m_pblk,
@@ -1946,6 +1947,8 @@ static void ext4_fc_set_bitmaps_and_counters(struct super_block *sb)
}
iput(inode);
}
+
+ ext4_free_ext_path(path);
}

/*
--
2.39.2


liba...@huaweicloud.com

unread,
Jul 10, 2024, 12:14:06 AMJul 10
to linux...@vger.kernel.org, ty...@mit.edu, adilger...@dilger.ca, ja...@suse.cz, rites...@gmail.com, linux-...@vger.kernel.org, yi.z...@huawei.com, yang...@huawei.com, liba...@huaweicloud.com, Baokun Li
From: Baokun Li <liba...@huawei.com>

The use of path and ppath is now very confusing, so to make the code more
readable, pass path between functions uniformly, and get rid of ppath.

To get rid of the ppath in ext4_split_extent(), the following is done here:

* The 'allocated' is changed from passing a value to passing an address.
* Its caller needs to update ppath if it uses ppath.

No functional changes.

Signed-off-by: Baokun Li <liba...@huawei.com>
---
fs/ext4/extents.c | 97 ++++++++++++++++++++++++-----------------------
1 file changed, 50 insertions(+), 47 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 0bd068ed324f..2a4f6c89858c 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3345,21 +3345,18 @@ static struct ext4_ext_path *ext4_split_extent_at(handle_t *handle,
* c> Splits in three extents: Somone is splitting in middle of the extent
*
*/
-static int ext4_split_extent(handle_t *handle,
- struct inode *inode,
- struct ext4_ext_path **ppath,
- struct ext4_map_blocks *map,
- int split_flag,
- int flags)
+static struct ext4_ext_path *ext4_split_extent(handle_t *handle,
+ struct inode *inode,
+ struct ext4_ext_path *path,
+ struct ext4_map_blocks *map,
+ int split_flag, int flags,
+ unsigned int *allocated)
{
- struct ext4_ext_path *path = *ppath;
ext4_lblk_t ee_block;
struct ext4_extent *ex;
unsigned int ee_len, depth;
- int err = 0;
int unwritten;
int split_flag1, flags1;
- int allocated = map->m_len;

depth = ext_depth(inode);
ex = path[depth].p_ext;
@@ -3377,33 +3374,25 @@ static int ext4_split_extent(handle_t *handle,
split_flag1 |= EXT4_EXT_DATA_VALID1;
path = ext4_split_extent_at(handle, inode, path,
map->m_lblk + map->m_len, split_flag1, flags1);
- if (IS_ERR(path)) {
- err = PTR_ERR(path);
- *ppath = NULL;
- goto out;
+ if (IS_ERR(path))
+ return path;
+ /*
+ * Update path is required because previous ext4_split_extent_at
+ * may result in split of original leaf or extent zeroout.
+ */
+ path = ext4_find_extent(inode, map->m_lblk, path, flags);
+ if (IS_ERR(path))
+ return path;
+ depth = ext_depth(inode);
+ ex = path[depth].p_ext;
+ if (!ex) {
+ EXT4_ERROR_INODE(inode, "unexpected hole at %lu",
+ (unsigned long) map->m_lblk);
+ ext4_free_ext_path(path);
+ return ERR_PTR(-EFSCORRUPTED);
}
- *ppath = path;
- } else {
- allocated = ee_len - (map->m_lblk - ee_block);
- }
- /*
- * Update path is required because previous ext4_split_extent_at() may
- * result in split of original leaf or extent zeroout.
- */
- path = ext4_find_extent(inode, map->m_lblk, path, flags);
- if (IS_ERR(path)) {
- *ppath = NULL;
- return PTR_ERR(path);
- }
- *ppath = path;
- depth = ext_depth(inode);
- ex = path[depth].p_ext;
- if (!ex) {
- EXT4_ERROR_INODE(inode, "unexpected hole at %lu",
- (unsigned long) map->m_lblk);
- return -EFSCORRUPTED;
+ unwritten = ext4_ext_is_unwritten(ex);
}
- unwritten = ext4_ext_is_unwritten(ex);

if (map->m_lblk >= ee_block) {
split_flag1 = split_flag & EXT4_EXT_DATA_VALID2;
@@ -3414,17 +3403,18 @@ static int ext4_split_extent(handle_t *handle,
}
path = ext4_split_extent_at(handle, inode, path,
map->m_lblk, split_flag1, flags);
- if (IS_ERR(path)) {
- err = PTR_ERR(path);
- *ppath = NULL;
- goto out;
- }
- *ppath = path;
+ if (IS_ERR(path))
+ return path;
}

+ if (allocated) {
+ if (map->m_lblk + map->m_len > ee_block + ee_len)
+ *allocated = ee_len - (map->m_lblk - ee_block);
+ else
+ *allocated = map->m_len;
+ }
ext4_ext_show_leaf(inode, path);
-out:
- return err ? err : allocated;
+ return path;
}

/*
@@ -3669,10 +3659,15 @@ static int ext4_ext_convert_to_initialized(handle_t *handle,
}

fallback:
- err = ext4_split_extent(handle, inode, ppath, &split_map, split_flag,
- flags);
- if (err > 0)
- err = 0;
+ path = ext4_split_extent(handle, inode, path, &split_map, split_flag,
+ flags, NULL);
+ if (IS_ERR(path)) {
+ err = PTR_ERR(path);
+ *ppath = NULL;
+ goto out;
+ }
+ err = 0;
+ *ppath = path;
out:
/* If we have gotten a failure, don't zero out status tree */
if (!err) {
@@ -3718,6 +3713,7 @@ static int ext4_split_convert_extents(handle_t *handle,
struct ext4_extent *ex;
unsigned int ee_len;
int split_flag = 0, depth;
+ unsigned int allocated = 0;

ext_debug(inode, "logical block %llu, max_blocks %u\n",
(unsigned long long)map->m_lblk, map->m_len);
@@ -3745,7 +3741,14 @@ static int ext4_split_convert_extents(handle_t *handle,
split_flag |= (EXT4_EXT_MARK_UNWRIT2 | EXT4_EXT_DATA_VALID2);
}
flags |= EXT4_GET_BLOCKS_PRE_IO;
- return ext4_split_extent(handle, inode, ppath, map, split_flag, flags);
+ path = ext4_split_extent(handle, inode, path, map, split_flag, flags,
+ &allocated);
+ if (IS_ERR(path)) {
+ *ppath = NULL;
+ return PTR_ERR(path);
+ }
+ *ppath = path;
+ return allocated;
}

static int ext4_convert_unwritten_extents_endio(handle_t *handle,
--
2.39.2


liba...@huaweicloud.com

unread,
Jul 10, 2024, 12:18:34 AMJul 10
to linux...@vger.kernel.org, ty...@mit.edu, adilger...@dilger.ca, ja...@suse.cz, rites...@gmail.com, linux-...@vger.kernel.org, yi.z...@huawei.com, yang...@huawei.com, liba...@huaweicloud.com, Baokun Li
From: Baokun Li <liba...@huawei.com>

In ext4_find_extent(), path may be freed by error or be reallocated, so
using a previously saved *ppath may have been freed and thus may trigger
use-after-free, as follows:

ext4_split_extent
path = *ppath;
ext4_split_extent_at(ppath)
path = ext4_find_extent(ppath)
ext4_split_extent_at(ppath)
// ext4_find_extent fails to free path
// but zeroout succeeds
ext4_ext_show_leaf(inode, path)
eh = path[depth].p_hdr
// path use-after-free !!!

Similar to ext4_split_extent_at(), we use *ppath directly as an input to
ext4_ext_show_leaf(). Fix a spelling error by the way.

Same problem in ext4_ext_handle_unwritten_extents(). Since 'path' is only
used in ext4_ext_show_leaf(), remove 'path' and use *ppath directly.

This issue is triggered only when EXT_DEBUG is defined and therefore does
not affect functionality.

Signed-off-by: Baokun Li <liba...@huawei.com>
---
fs/ext4/extents.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 3a70a0739af8..1660434fbfc7 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3328,7 +3328,7 @@ static int ext4_split_extent_at(handle_t *handle,
}

/*
- * ext4_split_extents() splits an extent and mark extent which is covered
+ * ext4_split_extent() splits an extent and mark extent which is covered
* by @map as split_flags indicates
*
* It may result in splitting the extent into multiple extents (up to three)
@@ -3404,7 +3404,7 @@ static int ext4_split_extent(handle_t *handle,
goto out;
}

- ext4_ext_show_leaf(inode, path);
+ ext4_ext_show_leaf(inode, *ppath);
out:
return err ? err : allocated;
}
@@ -3869,14 +3869,13 @@ ext4_ext_handle_unwritten_extents(handle_t *handle, struct inode *inode,
struct ext4_ext_path **ppath, int flags,
unsigned int allocated, ext4_fsblk_t newblock)
{
- struct ext4_ext_path __maybe_unused *path = *ppath;
int ret = 0;
int err = 0;

ext_debug(inode, "logical block %llu, max_blocks %u, flags 0x%x, allocated %u\n",
(unsigned long long)map->m_lblk, map->m_len, flags,
allocated);
- ext4_ext_show_leaf(inode, path);
+ ext4_ext_show_leaf(inode, *ppath);

/*
* When writing into unwritten space, we should not fail to
@@ -3973,7 +3972,7 @@ ext4_ext_handle_unwritten_extents(handle_t *handle, struct inode *inode,
if (allocated > map->m_len)
allocated = map->m_len;
map->m_len = allocated;
- ext4_ext_show_leaf(inode, path);
+ ext4_ext_show_leaf(inode, *ppath);
out2:
return err ? err : allocated;
}
--
2.39.2


liba...@huaweicloud.com

unread,
Jul 10, 2024, 12:18:46 AMJul 10
to linux...@vger.kernel.org, ty...@mit.edu, adilger...@dilger.ca, ja...@suse.cz, rites...@gmail.com, linux-...@vger.kernel.org, yi.z...@huawei.com, yang...@huawei.com, liba...@huaweicloud.com, Baokun Li
From: Baokun Li <liba...@huawei.com>

The use of path and ppath is now very confusing, so to make the code more
readable, pass path between functions uniformly, and get rid of ppath.

Getting rid of ppath in ext4_find_extent() requires its caller to update
ppath. These ppaths will also be dropped later. No functional changes.

Signed-off-by: Baokun Li <liba...@huawei.com>
---
fs/ext4/ext4.h | 2 +-
fs/ext4/extents.c | 52 +++++++++++++++++++++++--------------------
fs/ext4/move_extent.c | 6 ++---
3 files changed, 32 insertions(+), 28 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 8007abd4972d..cbe8d6062c52 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3714,7 +3714,7 @@ extern int ext4_ext_insert_extent(handle_t *, struct inode *,
struct ext4_ext_path **,
struct ext4_extent *, int);
extern struct ext4_ext_path *ext4_find_extent(struct inode *, ext4_lblk_t,
- struct ext4_ext_path **,
+ struct ext4_ext_path *,
int flags);
extern void ext4_free_ext_path(struct ext4_ext_path *);
extern int ext4_ext_check_inode(struct inode *inode);
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index b1cfce5b57d2..5217e6f53467 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -884,11 +884,10 @@ void ext4_ext_tree_init(handle_t *handle, struct inode *inode)

struct ext4_ext_path *
ext4_find_extent(struct inode *inode, ext4_lblk_t block,
- struct ext4_ext_path **orig_path, int flags)
+ struct ext4_ext_path *path, int flags)
{
struct ext4_extent_header *eh;
struct buffer_head *bh;
- struct ext4_ext_path *path = orig_path ? *orig_path : NULL;
short int depth, i, ppos = 0;
int ret;
gfp_t gfp_flags = GFP_NOFS;
@@ -909,7 +908,7 @@ ext4_find_extent(struct inode *inode, ext4_lblk_t block,
ext4_ext_drop_refs(path);
if (depth > path[0].p_maxdepth) {
kfree(path);
- *orig_path = path = NULL;
+ path = NULL;
}
}
if (!path) {
@@ -964,8 +963,6 @@ ext4_find_extent(struct inode *inode, ext4_lblk_t block,

err:
ext4_free_ext_path(path);
- if (orig_path)
- *orig_path = NULL;
return ERR_PTR(ret);
}

@@ -1430,7 +1427,7 @@ static int ext4_ext_create_new_leaf(handle_t *handle, struct inode *inode,
/* refill path */
path = ext4_find_extent(inode,
(ext4_lblk_t)le32_to_cpu(newext->ee_block),
- ppath, gb_flags);
+ path, gb_flags);
if (IS_ERR(path))
err = PTR_ERR(path);
} else {
@@ -1442,7 +1439,7 @@ static int ext4_ext_create_new_leaf(handle_t *handle, struct inode *inode,
/* refill path */
path = ext4_find_extent(inode,
(ext4_lblk_t)le32_to_cpu(newext->ee_block),
- ppath, gb_flags);
+ path, gb_flags);
if (IS_ERR(path)) {
err = PTR_ERR(path);
goto out;
@@ -1458,8 +1455,8 @@ static int ext4_ext_create_new_leaf(handle_t *handle, struct inode *inode,
goto repeat;
}
}
-
out:
+ *ppath = IS_ERR(path) ? NULL : path;
return err;
}

@@ -3260,11 +3257,12 @@ static int ext4_split_extent_at(handle_t *handle,
* WARN_ON may be triggered in ext4_da_update_reserve_space() due to
* an incorrect ee_len causing the i_reserved_data_blocks exception.
*/
- path = ext4_find_extent(inode, ee_block, ppath,
+ path = ext4_find_extent(inode, ee_block, *ppath,
flags | EXT4_EX_NOFAIL);
if (IS_ERR(path)) {
EXT4_ERROR_INODE(inode, "Failed split extent on %u, err %ld",
split, PTR_ERR(path));
+ *ppath = NULL;
return PTR_ERR(path);
}
depth = ext_depth(inode);
@@ -3379,9 +3377,12 @@ static int ext4_split_extent(handle_t *handle,
* Update path is required because previous ext4_split_extent_at() may
* result in split of original leaf or extent zeroout.
*/
- path = ext4_find_extent(inode, map->m_lblk, ppath, flags);
- if (IS_ERR(path))
+ path = ext4_find_extent(inode, map->m_lblk, *ppath, flags);
+ if (IS_ERR(path)) {
+ *ppath = NULL;
return PTR_ERR(path);
+ }
+ *ppath = path;
depth = ext_depth(inode);
ex = path[depth].p_ext;
if (!ex) {
@@ -3767,9 +3768,12 @@ static int ext4_convert_unwritten_extents_endio(handle_t *handle,
EXT4_GET_BLOCKS_CONVERT);
if (err < 0)
return err;
- path = ext4_find_extent(inode, map->m_lblk, ppath, 0);
- if (IS_ERR(path))
+ path = ext4_find_extent(inode, map->m_lblk, *ppath, 0);
+ if (IS_ERR(path)) {
+ *ppath = NULL;
return PTR_ERR(path);
+ }
+ *ppath = path;
depth = ext_depth(inode);
ex = path[depth].p_ext;
}
@@ -3825,9 +3829,12 @@ convert_initialized_extent(handle_t *handle, struct inode *inode,
EXT4_GET_BLOCKS_CONVERT_UNWRITTEN);
if (err < 0)
return err;
- path = ext4_find_extent(inode, map->m_lblk, ppath, 0);
- if (IS_ERR(path))
+ path = ext4_find_extent(inode, map->m_lblk, *ppath, 0);
+ if (IS_ERR(path)) {
+ *ppath = NULL;
return PTR_ERR(path);
+ }
+ *ppath = path;
depth = ext_depth(inode);
ex = path[depth].p_ext;
if (!ex) {
@@ -5224,7 +5231,7 @@ ext4_ext_shift_extents(struct inode *inode, handle_t *handle,
* won't be shifted beyond EXT_MAX_BLOCKS.
*/
if (SHIFT == SHIFT_LEFT) {
- path = ext4_find_extent(inode, start - 1, &path,
+ path = ext4_find_extent(inode, start - 1, path,
EXT4_EX_NOCACHE);
if (IS_ERR(path))
return PTR_ERR(path);
@@ -5273,7 +5280,7 @@ ext4_ext_shift_extents(struct inode *inode, handle_t *handle,
* becomes NULL to indicate the end of the loop.
*/
while (iterator && start <= stop) {
- path = ext4_find_extent(inode, *iterator, &path,
+ path = ext4_find_extent(inode, *iterator, path,
EXT4_EX_NOCACHE);
if (IS_ERR(path))
return PTR_ERR(path);
@@ -5854,11 +5861,8 @@ int ext4_clu_mapped(struct inode *inode, ext4_lblk_t lclu)

/* search for the extent closest to the first block in the cluster */
path = ext4_find_extent(inode, EXT4_C2B(sbi, lclu), NULL, 0);
- if (IS_ERR(path)) {
- err = PTR_ERR(path);
- path = NULL;
- goto out;
- }
+ if (IS_ERR(path))
+ return PTR_ERR(path);

depth = ext_depth(inode);

@@ -5942,7 +5946,7 @@ int ext4_ext_replay_update_ex(struct inode *inode, ext4_lblk_t start,
if (ret)
goto out;

- path = ext4_find_extent(inode, start, &path, 0);
+ path = ext4_find_extent(inode, start, path, 0);
if (IS_ERR(path))
return PTR_ERR(path);
ex = path[path->p_depth].p_ext;
@@ -5956,7 +5960,7 @@ int ext4_ext_replay_update_ex(struct inode *inode, ext4_lblk_t start,
if (ret)
goto out;

- path = ext4_find_extent(inode, start, &path, 0);
+ path = ext4_find_extent(inode, start, path, 0);
if (IS_ERR(path))
return PTR_ERR(path);
ex = path[path->p_depth].p_ext;
diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index 204f53b23622..b0ab19a913bf 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -26,14 +26,14 @@ static inline int
get_ext_path(struct inode *inode, ext4_lblk_t lblock,
struct ext4_ext_path **ppath)
{
- struct ext4_ext_path *path;
+ struct ext4_ext_path *path = *ppath;

- path = ext4_find_extent(inode, lblock, ppath, EXT4_EX_NOCACHE);
+ *ppath = NULL;
+ path = ext4_find_extent(inode, lblock, path, EXT4_EX_NOCACHE);
if (IS_ERR(path))
return PTR_ERR(path);
if (path[ext_depth(inode)].p_ext == NULL) {
ext4_free_ext_path(path);
- *ppath = NULL;
return -ENODATA;
}
*ppath = path;
--
2.39.2


liba...@huaweicloud.com

unread,
Jul 10, 2024, 12:19:57 AMJul 10
to linux...@vger.kernel.org, ty...@mit.edu, adilger...@dilger.ca, ja...@suse.cz, rites...@gmail.com, linux-...@vger.kernel.org, yi.z...@huawei.com, yang...@huawei.com, liba...@huaweicloud.com, Baokun Li
From: Baokun Li <liba...@huawei.com>

The use of path and ppath is now very confusing, so to make the code more
readable, pass path between functions uniformly, and get rid of ppath.

To get rid of the ppath in ext4_ext_create_new_leaf(), the following is
done here:

* Free the extents path when an error is encountered.
* Its caller needs to update ppath if it uses ppath.

No functional changes.

Signed-off-by: Baokun Li <liba...@huawei.com>
---
fs/ext4/extents.c | 41 +++++++++++++++++++++--------------------
1 file changed, 21 insertions(+), 20 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 6dfb5d03e197..0d6ce9e74b01 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1397,13 +1397,12 @@ static int ext4_ext_grow_indepth(handle_t *handle, struct inode *inode,
* finds empty index and adds new leaf.
* if no free index is found, then it requests in-depth growing.
*/
-static int ext4_ext_create_new_leaf(handle_t *handle, struct inode *inode,
- unsigned int mb_flags,
- unsigned int gb_flags,
- struct ext4_ext_path **ppath,
- struct ext4_extent *newext)
+static struct ext4_ext_path *
+ext4_ext_create_new_leaf(handle_t *handle, struct inode *inode,
+ unsigned int mb_flags, unsigned int gb_flags,
+ struct ext4_ext_path *path,
+ struct ext4_extent *newext)
{
- struct ext4_ext_path *path = *ppath;
struct ext4_ext_path *curp;
int depth, i, err = 0;

@@ -1424,28 +1423,24 @@ static int ext4_ext_create_new_leaf(handle_t *handle, struct inode *inode,
* entry: create all needed subtree and add new leaf */
err = ext4_ext_split(handle, inode, mb_flags, path, newext, i);
if (err)
- goto out;
+ goto errout;

/* refill path */
path = ext4_find_extent(inode,
(ext4_lblk_t)le32_to_cpu(newext->ee_block),
path, gb_flags);
- if (IS_ERR(path))
- err = PTR_ERR(path);
} else {
/* tree is full, time to grow in depth */
err = ext4_ext_grow_indepth(handle, inode, mb_flags);
if (err)
- goto out;
+ goto errout;

/* refill path */
path = ext4_find_extent(inode,
(ext4_lblk_t)le32_to_cpu(newext->ee_block),
path, gb_flags);
- if (IS_ERR(path)) {
- err = PTR_ERR(path);
- goto out;
- }
+ if (IS_ERR(path))
+ return path;

/*
* only first (depth 0 -> 1) produces free space;
@@ -1457,9 +1452,11 @@ static int ext4_ext_create_new_leaf(handle_t *handle, struct inode *inode,
goto repeat;
}
}
-out:
- *ppath = IS_ERR(path) ? NULL : path;
- return err;
+ return path;
+
+errout:
+ ext4_free_ext_path(path);
+ return ERR_PTR(err);
}

/*
@@ -2112,10 +2109,14 @@ int ext4_ext_insert_extent(handle_t *handle, struct inode *inode,
*/
if (gb_flags & EXT4_GET_BLOCKS_METADATA_NOFAIL)
mb_flags |= EXT4_MB_USE_RESERVED;
- err = ext4_ext_create_new_leaf(handle, inode, mb_flags, gb_flags,
- ppath, newext);
- if (err)
+ path = ext4_ext_create_new_leaf(handle, inode, mb_flags, gb_flags,
+ path, newext);
+ if (IS_ERR(path)) {
+ *ppath = NULL;
+ err = PTR_ERR(path);
goto cleanup;
+ }
+ *ppath = path;
depth = ext_depth(inode);
eh = path[depth].p_hdr;

--
2.39.2


liba...@huaweicloud.com

unread,
Jul 10, 2024, 12:21:33 AMJul 10
to linux...@vger.kernel.org, ty...@mit.edu, adilger...@dilger.ca, ja...@suse.cz, rites...@gmail.com, linux-...@vger.kernel.org, yi.z...@huawei.com, yang...@huawei.com, liba...@huaweicloud.com, Baokun Li
From: Baokun Li <liba...@huawei.com>

The use of path and ppath is now very confusing, so to make the code more
readable, pass path between functions uniformly, and get rid of ppath.

After getting rid of ppath in get_ext_path(), its caller may pass an error
pointer to ext4_free_ext_path(), so it needs to teach ext4_free_ext_path()
and ext4_ext_drop_refs() to skip the error pointer. No functional changes.

Signed-off-by: Baokun Li <liba...@huawei.com>
---
fs/ext4/extents.c | 5 +++--
fs/ext4/move_extent.c | 34 +++++++++++++++++-----------------
2 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 5217e6f53467..6dfb5d03e197 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -116,7 +116,7 @@ static void ext4_ext_drop_refs(struct ext4_ext_path *path)
{
int depth, i;

- if (!path)
+ if (IS_ERR_OR_NULL(path))
return;
depth = path->p_depth;
for (i = 0; i <= depth; i++, path++)
@@ -125,6 +125,8 @@ static void ext4_ext_drop_refs(struct ext4_ext_path *path)

void ext4_free_ext_path(struct ext4_ext_path *path)
{
+ if (IS_ERR_OR_NULL(path))
+ return;
ext4_ext_drop_refs(path);
kfree(path);
}
@@ -4191,7 +4193,6 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
path = ext4_find_extent(inode, map->m_lblk, NULL, 0);
if (IS_ERR(path)) {
err = PTR_ERR(path);
- path = NULL;
goto out;
}

diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index b0ab19a913bf..a7186d63725a 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -17,27 +17,23 @@
* get_ext_path() - Find an extent path for designated logical block number.
* @inode: inode to be searched
* @lblock: logical block number to find an extent path
- * @ppath: pointer to an extent path pointer (for output)
+ * @path: pointer to an extent path
*
- * ext4_find_extent wrapper. Return 0 on success, or a negative error value
- * on failure.
+ * ext4_find_extent wrapper. Return an extent path pointer on success,
+ * or an error pointer on failure.
*/
-static inline int
+static inline struct ext4_ext_path *
get_ext_path(struct inode *inode, ext4_lblk_t lblock,
- struct ext4_ext_path **ppath)
+ struct ext4_ext_path *path)
{
- struct ext4_ext_path *path = *ppath;
-
- *ppath = NULL;
path = ext4_find_extent(inode, lblock, path, EXT4_EX_NOCACHE);
if (IS_ERR(path))
- return PTR_ERR(path);
+ return path;
if (path[ext_depth(inode)].p_ext == NULL) {
ext4_free_ext_path(path);
- return -ENODATA;
+ return ERR_PTR(-ENODATA);
}
- *ppath = path;
- return 0;
+ return path;
}

/**
@@ -95,9 +91,11 @@ mext_check_coverage(struct inode *inode, ext4_lblk_t from, ext4_lblk_t count,
int ret = 0;
ext4_lblk_t last = from + count;
while (from < last) {
- *err = get_ext_path(inode, from, &path);
- if (*err)
- goto out;
+ path = get_ext_path(inode, from, path);
+ if (IS_ERR(path)) {
+ *err = PTR_ERR(path);
+ return ret;
+ }
ext = path[ext_depth(inode)].p_ext;
if (unwritten != ext4_ext_is_unwritten(ext))
goto out;
@@ -624,9 +622,11 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk,
int offset_in_page;
int unwritten, cur_len;

- ret = get_ext_path(orig_inode, o_start, &path);
- if (ret)
+ path = get_ext_path(orig_inode, o_start, path);
+ if (IS_ERR(path)) {
+ ret = PTR_ERR(path);
goto out;
+ }
ex = path[path->p_depth].p_ext;
cur_blk = le32_to_cpu(ex->ee_block);
cur_len = ext4_ext_get_actual_len(ex);
--
2.39.2


liba...@huaweicloud.com

unread,
Jul 10, 2024, 12:21:42 AMJul 10
to linux...@vger.kernel.org, ty...@mit.edu, adilger...@dilger.ca, ja...@suse.cz, rites...@gmail.com, linux-...@vger.kernel.org, yi.z...@huawei.com, yang...@huawei.com, liba...@huaweicloud.com, Baokun Li
From: Baokun Li <liba...@huawei.com>

The use of path and ppath is now very confusing, so to make the code more
readable, pass path between functions uniformly, and get rid of ppath.

To get rid of the ppath in ext4_ext_handle_unwritten_extents(), the
following is done here:

* Free the extents path when an error is encountered.
* The 'allocated' is changed from passing a value to passing an address.

No functional changes.

Signed-off-by: Baokun Li <liba...@huawei.com>
---
fs/ext4/extents.c | 82 +++++++++++++++++++++--------------------------
1 file changed, 37 insertions(+), 45 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 59e80926fe3a..badadcd64e66 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -3887,18 +3887,18 @@ convert_initialized_extent(handle_t *handle, struct inode *inode,
return 0;
}

-static int
+static struct ext4_ext_path *
ext4_ext_handle_unwritten_extents(handle_t *handle, struct inode *inode,
struct ext4_map_blocks *map,
- struct ext4_ext_path **ppath, int flags,
- unsigned int allocated, ext4_fsblk_t newblock)
+ struct ext4_ext_path *path, int flags,
+ unsigned int *allocated, ext4_fsblk_t newblock)
{
int err = 0;

ext_debug(inode, "logical block %llu, max_blocks %u, flags 0x%x, allocated %u\n",
(unsigned long long)map->m_lblk, map->m_len, flags,
- allocated);
- ext4_ext_show_leaf(inode, *ppath);
+ *allocated);
+ ext4_ext_show_leaf(inode, path);

/*
* When writing into unwritten space, we should not fail to
@@ -3907,40 +3907,34 @@ ext4_ext_handle_unwritten_extents(handle_t *handle, struct inode *inode,
flags |= EXT4_GET_BLOCKS_METADATA_NOFAIL;

trace_ext4_ext_handle_unwritten_extents(inode, map, flags,
- allocated, newblock);
+ *allocated, newblock);

/* get_block() before submitting IO, split the extent */
if (flags & EXT4_GET_BLOCKS_PRE_IO) {
- *ppath = ext4_split_convert_extents(handle, inode, map, *ppath,
- flags | EXT4_GET_BLOCKS_CONVERT, &allocated);
- if (IS_ERR(*ppath)) {
- err = PTR_ERR(*ppath);
- *ppath = NULL;
- goto out2;
- }
+ path = ext4_split_convert_extents(handle, inode, map, path,
+ flags | EXT4_GET_BLOCKS_CONVERT, allocated);
+ if (IS_ERR(path))
+ return path;
/*
* shouldn't get a 0 allocated when splitting an extent unless
* m_len is 0 (bug) or extent has been corrupted
*/
- if (unlikely(allocated == 0)) {
+ if (unlikely(*allocated == 0)) {
EXT4_ERROR_INODE(inode,
"unexpected allocated == 0, m_len = %u",
map->m_len);
err = -EFSCORRUPTED;
- goto out2;
+ goto errout;
}
map->m_flags |= EXT4_MAP_UNWRITTEN;
goto out;
}
/* IO end_io complete, convert the filled extent to written */
if (flags & EXT4_GET_BLOCKS_CONVERT) {
- *ppath = ext4_convert_unwritten_extents_endio(handle, inode,
- map, *ppath);
- if (IS_ERR(*ppath)) {
- err = PTR_ERR(*ppath);
- *ppath = NULL;
- goto out2;
- }
+ path = ext4_convert_unwritten_extents_endio(handle, inode,
+ map, path);
+ if (IS_ERR(path))
+ return path;
ext4_update_inode_fsync_trans(handle, inode, 1);
goto map_out;
}
@@ -3972,23 +3966,20 @@ ext4_ext_handle_unwritten_extents(handle_t *handle, struct inode *inode,
* For buffered writes, at writepage time, etc. Convert a
* discovered unwritten extent to written.
*/
- *ppath = ext4_ext_convert_to_initialized(handle, inode, map, *ppath,
- flags, &allocated);
- if (IS_ERR(*ppath)) {
- err = PTR_ERR(*ppath);
- *ppath = NULL;
- goto out2;
- }
+ path = ext4_ext_convert_to_initialized(handle, inode, map, path,
+ flags, allocated);
+ if (IS_ERR(path))
+ return path;
ext4_update_inode_fsync_trans(handle, inode, 1);
/*
* shouldn't get a 0 allocated when converting an unwritten extent
* unless m_len is 0 (bug) or extent has been corrupted
*/
- if (unlikely(allocated == 0)) {
+ if (unlikely(*allocated == 0)) {
EXT4_ERROR_INODE(inode, "unexpected allocated == 0, m_len = %u",
map->m_len);
err = -EFSCORRUPTED;
- goto out2;
+ goto errout;
}

out:
@@ -3997,12 +3988,15 @@ ext4_ext_handle_unwritten_extents(handle_t *handle, struct inode *inode,
map->m_flags |= EXT4_MAP_MAPPED;
out1:
map->m_pblk = newblock;
- if (allocated > map->m_len)
- allocated = map->m_len;
- map->m_len = allocated;
- ext4_ext_show_leaf(inode, *ppath);
-out2:
- return err ? err : allocated;
+ if (*allocated > map->m_len)
+ *allocated = map->m_len;
+ map->m_len = *allocated;
+ ext4_ext_show_leaf(inode, path);
+ return path;
+
+errout:
+ ext4_free_ext_path(path);
+ return ERR_PTR(err);
}

/*
@@ -4199,7 +4193,7 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
struct ext4_extent newex, *ex, ex2;
struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
ext4_fsblk_t newblock = 0, pblk;
- int err = 0, depth, ret;
+ int err = 0, depth;
unsigned int allocated = 0, offset = 0;
unsigned int allocated_clusters = 0;
struct ext4_allocation_request ar;
@@ -4273,13 +4267,11 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
goto out;
}

- ret = ext4_ext_handle_unwritten_extents(
- handle, inode, map, &path, flags,
- allocated, newblock);
- if (ret < 0)
- err = ret;
- else
- allocated = ret;
+ path = ext4_ext_handle_unwritten_extents(
+ handle, inode, map, path, flags,
+ &allocated, newblock);
+ if (IS_ERR(path))
+ err = PTR_ERR(path);
goto out;
}
}
--
2.39.2


liba...@huaweicloud.com

unread,
Jul 10, 2024, 12:21:55 AMJul 10
to linux...@vger.kernel.org, ty...@mit.edu, adilger...@dilger.ca, ja...@suse.cz, rites...@gmail.com, linux-...@vger.kernel.org, yi.z...@huawei.com, yang...@huawei.com, liba...@huaweicloud.com, Baokun Li, sta...@kernel.org
From: Baokun Li <liba...@huawei.com>

When calling ext4_force_split_extent_at() in ext4_ext_replay_update_ex(),
the 'ppath' is updated but it is the 'path' that is freed, thus potentially
triggering a double-free in the following process:

ext4_ext_replay_update_ex
ppath = path
ext4_force_split_extent_at(&ppath)
ext4_split_extent_at
ext4_ext_insert_extent
ext4_ext_create_new_leaf
ext4_ext_grow_indepth
ext4_find_extent
if (depth > path[0].p_maxdepth)
kfree(path) ---> path First freed
*orig_path = path = NULL ---> null ppath
kfree(path) ---> path double-free !!!

So drop the unnecessary ppath and use path directly to avoid this problem.
And use ext4_find_extent() directly to update path, avoiding unnecessary
memory allocation and freeing. Also, propagate the error returned by
ext4_find_extent() instead of using strange error codes.

Fixes: 8016e29f4362 ("ext4: fast commit recovery path")
Cc: sta...@kernel.org
Signed-off-by: Baokun Li <liba...@huawei.com>
---
fs/ext4/extents.c | 21 ++++++++++-----------
1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 1660434fbfc7..b1cfce5b57d2 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -5920,7 +5920,7 @@ int ext4_clu_mapped(struct inode *inode, ext4_lblk_t lclu)
int ext4_ext_replay_update_ex(struct inode *inode, ext4_lblk_t start,
int len, int unwritten, ext4_fsblk_t pblk)
{
- struct ext4_ext_path *path = NULL, *ppath;
+ struct ext4_ext_path *path;
struct ext4_extent *ex;
int ret;

@@ -5936,30 +5936,29 @@ int ext4_ext_replay_update_ex(struct inode *inode, ext4_lblk_t start,
if (le32_to_cpu(ex->ee_block) != start ||
ext4_ext_get_actual_len(ex) != len) {
/* We need to split this extent to match our extent first */
- ppath = path;
down_write(&EXT4_I(inode)->i_data_sem);
- ret = ext4_force_split_extent_at(NULL, inode, &ppath, start, 1);
+ ret = ext4_force_split_extent_at(NULL, inode, &path, start, 1);
up_write(&EXT4_I(inode)->i_data_sem);
if (ret)
goto out;
- kfree(path);
- path = ext4_find_extent(inode, start, NULL, 0);
+
+ path = ext4_find_extent(inode, start, &path, 0);
if (IS_ERR(path))
- return -1;
- ppath = path;
+ return PTR_ERR(path);
ex = path[path->p_depth].p_ext;
WARN_ON(le32_to_cpu(ex->ee_block) != start);
+
if (ext4_ext_get_actual_len(ex) != len) {
down_write(&EXT4_I(inode)->i_data_sem);
- ret = ext4_force_split_extent_at(NULL, inode, &ppath,
+ ret = ext4_force_split_extent_at(NULL, inode, &path,
start + len, 1);
up_write(&EXT4_I(inode)->i_data_sem);
if (ret)
goto out;
- kfree(path);
- path = ext4_find_extent(inode, start, NULL, 0);
+
+ path = ext4_find_extent(inode, start, &path, 0);
if (IS_ERR(path))
- return -EINVAL;
+ return PTR_ERR(path);
ex = path[path->p_depth].p_ext;
}
}
--
2.39.2


Baokun Li

unread,
Jul 25, 2024, 1:42:07 AM (5 days ago) Jul 25
to Jan Kara, linux...@vger.kernel.org, ty...@mit.edu, adilger...@dilger.ca, rites...@gmail.com, linux-...@vger.kernel.org, yi.z...@huawei.com, yang...@huawei.com, Baokun Li, Baokun Li
On 2024/7/25 3:16, Jan Kara wrote:
> On Wed 10-07-24 12:06:40, liba...@huaweicloud.com wrote:
>> From: Baokun Li <liba...@huawei.com>
>>
>> In ext4_find_extent(), path may be freed by error or be reallocated, so
>> using a previously saved *ppath may have been freed and thus may trigger
>> use-after-free, as follows:
>>
>> ext4_split_extent
>> path = *ppath;
>> ext4_split_extent_at(ppath)
>> path = ext4_find_extent(ppath)
>> ext4_split_extent_at(ppath)
>> // ext4_find_extent fails to free path
>> // but zeroout succeeds
>> ext4_ext_show_leaf(inode, path)
>> eh = path[depth].p_hdr
>> // path use-after-free !!!
>>
>> Similar to ext4_split_extent_at(), we use *ppath directly as an input to
>> ext4_ext_show_leaf(). Fix a spelling error by the way.
>>
>> Same problem in ext4_ext_handle_unwritten_extents(). Since 'path' is only
>> used in ext4_ext_show_leaf(), remove 'path' and use *ppath directly.
>>
>> This issue is triggered only when EXT_DEBUG is defined and therefore does
>> not affect functionality.
>>
>> Signed-off-by: Baokun Li <liba...@huawei.com>
> Looks good. Feel free to add:
>
> Reviewed-by: Jan Kara <ja...@suse.cz>
>
> I'd just note that this shows that modifying ppath in the called function
> was not a great idea as it makes possible use-after-free issues due to
> cached values being used very hard to spot and very easy to introduce...
>
> Honza

Hi Honza,

Thank you very much for your review!

Yes, it was too confusing, which is why I dropped all ppaths after fixing
the problem. Judging and using the returned path every time would
make the code look a lot simpler.

Thanks,
Baokun
--
With Best Regards,
Baokun Li


Ojaswin Mujoo

unread,
Jul 25, 2024, 5:03:16 AM (5 days ago) Jul 25
to liba...@huaweicloud.com, linux...@vger.kernel.org, ty...@mit.edu, adilger...@dilger.ca, ja...@suse.cz, rites...@gmail.com, linux-...@vger.kernel.org, yi.z...@huawei.com, yang...@huawei.com, Baokun Li
Tested kvm-xfstests auto on Powerpc with 4k, 64k and adv and no
new regressions are seen.

Feel free to add:

Tested-by: Ojaswin Mujoo <oja...@linux.ibm.com>

Regards,
Ojaswin

Ojaswin Mujoo

unread,
Jul 26, 2024, 7:46:20 AM (4 days ago) Jul 26
to liba...@huaweicloud.com, linux...@vger.kernel.org, ty...@mit.edu, adilger...@dilger.ca, ja...@suse.cz, rites...@gmail.com, linux-...@vger.kernel.org, yi.z...@huawei.com, yang...@huawei.com, Baokun Li, sta...@kernel.org
Hi Baokun,

If i'm not wrong, in this trace, we'll enter ext4_ext_insert_extent() with
gb_flags having EXT4_GET_BLOCKS_PRE_IO so we won't actually go for a
merge_up.

That being said, there seems to be a few places where we follow the
split-insert pattern and it might be possible that one of those call
sites might not be passing EXT4_GET_BLOCKS_PRE_IO and we'll the double
free issue you mentioned. I'll check and update if I see anything.

On a separate note, isn't it a bit weird that we grow the tree indepth
(which includes allocation, marking buffer dirty etc) only to later
merge it up again and throwing all the changes we did while growing the
tree. Surely we could optimize this particular case somehow right?
Anyways, I agree that adding this here is the right thing to do:

Reviewed-by: Ojaswin Mujoo <oja...@linux.ibm.com>

Thanks,
Ojaswin

Ojaswin Mujoo

unread,
Jul 26, 2024, 7:56:12 AM (4 days ago) Jul 26
to liba...@huaweicloud.com, linux...@vger.kernel.org, ty...@mit.edu, adilger...@dilger.ca, ja...@suse.cz, rites...@gmail.com, linux-...@vger.kernel.org, yi.z...@huawei.com, yang...@huawei.com, Baokun Li
On Wed, Jul 10, 2024 at 12:06:38PM +0800, liba...@huaweicloud.com wrote:
> From: Baokun Li <liba...@huawei.com>
>
> Add ext4_ext_path_brelse() helper function to reduce duplicate code
> and ensure that path->p_bh is set to NULL after it is released.
>
> Signed-off-by: Baokun Li <liba...@huawei.com>

Hi Baokun,

The helper looks good, feel free to add:

Reviewed-by: Ojaswin Mujoo <oja...@linux.ibm.com>

Baokun Li

unread,
Jul 27, 2024, 2:19:02 AM (3 days ago) Jul 27
to Jan Kara, linux...@vger.kernel.org, ty...@mit.edu, adilger...@dilger.ca, rites...@gmail.com, linux-...@vger.kernel.org, yi.z...@huawei.com, yang...@huawei.com, Baokun Li, Baokun Li
On 2024/7/25 18:38, Jan Kara wrote:
> On Wed 10-07-24 12:06:42, liba...@huaweicloud.com wrote:
>> From: Baokun Li <liba...@huawei.com>
>>
>> The use of path and ppath is now very confusing, so to make the code more
>> readable, pass path between functions uniformly, and get rid of ppath.
>>
>> Getting rid of ppath in ext4_find_extent() requires its caller to update
>> ppath. These ppaths will also be dropped later. No functional changes.
>>
>> Signed-off-by: Baokun Li <liba...@huawei.com>
> One nit below, otherwise feel free to add:
>
> Reviewed-by: Jan Kara <ja...@suse.cz>
>
>> @@ -3260,11 +3257,12 @@ static int ext4_split_extent_at(handle_t *handle,
>> * WARN_ON may be triggered in ext4_da_update_reserve_space() due to
>> * an incorrect ee_len causing the i_reserved_data_blocks exception.
>> */
>> - path = ext4_find_extent(inode, ee_block, ppath,
>> + path = ext4_find_extent(inode, ee_block, *ppath,
>> flags | EXT4_EX_NOFAIL);
>> if (IS_ERR(path)) {
>> EXT4_ERROR_INODE(inode, "Failed split extent on %u, err %ld",
>> split, PTR_ERR(path));
>> + *ppath = NULL;
>> return PTR_ERR(path);
>> }
> I think here you forgot to update ppath on success case. It will go away by
> the end of the series but still it's good to keep thing consistent...
>
> Honza

Hi Honza,

Thank you for your review!

In patch 5, the ppath is already updated in case of success, so there
is no need to add it here. This update was not shown when the patch
was made and it looks like this:

-       path = ext4_find_extent(inode, ee_block, ppath,
+       path = ext4_find_extent(inode, ee_block, *ppath,
                                flags | EXT4_EX_NOFAIL);
        if (IS_ERR(path)) {
                EXT4_ERROR_INODE(inode, "Failed split extent on %u, err
%ld",
                                 split, PTR_ERR(path));
+               *ppath = NULL;
                return PTR_ERR(path);
        }
        depth = ext_depth(inode);
        ex = path[depth].p_ext;
        *ppath = path;

Baokun Li

unread,
Jul 27, 2024, 2:35:43 AM (3 days ago) Jul 27
to Jan Kara, linux...@vger.kernel.org, ty...@mit.edu, adilger...@dilger.ca, rites...@gmail.com, linux-...@vger.kernel.org, yi.z...@huawei.com, yang...@huawei.com, Baokun Li, Baokun Li
On 2024/7/25 18:46, Jan Kara wrote:
> On Wed 10-07-24 12:06:44, liba...@huaweicloud.com wrote:
>> From: Baokun Li <liba...@huawei.com>
>>
>> The use of path and ppath is now very confusing, so to make the code more
>> readable, pass path between functions uniformly, and get rid of ppath.
>>
>> To get rid of the ppath in ext4_ext_create_new_leaf(), the following is
>> done here:
>>
>> * Free the extents path when an error is encountered.
>> * Its caller needs to update ppath if it uses ppath.
>>
>> No functional changes.
>>
>> Signed-off-by: Baokun Li <liba...@huawei.com>
> Just one nit below. Otherwise feel free to add:
>
> Reviewed-by: Jan Kara <ja...@suse.cz>
>
>> @@ -1424,28 +1423,24 @@ static int ext4_ext_create_new_leaf(handle_t *handle, struct inode *inode,
>> * entry: create all needed subtree and add new leaf */
>> err = ext4_ext_split(handle, inode, mb_flags, path, newext, i);
>> if (err)
>> - goto out;
>> + goto errout;
>>
>> /* refill path */
>> path = ext4_find_extent(inode,
>> (ext4_lblk_t)le32_to_cpu(newext->ee_block),
>> path, gb_flags);
>> - if (IS_ERR(path))
>> - err = PTR_ERR(path);
> So I'd rather have here:
> return path;
>
> That way it's obvious we will not confuse some code below with error stored
> in 'path' and we can also save one indentation level by removing 'else'
> below (probably do reindenting in a separate patch).
>
> Honza
Yes, that looks clearer! I'll add 'return path' in the next version, and
add a cleanup patch after this one to save indentation.

Thanks,
Baokun

Ojaswin Mujoo

unread,
Jul 27, 2024, 6:37:40 AM (3 days ago) Jul 27
to liba...@huaweicloud.com, linux...@vger.kernel.org, ty...@mit.edu, adilger...@dilger.ca, ja...@suse.cz, rites...@gmail.com, linux-...@vger.kernel.org, yi.z...@huawei.com, yang...@huawei.com, Baokun Li, sta...@kernel.org
Hi Baokun, nice catch!

I was just wondering if it makes more sense to only update path if we
encounter an error:

err = ext4_ext_insert_extent(handle, inode, ppath, &newex, flags);
if (err != -ENOSPC && err != -EDQUOT && err != -ENOMEM)
goto out;
else if (err < 0) {
...
path = ext4_find_extent(inode, ee_block, ppath, flags | EXT4_EX_NOFAIL);
}

I believe that's the only time we'd end up with the use after free issue
and this way we can avoid the (maybe tiny) performance overhead of calling
the extra find extent. What are your thoughts?

regards,
ojaswin

Ojaswin Mujoo

unread,
Jul 27, 2024, 6:44:10 AM (3 days ago) Jul 27
to liba...@huaweicloud.com, linux...@vger.kernel.org, ty...@mit.edu, adilger...@dilger.ca, ja...@suse.cz, rites...@gmail.com, linux-...@vger.kernel.org, yi.z...@huawei.com, yang...@huawei.com, Baokun Li
On Wed, Jul 10, 2024 at 12:06:40PM +0800, liba...@huaweicloud.com wrote:
> From: Baokun Li <liba...@huawei.com>
>
> In ext4_find_extent(), path may be freed by error or be reallocated, so
> using a previously saved *ppath may have been freed and thus may trigger
> use-after-free, as follows:
>
> ext4_split_extent
> path = *ppath;
> ext4_split_extent_at(ppath)
> path = ext4_find_extent(ppath)
> ext4_split_extent_at(ppath)
> // ext4_find_extent fails to free path
> // but zeroout succeeds
> ext4_ext_show_leaf(inode, path)
> eh = path[depth].p_hdr
> // path use-after-free !!!
>
> Similar to ext4_split_extent_at(), we use *ppath directly as an input to
> ext4_ext_show_leaf(). Fix a spelling error by the way.
>
> Same problem in ext4_ext_handle_unwritten_extents(). Since 'path' is only
> used in ext4_ext_show_leaf(), remove 'path' and use *ppath directly.
>
> This issue is triggered only when EXT_DEBUG is defined and therefore does
> not affect functionality.
>
> Signed-off-by: Baokun Li <liba...@huawei.com>

Looks good, feel free to add:

Reviewed-by: Ojaswin Mujoo <oja...@linux.ibm.com>

Regards,
ojaswin

Ojaswin Mujoo

unread,
Jul 27, 2024, 7:29:12 AM (3 days ago) Jul 27
to liba...@huaweicloud.com, linux...@vger.kernel.org, ty...@mit.edu, adilger...@dilger.ca, ja...@suse.cz, rites...@gmail.com, linux-...@vger.kernel.org, yi.z...@huawei.com, yang...@huawei.com, Baokun Li, sta...@kernel.org
On Wed, Jul 10, 2024 at 12:06:41PM +0800, liba...@huaweicloud.com wrote:
> From: Baokun Li <liba...@huawei.com>
>
> When calling ext4_force_split_extent_at() in ext4_ext_replay_update_ex(),
> the 'ppath' is updated but it is the 'path' that is freed, thus potentially
> triggering a double-free in the following process:
>
> ext4_ext_replay_update_ex
> ppath = path
> ext4_force_split_extent_at(&ppath)
> ext4_split_extent_at
> ext4_ext_insert_extent
> ext4_ext_create_new_leaf
> ext4_ext_grow_indepth
> ext4_find_extent
> if (depth > path[0].p_maxdepth)
> kfree(path) ---> path First freed
> *orig_path = path = NULL ---> null ppath
> kfree(path) ---> path double-free !!!
>
> So drop the unnecessary ppath and use path directly to avoid this problem.
> And use ext4_find_extent() directly to update path, avoiding unnecessary
> memory allocation and freeing. Also, propagate the error returned by
> ext4_find_extent() instead of using strange error codes.
>
> Fixes: 8016e29f4362 ("ext4: fast commit recovery path")
> Cc: sta...@kernel.org
> Signed-off-by: Baokun Li <liba...@huawei.com>

Hey Baokun,
These ppath bugs are really subtle indeed :)

Anyways, feel free to add:

Reviewed-by: Ojaswin Mujoo <oja...@linux.ibm.com>

Baokun Li

unread,
4:53 AM (4 hours ago) 4:53 AM
to Ojaswin Mujoo, linux...@vger.kernel.org, ty...@mit.edu, adilger...@dilger.ca, ja...@suse.cz, rites...@gmail.com, linux-...@vger.kernel.org, yi.z...@huawei.com, yang...@huawei.com, Baokun Li, sta...@kernel.org, Baokun Li
Hi Ojaswin,

You're right. I am very sorry for the confusion.

The trace here is wrong, this patch should actually be placed after the two
UAF patches. Here ext4_ext_try_to_merge() is called when trying zeroout in
ext4_split_extent_at(). It is called when trying zeroout with or without
EXT4_GET_BLOCKS_PRE_IO.The correct trace is as follows:

  split2    map    split1
|--------|-------|--------|

ext4_ext_map_blocks
 ext4_ext_handle_unwritten_extents
  ext4_split_convert_extents
   // path->p_depth == 0
   ext4_split_extent
     // 1. do split1
     ext4_split_extent_at
       |ext4_ext_insert_extent
       |  ext4_ext_create_new_leaf
       |    ext4_ext_grow_indepth
       |      le16_add_cpu(&neh->eh_depth, 1)
       |    ext4_find_extent
       |      // return -ENOMEM
       |// get error and try zeroout
       |path = ext4_find_extent
       |  path->p_depth = 1
       |ext4_ext_try_to_merge
       |  ext4_ext_try_to_merge_up
       |    path->p_depth = 0
       |    brelse(path[1].p_bh)  ---> not set to NULL here
       |// zeroout success
     // 2. update path
     ext4_find_extent
     // 3. do split2
     ext4_split_extent_at
       ext4_ext_insert_extent
         ext4_ext_create_new_leaf
           ext4_ext_grow_indepth
             le16_add_cpu(&neh->eh_depth, 1)
           ext4_find_extent
             path[0].p_bh = NULL;
             path->p_depth = 1
             read_extent_tree_block  ---> return err
             // path[1].p_bh is still the old value
 ext4_free_ext_path
  ext4_ext_drop_refs
   // path->p_depth == 1
   brelse(path[1].p_bh)  ---> brelse a buffer twice

I'll adjust the order of the patches and correct the trace in the next
version.
> On a separate note, isn't it a bit weird that we grow the tree indepth
> (which includes allocation, marking buffer dirty etc) only to later
> merge it up again and throwing all the changes we did while growing the
> tree. Surely we could optimize this particular case somehow right?
Sorry that my trace misled you. It seems reasonable to try to merge extent
in error handling.
Thanks for the review!
>> ext4_free_blocks(handle, inode, NULL, blk, 1,
>> EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET);
>> }
>> --
>> 2.39.2
>>

Baokun Li

unread,
4:58 AM (3 hours ago) 4:58 AM
to Ojaswin Mujoo, linux...@vger.kernel.org, ty...@mit.edu, adilger...@dilger.ca, ja...@suse.cz, rites...@gmail.com, linux-...@vger.kernel.org, yi.z...@huawei.com, yang...@huawei.com, Baokun Li, sta...@kernel.org, Baokun Li
Hi Ojaswin,

We're already there now, updating the path only on recoverable,
non-critical errors (-ENOSPC, -EDQUOT, and -ENOMEM).

Thanks,
Baokun
>> if (EXT4_EXT_MAY_ZEROOUT & split_flag) {
>> if (split_flag & (EXT4_EXT_DATA_VALID1|EXT4_EXT_DATA_VALID2)) {
>> if (split_flag & EXT4_EXT_DATA_VALID1) {
>> @@ -3304,7 +3323,7 @@ static int ext4_split_extent_at(handle_t *handle,
>> ext4_ext_dirty(handle, inode, path + path->p_depth);
>> return err;
>> out:
>> - ext4_ext_show_leaf(inode, path);
>> + ext4_ext_show_leaf(inode, *ppath);
>> return err;
>> }
>>
>> --
>> 2.39.2
>>

Ojaswin Mujoo

unread,
5:57 AM (3 hours ago) 5:57 AM
to Baokun Li, linux...@vger.kernel.org, ty...@mit.edu, adilger...@dilger.ca, ja...@suse.cz, rites...@gmail.com, linux-...@vger.kernel.org, yi.z...@huawei.com, yang...@huawei.com, Baokun Li, sta...@kernel.org
Hey Baokun, yes you are right I somehow misread the code:

Feel free to add:

Reviewed-by: Ojaswin Mujoo <oja...@linux.ibm.com>

Regards,
ojaswin

Ojaswin Mujoo

unread,
6:28 AM (2 hours ago) 6:28 AM
to liba...@huaweicloud.com, linux...@vger.kernel.org, ty...@mit.edu, adilger...@dilger.ca, ja...@suse.cz, rites...@gmail.com, linux-...@vger.kernel.org, yi.z...@huawei.com, yang...@huawei.com, Baokun Li
Hi Baokun,

The changes look good to me, afaict we've handled updating ppath in all
the places correctly.

Feel free to add:

Reviewed-by: Ojaswin Mujoo <oja...@linux.ibm.com>

Regards,
ojaswin
> --
> 2.39.2

>

Reply all
Reply to author
Forward
0 new messages