[PATCH 0/2] Do not change split folio target order


Zi Yan

Oct 10, 2025, 1:39:30 PM
to linm...@huawei.com, da...@redhat.com, jane...@oracle.com, ker...@pankajraghav.com, syzbot+e6367e...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, z...@nvidia.com, ak...@linux-foundation.org, mcg...@kernel.org, nao.ho...@gmail.com, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Matthew Wilcox (Oracle), linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org
Hi all,

Currently, the huge page and large folio split APIs silently bump the
target order when the folio has min_order_for_split() > 0, and return
success if the split succeeds. Callers that expect the after-split folios
to be order-0 then get non-order-0 folios they might not be able to
handle, since they called the split APIs precisely to get order-0 folios.
This issue appears in a recent report on memory_failure()[1], where
memory_failure() used split_huge_page() to split a large folio to
order-0 but, after a successful split, got non-order-0 folios. Because
memory_failure() can only handle order-0 folios, this caused a WARNING.

Fix the issue by not changing split target order and failing the
split if min_order_for_split() is greater than the target order.
In addition, to avoid wasting memory in memory failure handling, a second
patch is added to always split a large folio to min_order_for_split()
even if it is not 0, so that folios not containing the poisoned page can
be freed for reuse. For soft offline, since the folio is still accessible,
do not split if min_order_for_split() is not zero to avoid potential
performance loss.


[1] https://lore.kernel.org/all/68d2c943.a70a022...@google.com/

Zi Yan (2):
mm/huge_memory: do not change split_huge_page*() target order
silently.
mm/memory-failure: improve large block size folio handling.

include/linux/huge_mm.h | 28 +++++-----------------------
mm/huge_memory.c | 9 +--------
mm/memory-failure.c | 25 +++++++++++++++++++++----
mm/truncate.c | 6 ++++--
4 files changed, 31 insertions(+), 37 deletions(-)

--
2.51.0

Zi Yan

Oct 10, 2025, 1:39:32 PM
to linm...@huawei.com, da...@redhat.com, jane...@oracle.com, ker...@pankajraghav.com, syzbot+e6367e...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, z...@nvidia.com, ak...@linux-foundation.org, mcg...@kernel.org, nao.ho...@gmail.com, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Matthew Wilcox (Oracle), linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org
Page cache folios from a file system that supports large block size (LBS)
can have a minimal folio order greater than 0, so a high-order folio might
not be splittable down to order-0. Commit e220917fa507 ("mm: split a
folio in minimum folio order chunks") bumps the target order of
split_huge_page*() to the minimum allowed order when splitting an LBS
folio. This confuses some split_huge_page*() callers such as the memory
failure handling code, which expect all after-split folios to be order-0
when the split succeeds but in reality get folios of
min_order_for_split() order.

Fix it by failing a split if the folio cannot be split to the target order.

Fixes: e220917fa507 ("mm: split a folio in minimum folio order chunks")
[The test poisons LBS folios, which cannot be split to order-0 folios, and
also tries to poison all memory. The unsplit LBS folios take more memory
than the test anticipated, leading to OOM. The patch fixes the kernel
warning; the test needs some changes to avoid the OOM.]
Reported-by: syzbot+e6367e...@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68d2c943.a70a022...@google.com/
Signed-off-by: Zi Yan <z...@nvidia.com>
---
include/linux/huge_mm.h | 28 +++++-----------------------
mm/huge_memory.c | 9 +--------
mm/truncate.c | 6 ++++--
3 files changed, 10 insertions(+), 33 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 8eec7a2a977b..9950cda1526a 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -394,34 +394,16 @@ static inline int split_huge_page_to_list_to_order(struct page *page, struct lis
* Return: 0: split is successful, otherwise split failed.
*/
static inline int try_folio_split(struct folio *folio, struct page *page,
- struct list_head *list)
+ struct list_head *list, unsigned int order)
{
- int ret = min_order_for_split(folio);
-
- if (ret < 0)
- return ret;
-
- if (!non_uniform_split_supported(folio, 0, false))
+ if (!non_uniform_split_supported(folio, order, false))
return split_huge_page_to_list_to_order(&folio->page, list,
- ret);
- return folio_split(folio, ret, page, list);
+ order);
+ return folio_split(folio, order, page, list);
}
static inline int split_huge_page(struct page *page)
{
- struct folio *folio = page_folio(page);
- int ret = min_order_for_split(folio);
-
- if (ret < 0)
- return ret;
-
- /*
- * split_huge_page() locks the page before splitting and
- * expects the same page that has been split to be locked when
- * returned. split_folio(page_folio(page)) cannot be used here
- * because it converts the page to folio and passes the head
- * page to be split.
- */
- return split_huge_page_to_list_to_order(page, NULL, ret);
+ return split_huge_page_to_list_to_order(page, NULL, 0);
}
void deferred_split_folio(struct folio *folio, bool partially_mapped);

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0fb4af604657..af06ee6d2206 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3829,8 +3829,6 @@ static int __folio_split(struct folio *folio, unsigned int new_order,

min_order = mapping_min_folio_order(folio->mapping);
if (new_order < min_order) {
- VM_WARN_ONCE(1, "Cannot split mapped folio below min-order: %u",
- min_order);
ret = -EINVAL;
goto out;
}
@@ -4173,12 +4171,7 @@ int min_order_for_split(struct folio *folio)

int split_folio_to_list(struct folio *folio, struct list_head *list)
{
- int ret = min_order_for_split(folio);
-
- if (ret < 0)
- return ret;
-
- return split_huge_page_to_list_to_order(&folio->page, list, ret);
+ return split_huge_page_to_list_to_order(&folio->page, list, 0);
}

/*
diff --git a/mm/truncate.c b/mm/truncate.c
index 91eb92a5ce4f..1c15149ae8e9 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -194,6 +194,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
size_t size = folio_size(folio);
unsigned int offset, length;
struct page *split_at, *split_at2;
+ unsigned int min_order;

if (pos < start)
offset = start - pos;
@@ -223,8 +224,9 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
if (!folio_test_large(folio))
return true;

+ min_order = mapping_min_folio_order(folio->mapping);
split_at = folio_page(folio, PAGE_ALIGN_DOWN(offset) / PAGE_SIZE);
- if (!try_folio_split(folio, split_at, NULL)) {
+ if (!try_folio_split(folio, split_at, NULL, min_order)) {
/*
* try to split at offset + length to make sure folios within
* the range can be dropped, especially to avoid memory waste
@@ -254,7 +256,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
*/
if (folio_test_large(folio2) &&
folio2->mapping == folio->mapping)
- try_folio_split(folio2, split_at2, NULL);
+ try_folio_split(folio2, split_at2, NULL, min_order);

folio_unlock(folio2);
out:
--
2.51.0

Zi Yan

Oct 10, 2025, 1:39:33 PM
to linm...@huawei.com, da...@redhat.com, jane...@oracle.com, ker...@pankajraghav.com, syzbot+e6367e...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, z...@nvidia.com, ak...@linux-foundation.org, mcg...@kernel.org, nao.ho...@gmail.com, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Matthew Wilcox (Oracle), linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org
Large block size (LBS) folios cannot be split to order-0; they can only
be split down to min_order_for_split(). Currently such a split fails
outright, which is not optimal. Split the folio to min_order_for_split()
instead, so that after the split only the folio containing the poisoned
page becomes unusable.

For soft offline, do not split the large folio if it cannot be split to
order-0, since the folio is still accessible from userspace and a
premature split might cause a performance loss.

Suggested-by: Jane Chu <jane...@oracle.com>
Signed-off-by: Zi Yan <z...@nvidia.com>
---
mm/memory-failure.c | 25 +++++++++++++++++++++----
1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index f698df156bf8..443df9581c24 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1656,12 +1656,13 @@ static int identify_page_state(unsigned long pfn, struct page *p,
* there is still more to do, hence the page refcount we took earlier
* is still needed.
*/
-static int try_to_split_thp_page(struct page *page, bool release)
+static int try_to_split_thp_page(struct page *page, unsigned int new_order,
+ bool release)
{
int ret;

lock_page(page);
- ret = split_huge_page(page);
+ ret = split_huge_page_to_list_to_order(page, NULL, new_order);
unlock_page(page);

if (ret && release)
@@ -2280,6 +2281,7 @@ int memory_failure(unsigned long pfn, int flags)
folio_unlock(folio);

if (folio_test_large(folio)) {
+ int new_order = min_order_for_split(folio);
/*
* The flag must be set after the refcount is bumped
* otherwise it may race with THP split.
@@ -2294,7 +2296,14 @@ int memory_failure(unsigned long pfn, int flags)
* page is a valid handlable page.
*/
folio_set_has_hwpoisoned(folio);
- if (try_to_split_thp_page(p, false) < 0) {
+ /*
+ * If the folio cannot be split to order-0, kill the process,
+ * but split the folio anyway to minimize the amount of unusable
+ * pages.
+ */
+ if (try_to_split_thp_page(p, new_order, false) || new_order) {
+ /* get folio again in case the original one is split */
+ folio = page_folio(p);
res = -EHWPOISON;
kill_procs_now(p, pfn, flags, folio);
put_page(p);
@@ -2621,7 +2630,15 @@ static int soft_offline_in_use_page(struct page *page)
};

if (!huge && folio_test_large(folio)) {
- if (try_to_split_thp_page(page, true)) {
+ int new_order = min_order_for_split(folio);
+
+ /*
+ * If the folio cannot be split to order-0, do not split it at
+ * all to retain the still accessible large folio.
+ * NOTE: if getting free memory is preferred, split it like it
+ * is done in memory_failure().
+ */
+ if (new_order || try_to_split_thp_page(page, new_order, true)) {
pr_info("%#lx: thp split failed\n", pfn);
return -EBUSY;
}
--
2.51.0

Luis Chamberlain

Oct 10, 2025, 2:02:23 PM
to Zi Yan, linm...@huawei.com, da...@redhat.com, jane...@oracle.com, ker...@pankajraghav.com, syzbot+e6367e...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, ak...@linux-foundation.org, nao.ho...@gmail.com, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Matthew Wilcox (Oracle), linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org
On Fri, Oct 10, 2025 at 01:39:05PM -0400, Zi Yan wrote:
> Page cache folios from a file system that support large block size (LBS)
> can have minimal folio order greater than 0, thus a high order folio might
> not be able to be split down to order-0. Commit e220917fa507 ("mm: split a
> folio in minimum folio order chunks") bumps the target order of
> split_huge_page*() to the minimum allowed order when splitting a LBS folio.
> This causes confusion for some split_huge_page*() callers like memory
> failure handling code, since they expect after-split folios all have
> order-0 when split succeeds but in really get min_order_for_split() order
> folios.
>
> Fix it by failing a split if the folio cannot be split to the target order.
>
> Fixes: e220917fa507 ("mm: split a folio in minimum folio order chunks")
> [The test poisons LBS folios, which cannot be split to order-0 folios, and
> also tries to poison all memory. The non split LBS folios take more memory
> than the test anticipated, leading to OOM. The patch fixed the kernel
> warning and the test needs some change to avoid OOM.]
> Reported-by: syzbot+e6367e...@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/68d2c943.a70a022...@google.com/
> Signed-off-by: Zi Yan <z...@nvidia.com>

Reviewed-by: Luis Chamberlain <mcg...@kernel.org>

Luis

Luis Chamberlain

Oct 10, 2025, 2:05:42 PM
to Zi Yan, linm...@huawei.com, da...@redhat.com, jane...@oracle.com, ker...@pankajraghav.com, syzbot+e6367e...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, ak...@linux-foundation.org, nao.ho...@gmail.com, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Matthew Wilcox (Oracle), linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org
On Fri, Oct 10, 2025 at 01:39:06PM -0400, Zi Yan wrote:
> Large block size (LBS) folios cannot be split to order-0 folios but
> min_order_for_folio(). Current split fails directly, but that is not
> optimal. Split the folio to min_order_for_folio(), so that, after split,
> only the folio containing the poisoned page becomes unusable instead.
>
> For soft offline, do not split the large folio if it cannot be split to
> order-0. Since the folio is still accessible from userspace and premature
> split might lead to potential performance loss.
>
> Suggested-by: Jane Chu <jane...@oracle.com>
> Signed-off-by: Zi Yan <z...@nvidia.com>

Lance Yang

Oct 10, 2025, 10:26:09 PM
to Zi Yan, ak...@linux-foundation.org, syzkall...@googlegroups.com, mcg...@kernel.org, nao.ho...@gmail.com, Lorenzo Stoakes, ker...@pankajraghav.com, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, jane...@oracle.com, Dev Jain, Barry Song, Matthew Wilcox (Oracle), linux-...@vger.kernel.org, da...@redhat.com, linux-...@vger.kernel.org, linu...@kvack.org, linm...@huawei.com, syzbot+e6367e...@syzkaller.appspotmail.com
Seems like we need to add the order parameter to the stub for
try_folio_split() as well?

#ifdef CONFIG_TRANSPARENT_HUGEPAGE

...

#else /* CONFIG_TRANSPARENT_HUGEPAGE */

static inline int try_folio_split(struct folio *folio, struct page *page,
struct list_head *list)
{
VM_WARN_ON_ONCE_FOLIO(1, folio);
return -EINVAL;
}

#endif /* CONFIG_TRANSPARENT_HUGEPAGE */

Cheers,
Lance

Miaohe Lin

Oct 11, 2025, 12:12:21 AM
to Zi Yan, ak...@linux-foundation.org, mcg...@kernel.org, nao.ho...@gmail.com, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Matthew Wilcox (Oracle), linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, da...@redhat.com, jane...@oracle.com, ker...@pankajraghav.com, syzbot+e6367e...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
On 2025/10/11 1:39, Zi Yan wrote:
> Large block size (LBS) folios cannot be split to order-0 folios but
> min_order_for_folio(). Current split fails directly, but that is not
> optimal. Split the folio to min_order_for_folio(), so that, after split,
> only the folio containing the poisoned page becomes unusable instead.
>
> For soft offline, do not split the large folio if it cannot be split to
> order-0. Since the folio is still accessible from userspace and premature
> split might lead to potential performance loss.

Thanks for your patch.
If the original folio A is split and an after-split folio is B (A != B),
will the refcount of folio A held above be missing? I.e.,
get_hwpoison_page() took an extra refcount on folio A, but we put the
refcount of folio B below. Is this a problem, or am I missing something?

Thanks.
.

Matthew Wilcox

Oct 11, 2025, 1:00:32 AM
to Miaohe Lin, Zi Yan, ak...@linux-foundation.org, mcg...@kernel.org, nao.ho...@gmail.com, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, da...@redhat.com, jane...@oracle.com, ker...@pankajraghav.com, syzbot+e6367e...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
On Sat, Oct 11, 2025 at 12:12:12PM +0800, Miaohe Lin wrote:
> > folio_set_has_hwpoisoned(folio);
> > - if (try_to_split_thp_page(p, false) < 0) {
> > + /*
> > + * If the folio cannot be split to order-0, kill the process,
> > + * but split the folio anyway to minimize the amount of unusable
> > + * pages.
> > + */
> > + if (try_to_split_thp_page(p, new_order, false) || new_order) {
> > + /* get folio again in case the original one is split */
> > + folio = page_folio(p);
>
> If original folio A is split and the after-split new folio is B (A != B), will the
> refcnt of folio A held above be missing? I.e. get_hwpoison_page() held the extra refcnt
> of folio A, but we put the refcnt of folio B below. Is this a problem or am I miss
> something?

That's how split works.

Zi Yan, the kernel-doc for folio_split() could use some attention.
First, it's not kernel-doc; the comment opens with /* instead of /**.
Second, it says:

* After split, folio is left locked for caller.

which isn't actually true, right? The folio which contains
@split_at will be locked. Also, it will contain the additional
reference which was taken on @folio by the caller.

kernel test robot

Oct 11, 2025, 5:00:43 AM
to Zi Yan, linm...@huawei.com, da...@redhat.com, jane...@oracle.com, ker...@pankajraghav.com, syzbot+e6367e...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, oe-kbu...@lists.linux.dev, z...@nvidia.com, ak...@linux-foundation.org, mcg...@kernel.org, nao.ho...@gmail.com, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Matthew Wilcox (Oracle), linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org
Hi Zi,

kernel test robot noticed the following build errors:

[auto build test ERROR on linus/master]
[also build test ERROR on v6.17 next-20251010]
[cannot apply to akpm-mm/mm-everything]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Zi-Yan/mm-huge_memory-do-not-change-split_huge_page-target-order-silently/20251011-014145
base: linus/master
patch link: https://lore.kernel.org/r/20251010173906.3128789-2-ziy%40nvidia.com
patch subject: [PATCH 1/2] mm/huge_memory: do not change split_huge_page*() target order silently.
config: parisc-allnoconfig (https://download.01.org/0day-ci/archive/20251011/202510111633...@intel.com/config)
compiler: hppa-linux-gcc (GCC) 15.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251011/202510111633...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <l...@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202510111633...@intel.com/

All errors (new ones prefixed by >>):

mm/truncate.c: In function 'truncate_inode_partial_folio':
>> mm/truncate.c:229:14: error: too many arguments to function 'try_folio_split'; expected 3, have 4
229 | if (!try_folio_split(folio, split_at, NULL, min_order)) {
| ^~~~~~~~~~~~~~~ ~~~~~~~~~
In file included from include/linux/mm.h:1081,
from arch/parisc/include/asm/cacheflush.h:5,
from include/linux/cacheflush.h:5,
from include/linux/highmem.h:8,
from include/linux/bvec.h:10,
from include/linux/blk_types.h:10,
from include/linux/writeback.h:13,
from include/linux/backing-dev.h:16,
from mm/truncate.c:12:
include/linux/huge_mm.h:588:19: note: declared here
588 | static inline int try_folio_split(struct folio *folio, struct page *page,
| ^~~~~~~~~~~~~~~
mm/truncate.c:259:25: error: too many arguments to function 'try_folio_split'; expected 3, have 4
259 | try_folio_split(folio2, split_at2, NULL, min_order);
| ^~~~~~~~~~~~~~~ ~~~~~~~~~
include/linux/huge_mm.h:588:19: note: declared here
588 | static inline int try_folio_split(struct folio *folio, struct page *page,
| ^~~~~~~~~~~~~~~


vim +/try_folio_split +229 mm/truncate.c

179
180 /*
181 * Handle partial folios. The folio may be entirely within the
182 * range if a split has raced with us. If not, we zero the part of the
183 * folio that's within the [start, end] range, and then split the folio if
184 * it's large. split_page_range() will discard pages which now lie beyond
185 * i_size, and we rely on the caller to discard pages which lie within a
186 * newly created hole.
187 *
188 * Returns false if splitting failed so the caller can avoid
189 * discarding the entire folio which is stubbornly unsplit.
190 */
191 bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
192 {
193 loff_t pos = folio_pos(folio);
194 size_t size = folio_size(folio);
195 unsigned int offset, length;
196 struct page *split_at, *split_at2;
197 unsigned int min_order;
198
199 if (pos < start)
200 offset = start - pos;
201 else
202 offset = 0;
203 if (pos + size <= (u64)end)
204 length = size - offset;
205 else
206 length = end + 1 - pos - offset;
207
208 folio_wait_writeback(folio);
209 if (length == size) {
210 truncate_inode_folio(folio->mapping, folio);
211 return true;
212 }
213
214 /*
215 * We may be zeroing pages we're about to discard, but it avoids
216 * doing a complex calculation here, and then doing the zeroing
217 * anyway if the page split fails.
218 */
219 if (!mapping_inaccessible(folio->mapping))
220 folio_zero_range(folio, offset, length);
221
222 if (folio_needs_release(folio))
223 folio_invalidate(folio, offset, length);
224 if (!folio_test_large(folio))
225 return true;
226
227 min_order = mapping_min_folio_order(folio->mapping);
228 split_at = folio_page(folio, PAGE_ALIGN_DOWN(offset) / PAGE_SIZE);
> 229 if (!try_folio_split(folio, split_at, NULL, min_order)) {
230 /*
231 * try to split at offset + length to make sure folios within
232 * the range can be dropped, especially to avoid memory waste
233 * for shmem truncate
234 */
235 struct folio *folio2;
236
237 if (offset + length == size)
238 goto no_split;
239
240 split_at2 = folio_page(folio,
241 PAGE_ALIGN_DOWN(offset + length) / PAGE_SIZE);
242 folio2 = page_folio(split_at2);
243
244 if (!folio_try_get(folio2))
245 goto no_split;
246
247 if (!folio_test_large(folio2))
248 goto out;
249
250 if (!folio_trylock(folio2))
251 goto out;
252
253 /*
254 * make sure folio2 is large and does not change its mapping.
255 * Its split result does not matter here.
256 */
257 if (folio_test_large(folio2) &&
258 folio2->mapping == folio->mapping)
259 try_folio_split(folio2, split_at2, NULL, min_order);
260
261 folio_unlock(folio2);
262 out:
263 folio_put(folio2);
264 no_split:
265 return true;
266 }
267 if (folio_test_dirty(folio))
268 return false;
269 truncate_inode_folio(folio->mapping, folio);
270 return true;
271 }
272

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

Miaohe Lin

Oct 11, 2025, 5:08:02 AM
to Matthew Wilcox, Zi Yan, ak...@linux-foundation.org, mcg...@kernel.org, nao.ho...@gmail.com, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, da...@redhat.com, jane...@oracle.com, ker...@pankajraghav.com, syzbot+e6367e...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
On 2025/10/11 13:00, Matthew Wilcox wrote:
> On Sat, Oct 11, 2025 at 12:12:12PM +0800, Miaohe Lin wrote:
>>> folio_set_has_hwpoisoned(folio);
>>> - if (try_to_split_thp_page(p, false) < 0) {
>>> + /*
>>> + * If the folio cannot be split to order-0, kill the process,
>>> + * but split the folio anyway to minimize the amount of unusable
>>> + * pages.
>>> + */
>>> + if (try_to_split_thp_page(p, new_order, false) || new_order) {
>>> + /* get folio again in case the original one is split */
>>> + folio = page_folio(p);
>>
>> If original folio A is split and the after-split new folio is B (A != B), will the
>> refcnt of folio A held above be missing? I.e. get_hwpoison_page() held the extra refcnt
>> of folio A, but we put the refcnt of folio B below. Is this a problem or am I miss
>> something?
>
> That's how split works.

I read the code and now see how split works. Thanks for pointing this out.

>
> Zi Yan, the kernel-doc for folio_split() could use some attention.

That would be really helpful.

Thanks.
.

> First, it's not kernel-doc; the comment opens with /* instead of /**.
> Second, it says:
>
> * After split, folio is left locked for caller.
>
> which isn't actually true, right? The folio which contains
> @split_at will be locked. Also, it will contain the additional
> reference which was taken on @folio by the caller.
>
> .
>

kernel test robot

Oct 11, 2025, 6:23:49 AM
to Zi Yan, linm...@huawei.com, da...@redhat.com, jane...@oracle.com, ker...@pankajraghav.com, syzbot+e6367e...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, oe-kbu...@lists.linux.dev, z...@nvidia.com, ak...@linux-foundation.org, mcg...@kernel.org, nao.ho...@gmail.com, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Matthew Wilcox (Oracle), linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org
Hi Zi,

kernel test robot noticed the following build errors:

[auto build test ERROR on linus/master]
[also build test ERROR on v6.17 next-20251010]
[cannot apply to akpm-mm/mm-everything]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Zi-Yan/mm-huge_memory-do-not-change-split_huge_page-target-order-silently/20251011-014145
base: linus/master
patch link: https://lore.kernel.org/r/20251010173906.3128789-3-ziy%40nvidia.com
patch subject: [PATCH 2/2] mm/memory-failure: improve large block size folio handling.
config: parisc-allmodconfig (https://download.01.org/0day-ci/archive/20251011/202510111805...@intel.com/config)
compiler: hppa-linux-gcc (GCC) 15.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251011/202510111805...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <l...@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202510111805...@intel.com/

All errors (new ones prefixed by >>):

mm/memory-failure.c: In function 'memory_failure':
>> mm/memory-failure.c:2278:33: error: implicit declaration of function 'min_order_for_split' [-Wimplicit-function-declaration]
2278 | int new_order = min_order_for_split(folio);
| ^~~~~~~~~~~~~~~~~~~


vim +/min_order_for_split +2278 mm/memory-failure.c

2147
2148 /**
2149 * memory_failure - Handle memory failure of a page.
2150 * @pfn: Page Number of the corrupted page
2151 * @flags: fine tune action taken
2152 *
2153 * This function is called by the low level machine check code
2154 * of an architecture when it detects hardware memory corruption
2155 * of a page. It tries its best to recover, which includes
2156 * dropping pages, killing processes etc.
2157 *
2158 * The function is primarily of use for corruptions that
2159 * happen outside the current execution context (e.g. when
2160 * detected by a background scrubber)
2161 *
2162 * Must run in process context (e.g. a work queue) with interrupts
2163 * enabled and no spinlocks held.
2164 *
2165 * Return:
2166 * 0 - success,
2167 * -ENXIO - memory not managed by the kernel
2168 * -EOPNOTSUPP - hwpoison_filter() filtered the error event,
2169 * -EHWPOISON - the page was already poisoned, potentially
2170 * kill process,
2171 * other negative values - failure.
2172 */
2173 int memory_failure(unsigned long pfn, int flags)
2174 {
2175 struct page *p;
2176 struct folio *folio;
2177 struct dev_pagemap *pgmap;
2178 int res = 0;
2179 unsigned long page_flags;
2180 bool retry = true;
2181 int hugetlb = 0;
2182
2183 if (!sysctl_memory_failure_recovery)
2184 panic("Memory failure on page %lx", pfn);
2185
2186 mutex_lock(&mf_mutex);
2187
2188 if (!(flags & MF_SW_SIMULATED))
2189 hw_memory_failure = true;
2190
2191 p = pfn_to_online_page(pfn);
2192 if (!p) {
2193 res = arch_memory_failure(pfn, flags);
2194 if (res == 0)
2195 goto unlock_mutex;
2196
2197 if (pfn_valid(pfn)) {
2198 pgmap = get_dev_pagemap(pfn);
2199 put_ref_page(pfn, flags);
2200 if (pgmap) {
2201 res = memory_failure_dev_pagemap(pfn, flags,
2202 pgmap);
2203 goto unlock_mutex;
2204 }
2205 }
2206 pr_err("%#lx: memory outside kernel control\n", pfn);
2207 res = -ENXIO;
2208 goto unlock_mutex;
2209 }
2210
2211 try_again:
2212 res = try_memory_failure_hugetlb(pfn, flags, &hugetlb);
2213 if (hugetlb)
2214 goto unlock_mutex;
2215
2216 if (TestSetPageHWPoison(p)) {
2217 res = -EHWPOISON;
2218 if (flags & MF_ACTION_REQUIRED)
2219 res = kill_accessing_process(current, pfn, flags);
2220 if (flags & MF_COUNT_INCREASED)
2221 put_page(p);
2222 action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);
2223 goto unlock_mutex;
2224 }
2225
2226 /*
2227 * We need/can do nothing about count=0 pages.
2228 * 1) it's a free page, and therefore in safe hand:
2229 * check_new_page() will be the gate keeper.
2230 * 2) it's part of a non-compound high order page.
2231 * Implies some kernel user: cannot stop them from
2232 * R/W the page; let's pray that the page has been
2233 * used and will be freed some time later.
2234 * In fact it's dangerous to directly bump up page count from 0,
	 * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
	 */
	if (!(flags & MF_COUNT_INCREASED)) {
		res = get_hwpoison_page(p, flags);
		if (!res) {
			if (is_free_buddy_page(p)) {
				if (take_page_off_buddy(p)) {
					page_ref_inc(p);
					res = MF_RECOVERED;
				} else {
					/* We lost the race, try again */
					if (retry) {
						ClearPageHWPoison(p);
						retry = false;
						goto try_again;
					}
					res = MF_FAILED;
				}
				res = action_result(pfn, MF_MSG_BUDDY, res);
			} else {
				res = action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
			}
			goto unlock_mutex;
		} else if (res < 0) {
			res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
			goto unlock_mutex;
		}
	}

	folio = page_folio(p);

	/* filter pages that are protected from hwpoison test by users */
	folio_lock(folio);
	if (hwpoison_filter(p)) {
		ClearPageHWPoison(p);
		folio_unlock(folio);
		folio_put(folio);
		res = -EOPNOTSUPP;
		goto unlock_mutex;
	}
	folio_unlock(folio);

	if (folio_test_large(folio)) {
		int new_order = min_order_for_split(folio);

		/*
		 * The flag must be set after the refcount is bumped
		 * otherwise it may race with THP split.
		 * And the flag can't be set in get_hwpoison_page() since
		 * it is called by soft offline too and it is just called
		 * for !MF_COUNT_INCREASED. So here seems to be the best
		 * place.
		 *
		 * Don't need care about the above error handling paths for
		 * get_hwpoison_page() since they handle either free page
		 * or unhandlable page. The refcount is bumped iff the
		 * page is a valid handlable page.
		 */
		folio_set_has_hwpoisoned(folio);
		/*
		 * If the folio cannot be split to order-0, kill the process,
		 * but split the folio anyway to minimize the amount of unusable
		 * pages.
		 */
		if (try_to_split_thp_page(p, new_order, false) || new_order) {
			/* get folio again in case the original one is split */
			folio = page_folio(p);
			res = -EHWPOISON;
			kill_procs_now(p, pfn, flags, folio);
			put_page(p);
			action_result(pfn, MF_MSG_UNSPLIT_THP, MF_FAILED);
			goto unlock_mutex;
		}
		VM_BUG_ON_PAGE(!page_count(p), p);
		folio = page_folio(p);
	}

	/*
	 * We ignore non-LRU pages for good reasons.
	 * - PG_locked is only well defined for LRU pages and a few others
	 * - to avoid races with __SetPageLocked()
	 * - to avoid races with __SetPageSlab*() (and more non-atomic ops)
	 * The check (unnecessarily) ignores LRU pages being isolated and
	 * walked by the page reclaim code, however that's not a big loss.
	 */
	shake_folio(folio);

	folio_lock(folio);

	/*
	 * We're only intended to deal with the non-Compound page here.
	 * The page cannot become compound pages again as the folio has
	 * been split and an extra refcount is held.
	 */
	WARN_ON(folio_test_large(folio));

	/*
	 * We use page flags to determine what action should be taken, but
	 * the flags can be modified by the error containment action. One
	 * example is an mlocked page, where PG_mlocked is cleared by
	 * folio_remove_rmap_*() in try_to_unmap_one(). So to determine page
	 * status correctly, we save a copy of the page flags at this time.
	 */
	page_flags = folio->flags.f;

	/*
	 * __munlock_folio() may clear a writeback folio's LRU flag without
	 * the folio lock. We need to wait for writeback completion for this
	 * folio or it may trigger a vfs BUG while evicting inode.
	 */
	if (!folio_test_lru(folio) && !folio_test_writeback(folio))
		goto identify_page_state;

	/*
	 * It's very difficult to mess with pages currently under IO
	 * and in many cases impossible, so we just avoid it here.
	 */
	folio_wait_writeback(folio);

	/*
	 * Now take care of user space mappings.
	 * Abort on fail: __filemap_remove_folio() assumes unmapped page.
	 */
	if (!hwpoison_user_mappings(folio, p, pfn, flags)) {
		res = action_result(pfn, MF_MSG_UNMAP_FAILED, MF_FAILED);
		goto unlock_page;
	}

	/*
	 * Torn down by someone else?
	 */
	if (folio_test_lru(folio) && !folio_test_swapcache(folio) &&
	    folio->mapping == NULL) {
		res = action_result(pfn, MF_MSG_TRUNCATED_LRU, MF_IGNORED);
		goto unlock_page;
	}

identify_page_state:
	res = identify_page_state(pfn, p, page_flags);
	mutex_unlock(&mf_mutex);
	return res;
unlock_page:
	folio_unlock(folio);
unlock_mutex:
	mutex_unlock(&mf_mutex);
	return res;
}
EXPORT_SYMBOL_GPL(memory_failure);

Wei Yang

Oct 11, 2025, 8:41:56 PM
to Zi Yan, linm...@huawei.com, da...@redhat.com, jane...@oracle.com, ker...@pankajraghav.com, syzbot+e6367e...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, ak...@linux-foundation.org, mcg...@kernel.org, nao.ho...@gmail.com, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Matthew Wilcox (Oracle), linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org
On Fri, Oct 10, 2025 at 01:39:05PM -0400, Zi Yan wrote:
It would be better to also update the kernel-doc comment of try_folio_split().

>  static inline int try_folio_split(struct folio *folio, struct page *page,
> -		struct list_head *list)
> +		struct list_head *list, unsigned int order)
>  {
> -	int ret = min_order_for_split(folio);
> -
> -	if (ret < 0)
> -		return ret;
> -
> -	if (!non_uniform_split_supported(folio, 0, false))
> +	if (!non_uniform_split_supported(folio, order, false))
>  		return split_huge_page_to_list_to_order(&folio->page, list,
> -				ret);
> -	return folio_split(folio, ret, page, list);
> +				order);
> +	return folio_split(folio, order, page, list);
>  }

--
Wei Yang
Help you, Help me

Pankaj Raghav (Samsung)

4:24 AM
to Zi Yan, linm...@huawei.com, da...@redhat.com, jane...@oracle.com, syzbot+e6367e...@syzkaller.appspotmail.com, syzkall...@googlegroups.com, ak...@linux-foundation.org, mcg...@kernel.org, nao.ho...@gmail.com, Lorenzo Stoakes, Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang, Matthew Wilcox (Oracle), linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org
On Fri, Oct 10, 2025 at 01:39:05PM -0400, Zi Yan wrote:
> Page cache folios from a file system that supports large block size (LBS)
> can have a minimum folio order greater than 0, thus a high-order folio
> might not be splittable down to order-0. Commit e220917fa507 ("mm: split a
> folio in minimum folio order chunks") bumps the target order of
> split_huge_page*() to the minimum allowed order when splitting a LBS folio.
> This causes confusion for some split_huge_page*() callers like the memory
> failure handling code, since they expect after-split folios to all have
> order-0 when the split succeeds, but in reality they get folios of
> min_order_for_split() order.
>
> Fix it by failing a split if the folio cannot be split to the target order.
>
> Fixes: e220917fa507 ("mm: split a folio in minimum folio order chunks")
> [The test poisons LBS folios, which cannot be split to order-0 folios, and
> also tries to poison all memory. The non-split LBS folios take more memory
> than the test anticipated, leading to OOM. This patch fixes the kernel
> warning, and the test needs some changes to avoid the OOM.]
> Reported-by: syzbot+e6367e...@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/68d2c943.a70a022...@google.com/
> Signed-off-by: Zi Yan <z...@nvidia.com>
> ---
LGTM with the suggested changes to the !CONFIG_THP try_folio_split().

Reviewed-by: Pankaj Raghav <p.ra...@samsung.com>