Deepanshu Kartikey <karti...@gmail.com> writes:
> On Wed, Feb 4, 2026 at 4:21 AM Ackerley Tng <acker...@google.com> wrote:
>>
>> #syz test: git://git.kernel.org/pub/scm/virt/kvm/kvm.git next
>>
>> filemap_{grab,get}_folio() and related functions, used since the early
>> stages of guest_memfd, have determined the order of the folio to be
>> allocated by looking up mapping_min_folio_order(mapping). As identified by
>> syzbot, MADV_HUGEPAGE can be used to set the result of
>> mapping_min_folio_order() to a value greater than 0, leading to the
I was wrong here: as far as I can tell, MADV_HUGEPAGE does not actually
update mapping->flags, so it doesn't change the result of
mapping_min_folio_order(). MADV_HUGEPAGE only operates on the VMA and
doesn't update the mapping's min or max folio order, which is an
inode/mapping property.
>> allocation of a huge page and subsequent WARNing.
>>
>> Refactor the allocation code of guest_memfd to directly use
>> filemap_add_folio(), specifying an order of 0.
>>
This refactoring is not actually required, since IIUC guest_memfd never
tries to update mapping->flags, and so mapping_min_folio_order() and
mapping_max_folio_order() return the default of 0.
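For reference (paraphrasing include/linux/pagemap.h from memory, so the
exact macro names may be slightly off), the min folio order is read
straight out of mapping->flags, and only the
mapping_set_folio_order_range()/mapping_set_large_folios() setters ever
change it, neither of which guest_memfd calls:

    static inline unsigned int
    mapping_min_folio_order(const struct address_space *mapping)
    {
            /*
             * Bits in the address_space flags word; madvise() never
             * touches these, and guest_memfd never sets them, so this
             * stays at the default of 0.
             */
            return (mapping->flags & AS_FOLIO_ORDER_MIN_MASK) >>
                    AS_FOLIO_ORDER_MIN;
    }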
Thank you for working on this bug too!
> and have a concern about the fast-path in your patch.
>
> In kvm_gmem_get_folio(), the fast-path returns any existing folio from
> the page cache without checking if it's a large folio:
>
>     folio = __filemap_get_folio(inode->i_mapping, index,
>                                 FGP_LOCK | FGP_ACCESSED, 0);
>     if (!IS_ERR(folio))
>             return folio; // <-- No size check here
>
> This means if a large folio was previously allocated (e.g., via
This is true, but I tried the above patch because, at the time, I
believed that the filemap_add_folio() call within the original
__filemap_get_folio_mpol() was the only place where folios get added to
the filemap.
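For completeness, the refactor in the patch above boiled down to roughly
the following (a sketch of the idea rather than the actual diff, with
error handling abbreviated):

    /*
     * Always allocate an order-0 folio and insert it into the filemap
     * directly, instead of letting __filemap_get_folio() derive the
     * order from mapping_min_folio_order().
     */
    folio = filemap_alloc_folio(mapping_gfp_mask(inode->i_mapping), 0);
    if (!folio)
            return ERR_PTR(-ENOMEM);

    ret = filemap_add_folio(inode->i_mapping, folio, index,
                            mapping_gfp_mask(inode->i_mapping));
    if (ret) {
            folio_put(folio);
            return ERR_PTR(ret);
    }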
I'm trying out another patch locally to disable khugepaged. I believe
the issue is that when MADV_HUGEPAGE is used on a guest_memfd VMA, it
indirectly enables khugepaged to operate on guest_memfd folios, which we
don't want anyway.
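The path I have in mind is roughly this (paraphrasing hugepage_madvise()
in mm/khugepaged.c from memory, so details may be off):

    case MADV_HUGEPAGE:
            *vm_flags &= ~VM_NOHUGEPAGE;
            *vm_flags |= VM_HUGEPAGE;
            /*
             * Registers the mm with khugepaged, which may then try to
             * collapse ranges of this VMA into huge pages, including,
             * it seems, a guest_memfd VMA.
             */
            khugepaged_enter_vma(vma, *vm_flags);
            break;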
I'm now guessing that the right fix is to disable khugepaged for
guest_memfd, and I will be trying out a few options through the rest of
today. My first thought was to set VM_NO_KHUGEPAGED (semantically the
best fit), but it looks like that triggers some hugetlb-related
weirdness. I'm going to try VM_DONTEXPAND next.
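What I'm experimenting with is roughly the following untested sketch
(the handler shown is guest_memfd's mmap path; which flag ends up being
right is still open):

    static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
    {
            /* ... existing guest_memfd mmap setup unchanged ... */

            /*
             * Untested sketch: keep khugepaged away from guest_memfd
             * VMAs. VM_NO_KHUGEPAGED is the semantically obvious
             * choice but seems to trip the hugetlb-related weirdness
             * mentioned above; VM_DONTEXPAND is the alternative to
             * try next.
             */
            vm_flags_set(vma, VM_NO_KHUGEPAGED);

            return 0;
    }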
Other notes: I trimmed the repro down by disabling calls 4, 5, 6, and 9;
those are not necessary for the repro. Calls 0, 1, 2, and 3 are part of
a regular usage pattern of guest_memfd, so the uncommon usages are call
7 (MADV_HUGEPAGE), the likely culprit, and call 8. I believe call 8 is
just the trigger, since mlock() actually faults the page in to userspace
through gup.
> madvise(MADV_HUGEPAGE)), subsequent faults will find and return it
madvise(MADV_HUGEPAGE) does not allocate folios; it only marks the VMA
as allowing huge pages.
> from the fast-path, still triggering the WARN_ON_ONCE at line 416 in
> kvm_gmem_fault_user_mapping().
>
> The issue is that while your patch prevents *new* large folio
> allocations by hardcoding order=0 in filemap_alloc_folio(), it doesn't
> handle large folios that already exist in the page cache.
>
> Shouldn't we add a check for folio_test_large() on both the fast-path
> and slow-path to ensure we reject large folios regardless of how they
> were allocated? Something like:
>
>     folio = __filemap_get_folio(...);
>     if (!IS_ERR(folio))
>             goto check_folio;
>     // ... allocation code ...
> check_folio:
>     if (folio_test_large(folio)) {
>             folio_unlock(folio);
>             folio_put(folio);
>             return ERR_PTR(-E2BIG);
I saw your patch; I feel that returning -E2BIG doesn't address the root
cause of the issue.