Hi Harry,
I see you've been debugging:
KASAN: slab-use-after-free Read in folio_remove_rmap_ptes
https://lore.kernel.org/all/694e3dc6.050a022...@google.com/T/
Can that bug be caused by this data race?
Below is an explanation from the Gemini LLM as to why this race is harmful.
Obviously take it with a grain of salt, but with my limited mm
knowledge it does not look immediately wrong (re the rmap invariant).
However, now digging into the details, I see that this patch from Lorenzo
is also marked as fixing "KASAN: slab-use-after-free Read in
folio_remove_rmap_ptes":
mm/vma: fix anon_vma UAF on mremap() faulted, unfaulted merge
https://lore.kernel.org/all/b7930ad2b1503a657e29fe928eb33061d7e...@oracle.com/T/
So perhaps the race is still benign (or it points to another issue?).
Here is what LLM said about the race:
-----
The bug report is actionable and points to a harmful data race in the Linux
kernel's memory management subsystem, specifically in the handling of
anonymous `hugetlb` mappings.
**Analysis:**
1. **Race Location:** The data race occurs on the `vma->anon_vma` field
of a `struct vm_area_struct`.
* **Writer:** Task 13471 executes `__anon_vma_prepare` in `mm/rmap.c`.
This function initializes the `anon_vma` for a VMA. It holds
`mm->page_table_lock` and writes to `vma->anon_vma` (line 211 in the
viewed source, corresponding to the report's `mm/rmap.c:212` area).
* **Reader:** Task 13473 executes `__vmf_anon_prepare` in `mm/memory.c`.
This function is an optimization wrapper that checks if
`vma->anon_vma` is already set (line 3666/3667) to avoid the overhead
of `__anon_vma_prepare`. This check is performed **without** holding
`mm->page_table_lock`.
2. **Consistency:** The report is consistent. Both tasks are handling
`hugetlb` page faults (`hugetlb_no_page` -> `__vmf_anon_prepare`).
The writer stack shows it proceeded into `__anon_vma_prepare` (implying
`vma->anon_vma` was NULL initially), while the reader stack shows it
reading `vma->anon_vma`. The value change `0x0000000000000000 ->
0xffff888104ecca28` confirms initialization from NULL to a pointer.
3. **Harmfulness (Why it is not benign):**
* In `__anon_vma_prepare`, the code currently initializes
`vma->anon_vma` **before** linking the VMA to the `anon_vma`
structure via `anon_vma_chain_link`.
* ```c
vma->anon_vma = anon_vma;
anon_vma_chain_link(vma, avc, anon_vma);
```
* Because the reader (`__vmf_anon_prepare`) checks `vma->anon_vma`
locklessly, it can see the non-NULL value before `anon_vma_chain_link`
has completed (due to compiler/CPU reordering or simple preemption
between the two statements).
* If the reader proceeds, it assumes the `anon_vma` is fully ready.
It then maps a page and sets `folio->mapping = anon_vma`.
* However, if `anon_vma_chain_link` hasn't finished, the `anon_vma`
(specifically its interval tree) does not yet contain the entry for
this `vma`.
* This breaks the reverse mapping (rmap) invariant. If the kernel
subsequently tries to unmap or migrate this page (finding it via
`folio->mapping`), `rmap_walk` will fail to find the VMA in the
`anon_vma`'s interval tree. This can lead to pages being effectively
pinned, migration failures, or in worst-case scenarios (like memory
corruption handling or specific reclaim paths), logical errors where
a page is assumed unmapped when it is not.
4. **Fix:** The fix requires enforcing ordering. `vma->anon_vma` should
be set **after** `anon_vma_chain_link` is complete, and `smp_store_release`
/ `smp_load_acquire` (or equivalent barriers) should be used to ensure the
reader observes the fully initialized state.
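-----
To make the ordering it suggests concrete, here is a rough sketch of what the
release/acquire pairing could look like. This is mine, untested and not an
actual patch; it just reuses the existing locals of __anon_vma_prepare()
(anon_vma, avc, allocated) and the fast-path check in __vmf_anon_prepare():
```c
/*
 * Untested sketch only. Writer side, in __anon_vma_prepare(), still under
 * mm->page_table_lock and the anon_vma write lock: link the VMA into the
 * anon_vma's interval tree first, then publish the pointer with release
 * semantics.
 */
if (likely(!vma->anon_vma)) {
	anon_vma_chain_link(vma, avc, anon_vma);
	/* Pairs with the acquire load in the lockless fast path below. */
	smp_store_release(&vma->anon_vma, anon_vma);
	allocated = NULL;
	avc = NULL;
}

/*
 * Reader side, e.g. the fast path in __vmf_anon_prepare(): a non-NULL
 * anon_vma now implies the chain link is visible as well.
 */
if (likely(smp_load_acquire(&vma->anon_vma)))
	return 0;
```
Note that a plain READ_ONCE()/data_race() annotation would only mark the
racy access without ordering the interval-tree link against the pointer
publication, so if the race is indeed harmful, something like the
release/acquire pairing above (or equivalent barriers) seems needed.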