[PATCH RFC v2] mm/shmem: set __GFP_SKIP_KASAN for swap_cluster_readahead

Chia-I Wu via B4 Relay

May 20, 2026, 12:32:12 AM
to Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Andrew Morton, Hugh Dickins, Baolin Wang, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, Boris Brezillon, Chia-I Wu
From: Chia-I Wu <olv...@gmail.com>

swap_cluster_readahead can allocate folios for other mappings. If the
gfp flags do not have __GFP_SKIP_KASAN, but the other mappings have
PROT_MTE, we can end up with false KASAN errors such as

BUG: KASAN: invalid-access in swap_writepage+0xb0/0x21c
Read at addr f5ffff81aa71dff8 by task WM.task-4/6956
Pointer tag: [f5], memory tag: [f9]

In the above example, because __GFP_SKIP_KASAN was missing, KASAN set
both the pointer tag and the memory tag to 0xf5 when
swap_cluster_readahead allocated the folio. But userspace had already
set the memory tag to 0xf9 before the page was swapped out.
arch_swap_restore then restored the memory tag to 0xf9, leading to the
mismatch.

Signed-off-by: Chia-I Wu <olv...@gmail.com>
---
Changes in v2:
- set __GFP_SKIP_KASAN for shmem instead of drm/panthor
- Link to v1: https://patch.msgid.link/20260512-panthor-kas...@gmail.com
---
 mm/shmem.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/mm/shmem.c b/mm/shmem.c
index 3b5dc21b323c2..db9130a8c5b76 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1784,6 +1784,11 @@ static struct folio *shmem_swapin_cluster(swp_entry_t swap, gfp_t gfp,
 	pgoff_t ilx;
 	struct folio *folio;
 
+	/* swap_cluster_readahead might cross the mapping boundary and
+	 * allocate pages for other mappings. We have to skip KASAN.
+	 */
+	gfp |= __GFP_SKIP_KASAN;
+
 	mpol = shmem_get_pgoff_policy(info, index, 0, &ilx);
 	folio = swap_cluster_readahead(swap, gfp, mpol, ilx);
 	mpol_cond_put(mpol);

---
base-commit: 5200f5f493f79f14bbdc349e402a40dfb32f23c8
change-id: 20260512-panthor-kasan-10477239bad1

Best regards,
--
Chia-I Wu <olv...@gmail.com>


Baolin Wang

May 20, 2026, 6:04:18 AM
to olv...@gmail.com, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Andrew Morton, Hugh Dickins, Kairui Song, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, Boris Brezillon
CC Kairui.

If we force __GFP_SKIP_KASAN, would this cause issues for mappings that
explicitly should NOT have the flag? Your v1 link already mentions
this scenario.

Additionally, I'm wondering if we could use shmem_should_replace_folio()
to detect such cases where shmem is being prematurely swapped in with
incorrect GFP flags (e.g.: __GFP_SKIP_KASAN), and then handle it through
shmem_replace_folio()?

Chia-I Wu

May 20, 2026, 1:06:23 PM
to Baolin Wang, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Andrew Morton, Hugh Dickins, Kairui Song, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, Boris Brezillon
By forcing the flag, we lose the benefits of KASAN HW tags (other
KASAN modes are not affected).

The other mappings swap_cluster_readahead can affect are anon
mappings, regular shmem mappings, or gpu shmem mappings. I think only
gpu shmem mappings miss __GFP_SKIP_KASAN. That might not even be
intentional, because gpu shmem mappings pick GFP_HIGHUSER over
GFP_HIGHUSER_MOVABLE to avoid __GFP_MOVABLE. That was before
__GFP_SKIP_KASAN was added to GFP_HIGHUSER_MOVABLE.

I guess what I am trying to say is these are all user pages. We have
to skip kasan when user pages can be mapped PROT_MTE. The
justification for gpu shmem mappings is that they cannot be mapped
PROT_MTE. But if readahead can affect non-gpu shmem mappings, it seems
we have to either force __GFP_SKIP_KASAN or to cap/disable readahead.


>
> Additionally, I'm wondering if we could use shmem_should_replace_folio()
> to detect such cases where shmem is being prematurely swapped in with
> incorrect GFP flags (e.g.: __GFP_SKIP_KASAN), and then handle it through
> shmem_replace_folio()?
I don't know if the benefits justify imposing a copy. More
importantly, this only helps shmem mappings, not anon mappings.

Baolin Wang

May 21, 2026, 3:05:28 AM
to Chia-I Wu, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Andrew Morton, Hugh Dickins, Kairui Song, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org, Boris Brezillon
It sounds like the right approach would be to explicitly set
__GFP_SKIP_KASAN for GPU shmem mappings, no? I think having users
explicitly set __GFP_SKIP_KASAN makes the implications clearer than
having shmem core set it implicitly.

We could also consider adding a VM_WARN in shmem_swapin_cluster() to
detect any mappings missing the __GFP_SKIP_KASAN flag.

> I guess what I am trying to say is these are all user pages. We have
> to skip kasan when user pages can be mapped PROT_MTE. The

Yes, regular shmem mappings typically default to GFP_HIGHUSER_MOVABLE,
while GPU shmem mappings are a special case.

Boris Brezillon

May 21, 2026, 4:51:56 AM
to Baolin Wang, Chia-I Wu, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Andrew Morton, Hugh Dickins, Kairui Song, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org
It's a bit of a shame that we have to explicitly set this
__GFP_SKIP_KASAN flag when we select GFP_HIGHUSER though (that means a
lot of patching in drivers/gpu/drm/, because basically every driver
relying on shmem for its buffer allocation uses GFP_HIGHUSER).

Also, it feels like KASAN poisoning for these pages would be interesting
to have since we know we won't allow PROT_MTE on userspace mappings
anyway. Oh, and some buffers might even be kernel only (no mmap()
allowed), which makes them even better candidates for poisoning.

>
> We could also consider adding a VM_WARN in shmem_swapin_cluster() to
> detect any mappings missing the __GFP_SKIP_KASAN flag.

If the general consensus is that all shmem-backed allocations must have
__GFP_SKIP_KASAN, yes, it'd make sense to add a VM_WARN.

>
> > I guess what I am trying to say is these are all user pages. We have
> > to skip kasan when user pages can be mapped PROT_MTE. The
>
> Yes, regular shmem mappings typically default to GFP_HIGHUSER_MOVABLE,
> while GPU shmem mappings are a special case.

They are not that special; they are just not MOVABLE because the GPU
might also access the same pages under the hood. If it's assumed that
any page being exposed through mmap() must have __GFP_SKIP_KASAN, why
does GFP_HIGHUSER not have that flag too?

>
> > justification for gpu shmem mappings is that they cannot be mapped
> > PROT_MTE. But if readahead can affect non-gpu shmem mappings, it seems
> > we have to either force __GFP_SKIP_KASAN or to cap/disable readahead.

I'm no MM expert, so it's probably me not understanding how this
swap-readahead logic is supposed to work, but the whole idea of using
different flags from those that were requested by the f_mapping seems
fragile. I mean, this comment [1] proves it's not the first time the
problem is considered, and I'm wondering why __GFP_SKIP_KASAN should be
treated differently from zones. Yes, that's an extra copy if the
SKIP_KASAN flags don't match but the zones do, but in practice, won't
we have GFP_HIGHUSER and GFP_HIGHUSER_MOVABLE in different zones? Or is
the problem that, even with a copy, it's already too late to restore
the flags because they have been overwritten during KASAN unpoisoning?

[1]https://elixir.bootlin.com/linux/v7.0.9/source/mm/shmem.c#L2112

Chia-I Wu

May 21, 2026, 11:50:02 AM
to Boris Brezillon, Baolin Wang, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Andrew Morton, Hugh Dickins, Kairui Song, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org
It is also about whether PROT_MTE is allowed. This becomes a problem
when both kernel and userspace want to modify the tags stored in MTE.

Another way to achieve the same effect as this patch, but more
explicitly, is to define

#define GFP_HIGHUSER_SWAPPABLE (GFP_HIGHUSER | __GFP_SKIP_KASAN)
#define GFP_HIGHUSER_MOVABLE (GFP_HIGHUSER_SWAPPABLE | __GFP_MOVABLE)

GPU drivers that can swap should use GFP_HIGHUSER_SWAPPABLE. shmem
core can warn about missing __GFP_SKIP_KASAN.
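
To make the composition concrete, here is a small userspace sketch of
the two defines above. It is only a model: the bit values are made up
for illustration (the real ones live in the kernel's GFP headers), and
gfp_allows_swapin_readahead() is a hypothetical helper standing in for
the check shmem core could warn on, not an existing kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; these are NOT the kernel's real values. */
typedef unsigned int gfp_t;
#define __GFP_MOVABLE		((gfp_t)0x1u)
#define __GFP_SKIP_KASAN	((gfp_t)0x2u)
#define GFP_HIGHUSER		((gfp_t)0x4u)	/* stand-in for the real composite */

/* The two defines proposed above. */
#define GFP_HIGHUSER_SWAPPABLE	(GFP_HIGHUSER | __GFP_SKIP_KASAN)
#define GFP_HIGHUSER_MOVABLE	(GFP_HIGHUSER_SWAPPABLE | __GFP_MOVABLE)

/*
 * Hypothetical helper: true when a mapping's gfp mask carries
 * __GFP_SKIP_KASAN, i.e. the condition a warning in shmem core
 * would test before doing cluster readahead.
 */
static bool gfp_allows_swapin_readahead(gfp_t gfp)
{
	return (gfp & __GFP_SKIP_KASAN) != 0;
}
```

Under this model, plain GFP_HIGHUSER is the only one of the three masks
that would trip the warning, and GFP_HIGHUSER_SWAPPABLE stays
non-movable.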

>
> >
> > > justification for gpu shmem mappings is that they cannot be mapped
> > > PROT_MTE. But if readahead can affect non-gpu shmem mappings, it seems
> > > we have to either force __GFP_SKIP_KASAN or to cap/disable readahead.
>
> I'm no MM expert, so it's probably me not understanding how this
> swap-readahead logic is supposed to work, but the whole idea of using
> different flags from those that were requested by the f_mapping seems
> fragile. I mean, this comment [1] proves it's not the first time the
> problem is considered, and I'm wondering why __GFP_SKIP_KASAN should be
> treated differently from zones. Yes, that's an extra copy if the
> SKIP_KASAN flags don't match but the zones do, but in practice, won't
> we have GFP_HIGHUSER and GFP_HIGHUSER_MOVABLE in different zones? Or is
> the problem that, even with a copy, it's already too late to restore
> the flags because they have been overwritten during KASAN unpoisoning?
arch_swap_restore is called just before shmem_replace_folio. It is a
bit too late right now but I guess it is fixable.

But shmem is not just a victim. It is also an offender to anon
mappings. We would need a similar replacement logic in do_swap_page
for anon mappings.

>
> [1]https://elixir.bootlin.com/linux/v7.0.9/source/mm/shmem.c#L2112

Chia-I Wu

May 21, 2026, 5:12:33 PM
to Boris Brezillon, Baolin Wang, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Andrew Morton, Hugh Dickins, Kairui Song, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org
Come to think of it, that's not how things work.

Regular shmems and anon mappings set __GFP_SKIP_KASAN because they can
be mapped PROT_MTE. This calls page_kasan_tag_reset on the pages.

GPU shmems omit __GFP_SKIP_KASAN because they can't be mapped
PROT_MTE. This calls kasan_unpoison_pages on the pages.

With swap readahead, there is no longer any guarantee that the right
function is called. The question is whether we can detect the mismatch
and call page_kasan_tag_reset/kasan_unpoison_pages to make things right
again in places such as do_swap_page and shmem_swapin_folio.
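
As a rough userspace model of that idea (not kernel code), the sketch
below tracks one pointer tag and one memory tag per page, replays the
0xf5/0xf9 case from the report, and then applies a fixup at swap-in
time. All model_* names are hypothetical, and treating 0xff as a
match-all pointer tag is a simplification of what page_kasan_tag_reset
achieves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_TAG_MATCH_ALL 0xffu	/* models a reset (match-all) pointer tag */

struct model_page {
	uint8_t ptr_tag;	/* tag the kernel uses in its pointer to the page */
	uint8_t mem_tag;	/* tag stored in memory for the page */
};

/* Allocation: with skip_kasan the tag is reset, otherwise KASAN tags both. */
static void model_alloc(struct model_page *p, bool skip_kasan, uint8_t kasan_tag)
{
	if (skip_kasan) {
		p->ptr_tag = MODEL_TAG_MATCH_ALL;	/* like page_kasan_tag_reset */
	} else {
		p->ptr_tag = kasan_tag;			/* like kasan_unpoison_pages */
		p->mem_tag = kasan_tag;
	}
}

/* Models arch_swap_restore putting back the tag userspace had set. */
static void model_swap_restore(struct model_page *p, uint8_t saved_tag)
{
	p->mem_tag = saved_tag;
}

/* A kernel access is fine if the pointer tag is match-all or equals memory. */
static bool model_access_ok(const struct model_page *p)
{
	return p->ptr_tag == MODEL_TAG_MATCH_ALL || p->ptr_tag == p->mem_tag;
}

/* Hypothetical fixup: compare what the owning mapping wants with the page. */
static void model_fixup(struct model_page *p, bool mapping_skips_kasan,
			uint8_t fresh_tag)
{
	bool page_skipped = p->ptr_tag == MODEL_TAG_MATCH_ALL;

	if (mapping_skips_kasan && !page_skipped) {
		p->ptr_tag = MODEL_TAG_MATCH_ALL;	/* reset, keep user tag */
	} else if (!mapping_skips_kasan && page_skipped) {
		p->ptr_tag = fresh_tag;			/* re-tag for KASAN */
		p->mem_tag = fresh_tag;
	}
}
```

In this model the fixup for a PROT_MTE-capable mapping leaves the
restored memory tag alone and only resets the pointer tag, so the copy
discussed earlier would not be needed. Whether the real code has enough
information at those call sites to know how the folio was allocated is
an open assumption here.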

>
> >
> > [1]https://elixir.bootlin.com/linux/v7.0.9/source/mm/shmem.c#L2112