[syzbot] [arm?] WARNING in copy_highpage

syzbot

Oct 1, 2025, 5:48:33 PM
to catalin...@arm.com, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, wi...@kernel.org
Hello,

syzbot found the following issue on:

HEAD commit: fec734e8d564 Merge tag 'riscv-for-linus-v6.17-rc8' of git:..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=12187d34580000
kernel config: https://syzkaller.appspot.com/x/.config?x=13bd892ec3b155a2
dashboard link: https://syzkaller.appspot.com/bug?extid=d1974fc28545a3e6218b
compiler: aarch64-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: arm64

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/fa3fbcfdac58/non_bootable_disk-fec734e8.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/d7e18b408aea/vmlinux-fec734e8.xz
kernel image: https://storage.googleapis.com/syzbot-assets/9b7984f47117/Image-fec734e8.gz.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+d1974f...@syzkaller.appspotmail.com

------------[ cut here ]------------
WARNING: CPU: 1 PID: 25189 at arch/arm64/mm/copypage.c:55 try_page_mte_tagging arch/arm64/include/asm/mte.h:93 [inline]
WARNING: CPU: 1 PID: 25189 at arch/arm64/mm/copypage.c:55 copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55
Modules linked in:
CPU: 1 UID: 0 PID: 25189 Comm: syz.2.7336 Not tainted syzkaller #0 PREEMPT
Hardware name: linux,dummy-virt (DT)
pstate: 00402009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55
lr : copy_highpage+0xb4/0x334 arch/arm64/mm/copypage.c:25
sp : ffff800088053940
x29: ffff800088053940 x28: ffffc1ffc0acf800 x27: ffff800088053b10
x26: ffffc1ffc0acf808 x25: ffffc1ffc037b1c0 x24: ffffc1ffc037b1c0
x23: ffffc1ffc0acf800 x22: ffffc1ffc0acf800 x21: fff000002b3e0000
x20: fff000000dec7000 x19: ffffc1ffc037b1c0 x18: 0000000000000000
x17: fff07ffffcffa000 x16: ffff800080008000 x15: 0000000000000001
x14: 0000000000000000 x13: 0000000000000003 x12: 000000000006d9ad
x11: 0000000000000000 x10: 0000000000000010 x9 : 0000000000000000
x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
x5 : ffff800088053b18 x4 : ffff80008032df94 x3 : 00000000ff000000
x2 : 01ffc00003000001 x1 : 01ffc00003000001 x0 : 01ffc00003000001
Call trace:
try_page_mte_tagging arch/arm64/include/asm/mte.h:93 [inline] (P)
copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55 (P)
copy_mc_highpage include/linux/highmem.h:383 [inline]
folio_mc_copy+0x44/0x6c mm/util.c:740
__migrate_folio.constprop.0+0xc4/0x23c mm/migrate.c:851
migrate_folio+0x1c/0x2c mm/migrate.c:882
move_to_new_folio+0x58/0x144 mm/migrate.c:1097
migrate_folio_move mm/migrate.c:1370 [inline]
migrate_folios_move mm/migrate.c:1719 [inline]
migrate_pages_batch+0xaf4/0x1024 mm/migrate.c:1966
migrate_pages_sync mm/migrate.c:2023 [inline]
migrate_pages+0xb9c/0xcdc mm/migrate.c:2105
do_mbind+0x20c/0x4a4 mm/mempolicy.c:1539
kernel_mbind mm/mempolicy.c:1682 [inline]
__do_sys_mbind mm/mempolicy.c:1756 [inline]
__se_sys_mbind mm/mempolicy.c:1752 [inline]
__arm64_sys_mbind+0xd0/0xd8 mm/mempolicy.c:1752
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall+0x48/0x110 arch/arm64/kernel/syscall.c:49
el0_svc_common.constprop.0+0x40/0xe0 arch/arm64/kernel/syscall.c:132
do_el0_svc+0x1c/0x28 arch/arm64/kernel/syscall.c:151
el0_svc+0x34/0x10c arch/arm64/kernel/entry-common.c:879
el0t_64_sync_handler+0xa0/0xe4 arch/arm64/kernel/entry-common.c:898
el0t_64_sync+0x1a4/0x1a8 arch/arm64/kernel/entry.S:596
---[ end trace 0000000000000000 ]---


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

David Hildenbrand

Oct 6, 2025, 3:55:34 AM
to Catalin Marinas, syzbot, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, wi...@kernel.org
> I don't think we ever stressed MTE with mbind before. I have a suspicion
> this problem has been around for some time.
>
> My reading of do_mbind() is that it ends up allocating pages for
> migrating into via alloc_migration_target_by_mpol() ->
> folio_alloc_mpol(). Pages returned should be untagged and uninitialised
> unless the PG_* flags have not been cleared on a prior free. Or
> migrate_pages_batch() somehow reuses some pages instead of reallocating.

Staring at __migrate_folio(), I assume we can end up successfully
calling folio_mc_copy(), but then failing in __folio_migrate_mapping().

Seems to be as easy as failing the folio_ref_freeze() in
__folio_migrate_mapping().

We return -EAGAIN in that case, making the caller retry, stumbling into
an already-tagged page (with the same source/destination parameters,
IIRC).

So likely this is simply us re-doing the copy after a migration failed
after the copy.
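
The sequence described above can be modelled in plain userspace C. This is only a toy sketch, not kernel code: `struct toy_page`, `toy_copy_highpage()`, `toy_migrate_folio()` and `toy_migrate_with_retry()` are hypothetical stand-ins for the page state, the arm64 copy_highpage(), __migrate_folio() and the caller's retry loop, and the `freeze_ok` flag is an assumed way to model folio_ref_freeze() failing.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Toy stand-in for the page state; not the kernel's struct page. */
struct toy_page {
	bool mte_tagged;        /* models PG_mte_tagged */
	unsigned char tags[4];  /* models the allocation tags */
	unsigned char data[4];  /* models the page contents */
};

static int toy_warnings;        /* counts hits of the modelled warning */

/* Models copy_highpage(): copy data and tags, warn if dst is already tagged. */
static void toy_copy_highpage(struct toy_page *dst, const struct toy_page *src)
{
	memcpy(dst->data, src->data, sizeof(dst->data));
	if (!src->mte_tagged)
		return;
	if (dst->mte_tagged)
		toy_warnings++;  /* the condition reported by syzbot */
	memcpy(dst->tags, src->tags, sizeof(dst->tags));
	dst->mte_tagged = true;
}

/* Models __migrate_folio(): the copy happens before the mapping is moved. */
static int toy_migrate_folio(struct toy_page *dst, const struct toy_page *src,
			     bool freeze_ok)
{
	toy_copy_highpage(dst, src);
	if (!freeze_ok)
		return -EAGAIN;  /* models folio_ref_freeze() failing */
	return 0;
}

/* Models the caller retrying with the same src/dst pair on -EAGAIN. */
static int toy_migrate_with_retry(struct toy_page *dst,
				  const struct toy_page *src, int failures)
{
	int rc;

	do {
		rc = toy_migrate_folio(dst, src, failures-- <= 0);
	} while (rc == -EAGAIN);
	return rc;
}
```

With one simulated freeze failure, the second pass copies into a destination that the first pass already tagged, so the modelled warning fires once even though the final contents are correct.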

Could it happen that we are calling it with a different
source/destination combination the second time? I don't think so, but I
am not 100% sure.

The most reliable way would be to un-tag in case folio_mc_copy succeeded
but __folio_migrate_mapping() failed.

I'm also wondering whether we can simply perform the copy after the
__folio_migrate_mapping() call: the src folio is locked and unmapped,
nobody can really modify it. Same for the dst folio.

--
Cheers

David / dhildenb

David Hildenbrand

Oct 6, 2025, 5:38:44 AM
to Catalin Marinas, syzbot, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, wi...@kernel.org, Kefeng Wang

>
> The most reliable way would be to un-tag in case folio_mc_copy succeeded
> but __folio_migrate_mapping() failed.
>
> I'm also wondering whether we can simply perform the copy after the
> __folio_migrate_mapping() call: the src folio is locked and unmapped,
> nobody can really modify it. Same for the dst folio.

Answering that myself: obviously we don't want to fail after migrating
the mapping, that is more expensive to recover from.

And I think that also explains how commit 060913999d7a ("mm: migrate:
support poisoned recover from migrate folio") likely introduced the
issue by moving the copy.

CCing Kefeng

Catalin Marinas

Oct 6, 2025, 9:20:54 AM
to David Hildenbrand, syzbot, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, wi...@kernel.org, Kefeng Wang
Thanks David. I can now see how it would retry on the same pages without
reallocating. At least we know it's not causing any side-effects, only
messing up the MTE safety warnings.

> The most reliable way would be to un-tag in case folio_mc_copy succeeded but
> __folio_migrate_mapping() failed.

Clearing an MTE specific flag in the core code doesn't look great. Also
going for some generic mask like PAGE_FLAGS_CHECK_AT_PREP may have
side-effects as we don't know where the page is coming from (we have
those get_new_folio()/put_new_folio() arguments passed on by higher up
callers).

I'm tempted to just drop the warning in the arm64 copy_highpage(),
replace it with a comment about migration retrying on a potentially
tagged page. It will have to override the tags each time (as it
currently does but also warns).

--
Catalin

Catalin Marinas

Oct 6, 2025, 9:20:54 AM
to syzbot, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, wi...@kernel.org, David Hildenbrand
Thanks for the report (for some reason, Outlook did not deliver this to
my inbox; Will pointed me at the message).

Adding David H as well, he may have some ideas. I haven't tried to
reproduce it yet.

On Wed, Oct 01, 2025 at 02:48:30PM -0700, syzbot wrote:
> syzbot found the following issue on:
>
> HEAD commit: fec734e8d564 Merge tag 'riscv-for-linus-v6.17-rc8' of git:..

So that's just before 6.17, not something that turned up during the
merge window.

> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=12187d34580000
> kernel config: https://syzkaller.appspot.com/x/.config?x=13bd892ec3b155a2
> dashboard link: https://syzkaller.appspot.com/bug?extid=d1974fc28545a3e6218b
> compiler: aarch64-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> userspace arch: arm64
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> Downloadable assets:
> disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/fa3fbcfdac58/non_bootable_disk-fec734e8.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/d7e18b408aea/vmlinux-fec734e8.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/9b7984f47117/Image-fec734e8.gz.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+d1974f...@syzkaller.appspotmail.com
>
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 25189 at arch/arm64/mm/copypage.c:55 try_page_mte_tagging arch/arm64/include/asm/mte.h:93 [inline]
> WARNING: CPU: 1 PID: 25189 at arch/arm64/mm/copypage.c:55 copy_highpage+0x150/0x334 arch/arm64/mm/copypage.c:55

This warning means that the destination page is already tagged
(PG_mte_tagged set) when it got to copy_page(). In general it is fine
as we copy into and override all the tags but my assumption until now
has been that such new pages are always untagged.

I don't think we ever stressed MTE with mbind before. I have a suspicion
this problem has been around for some time.

My reading of do_mbind() is that it ends up allocating pages for
migrating into via alloc_migration_target_by_mpol() ->
folio_alloc_mpol(). Pages returned should be untagged and uninitialised
unless the PG_* flags have not been cleared on a prior free. Or
migrate_pages_batch() somehow reuses some pages instead of reallocating.

--
Catalin

David Hildenbrand

Oct 6, 2025, 9:26:05 AM
to Catalin Marinas, syzbot, linux-ar...@lists.infradead.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, wi...@kernel.org, Kefeng Wang
As long as the folio is not getting reused elsewhere, yes.

I haven't fully understood yet if there could be cases where we use the
folio for another source. But I think it's not trivially possible,
because I think we allocate dst folios based on source-folio properties
(order, node, zone, etc).

>
>> The most reliable way would be to un-tag in case folio_mc_copy succeeded but
>> __folio_migrate_mapping() failed.
>
> Clearing an MTE specific flag in the core code doesn't look great. Also
> going for some generic mask like PAGE_FLAGS_CHECK_AT_PREP may have
> side-effects as we don't know where the page is coming from (we have
> those get_new_folio()/put_new_folio() arguments passed on by higher up
> callers).

As an alternative, I would probably have done something like providing a
simple folio_mc_copy_abort().
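
To make the idea concrete, here is a rough userspace model of what such an abort hook might do. Nothing here is an existing kernel interface: `toy_copy()`, `toy_copy_abort()` and `toy_migrate_once()` are hypothetical names, and the assumption is that the hook simply undoes the tagging side effect of the copy when the later step fails.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Toy stand-in for the page state; not the kernel's struct page. */
struct toy_page {
	bool mte_tagged;        /* models PG_mte_tagged */
	unsigned char tags[4];  /* models the allocation tags */
};

static int toy_warnings;        /* counts "dst already tagged" hits */

/* Models the tag-copy part of the arch copy, including the warning. */
static void toy_copy(struct toy_page *dst, const struct toy_page *src)
{
	if (!src->mte_tagged)
		return;
	if (dst->mte_tagged)
		toy_warnings++;
	memcpy(dst->tags, src->tags, sizeof(dst->tags));
	dst->mte_tagged = true;
}

/* Hypothetical abort hook: undo what the copy did to the destination. */
static void toy_copy_abort(struct toy_page *dst)
{
	memset(dst->tags, 0, sizeof(dst->tags));
	dst->mte_tagged = false;
}

/* One migration attempt: copy, then abort the copy if the later step fails. */
static int toy_migrate_once(struct toy_page *dst, const struct toy_page *src,
			    bool freeze_ok)
{
	toy_copy(dst, src);
	if (!freeze_ok) {
		toy_copy_abort(dst);
		return -EAGAIN;  /* caller is expected to retry */
	}
	return 0;
}
```

In this model a failed attempt leaves the destination untagged again, so a retry with the same pair does not trip the warning.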

>
> I'm tempted to just drop the warning in the arm64 copy_highpage(),
> replace it with a comment about migration retrying on a potentially
> tagged page. It will have to override the tags each time (as it
> currently does but also warns).

Works for me. Maybe we could warn if the tag would change, because I
think after we unmapped the folio during migration, the tag can no
longer change.
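
A minimal sketch of that "warn only if the tag would change" check, again as a userspace model rather than kernel code. `struct toy_page` and `toy_copy_tags()` are hypothetical; the function reports a mismatch through its return value instead of issuing a real warning.

```c
#include <stdbool.h>
#include <string.h>

/* Toy stand-in for the page state; not the kernel's struct page. */
struct toy_page {
	bool mte_tagged;        /* models PG_mte_tagged */
	unsigned char tags[4];  /* models the allocation tags */
};

/*
 * Copy tags from src to dst, always overwriting. Returns true only when dst
 * was already tagged AND its tags differ from src, which is the case that
 * would still deserve a warning under this proposal.
 */
static bool toy_copy_tags(struct toy_page *dst, const struct toy_page *src)
{
	bool would_change;

	if (!src->mte_tagged)
		return false;

	would_change = dst->mte_tagged &&
		       memcmp(dst->tags, src->tags, sizeof(dst->tags)) != 0;

	memcpy(dst->tags, src->tags, sizeof(dst->tags));
	dst->mte_tagged = true;
	return would_change;
}
```

A retry with the same source stays silent here, while a copy from a source carrying different tags into an already-tagged destination is flagged.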