[syzbot] [mm?] WARNING in do_wp_page

syzbot

Apr 12, 2025, 2:46:24 PM
to ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: 0af2f6be1b42 Linux 6.15-rc1
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1766323f980000
kernel config: https://syzkaller.appspot.com/x/.config?x=f175b153b655dbb3
dashboard link: https://syzkaller.appspot.com/bug?extid=5e8feb543ca8e12e0ede
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/f1d71d1bf77d/disk-0af2f6be.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/7f1638f065da/vmlinux-0af2f6be.xz
kernel image: https://storage.googleapis.com/syzbot-assets/9b3e49834705/bzImage-0af2f6be.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+5e8feb...@syzkaller.appspotmail.com

------------[ cut here ]------------
WARNING: CPU: 0 PID: 7165 at mm/memory.c:3738 __wp_can_reuse_large_anon_folio mm/memory.c:3738 [inline]
WARNING: CPU: 0 PID: 7165 at mm/memory.c:3738 wp_can_reuse_anon_folio mm/memory.c:3788 [inline]
WARNING: CPU: 0 PID: 7165 at mm/memory.c:3738 do_wp_page+0x4c62/0x59f0 mm/memory.c:3918
Modules linked in:
CPU: 0 UID: 0 PID: 7165 Comm: syz.3.280 Not tainted 6.15.0-rc1-syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
RIP: 0010:__wp_can_reuse_large_anon_folio mm/memory.c:3738 [inline]
RIP: 0010:wp_can_reuse_anon_folio mm/memory.c:3788 [inline]
RIP: 0010:do_wp_page+0x4c62/0x59f0 mm/memory.c:3918
Code: 48 89 ef e8 50 c3 ea ff e9 62 b8 ff ff e8 c6 e0 b4 ff 48 c7 c6 20 43 9b 8b 4c 89 e7 e8 f7 a0 fc ff 90 0f 0b e8 af e0 b4 ff 90 <0f> 0b 90 e9 df ed ff ff e8 a1 e0 b4 ff 48 c7 c6 60 46 9b 8b 48 89
RSP: 0018:ffffc900039f77e0 EFLAGS: 00010287
RAX: 0000000000041ec0 RBX: ffffc900039f7a00 RCX: ffffc9000d0c6000
RDX: 0000000000080000 RSI: ffffffff82065c61 RDI: 0000000000000005
RBP: ffffea0001320000 R08: 0000000000000005 R09: 00000000ffffffff
R10: 0000000000000000 R11: 0000000000000000 R12: ffff888012935dc0
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
FS: 00007f57215806c0(0000) GS:ffff8881249b9000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000020000001e000 CR3: 000000006cd88000 CR4: 0000000000350ef0
Call Trace:
<TASK>
handle_pte_fault mm/memory.c:6013 [inline]
__handle_mm_fault+0x1ada/0x2a40 mm/memory.c:6140
handle_mm_fault+0x3fe/0xad0 mm/memory.c:6309
faultin_page mm/gup.c:1193 [inline]
__get_user_pages+0x771/0x36f0 mm/gup.c:1491
populate_vma_page_range+0x278/0x3a0 mm/gup.c:1929
__mm_populate+0x1d8/0x380 mm/gup.c:2032
do_mlock+0x448/0x810 mm/mlock.c:655
__do_sys_mlock mm/mlock.c:663 [inline]
__se_sys_mlock mm/mlock.c:661 [inline]
__x64_sys_mlock+0x59/0x80 mm/mlock.c:661
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xcd/0x260 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f572078d169
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f5721580038 EFLAGS: 00000246 ORIG_RAX: 0000000000000095
RAX: ffffffffffffffda RBX: 00007f57209a5fa0 RCX: 00007f572078d169
RDX: 0000000000000000 RSI: 0000000000800000 RDI: 0000200000000000
RBP: 00007f572080e2a0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007f57209a5fa0 R15: 00007fffd504a2f8
</TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

David Hildenbrand

Apr 13, 2025, 4:20:24 PM
to syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
On 12.04.25 20:46, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:

Related to my recent changes

>
> HEAD commit: 0af2f6be1b42 Linux 6.15-rc1
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1766323f980000
> kernel config: https://syzkaller.appspot.com/x/.config?x=f175b153b655dbb3

CONFIG_ARCH_WANTS_THP_SWAP=y
CONFIG_MM_ID=y
CONFIG_TRANSPARENT_HUGEPAGE=y
# CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS is not set
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
# CONFIG_TRANSPARENT_HUGEPAGE_NEVER is not set
CONFIG_THP_SWAP=y
CONFIG_READ_ONLY_THP_FOR_FS=y
# CONFIG_NO_PAGE_MAPCOUNT is not set
CONFIG_PAGE_MAPCOUNT=y

> dashboard link: https://syzkaller.appspot.com/bug?extid=5e8feb543ca8e12e0ede
> compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/f1d71d1bf77d/disk-0af2f6be.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/7f1638f065da/vmlinux-0af2f6be.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/9b3e49834705/bzImage-0af2f6be.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+5e8feb...@syzkaller.appspotmail.com
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 7165 at mm/memory.c:3738 __wp_can_reuse_large_anon_folio mm/memory.c:3738 [inline]

VM_WARN_ON_ONCE(folio_entire_mapcount(folio));

Which is rather unexpected. I know we had a scenario (remapping a THP?)
where we would have a PMD mapping and a PTE mapping of an exclusive anon
folio for a very short time. But, IIRC locking should make sure that
that cannot be observed by some other page table walker.

Unfortunately no reproducer. I'll do some digging ...

During do_mlock() we should be holding the mmap lock in write mode, I
assume, once we reach do_wp_page().

--
Cheers,

David / dhildenb

David Hildenbrand

Apr 13, 2025, 4:45:16 PM
to syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
Ah, it likely is a (harmless) race, when process A and process B
COW-share a PMD THP, and process A write-faults on a PTE mapping of the
THP while process B concurrently unmaps the PMD mapping of the THP.

In __folio_remove_rmap(), for RMAP_LEVEL_PMD in case of
CONFIG_PAGE_MAPCOUNT=y, we'll do

folio_dec_large_mapcount(folio, vma);
last = atomic_add_negative(-1, &folio->_entire_mapcount);

So after decrementing the large mapcount, the folio will be indicated as
"exclusive" to process A.

Process B still has to decrement the entire mapcount, but process A
might already run into the entire_mapcount sanity check.
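
The window described above can be replayed deterministically in user space. The following is only a minimal sketch, not kernel code: struct folio_model, the helper names, and the figure of 512 PTE mappings for process A are assumptions made for illustration, and exclusivity is approximated by comparing the large mapcount against A's own mapping count, whereas the kernel's actual tracking is more involved. It shows that decrementing the large mapcount before the entire mapcount leaves a moment in which the folio looks exclusive to A while the entire mapcount is still non-zero.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified stand-in for the two counters discussed above;
 * this is NOT the kernel's struct folio. */
struct folio_model {
    atomic_int large_mapcount;   /* all mappings of the folio (PTE + PMD) */
    atomic_int entire_mapcount;  /* PMD ("entire") mappings only */
};

/* Process B, step 1: models folio_dec_large_mapcount(). */
static void unmap_pmd_dec_large(struct folio_model *f)
{
    atomic_fetch_sub(&f->large_mapcount, 1);
}

/* Process B, step 2: models the decrement of _entire_mapcount. */
static void unmap_pmd_dec_entire(struct folio_model *f)
{
    atomic_fetch_sub(&f->entire_mapcount, 1);
}

/* Process A: would the entire-mapcount sanity check fire right now?
 * "Exclusive" is approximated as: only A's own mappings remain. */
static bool sanity_check_would_warn(struct folio_model *f, int own_mappings)
{
    bool looks_exclusive = atomic_load(&f->large_mapcount) == own_mappings;

    return looks_exclusive && atomic_load(&f->entire_mapcount) != 0;
}

/* Replays the interleaving step by step. Returns true if the check
 * would fire only between B's two decrements. */
static bool race_window_observed(void)
{
    const int a_ptes = 512;  /* assumed: A maps the THP through 512 PTEs */
    struct folio_model f;

    atomic_init(&f.large_mapcount, a_ptes + 1);  /* A's PTEs + B's PMD */
    atomic_init(&f.entire_mapcount, 1);          /* B's PMD mapping */

    bool before = sanity_check_would_warn(&f, a_ptes);
    unmap_pmd_dec_large(&f);
    bool during = sanity_check_would_warn(&f, a_ptes);  /* the race window */
    unmap_pmd_dec_entire(&f);
    bool after = sanity_check_would_warn(&f, a_ptes);

    return !before && during && !after;
}
```

Calling race_window_observed() replays the three observation points in order. In the real kernel the two decrements run on another CPU, so the window is only hit when process A's fault lands between them.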


In do_wp_page(), we'd later fail the "folio_large_mapcount(folio) !=
folio_ref_count(folio)" test until process B is completely done with
unmapping the folio.

Maybe we should just move these sanity checks after the refcount check,
or reverse the mapcount decrement order. I'll think about that.
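
The first option can be sketched in user space as follows. This is a hypothetical model, not a proposed patch: struct folio_model and the function names are invented for illustration, and it assumes each mapping holds one folio reference that process B drops only after it has finished removing its mapping. Under those assumptions, comparing the large mapcount against the refcount first makes process A bail out during the window, so the entire-mapcount check is never reached while it could still fire.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified stand-in for the counters involved; NOT the kernel's struct folio. */
struct folio_model {
    atomic_int large_mapcount;   /* all mappings of the folio (PTE + PMD) */
    atomic_int entire_mapcount;  /* PMD ("entire") mappings only */
    atomic_int ref_count;        /* assumed: one reference per mapping */
};

/* Reordered checks: mapcount-vs-refcount first, sanity check second.
 * Returns true if the folio may be reused; *warned reports whether the
 * entire-mapcount sanity check would have fired. */
static bool can_reuse_checks_reordered(struct folio_model *f, bool *warned)
{
    *warned = false;

    /* Bail out while someone else still holds a reference. */
    if (atomic_load(&f->large_mapcount) != atomic_load(&f->ref_count))
        return false;

    /* Only reached once mapcount and refcount agree. */
    if (atomic_load(&f->entire_mapcount) != 0)
        *warned = true;

    return true;
}

/* State inside the window: B has decremented the large mapcount, but has
 * not yet decremented the entire mapcount or dropped its reference. */
static bool reordered_check_in_window(bool *warned)
{
    struct folio_model f;

    atomic_init(&f.large_mapcount, 512);
    atomic_init(&f.entire_mapcount, 1);
    atomic_init(&f.ref_count, 513);

    return can_reuse_checks_reordered(&f, warned);
}

/* State after B is completely done with unmapping the folio. */
static bool reordered_check_after_unmap(bool *warned)
{
    struct folio_model f;

    atomic_init(&f.large_mapcount, 512);
    atomic_init(&f.entire_mapcount, 0);
    atomic_init(&f.ref_count, 512);

    return can_reuse_checks_reordered(&f, warned);
}
```

In this model the check refuses reuse inside the window without warning, and allows reuse without warning once process B has finished. Whether the real refcount ordering always guarantees this is exactly what would need to be verified against mm/memory.c and mm/rmap.c.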