[syzbot] [mm?] WARNING in __folio_rmap_sanity_checks


syzbot

Jan 3, 2024, 6:48:22 AM
to ak...@linux-foundation.org, da...@redhat.com, fengw...@intel.com, linux-...@vger.kernel.org, linu...@kvack.org, ryan.r...@arm.com, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: ab0b3e6ef50d Add linux-next specific files for 20240102
git tree: linux-next
console+strace: https://syzkaller.appspot.com/x/log.txt?x=17be3e09e80000
kernel config: https://syzkaller.appspot.com/x/.config?x=a14a6350374945f9
dashboard link: https://syzkaller.appspot.com/bug?extid=50ef73537bbc393a25bb
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=14e2256ee80000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=17b57db5e80000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/4e6376fe5764/disk-ab0b3e6e.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/7cb9ecbaf001/vmlinux-ab0b3e6e.xz
kernel image: https://storage.googleapis.com/syzbot-assets/2c1a9a6d424f/bzImage-ab0b3e6e.xz

The issue was bisected to:

commit 68f0320824fa59c5429cbc811e6c46e7a30ea32c
Author: David Hildenbrand <da...@redhat.com>
Date: Wed Dec 20 22:44:31 2023 +0000

mm/rmap: convert folio_add_file_rmap_range() into folio_add_file_rmap_[pte|ptes|pmd]()

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=10b9e1b1e80000
final oops: https://syzkaller.appspot.com/x/report.txt?x=12b9e1b1e80000
console output: https://syzkaller.appspot.com/x/log.txt?x=14b9e1b1e80000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+50ef73...@syzkaller.appspotmail.com
Fixes: 68f0320824fa ("mm/rmap: convert folio_add_file_rmap_range() into folio_add_file_rmap_[pte|ptes|pmd]()")

kasan_quarantine_reduce+0x18e/0x1d0 mm/kasan/quarantine.c:283
__kasan_slab_alloc+0x65/0x90 mm/kasan/common.c:324
kasan_slab_alloc include/linux/kasan.h:201 [inline]
slab_post_alloc_hook mm/slub.c:3813 [inline]
slab_alloc_node mm/slub.c:3860 [inline]
kmem_cache_alloc+0x136/0x320 mm/slub.c:3867
vm_area_alloc+0x1f/0x220 kernel/fork.c:465
mmap_region+0x3ae/0x2a90 mm/mmap.c:2804
do_mmap+0x890/0xef0 mm/mmap.c:1379
vm_mmap_pgoff+0x1a7/0x3c0 mm/util.c:573
ksys_mmap_pgoff+0x421/0x5a0 mm/mmap.c:1425
__do_sys_mmap arch/x86/kernel/sys_x86_64.c:93 [inline]
__se_sys_mmap arch/x86/kernel/sys_x86_64.c:86 [inline]
__x64_sys_mmap+0x125/0x190 arch/x86/kernel/sys_x86_64.c:86
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xd0/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x62/0x6a
------------[ cut here ]------------
WARNING: CPU: 1 PID: 5059 at include/linux/rmap.h:202 __folio_rmap_sanity_checks+0x4d5/0x630 include/linux/rmap.h:202
Modules linked in:
CPU: 1 PID: 5059 Comm: syz-executor115 Not tainted 6.7.0-rc8-next-20240102-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
RIP: 0010:__folio_rmap_sanity_checks+0x4d5/0x630 include/linux/rmap.h:202
Code: 41 83 e4 01 44 89 e6 e8 79 bc b7 ff 45 84 e4 0f 85 08 fc ff ff e8 3b c1 b7 ff 48 c7 c6 e0 b5 d9 8a 48 89 df e8 5c 12 f7 ff 90 <0f> 0b 90 e9 eb fb ff ff e8 1e c1 b7 ff be 01 00 00 00 48 89 df e8
RSP: 0018:ffffc900038df978 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffffea00008cde00 RCX: ffffffff81687419
RDX: ffff88807becbb80 RSI: ffffffff81d06104 RDI: 0000000000000000
RBP: ffffea00008cde00 R08: 0000000000000000 R09: fffffbfff1e75f6a
R10: ffffffff8f3afb57 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000001 R14: 0000000000000000 R15: dffffc0000000000
FS: 0000555556508380(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000200000c0 CR3: 0000000079000000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__folio_add_rmap mm/rmap.c:1167 [inline]
__folio_add_file_rmap mm/rmap.c:1452 [inline]
folio_add_file_rmap_ptes+0x8e/0x2c0 mm/rmap.c:1478
insert_page_into_pte_locked.isra.0+0x34d/0x960 mm/memory.c:1874
insert_page mm/memory.c:1900 [inline]
vm_insert_page+0x62c/0x8c0 mm/memory.c:2053
packet_mmap+0x314/0x570 net/packet/af_packet.c:4594
call_mmap include/linux/fs.h:2090 [inline]
mmap_region+0x745/0x2a90 mm/mmap.c:2819
do_mmap+0x890/0xef0 mm/mmap.c:1379
vm_mmap_pgoff+0x1a7/0x3c0 mm/util.c:573
ksys_mmap_pgoff+0x421/0x5a0 mm/mmap.c:1425
__do_sys_mmap arch/x86/kernel/sys_x86_64.c:93 [inline]
__se_sys_mmap arch/x86/kernel/sys_x86_64.c:86 [inline]
__x64_sys_mmap+0x125/0x190 arch/x86/kernel/sys_x86_64.c:86
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xd0/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x62/0x6a
RIP: 0033:0x7ff89df16329
Code: 48 83 c4 28 c3 e8 37 17 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffcbc3eb618 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
RAX: ffffffffffffffda RBX: 00007ffcbc3eb7f8 RCX: 00007ff89df16329
RDX: 0000000000000000 RSI: 0000000001000000 RDI: 0000000020568000
RBP: 00007ff89df89610 R08: 0000000000000003 R09: 0000000000000000
R10: 0000000000000011 R11: 0000000000000246 R12: 0000000000000001
R13: 00007ffcbc3eb7e8 R14: 0000000000000001 R15: 0000000000000001
</TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

David Hildenbrand

Jan 3, 2024, 7:13:40 AM
to syzbot, ak...@linux-foundation.org, fengw...@intel.com, linux-...@vger.kernel.org, linu...@kvack.org, ryan.r...@arm.com, syzkall...@googlegroups.com, Matthew Wilcox
If I am not wrong, that triggers:

  VM_WARN_ON_FOLIO(folio_test_large(folio) &&
                   !folio_test_large_rmappable(folio), folio);

So we are trying to rmap a large folio that did not go through
folio_prep_large_rmappable().

net/packet/af_packet.c calls vm_insert_page() on some pages/folios stored
in the "struct packet_ring_buffer". No idea where that comes from, but I
suspect it's simply some compound allocation.

Likely, rmap handling is completely bogus for these folios?

--
Cheers,

David / dhildenb

Yin, Fengwei

Jan 3, 2024, 9:16:28 AM
to David Hildenbrand, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, ryan.r...@arm.com, syzkall...@googlegroups.com, Matthew Wilcox
Looks like:

  alloc_pg_vec
    alloc_one_pg_vec_page
      gfp_t gfp_flags = GFP_KERNEL | __GFP_COMP |
                        __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY;

      buffer = (char *) __get_free_pages(gfp_flags, order);

So you are right here... :).
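
In other words (illustrative snippet, not actual kernel code;
virt_to_folio() is the real helper, the two test calls just spell out
the folio's state):

  struct folio *folio = virt_to_folio(buffer);

  /* order > 0 plus __GFP_COMP gives a compound page, i.e. a large folio */
  bool large = folio_test_large(folio);                 /* true */
  /* ... that never went through folio_prep_large_rmappable() */
  bool rmappable = folio_test_large_rmappable(folio);   /* false */

So once vm_insert_page() feeds this folio into folio_add_file_rmap_ptes(),
the VM_WARN_ON_FOLIO() David quoted fires.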


Regards
Yin, Fengwei

David Hildenbrand

Jan 4, 2024, 4:36:50 PM
to Yin, Fengwei, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, ryan.r...@arm.com, syzkall...@googlegroups.com, Matthew Wilcox
Hm, but I wonder if this is something that's supposed to work or is this
one of the cases where we should actually use a VM_PFN mapping?

It's not a pagecache (file/shmem) page after all.

We could relax that check and document why we expect something that is
not marked rmappable. But it feels wrong. I suspect this should be a
VM_PFNMAP instead (like recent udmabuf changes).
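
For illustration, a minimal sketch of what the VM_PFNMAP route could
look like in an mmap handler (hypothetical helper name; remap_pfn_range()
is the real API; assumes a physically contiguous, caller-allocated buffer):

  static int pfnmap_mmap_sketch(struct vm_area_struct *vma, struct page *page)
  {
          unsigned long size = vma->vm_end - vma->vm_start;

          /*
           * remap_pfn_range() sets VM_IO | VM_PFNMAP | VM_DONTEXPAND |
           * VM_DONTDUMP on the VMA itself; no rmap and no mapcount
           * accounting is involved for these PTEs.
           */
          return remap_pfn_range(vma, vma->vm_start, page_to_pfn(page),
                                 size, vma->vm_page_prot);
  }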

Yin Fengwei

Jan 4, 2024, 9:20:44 PM
to David Hildenbrand, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, ryan.r...@arm.com, syzkall...@googlegroups.com, Matthew Wilcox
VM_PFNMAP looks correct.

I do have another question: why do we only check whether the large folio
is rmappable? Does that mean an order-0 folio is always rmappable?

I ask this because vm_insert_pages() is called in net/ipv4/tcp.c
and drivers call vm_insert_page(). I suppose they all need to be VM_PFNMAP.

There is no warning because we don't check whether order-0 folios are rmappable.


Regards
Yin, Fengwei

Yin Fengwei

Jan 5, 2024, 3:42:25 AM
to Ryan Roberts, David Hildenbrand, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com, Matthew Wilcox


On 2024/1/5 16:14, Ryan Roberts wrote:
> Would someone mind explaining the rules to me for this? As far as I can see,
> folio_prep_large_rmappable() just inits the _deferred_list and sets a flag so we
> remember to deinit the list on destruction. Why can't we just init that list for
> all folios order-2 or greater? Then everything is rmappable?
>
>>>>>
>>>>> net/packet/af_packet.c calls vm_insert_page() on some pages/folios stored
>>>>> in the "struct packet_ring_buffer". No idea where that comes from, but I
>>>>> suspect it's simply some compound allocation.
>>>> Looks like:
>>>>    alloc_pg_vec
>>>>      alloc_one_pg_vec_page
>>>>           gfp_t gfp_flags = GFP_KERNEL | __GFP_COMP |
>>>>                             __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY;
>>>>
>>>>           buffer = (char *) __get_free_pages(gfp_flags, order);
>>>> So you are right here... :).
>>>
>>> Hm, but I wonder if this is something that's supposed to work or is this one of
>>> the cases where we should actually use a VM_PFN mapping?
>>>
>>> It's not a pagecache (file/shmem) page after all.
>>>
>>> We could relax that check and document why we expect something that is not
>>> marked rmappable. But it feels wrong. I suspect this should be a VM_PFNMAP
>>> instead (like recent udmabuf changes).
>>
>> VM_PFNMAP looks correct.
>
> And why is making the folio rmappable and mapping it the normal way not the
> right solution here? Because the folio could be order-1? Or something more profound?
My understanding is that order-1 could be one reason. Another thing I can
tell is that the page here is not an anonymous page or file-backed pagecache.
So it can't be large rmappable, as David pointed out.

>
>>
>> I do have another question: why do we only check whether the large folio
>> is rmappable? Does that mean an order-0 folio is always rmappable?
>>
>> I ask this because vm_insert_pages() is called in net/ipv4/tcp.c
>> and drivers call vm_insert_page(). I suppose they all need to be VM_PFNMAP.
>>
>> There is no warning because we don't check whether order-0 folios are rmappable.
Please ignore my question above. I mixed up the rmappable folio and
add_rmap().


Regards
Yin, Fengwei

>>
>>
>> Regards
>> Yin, Fengwei
>>
>

David Hildenbrand

Jan 5, 2024, 3:56:40 AM
to Ryan Roberts, Yin Fengwei, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com, Matthew Wilcox
>>>>> If I am not wrong, that triggers:
>>>>>
>>>>> VM_WARN_ON_FOLIO(folio_test_large(folio) &&
>>>>>            !folio_test_large_rmappable(folio), folio);
>>>>>
>>>>> So we are trying to rmap a large folio that did not go through
>>>>> folio_prep_large_rmappable().
>
> Would someone mind explaining the rules to me for this? As far as I can see,
> folio_prep_large_rmappable() just inits the _deferred_list and sets a flag so we
> remember to deinit the list on destruction. Why can't we just init that list for
> all folios order-2 or greater? Then everything is rmappable?

I think we'd much rather want to look into moving all mapcount-related
stuff into folio_prep_large_rmappable(). It doesn't make any sense to
initialize that for any compound pages, especially the ones that will
never get mapped to user space.
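
For reference, based on the description quoted above,
folio_prep_large_rmappable() currently does little more than this (a
from-memory sketch of the mm/huge_memory.c code; treat details as
approximate):

  void folio_prep_large_rmappable(struct folio *folio)
  {
          /* _deferred_list only exists for order >= 2 folios */
          VM_BUG_ON_FOLIO(folio_order(folio) < 2, folio);
          INIT_LIST_HEAD(&folio->_deferred_list);
          folio_set_large_rmappable(folio);
  }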

>
>>>>>
>>>>> net/packet/af_packet.c calls vm_insert_page() on some pages/folios stored
>>>>> in the "struct packet_ring_buffer". No idea where that comes from, but I
>>>>> suspect it's simply some compound allocation.
>>>> Looks like:
>>>>    alloc_pg_vec
>>>>      alloc_one_pg_vec_page
>>>>           gfp_t gfp_flags = GFP_KERNEL | __GFP_COMP |
>>>>                             __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY;
>>>>
>>>>           buffer = (char *) __get_free_pages(gfp_flags, order);
>>>> So you are right here... :).
>>>
>>> Hm, but I wonder if this is something that's supposed to work or is this one of
>>> the cases where we should actually use a VM_PFN mapping?
>>>
>>> It's not a pagecache (file/shmem) page after all.
>>>
>>> We could relax that check and document why we expect something that is not
>>> marked rmappable. But it feels wrong. I suspect this should be a VM_PFNMAP
>>> instead (like recent udmabuf changes).
>>
>> VM_PFNMAP looks correct.
>
> And why is making the folio rmappable and mapping it the normal way not the
> right solution here? Because the folio could be order-1? Or something more profound?
>

Think about it: we are adding/removing a page from rmap handling that
can *never* be looked up using the rmap because there is no rmap for
these pages, and folio->index is just completely unexpressive.
VM_MIXEDMAP doesn't have any linearity constraints.

Logically, it doesn't make any sense to involve rmap code although it
currently might work. validate_page_before_insert() blocks off most
pages where the order-0 mapcount would be used for other purposes and
everything would blow up.

Looking at vm_insert_page(), this interface is only for pages the caller
allocated. Maybe we should just not do any rmap accounting when
mapping/unmapping these pages: not involve any rmap code, including
mapcounts?

vm_normal_page() works on these mappings, so we'd also have to skip rmap
code when unmapping these pages etc. Maybe that's the whole reason we
have the rmap handling here: to not special-case the unmap path.

Alternatively, we can:

(1) Require the caller to make sure large folios are rmappable. We
    already require allocations to be compound. Should be easy to add
    (see the sketch below).
(2) Allow non-rmappable folios in rmap code just for mapcount tracking.
    Confusing but possible.
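
A minimal sketch of what (1) could look like in alloc_one_pg_vec_page(),
assuming folio_prep_large_rmappable() were made callable from net/packet
(today it is mm-internal):

  buffer = (char *) __get_free_pages(gfp_flags, order);
  /*
   * Hypothetical: mark the compound allocation rmappable before it can
   * reach vm_insert_page(). folio_prep_large_rmappable() rejects
   * order < 2, so order-1 allocations would still need extra thought.
   */
  if (buffer && order >= 2)
          folio_prep_large_rmappable(virt_to_folio(buffer));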

>>
>> I do have another question: why do we only check whether the large folio
>> is rmappable? Does that mean an order-0 folio is always rmappable?
>>

We didn't really have a check for that, I believe. We simply reject all
pages in vm_insert_page() that are problematic because the pagecount is
overloaded.

>> I ask this because vm_insert_pages() is called in net/ipv4/tcp.c
>> and drivers call vm_insert_page(). I suppose they all need to be VM_PFNMAP.

Right, similar problem.

Ryan Roberts

Jan 5, 2024, 5:58:58 AM
to Yin Fengwei, David Hildenbrand, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com, Matthew Wilcox
On 05/01/2024 02:20, Yin Fengwei wrote:
>
>
Would someone mind explaining the rules to me for this? As far as I can see,
folio_prep_large_rmappable() just inits the _deferred_list and sets a flag so we
remember to deinit the list on destruction. Why can't we just init that list for
all folios order-2 or greater? Then everything is rmappable?

>>>>
>>>> net/packet/af_packet.c calls vm_insert_page() on some pages/folios stored
>>>> in the "struct packet_ring_buffer". No idea where that comes from, but I
>>>> suspect it's simply some compound allocation.
>>> Looks like:
>>>    alloc_pg_vec
>>>      alloc_one_pg_vec_page
>>>           gfp_t gfp_flags = GFP_KERNEL | __GFP_COMP |
>>>                             __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY;
>>>
>>>           buffer = (char *) __get_free_pages(gfp_flags, order);
>>> So you are right here... :).
>>
>> Hm, but I wonder if this is something that's supposed to work or is this one of
>> the cases where we should actually use a VM_PFN mapping?
>>
>> It's not a pagecache (file/shmem) page after all.
>>
>> We could relax that check and document why we expect something that is not
>> marked rmappable. But it feels wrong. I suspect this should be a VM_PFNMAP
>> instead (like recent udmabuf changes).
>
> VM_PFNMAP looks correct.

And why is making the folio rmappable and mapping it the normal way not the
right solution here? Because the folio could be order-1? Or something more profound?

>

Ryan Roberts

Jan 5, 2024, 5:59:03 AM
to David Hildenbrand, Yin Fengwei, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com, Matthew Wilcox
I guess I was assuming we'd treat it the same way as anon folios. But I guess that
would be VeryBad (TM) because these aren't anon pages and we don't want to swap,
etc? OK, got it.

>
> Logically, it doesn't make any sense to involve rmap code although it currently
> might work. validate_page_before_insert() blocks off most pages where the
> order-0 mapcount would be used for other purposes and everything would blow up.
>
> Looking at vm_insert_page(), this interface is only for pages the caller
> allocated. Maybe we should just not do any rmap accounting when
> mapping/unmapping these pages: not involve any rmap code, including mapcounts?
>
> vm_normal_page() works on these mappings, so we'd also have to skip rmap code
> when unmapping these pages etc. Maybe that's the whole reason we have the rmap
> handling here: to not special-case the unmap path.

Right. I guess it depends on what vm_insert_page() is spec'ed to expect; is the bug
in the implementation, or is the caller providing the wrong type of folio? I
guess there will be many callers providing non-rmappable folios (including out of tree?).

>
> Alternatively, we can:
>
> (1) Require the caller to make sure large folios are rmappable. We
>     already require allocations to be compound. Should be easy to add.

I'm not sure this is practical given vm_insert_page() is exported?

Yin, Fengwei

Jan 5, 2024, 9:49:11 AM
to David Hildenbrand, Ryan Roberts, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com, Matthew Wilcox
This is my understanding.

>
> vm_normal_page() works on these mappings, so we'd also have to skip rmap
> code when unmapping these pages etc. Maybe that's the whole reason we
> have the rmap handling here: to not special-case the unmap path.

vm_insert_page() will set VM_MIXEDMAP and vm_normal_page() will skip
the page if CONFIG_ARCH_HAS_PTE_SPECIAL is enabled (it's enabled for
x86). So the unmap path will skip these kinds of folios?


Regards
Yin, Fengwei

David Hildenbrand

Jan 5, 2024, 10:45:19 AM
to Yin, Fengwei, Ryan Roberts, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com, Matthew Wilcox
>> vm_normal_page() works on these mappings, so we'd also have to skip rmap
>> code when unmapping these pages etc. Maybe that's the whole reason we
>> have the rmap handling here: to not special-case the unmap path.
>
> vm_insert_page() will set VM_MIXEDMAP and vm_normal_page() will skip
> the page if CONFIG_ARCH_HAS_PTE_SPECIAL is enabled (it's enabled for
> x86). So the unmap path will skip these kind of folios?

I think we run into the

  if (likely(!pte_special(pte)))
          goto check_pfn;

first and return these folios. That also matches the comment of
vm_normal_page(): "VM_MIXEDMAP mappings can likewise contain memory with
or without ... _all_ pages with a struct page (that is, those where
pfn_valid is true) are refcounted and considered normal pages by the VM."
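
For reference, a simplified from-memory sketch of that vm_normal_page()
path (mm/memory.c) on CONFIG_ARCH_HAS_PTE_SPECIAL architectures; the
real function handles more cases:

  struct page *vm_normal_page_sketch(struct vm_area_struct *vma,
                                     unsigned long addr, pte_t pte)
  {
          unsigned long pfn = pte_pfn(pte);

          /*
           * insert_page() installs ordinary (non-special) PTEs, so pages
           * mapped via vm_insert_page() take this branch and are treated
           * as normal, refcounted pages, including at unmap time.
           */
          if (likely(!pte_special(pte)))
                  return pfn_valid(pfn) ? pfn_to_page(pfn) : NULL;

          /* pte_special(): raw pfns without a struct page, never "normal" */
          return NULL;
  }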

David Hildenbrand

Jan 5, 2024, 11:05:29 AM
to Ryan Roberts, Yin Fengwei, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com, Matthew Wilcox
>>
>> Think about it: we are adding/removing a page from rmap handling that can
>> *never* be looked up using the rmap because there is no rmap for these pages,
>> and folio->index is just completely unexpressive. VM_MIXEDMAP doesn't have any
>> linearity constraints.
>
> I guess I was assuming treating it the same way as anon folios. But I guess that
> would be VeryBad (TM) because these aren't anon pages and we don't want to swap,
> etc? OK got it.

Yes, there can't be and never will be an rmap walk. No rmap chains are involved.
There is no rmap concept for these folios.

That's the ugly part about all this :)

>
>>
>> Logically, it doesn't make any sense to involve rmap code although it currently
>> might work. validate_page_before_insert() blocks off most pages where the
>> order-0 mapcount would be used for other purposes and everything would blow up.
>>
>> Looking at vm_insert_page(), this interface is only for pages the caller
>> allocated. Maybe we should just not do any rmap accounting when
>> mapping/unmapping these pages: not involve any rmap code, including mapcounts?
>>
>> vm_normal_page() works on these mappings, so we'd also have to skip rmap code
>> when unmapping these pages etc. Maybe that's the whole reason we have the rmap
>> handling here: to not special-case the unmap path.
>
> Right. I guess it depends on what vm_insert_page() is spec'ed to expect; is the bug
> in the implementation, or is the caller providing the wrong type of folio? I
> guess there will be many callers providing non-rmappable folios (including out of tree?).

I would assume that *all* are providing non-pagecache pages. And if they were
pagecache pages, it would likely be a BUG, because the rmap does not work with
VM_MIXEDMAP and we'd be breaking folio unmapping etc. (we had such a BUG recently
with udmabuf, which rightfully switched to VM_PFNMAP)

As vm_insert_page() documents:

"This allows drivers to insert individual pages they've allocated into a user vma."

>
>>
>> Alternatively, we can:
>>
>> (1) Require the caller to make sure large folios are rmappable. We
>>     already require allocations to be compound. Should be easy to add.
>
> I'm not sure this is practical given vm_insert_page() is exported?

We could fix up all in-tree users; out-of-tree users have to fix up their stuff themselves.

>
>> (2) Allow non-rmappable folios in rmap code just for mapcount tracking.
>>     Confusing but possible.
>>
>>>>
>>>> I do have another question: why do we only check whether the large folio
>>>> is rmappable? Does that mean an order-0 folio is always rmappable?
>>>>
>>
>> We didn't really have a check for that, I believe. We simply reject all pages in
>> vm_insert_page() that are problematic because the pagecount is overloaded.
>>
>>>> I ask this because vm_insert_pages() is called in net/ipv4/tcp.c
>>>> and drivers call vm_insert_page(). I suppose they all need to be VM_PFNMAP.

Just to comment on that: I think they don't want VM_PFNMAP for some reason.
For example, because GUP has to work on these pages.


To summarize my thoughts: using "rmap" code for these folios doesn't make
any sense. Mapping/unmapping these folios shouldn't call into rmap code.

If we want to adjust the mapcount/counters, we should mimic what rmap
map/unmap does separately, as special cases for VM_MIXEDMAP mappings.

For the time being, it's probably easiest to do the following:

From 9ff752e9b82991b55813dbf1088cf5b573577bdd Mon Sep 17 00:00:00 2001
From: David Hildenbrand <da...@redhat.com>
Date: Fri, 5 Jan 2024 16:57:29 +0100
Subject: [PATCH] mm/rmap: silence VM_WARN_ON_FOLIO() in
__folio_rmap_sanity_checks()

Unfortunately, vm_insert_page() and friends end up passing
driver-allocated folios into folio_add_file_rmap_pte() using
insert_page_into_pte_locked().

While these driver-allocated folios can be compound pages (large
folios), they are not proper "rmappable" folios.

In these VM_MIXEDMAP VMAs, there isn't really the concept of a reverse
mapping, so long-term, we should clean that up and not call into rmap
code.

For the time being, document how we can end up in rmap code with large
folios that are not marked rmappable.

Reported-by: syzbot+50ef73...@syzkaller.appspotmail.com
Fixes: 68f0320824fa ("mm/rmap: convert folio_add_file_rmap_range() into folio_add_file_rmap_[pte|ptes|pmd]()")
Signed-off-by: David Hildenbrand <da...@redhat.com>
---
include/linux/rmap.h | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index fd6fe16fa3583..b7944a833668a 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -199,8 +199,15 @@ static inline void __folio_rmap_sanity_checks(struct folio *folio,
 {
 	/* hugetlb folios are handled separately. */
 	VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
-	VM_WARN_ON_FOLIO(folio_test_large(folio) &&
-			 !folio_test_large_rmappable(folio), folio);
+
+	/*
+	 * TODO: we get driver-allocated folios that have nothing to do with
+	 * the rmap using vm_insert_page(); therefore, we cannot assume that
+	 * folio_test_large_rmappable() holds for large folios. We should
+	 * handle any desired mapcount+stats accounting for these folios in
+	 * VM_MIXEDMAP VMAs separately, and then sanity-check here that
+	 * we really only get rmappable folios.
+	 */
 
 	VM_WARN_ON_ONCE(nr_pages <= 0);
 	VM_WARN_ON_FOLIO(page_folio(page) != folio, folio);
--
2.43.0

Andrew Morton

Jan 5, 2024, 12:04:36 PM
to Yin, Fengwei, David Hildenbrand, syzbot, linux-...@vger.kernel.org, linu...@kvack.org, ryan.r...@arm.com, syzkall...@googlegroups.com, Matthew Wilcox, net...@vger.kernel.org
(cc netdev)
Can we suggest what af_packet should be doing to avoid this?

David Hildenbrand

Jan 5, 2024, 12:16:24 PM
to Andrew Morton, Yin, Fengwei, syzbot, linux-...@vger.kernel.org, linu...@kvack.org, ryan.r...@arm.com, syzkall...@googlegroups.com, Matthew Wilcox, net...@vger.kernel.org
I concluded that af_packet is doing the right thing, and that we have to
figure out how to handle this differently internally at some point.

I just shared a patch to revert that check for now; let me know if you
want an "officially" sent patch :)

Yin, Fengwei

Jan 5, 2024, 11:36:46 PM
to David Hildenbrand, Ryan Roberts, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com, Matthew Wilcox
Oh. Yes. This is the path. Thanks a lot for pointing it out to me.


Regards
Yin, Fengwei