[syzbot] [fs?] WARNING in pagemap_scan_pmd_entry


syzbot

Nov 15, 2023, 9:40:31 AM
to linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: c42d9eeef8e5 Merge tag 'hardening-v6.7-rc2' of git://git.k..
git tree: upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=13626650e80000
kernel config: https://syzkaller.appspot.com/x/.config?x=84217b7fc4acdc59
dashboard link: https://syzkaller.appspot.com/bug?extid=e94c5aaf7890901ebf9b
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15d73be0e80000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=13670da8e80000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/a595d90eb9af/disk-c42d9eee.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/c1e726fedb94/vmlinux-c42d9eee.xz
kernel image: https://storage.googleapis.com/syzbot-assets/cb43ae262d09/bzImage-c42d9eee.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+e94c5a...@syzkaller.appspotmail.com

------------[ cut here ]------------
WARNING: CPU: 1 PID: 5071 at arch/x86/include/asm/pgtable.h:403 pte_uffd_wp arch/x86/include/asm/pgtable.h:403 [inline]
WARNING: CPU: 1 PID: 5071 at arch/x86/include/asm/pgtable.h:403 pagemap_scan_pmd_entry+0x1d27/0x23f0 fs/proc/task_mmu.c:2146
Modules linked in:
CPU: 1 PID: 5071 Comm: syz-executor182 Not tainted 6.7.0-rc1-syzkaller-00019-gc42d9eeef8e5 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023
RIP: 0010:pte_uffd_wp arch/x86/include/asm/pgtable.h:403 [inline]
RIP: 0010:pagemap_scan_pmd_entry+0x1d27/0x23f0 fs/proc/task_mmu.c:2146
Code: ff ff e8 5c 23 76 ff 48 89 e8 31 ff 83 e0 02 48 89 c6 48 89 04 24 e8 d8 1e 76 ff 48 8b 04 24 48 85 c0 74 25 e8 3a 23 76 ff 90 <0f> 0b 90 e9 71 ff ff ff 4c 89 74 24 68 4c 8b 74 24 10 c7 44 24 28
RSP: 0018:ffffc9000392f870 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000020001000 RCX: ffffffff82116da8
RDX: ffff88801aae8000 RSI: ffffffff82116db6 RDI: 0000000000000007
RBP: 0000000012c7ac67 R08: 0000000000000007 R09: 0000000000000000
R10: 0000000000000002 R11: 0000000000000002 R12: dffffc0000000000
R13: 0000000000000400 R14: 0000000000000000 R15: ffff8880745f4000
FS: 00005555557a8380(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000d60 CR3: 0000000074627000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
walk_pmd_range mm/pagewalk.c:143 [inline]
walk_pud_range mm/pagewalk.c:221 [inline]
walk_p4d_range mm/pagewalk.c:256 [inline]
walk_pgd_range+0xa48/0x1870 mm/pagewalk.c:293
__walk_page_range+0x630/0x770 mm/pagewalk.c:395
walk_page_range+0x626/0xa80 mm/pagewalk.c:521
do_pagemap_scan+0x40d/0xcd0 fs/proc/task_mmu.c:2437
do_pagemap_cmd+0x5e/0x80 fs/proc/task_mmu.c:2478
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:871 [inline]
__se_sys_ioctl fs/ioctl.c:857 [inline]
__x64_sys_ioctl+0x18f/0x210 fs/ioctl.c:857
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7f9c3ea93669
Code: 48 83 c4 28 c3 e8 37 17 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe1d95e918 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffe1d95e920 RCX: 00007f9c3ea93669
RDX: 0000000020000d40 RSI: 00000000c0606610 RDI: 0000000000000003
RBP: 00007f9c3eb06610 R08: 65732f636f72702f R09: 65732f636f72702f
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00007ffe1d95eb58 R14: 0000000000000001 R15: 0000000000000001
</TASK>
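
The user-space register dump above shows RSI = 0xc0606610, which on x86-64 carries the second syscall argument, here the ioctl command. As a rough sketch, the number can be split into its fields, assuming the generic Linux `_IOC` layout (8-bit number, 8-bit type, 14-bit size, 2-bit direction). The `demo_` names are hypothetical and only serve this illustration.

```c
#include <stdint.h>

/* Hypothetical helper type for this illustration, not a kernel structure. */
struct demo_ioc {
	unsigned int dir;   /* 2-bit direction: bit 0 = write, bit 1 = read */
	unsigned int size;  /* 14-bit size of the argument structure */
	unsigned int type;  /* 8-bit "magic" character */
	unsigned int nr;    /* 8-bit command number */
};

/* Split an ioctl command number, assuming the generic _IOC bit layout. */
static struct demo_ioc demo_decode_ioctl(uint32_t cmd)
{
	struct demo_ioc f;

	f.nr   = cmd & 0xff;
	f.type = (cmd >> 8) & 0xff;
	f.size = (cmd >> 16) & 0x3fff;
	f.dir  = (cmd >> 30) & 0x3;
	return f;
}
```

Under that assumption, 0xc0606610 decodes to a read/write command with type 'f' (0x66), number 16 and a 96-byte argument, which appears consistent with the do_pagemap_scan() path in the call trace.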


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Andrei Vagin

Nov 15, 2023, 10:21:23 PM
to syzbot, Peter Xu, Muhammad Usama Anjum, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Cc: Peter and Muhammad

Peter Xu

Nov 15, 2023, 10:21:32 PM
to Andrei Vagin, syzbot, Muhammad Usama Anjum, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hi, Andrei, Muhammad,

I had a look (as it triggered the guard I added before), and I think I
know what happened. So far I think it raises a question about the new
ioctl() interface, which I'd like to double check with you all. See below.

On Wed, Nov 15, 2023 at 01:07:18PM -0800, Andrei Vagin wrote:
> Cc: Peter and Muhammad
>
> On Wed, Nov 15, 2023 at 6:41 AM syzbot
> <syzbot+e94c5a...@syzkaller.appspotmail.com> wrote:
> >
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: c42d9eeef8e5 Merge tag 'hardening-v6.7-rc2' of git://git.k..
> > git tree: upstream
> > console+strace: https://syzkaller.appspot.com/x/log.txt?x=13626650e80000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=84217b7fc4acdc59
> > dashboard link: https://syzkaller.appspot.com/bug?extid=e94c5aaf7890901ebf9b
> > compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15d73be0e80000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=13670da8e80000
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/a595d90eb9af/disk-c42d9eee.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/c1e726fedb94/vmlinux-c42d9eee.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/cb43ae262d09/bzImage-c42d9eee.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+e94c5a...@syzkaller.appspotmail.com
> >
> > ------------[ cut here ]------------
> > WARNING: CPU: 1 PID: 5071 at arch/x86/include/asm/pgtable.h:403 pte_uffd_wp arch/x86/include/asm/pgtable.h:403 [inline]

This is the guard I added to detect the writable bit being set while the
uffd-wp bit is still set. It means something has obviously gone wrong.

Here, afaict, the problem is that ioctl(PAGEMAP_SCAN) allows applying the
uffd-wp bit to a VMA that is not even registered with userfaultfd. Then,
when the page is written, do_wp_page() tries to reuse the anonymous page
with the uffd-wp bit still set, and sets the W bit on top of it.
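
The inconsistent state described above can be modelled in a few lines of userspace C. This is only a sketch: the bit positions are assumed x86-64 values (bit 1 for the hardware writable bit, bit 10 for the software uffd-wp bit), and every `demo_` or `DEMO_` name is hypothetical rather than a kernel helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define DEMO_PAGE_RW      (UINT64_C(1) << 1)   /* hardware writable bit */
#define DEMO_PAGE_UFFD_WP (UINT64_C(1) << 10)  /* software uffd-wp bit */

/* True when the entry carries the uffd-wp marker. */
static bool demo_pte_uffd_wp(uint64_t pte)
{
	return (pte & DEMO_PAGE_UFFD_WP) != 0;
}

/* True when the entry is writable. */
static bool demo_pte_write(uint64_t pte)
{
	return (pte & DEMO_PAGE_RW) != 0;
}

/* The state the guard warns about: write-protected by uffd, yet writable. */
static bool demo_pte_inconsistent(uint64_t pte)
{
	return demo_pte_uffd_wp(pte) && demo_pte_write(pte);
}

/* Model of a fault path that restores write access without clearing uffd-wp. */
static uint64_t demo_reuse_page(uint64_t pte)
{
	return pte | DEMO_PAGE_RW;
}
```

In this model, calling `demo_reuse_page()` on an entry that has only the uffd-wp bit set yields an entry for which `demo_pte_inconsistent()` returns true, which is the condition the warning reports.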

Below change works for me:

===8<===
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index ef2eb12906da..8a2500fa4580 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1987,6 +1987,12 @@ static int pagemap_scan_test_walk(unsigned long start, unsigned long end,
 		vma_category |= PAGE_IS_WPALLOWED;
 	else if (p->arg.flags & PM_SCAN_CHECK_WPASYNC)
 		return -EPERM;
+	else
+		/*
+		 * Neither has the VMA enabled WP tracking, nor does the
+		 * user want to explicit fail the walk.  Skip the vma.
+		 */
+		return 1;

 	if (vma->vm_flags & VM_PFNMAP)
 		return 1;
===8<===

This is based on my reading of the pagemap scan flags:

- Write-protect the pages. The ``PM_SCAN_WP_MATCHING`` is used to write-protect
the pages of interest. The ``PM_SCAN_CHECK_WPASYNC`` aborts the operation if
non-Async Write Protected pages are found. The ``PM_SCAN_WP_MATCHING`` can be
used with or without ``PM_SCAN_CHECK_WPASYNC``.

If PM_SCAN_CHECK_WPASYNC is used to enforce the check, we need to skip the
vma that is not registered properly. Does it look reasonable to you?

Thanks,

--
Peter Xu

Andrei Vagin

Nov 16, 2023, 10:38:15 AM
to Peter Xu, syzbot, Muhammad Usama Anjum, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Thank you for looking at this.

>
> Below change works for me:
>
> ===8<===
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index ef2eb12906da..8a2500fa4580 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -1987,6 +1987,12 @@ static int pagemap_scan_test_walk(unsigned long start, unsigned long end,
>  		vma_category |= PAGE_IS_WPALLOWED;
>  	else if (p->arg.flags & PM_SCAN_CHECK_WPASYNC)
>  		return -EPERM;
> +	else
> +		/*
> +		 * Neither has the VMA enabled WP tracking, nor does the
> +		 * user want to explicit fail the walk.  Skip the vma.
> +		 */
> +		return 1;

In this case, I think we need to check the PM_SCAN_WP_MATCHING flag
and skip these vma-s only if it is set.

If PM_SCAN_WP_MATCHING isn't set, this ioctl returns page flags and
can be used without the intention of tracking memory changes.
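
The per-VMA decision being discussed can be sketched as a small userspace function. This is a model under assumptions, not kernel code: the `DEMO_` flag values are stand-ins (real code would take the flags from `<linux/fs.h>`), `wp_allowed` stands for "the vma is registered for async write-protect tracking", and other checks such as VM_PFNMAP are left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in flag values for this sketch only. */
#define DEMO_PM_SCAN_WP_MATCHING   (1u << 0)
#define DEMO_PM_SCAN_CHECK_WPASYNC (1u << 1)

enum demo_walk_action {
	DEMO_WALK_VMA = 0,  /* walk the vma and report page flags */
	DEMO_SKIP_VMA = 1   /* silently skip the vma */
};

/* Returns a negative errno to abort the scan, else a demo_walk_action. */
static int demo_test_walk(bool wp_allowed, unsigned int flags)
{
	if (wp_allowed)
		return DEMO_WALK_VMA;

	/* Caller asked for a hard failure on vmas without wp-async. */
	if (flags & DEMO_PM_SCAN_CHECK_WPASYNC)
		return -EPERM;

	/* Caller wants write-protection, which this vma cannot provide. */
	if (flags & DEMO_PM_SCAN_WP_MATCHING)
		return DEMO_SKIP_VMA;

	/* Pure query: no write-protect involved, so the vma can be walked. */
	return DEMO_WALK_VMA;
}
```

With no flags set, an unregistered vma is still walked in this model, so the ioctl keeps working as a plain page-flag query.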

>
>  	if (vma->vm_flags & VM_PFNMAP)
>  		return 1;
> ===8<===
>
> This is based on my reading of the pagemap scan flags:
>
> - Write-protect the pages. The ``PM_SCAN_WP_MATCHING`` is used to write-protect
> the pages of interest. The ``PM_SCAN_CHECK_WPASYNC`` aborts the operation if
> non-Async Write Protected pages are found. The ``PM_SCAN_WP_MATCHING`` can be
> used with or without ``PM_SCAN_CHECK_WPASYNC``.
>
> If PM_SCAN_CHECK_WPASYNC is used to enforce the check, we need to skip the
> vma that is not registered properly. Does it look reasonable to you?

I think the idea here could be to report page flags but not
write-protect such pages.

Thanks,
Andrei

Peter Xu

Nov 16, 2023, 11:49:20 AM
to Andrei Vagin, syzbot, Muhammad Usama Anjum, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Ah, I think I understand slightly better now. Below is my 2nd try..

Meanwhile, I think this won't work:

        /* 9. Memory mapped file */
        fd = open(__FILE__, O_RDONLY);
        if (fd < 0)
                ksft_exit_fail_msg("%s Memory mapped file\n", __func__);

We can't assume __FILE__ is there.. Attached one more patch for that.
I'll repost formally if that looks good to you.
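
For context, `__FILE__` expands to the source path as seen at compile time, so the file it names need not exist where the test binary later runs. A minimal sketch of the alternative, opening the path the program was invoked with, might look like the following. `demo_self_size` is a hypothetical helper, and `argv[0]` is usually, though not always, a usable path.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Open the given path read-only and return its size in bytes,
 * or -1 if it cannot be opened or stat'ed.
 */
static long long demo_self_size(const char *progname)
{
	struct stat sbuf;
	int fd = open(progname, O_RDONLY);

	if (fd < 0)
		return -1;
	if (fstat(fd, &sbuf) < 0) {
		close(fd);
		return -1;
	}
	close(fd);
	return (long long)sbuf.st_size;
}
```

A caller would pass `argv[0]` from `main()` and treat a negative result as a test setup failure.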

===8<===

From 47d54f3bbb709c54d6bed95fbf2045ea3a541a4b Mon Sep 17 00:00:00 2001
From: Peter Xu <pet...@redhat.com>
Date: Thu, 16 Nov 2023 11:05:12 -0500
Subject: [PATCH] mm/pagemap: Fix ioctl(PAGEMAP_SCAN) on vma check

The new ioctl(PAGEMAP_SCAN) relies on vma wr-protect capability provided by
userfault, however in the vma test it didn't explicitly require the vma to
have wr-protect function enabled, even if PM_SCAN_WP_MATCHING flag is set.

It means the pagemap code can now apply uffd-wp bit to a page in the vma
even if not registered to userfaultfd at all.

Then in whatever way as long as the pte got written and page fault
resolved, we'll apply the write bit even if uffd-wp bit is set. We'll see
a pte that has both UFFD_WP and WRITE bit set. Anything later that looks
up the pte for uffd-wp bit will trigger the warning:

WARNING: CPU: 1 PID: 5071 at arch/x86/include/asm/pgtable.h:403 pte_uffd_wp arch/x86/include/asm/pgtable.h:403 [inline]

Fix it by doing proper check over the vma attributes when
PM_SCAN_WP_MATCHING is specified.

Fixes: 52526ca7fdb9 ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs")
Reported-by: syzbot+e94c5a...@syzkaller.appspotmail.com
Signed-off-by: Peter Xu <pet...@redhat.com>
---
fs/proc/task_mmu.c | 24 ++++++++++++++++++++----
1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 51e0ec658457..e91085d79926 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1994,15 +1994,31 @@ static int pagemap_scan_test_walk(unsigned long start, unsigned long end,
 	struct pagemap_scan_private *p = walk->private;
 	struct vm_area_struct *vma = walk->vma;
 	unsigned long vma_category = 0;
+	bool wp_allowed = userfaultfd_wp_async(vma) &&
+		userfaultfd_wp_use_markers(vma);

-	if (userfaultfd_wp_async(vma) && userfaultfd_wp_use_markers(vma))
-		vma_category |= PAGE_IS_WPALLOWED;
-	else if (p->arg.flags & PM_SCAN_CHECK_WPASYNC)
-		return -EPERM;
+	if (!wp_allowed) {
+		/* User requested explicit failure over wp-async capability */
+		if (p->arg.flags & PM_SCAN_CHECK_WPASYNC)
+			return -EPERM;
+		/*
+		 * User requires wr-protect, and allows silently skipping
+		 * unsupported vmas.
+		 */
+		if (p->arg.flags & PM_SCAN_WP_MATCHING)
+			return 1;
+		/*
+		 * Then the request doesn't involve wr-protects at all,
+		 * fall through to the rest checks, and allow vma walk.
+		 */
+	}

 	if (vma->vm_flags & VM_PFNMAP)
 		return 1;

+	if (wp_allowed)
+		vma_category |= PAGE_IS_WPALLOWED;
+
 	if (vma->vm_flags & VM_SOFTDIRTY)
 		vma_category |= PAGE_IS_SOFT_DIRTY;

--
2.41.0

===8<===

From f2be2816c30fd1016d597a219e5b42c4ae847796 Mon Sep 17 00:00:00 2001
From: Peter Xu <pet...@redhat.com>
Date: Thu, 16 Nov 2023 11:45:47 -0500
Subject: [PATCH 2/2] mm/selftests: Fix pagemap_ioctl memory map test

__FILE__ is not guaranteed to exist in current dir. Replace that with
argv[0] for memory map test.

Fixes: 46fd75d4a3c9 ("selftests: mm: add pagemap ioctl tests")
Signed-off-by: Peter Xu <pet...@redhat.com>
---
tools/testing/selftests/mm/pagemap_ioctl.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/mm/pagemap_ioctl.c b/tools/testing/selftests/mm/pagemap_ioctl.c
index befab43719ba..d59517ed3d48 100644
--- a/tools/testing/selftests/mm/pagemap_ioctl.c
+++ b/tools/testing/selftests/mm/pagemap_ioctl.c
@@ -36,6 +36,7 @@ int pagemap_fd;
int uffd;
int page_size;
int hpage_size;
+const char *progname;

#define LEN(region) ((region.end - region.start)/page_size)

@@ -1149,11 +1150,11 @@ int sanity_tests(void)
 	munmap(mem, mem_size);

 	/* 9. Memory mapped file */
-	fd = open(__FILE__, O_RDONLY);
+	fd = open(progname, O_RDONLY);
 	if (fd < 0)
 		ksft_exit_fail_msg("%s Memory mapped file\n", __func__);

-	ret = stat(__FILE__, &sbuf);
+	ret = stat(progname, &sbuf);
 	if (ret < 0)
 		ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno));

@@ -1472,12 +1473,14 @@ static void transact_test(int page_size)
 			      extra_thread_faults);
 }

-int main(void)
+int main(int argc, char *argv[])
 {
 	int mem_size, shmid, buf_size, fd, i, ret;
 	char *mem, *map, *fmem;
 	struct stat sbuf;

+	progname = argv[0];
+
 	ksft_print_header();

 	if (init_uffd())
--
2.41.0


--
Peter Xu
