[syzbot] [mm?] kernel BUG in vma_replace_policy


syzbot

Sep 5, 2023, 9:03:51 PM
to ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: a47fc304d2b6 Add linux-next specific files for 20230831
git tree: linux-next
console+strace: https://syzkaller.appspot.com/x/log.txt?x=16502ddba80000
kernel config: https://syzkaller.appspot.com/x/.config?x=6ecd2a74f20953b9
dashboard link: https://syzkaller.appspot.com/bug?extid=b591856e0f0139f83023
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=120e7d70680000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1523f9c0680000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/b2e8f4217527/disk-a47fc304.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/ed6cdcc09339/vmlinux-a47fc304.xz
kernel image: https://storage.googleapis.com/syzbot-assets/bd9b2475bf5a/bzImage-a47fc304.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+b59185...@syzkaller.appspotmail.com

vma ffff888077381a00 start 0000000020c2a000 end 0000000021000000 mm ffff8880258a8980
prot 25 anon_vma 0000000000000000 vm_ops 0000000000000000
pgoff 20c2a file 0000000000000000 private_data 0000000000000000
flags: 0x8100077(read|write|exec|mayread|maywrite|mayexec|account|softdirty)
------------[ cut here ]------------
kernel BUG at include/linux/mm.h:733!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 5040 Comm: syz-executor418 Not tainted 6.5.0-next-20230831-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
RIP: 0010:vma_assert_write_locked include/linux/mm.h:733 [inline]
RIP: 0010:vma_assert_write_locked include/linux/mm.h:729 [inline]
RIP: 0010:vma_replace_policy+0x406/0x4e0 mm/mempolicy.c:783
Code: ff 48 89 ef e8 db 78 ff ff e9 83 fe ff ff e8 d1 7c ad ff 4c 89 e7 e8 a9 86 eb ff 0f 0b e8 c2 7c ad ff 48 89 df e8 fa 83 eb ff <0f> 0b e8 b3 7c ad ff 41 89 ec e9 58 fe ff ff 48 c7 c7 d0 55 ce 8e
RSP: 0018:ffffc9000395fc58 EFLAGS: 00010282
RAX: 000000000000011b RBX: ffff888077381a00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff816b9a92 RDI: 0000000000000005
RBP: ffff888014a7e030 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000080000000 R11: 0000000000000001 R12: 0000000000000015
R13: 0000000000000016 R14: 0000000000000001 R15: 0000000021000000
FS: 0000555556684380(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffd277b0020 CR3: 00000000773e1000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
mbind_range+0x37c/0x530 mm/mempolicy.c:855
do_mbind+0x583/0xa00 mm/mempolicy.c:1345
kernel_mbind+0x1d4/0x1f0 mm/mempolicy.c:1502
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fb959069369
Code: 48 83 c4 28 c3 e8 37 17 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffd276bed98 EFLAGS: 00000246 ORIG_RAX: 00000000000000ed
RAX: ffffffffffffffda RBX: 00007ffd276bef78 RCX: 00007fb959069369
RDX: 0000000000000004 RSI: 0000000000c00000 RDI: 0000000020400000
RBP: 00007fb9590dc610 R08: 0000000000000000 R09: 0000000000000003
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00007ffd276bef68 R14: 0000000000000001 R15: 0000000000000001
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:vma_assert_write_locked include/linux/mm.h:733 [inline]
RIP: 0010:vma_assert_write_locked include/linux/mm.h:729 [inline]
RIP: 0010:vma_replace_policy+0x406/0x4e0 mm/mempolicy.c:783
Code: ff 48 89 ef e8 db 78 ff ff e9 83 fe ff ff e8 d1 7c ad ff 4c 89 e7 e8 a9 86 eb ff 0f 0b e8 c2 7c ad ff 48 89 df e8 fa 83 eb ff <0f> 0b e8 b3 7c ad ff 41 89 ec e9 58 fe ff ff 48 c7 c7 d0 55 ce 8e
RSP: 0018:ffffc9000395fc58 EFLAGS: 00010282
RAX: 000000000000011b RBX: ffff888077381a00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff816b9a92 RDI: 0000000000000005
RBP: ffff888014a7e030 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000080000000 R11: 0000000000000001 R12: 0000000000000015
R13: 0000000000000016 R14: 0000000000000001 R15: 0000000021000000
FS: 0000555556684380(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffd277b0020 CR3: 00000000773e1000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the bug is already fixed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite bug's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the bug is a duplicate of another bug, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Edward AD

Sep 5, 2023, 10:47:48 PM
to syzbot+b59185...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
Please test this fix for the vma that is not write-locked:

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git a47fc304d2b6

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 42b5567e3773..90ad5fe60824 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1342,6 +1342,7 @@ static long do_mbind(unsigned long start, unsigned long len,
vma_iter_init(&vmi, mm, start);
prev = vma_prev(&vmi);
for_each_vma_range(vmi, vma, end) {
+ vma_start_write(vma);
err = mbind_range(&vmi, vma, &prev, start, end, new);
if (err)
break;

Edward AD

Sep 5, 2023, 10:58:35 PM
to syzbot+b59185...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
Please test this fix for the vma that is not write-locked:

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git a47fc304d2b6

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 42b5567e3773..19cabf0f91d0 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1342,6 +1342,8 @@ static long do_mbind(unsigned long start, unsigned long len,
vma_iter_init(&vmi, mm, start);
prev = vma_prev(&vmi);
for_each_vma_range(vmi, vma, end) {
+ if (!vma || start < vma->start)
+ continue;

syzbot

Sep 5, 2023, 11:34:29 PM
to ead...@sina.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+b59185...@syzkaller.appspotmail.com

Tested on:

commit: a47fc304 Add linux-next specific files for 20230831
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
console output: https://syzkaller.appspot.com/x/log.txt?x=127eeb34680000
kernel config: https://syzkaller.appspot.com/x/.config?x=6ecd2a74f20953b9
dashboard link: https://syzkaller.appspot.com/bug?extid=b591856e0f0139f83023
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=1714c9cc680000

Note: testing is done by a robot and is best-effort only.

syzbot

Sep 5, 2023, 11:40:31 PM
to ead...@sina.com, syzkall...@googlegroups.com
Hello,

syzbot tried to test the proposed patch but the build/boot failed:

mm/mempolicy.c:1345:42: error: 'struct vm_area_struct' has no member named 'start'; did you mean 'vm_start'?


Tested on:

commit: a47fc304 Add linux-next specific files for 20230831
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
kernel config: https://syzkaller.appspot.com/x/.config?x=6ecd2a74f20953b9
dashboard link: https://syzkaller.appspot.com/bug?extid=b591856e0f0139f83023
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=1371fc00680000

Edward AD

Sep 6, 2023, 12:16:19 AM
to syzbot+b59185...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
Please test this fix for the vma that is not write-locked:

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git a47fc304d2b6

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 42b5567e3773..19cabf0f91d0 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1342,6 +1342,8 @@ static long do_mbind(unsigned long start, unsigned long len,
vma_iter_init(&vmi, mm, start);
prev = vma_prev(&vmi);
for_each_vma_range(vmi, vma, end) {
+ if (!vma || start < vma->vm_start)

syzbot

Sep 6, 2023, 12:44:31 AM
to ead...@sina.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+b59185...@syzkaller.appspotmail.com

Tested on:

commit: a47fc304 Add linux-next specific files for 20230831
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
console output: https://syzkaller.appspot.com/x/log.txt?x=1394ecd0680000
kernel config: https://syzkaller.appspot.com/x/.config?x=6ecd2a74f20953b9
dashboard link: https://syzkaller.appspot.com/bug?extid=b591856e0f0139f83023
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=11fcdca4680000

Edward AD

Sep 6, 2023, 2:19:16 AM
to syzbot+b59185...@syzkaller.appspotmail.com, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
Only deal with the vmas that queue_pages_range() has locked.

Signed-off-by: Edward AD <ead...@sina.com>
---
mm/mempolicy.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 42b5567e3773..13050b968479 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1342,6 +1342,8 @@ static long do_mbind(unsigned long start, unsigned long len,
vma_iter_init(&vmi, mm, start);
prev = vma_prev(&vmi);
for_each_vma_range(vmi, vma, end) {
+ if (!vma || start < vma->vm_start)
+ continue;
err = mbind_range(&vmi, vma, &prev, start, end, new);
if (err)
break;
--
2.25.1

Matthew Wilcox

Sep 6, 2023, 8:07:00 AM
to Edward AD, syzbot+b59185...@syzkaller.appspotmail.com, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
On Wed, Sep 06, 2023 at 02:19:02PM +0800, Edward AD wrote:
> Only dealwith queue_pages_range locked vmas.

What?

> Signed-off-by: Edward AD <ead...@sina.com>
> ---
> mm/mempolicy.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 42b5567e3773..13050b968479 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1342,6 +1342,8 @@ static long do_mbind(unsigned long start, unsigned long len,
> vma_iter_init(&vmi, mm, start);
> prev = vma_prev(&vmi);
> for_each_vma_range(vmi, vma, end) {
> + if (!vma || start < vma->vm_start)
> + continue;

#define for_each_vma_range(__vmi, __vma, __end) \
while (((__vma) = vma_find(&(__vmi), (__end))) != NULL)

How can this produce a vma that is either NULL or has a vm_start after start?

syzbot

Sep 8, 2023, 2:04:32 PM
to 42.h...@gmail.com, Liam.H...@oracle.com, agor...@linux.ibm.com, ak...@linux-foundation.org, alex...@rivosinc.com, a...@eecs.berkeley.edu, bornt...@linux.ibm.com, cgr...@vger.kernel.org, christop...@csgroup.eu, da...@lists.linux.dev, da...@redhat.com, ead...@sina.com, fra...@linux.ibm.com, gerald....@linux.ibm.com, g...@linux.ibm.com, han...@cmpxchg.org, h...@linux.ibm.com, imbr...@linux.ibm.com, jeehe...@starfivetech.com, jgl...@redhat.com, k...@vger.kernel.org, leyfo...@starfivetech.com, linm...@huawei.com, linux-...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linux...@vger.kernel.org, linuxp...@lists.ozlabs.org, maso...@starfivetech.com, mho...@kernel.org, m...@ellerman.id.au, muchu...@linux.dev, naoya.h...@nec.com, npi...@gmail.com, pal...@dabbelt.com, paul.w...@sifive.com, roman.g...@linux.dev, sebastia...@collabora.com, shak...@google.com, s...@kernel.org, sur...@google.com, sv...@linux.ibm.com, syzkall...@googlegroups.com, wi...@infradead.org
syzbot has bisected this issue to:

commit 49b0638502da097c15d46cd4e871dbaa022caf7c
Author: Suren Baghdasaryan <sur...@google.com>
Date: Fri Aug 4 15:27:19 2023 +0000

mm: enable page walking API to lock vmas during the walk

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=11fd2348680000
start commit: 7733171926cc Merge tag 'mailbox-v6.6' of git://git.linaro...
git tree: upstream
final oops: https://syzkaller.appspot.com/x/report.txt?x=13fd2348680000
console output: https://syzkaller.appspot.com/x/log.txt?x=15fd2348680000
kernel config: https://syzkaller.appspot.com/x/.config?x=b273cdfbc13e9a4b
dashboard link: https://syzkaller.appspot.com/bug?extid=b591856e0f0139f83023
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15d4ecd0680000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1055c284680000

Reported-by: syzbot+b59185...@syzkaller.appspotmail.com
Fixes: 49b0638502da ("mm: enable page walking API to lock vmas during the walk")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

Hillf Danton

Sep 8, 2023, 11:42:23 PM
to syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On Tue, 05 Sep 2023 18:03:49 -0700
> HEAD commit: a47fc304d2b6 Add linux-next specific files for 20230831
> git tree: linux-next
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1523f9c0680000

Lock vma before updating mem policy.

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

--- x/mm/mempolicy.c
+++ y/mm/mempolicy.c
@@ -1342,6 +1342,7 @@ static long do_mbind(unsigned long start
vma_iter_init(&vmi, mm, start);
prev = vma_prev(&vmi);
for_each_vma_range(vmi, vma, end) {
+ vma_start_write(vma);

syzbot

Sep 9, 2023, 12:43:34 AM
to hda...@sina.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+b59185...@syzkaller.appspotmail.com

Tested on:

commit: 32bf43e4 Merge tag 'thermal-6.6-rc1-3' of git://git.ke..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=15137480680000
kernel config: https://syzkaller.appspot.com/x/.config?x=e82a7781f9208c0d
dashboard link: https://syzkaller.appspot.com/bug?extid=b591856e0f0139f83023
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=12eaf0e8680000

Matthew Wilcox

Sep 12, 2023, 1:30:52 AM
to syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
On Tue, Sep 05, 2023 at 06:03:49PM -0700, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: a47fc304d2b6 Add linux-next specific files for 20230831
> git tree: linux-next
> console+strace: https://syzkaller.appspot.com/x/log.txt?x=16502ddba80000
> kernel config: https://syzkaller.appspot.com/x/.config?x=6ecd2a74f20953b9
> dashboard link: https://syzkaller.appspot.com/bug?extid=b591856e0f0139f83023
> compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=120e7d70680000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1523f9c0680000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/b2e8f4217527/disk-a47fc304.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/ed6cdcc09339/vmlinux-a47fc304.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/bd9b2475bf5a/bzImage-a47fc304.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+b59185...@syzkaller.appspotmail.com

#syz test

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 42b5567e3773..90ad5fe60824 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1342,6 +1342,7 @@ static long do_mbind(unsigned long start, unsigned long len,

syzbot

Sep 12, 2023, 2:09:29 AM
to ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com, wi...@infradead.org
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+b59185...@syzkaller.appspotmail.com

Tested on:

commit: 0bb80ecc Linux 6.6-rc1
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1574bfc8680000
kernel config: https://syzkaller.appspot.com/x/.config?x=ba194e5cfd385dbf
dashboard link: https://syzkaller.appspot.com/bug?extid=b591856e0f0139f83023
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=11ad2dfc680000

kernel test robot

Sep 12, 2023, 4:17:10 AM
to Edward AD, oe-...@lists.linux.dev, l...@intel.com, linu...@kvack.org, l...@lists.linux.it, syzbot+b59185...@syzkaller.appspotmail.com, ak...@linux-foundation.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, olive...@intel.com


Hello,

kernel test robot noticed "ltp.vma02.fail" on:

commit: 8dfdaf99160833af1a0a4ffef60c94627c77025b ("[PATCH] mm: as the same logic with queue_pages_range")
url: https://github.com/intel-lab-lkp/linux/commits/Edward-AD/mm-as-the-same-logic-with-queue_pages_range/20230906-142110
base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/all/20230906061902....@sina.com/
patch subject: [PATCH] mm: as the same logic with queue_pages_range

in testcase: ltp
version: ltp-x86_64-14c1f76-1_20230715
with following parameters:

disk: 1HDD
test: mm-00/vma02



compiler: gcc-12
test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz (Kaby Lake) with 32G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <olive...@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202309121302.886...@intel.com



Running tests.......
<<<test_start>>>
tag=vma02 stime=1694438526
cmdline="vma02"
contacts=""
analysis=exit
<<<test_output>>>
vma02 0 TINFO : pid = 3639 addr = 0x7f5202dd6000
vma02 0 TINFO : start = 0x7f5202dd6000, end = 0x7f5202dd7000
vma02 0 TINFO : start = 0x7f5202dd7000, end = 0x7f5202dd8000
vma02 0 TINFO : start = 0x7f5202dd8000, end = 0x7f5202dd9000
vma02 1 TFAIL : vma02.c:144: >1 unmerged VMAs.
incrementing stop
<<<execution_status>>>
initiation_status="ok"
duration=0 termination_type=exited termination_id=1 corefile=no
cutime=0 cstime=7
<<<test_end>>>
INFO: ltp-pan reported some tests FAIL
LTP Version: 20230516-75-g2e582e743

###############################################################

Done executing testcases.
LTP Version: 20230516-75-g2e582e743
###############################################################




The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230912/202309121302.886...@intel.com



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

Matthew Wilcox

Sep 12, 2023, 10:55:43 AM
to syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com, Suren Baghdasaryan
Suren, can you take a look at this? The VMA should be locked by the
call to queue_pages_range(), but by the time we get to here, the VMA
isn't locked. I don't see anywhere that we cycle the mmap_lock (which
would unlock the VMA), but I could have missed something. The two
VMA walks should walk over the same set of VMAs. Certainly the VMA
being dumped should have been locked by the pagewalk:

vma ffff888077381a00 start 0000000020c2a000 end 0000000021000000 mm ffff8880258a8980
prot 25 anon_vma 0000000000000000 vm_ops 0000000000000000
pgoff 20c2a file 0000000000000000 private_data 0000000000000000
flags: 0x8100077(read|write|exec|mayread|maywrite|mayexec|account|softdirty)

syscall(__NR_mbind, /*addr=*/0x20400000ul, /*len=*/0xc00000ul, /*mode=*/4ul,
/*nodemask=*/0ul, /*maxnode=*/0ul, /*flags=*/3ul);

20400000 + c00000 should overlap 20c2a000-21000000
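
A quick sanity check of that arithmetic (values copied from the report above, illustrative only):

/* Overlap check for the mbind() range vs. the dumped vma (illustrative). */
#include <stdio.h>

int main(void)
{
	unsigned long mbind_start = 0x20400000UL;
	unsigned long mbind_end   = mbind_start + 0xc00000UL;	/* 0x21000000 */
	unsigned long vma_start   = 0x20c2a000UL;
	unsigned long vma_end     = 0x21000000UL;

	/* Two ranges overlap iff each one starts before the other ends. */
	printf("overlap: %d\n", mbind_start < vma_end && vma_start < mbind_end);
	return 0;
}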

Suren Baghdasaryan

Sep 12, 2023, 11:03:22 AM
to Matthew Wilcox, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
Sure, I'll look into this today. Somehow this report slipped by me
unnoticed. Thanks!

Suren Baghdasaryan

Sep 12, 2023, 12:01:09 PM
to Matthew Wilcox, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
Yeah, this looks strange. queue_pages_range() should have locked all
the vmas and the tree can't change since we are holding mmap_lock for
write. I'll try to reproduce later today and see what's going on.

Cyril Hrubis

Sep 13, 2023, 7:00:20 AM
to kernel test robot, Edward AD, syzbot+b59185...@syzkaller.appspotmail.com, l...@intel.com, syzkall...@googlegroups.com, linux-...@vger.kernel.org, linu...@kvack.org, oe-...@lists.linux.dev, ak...@linux-foundation.org, l...@lists.linux.it
Hi!
> Running tests.......
> <<<test_start>>>
> tag=vma02 stime=1694438526
> cmdline="vma02"
> contacts=""
> analysis=exit
> <<<test_output>>>
> vma02 0 TINFO : pid = 3639 addr = 0x7f5202dd6000
> vma02 0 TINFO : start = 0x7f5202dd6000, end = 0x7f5202dd7000
> vma02 0 TINFO : start = 0x7f5202dd7000, end = 0x7f5202dd8000
> vma02 0 TINFO : start = 0x7f5202dd8000, end = 0x7f5202dd9000
> vma02 1 TFAIL : vma02.c:144: >1 unmerged VMAs.
> incrementing stop

So the test is a reproducer for:

commit 9d8cebd4bcd7c3878462fdfda34bbcdeb4df7ef4
Author: KOSAKI Motohiro <kosaki....@jp.fujitsu.com>
Date: Fri Mar 5 13:41:57 2010 -0800

mm: fix mbind vma merge problem

Strangely, current mbind() doesn't merge vma with neighbor vma although it's possible.
Unfortunately, many vma can reduce performance...


And what the test does is (a rough sketch follows the list):

- map 3 continuous pages
- populate
- mbind() first page
- mbind() all three pages
- check that there is only a single VMA for the pages after the second mbind()
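
A minimal userspace sketch of that sequence might look as below (illustrative only, not the actual LTP vma02 source; it assumes MPOL_BIND to node 0 via libnuma's numaif.h, and the /proc/self/maps check is left out):

/* Illustrative sketch of the vma02 sequence, not the LTP source. */
#include <numaif.h>		/* mbind(), MPOL_BIND; link with -lnuma */
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned long nodemask = 1;	/* assume node 0 is available */
	char *p = mmap(NULL, 3 * page, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	memset(p, 0, 3 * page);				/* populate */
	mbind(p, page, MPOL_BIND, &nodemask, 2, 0);	/* first page */
	mbind(p, 3 * page, MPOL_BIND, &nodemask, 2, 0);	/* all three pages */

	/* vma02 then parses /proc/self/maps and expects one merged VMA here. */
	return 0;
}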

--
Cyril Hrubis
chr...@suse.cz

Suren Baghdasaryan

Sep 13, 2023, 12:05:31 PM
to Matthew Wilcox, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
So far I was unable to reproduce the issue. I tried with Linus' ToT
using the attached config. linux-next ToT does not boot with this
config but defconfig boots and fails to reproduce the issue. I'll try
to figure out why current linux-next does not like this config.

Suren Baghdasaryan

Sep 13, 2023, 12:47:04 PM
to Matthew Wilcox, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
Ok, I found a way to reproduce this using the config and kernel
baseline reported on 2023/09/06 06:24 at
https://syzkaller.appspot.com/bug?extid=b591856e0f0139f83023. I
suspect mmap_lock is being dropped by a racing thread, similar to this
issue we fixed before here:
https://lore.kernel.org/all/CAJuCfpH8ucOkCFYrVZafUAppi5+mVhy=uD+BK6-oYX=ysQ...@mail.gmail.com/
Anyway, I'm on it and will report once I figure out the issue.

Edward AD

Sep 13, 2023, 10:06:15 PM
to syzbot+b59185...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
Please test this fix for the vma that is not write-locked:

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git a47fc304d2b6

--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -490,6 +490,7 @@ int walk_page_range(struct mm_struct *mm, unsigned long start,

vma = find_vma(walk.mm, start);
do {
+ process_vma_walk_lock(vma, ops->walk_lock);
if (!vma) { /* after the last vma */
walk.vma = NULL;
next = end;
@@ -501,7 +502,6 @@ int walk_page_range(struct mm_struct *mm, unsigned long start,
if (ops->pte_hole)
err = ops->pte_hole(start, next, -1, &walk);
} else { /* inside vma */
- process_vma_walk_lock(vma, ops->walk_lock);
walk.vma = vma;
next = min(end, vma->vm_end);
vma = find_vma(mm, vma->vm_end);

syzbot

Sep 13, 2023, 11:44:31 PM
to ead...@sina.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
kernel BUG in vma_replace_policy

vma ffff888026d16200 start 0000000020c2a000 end 0000000021000000 mm ffff888028ff7200
prot 25 anon_vma 0000000000000000 vm_ops 0000000000000000
pgoff 20c2a file 0000000000000000 private_data 0000000000000000
flags: 0x8100077(read|write|exec|mayread|maywrite|mayexec|account|softdirty)
------------[ cut here ]------------
kernel BUG at include/linux/mm.h:733!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 5458 Comm: syz-executor.0 Not tainted 6.5.0-next-20230831-syzkaller-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
RIP: 0010:vma_assert_write_locked include/linux/mm.h:733 [inline]
RIP: 0010:vma_assert_write_locked include/linux/mm.h:729 [inline]
RIP: 0010:vma_replace_policy+0x406/0x4e0 mm/mempolicy.c:783
Code: ff 48 89 ef e8 db 78 ff ff e9 83 fe ff ff e8 21 7d ad ff 4c 89 e7 e8 f9 86 eb ff 0f 0b e8 12 7d ad ff 48 89 df e8 4a 84 eb ff <0f> 0b e8 03 7d ad ff 41 89 ec e9 58 fe ff ff 48 c7 c7 90 5f ce 8e
RSP: 0018:ffffc900052b7c58 EFLAGS: 00010282
RAX: 000000000000011b RBX: ffff888026d16200 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff816b9a92 RDI: 0000000000000005
RBP: ffff888014a7e030 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000080000000 R11: 0000000000000001 R12: 0000000000000005
R13: 0000000000000006 R14: 0000000000000001 R15: 0000000021000000
FS: 00007f32c8ef26c0(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055c543fca131 CR3: 000000001cb69000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
mbind_range+0x37c/0x530 mm/mempolicy.c:855
do_mbind+0x583/0xa00 mm/mempolicy.c:1345
kernel_mbind+0x1d4/0x1f0 mm/mempolicy.c:1502
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f32c827cae9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f32c8ef20c8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ed
RAX: ffffffffffffffda RBX: 00007f32c839bf80 RCX: 00007f32c827cae9
RDX: 0000000000000004 RSI: 0000000000c00000 RDI: 0000000020400000
RBP: 00007f32c82c847a R08: 0000000000000000 R09: 0000000000000003
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007f32c839bf80 R15: 00007fffd3974c58
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:vma_assert_write_locked include/linux/mm.h:733 [inline]
RIP: 0010:vma_assert_write_locked include/linux/mm.h:729 [inline]
RIP: 0010:vma_replace_policy+0x406/0x4e0 mm/mempolicy.c:783
Code: ff 48 89 ef e8 db 78 ff ff e9 83 fe ff ff e8 21 7d ad ff 4c 89 e7 e8 f9 86 eb ff 0f 0b e8 12 7d ad ff 48 89 df e8 4a 84 eb ff <0f> 0b e8 03 7d ad ff 41 89 ec e9 58 fe ff ff 48 c7 c7 90 5f ce 8e
RSP: 0018:ffffc900052b7c58 EFLAGS: 00010282
RAX: 000000000000011b RBX: ffff888026d16200 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff816b9a92 RDI: 0000000000000005
RBP: ffff888014a7e030 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000080000000 R11: 0000000000000001 R12: 0000000000000005
R13: 0000000000000006 R14: 0000000000000001 R15: 0000000021000000
FS: 00007f32c8ef26c0(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055c543fca131 CR3: 000000001cb69000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


Tested on:

commit: a47fc304 Add linux-next specific files for 20230831
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
console output: https://syzkaller.appspot.com/x/log.txt?x=17517730680000
kernel config: https://syzkaller.appspot.com/x/.config?x=8294f0401243efe1
dashboard link: https://syzkaller.appspot.com/bug?extid=b591856e0f0139f83023
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=13043e44680000

Edward AD

Sep 14, 2023, 12:56:59 AM
to syzbot+b59185...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
Please test this fix for the vma that is not write-locked:

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git a47fc304d2b6

diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index b7d7e4fcfad7..b31d59a27f57 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -479,6 +479,7 @@ int walk_page_range(struct mm_struct *mm, unsigned long start,
.mm = mm,
.private = private,
};
+ struct vma_iterator vmi;

if (start >= end)
return -EINVAL;
@@ -488,8 +489,9 @@ int walk_page_range(struct mm_struct *mm, unsigned long start,

process_mm_walk_lock(walk.mm, ops->walk_lock);

- vma = find_vma(walk.mm, start);
- do {
+ vma_iter_init(&vmi, walk.mm, start);
+ for_each_vma_range(vmi, vma, end) {
+ process_vma_walk_lock(vma, ops->walk_lock);
if (!vma) { /* after the last vma */
walk.vma = NULL;
next = end;
@@ -501,10 +503,8 @@ int walk_page_range(struct mm_struct *mm, unsigned long start,
if (ops->pte_hole)
err = ops->pte_hole(start, next, -1, &walk);
} else { /* inside vma */
- process_vma_walk_lock(vma, ops->walk_lock);
walk.vma = vma;
next = min(end, vma->vm_end);
- vma = find_vma(mm, vma->vm_end);

err = walk_page_test(start, next, &walk);
if (err > 0) {
@@ -522,7 +522,7 @@ int walk_page_range(struct mm_struct *mm, unsigned long start,
}
if (err)
break;
- } while (start = next, start < end);
+ };
return err;
}

syzbot

Sep 14, 2023, 1:34:27 AM
to ead...@sina.com, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+b59185...@syzkaller.appspotmail.com

Tested on:

commit: a47fc304 Add linux-next specific files for 20230831
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
console output: https://syzkaller.appspot.com/x/log.txt?x=150cbf30680000
kernel config: https://syzkaller.appspot.com/x/.config?x=8294f0401243efe1
dashboard link: https://syzkaller.appspot.com/bug?extid=b591856e0f0139f83023
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=10de73bfa80000

Edward AD

Sep 14, 2023, 1:53:20 AM
to olive...@intel.com, ak...@linux-foundation.org, ead...@sina.com, linux-...@vger.kernel.org, linu...@kvack.org, l...@intel.com, l...@lists.linux.it, oe-...@lists.linux.dev, syzbot+b59185...@syzkaller.appspotmail.com, syzkall...@googlegroups.com
Use for_each_vma_range() to find the vmas and take the per-vma lock for every vma the walk visits, not only those handled in the "inside vma" branch, so that vma_start_write() is always performed.

Reported-and-tested-by: syzbot+b59185...@syzkaller.appspotmail.com
Signed-off-by: Edward AD <ead...@sina.com>
---
mm/pagewalk.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
--
2.25.1

Suren Baghdasaryan

Sep 14, 2023, 2:21:10 PM
to Matthew Wilcox, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
I think I found the problem and the explanation is much simpler. While
walking the page range, queue_folios_pte_range() encounters an
unmovable page and queue_folios_pte_range() returns 1. That causes a
break from the loop inside walk_page_range() and no more VMAs get
locked. After that the loop calling mbind_range() walks over all VMAs,
even the ones which were skipped by queue_folios_pte_range() and that
causes this BUG assertion.

Thinking what's the right way to handle this situation (what's the
expected behavior here)...
I think the safest way would be to modify walk_page_range() and make
it continue calling process_vma_walk_lock() for all VMAs in the range
even when __walk_page_range() returns a positive err. Any objection or
alternative suggestions?
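
In userspace terms, the control-flow mismatch can be modelled like this (toy code only, not kernel code; all names here are made up for illustration):

/* Toy model of the bug: phase 1 locks VMAs but bails out early,
 * phase 2 still visits every VMA and asserts it is locked. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_VMAS 4

static bool locked[NR_VMAS];
static bool unmovable[NR_VMAS] = { false, true, false, false };

static int walk_model(void)			/* stands in for queue_pages_range() */
{
	for (int i = 0; i < NR_VMAS; i++) {
		locked[i] = true;		/* vma_start_write() */
		if (unmovable[i])
			return 1;		/* early break: vmas 2..3 stay unlocked */
	}
	return 0;
}

int main(void)
{
	int ret = walk_model();

	for (int i = 0; i < NR_VMAS; i++) {	/* stands in for the mbind_range() loop */
		printf("vma %d locked=%d\n", i, locked[i]);
		assert(locked[i]);		/* fires at i == 2, mirroring the kernel BUG */
	}
	return ret;
}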

Matthew Wilcox

Sep 14, 2023, 3:09:12 PM
to Suren Baghdasaryan, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
On Thu, Sep 14, 2023 at 06:20:56PM +0000, Suren Baghdasaryan wrote:
> I think I found the problem and the explanation is much simpler. While
> walking the page range, queue_folios_pte_range() encounters an
> unmovable page and queue_folios_pte_range() returns 1. That causes a
> break from the loop inside walk_page_range() and no more VMAs get
> locked. After that the loop calling mbind_range() walks over all VMAs,
> even the ones which were skipped by queue_folios_pte_range() and that
> causes this BUG assertion.
>
> Thinking what's the right way to handle this situation (what's the
> expected behavior here)...
> I think the safest way would be to modify walk_page_range() and make
> it continue calling process_vma_walk_lock() for all VMAs in the range
> even when __walk_page_range() returns a positive err. Any objection or
> alternative suggestions?

So we only return 1 here if MPOL_MF_MOVE* & MPOL_MF_STRICT were
specified. That means we're going to return an error, no matter what,
and there's no point in calling mbind_range(). Right?

+++ b/mm/mempolicy.c
@@ -1334,6 +1334,8 @@ static long do_mbind(unsigned long start, unsigned long len,
ret = queue_pages_range(mm, start, end, nmask,
flags | MPOL_MF_INVERT, &pagelist, true);

+ if (ret == 1)
+ ret = -EIO;
if (ret < 0) {
err = ret;
goto up_out;

(I don't really understand this code, so it can't be this simple, can
it? Why don't we just return -EIO from queue_folios_pte_range() if
this is the right answer?)

Suren Baghdasaryan

Sep 14, 2023, 4:00:59 PM
to Matthew Wilcox, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
Yeah, I'm trying to understand the expected behavior of this function
to make sure we are not missing anything. I tried a simple fix that I
suggested in my previous email and it works but I want to understand a
bit more about this function's logic before posting the fix.

Suren Baghdasaryan

Sep 14, 2023, 4:54:14 PM
to Matthew Wilcox, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
So, the current behavior is that after queue_pages_range() encounters
an unmovable page, terminates the loop early, and returns 1, mbind_range()
will still be called for the whole range
(https://elixir.bootlin.com/linux/latest/source/mm/mempolicy.c#L1345),
all pages in the pagelist will be migrated
(https://elixir.bootlin.com/linux/latest/source/mm/mempolicy.c#L1355)
and only after that the -EIO code will be returned
(https://elixir.bootlin.com/linux/latest/source/mm/mempolicy.c#L1362).
So, if we follow Matthew's suggestion we will be altering the current
behavior which I assume is not what we want to do.
The simple fix I was thinking about that would not alter this behavior
is something like this:

diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index b7d7e4fcfad7..c37a7e8be4cb 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -493,11 +493,17 @@ int walk_page_range(struct mm_struct *mm, unsigned long start,
if (!vma) { /* after the last vma */
walk.vma = NULL;
next = end;
+ if (err)
+ continue;
+
if (ops->pte_hole)
err = ops->pte_hole(start, next, -1, &walk);
} else if (start < vma->vm_start) { /* outside vma */
walk.vma = NULL;
next = min(end, vma->vm_start);
+ if (err)
+ continue;
+
if (ops->pte_hole)
err = ops->pte_hole(start, next, -1, &walk);
} else { /* inside vma */
@@ -505,6 +511,8 @@ int walk_page_range(struct mm_struct *mm, unsigned long start,
walk.vma = vma;
next = min(end, vma->vm_end);
vma = find_vma(mm, vma->vm_end);
+ if (err)
+ continue;

err = walk_page_test(start, next, &walk);
if (err > 0) {
@@ -520,8 +528,6 @@ int walk_page_range(struct mm_struct *mm, unsigned long start,
break;
err = __walk_page_range(start, next, &walk);
}
- if (err)
- break;
} while (start = next, start < end);
return err;
}

WDYT?

Matthew Wilcox

Sep 14, 2023, 5:24:09 PM
to Suren Baghdasaryan, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
Right, I'm intentionally changing the behaviour. My thinking is
that mbind(MPOL_MF_MOVE | MPOL_MF_STRICT) is going to fail. Should
such a failure actually move the movable pages before reporting that
it failed? I don't know.

> The simple fix I was thinking about that would not alter this behavior
> is smth like this:

I don't like it, but can we run it past syzbot to be sure it solves the
issue and we're not chasing a ghost here?

Suren Baghdasaryan

Sep 14, 2023, 6:21:25 PM
to Matthew Wilcox, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
Yes, I just finished running the reproducer on both upstream and
linux-next builds listed in
https://syzkaller.appspot.com/bug?extid=b591856e0f0139f83023 and the
problem does not happen anymore.
I'm fine with your suggestion too, just wanted to point out it would
introduce change in the behavior. Let me know how you want to proceed.

Hugh Dickins

Sep 15, 2023, 12:26:31 AM
to Suren Baghdasaryan, Matthew Wilcox, Yang Shi, Michal Hocko, Vlastimil Babka, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
Well done, identifying the mysterious cause of this problem:
I'm glad to hear that you've now verified that hypothesis.

You're right, it would be a regression to follow Matthew's suggestion.

Traditionally, modulo bugs and inconsistencies, the queue_pages_range()
phase of do_mbind() has done the best it can, gathering all the pages it
can that need migration, even if some were missed; and proceeds to do the
mbind_range() phase if there was nothing "seriously" wrong (a gap causing
-EFAULT). Then at the end, if MPOL_MF_STRICT was set, and not all the
pages could be migrated (or MOVE was not specified and not all pages
were well placed), it returns -EIO rather than 0 to inform the caller
that not all could be done.

There have been numerous tweaks, but I think most importantly
5.3's d883544515aa ("mm: mempolicy: make the behavior consistent when
MPOL_MF_MOVE* and MPOL_MF_STRICT were specified") added those "return 1"s
which stop the pagewalk early. In my opinion, not an improvement - makes
it harder to get mbind() to do the best job it can (or is it justified as
what you're asking for if you say STRICT?).

But whatever, it would be a further regression for mbind() not to have
done the mbind_range(), even though it goes on to return -EIO.

I had a bad first reaction to your walk_page_range() patch (was expecting
to see vma_start_write()s in mbind_range()), but perhaps your patch is
exactly what process_mm_walk_lock() does now demand.

[Why is Hugh responding on this? Because I have some long-standing
mm/mempolicy.c patches to submit next week, but in reviewing what I
could or could not afford to get into at this time, had decided I'd
better stay out of queue_pages_range() for now - beyond the trivial
preferring an MPOL_MF_WRLOCK flag to your bool lock_vma.]

Hugh

Suren Baghdasaryan

Sep 15, 2023, 12:09:52 PM
to Hugh Dickins, Matthew Wilcox, Yang Shi, Michal Hocko, Vlastimil Babka, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
Thanks for the feedback, Hugh!
Yeah, this positive err handling is kinda weird. If this behavior (do
as much as possible even if we fail eventually) is specific to mbind()
then we could keep walk_page_range() as is and lock the VMAs inside
the loop that calls mbind_range() with a condition that ret is
positive. That would be the simplest solution IMHO. But if we expect
walk_page_range() to always apply requested page_walk_lock policy to
all VMAs even if some mm_walk_ops returns a positive error somewhere
in the middle of the walk then my fix would work for that. So, to me
the important question is how we want walk_page_range() to behave in
these conditions. I think we should answer that first and document
that. Then the fix will be easy.


>
> Hugh

Suren Baghdasaryan

Sep 15, 2023, 2:05:29 PM
to Hugh Dickins, Matthew Wilcox, Yang Shi, Michal Hocko, Vlastimil Babka, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
I looked at all the cases where we perform page walk while locking
VMAs and mbind() seems to be the only one that would require
walk_page_range() to lock all VMAs even for a failed walk. So, I
suggest this fix instead and I can also document that if
walk_page_range() fails it might not apply page_walk_lock policy to
the VMAs.

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 42b5567e3773..cbc584e9b6ca 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1342,6 +1342,9 @@ static long do_mbind(unsigned long start, unsigned long len,
vma_iter_init(&vmi, mm, start);
prev = vma_prev(&vmi);
for_each_vma_range(vmi, vma, end) {
+ /* If queue_pages_range failed then not all VMAs might be locked */
+ if (ret)
+ vma_start_write(vma);
err = mbind_range(&vmi, vma, &prev, start, end, new);
if (err)
break;

If this looks good I'll post the patch. Matthew, Hugh, anyone else?

>
>
> >
> > Hugh

Matthew Wilcox

Sep 15, 2023, 2:26:23 PM
to Hugh Dickins, Suren Baghdasaryan, Yang Shi, Michal Hocko, Vlastimil Babka, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
I suspect you agree that it's inconsistent to stop early. Userspace
doesn't know at which point we found an unmovable page, so it can't behave
rationally. Perhaps we should remove the 'early stop' and attempt to
migrate every page in the range, whether it's before or after the first
unmovable page?

Yang Shi

Sep 15, 2023, 9:35:18 PM
to Hugh Dickins, Suren Baghdasaryan, Matthew Wilcox, Michal Hocko, Vlastimil Babka, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
On Thu, Sep 14, 2023 at 9:26 PM Hugh Dickins <hu...@google.com> wrote:
>
Hi Suren and Hugh,

Thanks for figuring this out. The mbind behavior is a little bit messy
and hard to follow. I tried my best to recall all the changes.

IIUC, mbind did break the vma iteration early in the first place, then
commit 6f4576e3687b ("mempolicy: apply page table walker on
queue_pages_range()") changed the behavior (didn't break vma iteration
early for some cases anymore), but it messed up the return value and
caused some test case failures and also violated the manual. The return
value issue was fixed by commit a7f40cfe3b7a ("mm: mempolicy: make
mbind() return -EIO when MPOL_MF_STRICT is specified"), this commit
also restored the oldest behavior (break loop early). But it also
breaks the loop early when MPOL_MF_MOVE|MOVEALL is set, kernel should
actually continue the loop to try to migrate all existing pages per
the manual. It sounds like a regression. I will take a look at it.

So the logic should conceptually look like:

if (MPOL_MF_MOVE|MOVEALL)
continue;
if (MPOL_MF_STRICT)
break;

So it is still possible that some VMAs are not locked if only
MPOL_MF_STRICT is set.

Hugh Dickins

Sep 15, 2023, 10:44:12 PM
to Suren Baghdasaryan, Hugh Dickins, Matthew Wilcox, Yang Shi, Michal Hocko, Vlastimil Babka, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
Yes, I can well believe that.

> So, I suggest this fix instead and I can also document that if
> walk_page_range() fails it might not apply page_walk_lock policy to
> the VMAs.
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 42b5567e3773..cbc584e9b6ca 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1342,6 +1342,9 @@ static long do_mbind(unsigned long start,
> unsigned long len,
> vma_iter_init(&vmi, mm, start);
> prev = vma_prev(&vmi);
> for_each_vma_range(vmi, vma, end) {
> + /* If queue_pages_range failed then not all VMAs
> might be locked */
> + if (ret)
> + vma_start_write(vma);
> err = mbind_range(&vmi, vma, &prev, start, end, new);
> if (err)
> break;
>
> If this looks good I'll post the patch. Matthew, Hugh, anyone else?

Yes, I do prefer this, to adding those pos ret mods into the generic
pagewalk. The "if (ret)" above being just a minor optimization, that
I would probably not have bothered with (does it even save any atomics?)
- but I guess it helps as documentation.

I think it's quite likely that mbind() will be changed sooner or later
not to need this; but it's much the best to fix this vma locking issue
urgently as above, without depending on any mbind() behavioral discussions.

Thanks,
Hugh

Hugh Dickins

Sep 15, 2023, 10:54:20 PM
to Matthew Wilcox, Hugh Dickins, Suren Baghdasaryan, Yang Shi, Michal Hocko, Vlastimil Babka, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
Yes, that's what I was arguing for, and how it was done in olden days.
Though (after Yang Shi's following comments, and looking back at my
last attempted patch here) I may disagree with myself about the right
behavior in the MPOL_MF_STRICT case.

Hugh

Hugh Dickins

Sep 15, 2023, 11:57:26 PM
to Yang Shi, Hugh Dickins, Suren Baghdasaryan, Matthew Wilcox, Michal Hocko, Vlastimil Babka, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
On Fri, 15 Sep 2023, Yang Shi wrote:
>
> Hi Suren and Hugh,
>
> Thanks for figuring this out. The mbind behavior is a little bit messy
> and hard to follow. I tried my best to recall all the changes.

Messy and confusing yes; and for every particular behavior, I suspect
that by now there exists some release which has done it that way.

>
> IIUC, mbind did break the vma iteration early in the first place, then
> commit 6f4576e3687b ("mempolicy: apply page table walker on
> queue_pages_range()") changed the behavior (didn't break vma iteration
> early for some cases anymore), but it messed up the return value and
> caused some test cases failure, also violated the manual. The return
> value issue was fixed by commit a7f40cfe3b7a ("mm: mempolicy: make
> mbind() return -EIO when MPOL_MF_STRICT is specified"), this commit
> also restored the oldest behavior (break loop early). But it also
> breaks the loop early when MPOL_MF_MOVE|MOVEALL is set, kernel should
> actually continue the loop to try to migrate all existing pages per
> the manual.

Oh, I missed that aspect in my description: yes, I think that's the
worst of it: MPOL_MF_STRICT alone could break out early because it had
nothing more to learn by going further, but it was simply a mistake for
the MOVEs to break out early (and arguable what MOVE|STRICT should do).

I thought you and I were going to have a debate about this, but we
appear to be in agreement. And I'm not sure whether I agree with
myself about whether do_mbind() should apply the mbind_range()s
when STRICT queue_pages_range() found an unmovable - there are
consistency and regression arguments both ways.

(I've been repeatedly puzzled by your comment in queue_folios_pte_range()
if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
/* MPOL_MF_STRICT must be specified if we get here */
if (!vma_migratable(vma)) {
Does that comment about MPOL_MF_STRICT actually belong inside the
!vma_migratable(vma) block? Sometimes I think so, but sometimes I
remember that the interaction of those flags, and the skipping arranged
by queue_pages_test_walk(), is subtler than I imagine.)

> It sounds like a regression. I will take a look at it.

Thanks! Please do, I don't have the time for it.

>
> So the logic should conceptually look like:
>
> if (MPOL_MF_MOVE|MOVEALL)
> continue;
> if (MPOL_MF_STRICT)
> break;
>
> So it is still possible that some VMAs are not locked if only
> MPOL_MF_STRICT is set.

Conditionally, I'll agree; but it's too easy for me to agree in the
course of trying to get an email out, but on later reflection come
to disagree. STRICT|MOVE behavior arguable.

I think the best I can do is send you (privately) my approx-v5.2 patch
for this (which I never got time to put into even a Google-internal
kernel, though an earlier version was there). In part because I did
more research back then, and its commit message cites several even
older commits than you cite above, which might help to shed more light
on the history (or might just be wrong). And in part because it may
give you some more ideas of what needs doing: notably qp->nr_failed,
because "man 2 migrate_pages" says "On success migrate_pages() returns
the number of pages that could not be moved", but we seem to have
lost sight of that (from which one may conclude that it's not very
important, but I did find it useful when testing); but of course
the usual doubts about the right way to count a page when compound.

I'll check how easily that patch applies to a known base such as
v5.2, maybe trim it to fit better, then send it off to you.

Hugh

Suren Baghdasaryan

Sep 18, 2023, 5:20:36 PM
to Hugh Dickins, Matthew Wilcox, Yang Shi, Michal Hocko, Vlastimil Babka, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
I posted this patch at
https://lore.kernel.org/all/20230918211608....@google.com/
to fix the immediate problem.
Thanks!

>
> Thanks,
> Hugh

Yang Shi

Sep 18, 2023, 6:34:21 PM
to Hugh Dickins, Suren Baghdasaryan, Matthew Wilcox, Michal Hocko, Vlastimil Babka, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
They will not be added into the migration list in the first place. Why
waste time to try to migrate the unmovable?

>
> (I've been repeatedly puzzled by your comment in queue_folios_pte_range()
> if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
> /* MPOL_MF_STRICT must be specified if we get here */
> if (!vma_migratable(vma)) {
> Does that commment about MPOL_MF_STRICT actually belong inside the
> !vma_migratable(vma) block? Sometimes I think so, but sometimes I
> remember that the interaction of those flags, and the skipping arranged
> by queue_pages_test_walk(), is subtler than I imagine.)

It is because of the below code snippet from queue_pages_test_walk():

if (!vma_migratable(vma) &&
!(flags & MPOL_MF_STRICT))
return 1;

When queue_pages_test_walk() returns 1, queue_folios_pte_range() will
be skipped. So if queue_folios_pte_range() sees unmigratable vma, it
means MPOL_MF_STRICT must be set.

>
> > It sounds like a regression. I will take a look at it.
>
> Thanks! Please do, I don't have the time for it.
>
> >
> > So the logic should conceptually look like:
> >
> > if (MPOL_MF_MOVE|MOVEALL)
> > continue;
> > if (MPOL_MF_STRICT)
> > break;
> >
> > So it is still possible that some VMAs are not locked if only
> > MPOL_MF_STRICT is set.
>
> Conditionally, I'll agree; but it's too easy for me to agree in the
> course of trying to get an email out, but on later reflection come
> to disagree. STRICT|MOVE behavior arguable.

I thought the code should conceptually do:

if (MPOL_MF_MOVE|MOVEALL)
scan all vmas
try to migrate the existing pages
return success
else if (MPOL_MF_MOVE* | MPOL_MF_STRICT)
scan all vmas
try to migrate the existing pages
return -EIO if unmovable or migration failed
else /* MPOL_MF_STRICT alone */
break early if meets unmovable and don't call mbind_range() at all

So the vma scan will just be skipped when MPOL_MF_STRICT alone is
specified and mbind_range() won't be called in this case. So Suren's
fix may not be needed.

>
> I think the best I can do is send you (privately) my approx-v5.2 patch
> for this (which I never got time to put into even a Google-internal
> kernel, though an earlier version was there). In part because I did
> more research back then, and its commit message cites several even
> older commits than you cite above, which might help to shed more light
> on the history (or might just be wrong). And in part because it may
> give you some more ideas of what needs doing: notably qp->nr_failed,
> because "man 2 migrate_pages" says "On success migrate_pages() returns
> the number of pages that could not be moved", but we seem to have
> lost sight of that (from which one may conclude that it's not very
> important, but I did find it useful when testing); but of course
> the usual doubts about the right way to count a page when compound.
>
> I'll check how easily that patch applies to a known base such as
> v5.2, maybe trim it to fit better, then send it off to you.

I'm thinking about the below fix (build tested against the latest mm-unstable only):

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 42b5567e3773..c9b768a042a8 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -426,6 +426,7 @@ struct queue_pages {
unsigned long start;
unsigned long end;
struct vm_area_struct *first;
+ bool has_unmovable;
};

/*
@@ -446,9 +447,8 @@ static inline bool queue_folio_required(struct folio *folio,
/*
* queue_folios_pmd() has three possible return values:
* 0 - folios are placed on the right node or queued successfully, or
- * special page is met, i.e. huge zero page.
- * 1 - there is unmovable folio, and MPOL_MF_MOVE* & MPOL_MF_STRICT were
- * specified.
+ * special page is met, i.e. zero page, or unmovable page is found
+ * but continue walking (indicated by queue_pages.has_unmovable).
* -EIO - is migration entry or only MPOL_MF_STRICT was specified and an
* existing folio was already on a node that does not follow the
* policy.
@@ -479,7 +479,7 @@ static int queue_folios_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
if (!vma_migratable(walk->vma) ||
migrate_folio_add(folio, qp->pagelist, flags)) {
- ret = 1;
+ qp->has_unmovable |= 1;
goto unlock;
}
} else
@@ -495,9 +495,8 @@ static int queue_folios_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
*
* queue_folios_pte_range() has three possible return values:
* 0 - folios are placed on the right node or queued successfully, or
- * special page is met, i.e. zero page.
- * 1 - there is unmovable folio, and MPOL_MF_MOVE* & MPOL_MF_STRICT were
- * specified.
+ * special page is met, i.e. zero page, or unmovable page is found
+ * but continue walking (indicated by queue_pages.has_unmovable).
* -EIO - only MPOL_MF_STRICT was specified and an existing folio was already
* on a node that does not follow the policy.
*/
@@ -538,10 +537,13 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr,
if (!queue_folio_required(folio, qp))
continue;
if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
- /* MPOL_MF_STRICT must be specified if we get here */
+ /*
+ * MPOL_MF_STRICT must be specified if we get here.
+ * Continue walking vmas due to MPOL_MF_MOVE* flags.
+ */
if (!vma_migratable(vma)) {
- has_unmovable = true;
- break;
+ qp->has_unmovable |= 1;
+ continue;
}

/*
@@ -550,16 +552,13 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr,
* need migrate other LRU pages.
*/
if (migrate_folio_add(folio, qp->pagelist, flags))
- has_unmovable = true;
+ has_unmovable |= 1;
} else
break;
}
pte_unmap_unlock(mapped_pte, ptl);
cond_resched();

- if (has_unmovable)
- return 1;
-
return addr != end ? -EIO : 0;
}

@@ -599,7 +598,7 @@ static int queue_folios_hugetlb(pte_t *pte, unsigned long hmask,
* Detecting misplaced folio but allow migrating folios which
* have been queued.
*/
- ret = 1;
+ qp->has_unmovable |= 1;
goto unlock;
}

@@ -620,7 +619,7 @@ static int queue_folios_hugetlb(pte_t *pte, unsigned long hmask,
* Failed to isolate folio but allow migrating pages
* which have been queued.
*/
- ret = 1;
+ qp->has_unmovable |= 1;
}
unlock:
spin_unlock(ptl);
@@ -756,12 +755,15 @@ queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end,
.start = start,
.end = end,
.first = NULL,
+ .has_unmovable = false,
};
const struct mm_walk_ops *ops = lock_vma ?
&queue_pages_lock_vma_walk_ops : &queue_pages_walk_ops;

err = walk_page_range(mm, start, end, ops, &qp);

+ if (qp.has_unmovable)
+ err = 1;
if (!qp.first)
/* whole range in hole */
err = -EFAULT;
@@ -1358,7 +1360,7 @@ static long do_mbind(unsigned long start, unsigned long len,
putback_movable_pages(&pagelist);
}

- if ((ret > 0) || (nr_failed && (flags & MPOL_MF_STRICT)))
+ if (((ret > 0) || nr_failed) && (flags & MPOL_MF_STRICT))
err = -EIO;
} else {
up_out:

>
> Hugh

Hugh Dickins

Sep 18, 2023, 8:34:16 PM
to Yang Shi, Hugh Dickins, Suren Baghdasaryan, Matthew Wilcox, Michal Hocko, Vlastimil Babka, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
I don't understand you there. I was not proposing to try to migrate
the unmovable.

My doubts were really all about how to make sense of mbind() sometimes
failing with EFAULT, in which case it has not applied the mbind_range()s,
versus sometimes failing with EIO, in which case it may or may not have
applied the mbind_range()s.

And I've come to the conclusion (partially driven by precedent) that it
makes best sense to imagine the collection of folios on pagelist as a
part of MOVE's migration stage, and just an implementation detail that
it happens to be done before the mbind_range()s. So when there's a
MOVE involved, STRICT's EIO says that the mbind_ranges() were applied
(but migrations were incomplete); but when no MOVE involved, EIO says
that the mbind_range()s were not applied (because it's being STRICT).

I don't think there's any disagreement between us on this: it was just
hard for me to reach an understanding of behavior which I could defend.

>
> >
> > (I've been repeatedly puzzled by your comment in queue_folios_pte_range()
> > if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
> > /* MPOL_MF_STRICT must be specified if we get here */
> > if (!vma_migratable(vma)) {
> > Does that commment about MPOL_MF_STRICT actually belong inside the
> > !vma_migratable(vma) block? Sometimes I think so, but sometimes I
> > remember that the interaction of those flags, and the skipping arranged
> > by queue_pages_test_walk(), is subtler than I imagine.)
>
> It is because the below code snippet from queue_pages_test_walk():
>
> if (!vma_migratable(vma) &&
> !(flags & MPOL_MF_STRICT))
> return 1;
>
> When queue_pages_test_walk() returns 1, queue_folios_pte_range() will
> be skipped. So if queue_folios_pte_range() sees unmigratable vma, it
> means MPOL_MF_STRICT must be set.

Thanks, yes, I eventually came to see that, once I got back into the code
(I had been right to remember "subtler than I imagine" above). Though I
don't think there's any good reason for the queueing code to have to
depend on such subtleties.

>
> >
> > > It sounds like a regression. I will take a look at it.

At one point I was thinking it a regression in all the MOVE cases;
but it's only in the STRICT MOVE case, so maybe not so significant.

> >
> > Thanks! Please do, I don't have the time for it.

I came back in private mail to say that I'd not managed a buildable
v5.2 version of my qp->nr_failed patch, so reluctantly put in the time
to think through it all again, and do a v6.6-rc1 version to add into my
mm/mempolicy series.

I have that now, I'll send you the preview privately in a moment; but
leave posting it publicly until I've finished the commit messages for
all the series.

> >
> > >
> > > So the logic should conceptually look like:
> > >
> > > if (MPOL_MF_MOVE|MOVEALL)
> > > continue;
> > > if (MPOL_MF_STRICT)
> > > break;
> > >
> > > So it is still possible that some VMAs are not locked if only
> > > MPOL_MF_STRICT is set.
> >
> > Conditionally, I'll agree; but it's too easy for me to agree in the
> > course of trying to get an email out, but on later reflection come
> > to disagree. STRICT|MOVE behavior arguable.
>
> I thought the code should conceptually do:
>
> if (MPOL_MF_MOVE|MOVEALL)
> scan all vmas
> try to migrate the existing pages
> return success
> else if (MPOL_MF_MOVE* | MPOL_MF_STRICT)
> scan all vmas
> try to migrate the existing pages
> return -EIO if unmovable or migration failed
> else /* MPOL_MF_STRICT alone */
> break early if meets unmovable and don't call mbind_range() at all
else /* none of those flags */
check the ranges in test_walk, EFAULT without mbind_range() if discontig.

Yes: to quote my own patch:
static bool strictly_unmovable(unsigned long flags)
{
/*
* STRICT without MOVE flags lets do_mbind() fail immediately with -EIO
* if any misplaced page is found.
*/
return (flags & (MPOL_MF_STRICT | MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) ==
MPOL_MF_STRICT;
}

>
> So the vma scan will just be skipped when MPOL_MF_STRICT alone is
> specified and mbind_range() won't be called in this case. So Suren's
> fix may not be needed.

Yes, Suren's fix can be reverted when your patch or mine goes in;
but Suren's is important for fixing the VMA locking issue meanwhile.

>
> >
> > I think the best I can do is send you (privately) my approx-v5.2 patch
> > for this (which I never got time to put into even a Google-internal
> > kernel, though an earlier version was there). In part because I did
> > more research back then, and its commit message cites several even
> > older commits than you cite above, which might help to shed more light
> > on the history (or might just be wrong). And in part because it may
> > give you some more ideas of what needs doing: notably qp->nr_failed,
> > because "man 2 migrate_pages" says "On success migrate_pages() returns
> > the number of pages that could not be moved", but we seem to have
> > lost sight of that (from which one may conclude that it's not very
> > important, but I did find it useful when testing); but of course
> > the usual doubts about the right way to count a page when compound.
> >
> > I'll check how easily that patch applies to a known base such as
> > v5.2, maybe trim it to fit better, then send it off to you.
>
> I'm thinking about the below fix (build test against the latest
> mm-unstable only):

Yes, that looks about right (more "|="ing than necessary, for something
that's only going to be set to 1, which, er, I think would better be "true").

And it's much smaller (rightly so if it's aimed at v6.6) than my patch
which is aimed at v6.7: mine doing quite a bit of cleanup, along with
the qp->nr_failed instead of your qp->has_unmovable, in order that
migrate_pages(2) can return the promised number of pages that could
not be moved.

Hugh