[syzbot] WARNING: locking bug in hugetlb_no_page

syzbot

Nov 12, 2022, 9:03:49 AM
to ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, ll...@lists.linux.dev, mike.k...@oracle.com, nat...@kernel.org, ndesau...@google.com, songm...@bytedance.com, syzkall...@googlegroups.com, tr...@redhat.com
Hello,

syzbot found the following issue on:

HEAD commit: 1621b6eaebf7 Merge branch 'for-next/fixes' into for-kernelci
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
console output: https://syzkaller.appspot.com/x/log.txt?x=13bd511e880000
kernel config: https://syzkaller.appspot.com/x/.config?x=606e57fd25c5c6cc
dashboard link: https://syzkaller.appspot.com/bug?extid=d07c65298d2c15eafcb0
compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
userspace arch: arm64
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13315856880000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=173614d1880000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/82aa7741098d/disk-1621b6ea.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/f6be08c4e4c2/vmlinux-1621b6ea.xz
kernel image: https://storage.googleapis.com/syzbot-assets/296b6946258a/Image-1621b6ea.gz.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+d07c65...@syzkaller.appspotmail.com

------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(!test_bit(class_idx, lock_classes_in_use))
WARNING: CPU: 1 PID: 3290 at kernel/locking/lockdep.c:5025 __lock_acquire+0x2758/0x3084
Modules linked in:
CPU: 1 PID: 3290 Comm: syz-executor317 Not tainted 6.1.0-rc4-syzkaller-31872-g1621b6eaebf7 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __lock_acquire+0x2758/0x3084
lr : __lock_acquire+0x2754/0x3084 kernel/locking/lockdep.c:5025
sp : ffff800012e3b3e0
x29: ffff800012e3b4c0 x28: 0000000000000001 x27: ffff0000cb891a68
x26: ffff0000cb892450 x25: ffff0000cb892470 x24: ffff0000cb892470
x23: 00000000000000c0 x22: 0000000000000001 x21: 0000000000000000
x20: ffff0000cb891a40 x19: aaaaaa0000fb22ca x18: 0000000000000358
x17: ffff80000c04d83c x16: 0000000000000000 x15: 0000000000000000
x14: 0000000000000000 x13: 0000000000000012 x12: ffff80000d86ff30
x11: ff808000081c06c8 x10: 0000000000000000 x9 : ddc86c2f228f9600
x8 : ddc86c2f228f9600 x7 : 4e5241575f534b43 x6 : ffff80000c01775c
x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000
x2 : 0000000000000000 x1 : 0000000100000000 x0 : 0000000000000000
Call trace:
__lock_acquire+0x2758/0x3084
reacquire_held_locks+0x120/0x1c0 kernel/locking/lockdep.c:5193
__lock_release kernel/locking/lockdep.c:5382 [inline]
lock_release+0x148/0x2b4 kernel/locking/lockdep.c:5688
__mutex_unlock_slowpath+0x44/0x1cc kernel/locking/mutex.c:907
mutex_unlock+0x24/0x30 kernel/locking/mutex.c:543
hugetlb_no_page+0x284/0xe1c mm/hugetlb.c:5771
hugetlb_fault+0x3a0/0xdfc mm/hugetlb.c:5874
handle_mm_fault+0x904/0xa48 mm/memory.c:5216
__do_page_fault arch/arm64/mm/fault.c:506 [inline]
do_page_fault+0x428/0x79c arch/arm64/mm/fault.c:606
do_translation_fault+0x78/0x194 arch/arm64/mm/fault.c:689
do_mem_abort+0x54/0x130 arch/arm64/mm/fault.c:825
el1_abort+0x3c/0x5c arch/arm64/kernel/entry-common.c:367
el1h_64_sync_handler+0x60/0xac arch/arm64/kernel/entry-common.c:427
el1h_64_sync+0x64/0x68 arch/arm64/kernel/entry.S:579
__arch_copy_from_user+0x24/0x1f4 arch/arm64/lib/copy_from_user.S:77
__import_iovec+0x60/0x248 lib/iov_iter.c:1773
import_iovec+0x6c/0x88 lib/iov_iter.c:1838
vfs_writev fs/read_write.c:931 [inline]
do_writev+0xf8/0x234 fs/read_write.c:977
__do_sys_writev fs/read_write.c:1050 [inline]
__se_sys_writev fs/read_write.c:1047 [inline]
__arm64_sys_writev+0x28/0x38 fs/read_write.c:1047
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:52 [inline]
el0_svc_common+0x138/0x220 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x48/0x164 arch/arm64/kernel/syscall.c:206
el0_svc+0x58/0x150 arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:584
irq event stamp: 941
hardirqs last enabled at (941): [<ffff80000c01c86c>] __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:159 [inline]
hardirqs last enabled at (941): [<ffff80000c01c86c>] _raw_spin_unlock_irq+0x3c/0x70 kernel/locking/spinlock.c:202
hardirqs last disabled at (940): [<ffff80000c01c66c>] __raw_spin_lock_irq include/linux/spinlock_api_smp.h:117 [inline]
hardirqs last disabled at (940): [<ffff80000c01c66c>] _raw_spin_lock_irq+0x34/0x9c kernel/locking/spinlock.c:170
softirqs last enabled at (744): [<ffff80000801c38c>] local_bh_enable+0x10/0x34 include/linux/bottom_half.h:32
softirqs last disabled at (742): [<ffff80000801c358>] local_bh_disable+0x10/0x34 include/linux/bottom_half.h:19
---[ end trace 0000000000000000 ]---
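The warning above (lockdep.c:5025) fires when the class index stored in a held-lock record is not marked in lockdep's bitmap of in-use lock classes. In other words, lockdep no longer recognizes the lock it is being asked to re-acquire. The following is a minimal userspace sketch of that bookkeeping, offered as a simplified model and not lockdep's code. The table size, the function names and the explicit bounds check are all hypothetical. Only the bitmap check and the NULL return for an unknown index are meant to loosely mirror the check in __lock_acquire() and hlock_class().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical sizes and helpers for a simplified userspace model. */
#define MAX_CLASSES    8192
#define BITS_PER_LONG  (sizeof(unsigned long) * CHAR_BIT)
#define BIT_WORD(nr)   ((nr) / BITS_PER_LONG)
#define BIT_MASK(nr)   (1UL << ((nr) % BITS_PER_LONG))

struct lock_class { const char *name; };

static struct lock_class lock_classes[MAX_CLASSES];
/* One bit per class slot: set while the class is registered. */
static unsigned long classes_in_use[MAX_CLASSES / BITS_PER_LONG];

static void set_bit_nonatomic(unsigned int nr, unsigned long *addr)
{
	addr[BIT_WORD(nr)] |= BIT_MASK(nr);
}

static int test_bit_nonatomic(unsigned int nr, const unsigned long *addr)
{
	return (addr[BIT_WORD(nr)] & BIT_MASK(nr)) != 0;
}

/* Register a class: fill in its slot and mark the index as in use. */
static void register_class(unsigned int idx, const char *name)
{
	lock_classes[idx].name = name;
	set_bit_nonatomic(idx, classes_in_use);
}

/* Forget a class (e.g. after its key goes away): clear the in-use bit. */
static void forget_class(unsigned int idx)
{
	if (idx < MAX_CLASSES)
		classes_in_use[BIT_WORD(idx)] &= ~BIT_MASK(idx);
}

/*
 * Map a held lock's class index back to its class. An index that is out
 * of range or not marked in use is treated as garbage and yields NULL.
 */
static struct lock_class *class_from_idx(unsigned int idx)
{
	if (idx >= MAX_CLASSES || !test_bit_nonatomic(idx, classes_in_use))
		return NULL;
	return &lock_classes[idx];
}
```

A caller that uses the result without checking for NULL faults on the first field access. That is consistent with the NULL pointer dereference in a later test run in this thread, which follows a DEBUG_LOCKS_WARN_ON(1) in hlock_class().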


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

Hillf Danton

Nov 12, 2022, 10:16:34 PM
to syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On 12 Nov 2022 06:03:46 -0800
> syzbot found the following issue on:
>
> HEAD commit: 1621b6eaebf7 Merge branch 'for-next/fixes' into for-kernelci
> git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
> console output: https://syzkaller.appspot.com/x/log.txt?x=13bd511e880000
> kernel config: https://syzkaller.appspot.com/x/.config?x=606e57fd25c5c6cc
> dashboard link: https://syzkaller.appspot.com/bug?extid=d07c65298d2c15eafcb0
Add debug info.

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git 1621b6eaebf7

--- x/mm/hugetlb.c
+++ h/mm/hugetlb.c
@@ -5575,7 +5575,7 @@ static vm_fault_t hugetlb_no_page(struct
struct vm_area_struct *vma,
struct address_space *mapping, pgoff_t idx,
unsigned long address, pte_t *ptep,
- pte_t old_pte, unsigned int flags)
+ pte_t old_pte, unsigned int flags, struct mutex *flt_mutex)
{
struct hstate *h = hstate_vma(vma);
vm_fault_t ret = VM_FAULT_SIGBUS;
@@ -5768,6 +5768,7 @@ static vm_fault_t hugetlb_no_page(struct
unlock_page(page);
out:
hugetlb_vma_unlock_read(vma);
+ BUG_ON(flt_mutex != &hugetlb_fault_mutex_table[hash]);
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
return ret;

@@ -5820,6 +5821,7 @@ vm_fault_t hugetlb_fault(struct mm_struc
struct address_space *mapping;
int need_wait_lock = 0;
unsigned long haddr = address & huge_page_mask(h);
+ struct mutex *flt_mutex;

ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
if (ptep) {
@@ -5846,6 +5848,7 @@ vm_fault_t hugetlb_fault(struct mm_struc
idx = vma_hugecache_offset(h, vma, haddr);
hash = hugetlb_fault_mutex_hash(mapping, idx);
mutex_lock(&hugetlb_fault_mutex_table[hash]);
+ flt_mutex = &hugetlb_fault_mutex_table[hash];

/*
* Acquire vma lock before calling huge_pte_alloc and hold
@@ -5872,7 +5875,7 @@ vm_fault_t hugetlb_fault(struct mm_struc
* mutex internally, which make us return immediately.
*/
return hugetlb_no_page(mm, vma, mapping, idx, address, ptep,
- entry, flags);
+ entry, flags, flt_mutex);

ret = 0;

--
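Hillf's debug patch passes the mutex locked in hugetlb_fault() down to hugetlb_no_page(). Just before the unlock, it checks that the recomputed hash still selects the same table entry. The pattern can be sketched in userspace as follows. The table size and the mixing function are hypothetical stand-ins (the kernel's hugetlb_fault_mutex_hash() uses jhash2()), and assert() plays the role of BUG_ON(). In the test run that followed, the kernel crashed in lockdep inside mutex_unlock() at mm/hugetlb.c:5772, one line after the new check. This suggests that the check passed and that the mutex being unlocked was the expected one.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical table size; must be a power of two for the mask below. */
#define NUM_FAULT_MUTEXES 64

static pthread_mutex_t fault_mutex_table[NUM_FAULT_MUTEXES];

static void fault_mutex_table_init(void)
{
	for (int i = 0; i < NUM_FAULT_MUTEXES; i++)
		pthread_mutex_init(&fault_mutex_table[i], NULL);
}

/* Hypothetical stand-in for hugetlb_fault_mutex_hash(): mix (mapping, idx)
 * and mask the result down to a table slot. */
static uint32_t fault_mutex_hash(const void *mapping, unsigned long idx)
{
	uint64_t h = (uint64_t)(uintptr_t)mapping * 0x9e3779b97f4a7c15ULL;

	h ^= idx + (h >> 29);
	return (uint32_t)h & (NUM_FAULT_MUTEXES - 1);
}

/* Callee: recomputes the slot, checks it matches the mutex the caller
 * actually locked (the idea behind the BUG_ON), then unlocks it. */
static void no_page(const void *mapping, unsigned long idx,
		    pthread_mutex_t *flt_mutex)
{
	uint32_t hash = fault_mutex_hash(mapping, idx);

	assert(flt_mutex == &fault_mutex_table[hash]);
	pthread_mutex_unlock(&fault_mutex_table[hash]);
}

/* Caller: locks the slot, hands the locked mutex down, returns the slot. */
static uint32_t fault(const void *mapping, unsigned long idx)
{
	uint32_t hash = fault_mutex_hash(mapping, idx);
	pthread_mutex_t *flt_mutex = &fault_mutex_table[hash];

	pthread_mutex_lock(flt_mutex);
	no_page(mapping, idx, flt_mutex);
	return hash;
}
```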

syzbot

Nov 12, 2022, 11:41:32 PM
to hda...@sina.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
BUG: unable to handle kernel paging request in hugetlb_no_page

Unable to handle kernel paging request at virtual address 1fff800003441a18
Mem abort info:
ESR = 0x0000000096000006
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x06: level 2 translation fault
Data abort info:
ISV = 0, ISS = 0x00000006
CM = 0, WnR = 0
[1fff800003441a18] address between user and kernel address ranges
Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 4269 Comm: syz-executor.2 Not tainted 6.1.0-rc4-syzkaller-00039-g1621b6eaebf7-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
pstate: 804000c5 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : generic_test_bit include/asm-generic/bitops/generic-non-atomic.h:128 [inline]
pc : __lock_acquire+0x654/0x3084 kernel/locking/lockdep.c:5025
lr : mark_usage kernel/locking/lockdep.c:4555 [inline]
lr : __lock_acquire+0x630/0x3084 kernel/locking/lockdep.c:5009
sp : ffff8000131033d0
x29: ffff8000131034b0 x28: 0000000000000001 x27: ffff0000d2c89a68
x26: ffff0000d2c8a450 x25: ffff0000d2c8a470 x24: ffff0000d2c8a470
x23: 00000000000000c0 x22: 0000000000000001 x21: 0000000000000000
x20: ffff0000d2c89a40 x19: 555554aaabb2c422 x18: 00000000000000c0
x17: ffff80000dcdc198 x16: ffff80000db1a158 x15: ffff0000d2c89a40
x14: 0000000000000018 x13: ffff80000819fba0 x12: 00000000c73c5909
x11: ff808000095f17a4 x10: ffff80000dcdc198 x9 : 1ffffffff5765880
x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff80000801154c
x5 : 4c1501080080ffff x4 : ffff80000801154c x3 : 4c1501080080ffff
x2 : fffffffffffffff8 x1 : ffff80000cc75907 x0 : 0000000000000001
Call trace:
generic_test_bit include/asm-generic/bitops/generic-non-atomic.h:128 [inline]
__lock_acquire+0x654/0x3084 kernel/locking/lockdep.c:5025
reacquire_held_locks+0x120/0x1c0 kernel/locking/lockdep.c:5193
__lock_release kernel/locking/lockdep.c:5382 [inline]
lock_release+0x148/0x2b4 kernel/locking/lockdep.c:5688
__mutex_unlock_slowpath+0x44/0x1cc kernel/locking/mutex.c:907
mutex_unlock+0x24/0x30 kernel/locking/mutex.c:543
hugetlb_no_page+0x298/0xe38 mm/hugetlb.c:5772
hugetlb_fault+0x3d0/0xe30 mm/hugetlb.c:5877
handle_mm_fault+0x904/0xa48 mm/memory.c:5216
__do_page_fault arch/arm64/mm/fault.c:506 [inline]
do_page_fault+0x428/0x79c arch/arm64/mm/fault.c:606
do_translation_fault+0x78/0x194 arch/arm64/mm/fault.c:689
do_mem_abort+0x54/0x130 arch/arm64/mm/fault.c:825
el1_abort+0x3c/0x5c arch/arm64/kernel/entry-common.c:367
el1h_64_sync_handler+0x60/0xac arch/arm64/kernel/entry-common.c:427
el1h_64_sync+0x64/0x68 arch/arm64/kernel/entry.S:579
__arch_copy_from_user+0x1bc/0x1f4 arch/arm64/lib/copy_from_user.S:214
__import_iovec+0x60/0x248 lib/iov_iter.c:1773
import_iovec+0x6c/0x88 lib/iov_iter.c:1838
vfs_writev fs/read_write.c:931 [inline]
do_writev+0xf8/0x234 fs/read_write.c:977
__do_sys_writev fs/read_write.c:1050 [inline]
__se_sys_writev fs/read_write.c:1047 [inline]
__arm64_sys_writev+0x28/0x38 fs/read_write.c:1047
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:52 [inline]
el0_svc_common+0x138/0x220 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x48/0x164 arch/arm64/kernel/syscall.c:206
el0_svc+0x58/0x150 arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:584
Code: 350000e8 93407e69 d343fd29 927de529 (f8696949)
---[ end trace 0000000000000000 ]---
----------------
Code disassembly (best guess):
0: 350000e8 cbnz w8, 0x1c
4: 93407e69 sxtw x9, w19
8: d343fd29 lsr x9, x9, #3
c: 927de529 and x9, x9, #0x1ffffffffffffff8
* 10: f8696949 ldr x9, [x10, x9] <-- trapping instruction
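The faulting address can be recomputed from the register dump. This assumes that the "best guess" disassembly is right, that x10 holds the base of the bitmap being tested (presumably lock_classes_in_use, given the generic_test_bit frame), and that w19 holds the bit number. The low 32 bits of x19 are 0xabb2c422, which is negative as a signed 32-bit value. The word offset therefore wraps to a huge number, and the load lands between the user and kernel address ranges. The sketch below simply replays the four instructions. The register values in the test are copied from the dump above.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset computed by the first three instructions. */
static uint64_t trapping_offset(uint64_t x19)
{
	/* sxtw x9, w19: sign-extend the low 32 bits (two's complement,
	 * as on gcc/clang). */
	int64_t nr = (int32_t)(uint32_t)x19;
	/* lsr x9, x9, #3 */
	uint64_t off = (uint64_t)nr >> 3;
	/* and x9, x9, #0x1ffffffffffffff8: byte offset of the 64-bit word
	 * that holds bit nr, i.e. (nr / 64) * 8 for a non-negative nr. */
	return off & 0x1ffffffffffffff8ULL;
}

/* ldr x9, [x10, x9]: the address the trapping load reads from. */
static uint64_t trapping_address(uint64_t x10, uint64_t x19)
{
	return x10 + trapping_offset(x19);
}
```

For a plausible class index, the same arithmetic yields a small offset (bit 64 lives at byte offset 8). A value like 0x555554aaabb2c422 in x19 looks far more like corrupted data than a class index.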


Tested on:

commit: 1621b6ea Merge branch 'for-next/fixes' into for-kernelci
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=12018ac1880000
compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
userspace arch: arm64
patch: https://syzkaller.appspot.com/x/patch.diff?x=12eb8b71880000

Hillf Danton

Nov 13, 2022, 12:58:57 AM
to syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On 12 Nov 2022 06:03:46 -0800
> syzbot found the following issue on:
>
> HEAD commit: 1621b6eaebf7 Merge branch 'for-next/fixes' into for-kernelci
> git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
> console output: https://syzkaller.appspot.com/x/log.txt?x=13bd511e880000
> kernel config: https://syzkaller.appspot.com/x/.config?x=606e57fd25c5c6cc
> dashboard link: https://syzkaller.appspot.com/bug?extid=d07c65298d2c15eafcb0
See if changing the lock class key makes a difference.

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git 1621b6eaebf7

--- x/mm/hugetlb.c
+++ h/mm/hugetlb.c
@@ -4043,6 +4043,7 @@ static void __init hugetlb_sysfs_init(vo

static int __init hugetlb_init(void)
{
+ struct lock_class_key *key;
int i;

BUILD_BUG_ON(sizeof_field(struct page, private) * BITS_PER_BYTE <
@@ -4107,8 +4108,15 @@ static int __init hugetlb_init(void)
GFP_KERNEL);
BUG_ON(!hugetlb_fault_mutex_table);

- for (i = 0; i < num_fault_mutexes; i++)
+ key = kmalloc_array(num_fault_mutexes, sizeof(struct lock_class_key),
+ GFP_KERNEL);
+ BUG_ON(!key);
+
+ for (i = 0; i < num_fault_mutexes; i++) {
mutex_init(&hugetlb_fault_mutex_table[i]);
+ lockdep_register_key(&key[i]);
+ lockdep_set_class(&hugetlb_fault_mutex_table[i], &key[i]);
+ }
return 0;
}
subsys_initcall(hugetlb_init);
--

syzbot

Nov 13, 2022, 5:09:21 AM
to hda...@sina.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
WARNING: locking bug in hugetlb_no_page

------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(1)
WARNING: CPU: 1 PID: 3786 at kernel/locking/lockdep.c:231 check_wait_context kernel/locking/lockdep.c:4729 [inline]
WARNING: CPU: 1 PID: 3786 at kernel/locking/lockdep.c:231 __lock_acquire+0x2b0/0x3084 kernel/locking/lockdep.c:5005
Modules linked in:
CPU: 1 PID: 3786 Comm: syz-executor.1 Not tainted 6.1.0-rc4-syzkaller-00039-g1621b6eaebf7-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : check_wait_context kernel/locking/lockdep.c:4729 [inline]
pc : __lock_acquire+0x2b0/0x3084 kernel/locking/lockdep.c:5005
lr : hlock_class kernel/locking/lockdep.c:231 [inline]
lr : check_wait_context kernel/locking/lockdep.c:4729 [inline]
lr : __lock_acquire+0x298/0x3084 kernel/locking/lockdep.c:5005
sp : ffff80001301b3e0
x29: ffff80001301b4c0 x28: 0000000000000001 x27: ffff0000cfbbb4a8
x26: ffff0000d342ea78 x25: ffff0000cfbbbeb0 x24: 0000000000000000
x23: 0000000000000000 x22: 0000000000000001 x21: 0000000000000000
x20: 0000000000000001 x19: aaaaaa0001076c5e x18: 00000000000000c0
x17: ffff80000dcdc198 x16: ffff80000db1a158 x15: ffff0000cfbbb480
x14: 0000000000000000 x13: 0000000000000012 x12: ffff80000d86ff30
x11: ff808000081c06c8 x10: ffff80000dcdc198 x9 : 3a226953cce2cb00
x8 : 0000000000000000 x7 : 4e5241575f534b43 x6 : ffff80000c01775c
x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000
x2 : 0000000000000000 x1 : 0000000100000000 x0 : 0000000000000016
Call trace:
check_wait_context kernel/locking/lockdep.c:4729 [inline]
__lock_acquire+0x2b0/0x3084 kernel/locking/lockdep.c:5005
reacquire_held_locks+0x120/0x1c0 kernel/locking/lockdep.c:5193
__lock_release kernel/locking/lockdep.c:5382 [inline]
lock_release+0x148/0x2b4 kernel/locking/lockdep.c:5688
__mutex_unlock_slowpath+0x44/0x1cc kernel/locking/mutex.c:907
mutex_unlock+0x24/0x30 kernel/locking/mutex.c:543
hugetlb_no_page+0x284/0xe1c mm/hugetlb.c:5779
hugetlb_fault+0x3a0/0xdfc mm/hugetlb.c:5882
handle_mm_fault+0x904/0xa48 mm/memory.c:5216
__do_page_fault arch/arm64/mm/fault.c:506 [inline]
do_page_fault+0x428/0x79c arch/arm64/mm/fault.c:606
do_translation_fault+0x78/0x194 arch/arm64/mm/fault.c:689
do_mem_abort+0x54/0x130 arch/arm64/mm/fault.c:825
el1_abort+0x3c/0x5c arch/arm64/kernel/entry-common.c:367
el1h_64_sync_handler+0x60/0xac arch/arm64/kernel/entry-common.c:427
el1h_64_sync+0x64/0x68 arch/arm64/kernel/entry.S:579
__arch_copy_from_user+0x24/0x1f4 arch/arm64/lib/copy_from_user.S:77
__import_iovec+0x60/0x248 lib/iov_iter.c:1773
import_iovec+0x6c/0x88 lib/iov_iter.c:1838
vfs_writev fs/read_write.c:931 [inline]
do_writev+0xf8/0x234 fs/read_write.c:977
__do_sys_writev fs/read_write.c:1050 [inline]
__se_sys_writev fs/read_write.c:1047 [inline]
__arm64_sys_writev+0x28/0x38 fs/read_write.c:1047
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:52 [inline]
el0_svc_common+0x138/0x220 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x48/0x164 arch/arm64/kernel/syscall.c:206
el0_svc+0x58/0x150 arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:584
irq event stamp: 41
hardirqs last enabled at (41): [<ffff80000c01c86c>] __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:159 [inline]
hardirqs last enabled at (41): [<ffff80000c01c86c>] _raw_spin_unlock_irq+0x3c/0x70 kernel/locking/spinlock.c:202
hardirqs last disabled at (40): [<ffff80000c01c66c>] __raw_spin_lock_irq include/linux/spinlock_api_smp.h:117 [inline]
hardirqs last disabled at (40): [<ffff80000c01c66c>] _raw_spin_lock_irq+0x34/0x9c kernel/locking/spinlock.c:170
softirqs last enabled at (8): [<ffff80000801c38c>] local_bh_enable+0x10/0x34 include/linux/bottom_half.h:32
softirqs last disabled at (6): [<ffff80000801c358>] local_bh_disable+0x10/0x34 include/linux/bottom_half.h:19
---[ end trace 0000000000000000 ]---
BUG: sleeping function called from invalid context at arch/arm64/mm/fault.c:597
in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 3786, name: syz-executor.1
preempt_count: 0, expected: 0
RCU nest depth: 0, expected: 0
INFO: lockdep is turned off.
irq event stamp: 41
hardirqs last enabled at (41): [<ffff80000c01c86c>] __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:159 [inline]
hardirqs last enabled at (41): [<ffff80000c01c86c>] _raw_spin_unlock_irq+0x3c/0x70 kernel/locking/spinlock.c:202
hardirqs last disabled at (40): [<ffff80000c01c66c>] __raw_spin_lock_irq include/linux/spinlock_api_smp.h:117 [inline]
hardirqs last disabled at (40): [<ffff80000c01c66c>] _raw_spin_lock_irq+0x34/0x9c kernel/locking/spinlock.c:170
softirqs last enabled at (8): [<ffff80000801c38c>] local_bh_enable+0x10/0x34 include/linux/bottom_half.h:32
softirqs last disabled at (6): [<ffff80000801c358>] local_bh_disable+0x10/0x34 include/linux/bottom_half.h:19
CPU: 1 PID: 3786 Comm: syz-executor.1 Tainted: G W 6.1.0-rc4-syzkaller-00039-g1621b6eaebf7-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
Call trace:
dump_backtrace+0x1c4/0x1f0 arch/arm64/kernel/stacktrace.c:156
show_stack+0x2c/0x54 arch/arm64/kernel/stacktrace.c:163
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x104/0x16c lib/dump_stack.c:106
dump_stack+0x1c/0x58 lib/dump_stack.c:113
__might_resched+0x208/0x218 kernel/sched/core.c:9890
__might_sleep+0x48/0x78 kernel/sched/core.c:9819
do_page_fault+0x214/0x79c arch/arm64/mm/fault.c:597
do_translation_fault+0x78/0x194 arch/arm64/mm/fault.c:689
do_mem_abort+0x54/0x130 arch/arm64/mm/fault.c:825
el1_abort+0x3c/0x5c arch/arm64/kernel/entry-common.c:367
el1h_64_sync_handler+0x60/0xac arch/arm64/kernel/entry-common.c:427
el1h_64_sync+0x64/0x68 arch/arm64/kernel/entry.S:579
hlock_class kernel/locking/lockdep.c:222 [inline]
check_wait_context kernel/locking/lockdep.c:4730 [inline]
__lock_acquire+0x2d0/0x3084 kernel/locking/lockdep.c:5005
reacquire_held_locks+0x120/0x1c0 kernel/locking/lockdep.c:5193
__lock_release kernel/locking/lockdep.c:5382 [inline]
lock_release+0x148/0x2b4 kernel/locking/lockdep.c:5688
__mutex_unlock_slowpath+0x44/0x1cc kernel/locking/mutex.c:907
mutex_unlock+0x24/0x30 kernel/locking/mutex.c:543
hugetlb_no_page+0x284/0xe1c mm/hugetlb.c:5779
hugetlb_fault+0x3a0/0xdfc mm/hugetlb.c:5882
handle_mm_fault+0x904/0xa48 mm/memory.c:5216
__do_page_fault arch/arm64/mm/fault.c:506 [inline]
do_page_fault+0x428/0x79c arch/arm64/mm/fault.c:606
do_translation_fault+0x78/0x194 arch/arm64/mm/fault.c:689
do_mem_abort+0x54/0x130 arch/arm64/mm/fault.c:825
el1_abort+0x3c/0x5c arch/arm64/kernel/entry-common.c:367
el1h_64_sync_handler+0x60/0xac arch/arm64/kernel/entry-common.c:427
el1h_64_sync+0x64/0x68 arch/arm64/kernel/entry.S:579
__arch_copy_from_user+0x24/0x1f4 arch/arm64/lib/copy_from_user.S:77
__import_iovec+0x60/0x248 lib/iov_iter.c:1773
import_iovec+0x6c/0x88 lib/iov_iter.c:1838
vfs_writev fs/read_write.c:931 [inline]
do_writev+0xf8/0x234 fs/read_write.c:977
__do_sys_writev fs/read_write.c:1050 [inline]
__se_sys_writev fs/read_write.c:1047 [inline]
__arm64_sys_writev+0x28/0x38 fs/read_write.c:1047
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:52 [inline]
el0_svc_common+0x138/0x220 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x48/0x164 arch/arm64/kernel/syscall.c:206
el0_svc+0x58/0x150 arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:584
Unable to handle kernel NULL pointer dereference at virtual address 00000000000000b8
Mem abort info:
ESR = 0x0000000096000006
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x06: level 2 translation fault
Data abort info:
ISV = 0, ISS = 0x00000006
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=000000011345f000
[00000000000000b8] pgd=080000011347a003, p4d=080000011347a003, pud=080000011347e003, pmd=0000000000000000
Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 PID: 3786 Comm: syz-executor.1 Tainted: G W 6.1.0-rc4-syzkaller-00039-g1621b6eaebf7-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : check_wait_context kernel/locking/lockdep.c:4729 [inline]
pc : __lock_acquire+0x2d0/0x3084 kernel/locking/lockdep.c:5005
lr : hlock_class kernel/locking/lockdep.c:231 [inline]
lr : check_wait_context kernel/locking/lockdep.c:4729 [inline]
lr : __lock_acquire+0x298/0x3084 kernel/locking/lockdep.c:5005
sp : ffff80001301b3e0
x29: ffff80001301b4c0 x28: 0000000000000001 x27: ffff0000cfbbb4a8
x26: ffff0000d342ea78 x25: ffff0000cfbbbeb0 x24: 0000000000000000
x23: 0000000000000000 x22: 0000000000000001 x21: 0000000000000000
x20: 0000000000000001 x19: aaaaaa0001076c5e x18: 00000000000000c0
x17: ffff80000dcdc198 x16: ffff80000db1a158 x15: ffff0000cfbbb480
x14: 0000000000000000 x13: 0000000000000012 x12: ffff80000d86ff30
x11: ff808000081c06c8 x10: ffff80000dcdc198 x9 : 0000000000050c5e
x8 : 0000000000000000 x7 : 4e5241575f534b43 x6 : ffff80000c01775c
x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000
x2 : 0000000000000000 x1 : 0000000100000000 x0 : 0000000000000016
Call trace:
hlock_class kernel/locking/lockdep.c:222 [inline]
check_wait_context kernel/locking/lockdep.c:4730 [inline]
__lock_acquire+0x2d0/0x3084 kernel/locking/lockdep.c:5005
reacquire_held_locks+0x120/0x1c0 kernel/locking/lockdep.c:5193
__lock_release kernel/locking/lockdep.c:5382 [inline]
lock_release+0x148/0x2b4 kernel/locking/lockdep.c:5688
__mutex_unlock_slowpath+0x44/0x1cc kernel/locking/mutex.c:907
mutex_unlock+0x24/0x30 kernel/locking/mutex.c:543
hugetlb_no_page+0x284/0xe1c mm/hugetlb.c:5779
hugetlb_fault+0x3a0/0xdfc mm/hugetlb.c:5882
handle_mm_fault+0x904/0xa48 mm/memory.c:5216
__do_page_fault arch/arm64/mm/fault.c:506 [inline]
do_page_fault+0x428/0x79c arch/arm64/mm/fault.c:606
do_translation_fault+0x78/0x194 arch/arm64/mm/fault.c:689
do_mem_abort+0x54/0x130 arch/arm64/mm/fault.c:825
el1_abort+0x3c/0x5c arch/arm64/kernel/entry-common.c:367
el1h_64_sync_handler+0x60/0xac arch/arm64/kernel/entry-common.c:427
el1h_64_sync+0x64/0x68 arch/arm64/kernel/entry.S:579
__arch_copy_from_user+0x24/0x1f4 arch/arm64/lib/copy_from_user.S:77
__import_iovec+0x60/0x248 lib/iov_iter.c:1773
import_iovec+0x6c/0x88 lib/iov_iter.c:1838
vfs_writev fs/read_write.c:931 [inline]
do_writev+0xf8/0x234 fs/read_write.c:977
__do_sys_writev fs/read_write.c:1050 [inline]
__se_sys_writev fs/read_write.c:1047 [inline]
__arm64_sys_writev+0x28/0x38 fs/read_write.c:1047
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:52 [inline]
el0_svc_common+0x138/0x220 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x48/0x164 arch/arm64/kernel/syscall.c:206
el0_svc+0x58/0x150 arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:584
Code: d002da0a 91056210 9106614a b9400329 (3942e114)
---[ end trace 0000000000000000 ]---
----------------
Code disassembly (best guess):
0: d002da0a adrp x10, 0x5b42000
4: 91056210 add x16, x16, #0x158
8: 9106614a add x10, x10, #0x198
c: b9400329 ldr w9, [x25]
* 10: 3942e114 ldrb w20, [x8, #184] <-- trapping instruction


Tested on:

commit: 1621b6ea Merge branch 'for-next/fixes' into for-kernelci
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=15b39c85880000
compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
userspace arch: arm64
patch: https://syzkaller.appspot.com/x/patch.diff?x=14fbe185880000

Hillf Danton

Nov 13, 2022, 6:53:03 AM
to syzbot, linux-...@vger.kernel.org, syzkall...@googlegroups.com
On 12 Nov 2022 06:03:46 -0800
> syzbot found the following issue on:
>
> HEAD commit: 1621b6eaebf7 Merge branch 'for-next/fixes' into for-kernelci
> git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
> console output: https://syzkaller.appspot.com/x/log.txt?x=13bd511e880000
> kernel config: https://syzkaller.appspot.com/x/.config?x=606e57fd25c5c6cc
> dashboard link: https://syzkaller.appspot.com/bug?extid=d07c65298d2c15eafcb0
See if turning the vma lock into a no-op makes a difference.

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git 1621b6eaebf7

--- x/mm/hugetlb.c
+++ h/mm/hugetlb.c
@@ -6810,6 +6810,7 @@ static bool __vma_shareable_flags_pmd(st

void hugetlb_vma_lock_read(struct vm_area_struct *vma)
{
+ return;
if (__vma_shareable_flags_pmd(vma)) {
struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;

@@ -6819,6 +6820,7 @@ void hugetlb_vma_lock_read(struct vm_are

void hugetlb_vma_unlock_read(struct vm_area_struct *vma)
{
+ return;
if (__vma_shareable_flags_pmd(vma)) {
struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;

--

Dmitry Vyukov

Nov 13, 2022, 10:36:30 AM
to syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, ll...@lists.linux.dev, mike.k...@oracle.com, nat...@kernel.org, ndesau...@google.com, songm...@bytedance.com, syzkall...@googlegroups.com, tr...@redhat.com, Hillf Danton
On Sat, 12 Nov 2022 at 15:03, syzbot
<syzbot+d07c65...@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 1621b6eaebf7 Merge branch 'for-next/fixes' into for-kernelci
> git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
> console output: https://syzkaller.appspot.com/x/log.txt?x=13bd511e880000
> kernel config: https://syzkaller.appspot.com/x/.config?x=606e57fd25c5c6cc
> dashboard link: https://syzkaller.appspot.com/bug?extid=d07c65298d2c15eafcb0
> compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
> userspace arch: arm64
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13315856880000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=173614d1880000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/82aa7741098d/disk-1621b6ea.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/f6be08c4e4c2/vmlinux-1621b6ea.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/296b6946258a/Image-1621b6ea.gz.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+d07c65...@syzkaller.appspotmail.com

This may have the same root cause as:

possible deadlock in hugetlb_fault
https://lore.kernel.org/all/CACT4Y+ZWNV6ApzEv0UrsF2T8...@mail.gmail.com/

and there is a potential explanation as to what may be the problem.

Mike Kravetz

Nov 13, 2022, 1:50:49 PM
to Dmitry Vyukov, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, songm...@bytedance.com, syzkall...@googlegroups.com, tr...@redhat.com, Hillf Danton
On 11/13/22 16:36, Dmitry Vyukov wrote:
> On Sat, 12 Nov 2022 at 15:03, syzbot
> <syzbot+d07c65...@syzkaller.appspotmail.com> wrote:
> >
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: 1621b6eaebf7 Merge branch 'for-next/fixes' into for-kernelci
> > git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
> > console output: https://syzkaller.appspot.com/x/log.txt?x=13bd511e880000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=606e57fd25c5c6cc
> > dashboard link: https://syzkaller.appspot.com/bug?extid=d07c65298d2c15eafcb0
> > compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
> > userspace arch: arm64
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13315856880000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=173614d1880000
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/82aa7741098d/disk-1621b6ea.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/f6be08c4e4c2/vmlinux-1621b6ea.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/296b6946258a/Image-1621b6ea.gz.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+d07c65...@syzkaller.appspotmail.com
>
> This may have the same root cause as:
>
> possible deadlock in hugetlb_fault
> https://lore.kernel.org/all/CACT4Y+ZWNV6ApzEv0UrsF2T8...@mail.gmail.com/
>
> and there is a potential explanation as to what may be the problem.

Thanks Dmitry!

An issue with this new hugetlb locking was previously reported and I have been
working on a solution. When I look at the reproducer, I see that it is calling
madvise(MADV_DONTNEED). This triggers the other issue and could certainly
cause the issue reported here.
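For readers unfamiliar with the call Mike mentions, the sketch below shows the basic sequence on a shared hugetlb mapping. It faults a huge page in, zaps it with madvise(MADV_DONTNEED), and touches it again so that the kernel takes the hugetlb_fault() and hugetlb_no_page() path seen in the traces. This is not the syzbot reproducer, and because it is single-threaded it cannot hit the race discussed here. It assumes a 2 MiB default huge page size and huge pages reserved in advance (for example through /proc/sys/vm/nr_hugepages). When mmap() or madvise() refuses (no huge pages, or a kernel without MADV_DONTNEED support for hugetlb), the function returns -1.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>

/* Assumed default huge page size (2 MiB with 4K base pages). */
#define HPAGE_SIZE (2UL * 1024 * 1024)

/* Returns 0 if the data survived the zap, 1 if it did not, and -1 if
 * hugetlb mappings or hugetlb MADV_DONTNEED are unavailable here. */
static int dontneed_shared_hugetlb(void)
{
	char *p = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (p == MAP_FAILED)
		return -1;		/* e.g. no huge pages reserved */

	memset(p, 0x5a, HPAGE_SIZE);	/* fault the huge page in */

	/* Drop the page table entries for the range. */
	if (madvise(p, HPAGE_SIZE, MADV_DONTNEED) != 0) {
		munmap(p, HPAGE_SIZE);
		return -1;		/* e.g. unsupported on this kernel */
	}

	/* This access faults again; the kernel should find the page in the
	 * mapping's page cache rather than allocating a fresh one. */
	int ok = p[0] == 0x5a && p[HPAGE_SIZE - 1] == 0x5a;

	munmap(p, HPAGE_SIZE);
	return ok ? 0 : 1;
}
```

Because the mapping is shared, the zapped page should remain in the mapping's backing hugetlbfs file. The second access is therefore expected to see the old contents rather than zeroes.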

Proposed patches are here and in next-20221111:
https://lore.kernel.org/linux-mm/20221111232628.290...@oracle.com/

I am currently trying to run the reproducer, but it is not reproducing quickly.
Since this is a timing issue, that is as expected. Interesting that this
report is run on arm64 and I am trying to reproduce on x86. Although, the
issue is not architecture specific in any way.

I'll keep looking, but am fairly confident this is the root cause.
--
Mike Kravetz

syzbot

Nov 13, 2022, 3:30:28 PM
to hda...@sina.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: rcu detected stall in corrupted

rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P3507 } 2668 jiffies s: 2069 root: 0x0/T
rcu: blocking rcu_node structures (internal RCU debug):


Tested on:

commit: 1621b6ea Merge branch 'for-next/fixes' into for-kernelci
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=108ca515880000
compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
userspace arch: arm64
patch: https://syzkaller.appspot.com/x/patch.diff?x=174f46d1880000

Hillf Danton

Nov 13, 2022, 8:26:57 PM
to Mike Kravetz, Dmitry Vyukov, syzbot, linux-...@vger.kernel.org, linu...@kvack.org, syzkall...@googlegroups.com
On 13 Nov 2022 10:50:37 -0800 Mike Kravetz <mike.k...@oracle.com>
Thanks for your fix.
>
> I am currently trying to run the reproducer, but it is not reproducing quickly.
> Since this is a timing issue, that is as expected. Interesting that this
> report is run on arm64 and I am trying to reproduce on x86. Although, the
> issue is not architecture specific in any way.

Syzbot is good at testing patches; take a look at [1,2] for how to submit
a patch to the bot. Have fun.

[1] https://lore.kernel.org/lkml/YtlbkmVG...@rowland.harvard.edu/
[2] https://lore.kernel.org/lkml/fa23ffc2-755e-7e04...@kernel.dk/

BTW, I prefer Alan's way, with the patch attached directly in the reply to the report.

Hillf

Mike Kravetz

Nov 13, 2022, 9:24:31 PM
to Dmitry Vyukov, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, songm...@bytedance.com, syzkall...@googlegroups.com, tr...@redhat.com, Hillf Danton
After tweaking my config, I was able to reliably reproduce.

> I'll keep looking, but am fairly confident this is the root cause.

I was also able to verify the series above addresses the issue.

--
Mike Kravetz

Dmitry Vyukov

Nov 14, 2022, 4:59:23 AM
to Mike Kravetz, syzbot, ak...@linux-foundation.org, linux-...@vger.kernel.org, linu...@kvack.org, ll...@lists.linux.dev, nat...@kernel.org, ndesau...@google.com, songm...@bytedance.com, syzkall...@googlegroups.com, tr...@redhat.com, Hillf Danton
Let's tell syzbot about the fix so that it reports similar issues in future:

#syz fix:
hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing