possible deadlock in __wake_up_common_lock

syzbot

Jan 2, 2019, 3:51:04 AM
to aarc...@redhat.com, ak...@linux-foundation.org, kirill....@linux.intel.com, linux-...@vger.kernel.org, linu...@kvack.org, li...@dominikbrodowski.net, mho...@suse.com, rien...@google.com, syzkall...@googlegroups.com, vba...@suse.cz, xieyi...@huawei.com, zhong...@huawei.com
Hello,

syzbot found the following crash on:

HEAD commit: f346b0becb1b Merge branch 'akpm' (patches from Andrew)
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1510cefd400000
kernel config: https://syzkaller.appspot.com/x/.config?x=c255c77ba370fe7c
dashboard link: https://syzkaller.appspot.com/bug?extid=93d94a001cfbce9e60e1
compiler: gcc (GCC) 8.0.1 20180413 (experimental)
userspace arch: i386

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+93d94a...@syzkaller.appspotmail.com


======================================================
WARNING: possible circular locking dependency detected
4.20.0+ #297 Not tainted
------------------------------------------------------
syz-executor0/8529 is trying to acquire lock:
000000005e7fb829 (&pgdat->kswapd_wait){....}, at:
__wake_up_common_lock+0x19e/0x330 kernel/sched/wait.c:120

but task is already holding lock:
000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: spin_lock
include/linux/spinlock.h:329 [inline]
000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue_bulk
mm/page_alloc.c:2548 [inline]
000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: __rmqueue_pcplist
mm/page_alloc.c:3021 [inline]
000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue_pcplist
mm/page_alloc.c:3050 [inline]
000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue
mm/page_alloc.c:3072 [inline]
000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at:
get_page_from_freelist+0x1bae/0x52a0 mm/page_alloc.c:3491

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #4 (&(&zone->lock)->rlock){-.-.}:
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x99/0xd0 kernel/locking/spinlock.c:152
rmqueue mm/page_alloc.c:3082 [inline]
get_page_from_freelist+0x9eb/0x52a0 mm/page_alloc.c:3491
__alloc_pages_nodemask+0x4f3/0xde0 mm/page_alloc.c:4529
__alloc_pages include/linux/gfp.h:473 [inline]
alloc_page_interleave+0x25/0x1c0 mm/mempolicy.c:1988
alloc_pages_current+0x1bf/0x210 mm/mempolicy.c:2104
alloc_pages include/linux/gfp.h:509 [inline]
depot_save_stack+0x3f1/0x470 lib/stackdepot.c:260
save_stack+0xa9/0xd0 mm/kasan/common.c:79
set_track mm/kasan/common.c:85 [inline]
kasan_kmalloc+0xcb/0xd0 mm/kasan/common.c:482
kasan_slab_alloc+0x12/0x20 mm/kasan/common.c:397
kmem_cache_alloc+0x130/0x730 mm/slab.c:3541
kmem_cache_zalloc include/linux/slab.h:731 [inline]
fill_pool lib/debugobjects.c:134 [inline]
__debug_object_init+0xbb8/0x1290 lib/debugobjects.c:379
debug_object_init lib/debugobjects.c:431 [inline]
debug_object_activate+0x323/0x600 lib/debugobjects.c:512
debug_timer_activate kernel/time/timer.c:708 [inline]
debug_activate kernel/time/timer.c:763 [inline]
__mod_timer kernel/time/timer.c:1040 [inline]
mod_timer kernel/time/timer.c:1101 [inline]
add_timer+0x50e/0x1490 kernel/time/timer.c:1137
__queue_delayed_work+0x249/0x380 kernel/workqueue.c:1533
queue_delayed_work_on+0x1a2/0x1f0 kernel/workqueue.c:1558
queue_delayed_work include/linux/workqueue.h:527 [inline]
schedule_delayed_work include/linux/workqueue.h:628 [inline]
start_dirtytime_writeback+0x4e/0x53 fs/fs-writeback.c:2043
do_one_initcall+0x145/0x957 init/main.c:889
do_initcall_level init/main.c:957 [inline]
do_initcalls init/main.c:965 [inline]
do_basic_setup init/main.c:983 [inline]
kernel_init_freeable+0x4c1/0x5af init/main.c:1136
kernel_init+0x11/0x1ae init/main.c:1056
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352

-> #3 (&base->lock){-.-.}:
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x99/0xd0 kernel/locking/spinlock.c:152
lock_timer_base+0xbb/0x2b0 kernel/time/timer.c:937
__mod_timer kernel/time/timer.c:1009 [inline]
mod_timer kernel/time/timer.c:1101 [inline]
add_timer+0x895/0x1490 kernel/time/timer.c:1137
__queue_delayed_work+0x249/0x380 kernel/workqueue.c:1533
queue_delayed_work_on+0x1a2/0x1f0 kernel/workqueue.c:1558
queue_delayed_work include/linux/workqueue.h:527 [inline]
schedule_delayed_work include/linux/workqueue.h:628 [inline]
psi_group_change kernel/sched/psi.c:485 [inline]
psi_task_change+0x3f1/0x5f0 kernel/sched/psi.c:534
psi_enqueue kernel/sched/stats.h:82 [inline]
enqueue_task kernel/sched/core.c:727 [inline]
activate_task+0x21a/0x430 kernel/sched/core.c:751
wake_up_new_task+0x527/0xd20 kernel/sched/core.c:2423
_do_fork+0x33b/0x11d0 kernel/fork.c:2247
kernel_thread+0x34/0x40 kernel/fork.c:2281
rest_init+0x28/0x372 init/main.c:409
arch_call_rest_init+0xe/0x1b
start_kernel+0x873/0x8ae init/main.c:741
x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:470
x86_64_start_kernel+0x76/0x79 arch/x86/kernel/head64.c:451
secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243

-> #2 (&rq->lock){-.-.}:
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x2d/0x40 kernel/locking/spinlock.c:144
rq_lock kernel/sched/sched.h:1149 [inline]
task_fork_fair+0xb0/0x6d0 kernel/sched/fair.c:10083
sched_fork+0x443/0xba0 kernel/sched/core.c:2359
copy_process+0x25b9/0x8790 kernel/fork.c:1893
_do_fork+0x1cb/0x11d0 kernel/fork.c:2222
kernel_thread+0x34/0x40 kernel/fork.c:2281
rest_init+0x28/0x372 init/main.c:409
arch_call_rest_init+0xe/0x1b
start_kernel+0x873/0x8ae init/main.c:741
x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:470
x86_64_start_kernel+0x76/0x79 arch/x86/kernel/head64.c:451
secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243

-> #1 (&p->pi_lock){-.-.}:
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x99/0xd0 kernel/locking/spinlock.c:152
try_to_wake_up+0xdc/0x1460 kernel/sched/core.c:1965
default_wake_function+0x30/0x50 kernel/sched/core.c:3710
autoremove_wake_function+0x80/0x370 kernel/sched/wait.c:375
__wake_up_common+0x1d7/0x7d0 kernel/sched/wait.c:92
__wake_up_common_lock+0x1c2/0x330 kernel/sched/wait.c:121
__wake_up+0xe/0x10 kernel/sched/wait.c:145
wakeup_kswapd+0x5f0/0x930 mm/vmscan.c:3982
wake_all_kswapds+0x150/0x300 mm/page_alloc.c:3975
__alloc_pages_slowpath+0x1ff1/0x2db0 mm/page_alloc.c:4246
__alloc_pages_nodemask+0xa89/0xde0 mm/page_alloc.c:4549
alloc_pages_current+0x10c/0x210 mm/mempolicy.c:2106
alloc_pages include/linux/gfp.h:509 [inline]
__get_free_pages+0xc/0x40 mm/page_alloc.c:4573
pte_alloc_one_kernel+0x15/0x20 arch/x86/mm/pgtable.c:28
__pte_alloc_kernel+0x23/0x220 mm/memory.c:439
vmap_pte_range mm/vmalloc.c:144 [inline]
vmap_pmd_range mm/vmalloc.c:171 [inline]
vmap_pud_range mm/vmalloc.c:188 [inline]
vmap_p4d_range mm/vmalloc.c:205 [inline]
vmap_page_range_noflush+0x878/0xa80 mm/vmalloc.c:230
vmap_page_range mm/vmalloc.c:243 [inline]
vm_map_ram+0x46c/0xf60 mm/vmalloc.c:1181
ion_heap_clear_pages+0x2a/0x70
drivers/staging/android/ion/ion_heap.c:100
ion_heap_sglist_zero+0x24f/0x2d0
drivers/staging/android/ion/ion_heap.c:121
ion_heap_buffer_zero+0xf8/0x150
drivers/staging/android/ion/ion_heap.c:143
ion_system_heap_free+0x227/0x290
drivers/staging/android/ion/ion_system_heap.c:163
ion_buffer_destroy+0x15c/0x1c0 drivers/staging/android/ion/ion.c:119
_ion_heap_freelist_drain+0x43e/0x6a0
drivers/staging/android/ion/ion_heap.c:199
ion_heap_freelist_drain+0x1f/0x30
drivers/staging/android/ion/ion_heap.c:209
ion_buffer_create drivers/staging/android/ion/ion.c:86 [inline]
ion_alloc+0x487/0xa60 drivers/staging/android/ion/ion.c:409
ion_ioctl+0x216/0x41e drivers/staging/android/ion/ion-ioctl.c:76
__do_compat_sys_ioctl fs/compat_ioctl.c:1052 [inline]
__se_compat_sys_ioctl fs/compat_ioctl.c:998 [inline]
__ia32_compat_sys_ioctl+0x20e/0x630 fs/compat_ioctl.c:998
do_syscall_32_irqs_on arch/x86/entry/common.c:326 [inline]
do_fast_syscall_32+0x34d/0xfb2 arch/x86/entry/common.c:397
entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139

-> #0 (&pgdat->kswapd_wait){....}:
lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3841
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x99/0xd0 kernel/locking/spinlock.c:152
__wake_up_common_lock+0x19e/0x330 kernel/sched/wait.c:120
__wake_up+0xe/0x10 kernel/sched/wait.c:145
wakeup_kswapd+0x5f0/0x930 mm/vmscan.c:3982
steal_suitable_fallback+0x538/0x830 mm/page_alloc.c:2217
__rmqueue_fallback mm/page_alloc.c:2502 [inline]
__rmqueue mm/page_alloc.c:2528 [inline]
rmqueue_bulk mm/page_alloc.c:2550 [inline]
__rmqueue_pcplist mm/page_alloc.c:3021 [inline]
rmqueue_pcplist mm/page_alloc.c:3050 [inline]
rmqueue mm/page_alloc.c:3072 [inline]
get_page_from_freelist+0x318c/0x52a0 mm/page_alloc.c:3491
__alloc_pages_nodemask+0x4f3/0xde0 mm/page_alloc.c:4529
alloc_pages_current+0x10c/0x210 mm/mempolicy.c:2106
alloc_pages include/linux/gfp.h:509 [inline]
__get_free_pages+0xc/0x40 mm/page_alloc.c:4573
tlb_next_batch mm/mmu_gather.c:29 [inline]
__tlb_remove_page_size+0x2e5/0x500 mm/mmu_gather.c:133
__tlb_remove_page include/asm-generic/tlb.h:187 [inline]
zap_pte_range mm/memory.c:1093 [inline]
zap_pmd_range mm/memory.c:1192 [inline]
zap_pud_range mm/memory.c:1221 [inline]
zap_p4d_range mm/memory.c:1242 [inline]
unmap_page_range+0xf88/0x25b0 mm/memory.c:1263
unmap_single_vma+0x19b/0x310 mm/memory.c:1308
unmap_vmas+0x221/0x390 mm/memory.c:1339
exit_mmap+0x2be/0x590 mm/mmap.c:3140
__mmput kernel/fork.c:1051 [inline]
mmput+0x247/0x610 kernel/fork.c:1072
exit_mm kernel/exit.c:545 [inline]
do_exit+0xdeb/0x2620 kernel/exit.c:854
do_group_exit+0x177/0x440 kernel/exit.c:970
get_signal+0x8b0/0x1980 kernel/signal.c:2517
do_signal+0x9c/0x21c0 arch/x86/kernel/signal.c:816
exit_to_usermode_loop+0x2e5/0x380 arch/x86/entry/common.c:162
prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
do_syscall_32_irqs_on arch/x86/entry/common.c:341 [inline]
do_fast_syscall_32+0xcd5/0xfb2 arch/x86/entry/common.c:397
entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139

other info that might help us debug this:

Chain exists of:
&pgdat->kswapd_wait --> &base->lock --> &(&zone->lock)->rlock

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&(&zone->lock)->rlock);
                               lock(&base->lock);
                               lock(&(&zone->lock)->rlock);
  lock(&pgdat->kswapd_wait);

*** DEADLOCK ***

2 locks held by syz-executor0/8529:
#0: 000000001be7b4ca (&(ptlock_ptr(page))->rlock#2){+.+.}, at: spin_lock
include/linux/spinlock.h:329 [inline]
#0: 000000001be7b4ca (&(ptlock_ptr(page))->rlock#2){+.+.}, at:
zap_pte_range mm/memory.c:1051 [inline]
#0: 000000001be7b4ca (&(ptlock_ptr(page))->rlock#2){+.+.}, at:
zap_pmd_range mm/memory.c:1192 [inline]
#0: 000000001be7b4ca (&(ptlock_ptr(page))->rlock#2){+.+.}, at:
zap_pud_range mm/memory.c:1221 [inline]
#0: 000000001be7b4ca (&(ptlock_ptr(page))->rlock#2){+.+.}, at:
zap_p4d_range mm/memory.c:1242 [inline]
#0: 000000001be7b4ca (&(ptlock_ptr(page))->rlock#2){+.+.}, at:
unmap_page_range+0x98e/0x25b0 mm/memory.c:1263
#1: 000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: spin_lock
include/linux/spinlock.h:329 [inline]
#1: 000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue_bulk
mm/page_alloc.c:2548 [inline]
#1: 000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: __rmqueue_pcplist
mm/page_alloc.c:3021 [inline]
#1: 000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue_pcplist
mm/page_alloc.c:3050 [inline]
#1: 000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue
mm/page_alloc.c:3072 [inline]
#1: 000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at:
get_page_from_freelist+0x1bae/0x52a0 mm/page_alloc.c:3491

stack backtrace:
CPU: 0 PID: 8529 Comm: syz-executor0 Not tainted 4.20.0+ #297
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1d3/0x2c6 lib/dump_stack.c:113
print_circular_bug.isra.34.cold.56+0x1bd/0x27d
kernel/locking/lockdep.c:1224
check_prev_add kernel/locking/lockdep.c:1866 [inline]
check_prevs_add kernel/locking/lockdep.c:1979 [inline]
validate_chain kernel/locking/lockdep.c:2350 [inline]
__lock_acquire+0x3360/0x4c20 kernel/locking/lockdep.c:3338
lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3841
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x99/0xd0 kernel/locking/spinlock.c:152
__wake_up_common_lock+0x19e/0x330 kernel/sched/wait.c:120
__wake_up+0xe/0x10 kernel/sched/wait.c:145
wakeup_kswapd+0x5f0/0x930 mm/vmscan.c:3982
steal_suitable_fallback+0x538/0x830 mm/page_alloc.c:2217
__rmqueue_fallback mm/page_alloc.c:2502 [inline]
__rmqueue mm/page_alloc.c:2528 [inline]
rmqueue_bulk mm/page_alloc.c:2550 [inline]
__rmqueue_pcplist mm/page_alloc.c:3021 [inline]
rmqueue_pcplist mm/page_alloc.c:3050 [inline]
rmqueue mm/page_alloc.c:3072 [inline]
get_page_from_freelist+0x318c/0x52a0 mm/page_alloc.c:3491
__alloc_pages_nodemask+0x4f3/0xde0 mm/page_alloc.c:4529
alloc_pages_current+0x10c/0x210 mm/mempolicy.c:2106
alloc_pages include/linux/gfp.h:509 [inline]
__get_free_pages+0xc/0x40 mm/page_alloc.c:4573
tlb_next_batch mm/mmu_gather.c:29 [inline]
__tlb_remove_page_size+0x2e5/0x500 mm/mmu_gather.c:133
__tlb_remove_page include/asm-generic/tlb.h:187 [inline]
zap_pte_range mm/memory.c:1093 [inline]
zap_pmd_range mm/memory.c:1192 [inline]
zap_pud_range mm/memory.c:1221 [inline]
zap_p4d_range mm/memory.c:1242 [inline]
unmap_page_range+0xf88/0x25b0 mm/memory.c:1263
unmap_single_vma+0x19b/0x310 mm/memory.c:1308
unmap_vmas+0x221/0x390 mm/memory.c:1339
exit_mmap+0x2be/0x590 mm/mmap.c:3140
__mmput kernel/fork.c:1051 [inline]
mmput+0x247/0x610 kernel/fork.c:1072
exit_mm kernel/exit.c:545 [inline]
do_exit+0xdeb/0x2620 kernel/exit.c:854
do_group_exit+0x177/0x440 kernel/exit.c:970
get_signal+0x8b0/0x1980 kernel/signal.c:2517
do_signal+0x9c/0x21c0 arch/x86/kernel/signal.c:816
exit_to_usermode_loop+0x2e5/0x380 arch/x86/entry/common.c:162
prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
do_syscall_32_irqs_on arch/x86/entry/common.c:341 [inline]
do_fast_syscall_32+0xcd5/0xfb2 arch/x86/entry/common.c:397
entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
RIP: 0023:0xf7fe3849
Code: Bad RIP value.
RSP: 002b:00000000f5f9d0cc EFLAGS: 00000296 ORIG_RAX: 0000000000000036
RAX: 0000000000000000 RBX: 0000000000000005 RCX: 00000000c0184900
RDX: 0000000020000080 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
kobject: 'loop0' (000000002925f66c): kobject_uevent_env
syz-executor0 (8529) used greatest stack depth: 10424 bytes left
kobject: 'loop0' (000000002925f66c): fill_kobj_path: path
= '/devices/virtual/block/loop0'
audit: type=1326 audit(1546069676.863:33): auid=4294967295 uid=0 gid=0
ses=4294967295 subj==unconfined pid=8664 comm="syz-executor1"
exe="/root/syz-executor1" sig=31 arch=40000003 syscall=265 compat=1
ip=0xf7f82849 code=0x0
WARNING: CPU: 0 PID: 8908 at net/bridge/netfilter/ebtables.c:2086
ebt_size_mwt net/bridge/netfilter/ebtables.c:2086 [inline]
WARNING: CPU: 0 PID: 8908 at net/bridge/netfilter/ebtables.c:2086
size_entry_mwt net/bridge/netfilter/ebtables.c:2167 [inline]
WARNING: CPU: 0 PID: 8908 at net/bridge/netfilter/ebtables.c:2086
compat_copy_entries+0x1088/0x1500 net/bridge/netfilter/ebtables.c:2206


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
syzbot.

Vlastimil Babka

Jan 2, 2019, 7:51:04 AM
to syzbot, aarc...@redhat.com, ak...@linux-foundation.org, kirill....@linux.intel.com, linux-...@vger.kernel.org, linu...@kvack.org, li...@dominikbrodowski.net, mho...@suse.com, rien...@google.com, syzkall...@googlegroups.com, xieyi...@huawei.com, zhong...@huawei.com, Mel Gorman, Peter Zijlstra, Ingo Molnar
On 1/2/19 9:51 AM, syzbot wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit: f346b0becb1b Merge branch 'akpm' (patches from Andrew)
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1510cefd400000
> kernel config: https://syzkaller.appspot.com/x/.config?x=c255c77ba370fe7c
> dashboard link: https://syzkaller.appspot.com/bug?extid=93d94a001cfbce9e60e1
> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
> userspace arch: i386
>
> Unfortunately, I don't have any reproducer for this crash yet.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+93d94a...@syzkaller.appspotmail.com
>
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 4.20.0+ #297 Not tainted
> ------------------------------------------------------
> syz-executor0/8529 is trying to acquire lock:
> 000000005e7fb829 (&pgdat->kswapd_wait){....}, at:
> __wake_up_common_lock+0x19e/0x330 kernel/sched/wait.c:120

From the backtrace at the end of report I see it's coming from

> wakeup_kswapd+0x5f0/0x930 mm/vmscan.c:3982
> steal_suitable_fallback+0x538/0x830 mm/page_alloc.c:2217

This wakeup_kswapd is new due to Mel's 1c30844d2dfe ("mm: reclaim small
amounts of memory when an external fragmentation event occurs") so CC Mel.
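
For reference, the call that commit adds sits in steal_suitable_fallback(),
which runs with zone->lock held; roughly:

        boost_watermark(zone);
        if (alloc_flags & ALLOC_KSWAPD)
                wakeup_kswapd(zone, 0, 0, zone_idx(zone));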

> but task is already holding lock:
> 000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: spin_lock
> include/linux/spinlock.h:329 [inline]
> 000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue_bulk
> mm/page_alloc.c:2548 [inline]
> 000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: __rmqueue_pcplist
> mm/page_alloc.c:3021 [inline]
> 000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue_pcplist
> mm/page_alloc.c:3050 [inline]
> 000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue
> mm/page_alloc.c:3072 [inline]
> 000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at:
> get_page_from_freelist+0x1bae/0x52a0 mm/page_alloc.c:3491
>
> which lock already depends on the new lock.

However, I don't understand why lockdep thinks it's a problem. IIRC it
doesn't like that we are locking pgdat->kswapd_wait.lock while holding
zone->lock. That means it has learned that the opposite order also
exists, e.g. somebody would take zone->lock while manipulating the wait
queue? I don't see where but I admit I'm not good at reading lockdep
splats, so CCing Peterz and Ingo as well. Keeping rest of mail for
reference.

Mel Gorman

Jan 2, 2019, 1:06:14 PM
to Vlastimil Babka, syzbot, aarc...@redhat.com, ak...@linux-foundation.org, kirill....@linux.intel.com, linux-...@vger.kernel.org, linu...@kvack.org, li...@dominikbrodowski.net, mho...@suse.com, rien...@google.com, syzkall...@googlegroups.com, xieyi...@huawei.com, zhong...@huawei.com, Peter Zijlstra, Ingo Molnar
New year new bugs :(

> > but task is already holding lock:
> > 000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: spin_lock
> > include/linux/spinlock.h:329 [inline]
> > 000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue_bulk
> > mm/page_alloc.c:2548 [inline]
> > 000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: __rmqueue_pcplist
> > mm/page_alloc.c:3021 [inline]
> > 000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue_pcplist
> > mm/page_alloc.c:3050 [inline]
> > 000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue
> > mm/page_alloc.c:3072 [inline]
> > 000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at:
> > get_page_from_freelist+0x1bae/0x52a0 mm/page_alloc.c:3491
> >
> > which lock already depends on the new lock.
>
> However, I don't understand why lockdep thinks it's a problem. IIRC it
> doesn't like that we are locking pgdat->kswapd_wait.lock while holding
> zone->lock. That means it has learned that the opposite order also
> exists, e.g. somebody would take zone->lock while manipulating the wait
> queue? I don't see where but I admit I'm not good at reading lockdep
> splats, so CCing Peterz and Ingo as well. Keeping rest of mail for
> reference.
>

I'm not sure I'm reading the output correctly because I'm having trouble
seeing the exact pattern that allows lockdep to conclude the lock ordering
is problematic.

I think it's hung up on the fact that mod_timer can allocate debug
objects for KASAN and somehow concludes that waking kswapd is
problematic because a lock ordering potentially exists that would trip.
I don't see how it's actually possible though, either due to a lack of
imagination on my part or because lockdep is being cautious about
something that could change in the future and allow the lockup.

There are a few options I guess in order of preference.

1. Drop zone->lock for the call. It's not necessary to keep track of
the IRQ flags as callers into that path already do things like treat
IRQ disabling and the spin lock separately.

2. Use another alloc_flag in steal_suitable_fallback that is set when a
wakeup is required but do the actual wakeup in rmqueue() after the
zone locks are dropped and the allocation request is completed (see
the sketch after this list)

3. Always wakeup kswapd if watermarks are boosted. I like this the least
because it means doing wakeups that are unrelated to fragmentation
that occurred in the current context.
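
As a rough sketch of what I mean by option 2 (ZONE_KSWAPD_WAKE_PENDING is a
made-up name for illustration, not an existing flag, and this is untested):

        /* steal_suitable_fallback(), still under zone->lock: only record the request */
        boost_watermark(zone);
        if (alloc_flags & ALLOC_KSWAPD)
                set_bit(ZONE_KSWAPD_WAKE_PENDING, &zone->flags);

        /* rmqueue(), after zone->lock and the IRQ-disabled section are released */
        if (test_and_clear_bit(ZONE_KSWAPD_WAKE_PENDING, &zone->flags))
                wakeup_kswapd(zone, 0, 0, zone_idx(zone));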

Any particular preference?

While I recognise there is no test case available, how often does this
trigger in syzbot? It would be nice to have some confirmation that any
patch really fixes the problem.

--
Mel Gorman
SUSE Labs

Qian Cai

Jan 2, 2019, 1:19:36 PM
to Mel Gorman, Vlastimil Babka, syzbot, aarc...@redhat.com, ak...@linux-foundation.org, kirill....@linux.intel.com, linux-...@vger.kernel.org, linu...@kvack.org, li...@dominikbrodowski.net, mho...@suse.com, rien...@google.com, syzkall...@googlegroups.com, xieyi...@huawei.com, zhong...@huawei.com, Peter Zijlstra, Ingo Molnar
On 1/2/19 1:06 PM, Mel Gorman wrote:

> While I recognise there is no test case available, how often does this
> trigger in syzbot? It would be nice to have some confirmation that any
> patch really fixes the problem.

I think I did manage to trigger this every time running a mmap() workload
causing swapping and a low-memory situation [1].

[1]
https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/oom/oom01.c

[ 507.192079] ======================================================
[ 507.198294] WARNING: possible circular locking dependency detected
[ 507.204510] 4.20.0+ #27 Not tainted
[ 507.208018] ------------------------------------------------------
[ 507.214233] oom01/7666 is trying to acquire lock:
[ 507.218965] 00000000bc163d02 (&p->pi_lock){-.-.}, at: try_to_wake_up+0x10a/0xe80
[ 507.226415]
[ 507.226415] but task is already holding lock:
[ 507.232280] 0000000064eb4795 (&pgdat->kswapd_wait){....}, at:
__wake_up_common_lock+0x112/0x1c0
[ 507.241036]
[ 507.241036] which lock already depends on the new lock.
[ 507.241036]
[ 507.249260]
[ 507.249260] the existing dependency chain (in reverse order) is:
[ 507.256787]
[ 507.256787] -> #3 (&pgdat->kswapd_wait){....}:
[ 507.262748] lock_acquire+0x1b3/0x3c0
[ 507.266960] _raw_spin_lock_irqsave+0x35/0x50
[ 507.271867] __wake_up_common_lock+0x112/0x1c0
[ 507.276863] wakeup_kswapd+0x3d0/0x560
[ 507.281159] steal_suitable_fallback+0x40b/0x4e0
[ 507.286330] rmqueue_bulk.constprop.26+0xa36/0x1090
[ 507.291760] get_page_from_freelist+0xb79/0x28f0
[ 507.296930] __alloc_pages_nodemask+0x453/0x21f0
[ 507.302099] alloc_pages_vma+0x87/0x280
[ 507.306482] do_anonymous_page+0x443/0xb80
[ 507.311128] __handle_mm_fault+0xbb8/0xc80
[ 507.315773] handle_mm_fault+0x3ae/0x68b
[ 507.320243] __do_page_fault+0x329/0x6d0
[ 507.324712] do_page_fault+0x119/0x53c
[ 507.329008] page_fault+0x1b/0x20
[ 507.332863]
[ 507.332863] -> #2 (&(&zone->lock)->rlock){-.-.}:
[ 507.338997] lock_acquire+0x1b3/0x3c0
[ 507.343205] _raw_spin_lock_irqsave+0x35/0x50
[ 507.348111] get_page_from_freelist+0x108f/0x28f0
[ 507.353368] __alloc_pages_nodemask+0x453/0x21f0
[ 507.358538] alloc_page_interleave+0x6a/0x1b0
[ 507.363446] allocate_slab+0x319/0xa20
[ 507.367742] new_slab+0x41/0x60
[ 507.371427] ___slab_alloc+0x509/0x8a0
[ 507.375721] __slab_alloc+0x3a/0x70
[ 507.379754] kmem_cache_alloc+0x29c/0x310
[ 507.384312] __debug_object_init+0x984/0x9b0
[ 507.389130] hrtimer_init+0x9b/0x310
[ 507.393250] init_dl_task_timer+0x1c/0x40
[ 507.397808] __sched_fork+0x187/0x290
[ 507.402015] init_idle+0xa1/0x3a0
[ 507.405875] fork_idle+0x122/0x150
[ 507.409823] idle_threads_init+0xea/0x17a
[ 507.414379] smp_init+0x16/0xf2
[ 507.418064] kernel_init_freeable+0x31f/0x7ae
[ 507.422971] kernel_init+0xc/0x127
[ 507.426916] ret_from_fork+0x3a/0x50
[ 507.431034]
[ 507.431034] -> #1 (&rq->lock){-.-.}:
[ 507.436119] lock_acquire+0x1b3/0x3c0
[ 507.440326] _raw_spin_lock+0x2c/0x40
[ 507.444535] task_fork_fair+0x93/0x310
[ 507.448830] sched_fork+0x194/0x380
[ 507.452863] copy_process+0x1446/0x41f0
[ 507.457247] _do_fork+0x16a/0xac0
[ 507.461107] kernel_thread+0x25/0x30
[ 507.465226] rest_init+0x28/0x319
[ 507.469085] start_kernel+0x634/0x674
[ 507.473296] secondary_startup_64+0xb6/0xc0
[ 507.478026]
[ 507.478026] -> #0 (&p->pi_lock){-.-.}:
[ 507.483286] __lock_acquire+0x46d/0x860
[ 507.487670] lock_acquire+0x1b3/0x3c0
[ 507.491879] _raw_spin_lock_irqsave+0x35/0x50
[ 507.496787] try_to_wake_up+0x10a/0xe80
[ 507.501170] autoremove_wake_function+0x7e/0x1a0
[ 507.506338] __wake_up_common+0x12d/0x380
[ 507.510895] __wake_up_common_lock+0x149/0x1c0
[ 507.515889] wakeup_kswapd+0x3d0/0x560
[ 507.520184] steal_suitable_fallback+0x40b/0x4e0
[ 507.525354] rmqueue_bulk.constprop.26+0xa36/0x1090
[ 507.530786] get_page_from_freelist+0xb79/0x28f0
[ 507.535955] __alloc_pages_nodemask+0x453/0x21f0
[ 507.541124] alloc_pages_vma+0x87/0x280
[ 507.545506] do_anonymous_page+0x443/0xb80
[ 507.550152] __handle_mm_fault+0xbb8/0xc80
[ 507.554797] handle_mm_fault+0x3ae/0x68b
[ 507.559267] __do_page_fault+0x329/0x6d0
[ 507.563738] do_page_fault+0x119/0x53c
[ 507.568034] page_fault+0x1b/0x20
[ 507.571890]
[ 507.571890] other info that might help us debug this:
[ 507.571890]
[ 507.579938] Chain exists of:
[ 507.579938] &p->pi_lock --> &(&zone->lock)->rlock --> &pgdat->kswapd_wait
[ 507.579938]
[ 507.591311] Possible unsafe locking scenario:
[ 507.591311]
[ 507.597265]        CPU0                    CPU1
[ 507.601821]        ----                    ----
[ 507.606375]   lock(&pgdat->kswapd_wait);
[ 507.610321]                                lock(&(&zone->lock)->rlock);
[ 507.616973]                                lock(&pgdat->kswapd_wait);
[ 507.623452]   lock(&p->pi_lock);
[ 507.626698]
[ 507.626698] *** DEADLOCK ***
[ 507.626698]
[ 507.632652] 3 locks held by oom01/7666:
[ 507.636509] #0: 000000000ed9e0f8 (&mm->mmap_sem){++++}, at:
__do_page_fault+0x236/0x6d0
[ 507.644653] #1: 00000000592a7e32 (&(&zone->lock)->rlock){-.-.}, at:
rmqueue_bulk.constprop.26+0x16f/0x1090
[ 507.654453] #2: 0000000064eb4795 (&pgdat->kswapd_wait){....}, at:
__wake_up_common_lock+0x112/0x1c0
[ 507.663644]
[ 507.663644] stack backtrace:
[ 507.668027] CPU: 75 PID: 7666 Comm: oom01 Kdump: loaded Not tainted 4.20.0+ #27
[ 507.675378] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10,
BIOS U30 06/20/2018
[ 507.683953] Call Trace:
[ 507.686416] dump_stack+0xd1/0x160
[ 507.689840] ? dump_stack_print_info.cold.0+0x1b/0x1b
[ 507.694923] ? print_stack_trace+0x8f/0xa0
[ 507.699044] print_circular_bug.isra.10.cold.34+0x20f/0x297
[ 507.704651] ? print_circular_bug_header+0x50/0x50
[ 507.709473] check_prev_add.constprop.19+0x7ad/0xad0
[ 507.714468] ? check_usage+0x3e0/0x3e0
[ 507.718241] ? graph_lock+0xef/0x190
[ 507.721838] ? usage_match+0x27/0x40
[ 507.725435] validate_chain.isra.14+0xbd5/0x16c0
[ 507.730082] ? check_prev_add.constprop.19+0xad0/0xad0
[ 507.735252] ? stack_access_ok+0x35/0x80
[ 507.739200] ? deref_stack_reg+0xa2/0xf0
[ 507.743148] ? __read_once_size_nocheck.constprop.4+0x10/0x10
[ 507.748929] ? debug_lockdep_rcu_enabled.part.0+0x16/0x30
[ 507.754362] ? ftrace_ops_trampoline+0x131/0le_mm_fault+0xbb8/0xc80
[ 508.142595] ? handle_mm_fault+0x3ae/0x68b
[ 508.146716] ? __do_page_fault+0x329/0x6d0
[ 508.150836] ? trace_hardirqs_off+0x9d/0x230
[ 508.155132] ? trace_hardirqs_on_caller+0x230/0x230
[ 508.160038] ? pageset_set_high_and_batch+0x180/0x180
[ 508.165122] get_page_from_freelist+0xb79/0x28f0
[ 508.169772] ? __isolate_free_page+0x430/0x430
[ 508.174242] ? print_irqtrace_events+0x110/0x110
[ 508.178885] ? __isolate_free_page+0x430/0x430
[ 508.183355] ? free_unref_page_list+0x3e6/0x570
[ 508.187914] ? mark_held_locks+0x8b/0xb0
[ 508.191861] ? free_unref_page_list+0x3e6/0x570
[ 508.196418] ? free_unref_page_list+0x3e6/0x570
[ 508.200976] ? lockdep_hardirqs_on+0x1a4/0x290
[ 508.205445] ? trace_hardirqs_on+0x9d/0x230
[ 508.209654] ? ftrace_destroy_function_files+0x50/0x50
[ 508.214823] ? validate_chain.isra.14+0x16c/0x16c0
[ 508.219642] ? check_chain_key+0x13b/0x200
[ 508.223766] ? page_mapping+0x2be/0x460
[ 508.227627] ? page_evictable+0x1de/0x320
[ 508.231660] ? __page_frag_cache_drain+0x180ad0
[ 508.619426] ? lock_downgrade+0x360/0x360
[ 508.623458] ? rwlock_bug.part.0+0x60/0x60
[ 508.627580] ? do_raw_spin_unlock+0x157/0x220
[ 508.631963] ? do_raw_spin_trylock+0x180/0x180
[ 508.636434] ? do_raw_spin_lock+0x137/0x1f0
[ 508.640641] ? mark_lock+0x11c/0xd80
[ 508.644238] alloc_pages_vma+0x87/0x280
[ 508.648097] do_anonymous_page+0x443/0xb80
[ 508.652219] ? mark_lock+0x11c/0xd80
[ 508.655815] ? mark_lock+0x11c/0xd80
[ 508.659412] ? finish_fault+0xf0/0xf0
[ 508.663096] ? print_irqtrace_events+0x110/0x110
[ 508.667741] ? check_flags.part.18+0x220/0x220
[ 508.672213] ? do_raw_spin_unlock+0x157/0x220
[ 508.676598] ? do_raw_spin_trylock+0x180/0x180
[ 508.681070] ? rwlock_bug.part.0+0x60/0x60
[ 508.685191] ? check_chain_key+0x13b/0x200
[ 508.689313] ? __lock_acquire+0x4c0/0x860
[ 508.693347] ? check_chain_key+0x13b/0x200
[ 508.697469] ? handle_mm_fault+0x315/0x68b
[ 508.701590] __handle_mm_fault+0xbb8/0xc80
[ 508.705711] ? handle_mm_fault+0x4c3/0x68b

Dmitry Vyukov

Jan 2, 2019, 1:29:56 PM
to Mel Gorman, Vlastimil Babka, syzbot, Andrea Arcangeli, Andrew Morton, Kirill A. Shutemov, LKML, Linux-MM, li...@dominikbrodowski.net, Michal Hocko, David Rientjes, syzkaller-bugs, xieyi...@huawei.com, zhong jiang, Peter Zijlstra, Ingo Molnar
Old too :(
https://syzkaller.appspot.com/#upstream-open
This info is always available over the "dashboard link" in the report:
https://syzkaller.appspot.com/bug?extid=93d94a001cfbce9e60e1

In this case it's 1. I don't know why. Lock inversions are easier to
trigger in some sense as information accumulates globally. Maybe one
of these stacks is hard to trigger, or maybe all these stacks are
rarely triggered on one machine. While the info accumulates globally,
none of the machines are actually run for any prolonged time: they all
crash right away on hundreds of known bugs.

So good that Qian can reproduce this.

Tetsuo Handa

Jan 2, 2019, 8:28:54 PM
to Qian Cai, Mel Gorman, Vlastimil Babka, syzbot, aarc...@redhat.com, ak...@linux-foundation.org, kirill....@linux.intel.com, linux-...@vger.kernel.org, linu...@kvack.org, li...@dominikbrodowski.net, mho...@suse.com, rien...@google.com, syzkall...@googlegroups.com, xieyi...@huawei.com, zhong...@huawei.com, Peter Zijlstra, Ingo Molnar
On 2019/01/03 3:19, Qian Cai wrote:
> On 1/2/19 1:06 PM, Mel Gorman wrote:
>
>> While I recognise there is no test case available, how often does this
>> trigger in syzbot? It would be nice to have some confirmation that any
>> patch really fixes the problem.
>
> I think I did manage to trigger this every time running a mmap() workload
> causing swapping and a low-memory situation [1].
>
> [1]
> https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/oom/oom01.c

wakeup_kswapd() is called because tlb_next_batch() is doing GFP_NOWAIT
allocation. But since tlb_next_batch() can tolerate allocation failure,
does the change below in tlb_next_batch() help?

#define GFP_NOWAIT (__GFP_KSWAPD_RECLAIM)

- batch = (void *)__get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
+ batch = (void *)__get_free_pages(__GFP_NOWARN, 0);

Qian Cai

Jan 2, 2019, 10:27:44 PM
to Tetsuo Handa, Mel Gorman, Vlastimil Babka, syzbot, aarc...@redhat.com, ak...@linux-foundation.org, kirill....@linux.intel.com, linux-...@vger.kernel.org, linu...@kvack.org, li...@dominikbrodowski.net, mho...@suse.com, rien...@google.com, syzkall...@googlegroups.com, xieyi...@huawei.com, zhong...@huawei.com, Peter Zijlstra, Ingo Molnar
No. In the oom01 case, it comes from:

do_anonymous_page
__alloc_zeroed_user_highpage
alloc_page_vma(GFP_HIGHUSER ...

GFP_HIGHUSER -> GFP_USER -> __GFP_RECLAIM -> ___GFP_KSWAPD_RECLAIM


Then, it has this new code in steal_suitable_fallback() via 1c30844d2df (mm:
reclaim small amounts of memory when an external fragmentation event occurs)

        /*
         * Boost watermarks to increase reclaim pressure to reduce
         * the likelihood of future fallbacks. Wake kswapd now as
         * the node may be balanced overall and kswapd will not
         * wake naturally.
         */
        boost_watermark(zone);
        if (alloc_flags & ALLOC_KSWAPD)
                wakeup_kswapd(zone, 0, 0, zone_idx(zone));

Mel Gorman

Jan 3, 2019, 11:37:53 AM
to Dmitry Vyukov, Vlastimil Babka, syzbot, Andrea Arcangeli, Andrew Morton, Kirill A. Shutemov, LKML, Linux-MM, li...@dominikbrodowski.net, Michal Hocko, David Rientjes, syzkaller-bugs, xieyi...@huawei.com, zhong jiang, Peter Zijlstra, Ingo Molnar, Qian Cai
On Wed, Jan 02, 2019 at 07:29:43PM +0100, Dmitry Vyukov wrote:
> > > This wakeup_kswapd is new due to Mel's 1c30844d2dfe ("mm: reclaim small
> > > amounts of memory when an external fragmentation event occurs") so CC Mel.
> > >
> >
> > New year new bugs :(
>
> Old too :(
> https://syzkaller.appspot.com/#upstream-open
>

Well, that can ruin a day! Lets see can we knock one off the list.

> > While I recognise there is no test case available, how often does this
> > trigger in syzbot? It would be nice to have some confirmation that any
> > patch really fixes the problem.
>
> This info is always available over the "dashboard link" in the report:
> https://syzkaller.appspot.com/bug?extid=93d94a001cfbce9e60e1
>

Noted for future reference.

> In this case it's 1. I don't know why. Lock inversions are easier to
> trigger in some sense as information accumulates globally. Maybe one
> of these stacks is hard to trigger, or maybe all these stacks are
> rarely triggered on one machine. While the info accumulates globally,
> none of the machines are actually run for any prolonged time: they all
> crash right away on hundreds of known bugs.
>
> So good that Qian can reproduce this.

I think this might simply be hard to reproduce. I tried for hours on two
separate machines and failed. Nevertheless this should still fix it and
hopefully syzbot picks this up automatically when cc'd. If I hear
nothing, I'll send the patch unconditionally (and cc syzbot). Hopefully
Qian can give it a whirl too.

Thanks

--8<--
mm, page_alloc: Do not wake kswapd with zone lock held

syzbot reported the following and it was confirmed by Qian Cai that a
similar bug was visible from a different context.

======================================================
WARNING: possible circular locking dependency detected
4.20.0+ #297 Not tainted
------------------------------------------------------
syz-executor0/8529 is trying to acquire lock:
000000005e7fb829 (&pgdat->kswapd_wait){....}, at:
__wake_up_common_lock+0x19e/0x330 kernel/sched/wait.c:120

but task is already holding lock:
000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: spin_lock
include/linux/spinlock.h:329 [inline]
000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue_bulk
mm/page_alloc.c:2548 [inline]
000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: __rmqueue_pcplist
mm/page_alloc.c:3021 [inline]
000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue_pcplist
mm/page_alloc.c:3050 [inline]
000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at: rmqueue
mm/page_alloc.c:3072 [inline]
000000009bb7bae0 (&(&zone->lock)->rlock){-.-.}, at:
get_page_from_freelist+0x1bae/0x52a0 mm/page_alloc.c:3491

It appears to be a false positive in that the only way the lock
ordering should be inverted is if kswapd is waking itself and the
wakeup allocates debugging objects which should already be allocated
if it's kswapd doing the waking. Nevertheless, the possibility exists
and so it's best to avoid the problem.

This patch flags a zone as needing a kswapd using the, surprisingly,
unused zone flag field. The flag is read without the lock held to
do the wakeup. It's possible that the flag setting context is not
the same as the flag clearing context or for small races to occur.
However, each race possibility is harmless and there is no visible
degradation in fragmentation treatment.

While zone->flag could have continued to be unused, there is potential
for moving some existing fields into the flags field instead. Particularly
read-mostly ones like zone->initialized and zone->contiguous.

Signed-off-by: Mel Gorman <mgo...@techsingularity.net>
---
include/linux/mmzone.h | 6 ++++++
mm/page_alloc.c | 8 +++++++-
2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index cc4a507d7ca4..842f9189537b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -520,6 +520,12 @@ enum pgdat_flags {
         PGDAT_RECLAIM_LOCKED,           /* prevents concurrent reclaim */
 };
 
+enum zone_flags {
+        ZONE_BOOSTED_WATERMARK,         /* zone recently boosted watermarks.
+                                         * Cleared when kswapd is woken.
+                                         */
+};
+
 static inline unsigned long zone_managed_pages(struct zone *zone)
 {
         return (unsigned long)atomic_long_read(&zone->managed_pages);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cde5dac6229a..d295c9bc01a8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2214,7 +2214,7 @@ static void steal_suitable_fallback(struct zone *zone, struct page *page,
          */
         boost_watermark(zone);
         if (alloc_flags & ALLOC_KSWAPD)
-                wakeup_kswapd(zone, 0, 0, zone_idx(zone));
+                set_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
 
         /* We are not allowed to try stealing from the whole block */
         if (!whole_block)
@@ -3102,6 +3102,12 @@ struct page *rmqueue(struct zone *preferred_zone,
         local_irq_restore(flags);
 
 out:
+        /* Separate test+clear to avoid unnecessary atomics */
+        if (test_bit(ZONE_BOOSTED_WATERMARK, &zone->flags)) {
+                clear_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
+                wakeup_kswapd(zone, 0, 0, zone_idx(zone));
+        }
+
         VM_BUG_ON_PAGE(page && bad_range(zone, page), page);
         return page;

Qian Cai

Jan 3, 2019, 2:40:38 PM
to Mel Gorman, Dmitry Vyukov, Vlastimil Babka, syzbot, Andrea Arcangeli, Andrew Morton, Kirill A. Shutemov, LKML, Linux-MM, li...@dominikbrodowski.net, Michal Hocko, David Rientjes, syzkaller-bugs, xieyi...@huawei.com, zhong jiang, Peter Zijlstra, Ingo Molnar
Tested-by: Qian Cai <c...@lca.pw>

Mel Gorman

Jan 3, 2019, 5:54:26 PM
to Qian Cai, Dmitry Vyukov, Vlastimil Babka, syzbot, Andrea Arcangeli, Andrew Morton, Kirill A. Shutemov, LKML, Linux-MM, li...@dominikbrodowski.net, Michal Hocko, David Rientjes, syzkaller-bugs, xieyi...@huawei.com, zhong jiang, Peter Zijlstra, Ingo Molnar
On Thu, Jan 03, 2019 at 02:40:35PM -0500, Qian Cai wrote:
> > Signed-off-by: Mel Gorman <mgo...@techsingularity.net>
>
> Tested-by: Qian Cai <c...@lca.pw>

Thanks!

Peter Zijlstra

Jan 7, 2019, 4:52:30 AM
to Vlastimil Babka, syzbot, aarc...@redhat.com, ak...@linux-foundation.org, kirill....@linux.intel.com, linux-...@vger.kernel.org, linu...@kvack.org, li...@dominikbrodowski.net, mho...@suse.com, rien...@google.com, syzkall...@googlegroups.com, xieyi...@huawei.com, zhong...@huawei.com, Mel Gorman, Ingo Molnar, han...@cmpxchg.org
On Wed, Jan 02, 2019 at 01:51:01PM +0100, Vlastimil Babka wrote:
That thing is fairly new; I don't think we used to have this dependency
prior to PSI.

Johannes, can we move that mod_timer out from under rq->lock? At worst
we can use an irq_work to self-ipi.
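
Roughly what I have in mind (a sketch only; the clock_irq_work member and the
handler name are made up here, and this is untested):

        /* assumes a struct irq_work clock_irq_work member added to struct psi_group,
         * set up with init_irq_work(&group->clock_irq_work, psi_clock_irq_work_fn) */
        static void psi_clock_irq_work_fn(struct irq_work *work)
        {
                struct psi_group *group = container_of(work, struct psi_group,
                                                       clock_irq_work);

                /* hard-IRQ context, rq->lock is no longer held, so arming the
                 * delayed work (and its timer) is safe here */
                schedule_delayed_work(&group->clock_work, PSI_FREQ);
        }

        /* from the rq->lock-holding path, instead of schedule_delayed_work(): */
        if (!delayed_work_pending(&group->clock_work))
                irq_work_queue(&group->clock_irq_work);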

Johannes Weiner

Jan 7, 2019, 3:46:32 PM
to Peter Zijlstra, Vlastimil Babka, syzbot, aarc...@redhat.com, ak...@linux-foundation.org, kirill....@linux.intel.com, linux-...@vger.kernel.org, linu...@kvack.org, li...@dominikbrodowski.net, mho...@suse.com, rien...@google.com, syzkall...@googlegroups.com, xieyi...@huawei.com, zhong...@huawei.com, Mel Gorman, Ingo Molnar
Hm, so the splat says this:

wakeups take the pi lock
pi lock holders take the rq lock
rq lock holders take the timer base lock (thanks psi)
timer base lock holders take the zone lock (thanks kasan)
problem: now a zone lock holder wakes up kswapd

right? And we can break the chain from the VM or from psi.
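
Stripped down to two locks, the cycle above is the classic inversion lockdep
is built to catch; the names below are made up and only stand in for the real
chain:

	static DEFINE_SPINLOCK(lock_a);		/* plays the role of zone->lock */
	static DEFINE_SPINLOCK(lock_b);		/* plays the role of the rq/wait-queue side */

	static void alloc_path(void)		/* zone lock holder waking kswapd */
	{
		spin_lock(&lock_a);
		spin_lock(&lock_b);		/* establishes A -> B */
		spin_unlock(&lock_b);
		spin_unlock(&lock_a);
	}

	static void sched_path(void)		/* rq/timer/debugobjects path allocating */
	{
		spin_lock(&lock_b);
		spin_lock(&lock_a);		/* establishes B -> A: lockdep reports the cycle */
		spin_unlock(&lock_a);
		spin_unlock(&lock_b);
	}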

I cannot say one is clearly cleaner than the other, though. With kasan
allocating from inside the basic timer code, those locks leak out from
kernel/* and contaminate the VM locking anyway.

Do you think the rq->lock -> base->lock ordering is likely to cause
issues elsewhere?

Something like this below seems to pass the smoke test. If we want to
go ahead with that, I'd test it properly and send it with a sign-off.

diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
index 2cf422db5d18..42e287139c31 100644
--- a/include/linux/psi_types.h
+++ b/include/linux/psi_types.h
@@ -1,6 +1,7 @@
 #ifndef _LINUX_PSI_TYPES_H
 #define _LINUX_PSI_TYPES_H
 
+#include <linux/irq_work.h>
 #include <linux/seqlock.h>
 #include <linux/types.h>
 
@@ -77,6 +78,7 @@ struct psi_group {
 	u64 last_update;
 	u64 next_update;
 	struct delayed_work clock_work;
+	struct irq_work clock_reviver;
 
 	/* Total stall times and sampled pressure averages */
 	u64 total[NR_PSI_STATES - 1];
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index f39958321293..9654de009250 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -165,6 +165,7 @@ static struct psi_group psi_system = {
 };
 
 static void psi_update_work(struct work_struct *work);
+static void psi_revive_clock(struct irq_work *work);
 
 static void group_init(struct psi_group *group)
 {
@@ -177,6 +178,7 @@ static void group_init(struct psi_group *group)
 	group->last_update = now;
 	group->next_update = now + psi_period;
 	INIT_DELAYED_WORK(&group->clock_work, psi_update_work);
+	init_irq_work(&group->clock_reviver, psi_revive_clock);
 	mutex_init(&group->stat_lock);
 }
 
@@ -399,6 +401,14 @@ static void psi_update_work(struct work_struct *work)
 	}
 }
 
+static void psi_revive_clock(struct irq_work *work)
+{
+	struct psi_group *group;
+
+	group = container_of(work, struct psi_group, clock_reviver);
+	schedule_delayed_work(&group->clock_work, PSI_FREQ);
+}
+
 static void record_times(struct psi_group_cpu *groupc, int cpu,
 			 bool memstall_tick)
 {
@@ -484,8 +494,14 @@ static void psi_group_change(struct psi_group *group, int cpu,
 
 	write_seqcount_end(&groupc->seq);
 
+	/*
+	 * We cannot modify workqueues or timers with the rq lock held
+	 * here. If the clock has stopped due to a lack of activity in
+	 * the past and needs reviving, go through an IPI to wake it
+	 * back up. In most cases, the work should already be pending.
+	 */
 	if (!delayed_work_pending(&group->clock_work))
-		schedule_delayed_work(&group->clock_work, PSI_FREQ);
+		irq_work_queue(&group->clock_reviver);
 }
 
 static struct psi_group *iterate_groups(struct task_struct *task, void **iter)

Peter Zijlstra

unread,
Jan 7, 2019, 4:29:31 PM1/7/19
to Johannes Weiner, Vlastimil Babka, syzbot, aarc...@redhat.com, ak...@linux-foundation.org, kirill....@linux.intel.com, linux-...@vger.kernel.org, linu...@kvack.org, li...@dominikbrodowski.net, mho...@suse.com, rien...@google.com, syzkall...@googlegroups.com, xieyi...@huawei.com, zhong...@huawei.com, Mel Gorman, Ingo Molnar
On Mon, Jan 07, 2019 at 03:46:27PM -0500, Johannes Weiner wrote:
> Hm, so the splat says this:
>
> wakeups take the pi lock
> pi lock holders take the rq lock
> rq lock holders take the timer base lock (thanks psi)
> timer base lock holders take the zone lock (thanks kasan)
> problem: now a zone lock holder wakes up kswapd
>
> right? And we can break the chain from the VM or from psi.

Yep. And since PSI is the latest addition to that chain, I figured we
ought maybe not do that. But I've not looked at a computer in 2 weeks,
so what do I know ;-)

> I cannot say one is clearly cleaner than the other, though. With kasan
> allocating from inside the basic timer code, those locks leak out from
> kernel/* and contaminate the VM locking anyway.
>
> Do you think the rq->lock -> base->lock ordering is likely to cause
> issues elsewhere?

Not sure; we nest the hrtimer base lock under rq->lock (at the time I
fixed hrtimers to not hold its base lock over the timer function
callback, just like regular timers already did) and that has worked
fine.

So maybe we should look at the kasan thing.. dunno.

Peter Zijlstra

unread,
Jan 7, 2019, 4:33:25 PM1/7/19
to Johannes Weiner, Vlastimil Babka, syzbot, aarc...@redhat.com, ak...@linux-foundation.org, kirill....@linux.intel.com, linux-...@vger.kernel.org, linu...@kvack.org, li...@dominikbrodowski.net, mho...@suse.com, rien...@google.com, syzkall...@googlegroups.com, xieyi...@huawei.com, zhong...@huawei.com, Mel Gorman, Ingo Molnar
On Mon, Jan 07, 2019 at 10:29:21PM +0100, Peter Zijlstra wrote:
> On Mon, Jan 07, 2019 at 03:46:27PM -0500, Johannes Weiner wrote:
> > Hm, so the splat says this:
> >
> > wakeups take the pi lock
> > pi lock holders take the rq lock
> > rq lock holders take the timer base lock (thanks psi)
> > timer base lock holders take the zone lock (thanks kasan)

That's not kasan, that's debugobjects, and that would be equally true
for the hrtimer usage we already have in the scheduler.

With that, I'm not entirely sure we're responsible for this splat.. I'll
try and have another look tomorrow.

Peter Zijlstra

unread,
Jan 8, 2019, 8:08:59 AM1/8/19
to Vlastimil Babka, syzbot, aarc...@redhat.com, ak...@linux-foundation.org, kirill....@linux.intel.com, linux-...@vger.kernel.org, linu...@kvack.org, li...@dominikbrodowski.net, mho...@suse.com, rien...@google.com, syzkall...@googlegroups.com, xieyi...@huawei.com, zhong...@huawei.com, Mel Gorman, Ingo Molnar, Thomas Gleixner, han...@cmpxchg.org
On Wed, Jan 02, 2019 at 01:51:01PM +0100, Vlastimil Babka wrote:

> > syz-executor0/8529 is trying to acquire lock:
> > 000000005e7fb829 (&pgdat->kswapd_wait){....}, at:
> > __wake_up_common_lock+0x19e/0x330 kernel/sched/wait.c:120
>
> From the backtrace at the end of report I see it's coming from
>
> > wakeup_kswapd+0x5f0/0x930 mm/vmscan.c:3982
> > steal_suitable_fallback+0x538/0x830 mm/page_alloc.c:2217
>
> This wakeup_kswapd is new due to Mel's 1c30844d2dfe ("mm: reclaim small
> amounts of memory when an external fragmentation event occurs") so CC Mel.

Right; and I see Mel already has a fix for that.
However, I really, _really_ hate that dependency.
get memory allocations under rq->lock.

We seem to avoid this for the existing hrtimer usage, because of
hrtimer_init() doing: debug_init() -> debug_hrtimer_init() ->
debug_object_init().

But that isn't done for the (PSI) schedule_delayed_work() thing for some
raisin; even though: group_init() does INIT_DELAYED_WORK() ->
__INIT_DELAYED_WORK() -> __init_timer() -> init_timer_key() ->
debug_init() -> debug_timer_init() -> debug_object_init().

But _somehow_ that isn't doing it.

Now debug_object_activate() has this case:

	if (descr->is_static_object && descr->is_static_object(addr)) {
		debug_object_init()

which does a debug_object_init() for static allocations, which brings
us to:

static DEFINE_PER_CPU(struct psi_group_cpu, system_group_pcpu);
static struct psi_group psi_system = {

But that _should_ get initialized by psi_init(), which is called from
sched_init() which _should_ be waaay before do_basic_setup().

Something goes wobbly.. but I'm not seeing it.