[syzbot] [kernel?] possible deadlock in __mod_timer (2)

syzbot

May 10, 2023, 6:10:51 PM
to b...@vger.kernel.org, bra...@kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot found the following issue on:

HEAD commit: 1dc3731daf1f Merge tag 'for-6.4-rc1-tag' of git://git.kern..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=153dd834280000
kernel config: https://syzkaller.appspot.com/x/.config?x=8bc832f563d8bf38
dashboard link: https://syzkaller.appspot.com/bug?extid=fe0c72f0ccbb93786380
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
userspace arch: i386

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image (non-bootable): https://storage.googleapis.com/syzbot-assets/7bc7510fe41f/non_bootable_disk-1dc3731d.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/9ac41c523f85/vmlinux-1dc3731d.xz
kernel image: https://storage.googleapis.com/syzbot-assets/40b82936b92f/bzImage-1dc3731d.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+fe0c72...@syzkaller.appspotmail.com

======================================================
WARNING: possible circular locking dependency detected
6.4.0-rc1-syzkaller-00011-g1dc3731daf1f #0 Not tainted
------------------------------------------------------
kworker/u16:0/11 is trying to acquire lock:
ffff88807ffdaba0 (&pgdat->kswapd_wait){....}-{2:2}, at: __wake_up_common_lock+0xb8/0x140 kernel/sched/wait.c:137

but task is already holding lock:
ffff88802c7296d8 (&base->lock){-.-.}-{2:2}, at: __mod_timer+0x69c/0xe80 kernel/time/timer.c:1112

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #3 (&base->lock){-.-.}-{2:2}:
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162
lock_timer_base+0x5a/0x1f0 kernel/time/timer.c:999
__mod_timer+0x3f9/0xe80 kernel/time/timer.c:1080
add_timer+0x62/0x90 kernel/time/timer.c:1244
__queue_delayed_work+0x1a7/0x270 kernel/workqueue.c:1685
queue_delayed_work_on+0x109/0x120 kernel/workqueue.c:1710
psi_task_change+0x1bf/0x300 kernel/sched/psi.c:920
psi_enqueue kernel/sched/stats.h:139 [inline]
enqueue_task kernel/sched/core.c:2078 [inline]
activate_task kernel/sched/core.c:2112 [inline]
wake_up_new_task+0xc13/0x1000 kernel/sched/core.c:4833
kernel_clone+0x219/0x890 kernel/fork.c:2949
user_mode_thread+0xb1/0xf0 kernel/fork.c:2996
rest_init+0x27/0x2b0 init/main.c:700
arch_call_rest_init+0x13/0x30 init/main.c:834
start_kernel+0x3b6/0x490 init/main.c:1088
x86_64_start_reservations+0x18/0x30 arch/x86/kernel/head64.c:556
x86_64_start_kernel+0xb3/0xc0 arch/x86/kernel/head64.c:537
secondary_startup_64_no_verify+0xf4/0xfb

-> #2 (&rq->__lock){-.-.}-{2:2}:
_raw_spin_lock_nested+0x34/0x40 kernel/locking/spinlock.c:378
raw_spin_rq_lock_nested+0x2f/0x120 kernel/sched/core.c:558
raw_spin_rq_lock kernel/sched/sched.h:1366 [inline]
rq_lock kernel/sched/sched.h:1653 [inline]
task_fork_fair+0x74/0x4f0 kernel/sched/fair.c:12095
sched_cgroup_fork+0x3d1/0x540 kernel/sched/core.c:4777
copy_process+0x4b8a/0x7600 kernel/fork.c:2618
kernel_clone+0xeb/0x890 kernel/fork.c:2918
user_mode_thread+0xb1/0xf0 kernel/fork.c:2996
rest_init+0x27/0x2b0 init/main.c:700
arch_call_rest_init+0x13/0x30 init/main.c:834
start_kernel+0x3b6/0x490 init/main.c:1088
x86_64_start_reservations+0x18/0x30 arch/x86/kernel/head64.c:556
x86_64_start_kernel+0xb3/0xc0 arch/x86/kernel/head64.c:537
secondary_startup_64_no_verify+0xf4/0xfb

-> #1 (&p->pi_lock){-.-.}-{2:2}:
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162
try_to_wake_up+0xab/0x1c40 kernel/sched/core.c:4191
autoremove_wake_function+0x16/0x150 kernel/sched/wait.c:419
__wake_up_common+0x147/0x650 kernel/sched/wait.c:107
__wake_up_common_lock+0xd4/0x140 kernel/sched/wait.c:138
wakeup_kswapd+0x3fe/0x5c0 mm/vmscan.c:7797
rmqueue mm/page_alloc.c:3057 [inline]
get_page_from_freelist+0x6c5/0x2c00 mm/page_alloc.c:3499
__alloc_pages+0x1cb/0x4a0 mm/page_alloc.c:4768
__folio_alloc+0x16/0x40 mm/page_alloc.c:4800
vma_alloc_folio+0x155/0x890 mm/mempolicy.c:2240
wp_page_copy mm/memory.c:3070 [inline]
do_wp_page+0x173d/0x33c0 mm/memory.c:3432
handle_pte_fault mm/memory.c:4964 [inline]
__handle_mm_fault+0x1635/0x41c0 mm/memory.c:5089
handle_mm_fault+0x2af/0x9f0 mm/memory.c:5243
do_user_addr_fault+0x2ca/0x1210 arch/x86/mm/fault.c:1349
handle_page_fault arch/x86/mm/fault.c:1534 [inline]
exc_page_fault+0x98/0x170 arch/x86/mm/fault.c:1590
asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:570

-> #0 (&pgdat->kswapd_wait){....}-{2:2}:
check_prev_add kernel/locking/lockdep.c:3108 [inline]
check_prevs_add kernel/locking/lockdep.c:3227 [inline]
validate_chain kernel/locking/lockdep.c:3842 [inline]
__lock_acquire+0x2f21/0x5df0 kernel/locking/lockdep.c:5074
lock_acquire kernel/locking/lockdep.c:5691 [inline]
lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5656
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162
__wake_up_common_lock+0xb8/0x140 kernel/sched/wait.c:137
wakeup_kswapd+0x3fe/0x5c0 mm/vmscan.c:7797
wake_all_kswapds+0x182/0x2d0 mm/page_alloc.c:4028
__alloc_pages_slowpath.constprop.0+0x1724/0x2170 mm/page_alloc.c:4296
__alloc_pages+0x408/0x4a0 mm/page_alloc.c:4781
alloc_pages+0x1aa/0x270 mm/mempolicy.c:2279
alloc_slab_page mm/slub.c:1851 [inline]
allocate_slab+0x25f/0x390 mm/slub.c:1998
new_slab mm/slub.c:2051 [inline]
___slab_alloc+0xa91/0x1400 mm/slub.c:3192
__slab_alloc.constprop.0+0x56/0xa0 mm/slub.c:3291
__slab_alloc_node mm/slub.c:3344 [inline]
slab_alloc_node mm/slub.c:3441 [inline]
slab_alloc mm/slub.c:3459 [inline]
__kmem_cache_alloc_lru mm/slub.c:3466 [inline]
kmem_cache_alloc+0x38e/0x3b0 mm/slub.c:3475
kmem_cache_zalloc include/linux/slab.h:670 [inline]
fill_pool+0x264/0x5c0 lib/debugobjects.c:168
debug_objects_fill_pool lib/debugobjects.c:597 [inline]
debug_object_activate+0xfd/0x400 lib/debugobjects.c:693
debug_timer_activate kernel/time/timer.c:782 [inline]
__mod_timer+0x80d/0xe80 kernel/time/timer.c:1119
add_timer+0x62/0x90 kernel/time/timer.c:1244
__queue_delayed_work+0x1a7/0x270 kernel/workqueue.c:1685
queue_delayed_work_on+0x109/0x120 kernel/workqueue.c:1710
process_one_work+0x99a/0x15e0 kernel/workqueue.c:2405
worker_thread+0x67d/0x10c0 kernel/workqueue.c:2552
kthread+0x344/0x440 kernel/kthread.c:379
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308

other info that might help us debug this:

Chain exists of:
&pgdat->kswapd_wait --> &rq->__lock --> &base->lock

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&base->lock);
                               lock(&rq->__lock);
                               lock(&base->lock);
  lock(&pgdat->kswapd_wait);

*** DEADLOCK ***

3 locks held by kworker/u16:0/11:
#0: ffff8880444ff138 ((wq_completion)bat_events){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff8880444ff138 ((wq_completion)bat_events){+.+.}-{0:0}, at: arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline]
#0: ffff8880444ff138 ((wq_completion)bat_events){+.+.}-{0:0}, at: atomic_long_set include/linux/atomic/atomic-instrumented.h:1324 [inline]
#0: ffff8880444ff138 ((wq_completion)bat_events){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:643 [inline]
#0: ffff8880444ff138 ((wq_completion)bat_events){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:670 [inline]
#0: ffff8880444ff138 ((wq_completion)bat_events){+.+.}-{0:0}, at: process_one_work+0x883/0x15e0 kernel/workqueue.c:2376
#1: ffffc900003d7db0 ((work_completion)(&(&bat_priv->nc.work)->work)){+.+.}-{0:0}, at: process_one_work+0x8b7/0x15e0 kernel/workqueue.c:2380
#2: ffff88802c7296d8 (&base->lock){-.-.}-{2:2}, at: __mod_timer+0x69c/0xe80 kernel/time/timer.c:1112

stack backtrace:
CPU: 1 PID: 11 Comm: kworker/u16:0 Not tainted 6.4.0-rc1-syzkaller-00011-g1dc3731daf1f #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
Workqueue: bat_events batadv_nc_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
check_noncircular+0x25f/0x2e0 kernel/locking/lockdep.c:2188
check_prev_add kernel/locking/lockdep.c:3108 [inline]
check_prevs_add kernel/locking/lockdep.c:3227 [inline]
validate_chain kernel/locking/lockdep.c:3842 [inline]
__lock_acquire+0x2f21/0x5df0 kernel/locking/lockdep.c:5074
lock_acquire kernel/locking/lockdep.c:5691 [inline]
lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5656
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162
__wake_up_common_lock+0xb8/0x140 kernel/sched/wait.c:137
wakeup_kswapd+0x3fe/0x5c0 mm/vmscan.c:7797
wake_all_kswapds+0x182/0x2d0 mm/page_alloc.c:4028
__alloc_pages_slowpath.constprop.0+0x1724/0x2170 mm/page_alloc.c:4296
__alloc_pages+0x408/0x4a0 mm/page_alloc.c:4781
alloc_pages+0x1aa/0x270 mm/mempolicy.c:2279
alloc_slab_page mm/slub.c:1851 [inline]
allocate_slab+0x25f/0x390 mm/slub.c:1998
new_slab mm/slub.c:2051 [inline]
___slab_alloc+0xa91/0x1400 mm/slub.c:3192
__slab_alloc.constprop.0+0x56/0xa0 mm/slub.c:3291
__slab_alloc_node mm/slub.c:3344 [inline]
slab_alloc_node mm/slub.c:3441 [inline]
slab_alloc mm/slub.c:3459 [inline]
__kmem_cache_alloc_lru mm/slub.c:3466 [inline]
kmem_cache_alloc+0x38e/0x3b0 mm/slub.c:3475
kmem_cache_zalloc include/linux/slab.h:670 [inline]
fill_pool+0x264/0x5c0 lib/debugobjects.c:168
debug_objects_fill_pool lib/debugobjects.c:597 [inline]
debug_object_activate+0xfd/0x400 lib/debugobjects.c:693
debug_timer_activate kernel/time/timer.c:782 [inline]
__mod_timer+0x80d/0xe80 kernel/time/timer.c:1119
add_timer+0x62/0x90 kernel/time/timer.c:1244
__queue_delayed_work+0x1a7/0x270 kernel/workqueue.c:1685
queue_delayed_work_on+0x109/0x120 kernel/workqueue.c:1710
process_one_work+0x99a/0x15e0 kernel/workqueue.c:2405
worker_thread+0x67d/0x10c0 kernel/workqueue.c:2552
kthread+0x344/0x440 kernel/kthread.c:379
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
</TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the bug is already fixed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to change bug's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the bug is a duplicate of another bug, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Tetsuo Handa

May 11, 2023, 9:47:41 AM
to syzbot, syzkall...@googlegroups.com, Ingo Molnar, Thomas Gleixner, Andrew Morton, linux-...@vger.kernel.org, linux-mm
syzbot is reporting a lockdep warning in fill_pool(), because GFP_ATOMIC is
(__GFP_HIGH | __GFP_KSWAPD_RECLAIM), which wakes up kswapd.
Since fill_pool() might be called with arbitrary locks held,
fill_pool() should not assume that taking pgdat->kswapd_wait is safe.

Also, __GFP_NORETRY is pointless for a !__GFP_DIRECT_RECLAIM allocation.

Reported-by: syzbot <syzbot+fe0c72...@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=fe0c72f0ccbb93786380
Signed-off-by: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
Fixes: 3ac7fe5a4aab ("infrastructure to debug (dynamic) objects")
---
lib/debugobjects.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 003edc5ebd67..986adca357b4 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -126,7 +126,7 @@ static const char *obj_states[ODEBUG_STATE_MAX] = {

static void fill_pool(void)
{
- gfp_t gfp = GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN;
+ gfp_t gfp = __GFP_HIGH | __GFP_NOWARN;
struct debug_obj *obj;
unsigned long flags;

--
2.18.4

Andrew Morton

May 11, 2023, 11:45:01 PM
to Tetsuo Handa, syzbot, syzkall...@googlegroups.com, Ingo Molnar, Thomas Gleixner, linux-...@vger.kernel.org, linux-mm
On Thu, 11 May 2023 22:47:32 +0900 Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp> wrote:

> syzbot is reporting a lockdep warning in fill_pool(), because GFP_ATOMIC is
> (__GFP_HIGH | __GFP_KSWAPD_RECLAIM), which wakes up kswapd.
> Since fill_pool() might be called with arbitrary locks held,
> fill_pool() should not assume that taking pgdat->kswapd_wait is safe.

hm. But many GFP_ATOMIC allocation attempts are made with locks held.
Why aren't all such callers buggy, by trying to wake kswapd with locks
held? What's special about this one?

> Also, __GFP_NORETRY is pointless for a !__GFP_DIRECT_RECLAIM allocation.
>
> Reported-by: syzbot <syzbot+fe0c72...@syzkaller.appspotmail.com>
> Closes: https://syzkaller.appspot.com/bug?extid=fe0c72f0ccbb93786380
> Signed-off-by: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
> Fixes: 3ac7fe5a4aab ("infrastructure to debug (dynamic) objects")
> ---
> lib/debugobjects.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/debugobjects.c b/lib/debugobjects.c
> index 003edc5ebd67..986adca357b4 100644
> --- a/lib/debugobjects.c
> +++ b/lib/debugobjects.c
> @@ -126,7 +126,7 @@ static const char *obj_states[ODEBUG_STATE_MAX] = {
>
> static void fill_pool(void)
> {
> - gfp_t gfp = GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN;
> + gfp_t gfp = __GFP_HIGH | __GFP_NOWARN;

Does this weaken fill_pool()'s allocation attempt more than necessary?
We can still pass __GFP_HIGH?

Tetsuo Handa

May 12, 2023, 6:57:17 AM
to Andrew Morton, syzbot, syzkall...@googlegroups.com, Ingo Molnar, Thomas Gleixner, linux-...@vger.kernel.org, linux-mm
On 2023/05/12 12:44, Andrew Morton wrote:
> On Thu, 11 May 2023 22:47:32 +0900 Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp> wrote:
>
>> syzbot is reporting a lockdep warning in fill_pool(), because GFP_ATOMIC is
>> (__GFP_HIGH | __GFP_KSWAPD_RECLAIM), which wakes up kswapd.
>> Since fill_pool() might be called with arbitrary locks held,
>> fill_pool() should not assume that taking pgdat->kswapd_wait is safe.
>
> hm. But many GFP_ATOMIC allocation attempts are made with locks held.
> Why aren't all such callers buggy, by trying to wake kswapd with locks
> held? What's special about this one?

Because debugobjects cannot know which locks are held when fill_pool() does
a GFP_ATOMIC allocation.

syzbot is reporting a new base->lock => pgdat->kswapd_wait dependency

add_timer() {
  __mod_timer() {
    base = lock_timer_base(timer, &flags);
    new_base = get_target_base(base, timer->flags);
    if (base != new_base) {
      raw_spin_unlock(&base->lock);
      base = new_base;
      raw_spin_lock(&base->lock);
    }
    debug_timer_activate(timer) {
      debug_object_activate(timer, &timer_debug_descr) {
        debug_objects_fill_pool() {
          fill_pool() {
            kmem_cache_zalloc(GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN) {
              // wakes kswapd
            }
          }
        }
      }
    }
    raw_spin_unlock_irqrestore(&base->lock, flags);
  }
}

while a pgdat->kswapd_wait => p->pi_lock dependency

__alloc_pages() {
  get_page_from_freelist() {
    rmqueue() {
      wakeup_kswapd() {
        wake_up_interruptible(&pgdat->kswapd_wait) {
          __wake_up_common_lock() {
            spin_lock_irqsave(&pgdat->kswapd_wait.lock, flags);
            __wake_up_common() {
              autoremove_wake_function() {
                try_to_wake_up() {
                  raw_spin_lock_irqsave(&p->pi_lock, flags);
                  raw_spin_unlock_irqrestore(&p->pi_lock, flags);
                }
              }
            }
            spin_unlock_irqrestore(&pgdat->kswapd_wait.lock, flags);
          }
        }
      }
    }
  }
}

and a p->pi_lock => rq->__lock => base->lock dependency

wake_up_new_task() {
  raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
  rq = __task_rq_lock(p, &rf); // acquires rq->lock
  activate_task(rq, p, ENQUEUE_NOCLOCK) {
    enqueue_task() {
      psi_enqueue() {
        psi_task_change() {
          queue_delayed_work_on() {
            __queue_delayed_work() {
              add_timer() {
                __mod_timer() {
                  base = lock_timer_base(timer, &flags); // acquires base->lock
                  debug_timer_activate(timer); // possible base->lock => pgdat->kswapd_wait => p->pi_lock dependency
                  raw_spin_unlock_irqrestore(&base->lock, flags);
                }
              }
            }
          }
        }
      }
    }
  }
  task_rq_unlock(rq, p, &rf);
}

already exist, closing the cycle.

All GFP_ATOMIC allocation users are supposed to be aware of which locks
are held, and are supposed to explicitly remove __GFP_KSWAPD_RECLAIM
if waking up kswapd can cause a deadlock. But the reality is that we can't
be careful enough to stay error-free. Who would imagine that a GFP_ATOMIC
allocation done while base->lock is held can form a circular locking dependency?
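
As an illustration of that rule (my sketch, not part of the patch; obj_cache
is a hypothetical slab cache), such a caller would have to mask the flag out
by hand:

	/*
	 * Hypothetical caller-side sketch: drop the kswapd wakeup from
	 * GFP_ATOMIC before allocating under a lock that the kswapd
	 * wakeup path may also depend on.
	 */
	gfp_t gfp = (GFP_ATOMIC | __GFP_NOWARN) & ~__GFP_KSWAPD_RECLAIM;

	obj = kmem_cache_zalloc(obj_cache, gfp); /* gfp is now __GFP_HIGH | __GFP_NOWARN */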

>
>> Also, __GFP_NORETRY is pointless for a !__GFP_DIRECT_RECLAIM allocation.

__GFP_NORETRY is not even checked on the !__GFP_DIRECT_RECLAIM allocation path.
And GFP_ATOMIC minus __GFP_KSWAPD_RECLAIM is exactly __GFP_HIGH.

>>
>> @@ -126,7 +126,7 @@ static const char *obj_states[ODEBUG_STATE_MAX] = {
>>
>> static void fill_pool(void)
>> {
>> - gfp_t gfp = GFP_ATOMIC | __GFP_NORETRY | __GFP_NOWARN;
>> + gfp_t gfp = __GFP_HIGH | __GFP_NOWARN;
>
> Does this weaken fill_pool()'s allocation attempt more than necessary?
> We can still pass __GFP_HIGH?

What do you mean? I think that killing the base->lock => pgdat->kswapd_wait
dependency by removing __GFP_KSWAPD_RECLAIM is the right fix. This weakening
is needed to avoid creating the base->lock => pgdat->kswapd_wait dependency
from the debugobjects code.

For locking dependency safety, I wish that GFP_ATOMIC / GFP_NOWAIT did not
imply __GFP_KSWAPD_RECLAIM. Such allocations should not be requesting so many
pages that even __GFP_HIGH fails. And if they are requesting that many pages,
they have likely already failed by the time background kswapd reclaim finds
some reusable pages....
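
For reference, this is how I read the relevant definitions in this kernel's
include/linux/gfp_types.h (worth double-checking against your tree):

	#define GFP_ATOMIC	(__GFP_HIGH | __GFP_KSWAPD_RECLAIM)
	#define GFP_NOWAIT	(__GFP_KSWAPD_RECLAIM)

so masking __GFP_KSWAPD_RECLAIM out of GFP_ATOMIC leaves exactly __GFP_HIGH.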

Thomas Gleixner

May 12, 2023, 8:54:22 AM
to Tetsuo Handa, Andrew Morton, syzbot, syzkall...@googlegroups.com, Ingo Molnar, linux-...@vger.kernel.org, linux-mm
On Fri, May 12 2023 at 19:57, Tetsuo Handa wrote:
> On 2023/05/12 12:44, Andrew Morton wrote:
>> On Thu, 11 May 2023 22:47:32 +0900 Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp> wrote:
>>
>>> syzbot is reporting a lockdep warning in fill_pool(), because GFP_ATOMIC is
>>> (__GFP_HIGH | __GFP_KSWAPD_RECLAIM), which wakes up kswapd.
>>> Since fill_pool() might be called with arbitrary locks held,
>>> fill_pool() should not assume that taking pgdat->kswapd_wait is safe.

https://lore.kernel.org/lkml/871qjldbes.ffs@tglx/

Tetsuo Handa

May 12, 2023, 9:09:42 AM
to Thomas Gleixner, Andrew Morton, syzbot, syzkall...@googlegroups.com, Ingo Molnar, linux-...@vger.kernel.org, linux-mm
The .config says IS_ENABLED(CONFIG_PREEMPT_RT) == false, and lockdep complains
about the base->lock => pgdat->kswapd_wait => p->pi_lock => rq->__lock =>
base->lock dependency but says nothing about db->lock.

How can your patch fix this problem?

Thomas Gleixner

May 12, 2023, 2:07:58 PM
to Tetsuo Handa, Andrew Morton, syzbot, syzkall...@googlegroups.com, Ingo Molnar, linux-...@vger.kernel.org, linux-mm
It's described in the changelog, no?

The main change is to make the refill invocation conditional on the lookup
failing. That's how that code had worked from day one.

The patch which recently closed the race wrecked those refill opportunities,
and the fix for that introduced this problem.

Thanks,

tglx

Tetsuo Handa

May 12, 2023, 7:15:00 PM
to Thomas Gleixner, Andrew Morton, syzbot, syzkall...@googlegroups.com, Ingo Molnar, linux-...@vger.kernel.org, linux-mm
On 2023/05/13 3:07, Thomas Gleixner wrote:
> On Fri, May 12 2023 at 22:09, Tetsuo Handa wrote:
>> On 2023/05/12 21:54, Thomas Gleixner wrote:
>>> On Fri, May 12 2023 at 19:57, Tetsuo Handa wrote:
>>>> On 2023/05/12 12:44, Andrew Morton wrote:
>>>>> On Thu, 11 May 2023 22:47:32 +0900 Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp> wrote:
>>>>>
>>>>>> syzbot is reporting a lockdep warning in fill_pool(), because GFP_ATOMIC is
>>>>>> (__GFP_HIGH | __GFP_KSWAPD_RECLAIM), which wakes up kswapd.
>>>>>> Since fill_pool() might be called with arbitrary locks held,
>>>>>> fill_pool() should not assume that taking pgdat->kswapd_wait is safe.
>>>
>>> https://lore.kernel.org/lkml/871qjldbes.ffs@tglx/
>>
>> The .config says IS_ENABLED(CONFIG_PREEMPT_RT) == false, and lockdep complains
>> about the base->lock => pgdat->kswapd_wait => p->pi_lock => rq->__lock =>
>> base->lock dependency but says nothing about db->lock.
>>
>> How can your patch fix this problem?
>
> It's described in the changelog, no?

I can't find a proof that lookup_object() never returns NULL
when debug_object_activate() is called.

>
> The main change is to make the refill invocation conditional when the
> lookup fails. That's how that code has been from day one.

Making the refill conditional helps reduce the frequency of allocations.
I want a proof that the allocation never happens in the worst-case scenario.

Are you saying that some debugobjects function other than debug_object_activate()
guarantees that memory for that object was already allocated before
debug_object_activate() is called for the first time on that object,
_and_ that such a function is called without locks held?

Thomas Gleixner

May 13, 2023, 4:33:49 AM
to Tetsuo Handa, Andrew Morton, syzbot, syzkall...@googlegroups.com, Ingo Molnar, linux-...@vger.kernel.org, linux-mm, Peter Zijlstra
The point is that the allocation in activate() only happens when the
tracked entity was not initialized _before_ activate() is invoked.

That's a bug for dynamically allocated entities, but a valid scenario
for statically initialized entities, as they can obviously be activated
without a prior init(). See the example below.
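
Using timers as a concrete illustration (my example, with hypothetical names;
"obj" stands for some dynamically allocated structure with an embedded timer):

	static void my_timer_fn(struct timer_list *t) { /* ... */ }

	/* Statically initialized: debugobjects first sees this timer at
	 * activation time, so the lookup in activate() fails. */
	static DEFINE_TIMER(my_timer, my_timer_fn);
	add_timer(&my_timer);

	/* Dynamically set up: timer_setup() goes through debug_object_init()
	 * first, so the lookup in activate() succeeds. */
	timer_setup(&obj->timer, my_timer_fn, 0);
	add_timer(&obj->timer);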

For dynamically allocated entities, the init() function takes care of the
tracking object allocation, and that's where the pool is refilled. So for
those the lookup will never fail.
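
A rough sketch of that split (an illustrative paraphrase of the idea, not the
actual patch; see the lore link above for that):

	void debug_object_init(void *addr, const struct debug_obj_descr *descr)
	{
		/* Dynamic entities always pass through here, so refilling
		 * the pool here keeps activate()'s lookup from failing. */
		debug_objects_fill_pool();
		/* ... track the object ... */
	}

	int debug_object_activate(void *addr, const struct debug_obj_descr *descr)
	{
		/* ... */
		obj = lookup_object(addr, db);
		if (!obj) {
			/* Only statically initialized entities should get
			 * here; refill only in this rare case. */
			debug_objects_fill_pool();
			/* ... */
		}
		/* ... */
	}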

Now I just stared at __alloc_pages_slowpath() and looked at the
condition for wake_all_kswapds(). ALLOC_KSWAPD is set because
debugobjects uses GFP_ATOMIC, which contains __GFP_KSWAPD_RECLAIM.
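
Roughly the logic involved, paraphrased from mm/page_alloc.c of this era
(check the exact tree before relying on the details):

	/* gfp_to_alloc_flags(), paraphrased */
	if (gfp_mask & __GFP_KSWAPD_RECLAIM)
		alloc_flags |= ALLOC_KSWAPD;

	/* __alloc_pages_slowpath(), paraphrased */
	if (alloc_flags & ALLOC_KSWAPD)
		wake_all_kswapds(order, gfp_mask, ac);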

So debugobjects needs s/GFP_ATOMIC/__GFP_HIGH/ to prevent that
wakeup path.

Thanks,

tglx



Tetsuo Handa

May 13, 2023, 5:33:24 AM
to Thomas Gleixner, Andrew Morton, syzbot, syzkall...@googlegroups.com, Ingo Molnar, linux-...@vger.kernel.org, linux-mm, Peter Zijlstra
On 2023/05/13 17:33, Thomas Gleixner wrote:
> Now I just stared at __alloc_pages_slowpath() and looked at the
> condition for wake_all_kswapds(). ALLOC_KSWAPD is set because
> debugobjects uses GFP_ATOMIC, which contains __GFP_KSWAPD_RECLAIM.
>
> So debugobjects needs s/GFP_ATOMIC/__GFP_HIGH/ to prevent that
> wakeup path.

Yes. That is exactly what my patch does.

Thomas Gleixner

May 13, 2023, 3:42:53 PM
to Tetsuo Handa, Andrew Morton, syzbot, syzkall...@googlegroups.com, Ingo Molnar, linux-...@vger.kernel.org, linux-mm, Peter Zijlstra
Indeed. For some reason your patch (though you cc'ed me) did not show up
in my inbox. I've grabbed it from lore, so no need to resend.

Actually we want both changes:

- Yours, to fix the underlying ancient problem.

- The one I did, which restores the performance behaviour.

Thanks,

tglx
