BUG: soft lockup in smp_call_function


Sanan Hasanov

Sep 12, 2023, 7:03:00 PM
to pet...@infradead.org, pau...@kernel.org, jgr...@suse.com, vsch...@redhat.com, yury....@gmail.com, linux-...@vger.kernel.org, syzk...@googlegroups.com, con...@pgazz.com
Good day, dear maintainers,

We found a bug using a modified kernel configuration file used by syzbot.

We enhanced the coverage of the configuration file using our tool, klocalizer.

Kernel Branch: 6.3.0-next-20230426
Kernel Config: https://drive.google.com/file/d/1WSUEWrith9-539qo6xRqmwy4LfDtmKpp/view?usp=sharing
Reproducer: https://drive.google.com/file/d/1pN6FfcjuUs6Wx94g1gufuYGjRbMMgiZ4/view?usp=sharing
Thank you!

Best regards,
Sanan Hasanov

watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [kworker/u16:1:12]
Modules linked in:
irq event stamp: 192794
hardirqs last enabled at (192793): [<ffffffff89a0140a>] asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645
hardirqs last disabled at (192794): [<ffffffff89975d4f>] sysvec_apic_timer_interrupt+0xf/0xc0 arch/x86/kernel/apic/apic.c:1106
softirqs last enabled at (187764): [<ffffffff814b94bd>] invoke_softirq kernel/softirq.c:445 [inline]
softirqs last enabled at (187764): [<ffffffff814b94bd>] __irq_exit_rcu+0x11d/0x190 kernel/softirq.c:650
softirqs last disabled at (187671): [<ffffffff814b94bd>] invoke_softirq kernel/softirq.c:445 [inline]
softirqs last disabled at (187671): [<ffffffff814b94bd>] __irq_exit_rcu+0x11d/0x190 kernel/softirq.c:650
CPU: 5 PID: 12 Comm: kworker/u16:1 Not tainted 6.3.0-next-20230426 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Workqueue: events_unbound toggle_allocation_gate
RIP: 0010:csd_lock_wait kernel/smp.c:294 [inline]
RIP: 0010:smp_call_function_many_cond+0x5bd/0x1020 kernel/smp.c:828
Code: 0b 00 85 ed 74 4d 48 b8 00 00 00 00 00 fc ff df 4d 89 f4 4c 89 f5 49 c1 ec 03 83 e5 07 49 01 c4 83 c5 03 e8 b5 07 0b 00 f3 90 <41> 0f b6 04 24 40 38 c5 7c 08 84 c0 0f 85 46 08 00 00 8b 43 08 31
RSP: 0018:ffffc900000cf9e8 EFLAGS: 00000293
RAX: 0000000000000000 RBX: ffff888119cc4d80 RCX: 0000000000000000
RDX: ffff888100325940 RSI: ffffffff8176807b RDI: 0000000000000005
RBP: 0000000000000003 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffffed10233989b1
R13: 0000000000000001 R14: ffff888119cc4d88 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff888119e80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000555556a6cc88 CR3: 000000000bb73000 CR4: 0000000000350ee0
Call Trace:
<TASK>
on_each_cpu_cond_mask+0x40/0x90 kernel/smp.c:996
on_each_cpu include/linux/smp.h:71 [inline]
text_poke_sync arch/x86/kernel/alternative.c:1770 [inline]
text_poke_bp_batch+0x237/0x770 arch/x86/kernel/alternative.c:1970
text_poke_flush arch/x86/kernel/alternative.c:2161 [inline]
text_poke_flush arch/x86/kernel/alternative.c:2158 [inline]
text_poke_finish+0x1a/0x30 arch/x86/kernel/alternative.c:2168
arch_jump_label_transform_apply+0x17/0x30 arch/x86/kernel/jump_label.c:146
jump_label_update+0x321/0x400 kernel/jump_label.c:829
static_key_enable_cpuslocked+0x1b5/0x270 kernel/jump_label.c:205
static_key_enable+0x1a/0x20 kernel/jump_label.c:218
toggle_allocation_gate mm/kfence/core.c:831 [inline]
toggle_allocation_gate+0xf4/0x220 mm/kfence/core.c:823
process_one_work+0x993/0x15e0 kernel/workqueue.c:2405
worker_thread+0x67d/0x10c0 kernel/workqueue.c:2552
kthread+0x33e/0x440 kernel/kthread.c:379
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
</TASK>
Sending NMI from CPU 5 to CPUs 0-4,6-7:
NMI backtrace for cpu 1
CPU: 1 PID: 20602 Comm: syz-executor.3 Not tainted 6.3.0-next-20230426 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:hlock_class kernel/locking/lockdep.c:228 [inline]
RIP: 0010:check_wait_context kernel/locking/lockdep.c:4747 [inline]
RIP: 0010:__lock_acquire+0x489/0x5d00 kernel/locking/lockdep.c:5024
Code: 41 81 e5 ff 1f 45 0f b7 ed be 08 00 00 00 4c 89 e8 48 c1 e8 06 48 8d 3c c5 00 6b 2c 90 e8 5f 90 6e 00 4c 0f a3 2d d7 35 c9 0e <0f> 83 5c 0c 00 00 4f 8d 6c 6d 00 49 c1 e5 06 49 81 c5 20 6f 2c 90
RSP: 0018:ffffc90002aa7350 EFLAGS: 00000047
RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffffff81633521
RDX: fffffbfff2058d62 RSI: 0000000000000008 RDI: ffffffff902c6b08
RBP: ffff888042995940 R08: 0000000000000000 R09: ffffffff902c6b0f
R10: fffffbfff2058d61 R11: 0000000000000001 R12: ffff888119e2b818
R13: 0000000000000063 R14: 0000000000000002 R15: ffff888042996598
FS: 00007fdaad065700(0000) GS:ffff888119c80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b30623000 CR3: 0000000101969000 CR4: 0000000000350ee0
Call Trace:
<TASK>
lock_acquire kernel/locking/lockdep.c:5691 [inline]
lock_acquire+0x1b1/0x520 kernel/locking/lockdep.c:5656
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162
lock_hrtimer_base kernel/time/hrtimer.c:173 [inline]
hrtimer_try_to_cancel kernel/time/hrtimer.c:1331 [inline]
hrtimer_try_to_cancel+0xa9/0x2e0 kernel/time/hrtimer.c:1316
hrtimer_cancel+0x17/0x40 kernel/time/hrtimer.c:1443
__disable_vblank drivers/gpu/drm/drm_vblank.c:434 [inline]
drm_vblank_disable_and_save+0x282/0x3d0 drivers/gpu/drm/drm_vblank.c:478
drm_crtc_vblank_off+0x312/0x970 drivers/gpu/drm/drm_vblank.c:1366
disable_outputs+0x7c7/0xbb0 drivers/gpu/drm/drm_atomic_helper.c:1202
drm_atomic_helper_commit_modeset_disables+0x1d/0x40 drivers/gpu/drm/drm_atomic_helper.c:1397
vkms_atomic_commit_tail+0x51/0x240 drivers/gpu/drm/vkms/vkms_drv.c:71
commit_tail+0x288/0x420 drivers/gpu/drm/drm_atomic_helper.c:1812
drm_atomic_helper_commit drivers/gpu/drm/drm_atomic_helper.c:2052 [inline]
drm_atomic_helper_commit+0x306/0x390 drivers/gpu/drm/drm_atomic_helper.c:1985
drm_atomic_commit+0x20a/0x2d0 drivers/gpu/drm/drm_atomic.c:1503
drm_client_modeset_commit_atomic+0x698/0x7e0 drivers/gpu/drm/drm_client_modeset.c:1045
drm_client_modeset_dpms+0x174/0x200 drivers/gpu/drm/drm_client_modeset.c:1226
drm_fb_helper_dpms drivers/gpu/drm/drm_fb_helper.c:323 [inline]
drm_fb_helper_blank+0xd1/0x260 drivers/gpu/drm/drm_fb_helper.c:356
fb_blank+0x105/0x190 drivers/video/fbdev/core/fbmem.c:1088
do_fb_ioctl+0x390/0x760 drivers/video/fbdev/core/fbmem.c:1180
fb_ioctl+0xeb/0x150 drivers/video/fbdev/core/fbmem.c:1204
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:870 [inline]
__se_sys_ioctl fs/ioctl.c:856 [inline]
__x64_sys_ioctl+0x197/0x210 fs/ioctl.c:856
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0x80 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fdaabe8edcd
Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fdaad064bf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007fdaabfbbf80 RCX: 00007fdaabe8edcd
RDX: 0000000000000004 RSI: 0000000000004611 RDI: 0000000000000003
RBP: 00007fdaabefc59c R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffdadeffe9f R14: 00007ffdadf00040 R15: 00007fdaad064d80
</TASK>
NMI backtrace for cpu 0 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline]
NMI backtrace for cpu 0 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline]
NMI backtrace for cpu 0 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729
NMI backtrace for cpu 3 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline]
NMI backtrace for cpu 3 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline]
NMI backtrace for cpu 3 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729
NMI backtrace for cpu 6 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline]
NMI backtrace for cpu 6 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline]
NMI backtrace for cpu 6 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729
NMI backtrace for cpu 7 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline]
NMI backtrace for cpu 7 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline]
NMI backtrace for cpu 7 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729
NMI backtrace for cpu 4
CPU: 4 PID: 20623 Comm: syz-executor.6 Not tainted 6.3.0-next-20230426 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:kvm_wait+0xb7/0x110 arch/x86/kernel/kvm.c:1064
Code: 40 38 c6 74 1b 48 83 c4 10 c3 c3 e8 93 d3 50 00 eb 07 0f 00 2d 4a 04 92 08 fb f4 48 83 c4 10 c3 eb 07 0f 00 2d 3a 04 92 08 f4 <48> 83 c4 10 c3 89 74 24 0c 48 89 3c 24 e8 d7 d4 50 00 8b 74 24 0c
RSP: 0018:ffffc90000300b50 EFLAGS: 00000046
RAX: 0000000000000003 RBX: 0000000000000000 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff88810b0803d8
RBP: ffff88810b0803d8 R08: 0000000000000001 R09: ffff88810b0803d8
R10: ffffed102161007b R11: ffffc90000300ff8 R12: 0000000000000000
R13: ffffed102161007b R14: 0000000000000001 R15: ffff888119e3d3c0
FS: 0000000000000000(0000) GS:ffff888119e00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f28183bd0b0 CR3: 000000000bb73000 CR4: 0000000000350ee0
Call Trace:
<IRQ>
pv_wait arch/x86/include/asm/paravirt.h:598 [inline]
pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:470 [inline]
__pv_queued_spin_lock_slowpath+0x8e4/0xb80 kernel/locking/qspinlock.c:511
pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:586 [inline]
queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline]
queued_spin_lock include/asm-generic/qspinlock.h:114 [inline]
do_raw_spin_lock+0x20d/0x2b0 kernel/locking/spinlock_debug.c:115
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:111 [inline]
_raw_spin_lock_irqsave+0x45/0x60 kernel/locking/spinlock.c:162
drm_handle_vblank+0x11e/0xb80 drivers/gpu/drm/drm_vblank.c:1986
vkms_vblank_simulate+0xe8/0x3e0 drivers/gpu/drm/vkms/vkms_crtc.c:29
__run_hrtimer kernel/time/hrtimer.c:1685 [inline]
__hrtimer_run_queues+0x599/0xa30 kernel/time/hrtimer.c:1749
hrtimer_interrupt+0x320/0x7b0 kernel/time/hrtimer.c:1811
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1095 [inline]
__sysvec_apic_timer_interrupt+0x14a/0x430 arch/x86/kernel/apic/apic.c:1112
sysvec_apic_timer_interrupt+0x92/0xc0 arch/x86/kernel/apic/apic.c:1106
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645
RIP: 0010:check_kcov_mode kernel/kcov.c:173 [inline]
RIP: 0010:__sanitizer_cov_trace_pc+0x11/0x70 kernel/kcov.c:207
Code: a8 01 00 00 e8 b0 ff ff ff 31 c0 c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 65 8b 05 0d 33 82 7e 89 c1 48 8b 34 24 <81> e1 00 01 00 00 65 48 8b 14 25 40 bb 03 00 a9 00 01 ff 00 74 0e
RSP: 0018:ffffc90002be76d8 EFLAGS: 00000286
RAX: 0000000080000001 RBX: 0000000000000001 RCX: 0000000080000001
RDX: 00007f2817c77000 RSI: ffffffff81bcd756 RDI: ffffc90002be7ad8
RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffffea00014fc480
R13: 0000000000000000 R14: dffffc0000000000 R15: 8000000053f12007
zap_drop_file_uffd_wp mm/memory.c:1352 [inline]
zap_install_uffd_wp_if_needed mm/memory.c:1371 [inline]
zap_pte_range mm/memory.c:1417 [inline]
zap_pmd_range mm/memory.c:1564 [inline]
zap_pud_range mm/memory.c:1593 [inline]
zap_p4d_range mm/memory.c:1614 [inline]
unmap_page_range+0x1046/0x4470 mm/memory.c:1635
unmap_single_vma+0x19a/0x2b0 mm/memory.c:1681
unmap_vmas+0x234/0x380 mm/memory.c:1720
exit_mmap+0x190/0x930 mm/mmap.c:3111
__mmput+0x128/0x4c0 kernel/fork.c:1351
mmput+0x60/0x70 kernel/fork.c:1373
exit_mm kernel/exit.c:564 [inline]
do_exit+0x9d1/0x29f0 kernel/exit.c:858
do_group_exit+0xd4/0x2a0 kernel/exit.c:1021
get_signal+0x2311/0x25c0 kernel/signal.c:2874
arch_do_signal_or_restart+0x79/0x5a0 arch/x86/kernel/signal.c:307
exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
exit_to_user_mode_prepare+0x11f/0x240 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:286 [inline]
syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:297
do_syscall_64+0x46/0x80 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f281828edcd
Code: Unable to access opcode bytes at 0x7f281828eda3.
RSP: 002b:00007f28194c0c98 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 00007f28183bbf80 RCX: 00007f281828edcd
RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f28183bbf88
RBP: 00007f28183bbf88 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f28183bbf8c
R13: 00007ffd5038e1ef R14: 00007ffd5038e390 R15: 00007f28194c0d80
</TASK>
NMI backtrace for cpu 2 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:48 [inline]
NMI backtrace for cpu 2 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:86 [inline]
NMI backtrace for cpu 2 skipped: idling at default_idle+0xf/0x20 arch/x86/kernel/process.c:729

Peter Zijlstra

Sep 13, 2023, 6:05:41 AM
to Sanan Hasanov, pau...@kernel.org, jgr...@suse.com, vsch...@redhat.com, yury....@gmail.com, linux-...@vger.kernel.org, syzk...@googlegroups.com, con...@pgazz.com, rodrigosi...@gmail.com, melis...@gmail.com, maira...@riseup.net, hamoha...@gmail.com, dan...@ffwll.ch, air...@gmail.com
On Tue, Sep 12, 2023 at 11:02:56PM +0000, Sanan Hasanov wrote:
> Good day, dear maintainers,
>
> We found a bug using a modified kernel configuration file used by syzbot.
>
> We enhanced the coverage of the configuration file using our tool, klocalizer.
>
> Kernel Branch: 6.3.0-next-20230426
> Kernel Config: https://drive.google.com/file/d/1WSUEWrith9-539qo6xRqmwy4LfDtmKpp/view?usp=sharing
> Reproducer: https://drive.google.com/file/d/1pN6FfcjuUs6Wx94g1gufuYGjRbMMgiZ4/view?usp=sharing
> Thank you!

AFAICT the thing is stuck in DRM somewhere...

[CPU5] Right, so this is waiting for an IPI to be processed, while #1 has IRQs disabled.

[CPU1] And this is trying to cancel an hrtimer which is running on CPU4 and won't be making much progress.

> __disable_vblank drivers/gpu/drm/drm_vblank.c:434 [inline]

So we're here, holding vbl_lock and vblank_time_lock, one of which is what #4 is waiting on.

[CPU4] This also has IRQs disabled, which is what #1 is waiting on, and it is stuck on a spinlock held by #5.

Hillf Danton

Sep 13, 2023, 7:07:26 AM
to Sanan Hasanov, pet...@infradead.org, Linus Torvalds, Tetsuo Handa, Thomas Gleixner, syzk...@googlegroups.com, linu...@kvack.org, LKML
On Tue, 12 Sep 2023 23:02:56 +0000 Sanan Hasanov <Sanan....@ucf.edu> wrote:
> Good day, dear maintainers,
>
> We found a bug using a modified kernel configuration file used by syzbot.
>
Thanks for your report.
cpu1                                  cpu4 (see below)
====                                  ====
drm_crtc_vblank_off                   __run_hrtimer
  spin_lock_irq(&dev->event_lock);
  ...                                 drm_handle_vblank
  hrtimer_cancel                        spin_lock_irqsave(&dev->event_lock, irqflags);


A deadlock would have been reported instead if the lockdep_map in
struct timer_list were also added to hrtimer, so it would be highly appreciated
if Tetsuo or Thomas could add it before 6.8 or 6.10.

Hillf

Tetsuo Handa

Sep 13, 2023, 10:21:14 AM
to Rodrigo Siqueira, Melissa Wen, Maira Canal, Haneen Mohammed, Daniel Vetter, David Airlie, DRI, syzk...@googlegroups.com, LKML, Hillf Danton, Sanan Hasanov
Hello. A deadlock was reported in drivers/gpu/drm/vkms/ .
It looks like this locking pattern remains as of 6.6-rc1. Please fix.

void drm_crtc_vblank_off(struct drm_crtc *crtc) {
  spin_lock_irq(&dev->event_lock);
  drm_vblank_disable_and_save(dev, pipe) {
    __disable_vblank(dev, pipe) {
      crtc->funcs->disable_vblank(crtc) == vkms_disable_vblank {
        hrtimer_cancel(&out->vblank_hrtimer) { // Retries with dev->event_lock held until lock_hrtimer_base() succeeds.
          ret = hrtimer_try_to_cancel(timer) {
            base = lock_hrtimer_base(timer, &flags); // Fails forever because vkms_vblank_simulate() is in progress.
          }
        }
      }
    }
  }
  spin_unlock_irq(&dev->event_lock);
}

static void __run_hrtimer(...) {
  restart = fn(timer) == vkms_vblank_simulate {
    drm_crtc_handle_vblank(crtc) {
      drm_handle_vblank(struct drm_device *dev, unsigned int pipe) {
        spin_lock_irqsave(&dev->event_lock, irqflags); // Trying to hold dev->event_lock inside timer interrupt handler. => Deadlock was reported as a soft lockup.
        spin_unlock_irqrestore(&dev->event_lock, irqflags);
      }
    }
  }
}

Tetsuo Handa

Sep 13, 2023, 10:30:39 AM
to Hillf Danton, Sanan Hasanov, Thomas Gleixner, pet...@infradead.org, Linus Torvalds, syzk...@googlegroups.com, LKML
On 2023/09/13 20:07, Hillf Danton wrote:
>
> cpu1                                  cpu4 (see below)
> ====                                  ====
> drm_crtc_vblank_off                   __run_hrtimer
>   spin_lock_irq(&dev->event_lock);
>   ...                                 drm_handle_vblank
>   hrtimer_cancel                        spin_lock_irqsave(&dev->event_lock, irqflags);
>
>
> Deadlock should have been reported instead provided the lockdep_map in
> struct timer_list were added also to hrtimer, so it is highly appreciated
> if Tetsuo or Thomas adds it before 6.8 or 6.10.

Not me. ;-)

Since hrtimer_cancel() retries forever until lock_hrtimer_base() succeeds,
should we add a lockdep annotation to hrtimer_cancel() so that we can
detect this type of deadlock?

Linus Torvalds

Sep 13, 2023, 12:48:23 PM
to Tetsuo Handa, Rodrigo Siqueira, Melissa Wen, Maira Canal, Haneen Mohammed, Daniel Vetter, David Airlie, DRI, syzk...@googlegroups.com, LKML, Hillf Danton, Sanan Hasanov
On Wed, 13 Sept 2023 at 07:21, Tetsuo Handa
<penguin...@i-love.sakura.ne.jp> wrote:
>
> Hello. A deadlock was reported in drivers/gpu/drm/vkms/ .
> It looks like this locking pattern remains as of 6.6-rc1. Please fix.
>
> void drm_crtc_vblank_off(struct drm_crtc *crtc) {
>   spin_lock_irq(&dev->event_lock);
>   drm_vblank_disable_and_save(dev, pipe) {
>     __disable_vblank(dev, pipe) {
>       crtc->funcs->disable_vblank(crtc) == vkms_disable_vblank {
>         hrtimer_cancel(&out->vblank_hrtimer) { // Retries with dev->event_lock held until lock_hrtimer_base() succeeds.
>           ret = hrtimer_try_to_cancel(timer) {
>             base = lock_hrtimer_base(timer, &flags); // Fails forever because vkms_vblank_simulate() is in progress.

Heh. Ok. This is clearly a bug, but it does seem to be limited to just
the vkms driver, and literally only to the "simulate vblank" case.

The worst part about it is that it's so subtle and not obvious.

Some light grepping seems to show that amdgpu has almost the exact
same pattern in its own vkms thing, except it uses

hrtimer_try_to_cancel(&amdgpu_crtc->vblank_timer);

directly, which presumably fixes the livelock, but means that the
cancel will fail if it's currently running.

So just doing the same thing in the vkms driver probably fixes things.

Maybe the vkms people need to add a flag to say "it's canceled" so
that it doesn't then get re-enabled? Or maybe it doesn't matter
and/or already happens for some reason I didn't look into.

Linus

Thomas Gleixner

Sep 13, 2023, 5:08:25 PM
to Linus Torvalds, Tetsuo Handa, Rodrigo Siqueira, Melissa Wen, Maira Canal, Haneen Mohammed, Daniel Vetter, David Airlie, DRI, syzk...@googlegroups.com, LKML, Hillf Danton, Sanan Hasanov
Maybe the VKMS people need to understand locking in the first place. The
first thing I saw in this code is:

static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer)
{
...
mutex_unlock(&output->enabled_lock);

What?

Unlocking a mutex in the context of a hrtimer callback is simply
violating all mutex locking rules.

How has this code ever survived lock debugging without triggering a big
fat warning?

Thanks,

tglx

Tetsuo Handa

Sep 14, 2023, 2:33:54 AM
to Maira Canal, Arthur Grillo, Rodrigo Siqueira, Melissa Wen, Haneen Mohammed, Daniel Vetter, David Airlie, DRI, syzk...@googlegroups.com, LKML, Hillf Danton, Sanan Hasanov, Thomas Gleixner, Linus Torvalds
On 2023/09/14 6:08, Thomas Gleixner wrote:
> Maybe the VKMS people need to understand locking in the first place. The
> first thing I saw in this code is:
>
> static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer)
> {
> ...
> mutex_unlock(&output->enabled_lock);
>
> What?
>
> Unlocking a mutex in the context of a hrtimer callback is simply
> violating all mutex locking rules.
>
> How has this code ever survived lock debugging without triggering a big
> fat warning?

Commit a0e6a017ab56936c ("drm/vkms: Fix race-condition between the hrtimer
and the atomic commit") in 6.6-rc1 replaced a spinlock with a mutex, so we
haven't tested it with lock debugging yet...

Maíra and Arthur, mutex_unlock() from interrupt context is not permitted.
Please revert that patch immediately.
I guess that a semaphore (down()/up()) could be used instead of a mutex.

Daniel Vetter

Sep 14, 2023, 4:12:42 AM
to Tetsuo Handa, Maira Canal, Arthur Grillo, Rodrigo Siqueira, Melissa Wen, Haneen Mohammed, Daniel Vetter, David Airlie, DRI, syzk...@googlegroups.com, LKML, Hillf Danton, Sanan Hasanov, Thomas Gleixner, Linus Torvalds
On Thu, Sep 14, 2023 at 03:33:41PM +0900, Tetsuo Handa wrote:
> On 2023/09/14 6:08, Thomas Gleixner wrote:
> > Maybe the VKMS people need to understand locking in the first place. The
> > first thing I saw in this code is:
> >
> > static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer)
> > {
> > ...
> > mutex_unlock(&output->enabled_lock);
> >
> > What?
> >
> > Unlocking a mutex in the context of a hrtimer callback is simply
> > violating all mutex locking rules.
> >
> > How has this code ever survived lock debugging without triggering a big
> > fat warning?
>
> Commit a0e6a017ab56936c ("drm/vkms: Fix race-condition between the hrtimer
> and the atomic commit") in 6.6-rc1 replaced spinlock with mutex. So we haven't
> tested with the lock debugging yet...

Yeah, that needs an immediate revert; there's not much that looks legit in
that patch. I'll chat with Maíra.

Also, yes, how that landed without anyone running lockdep is ... not good. I
guess we need a lockdep-enabled drm CI target that runs vkms tests asap
:-)

> Maíra and Arthur, mutex_unlock() from interrupt context is not permitted.
> Please revert that patch immediately.
> I guess that a semaphore (down()/up()) could be used instead of a mutex.

From a quick look this smells like a classic "try to use locking when you
want synchronization primitives" case, so a semaphore here doesn't look any
better. The vkms_set_composer() function was originally for crc
generation, where it's userspace's job to make sure they wait for all the
crc they need to be generated before they shut it down again. But for
writeback the kernel must guarantee that the composition actually
happens, and the current function just doesn't make any such guarantees.

Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Hillf Danton

Sep 14, 2023, 8:21:53 AM
to Tetsuo Handa, Sanan Hasanov, Thomas Gleixner, Linus Torvalds, syzk...@googlegroups.com, linu...@kvack.org, LKML
Yes, you are right.

The diff below is my two cents (only for thoughts).

--- x/include/linux/hrtimer.h
+++ y/include/linux/hrtimer.h
@@ -124,6 +124,9 @@ struct hrtimer {
u8 is_rel;
u8 is_soft;
u8 is_hard;
+#ifdef CONFIG_LOCKDEP
+ struct lockdep_map lockdep_map;
+#endif
};

/**
@@ -369,33 +372,65 @@ static inline void hrtimer_cancel_wait_r
/* Exported timer functions: */

/* Initialize timers: */
-extern void hrtimer_init(struct hrtimer *timer, clockid_t which_clock,
- enum hrtimer_mode mode);
-extern void hrtimer_init_sleeper(struct hrtimer_sleeper *sl, clockid_t clock_id,
- enum hrtimer_mode mode);
+extern void hrtimer_init_key(struct hrtimer *timer, clockid_t which_clock,
+ enum hrtimer_mode mode,
+ const char *name, struct lock_class_key *key);
+extern void hrtimer_init_sleeper_key(struct hrtimer_sleeper *sl, clockid_t clock_id,
+ enum hrtimer_mode mode,
+ const char *name, struct lock_class_key *key);
+#ifdef CONFIG_LOCKDEP
+#define hrtimer_init(t, c, m) \
+ do { \
+ static struct lock_class_key __key; \
+ hrtimer_init_key(t, c, m, #t, &__key); \
+ } while (0)
+
+#define hrtimer_init_sleeper(s, c, m) \
+ do { \
+ static struct lock_class_key __key; \
+ hrtimer_init_sleeper_key(s, c, m, #s, &__key); \
+ } while (0)
+#else
+#define hrtimer_init(t, c, m) \
+ hrtimer_init_key(t, c, m, NULL, NULL)
+
+#define hrtimer_init_sleeper(s, c, m) \
+ hrtimer_init_sleeper_key(s, c, m, NULL, NULL)
+#endif

#ifdef CONFIG_DEBUG_OBJECTS_TIMERS
-extern void hrtimer_init_on_stack(struct hrtimer *timer, clockid_t which_clock,
- enum hrtimer_mode mode);
-extern void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl,
+extern void hrtimer_init_on_stack_key(struct hrtimer *timer, clockid_t which_clock,
+ enum hrtimer_mode mode,
+ const char *name, struct lock_class_key *key);
+extern void hrtimer_init_sleeper_on_stack_key(struct hrtimer_sleeper *sl,
clockid_t clock_id,
- enum hrtimer_mode mode);
+ enum hrtimer_mode mode,
+ const char *name, struct lock_class_key *key);
+#ifdef CONFIG_LOCKDEP
+ #define hrtimer_init_on_stack(t, c, m) \
+ do { \
+ static struct lock_class_key __key; \
+ hrtimer_init_on_stack_key(t, c, m, #t, &__key); \
+ } while (0)
+ #define hrtimer_init_sleeper_on_stack(s, c, m) \
+ do { \
+ static struct lock_class_key __key; \
+ hrtimer_init_sleeper_on_stack_key(s, c, m, #s, &__key); \
+ } while (0)
+#else
+ #define hrtimer_init_on_stack(t, c, m) \
+ hrtimer_init_on_stack_key(t, c, m, NULL, NULL)
+ #define hrtimer_init_sleeper_on_stack(s, c, m) \
+ hrtimer_init_sleeper_on_stack_key(s, c, m, NULL, NULL)
+#endif

extern void destroy_hrtimer_on_stack(struct hrtimer *timer);
#else
-static inline void hrtimer_init_on_stack(struct hrtimer *timer,
- clockid_t which_clock,
- enum hrtimer_mode mode)
-{
- hrtimer_init(timer, which_clock, mode);
-}
+#define hrtimer_init_on_stack(t, c, m) \
+ hrtimer_init(t, c, m)

-static inline void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl,
- clockid_t clock_id,
- enum hrtimer_mode mode)
-{
- hrtimer_init_sleeper(sl, clock_id, mode);
-}
+#define hrtimer_init_sleeper_on_stack(s, c, m) \
+ hrtimer_init_sleeper(s, c, m)

static inline void destroy_hrtimer_on_stack(struct hrtimer *timer) { }
#endif
--- x/kernel/time/hrtimer.c
+++ y/kernel/time/hrtimer.c
@@ -428,22 +428,26 @@ static inline void debug_hrtimer_deactiv
static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
enum hrtimer_mode mode);

-void hrtimer_init_on_stack(struct hrtimer *timer, clockid_t clock_id,
- enum hrtimer_mode mode)
+void hrtimer_init_on_stack_key(struct hrtimer *timer, clockid_t clock_id,
+ enum hrtimer_mode mode,
+ const char *name, struct lock_class_key *key)
{
debug_object_init_on_stack(timer, &hrtimer_debug_descr);
__hrtimer_init(timer, clock_id, mode);
+ lockdep_init_map(&timer->lockdep_map, name, key, 0);
}
EXPORT_SYMBOL_GPL(hrtimer_init_on_stack);

static void __hrtimer_init_sleeper(struct hrtimer_sleeper *sl,
clockid_t clock_id, enum hrtimer_mode mode);

-void hrtimer_init_sleeper_on_stack(struct hrtimer_sleeper *sl,
- clockid_t clock_id, enum hrtimer_mode mode)
+void hrtimer_init_sleeper_on_stack_key(struct hrtimer_sleeper *sl,
+ clockid_t clock_id, enum hrtimer_mode mode,
+ const char *name, struct lock_class_key *key)
{
debug_object_init_on_stack(&sl->timer, &hrtimer_debug_descr);
__hrtimer_init_sleeper(sl, clock_id, mode);
+ lockdep_init_map(&sl->timer.lockdep_map, name, key, 0);
}
EXPORT_SYMBOL_GPL(hrtimer_init_sleeper_on_stack);

@@ -1439,6 +1443,8 @@ int hrtimer_cancel(struct hrtimer *timer
{
int ret;

+ lock_map_acquire(&timer->lockdep_map);
+ lock_map_release(&timer->lockdep_map);
do {
ret = hrtimer_try_to_cancel(timer);

@@ -1586,11 +1592,12 @@ static void __hrtimer_init(struct hrtime
* but the PINNED bit is ignored as pinning happens
* when the hrtimer is started
*/
-void hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
- enum hrtimer_mode mode)
+void hrtimer_init_key(struct hrtimer *timer, clockid_t clock_id, enum hrtimer_mode mode,
+ const char *name, struct lock_class_key *key)
{
debug_init(timer, clock_id, mode);
__hrtimer_init(timer, clock_id, mode);
+ lockdep_init_map(&timer->lockdep_map, name, key, 0);
}
EXPORT_SYMBOL_GPL(hrtimer_init);

@@ -1647,6 +1654,11 @@ static void __run_hrtimer(struct hrtimer
enum hrtimer_restart (*fn)(struct hrtimer *);
bool expires_in_hardirq;
int restart;
+#ifdef CONFIG_LOCKDEP
+ struct lockdep_map lockdep_map;
+
+ lockdep_copy_map(&lockdep_map, &timer->lockdep_map);
+#endif

lockdep_assert_held(&cpu_base->lock);

@@ -1682,7 +1694,9 @@ static void __run_hrtimer(struct hrtimer
trace_hrtimer_expire_entry(timer, now);
expires_in_hardirq = lockdep_hrtimer_enter(timer);

+ lock_map_acquire(&lockdep_map);
restart = fn(timer);
+ lock_map_release(&lockdep_map);

lockdep_hrtimer_exit(expires_in_hardirq);
trace_hrtimer_expire_exit(timer);
@@ -2004,12 +2018,13 @@ static void __hrtimer_init_sleeper(struc
* @clock_id: the clock to be used
* @mode: timer mode abs/rel
*/
-void hrtimer_init_sleeper(struct hrtimer_sleeper *sl, clockid_t clock_id,
- enum hrtimer_mode mode)
+void hrtimer_init_sleeper_key(struct hrtimer_sleeper *sl, clockid_t clock_id,
+ enum hrtimer_mode mode,
+ const char *name, struct lock_class_key *key)
{
debug_init(&sl->timer, clock_id, mode);
__hrtimer_init_sleeper(sl, clock_id, mode);
-
+ lockdep_init_map(&sl->timer.lockdep_map, name, key, 0);
}
EXPORT_SYMBOL_GPL(hrtimer_init_sleeper);

--

Tetsuo Handa

Sep 14, 2023, 9:14:22 AM
to Thomas Gleixner, Sanan Hasanov, Linus Torvalds, syzk...@googlegroups.com, LKML, Hillf Danton
On 2023/09/14 21:21, Hillf Danton wrote:
> On Wed, 13 Sep 2023 23:30:23 +0900 Tetsuo Handa wrote:
>> On 2023/09/13 20:07, Hillf Danton wrote:
>>>
>>> cpu1                                  cpu4 (see below)
>>> ====                                  ====
>>> drm_crtc_vblank_off                   __run_hrtimer
>>>   spin_lock_irq(&dev->event_lock);
>>>   ...                                 drm_handle_vblank
>>>   hrtimer_cancel                        spin_lock_irqsave(&dev->event_lock, irqflags);
>>>
>>>
>>> Deadlock should have been reported instead provided the lockdep_map in
>>> struct timer_list were added also to hrtimer, so it is highly appreciated
>>> if Tetsuo or Thomas adds it before 6.8 or 6.10.
>>
>> Not me. ;-)
>>
>> Since hrtimer_cancel() retries forever until lock_hrtimer_base() succeeds,
>> we want to add a lockdep annotation into hrtimer_cancel() so that we can
>> detect this type of deadlock?

Here is a reproducer.

----------------------------------------
#include <linux/module.h>
#include <linux/hrtimer.h>
#include <linux/delay.h>

static DEFINE_SPINLOCK(lock1);
static struct hrtimer timer1;

static enum hrtimer_restart timer_func(struct hrtimer *timer)
{
        unsigned long flags;

        mdelay(100); // Wait for test_init() to hold lock1.
        spin_lock_irqsave(&lock1, flags);
        spin_unlock_irqrestore(&lock1, flags);
        return HRTIMER_RESTART;
}

static int __init test_init(void)
{
        unsigned long flags;

        hrtimer_init(&timer1, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
        timer1.function = &timer_func;
        hrtimer_start(&timer1, 1, HRTIMER_MODE_REL);
        mdelay(10); // Wait for timer_func() to start.
        spin_lock_irqsave(&lock1, flags);
        hrtimer_cancel(&timer1); // Wait for timer_func() to finish.
        spin_unlock_irqrestore(&lock1, flags);
        return -EINVAL;
}

module_init(test_init);
MODULE_LICENSE("GPL");
----------------------------------------

----------------------------------------
[ 996.507681] test: loading out-of-tree module taints kernel.
[ 996.514019] test: module verification failed: signature and/or required key missing - tainting kernel
[ 1061.893054] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 1061.900411] rcu: 4-...0: (1 GPs behind) idle=ed6c/1/0x4000000000000000 softirq=3304/3305 fqs=15784
[ 1061.909128] rcu: (detected by 0, t=65018 jiffies, g=12625, q=4422 ncpus=12)
[ 1061.915003] Sending NMI from CPU 0 to CPUs 4:
[ 1061.918972] NMI backtrace for cpu 4
[ 1061.919036] CPU: 4 PID: 3826 Comm: insmod Tainted: G OE 6.6.0-rc1+ #20
[ 1061.919093] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[ 1061.919095] RIP: 0010:delay_tsc+0x34/0xa0
[ 1061.919560] Code: ff 05 e8 b1 26 70 65 44 8b 0d e4 b1 26 70 0f 01 f9 66 90 48 c1 e2 20 48 09 c2 49 89 d0 eb 21 65 ff 0d c8 b1 26 70 74 54 f3 90 <65> ff 05 bd b1 26 70 65 8b 35 ba b1 26 70 41 39 f1 75 28 41 89 f1
[ 1061.919563] RSP: 0018:ffffb471c059cf00 EFLAGS: 00000083
[ 1061.919567] RAX: 0000028efe104ef6 RBX: 0000000000000041 RCX: 0000000000000004
[ 1061.919569] RDX: 00000000002192f8 RSI: 0000000000000004 RDI: 000000000027d81e
[ 1061.919571] RBP: ffff8970dafe5040 R08: 0000028efdeebbfe R09: 0000000000000004
[ 1061.919572] R10: 0000000000000001 R11: ffffffffc0a8d600 R12: ffffffff90e030e0
[ 1061.919574] R13: ffff8970dafe5040 R14: ffffffffc0a8b010 R15: ffff8970dafe5100
[ 1061.919630] FS: 00007fdd998eb740(0000) GS:ffff8970dae00000(0000) knlGS:0000000000000000
[ 1061.919633] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1061.919635] CR2: 0000000001edf678 CR3: 00000001a1060000 CR4: 0000000000350ee0
[ 1061.919639] Call Trace:
[ 1061.919641] <NMI>
[ 1061.919646] ? nmi_cpu_backtrace+0xb1/0x130
[ 1061.919711] ? nmi_cpu_backtrace_handler+0x11/0x20
[ 1061.922096] ? nmi_handle+0xe4/0x290
[ 1061.922163] ? default_do_nmi+0x49/0x100
[ 1061.922196] ? exc_nmi+0x152/0x1e0
[ 1061.922198] ? end_repeat_nmi+0x16/0x67
[ 1061.922340] ? __pfx_timer_func+0x10/0x10 [test]
[ 1061.922347] ? delay_tsc+0x34/0xa0
[ 1061.922349] ? delay_tsc+0x34/0xa0
[ 1061.922350] ? delay_tsc+0x34/0xa0
[ 1061.922352] </NMI>
[ 1061.922353] <IRQ>
[ 1061.922353] timer_func+0x19/0xff0 [test]
[ 1061.922358] __hrtimer_run_queues+0x177/0x3a0
[ 1061.922362] hrtimer_interrupt+0x104/0x240
[ 1061.922364] ? __do_softirq+0x2db/0x392
[ 1061.922827] __sysvec_apic_timer_interrupt+0x64/0x180
[ 1061.922833] sysvec_apic_timer_interrupt+0x65/0x80
[ 1061.922894] </IRQ>
[ 1061.922896] <TASK>
[ 1061.922898] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 1061.922902] RIP: 0010:delay_tsc+0x4d/0xa0
[ 1061.922907] Code: c2 49 89 d0 eb 21 65 ff 0d c8 b1 26 70 74 54 f3 90 65 ff 05 bd b1 26 70 65 8b 35 ba b1 26 70 41 39 f1 75 28 41 89 f1 0f 01 f9 <66> 90 48 c1 e2 20 48 09 d0 48 89 c2 4c 29 c2 48 39 fa 72 c8 65 ff
[ 1061.922909] RSP: 0018:ffffb471c1e63bd0 EFLAGS: 00000246
[ 1061.922912] RAX: 00000000751ed8ab RBX: 000000000000000a RCX: 0000000000000004
[ 1061.922914] RDX: 0000000000000267 RSI: 0000000000000004 RDI: 000000000027d81e
[ 1061.922915] RBP: ffffffffc0a91010 R08: 00000267751adc59 R09: 0000000000000004
[ 1061.922917] R10: 0000000000000001 R11: ffffffff90cd85c8 R12: 0000000000000000
[ 1061.922918] R13: ffffb471c1e63d20 R14: 0000000000000000 R15: ffffffffc0a8d080
[ 1061.922923] ? __pfx_test_init+0x10/0x10 [test]
[ 1061.922934] test_init+0x52/0xff0 [test]
[ 1061.922941] do_one_initcall+0x5c/0x280
[ 1061.923004] ? kmalloc_trace+0xa9/0xc0
[ 1061.923105] do_init_module+0x60/0x240
[ 1061.923111] load_module+0x1f6e/0x20d0
[ 1061.923119] ? ima_post_read_file+0xe3/0xf0
[ 1061.923225] ? init_module_from_file+0x88/0xc0
[ 1061.923229] init_module_from_file+0x88/0xc0
[ 1061.923238] idempotent_init_module+0x19c/0x250
[ 1061.923244] ? security_capable+0x39/0x60
[ 1061.923304] __x64_sys_finit_module+0x5b/0xb0
[ 1061.923310] do_syscall_64+0x3b/0x90
[ 1061.923366] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 1061.923421] RIP: 0033:0x7fdd986f8e29
[ 1061.923427] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 17 e0 2c 00 f7 d8 64 89 01 48
[ 1061.923429] RSP: 002b:00007ffe2f34dd18 EFLAGS: 00000206 ORIG_RAX: 0000000000000139
[ 1061.923432] RAX: ffffffffffffffda RBX: 0000000001ede240 RCX: 00007fdd986f8e29
[ 1061.923434] RDX: 0000000000000000 RSI: 000000000041a96e RDI: 0000000000000003
[ 1061.923435] RBP: 000000000041a96e R08: 0000000000000000 R09: 00007ffe2f34deb8
[ 1061.923436] R10: 0000000000000003 R11: 0000000000000206 R12: 0000000000000000
[ 1061.923437] R13: 0000000001ede210 R14: 0000000000000000 R15: 0000000000000000
[ 1061.923444] </TASK>
[ 1061.923446] INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 4.474 msecs
----------------------------------------

>
> Yes, you are right.
>
> The diff below is my two cents (only for thoughts).
>

I'm thinking something like below. (Completely untested.)

I haven't checked the IRQ state handling. But in the last diff chunk, why is
raw_spin_unlock_irqrestore() (which does not re-enable IRQs if the caller had them
disabled) used before calling the callback function, while raw_spin_lock_irq()
(which unconditionally disables IRQs) is used after it? Is it legal to disable
IRQs again when the caller has already disabled them?

----------------------------------------
diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 0ee140176f10..5640730ec31c 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -123,8 +123,11 @@ struct hrtimer {
 	u8				state;
 	u8				is_rel;
 	u8				is_soft;
 	u8				is_hard;
+#ifdef CONFIG_LOCKDEP
+	struct lockdep_map		lockdep_map;
+#endif
 };
 
 /**
  * struct hrtimer_sleeper - simple sleeper structure
@@ -440,15 +443,15 @@ static inline void hrtimer_restart(struct hrtimer *timer)
 	hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
 }
 
 /* Query timers: */
-extern ktime_t __hrtimer_get_remaining(const struct hrtimer *timer, bool adjust);
+extern ktime_t __hrtimer_get_remaining(struct hrtimer *timer, bool adjust);
 
 /**
  * hrtimer_get_remaining - get remaining time for the timer
  * @timer:	the timer to read
  */
-static inline ktime_t hrtimer_get_remaining(const struct hrtimer *timer)
+static inline ktime_t hrtimer_get_remaining(struct hrtimer *timer)
 {
 	return __hrtimer_get_remaining(timer, false);
 }
 
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 238262e4aba7..fe0681d34b56 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -161,14 +161,23 @@ static inline bool is_migration_base(struct hrtimer_clock_base *base)
  * possible to set timer->base = &migration_base and drop the lock: the timer
  * remains locked.
  */
 static
-struct hrtimer_clock_base *lock_hrtimer_base(const struct hrtimer *timer,
+struct hrtimer_clock_base *lock_hrtimer_base(struct hrtimer *timer,
 					     unsigned long *flags)
 	__acquires(&timer->base->lock)
 {
 	struct hrtimer_clock_base *base;
 
+#ifdef CONFIG_LOCKDEP
+	unsigned long flags2;
+
+	local_irq_save(flags2);
+	lock_map_acquire(&timer->lockdep_map);
+	lock_map_release(&timer->lockdep_map);
+	local_irq_restore(flags2);
+#endif
+
 	for (;;) {
 		base = READ_ONCE(timer->base);
 		if (likely(base != &migration_base)) {
 			raw_spin_lock_irqsave(&base->cpu_base->lock, *flags);
@@ -1456,9 +1465,9 @@ EXPORT_SYMBOL_GPL(hrtimer_cancel);
  * __hrtimer_get_remaining - get remaining time for the timer
  * @timer:	the timer to read
  * @adjust:	adjust relative timers when CONFIG_TIME_LOW_RES=y
  */
-ktime_t __hrtimer_get_remaining(const struct hrtimer *timer, bool adjust)
+ktime_t __hrtimer_get_remaining(struct hrtimer *timer, bool adjust)
 {
 	unsigned long flags;
 	ktime_t rem;
 
@@ -1574,8 +1583,14 @@ static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
 	timer->is_soft = softtimer;
 	timer->is_hard = !!(mode & HRTIMER_MODE_HARD);
 	timer->base = &cpu_base->clock_base[base];
 	timerqueue_init(&timer->node);
+#ifdef CONFIG_LOCKDEP
+	{
+		static struct lock_class_key __key;
+		lockdep_init_map(&timer->lockdep_map, "hrtimer", &__key, 0);
+	}
+#endif
 }
 
 /**
  * hrtimer_init - initialize a timer to the given clock
@@ -1684,9 +1699,19 @@ static void __run_hrtimer(struct hrtimer_cpu_base *cpu_base,
 	raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
 	trace_hrtimer_expire_entry(timer, now);
 	expires_in_hardirq = lockdep_hrtimer_enter(timer);
 
+#ifdef CONFIG_LOCKDEP
+	local_irq_save(flags);
+	lock_map_acquire(&timer->lockdep_map);
+	local_irq_restore(flags);
+#endif
 	restart = fn(timer);
+#ifdef CONFIG_LOCKDEP
+	local_irq_save(flags);
+	lock_map_release(&timer->lockdep_map);
+	local_irq_restore(flags);
+#endif
 
 	lockdep_hrtimer_exit(expires_in_hardirq);
 	trace_hrtimer_expire_exit(timer);
 	raw_spin_lock_irq(&cpu_base->lock);
----------------------------------------

Helen Koike

Sep 18, 2023, 9:07:20 PM9/18/23
to Tetsuo Handa, Maira Canal, Arthur Grillo, Rodrigo Siqueira, Melissa Wen, Haneen Mohammed, David Airlie, DRI, syzk...@googlegroups.com, LKML, Hillf Danton, Sanan Hasanov, Thomas Gleixner, Linus Torvalds, Daniel Stone, David Heidelberg, Vignesh Raman


On 14/09/2023 05:12, Daniel Vetter wrote:
> On Thu, Sep 14, 2023 at 03:33:41PM +0900, Tetsuo Handa wrote:
>> On 2023/09/14 6:08, Thomas Gleixner wrote:
>>> Maybe the VKMS people need to understand locking in the first place. The
>>> first thing I saw in this code is:
>>>
>>> static enum hrtimer_restart vkms_vblank_simulate(struct hrtimer *timer)
>>> {
>>> ...
>>> mutex_unlock(&output->enabled_lock);
>>>
>>> What?
>>>
>>> Unlocking a mutex in the context of a hrtimer callback is simply
>>> violating all mutex locking rules.
>>>
>>> How has this code ever survived lock debugging without triggering a big
>>> fat warning?
>>
>> Commit a0e6a017ab56936c ("drm/vkms: Fix race-condition between the hrtimer
>> and the atomic commit") in 6.6-rc1 replaced spinlock with mutex. So we haven't
>> tested with the lock debugging yet...
>
> Yeah that needs an immediate revert, there's not much that looks legit in
> that patch. I'll chat with Maira.
>
> Also yes how that landed without anyone running lockdep is ... not good. I
> guess we need a lockdep enabled drm ci target that runs vkms tests asap
> :-)

btw, I just executed a draft version of the vkms target on the ci; we do get
the warning:

https://gitlab.freedesktop.org/helen.fornazier/linux/-/jobs/49156305#L623

I'm just not sure whether the tests would fail (since it is only a warning);
it could easily be missed if people don't look at the logs (unless igt
already handles that).

I still need to do some adjustments (it seems the test is hanging
somewhere and we got a timeout), but we should have vkms in drm ci soon.

Regards,
Helen

Daniel Stone

Sep 19, 2023, 5:44:57 AM9/19/23
to Helen Koike, Tetsuo Handa, Maira Canal, Arthur Grillo, Rodrigo Siqueira, Melissa Wen, Haneen Mohammed, David Airlie, DRI, syzk...@googlegroups.com, LKML, Hillf Danton, Sanan Hasanov, Thomas Gleixner, Linus Torvalds, Daniel Stone, David Heidelberg, Vignesh Raman
On Mon, 18 Sept 2023 at 23:02, Helen Koike <helen...@collabora.com> wrote:
> On 14/09/2023 05:12, Daniel Vetter wrote:
> > Also yes how that landed without anyone running lockdep is ... not good. I
> > guess we need a lockdep enabled drm ci target that runs vkms tests asap
> > :-)
>
> btw, I just executed a draft version of the vkms target on the ci; we do get
> the warning:
>
> https://gitlab.freedesktop.org/helen.fornazier/linux/-/jobs/49156305#L623
>
> I'm just not sure whether the tests would fail (since it is only a warning);
> it could easily be missed if people don't look at the logs (unless igt
> already handles that).

Hmm, dmesg-warn is already a separate igt test status. I guess it just
needs to be handled explicitly.

> I still need to do some adjustments (it seems the test is hanging
> somewhere and we got a timeout), but we should have vkms in drm ci soon.

Might be due to the locking explosion? The kernels should probably
have soft-lockup detection enabled as well as lockdep.
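For a CI target like that, the debug options involved would be something along these lines (a suggested set, not an exhaustive one):

```
# Lock dependency tracking (lockdep) and related checks
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
# Stall/lockup detection
CONFIG_SOFTLOCKUP_DETECTOR=y
CONFIG_DETECT_HUNG_TASK=y
```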

Cheers,
Daniel

Hillf Danton

Oct 24, 2023, 7:02:22 AM10/24/23
to Tetsuo Handa, Thomas Gleixner, Sanan Hasanov, Linus Torvalds, syzk...@googlegroups.com, LKML
On Thu, 14 Sep 2023 22:13:42 +0900 Tetsuo Handa wrote:
> --- a/include/linux/hrtimer.h
> +++ b/include/linux/hrtimer.h
> @@ -123,8 +123,11 @@ struct hrtimer {
>  	u8				state;
>  	u8				is_rel;
>  	u8				is_soft;
>  	u8				is_hard;
> +#ifdef CONFIG_LOCKDEP
> +	struct lockdep_map		lockdep_map;
> +#endif
>  };
>

[...]

> @@ -1574,8 +1583,14 @@ static void __hrtimer_init(struct hrtimer *timer, clockid_t clock_id,
>  	timer->is_soft = softtimer;
>  	timer->is_hard = !!(mode & HRTIMER_MODE_HARD);
>  	timer->base = &cpu_base->clock_base[base];
>  	timerqueue_init(&timer->node);
> +#ifdef CONFIG_LOCKDEP
> +	{
> +		static struct lock_class_key __key;
> +		lockdep_init_map(&timer->lockdep_map, "hrtimer", &__key, 0);
> +	}
> +#endif
>  }

Take a look at another lock init [1] please.

[1] Subject: [PATCH] XArray: Make xa_lock_init macro
https://lore.kernel.org/lkml/20231002082535.151640...@linux.intel.com/
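The point of [1] being: hide the static lock_class_key behind a macro at the init site, so each caller of hrtimer_init() gets its own lockdep class rather than all hrtimers sharing one "hrtimer" class. A hypothetical sketch of that shape (not actual kernel code, helper name made up):

```c
/*
 * Hypothetical helper, modeled on spin_lock_init()/xa_lock_init():
 * the static key is allocated per call site, so each hrtimer user
 * gets a distinct lockdep class instead of one shared class.
 */
#define hrtimer_lockdep_init(timer, name)				\
do {									\
	static struct lock_class_key __key;				\
									\
	lockdep_init_map(&(timer)->lockdep_map, name, &__key, 0);	\
} while (0)
```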