
console: lockup on boot


Sasha Levin
May 30, 2014, 9:20:02 AM
Hi all,

I sometimes see lockups when booting my KVM guest with the latest -next kernel;
it hangs right when it should start 'init', and after a while I get
the following spew:

[ 30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
[ 30.770667] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[ 30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
[ 30.770667] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[ 320.541585] kworker/dying (172) used greatest stack depth: 3656 bytes left
[ 321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[ 30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
[ 408.010589] INFO: task swapper/0:1 blocked for more than 200 seconds.
[ 321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[ 320.551104] BUG: spinlock lockup suspected on CPU#33, kworker/dying/172
[ 30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
[ 321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[ 408.020558] BUG: spinlock lockup suspected on CPU#43, khungtaskd/3760
[ ... the four messages above repeat many more times ... ]
[ 321.761422] general protection fault: 0000 [#1]
[ ... repeats continue, now also interleaved with: ... ]
[ 2950.759251] BUG: spinlock lockup suspected on CPU#32, swapper/32/0
[ ... the CPU#32 message recurs at increasing timestamps up to 4458.566115 ... ]
[ 30.790833] PANIC: double fault, error_code: 0x0
[ 320.551104] lock: console_sem+0x0/0x60, .magic: dead4ead, .owner: kworker/u101:1/484, .owner_cpu: -1
[ 408.020558] lock: console_sem+0x0/0x60, .magic: dead4ead, .owner: kworker/u101:1/484, .owner_cpu: 33
[ 321.761422] lock: console_sem+0x0/0x60, .magic: dead4ead, .owner: kworker/u101:1/484, .owner_cpu: -1
[ 320.551104] CPU: 33 PID: 172 Comm: kworker/dying Tainted: G W 3.15.0-rc7-next-20140529-sasha-00017-gd239c90-dirty #550
[ 320.551104] ffffffff8e087060 ffff88012a800b38 ffffffff8c4e92e3 0000000000000001
[ 30.790833] ffff88007bdd3000
[ 30.790833] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W 3.15.0-rc7-next-20140529-sasha-00017-gd239c90-dirty #550
[ 30.790833] ffff88012a800b58
[ 30.790833] task: ffff880a2b3f0000 ti: ffff88007c1d0000 task.ti: ffff88007c1d0000
[ 30.790833] ffffffff8c4dc9fe
[ 320.551104] RIP: (??:?)
[ 320.551104] ffffffff8e087060
[ 30.790833] 0000000086c41770
[ 30.790833] string.isra.5 (lib/vsprintf.c:506)
[ 30.790833] ffff88012a800b88
[ 30.790833] RSP: 0000:ffff88012ba00000 EFLAGS: 00010092
[ 30.790833] ffffffff891d08b4
[ 30.790833] RAX: ffffffff8d6ee711 RBX: ffffffff906949f0 RCX: 0000000000000000
[ 30.790833] ffffffff8e087078
[ 30.790833] RDX: ffffffff8d6eb246 RSI: ffffffffffffffff RDI: ffffffff906949f0
[ 320.551104]
[ 30.790833] RBP: ffff88012ba00018 R08: 000000000000ffff R09: 000000000000ffff
[ 320.551104] Call Trace:
[ 30.790833] R10: 73203a4755423001 R11: 206b636f6c6e6970 R12: ffffffff90694dc0
[ 30.790833] R13: ffffffff90694dc0 R14: 000000000000ffff R15: 0000000000000000
[ 30.790833] <UNK>
[ 30.790833] FS: 0000000000000000(0000) GS:ffff88012ba00000(0000) knlGS:0000000000000000
[ 321.761422] CPU: 0 PID: 484 Comm: kworker/u101:1 Tainted: G W 3.15.0-rc7-next-20140529-sasha-00017-gd239c90-dirty #550
[ 30.790833] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 320.551104] sending NMI to all CPUs:
[ 30.790833] CR2: ffff88012b9ffff8 CR3: 000000000e02c000 CR4: 00000000000006a0
[ 321.761422] BUG: spinlock lockup suspected on CPU#0, kworker/u101:1/484
[ 321.761422] lock: logbuf_lock+0x0/0x48, .magic: dead4ead, .owner: kworker/u101:1/484, .owner_cpu: -1
[ 321.761422] CPU: 0 PID: 484 Comm: kworker/u101:1 Tainted: G W 3.15.0-rc7-next-20140529-sasha-00017-gd239c90-dirty #550
[ 321.761422] Workqueue: events_power_efficient fb_flashcursor
[ 321.761422] ffffffff8e087000 ffff88007da06c78 ffffffff8c4e92e3 0000000000000000
[ 321.761422] ffff88007bdd3000 ffff88007da06c98 ffffffff8c4dc9fe ffffffff8e087000
[ 321.761422] 0000000086c41770 ffff88007da06cc8 ffffffff891d08b4 ffffffff8e087018
[ 321.761422] Call Trace:
[ 321.761422] <NMI> dump_stack (lib/dump_stack.c:52)
[ 321.761422] spin_dump (kernel/locking/spinlock_debug.c:68 (discriminator 6))
[ 321.761422] do_raw_spin_lock (include/linux/nmi.h:35 kernel/locking/spinlock_debug.c:119 kernel/locking/spinlock_debug.c:137)
[ 321.761422] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
[ 321.761422] ? vprintk_emit (kernel/printk/printk.c:1620)
[ 321.761422] vprintk_emit (kernel/printk/printk.c:1620)
[ 321.761422] printk (kernel/printk/printk.c:1806)
[ 321.761422] arch_trigger_all_cpu_backtrace_handler (arch/x86/kernel/apic/hw_nmi.c:85)
[ 321.761422] nmi_handle (arch/x86/kernel/nmi.c:133)
[ 321.761422] ? nmi_handle (arch/x86/kernel/nmi.c:114)
[ 321.761422] default_do_nmi (arch/x86/kernel/nmi.c:336)
[ 321.761422] ? get_color.isra.12 (drivers/video/console/fbcon.c:369)
[ 321.761422] do_nmi (./arch/x86/include/asm/current.h:14 arch/x86/kernel/nmi.c:540)
[ 321.761422] end_repeat_nmi (arch/x86/kernel/entry_64.S:1701)
[ 321.761422] ? get_color.isra.12 (drivers/video/console/fbcon.c:369)
[ 321.761422] ? get_color.isra.12 (drivers/video/console/fbcon.c:369)
[ 321.761422] ? get_symbol_pos (kernel/kallsyms.c:230)
[ 321.761422] ? get_symbol_pos (kernel/kallsyms.c:230)
[ 321.761422] ? get_symbol_pos (kernel/kallsyms.c:230)
[ 321.761422] <<EOE>> <UNK>
[ 321.761422] NMI backtrace for cpu 0
[ 321.761422] CPU: 0 PID: 484 Comm: kworker/u101:1 Tainted: G W 3.15.0-rc7-next-20140529-sasha-00017-gd239c90-dirty #550
[ 321.761422] Workqueue: events_power_efficient fb_flashcursor
[ 321.761422] task: ffff88007bdd3000 ti: ffff88007bde8000 task.ti: ffff88007bde8000
[ 321.761422] RIP: get_symbol_pos (kernel/kallsyms.c:230)
[ 321.761422] RSP: 0000:ffff88007bde5c78 EFLAGS: 00000006
[ 321.761422] RAX: 0000000000008c7c RBX: ffff88007bde5cd0 RCX: 0000000000009831
[ 321.761422] RDX: ffff88007bde5cd8 RSI: ffff88007bde5ce0 RDI: ffffffff89bba9c0
[ 321.761422] RBP: ffff88007bde5c78 R08: 000000000000a3e6 R09: 000000000000ffff
[ 321.761422] R10: 000000000005da8d R11: 3a6d6d6f43203438 R12: ffff88007bde5d41
[ 321.761422] R13: ffffffff89bba9c0 R14: 0000000000000000 R15: ffffffff8d6e8c47
[ 321.761422] FS: 0000000000000000(0000) GS:ffff88007da00000(0000) knlGS:0000000000000000
[ 321.761422] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 321.761422] CR2: 00000000ffffffff CR3: 000000000e02c000 CR4: 00000000000006b0
[ 321.761422] Stack:
[ 321.761422] ffff88007bde5cb8 ffffffff8920eb4e 70202c786c383025 ffff88007bde5e31
[ 321.761422] ffff8800fbde5e2b ffff88007bde5d41 ffffffff89bba9c0 0000000000000000
[ 321.761422] ffff88007bde5d08 ffffffff8920ec42 522829676e6fffff 296e656c3e2d4345
[ 321.761422] Call Trace:
[ 321.761422] <UNK>
[ 321.761422] Code: 30 0e f7 8c 55 48 85 c0 48 89 e5 75 02 0f 0b 4c 8b 15 ca fc 04 04 31 c0 4d 89 d0 eb 16 48 d1 e9 48 01 c1 48 39 3c cd 30 0e f7 8c <4c> 0f 47 c1 48 0f 46 c1 4c 89 c1 48 29 c1 48 83 f9 01 77 de 48
All code
========
0: 30 0e xor %cl,(%rsi)
2: f7 (bad)
3: 8c 55 48 mov %ss,0x48(%rbp)
6: 85 c0 test %eax,%eax
8: 48 89 e5 mov %rsp,%rbp
b: 75 02 jne 0xf
d: 0f 0b ud2
f: 4c 8b 15 ca fc 04 04 mov 0x404fcca(%rip),%r10 # 0x404fce0
16: 31 c0 xor %eax,%eax
18: 4d 89 d0 mov %r10,%r8
1b: eb 16 jmp 0x33
1d: 48 d1 e9 shr %rcx
20: 48 01 c1 add %rax,%rcx
23: 48 39 3c cd 30 0e f7 cmp %rdi,-0x7308f1d0(,%rcx,8)
2a: 8c
2b:* 4c 0f 47 c1 cmova %rcx,%r8 <-- trapping instruction
2f: 48 0f 46 c1 cmovbe %rcx,%rax
33: 4c 89 c1 mov %r8,%rcx
36: 48 29 c1 sub %rax,%rcx
39: 48 83 f9 01 cmp $0x1,%rcx
3d: 77 de ja 0x1d
3f: 48 rex.W
...

Code starting with the faulting instruction
===========================================
0: 4c 0f 47 c1 cmova %rcx,%r8
4: 48 0f 46 c1 cmovbe %rcx,%rax
8: 4c 89 c1 mov %r8,%rcx
b: 48 29 c1 sub %rax,%rcx
e: 48 83 f9 01 cmp $0x1,%rcx
12: 77 de ja 0xfffffffffffffff2
14: 48 rex.W
...
[ 30.790833] NMI backtrace for cpu 1
[hang]


Thanks,
Sasha
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Peter Hurley
May 30, 2014, 10:00:02 AM
On 05/30/2014 09:11 AM, Sasha Levin wrote:
> Hi all,
>
> I sometime see lockups when booting my KVM guest with the latest -next kernel,
> it basically hangs right when it should start 'init', and after a while I get
> the following spew:
>
> [ 30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0

Maybe related to this report: https://lkml.org/lkml/2014/5/30/26
from Jet Chen which was bisected to

commit bafe980f5afc7ccc693fd8c81c8aa5a02fbb5ae0
Author: Jan Kara <ja...@suse.cz>
AuthorDate: Thu May 22 10:43:35 2014 +1000
Commit: Stephen Rothwell <s...@canb.auug.org.au>
CommitDate: Thu May 22 10:43:35 2014 +1000

printk: enable interrupts before calling console_trylock_for_printk()
We need interrupts disabled when calling console_trylock_for_printk() only
so that cpu id we pass to can_use_console() remains valid (for other
things console_sem provides all the exclusion we need and deadlocks on
console_sem due to interrupts are impossible because we use
down_trylock()). However if we are rescheduled, we are guaranteed to run
on an online cpu so we can easily just get the cpu id in
can_use_console().
We can lose a bit of performance when we enable interrupts in
vprintk_emit() and then disable them again in console_unlock() but OTOH it
can somewhat reduce interrupt latency caused by console_unlock()
especially since later in the patch series we will want to spin on
console_sem in console_trylock_for_printk().
Signed-off-by: Jan Kara <ja...@suse.cz>
Signed-off-by: Andrew Morton <ak...@linux-foundation.org>

?

Jan Kara
May 30, 2014, 10:10:03 AM
Yeah, very likely. I think I see the problem, I'll send the fix shortly.

Honza
Jan Kara <ja...@suse.cz>
SUSE Labs, CR

Sasha Levin
Jun 6, 2014, 3:10:02 PM
Hi Jan,

It seems that the issue I'm seeing is different from the "[printk] BUG: spinlock
lockup suspected on CPU#0, swapper/1" report.

Is there anything else I could try here? The issue is very common during testing.

Peter Hurley
Jun 10, 2014, 12:00:03 PM
Sasha,

Is this bisectable? Maybe that's the best way forward here.

Regards,
Peter Hurley

Sasha Levin
Jun 11, 2014, 11:00:02 AM
I've run a bisection again and ended up at the same commit as Jet Chen (the commit
has unfortunately already made it to Linus's tree).

Note that I did try Jan's proposed fix and that didn't solve the issue for me, I
believe we're seeing different issues caused by the same commit.


939f04bec1a4ef6ba4370b0f34b01decc844b1b1 is the first bad commit
commit 939f04bec1a4ef6ba4370b0f34b01decc844b1b1
Author: Jan Kara <ja...@suse.cz>
Date: Wed Jun 4 16:11:37 2014 -0700

printk: enable interrupts before calling console_trylock_for_printk()

We need interrupts disabled when calling console_trylock_for_printk()
only so that cpu id we pass to can_use_console() remains valid (for
other things console_sem provides all the exclusion we need and
deadlocks on console_sem due to interrupts are impossible because we use
down_trylock()). However if we are rescheduled, we are guaranteed to
run on an online cpu so we can easily just get the cpu id in
can_use_console().

We can lose a bit of performance when we enable interrupts in
vprintk_emit() and then disable them again in console_unlock() but OTOH
it can somewhat reduce interrupt latency caused by console_unlock()
especially since later in the patch series we will want to spin on
console_sem in console_trylock_for_printk().

Signed-off-by: Jan Kara <ja...@suse.cz>
Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>


Thanks,
Sasha

Peter Hurley
Jun 11, 2014, 11:40:02 AM
I apologize; I didn't look at the patch very closely, but now that I do, this
sticks out:

@@ -1597,17 +1599,22 @@ asmlinkage int vprintk_emit(int facility, int level,

logbuf_cpu = UINT_MAX;
raw_spin_unlock(&logbuf_lock);
+ lockdep_on();
+ local_irq_restore(flags);
+

What prevents cpu migration right here?
If nothing, then logbuf_cpu is now stale and the recursion test at
the top of vprintk_emit is doing nothing to prevent recursion.


+ /*
+ * Disable preemption to avoid being preempted while holding
+ * console_sem which would prevent anyone from printing to console
+ */
+ preempt_disable();
/*
* Try to acquire and then immediately release the console semaphore.
* The release will print out buffers and wake up /dev/kmsg and syslog()
* users.
*/
- if (console_trylock_for_printk(this_cpu))
+ if (console_trylock_for_printk())
console_unlock();

Regards,
Peter Hurley

Linus Torvalds
Jun 11, 2014, 1:40:03 PM
On Wed, Jun 11, 2014 at 7:55 AM, Sasha Levin <sasha...@oracle.com> wrote:
>
> I've ran a bisection again and ended up at the same commit as Jet Chen (the commit
> unfortunately already made it to Linus's tree).

Ok, I'm going to revert that commit. The games it plays with cpu
numbers and preemption make me worried, and it looks silly.

If the only reason for that patch is interrupt latency, then if the
"preempt_disable()" had been done *before* restoring irq's, none of
those "let's reload the CPU number" games would have been necessary,
and the patch could have been smaller. So even if that is the solution
to the problems people report, it's actually *better* to just revert
the patch entirely, and try again with a smaller and simpler version,
rather than try to change the patch after-the-fact.
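
Linus's suggested ordering can be sketched as follows (a simplified, hypothetical sketch in kernel style, not the actual patch):

```c
/* Hypothetical sketch: pin the CPU with preempt_disable() *before*
 * re-enabling interrupts, so the cpu id stays valid and no
 * "reload the CPU number" logic is needed. */
raw_spin_unlock(&logbuf_lock);

preempt_disable();               /* cannot migrate from here on        */
local_irq_restore(flags);        /* now safe to take interrupts        */

if (console_trylock_for_printk())
        console_unlock();        /* flush the buffer to the consoles   */

preempt_enable();
```

With this ordering the interrupt-latency win is kept while the cpu id passed to the console path remains that of the executing CPU.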

Linus

Linus Torvalds
Jun 11, 2014, 1:50:01 PM
On Wed, Jun 11, 2014 at 10:38 AM, Linus Torvalds
<torv...@linux-foundation.org> wrote:
>
> Ok, I'm going to revert that commit. The games it plays with cpu
> numbers and preemption make me worried, and it looks silly.
>
> If the only reason for that patch is interrupt latency, then if the
> "preempt_disable()" had been done *before* restoring irq's, none of
> those "let's reload the CPU number" games would have been necessary,
> and the patch could have been smaller.

Ugh. It doesn't revert cleanly, so I guess I'm not just reverting it
after all. Jan?

Jan Kara
Jun 11, 2014, 4:40:01 PM
Nothing.

> If nothing, then logbuf_cpu is now stale and the recursion test at
> the top of vprintk_emit is doing nothing to prevent recursion.
Well, note that logbuf_cpu has just been set to UINT_MAX (i.e. undefined)
two lines above. So my patch changes nothing wrt. how printk recursion
detection works (at least AFAICT).

> + /*
> + * Disable preemption to avoid being preempted while holding
> + * console_sem which would prevent anyone from printing to console
> + */
> + preempt_disable();
> /*
> * Try to acquire and then immediately release the console semaphore.
> * The release will print out buffers and wake up /dev/kmsg and syslog()
> * users.
> */
> - if (console_trylock_for_printk(this_cpu))
> + if (console_trylock_for_printk())
> console_unlock();

Honza
--
Jan Kara <ja...@suse.cz>
SUSE Labs, CR

Jan Kara
Jun 11, 2014, 4:40:01 PM
Sorry, it has been a busy time lately and I didn't have as much time to look
into this as was needed.
>
>
> 939f04bec1a4ef6ba4370b0f34b01decc844b1b1 is the first bad commit
> commit 939f04bec1a4ef6ba4370b0f34b01decc844b1b1
> Author: Jan Kara <ja...@suse.cz>
> Date: Wed Jun 4 16:11:37 2014 -0700
>
> printk: enable interrupts before calling console_trylock_for_printk()
>
> We need interrupts disabled when calling console_trylock_for_printk()
> only so that cpu id we pass to can_use_console() remains valid (for
> other things console_sem provides all the exclusion we need and
> deadlocks on console_sem due to interrupts are impossible because we use
> down_trylock()). However if we are rescheduled, we are guaranteed to
> run on an online cpu so we can easily just get the cpu id in
> can_use_console().
>
> We can lose a bit of performance when we enable interrupts in
> vprintk_emit() and then disable them again in console_unlock() but OTOH
> it can somewhat reduce interrupt latency caused by console_unlock()
> especially since later in the patch series we will want to spin on
> console_sem in console_trylock_for_printk().
>
> Signed-off-by: Jan Kara <ja...@suse.cz>
> Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
>
>
> Thanks,
> Sasha
--
Jan Kara <ja...@suse.cz>
SUSE Labs, CR

Jan Kara
Jun 11, 2014, 5:40:02 PM
Oops, pressed send too early... So I have two debug patches for you. Can
you try whether the problem reproduces with the first one or with both of
them applied?

The stacktrace from spinlock lockup is somewhat weird - the spinlock lockup
message comes from NMI context where we tried to do print stack dump.
Indeed calling printk from NMI context *can* lead to this sort of lockups
(and the current process is a worker thread which apparently runs
fb_flashcursor() function which calls console_unlock() which can hold
logbuf_lock so all these things fit together). What I really fail to see
is how my patch makes the problem happen to you pretty reliably.

Another somewhat strange thing to me is that although lock->owner in your
dump looks fine and points to currently running process, lock->owner_cpu is
-1 which is a combination that shouldn't ever happen as I'm looking into
spinlock debug code.

So for now I'm still puzzled.
Honza
0001-printk-Debug-patch1.patch
0002-printk-Debug-patch-2.patch

Sasha Levin
Jun 11, 2014, 11:10:02 PM
The first patch fixed it (I assumed that there's no need to try the second).


Thanks,
Sasha

Jan Kara
Jun 12, 2014, 4:30:03 AM
Good. So that shows that it is the increased lockdep coverage which is
causing problems - with my patch, lockdep is able to identify some problem
because console drivers are now called with lockdep enabled. But because
the problem was found in some difficult context, lockdep just hung the
machine when trying to report it... Sadly the stacktraces you posted don't
tell us what lockdep found.

Adding Peter Zijlstra to CC. Peter, any idea how lockdep could report
problems when holding logbuf_lock? One possibility would be to extend
logbuf_cpu recursion logic to every holder of logbuf_lock. That will at
least avoid the spinlock recursion killing the machine but we won't be able
to see what lockdep found...

Honza
--
Jan Kara <ja...@suse.cz>
SUSE Labs, CR

Mike Galbraith
Jun 12, 2014, 5:00:03 AM
Could tell lockdep to use trace_printk().

Jan Kara
Jun 19, 2014, 1:30:02 PM
On Wed 11-06-14 23:07:04, Sasha Levin wrote:
OK, so I got back to it. Can you try with attached patch and without my
"fix"? lockdep should dump complaints using trace buffer (so you should
check it from time to time) instead of using printk and thus we should be
able to see what it is complaining about. Thanks!
0001-lockdep-Dump-info-via-tracing.patch

Peter Zijlstra
Jul 8, 2014, 9:10:02 AM
On Thu, Jun 12, 2014 at 10:54:22AM +0200, Mike Galbraith wrote:
> On Thu, 2014-06-12 at 10:26 +0200, Jan Kara wrote:
> > On Wed 11-06-14 23:07:04, Sasha Levin wrote:
>
> > > The first patch fixed it (I assumed that there's no need to try the second).
> > Good. So that shows that it is the increased lockdep coverage which is
> > causing problems - with my patch, lockdep is able to identify some problem
> > because console drivers are now called with lockdep enabled. But because
> > the problem was found in some difficult context, lockdep just hung the
> > machine when trying to report it... Sadly the stacktraces you posted don't
> > tell us what lockdep found.
> >
> > Adding Peter Zijlstra to CC. Peter, any idea how lockdep could report
> > problems when holding logbuf_lock? One possibility would be to extend
> > logbuf_cpu recursion logic to every holder of logbuf_lock. That will at
> > least avoid the spinlock recursion killing the machine but we won't be able
> > to see what lockdep found...
>
> Could tell lockdep to use trace_printk().

lkml.kernel.org/r/201112211111...@chello.nl

