upstream test error: WARNING in __local_bh_enable_ip

63 views
Skip to first unread message

syzbot

unread,
Aug 5, 2020, 3:19:18 AM8/5/20
to b...@alien8.de, dave....@linux.intel.com, fengh...@intel.com, h...@zytor.com, linux-...@vger.kernel.org, mi...@redhat.com, syzkall...@googlegroups.com, tg...@linutronix.de, tony...@intel.com, x...@kernel.org, yu-ch...@intel.com
Hello,

syzbot found the following issue on:

HEAD commit: 2324d50d Merge tag 'docs-5.9' of git://git.lwn.net/linux
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16502632900000
kernel config: https://syzkaller.appspot.com/x/.config?x=b21ad52fca7be434
dashboard link: https://syzkaller.appspot.com/bug?extid=8db9e1ecde74e590a657
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project.git ca2dcbd030eadbf0aa9b660efe864ff08af6e18b)

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+8db9e1...@syzkaller.appspotmail.com

------------[ cut here ]------------
WARNING: CPU: 1 PID: 8690 at kernel/softirq.c:175 __local_bh_enable_ip+0xdf/0x100 kernel/softirq.c:175
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 8690 Comm: syz-fuzzer Not tainted 5.8.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x16e/0x25d lib/dump_stack.c:118
panic+0x20c/0x69a kernel/panic.c:231
__warn+0x211/0x240 kernel/panic.c:600
report_bug+0x153/0x1d0 lib/bug.c:198
handle_bug+0x4d/0x90 arch/x86/kernel/traps.c:235
exc_invalid_op+0x16/0x70 arch/x86/kernel/traps.c:255
asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:536
RIP: 0010:__local_bh_enable_ip+0xdf/0x100 kernel/softirq.c:175
Code: 00 00 65 8b 05 ae cd cf 7e 85 c0 75 05 e8 10 ae ce ff 5b 41 5e c3 0f 0b 83 3d 40 ee 2f 07 00 0f 85 49 ff ff ff e9 53 ff ff ff <0f> 0b e9 4c ff ff ff e8 15 00 00 00 eb aa 0f 0b 0f 0b 0f 1f 44 00
RSP: 0000:ffffc90000debd40 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000200 RCX: ffff88811fc2a3c0
RDX: 0000000000000000 RSI: 0000000000000200 RDI: ffffffff811bf3f5
RBP: 0000000000000000 R08: 0000000000000000 R09: 000088811fc2b853
R10: 0000ffffffffffff R11: ffff88811fc2a3c0 R12: ffff88811fc2a3c0
R13: 000000c000099c80 R14: ffffffff811bf3f5 R15: ffff88811fc2b850
local_bh_enable+0x1b/0x20 include/linux/bottom_half.h:32
fpregs_unlock arch/x86/include/asm/fpu/api.h:41 [inline]
copy_fpstate_to_sigframe+0x2aa/0x5a0 arch/x86/kernel/fpu/signal.c:195
get_sigframe+0xf9/0x130 arch/x86/kernel/signal.c:274
__setup_rt_frame+0x56/0x310 arch/x86/kernel/signal.c:450
setup_rt_frame arch/x86/kernel/signal.c:702 [inline]
handle_signal arch/x86/kernel/signal.c:746 [inline]
arch_do_signal+0x1bd/0x270 arch/x86/kernel/signal.c:813
exit_to_user_mode_loop kernel/entry/common.c:135 [inline]
exit_to_user_mode_prepare+0x1bd/0x290 kernel/entry/common.c:166
irqentry_exit_to_user_mode+0x6/0x30 kernel/entry/common.c:254
irqentry_exit+0x23/0x80 kernel/entry/common.c:342
asm_exc_page_fault+0x1e/0x30 arch/x86/include/asm/idtentry.h:538
RIP: 0033:0x467f53
Code: 00 48 81 eb 00 01 00 00 48 81 c7 00 01 00 00 48 81 fb 00 01 00 00 73 82 e9 07 ff ff ff c5 fd ef c0 48 81 fb 00 00 00 02 73 46 <c5> fe 7f 07 c5 fe 7f 47 20 c5 fe 7f 47 40 c5 fe 7f 47 60 48 81 eb
RSP: 002b:000000c00009fed8 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 000000000000ce00 RCX: 0000000000000000
RDX: 00007f952f2dee00 RSI: 0000000000040000 RDI: 00007f952f2d2000
RBP: 000000c00009ff50 R08: 0000000000800000 R09: 00007f952f441fff
R10: 00007f952f086248 R11: 00000000000009cd R12: 00000000000003ff
R13: 0000000000000012 R14: 000080c000b9c000 R15: 000080c00139bfff
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

Marco Elver

unread,
Aug 5, 2020, 9:26:38 AM8/5/20
to b...@alien8.de, dave....@linux.intel.com, fengh...@intel.com, h...@zytor.com, linux-...@vger.kernel.org, mi...@redhat.com, syzkall...@googlegroups.com, tg...@linutronix.de, tony...@intel.com, x...@kernel.org, yu-ch...@intel.com, pet...@infradead.org, jgr...@suse.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasa...@googlegroups.com, syzbot
Add missing noinstr to arch_local*() helpers, as they may be called from
noinstr code.

On a KCSAN config with CONFIG_PARAVIRT=y, syzbot stumbled across corrupt
IRQ trace state, with lockdep_assert_irqs_enabled() failing spuriously.
When enabling CONFIG_DEBUG_LOCKDEP=y, we get a warning about

DEBUG_LOCKS_WARN_ON(!lockdep_hardirqs_enabled())

which we had seen before due to KCSAN-lockdep recursion. Due to
"lockdep: Prepare for NMI IRQ state tracking", KCSAN was changed to use
non-raw local_irq_{save,restore}(), assuming there is no more
KCSAN-lockdep recursion.

It turns out that the arch_local*() helpers in paravirt.h were missing
noinstr, as they themselves are used from noinstr code that is called
from lockdep. When inserting debug-code that warns us if lockdep is in
the stacktrace from KCSAN, we get,

RIP: 0010:kcsan_setup_watchpoint[...]
[...]
Call Trace:
arch_local_save_flags+0x11/0x30 arch/x86/include/asm/paravirt.h:765
check_preemption_disabled+0x51/0x140 lib/smp_processor_id.c:19
__this_cpu_preempt_check+0x18/0x20 lib/smp_processor_id.c:65
lockdep_hardirqs_off+0xaa/0x130 kernel/locking/lockdep.c:3801
trace_hardirqs_off+0x14/0x80 kernel/trace/trace_preemptirq.c:76
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
_raw_spin_lock_irqsave+0x48/0x90 kernel/locking/spinlock.c:159
wake_up_new_task+0x2c/0x270 kernel/sched/core.c:3338
_do_fork+0x27b/0x4f0 kernel/fork.c:2474
kernel_thread+0x85/0xb0 kernel/fork.c:2502
create_kthread kernel/kthread.c:315 [inline]
kthreadd+0x427/0x500 kernel/kthread.c:634
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

pointing to arch_local_save_flags() in paravirt.h, which is called from
noinstr functions in smp_processor_id.c, which in turn are called from
lockdep.

Link: https://lkml.kernel.org/r/0000000000007d...@google.com
Reported-by: syzbot+8db9e1...@syzkaller.appspotmail.com
Signed-off-by: Marco Elver <el...@google.com>
---
arch/x86/include/asm/paravirt.h | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 3d2afecde50c..a606f2ba2b5e 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -760,27 +760,27 @@ bool __raw_callee_save___native_vcpu_is_preempted(long cpu);
((struct paravirt_callee_save) { func })

#ifdef CONFIG_PARAVIRT_XXL
-static inline notrace unsigned long arch_local_save_flags(void)
+static inline noinstr unsigned long arch_local_save_flags(void)
{
return PVOP_CALLEE0(unsigned long, irq.save_fl);
}

-static inline notrace void arch_local_irq_restore(unsigned long f)
+static inline noinstr void arch_local_irq_restore(unsigned long f)
{
PVOP_VCALLEE1(irq.restore_fl, f);
}

-static inline notrace void arch_local_irq_disable(void)
+static inline noinstr void arch_local_irq_disable(void)
{
PVOP_VCALLEE0(irq.irq_disable);
}

-static inline notrace void arch_local_irq_enable(void)
+static inline noinstr void arch_local_irq_enable(void)
{
PVOP_VCALLEE0(irq.irq_enable);
}

-static inline notrace unsigned long arch_local_irq_save(void)
+static inline noinstr unsigned long arch_local_irq_save(void)
{
unsigned long f;

--
2.28.0.163.g6104cc2f0b6-goog

pet...@infradead.org

unread,
Aug 5, 2020, 9:42:39 AM8/5/20
to Marco Elver, b...@alien8.de, dave....@linux.intel.com, fengh...@intel.com, h...@zytor.com, linux-...@vger.kernel.org, mi...@redhat.com, syzkall...@googlegroups.com, tg...@linutronix.de, tony...@intel.com, x...@kernel.org, yu-ch...@intel.com, jgr...@suse.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasa...@googlegroups.com, syzbot
On Wed, Aug 05, 2020 at 03:26:29PM +0200, Marco Elver wrote:
> Add missing noinstr to arch_local*() helpers, as they may be called from
> noinstr code.
>
> On a KCSAN config with CONFIG_PARAVIRT=y, syzbot stumbled across corrupt

Cute, so I've been working on adding objtool support for this a little:

https://lkml.kernel.org/r/2020080314...@hirez.programming.kicks-ass.net

> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index 3d2afecde50c..a606f2ba2b5e 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -760,27 +760,27 @@ bool __raw_callee_save___native_vcpu_is_preempted(long cpu);
> ((struct paravirt_callee_save) { func })
>
> #ifdef CONFIG_PARAVIRT_XXL
> -static inline notrace unsigned long arch_local_save_flags(void)
> +static inline noinstr unsigned long arch_local_save_flags(void)
> {
> return PVOP_CALLEE0(unsigned long, irq.save_fl);
> }
>
> -static inline notrace void arch_local_irq_restore(unsigned long f)
> +static inline noinstr void arch_local_irq_restore(unsigned long f)
> {
> PVOP_VCALLEE1(irq.restore_fl, f);
> }
>
> -static inline notrace void arch_local_irq_disable(void)
> +static inline noinstr void arch_local_irq_disable(void)
> {
> PVOP_VCALLEE0(irq.irq_disable);
> }
>
> -static inline notrace void arch_local_irq_enable(void)
> +static inline noinstr void arch_local_irq_enable(void)
> {
> PVOP_VCALLEE0(irq.irq_enable);
> }
>
> -static inline notrace unsigned long arch_local_irq_save(void)
> +static inline noinstr unsigned long arch_local_irq_save(void)
> {
> unsigned long f;
>

Shouldn't we __always_inline those? They're going to be really small.

Marco Elver

unread,
Aug 5, 2020, 9:59:47 AM8/5/20
to pet...@infradead.org, b...@alien8.de, dave....@linux.intel.com, fengh...@intel.com, h...@zytor.com, linux-...@vger.kernel.org, mi...@redhat.com, syzkall...@googlegroups.com, tg...@linutronix.de, tony...@intel.com, x...@kernel.org, yu-ch...@intel.com, jgr...@suse.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasa...@googlegroups.com, syzbot
I can send a v2, and you can choose. For reference, though:

ffffffff86271ee0 <arch_local_save_flags>:
ffffffff86271ee0: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
ffffffff86271ee5: 48 83 3d 43 87 e4 01 cmpq $0x0,0x1e48743(%rip) # ffffffff880ba630 <pv_ops+0x120>
ffffffff86271eec: 00
ffffffff86271eed: 74 0d je ffffffff86271efc <arch_local_save_flags+0x1c>
ffffffff86271eef: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
ffffffff86271ef4: ff 14 25 30 a6 0b 88 callq *0xffffffff880ba630
ffffffff86271efb: c3 retq
ffffffff86271efc: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
ffffffff86271f01: 0f 0b ud2
ffffffff86271f03: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1)
ffffffff86271f0a: 00 00 00 00
ffffffff86271f0e: 66 90 xchg %ax,%ax

[...]

ffffffff86271a90 <arch_local_irq_restore>:
ffffffff86271a90: 53 push %rbx
ffffffff86271a91: 48 89 fb mov %rdi,%rbx
ffffffff86271a94: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
ffffffff86271a99: 48 83 3d 97 8b e4 01 cmpq $0x0,0x1e48b97(%rip) # ffffffff880ba638 <pv_ops+0x128>
ffffffff86271aa0: 00
ffffffff86271aa1: 74 11 je ffffffff86271ab4 <arch_local_irq_restore+0x24>
ffffffff86271aa3: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
ffffffff86271aa8: 48 89 df mov %rbx,%rdi
ffffffff86271aab: ff 14 25 38 a6 0b 88 callq *0xffffffff880ba638
ffffffff86271ab2: 5b pop %rbx
ffffffff86271ab3: c3 retq
ffffffff86271ab4: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
ffffffff86271ab9: 0f 0b ud2
ffffffff86271abb: cc int3
ffffffff86271abc: cc int3
ffffffff86271abd: cc int3
ffffffff86271abe: cc int3
ffffffff86271abf: cc int3

[... and the rest looking of similar size.]

Thanks,
-- Marco

pet...@infradead.org

unread,
Aug 5, 2020, 10:12:51 AM8/5/20
to Marco Elver, b...@alien8.de, dave....@linux.intel.com, fengh...@intel.com, h...@zytor.com, linux-...@vger.kernel.org, mi...@redhat.com, syzkall...@googlegroups.com, tg...@linutronix.de, tony...@intel.com, x...@kernel.org, yu-ch...@intel.com, jgr...@suse.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasa...@googlegroups.com, syzbot
On Wed, Aug 05, 2020 at 03:59:40PM +0200, Marco Elver wrote:
> On Wed, Aug 05, 2020 at 03:42PM +0200, pet...@infradead.org wrote:

> > Shouldn't we __always_inline those? They're going to be really small.
>
> I can send a v2, and you can choose. For reference, though:
>
> ffffffff86271ee0 <arch_local_save_flags>:
> ffffffff86271ee0: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> ffffffff86271ee5: 48 83 3d 43 87 e4 01 cmpq $0x0,0x1e48743(%rip) # ffffffff880ba630 <pv_ops+0x120>
> ffffffff86271eec: 00
> ffffffff86271eed: 74 0d je ffffffff86271efc <arch_local_save_flags+0x1c>
> ffffffff86271eef: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> ffffffff86271ef4: ff 14 25 30 a6 0b 88 callq *0xffffffff880ba630
> ffffffff86271efb: c3 retq
> ffffffff86271efc: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> ffffffff86271f01: 0f 0b ud2

> ffffffff86271a90 <arch_local_irq_restore>:
> ffffffff86271a90: 53 push %rbx
> ffffffff86271a91: 48 89 fb mov %rdi,%rbx
> ffffffff86271a94: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> ffffffff86271a99: 48 83 3d 97 8b e4 01 cmpq $0x0,0x1e48b97(%rip) # ffffffff880ba638 <pv_ops+0x128>
> ffffffff86271aa0: 00
> ffffffff86271aa1: 74 11 je ffffffff86271ab4 <arch_local_irq_restore+0x24>
> ffffffff86271aa3: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> ffffffff86271aa8: 48 89 df mov %rbx,%rdi
> ffffffff86271aab: ff 14 25 38 a6 0b 88 callq *0xffffffff880ba638
> ffffffff86271ab2: 5b pop %rbx
> ffffffff86271ab3: c3 retq
> ffffffff86271ab4: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> ffffffff86271ab9: 0f 0b ud2


Blergh, that's abysmall. In part I suspect because you have
CONFIG_PARAVIRT_DEBUG, let me try and untangle that PV macro maze.

Jürgen Groß

unread,
Aug 5, 2020, 10:17:09 AM8/5/20
to pet...@infradead.org, Marco Elver, b...@alien8.de, dave....@linux.intel.com, fengh...@intel.com, h...@zytor.com, linux-...@vger.kernel.org, mi...@redhat.com, syzkall...@googlegroups.com, tg...@linutronix.de, tony...@intel.com, x...@kernel.org, yu-ch...@intel.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasa...@googlegroups.com, syzbot
Probably. I have found the following in my kernel:

fffffff81540a5f <arch_local_save_flags>:
ffffffff81540a5f: ff 14 25 40 a4 23 82 callq *0xffffffff8223a440
ffffffff81540a66: c3 retq


Juergen

pet...@infradead.org

unread,
Aug 5, 2020, 10:17:15 AM8/5/20
to Marco Elver, b...@alien8.de, dave....@linux.intel.com, fengh...@intel.com, h...@zytor.com, linux-...@vger.kernel.org, mi...@redhat.com, syzkall...@googlegroups.com, tg...@linutronix.de, tony...@intel.com, x...@kernel.org, yu-ch...@intel.com, jgr...@suse.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasa...@googlegroups.com, syzbot
Yeah, look here:

0000 0000000000462149 <arch_local_save_flags>:
0000 462149: ff 14 25 00 00 00 00 callq *0x0
0003 46214c: R_X86_64_32S pv_ops+0x120
0007 462150: c3 retq


That's exactly what I was expecting.

Marco Elver

unread,
Aug 5, 2020, 10:36:26 AM8/5/20
to Peter Zijlstra, Borislav Petkov, Dave Hansen, fengh...@intel.com, H. Peter Anvin, LKML, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, Luck, Tony, the arch/x86 maintainers, yu-ch...@intel.com, jgr...@suse.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasan-dev, syzbot
Ah, for some reason the __always_inline version does *not* work with
KCSAN -- I'm getting various warnings, including the same lockdep
warning. I think there is some weirdness when this stuff gets inlined
into instrumented functions. At least with KCSAN, when any accesses
here are instrumented, and then KCSAN disable/enables interrupts,
things break. So, these functions should never be instrumented,
noinstr or not. Marking them 'inline noinstr' seems like the safest
option. Without CONFIG_PARAVIRT_DEBUG, any compiler should hopefully
inline them?

Thanks,
-- Marco

Marco Elver

unread,
Aug 5, 2020, 1:31:20 PM8/5/20
to Peter Zijlstra, Borislav Petkov, Dave Hansen, fengh...@intel.com, H. Peter Anvin, LKML, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, Luck, Tony, the arch/x86 maintainers, yu-ch...@intel.com, jgr...@suse.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasan-dev, syzbot
Oh well, it seems that KCSAN on syzbot still crashes even with this
"fix". It's harder to reproduce though, and I don't have a clear
reproducer other than "fuzz the kernel" right now. I think the new IRQ
state tracking code is still not compatible with KCSAN, even though we
thought it would be. Most likely there are still ways to get recursion
lockdep->KCSAN. An alternative would be to deal with the recursion
like we did before, instead of trying to squash all of it. I'll try to
investigate -- Peter, if you have ideas, help is appreciated.

Thanks,
-- Marco

Marco Elver

unread,
Aug 6, 2020, 3:47:32 AM8/6/20
to Peter Zijlstra, Borislav Petkov, Dave Hansen, fengh...@intel.com, H. Peter Anvin, LKML, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, Luck, Tony, the arch/x86 maintainers, yu-ch...@intel.com, jgr...@suse.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasan-dev, syzbot
On Wed, Aug 05, 2020 at 07:31PM +0200, Marco Elver wrote:
...
> Oh well, it seems that KCSAN on syzbot still crashes even with this
> "fix". It's harder to reproduce though, and I don't have a clear
> reproducer other than "fuzz the kernel" right now. I think the new IRQ
> state tracking code is still not compatible with KCSAN, even though we
> thought it would be. Most likely there are still ways to get recursion
> lockdep->KCSAN. An alternative would be to deal with the recursion
> like we did before, instead of trying to squash all of it. I'll try to
> investigate -- Peter, if you have ideas, help is appreciated.

Testing my hypothesis that raw then nested non-raw
local_irq_save/restore() breaks IRQ state tracking -- see the reproducer
below. This is at least 1 case I can think of that we're bound to hit.

Thanks,
-- Marco

------ >8 ------

diff --git a/init/main.c b/init/main.c
index 15bd0efff3df..0873319dcff4 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1041,6 +1041,22 @@ asmlinkage __visible void __init start_kernel(void)
sfi_init_late();
kcsan_init();

+ /* DEBUG CODE */
+ lockdep_assert_irqs_enabled(); /* Pass. */
+ {
+ unsigned long flags1;
+ raw_local_irq_save(flags1);
+ {
+ unsigned long flags2;
+ lockdep_assert_irqs_enabled(); /* Pass - expectedly blind. */
+ local_irq_save(flags2);
+ lockdep_assert_irqs_disabled(); /* Pass. */
+ local_irq_restore(flags2);
+ }
+ raw_local_irq_restore(flags1);
+ }
+ lockdep_assert_irqs_enabled(); /* FAIL! */
+
/* Do the rest non-__init'ed, we're now alive */
arch_call_rest_init();

pet...@infradead.org

unread,
Aug 6, 2020, 7:32:47 AM8/6/20
to Marco Elver, Borislav Petkov, Dave Hansen, fengh...@intel.com, H. Peter Anvin, LKML, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, Luck, Tony, the arch/x86 maintainers, yu-ch...@intel.com, jgr...@suse.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasan-dev, syzbot
On Thu, Aug 06, 2020 at 09:47:23AM +0200, Marco Elver wrote:
> Testing my hypothesis that raw then nested non-raw
> local_irq_save/restore() breaks IRQ state tracking -- see the reproducer
> below. This is at least 1 case I can think of that we're bound to hit.

Aaargh!

> diff --git a/init/main.c b/init/main.c
> index 15bd0efff3df..0873319dcff4 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -1041,6 +1041,22 @@ asmlinkage __visible void __init start_kernel(void)
> sfi_init_late();
> kcsan_init();
>
> + /* DEBUG CODE */
> + lockdep_assert_irqs_enabled(); /* Pass. */
> + {
> + unsigned long flags1;
> + raw_local_irq_save(flags1);

This disables IRQs but doesn't trace..

> + {
> + unsigned long flags2;
> + lockdep_assert_irqs_enabled(); /* Pass - expectedly blind. */

Indeed, we didn't trace the above disable, so software state is still
on.

> + local_irq_save(flags2);

So here we save IRQ state, and unconditionally disable IRQs and trace
them disabled.

> + lockdep_assert_irqs_disabled(); /* Pass. */
> + local_irq_restore(flags2);

But here, we restore IRQ state to 'disabled' and explicitly trace it
disabled *again* (which is a bit daft, but whatever).

> + }
> + raw_local_irq_restore(flags1);

This then restores the IRQ state to enable, but no tracing.

> + }
> + lockdep_assert_irqs_enabled(); /* FAIL! */

And we're out of sync... :/

/me goes ponder things...

How's something like this then?

---
include/linux/sched.h | 3 ---
kernel/kcsan/core.c | 62 ++++++++++++++++++++++++++++++++++++---------------
2 files changed, 44 insertions(+), 21 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 06ec60462af0..2f5aef57e687 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1193,9 +1193,6 @@ struct task_struct {

#ifdef CONFIG_KCSAN
struct kcsan_ctx kcsan_ctx;
-#ifdef CONFIG_TRACE_IRQFLAGS
- struct irqtrace_events kcsan_save_irqtrace;
-#endif
#endif

#ifdef CONFIG_FUNCTION_GRAPH_TRACER
diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
index 9147ff6a12e5..9c4436bf0561 100644
--- a/kernel/kcsan/core.c
+++ b/kernel/kcsan/core.c
@@ -291,17 +291,50 @@ static inline unsigned int get_delay(void)
0);
}

-void kcsan_save_irqtrace(struct task_struct *task)
+/*
+ * KCSAN hooks are everywhere, which means they're NMI like for interrupt
+ * tracing. In order to present a 'normal' as possible context to the code
+ * called by KCSAN when reporting errors we need to update the irq-tracing
+ * state.
+ *
+ * Save and restore the IRQ state trace touched by KCSAN, since KCSAN's
+ * runtime is entered for every memory access, and potentially useful
+ * information is lost if dirtied by KCSAN.
+ */
+
+struct kcsan_irq_state {
+ unsigned long flags;
+#ifdef CONFIG_TRACE_IRQFLAGS
+ int hardirqs;
+ struct irqtrace_events irqtrace;
+#endif
+};
+
+void kcsan_save_irqtrace(struct kcsan_irq_state *irq_state)
{
#ifdef CONFIG_TRACE_IRQFLAGS
- task->kcsan_save_irqtrace = task->irqtrace;
+ irq_state->irqtrace = task->irqtrace;
+ irq_state->hardirq = lockdep_hardirqs_enabled();
#endif
+ if (!kcsan_interrupt_watcher) {
+ raw_local_irq_save(irq_state->flags);
+ lockdep_hardirqs_off(CALLER_ADDR0);
+ }
}

-void kcsan_restore_irqtrace(struct task_struct *task)
+void kcsan_restore_irqtrace(struct kcsan_irq_state *irq_state)
{
+ if (!kcsan_interrupt_watcher) {
+#ifdef CONFIG_TRACE_IRQFLAGS
+ if (irq_state->hardirqs) {
+ lockdep_hardirqs_on_prepare(CALLER_ADDR0);
+ lockdep_hardirqs_on(CALLER_ADDR0);
+ }
+#endif
+ raw_local_irq_restore(irq_state->flags);
+ }
#ifdef CONFIG_TRACE_IRQFLAGS
- task->irqtrace = task->kcsan_save_irqtrace;
+ task->irqtrace = irq_state->irqtrace;
#endif
}

@@ -350,11 +383,13 @@ static noinline void kcsan_found_watchpoint(const volatile void *ptr,
flags = user_access_save();

if (consumed) {
- kcsan_save_irqtrace(current);
+ struct kcsan_irq_state irqstate;
+
+ kcsan_save_irqtrace(&irqstate);
kcsan_report(ptr, size, type, KCSAN_VALUE_CHANGE_MAYBE,
KCSAN_REPORT_CONSUMED_WATCHPOINT,
watchpoint - watchpoints);
- kcsan_restore_irqtrace(current);
+ kcsan_restore_irqtrace(&irqstate);
} else {
/*
* The other thread may not print any diagnostics, as it has
@@ -387,7 +422,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type)
unsigned long access_mask;
enum kcsan_value_change value_change = KCSAN_VALUE_CHANGE_MAYBE;
unsigned long ua_flags = user_access_save();
- unsigned long irq_flags = 0;
+ struct kcsan_irq_state irqstate;

/*
* Always reset kcsan_skip counter in slow-path to avoid underflow; see
@@ -412,14 +447,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type)
goto out;
}

- /*
- * Save and restore the IRQ state trace touched by KCSAN, since KCSAN's
- * runtime is entered for every memory access, and potentially useful
- * information is lost if dirtied by KCSAN.
- */
- kcsan_save_irqtrace(current);
- if (!kcsan_interrupt_watcher)
- local_irq_save(irq_flags);
+ kcsan_save_irqtrace(&irqstate);

watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
if (watchpoint == NULL) {
@@ -559,9 +587,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type)
remove_watchpoint(watchpoint);
kcsan_counter_dec(KCSAN_COUNTER_USED_WATCHPOINTS);
out_unlock:
- if (!kcsan_interrupt_watcher)
- local_irq_restore(irq_flags);
- kcsan_restore_irqtrace(current);
+ kcsan_restore_irqtrace(&irqstate);
out:
user_access_restore(ua_flags);
}


Marco Elver

unread,
Aug 6, 2020, 9:17:11 AM8/6/20
to pet...@infradead.org, Borislav Petkov, Dave Hansen, fengh...@intel.com, H. Peter Anvin, LKML, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, Luck, Tony, the arch/x86 maintainers, yu-ch...@intel.com, jgr...@suse.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasan-dev, syzbot
On Thu, Aug 06, 2020 at 01:32PM +0200, pet...@infradead.org wrote:
> On Thu, Aug 06, 2020 at 09:47:23AM +0200, Marco Elver wrote:
> > Testing my hypothesis that raw then nested non-raw
> > local_irq_save/restore() breaks IRQ state tracking -- see the reproducer
> > below. This is at least 1 case I can think of that we're bound to hit.
...
>
> /me goes ponder things...
>
> How's something like this then?
>
> ---
> include/linux/sched.h | 3 ---
> kernel/kcsan/core.c | 62 ++++++++++++++++++++++++++++++++++++---------------
> 2 files changed, 44 insertions(+), 21 deletions(-)

Thank you! That approach seems to pass syzbot (also with
CONFIG_PARAVIRT) and kcsan-test tests.

I had to modify it some, so that report.c's use of the restore logic
works and not mess up the IRQ trace printed on KCSAN reports (with
CONFIG_KCSAN_VERBOSE).

I still need to fully convince myself all is well now and we don't end
up with more fixes. :-) If it passes further testing, I'll send it as a
real patch (I want to add you as Co-developed-by, but would need your
Signed-off-by for the code you pasted, I think.)

Thanks,
-- Marco

------ >8 ------

diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
index 9147ff6a12e5..b1d5dca10aa5 100644
--- a/kernel/kcsan/core.c
+++ b/kernel/kcsan/core.c
@@ -4,6 +4,7 @@
#include <linux/bug.h>
#include <linux/delay.h>
#include <linux/export.h>
+#include <linux/ftrace.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/list.h>
@@ -291,13 +292,28 @@ static inline unsigned int get_delay(void)
0);
}

-void kcsan_save_irqtrace(struct task_struct *task)
-{
+/*
+ * KCSAN instrumentation is everywhere, which means we must treat the hooks
+ * NMI-like for interrupt tracing. In order to present a 'normal' as possible
+ * context to the code called by KCSAN when reporting errors we need to update
+ * the IRQ-tracing state.
+ *
+ * Save and restore the IRQ state trace touched by KCSAN, since KCSAN's
+ * runtime is entered for every memory access, and potentially useful
+ * information is lost if dirtied by KCSAN.
+ */
+
+struct kcsan_irq_state {
+ unsigned long flags;
#ifdef CONFIG_TRACE_IRQFLAGS
- task->kcsan_save_irqtrace = task->irqtrace;
+ int hardirqs;
#endif
-}
+};

+/*
+ * This is also called by the reporting task for the other task, to generate the
+ * right report with CONFIG_KCSAN_VERBOSE. No harm in restoring more than once.
+ */
void kcsan_restore_irqtrace(struct task_struct *task)
{
#ifdef CONFIG_TRACE_IRQFLAGS
@@ -305,6 +321,34 @@ void kcsan_restore_irqtrace(struct task_struct *task)
#endif
}

+static void kcsan_irq_save(struct kcsan_irq_state *irq_state) {
+#ifdef CONFIG_TRACE_IRQFLAGS
+ current->kcsan_save_irqtrace = current->irqtrace;
+ irq_state->hardirqs = lockdep_hardirqs_enabled();
+#endif
+ if (!kcsan_interrupt_watcher) {
+ raw_local_irq_save(irq_state->flags);
+ kcsan_disable_current(); /* Lockdep might WARN. */
+ lockdep_hardirqs_off(CALLER_ADDR0);
+ kcsan_enable_current();
+ }
+}
+
+static void kcsan_irq_restore(struct kcsan_irq_state *irq_state) {
+ if (!kcsan_interrupt_watcher) {
+#ifdef CONFIG_TRACE_IRQFLAGS
+ if (irq_state->hardirqs) {
+ kcsan_disable_current(); /* Lockdep might WARN. */
+ lockdep_hardirqs_on_prepare(CALLER_ADDR0);
+ lockdep_hardirqs_on(CALLER_ADDR0);
+ kcsan_enable_current();
+ }
+#endif
+ raw_local_irq_restore(irq_state->flags);
+ }
+ kcsan_restore_irqtrace(current);
+}
+
/*
* Pull everything together: check_access() below contains the performance
* critical operations; the fast-path (including check_access) functions should
@@ -350,11 +394,13 @@ static noinline void kcsan_found_watchpoint(const volatile void *ptr,
flags = user_access_save();

if (consumed) {
- kcsan_save_irqtrace(current);
+ struct kcsan_irq_state irqstate;
+
+ kcsan_irq_save(&irqstate);
kcsan_report(ptr, size, type, KCSAN_VALUE_CHANGE_MAYBE,
KCSAN_REPORT_CONSUMED_WATCHPOINT,
watchpoint - watchpoints);
- kcsan_restore_irqtrace(current);
+ kcsan_irq_restore(&irqstate);
} else {
/*
* The other thread may not print any diagnostics, as it has
@@ -387,7 +433,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type)
unsigned long access_mask;
enum kcsan_value_change value_change = KCSAN_VALUE_CHANGE_MAYBE;
unsigned long ua_flags = user_access_save();
- unsigned long irq_flags = 0;
+ struct kcsan_irq_state irqstate;

/*
* Always reset kcsan_skip counter in slow-path to avoid underflow; see
@@ -412,14 +458,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type)
goto out;
}

- /*
- * Save and restore the IRQ state trace touched by KCSAN, since KCSAN's
- * runtime is entered for every memory access, and potentially useful
- * information is lost if dirtied by KCSAN.
- */
- kcsan_save_irqtrace(current);
- if (!kcsan_interrupt_watcher)
- local_irq_save(irq_flags);
+ kcsan_irq_save(&irqstate);

watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
if (watchpoint == NULL) {
@@ -559,9 +598,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type)
remove_watchpoint(watchpoint);
kcsan_counter_dec(KCSAN_COUNTER_USED_WATCHPOINTS);
out_unlock:
- if (!kcsan_interrupt_watcher)
- local_irq_restore(irq_flags);
- kcsan_restore_irqtrace(current);
+ kcsan_irq_restore(&irqstate);
out:
user_access_restore(ua_flags);
}
diff --git a/kernel/kcsan/kcsan.h b/kernel/kcsan/kcsan.h
index 29480010dc30..6eb35a9514d8 100644
--- a/kernel/kcsan/kcsan.h
+++ b/kernel/kcsan/kcsan.h
@@ -24,9 +24,8 @@ extern unsigned int kcsan_udelay_interrupt;
extern bool kcsan_enabled;

/*
- * Save/restore IRQ flags state trace dirtied by KCSAN.
+ * Restore IRQ flags state trace dirtied by KCSAN.
*/
-void kcsan_save_irqtrace(struct task_struct *task);
void kcsan_restore_irqtrace(struct task_struct *task);

/*

Marco Elver

unread,
Aug 6, 2020, 12:07:00 PM8/6/20
to Peter Zijlstra, Borislav Petkov, Dave Hansen, fengh...@intel.com, H. Peter Anvin, LKML, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, Luck, Tony, the arch/x86 maintainers, yu-ch...@intel.com, jgr...@suse.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasan-dev, syzbot
On Thu, 6 Aug 2020 at 15:17, Marco Elver <el...@google.com> wrote:
>
> On Thu, Aug 06, 2020 at 01:32PM +0200, pet...@infradead.org wrote:
> > On Thu, Aug 06, 2020 at 09:47:23AM +0200, Marco Elver wrote:
> > > Testing my hypothesis that raw then nested non-raw
> > > local_irq_save/restore() breaks IRQ state tracking -- see the reproducer
> > > below. This is at least 1 case I can think of that we're bound to hit.
> ...
> >
> > /me goes ponder things...
> >
> > How's something like this then?
> >
> > ---
> > include/linux/sched.h | 3 ---
> > kernel/kcsan/core.c | 62 ++++++++++++++++++++++++++++++++++++---------------
> > 2 files changed, 44 insertions(+), 21 deletions(-)
>
> Thank you! That approach seems to pass syzbot (also with
> CONFIG_PARAVIRT) and kcsan-test tests.
>
> I had to modify it some, so that report.c's use of the restore logic
> works and not mess up the IRQ trace printed on KCSAN reports (with
> CONFIG_KCSAN_VERBOSE).
>
> I still need to fully convince myself all is well now and we don't end
> up with more fixes. :-) If it passes further testing, I'll send it as a
> real patch (I want to add you as Co-developed-by, but would need your
> Signed-off-by for the code you pasted, I think.)

With CONFIG_PARAVIRT=y (without the notrace->noinstr patch), I still
get lockdep DEBUG_LOCKS_WARN_ON(!lockdep_hardirqs_enabled()), although
it takes longer for syzbot to hit them. But I think that's expected
because we can still get the recursion that I pointed out, and will
need that patch.

I also get some "BUG: MAX_LOCKDEP_CHAINS too low!" on syzbot (KCSAN is
not in the stacktrace). Although it may be unrelated:
https://lore.kernel.org/lkml/00000000000056...@google.com/
-- when are they expected?

Thanks,
-- Marco

kernel test robot

unread,
Aug 6, 2020, 5:03:13 PM8/6/20
to Marco Elver, b...@alien8.de, dave....@linux.intel.com, fengh...@intel.com, h...@zytor.com, linux-...@vger.kernel.org, mi...@redhat.com, syzkall...@googlegroups.com, tg...@linutronix.de, tony...@intel.com, x...@kernel.org, kbuil...@lists.01.org
Hi Marco,

I love your patch! Perhaps something to improve:

[auto build test WARNING on tip/x86/core]
[also build test WARNING on v5.8 next-20200806]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Marco-Elver/x86-paravirt-Add-missing-noinstr-to-arch_local-helpers/20200806-040134
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git ef2ff0f5d6008d325c9a068e20981c0d0acc4d6b
config: x86_64-rhel (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce (this is a W=1 build):
# save the attached .config to linux build tree
make W=1 ARCH=x86_64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <l...@intel.com>

All warnings (new ones prefixed by >>):

In file included from arch/x86/include/asm/msr.h:257,
from arch/x86/include/asm/processor.h:22,
from include/linux/mutex.h:19,
from include/linux/notifier.h:14,
from include/linux/clk.h:14,
from drivers/opp/core.c:13:
>> arch/x86/include/asm/paravirt.h:764:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
764 | {
| ^
arch/x86/include/asm/paravirt.h:769:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
769 | {
| ^
arch/x86/include/asm/paravirt.h:774:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
774 | {
| ^
arch/x86/include/asm/paravirt.h:779:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
779 | {
| ^
arch/x86/include/asm/paravirt.h:784:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
784 | {
| ^
--
In file included from arch/x86/include/asm/msr.h:257,
from arch/x86/include/asm/processor.h:22,
from arch/x86/include/asm/cpufeature.h:5,
from arch/x86/include/asm/thread_info.h:53,
from include/linux/thread_info.h:38,
from arch/x86/include/asm/preempt.h:7,
from include/linux/preempt.h:78,
from include/linux/spinlock.h:51,
from include/linux/mmzone.h:8,
from include/linux/gfp.h:6,
from include/linux/mm.h:10,
from include/linux/scatterlist.h:8,
from include/linux/hyperv.h:18,
from drivers/hv/hyperv_vmbus.h:19,
from drivers/hv/hv_trace.c:3:
>> arch/x86/include/asm/paravirt.h:764:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
764 | {
| ^
arch/x86/include/asm/paravirt.h:769:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
769 | {
| ^
arch/x86/include/asm/paravirt.h:774:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
774 | {
| ^
arch/x86/include/asm/paravirt.h:779:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
779 | {
| ^
arch/x86/include/asm/paravirt.h:784:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
784 | {
| ^
In file included from drivers/hv/hv_trace.h:346,
from drivers/hv/hv_trace.c:6:
include/trace/define_trace.h:95:42: fatal error: ./hv_trace.h: No such file or directory
95 | #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
| ^
compilation terminated.
--
In file included from arch/x86/include/asm/msr.h:257,
from arch/x86/include/asm/processor.h:22,
from arch/x86/include/asm/cpufeature.h:5,
from arch/x86/include/asm/thread_info.h:53,
from include/linux/thread_info.h:38,
from arch/x86/include/asm/preempt.h:7,
from include/linux/preempt.h:78,
from include/linux/spinlock.h:51,
from include/linux/seqlock.h:36,
from include/linux/time.h:6,
from include/linux/jiffies.h:9,
from drivers/hv/hv_balloon.c:12:
>> arch/x86/include/asm/paravirt.h:764:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
764 | {
| ^
arch/x86/include/asm/paravirt.h:769:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
769 | {
| ^
arch/x86/include/asm/paravirt.h:774:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
774 | {
| ^
arch/x86/include/asm/paravirt.h:779:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
779 | {
| ^
arch/x86/include/asm/paravirt.h:784:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
784 | {
| ^
In file included from drivers/hv/hv_trace_balloon.h:48,
from drivers/hv/hv_balloon.c:31:
include/trace/define_trace.h:95:42: fatal error: ./hv_trace_balloon.h: No such file or directory
95 | #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
| ^
compilation terminated.
--
In file included from arch/x86/include/asm/msr.h:257,
from arch/x86/include/asm/processor.h:22,
from arch/x86/include/asm/cpufeature.h:5,
from arch/x86/include/asm/thread_info.h:53,
from include/linux/thread_info.h:38,
from arch/x86/include/asm/preempt.h:7,
from include/linux/preempt.h:78,
from include/linux/spinlock.h:51,
from include/linux/seqlock.h:36,
from include/linux/time.h:6,
from include/linux/input.h:8,
from drivers/platform/x86/msi-wmi.c:13:
>> arch/x86/include/asm/paravirt.h:764:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
764 | {
| ^
arch/x86/include/asm/paravirt.h:769:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
769 | {
| ^
arch/x86/include/asm/paravirt.h:774:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
774 | {
| ^
arch/x86/include/asm/paravirt.h:779:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
779 | {
| ^
arch/x86/include/asm/paravirt.h:784:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
784 | {
| ^
drivers/platform/x86/msi-wmi.c: In function 'msi_wmi_query_block':
drivers/platform/x86/msi-wmi.c:93:14: warning: variable 'status' set but not used [-Wunused-but-set-variable]
93 | acpi_status status;
| ^~~~~~
--
In file included from arch/x86/include/asm/msr.h:257,
from arch/x86/include/asm/processor.h:22,
from arch/x86/include/asm/cpufeature.h:5,
from arch/x86/include/asm/thread_info.h:53,
from include/linux/thread_info.h:38,
from arch/x86/include/asm/preempt.h:7,
from include/linux/preempt.h:78,
from include/linux/spinlock.h:51,
from include/linux/seqlock.h:36,
from include/linux/time.h:6,
from include/linux/stat.h:19,
from include/linux/module.h:13,
from drivers/platform/x86/panasonic-laptop.c:107:
>> arch/x86/include/asm/paravirt.h:764:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
764 | {
| ^
arch/x86/include/asm/paravirt.h:769:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
769 | {
| ^
arch/x86/include/asm/paravirt.h:774:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
774 | {
| ^
arch/x86/include/asm/paravirt.h:779:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
779 | {
| ^
arch/x86/include/asm/paravirt.h:784:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
784 | {
| ^
drivers/platform/x86/panasonic-laptop.c: In function 'acpi_pcc_retrieve_biosdata':
drivers/platform/x86/panasonic-laptop.c:290:35: warning: suggest braces around empty body in an 'else' statement [-Wempty-body]
290 | "Invalid HKEY.SINF data\n"));
| ^
drivers/platform/x86/panasonic-laptop.c: In function 'acpi_pcc_generate_keyinput':
drivers/platform/x86/panasonic-laptop.c:462:45: warning: suggest braces around empty body in an 'if' statement [-Wempty-body]
462 | "Unknown hotkey event: %d\n", result));
| ^
--
In file included from arch/x86/include/asm/msr.h:257,
from arch/x86/include/asm/processor.h:22,
from arch/x86/include/asm/cpufeature.h:5,
from arch/x86/include/asm/thread_info.h:53,
from include/linux/thread_info.h:38,
from arch/x86/include/asm/preempt.h:7,
from include/linux/preempt.h:78,
from include/linux/spinlock.h:51,
from include/linux/wait.h:9,
from include/linux/wait_bit.h:8,
from include/linux/fs.h:6,
from include/linux/debugfs.h:15,
from drivers/platform/x86/intel_ips.c:48:
>> arch/x86/include/asm/paravirt.h:764:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
764 | {
| ^
arch/x86/include/asm/paravirt.h:769:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
769 | {
| ^
arch/x86/include/asm/paravirt.h:774:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
774 | {
| ^
arch/x86/include/asm/paravirt.h:779:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
779 | {
| ^
arch/x86/include/asm/paravirt.h:784:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
784 | {
| ^
drivers/platform/x86/intel_ips.c: In function 'read_mgtv':
drivers/platform/x86/intel_ips.c:832:6: warning: variable 'ret' set but not used [-Wunused-but-set-variable]
832 | u16 ret;
| ^~~
--
In file included from arch/x86/include/asm/msr.h:257,
from arch/x86/include/asm/processor.h:22,
from arch/x86/include/asm/cpufeature.h:5,
from arch/x86/include/asm/thread_info.h:53,
from include/linux/thread_info.h:38,
from arch/x86/include/asm/preempt.h:7,
from include/linux/preempt.h:78,
from include/linux/percpu.h:6,
from include/linux/context_tracking_state.h:5,
from include/linux/hardirq.h:5,
from include/linux/interrupt.h:11,
from drivers/block/rsxx/core.c:13:
>> arch/x86/include/asm/paravirt.h:764:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
764 | {
| ^
arch/x86/include/asm/paravirt.h:769:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
769 | {
| ^
arch/x86/include/asm/paravirt.h:774:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
774 | {
| ^
arch/x86/include/asm/paravirt.h:779:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
779 | {
| ^
arch/x86/include/asm/paravirt.h:784:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
784 | {
| ^
drivers/block/rsxx/core.c:393:8: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
393 | static const char * const rsxx_card_state_to_str(unsigned int state)
| ^~~~~
--
In file included from arch/x86/include/asm/msr.h:257,
from arch/x86/include/asm/processor.h:22,
from arch/x86/include/asm/cpufeature.h:5,
from arch/x86/include/asm/thread_info.h:53,
from include/linux/thread_info.h:38,
from arch/x86/include/asm/preempt.h:7,
from include/linux/preempt.h:78,
from include/linux/spinlock.h:51,
from include/linux/seqlock.h:36,
from include/linux/time.h:6,
from include/linux/stat.h:19,
from include/linux/module.h:13,
from drivers/misc/sgi-xp/xp_main.c:17:
>> arch/x86/include/asm/paravirt.h:764:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
764 | {
| ^
arch/x86/include/asm/paravirt.h:769:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
769 | {
| ^
arch/x86/include/asm/paravirt.h:774:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
774 | {
| ^
arch/x86/include/asm/paravirt.h:779:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
779 | {
| ^
arch/x86/include/asm/paravirt.h:784:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
784 | {
| ^
drivers/misc/sgi-xp/xp_main.c:227:1: warning: no previous prototype for 'xp_init' [-Wmissing-prototypes]
227 | xp_init(void)
| ^~~~~~~
drivers/misc/sgi-xp/xp_main.c:250:1: warning: no previous prototype for 'xp_exit' [-Wmissing-prototypes]
250 | xp_exit(void)
| ^~~~~~~
--
In file included from arch/x86/include/asm/msr.h:257,
from arch/x86/include/asm/processor.h:22,
from arch/x86/include/asm/cpufeature.h:5,
from arch/x86/include/asm/thread_info.h:53,
from include/linux/thread_info.h:38,
from arch/x86/include/asm/preempt.h:7,
from include/linux/preempt.h:78,
from include/linux/spinlock.h:51,
from drivers/misc/sgi-gru/grufault.c:16:
>> arch/x86/include/asm/paravirt.h:764:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
764 | {
| ^
arch/x86/include/asm/paravirt.h:769:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
769 | {
| ^
arch/x86/include/asm/paravirt.h:774:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
774 | {
| ^
arch/x86/include/asm/paravirt.h:779:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
779 | {
| ^
arch/x86/include/asm/paravirt.h:784:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
784 | {
| ^
drivers/misc/sgi-gru/grufault.c: In function 'gru_try_dropin':
drivers/misc/sgi-gru/grufault.c:361:54: warning: variable 'indexway' set but not used [-Wunused-but-set-variable]
361 | int pageshift = 0, asid, write, ret, atomic = !cbk, indexway;
| ^~~~~~~~
--
In file included from arch/x86/include/asm/msr.h:257,
from arch/x86/include/asm/processor.h:22,
from arch/x86/include/asm/cpufeature.h:5,
from arch/x86/include/asm/thread_info.h:53,
from include/linux/thread_info.h:38,
from arch/x86/include/asm/preempt.h:7,
from include/linux/preempt.h:78,
from include/linux/spinlock.h:51,
from include/linux/seqlock.h:36,
from include/linux/time.h:6,
from include/linux/stat.h:19,
from include/linux/module.h:13,
from drivers/cpufreq/acpi-cpufreq.c:14:
>> arch/x86/include/asm/paravirt.h:764:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
764 | {
| ^
arch/x86/include/asm/paravirt.h:769:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
769 | {
| ^
arch/x86/include/asm/paravirt.h:774:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
774 | {
| ^
arch/x86/include/asm/paravirt.h:779:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
779 | {
| ^
arch/x86/include/asm/paravirt.h:784:1: warning: ignoring attribute 'noinline' because it conflicts with attribute 'gnu_inline' [-Wattributes]
784 | {
| ^
drivers/cpufreq/acpi-cpufreq.c: In function 'cpu_freq_read_intel':
drivers/cpufreq/acpi-cpufreq.c:247:11: warning: variable 'dummy' set but not used [-Wunused-but-set-variable]
247 | u32 val, dummy;
| ^~~~~
drivers/cpufreq/acpi-cpufreq.c: In function 'cpu_freq_read_amd':
drivers/cpufreq/acpi-cpufreq.c:264:11: warning: variable 'dummy' set but not used [-Wunused-but-set-variable]
264 | u32 val, dummy;
| ^~~~~
..

vim +764 arch/x86/include/asm/paravirt.h

2e47d3e6c35bb5b include/asm-x86/paravirt.h Glauber de Oliveira Costa 2008-01-30 724
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 725 /*
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 726 * Generate a thunk around a function which saves all caller-save
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 727 * registers except for the return value. This allows C functions to
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 728 * be called from assembler code where fewer than normal registers are
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 729 * available. It may also help code generation around calls from C
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 730 * code if the common case doesn't use many registers.
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 731 *
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 732 * When a callee is wrapped in a thunk, the caller can assume that all
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 733 * arg regs and all scratch registers are preserved across the
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 734 * call. The return value in rax/eax will not be saved, even for void
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 735 * functions.
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 736 */
87b240cbe3e51bf arch/x86/include/asm/paravirt.h Josh Poimboeuf 2016-01-21 737 #define PV_THUNK_NAME(func) "__raw_callee_save_" #func
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 738 #define PV_CALLEE_SAVE_REGS_THUNK(func) \
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 739 extern typeof(func) __raw_callee_save_##func; \
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 740 \
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 741 asm(".pushsection .text;" \
87b240cbe3e51bf arch/x86/include/asm/paravirt.h Josh Poimboeuf 2016-01-21 742 ".globl " PV_THUNK_NAME(func) ";" \
87b240cbe3e51bf arch/x86/include/asm/paravirt.h Josh Poimboeuf 2016-01-21 743 ".type " PV_THUNK_NAME(func) ", @function;" \
87b240cbe3e51bf arch/x86/include/asm/paravirt.h Josh Poimboeuf 2016-01-21 744 PV_THUNK_NAME(func) ":" \
87b240cbe3e51bf arch/x86/include/asm/paravirt.h Josh Poimboeuf 2016-01-21 745 FRAME_BEGIN \
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 746 PV_SAVE_ALL_CALLER_REGS \
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 747 "call " #func ";" \
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 748 PV_RESTORE_ALL_CALLER_REGS \
87b240cbe3e51bf arch/x86/include/asm/paravirt.h Josh Poimboeuf 2016-01-21 749 FRAME_END \
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 750 "ret;" \
083db6764821996 arch/x86/include/asm/paravirt.h Josh Poimboeuf 2019-07-17 751 ".size " PV_THUNK_NAME(func) ", .-" PV_THUNK_NAME(func) ";" \
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 752 ".popsection")
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 753
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 754 /* Get a reference to a callee-save function */
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 755 #define PV_CALLEE_SAVE(func) \
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 756 ((struct paravirt_callee_save) { __raw_callee_save_##func })
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 757
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 758 /* Promise that "func" already uses the right calling convention */
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 759 #define __PV_IS_CALLEE_SAVE(func) \
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 760 ((struct paravirt_callee_save) { func })
ecb93d1ccd0aac6 arch/x86/include/asm/paravirt.h Jeremy Fitzhardinge 2009-01-28 761
6da63eb241a05b0 arch/x86/include/asm/paravirt.h Juergen Gross 2018-08-28 762 #ifdef CONFIG_PARAVIRT_XXL
688d9af14814db8 arch/x86/include/asm/paravirt.h Marco Elver 2020-08-05 763 static inline noinstr unsigned long arch_local_save_flags(void)
139ec7c416248b9 include/asm-i386/paravirt.h Rusty Russell 2006-12-07 @764 {
5c83511bdb9832c arch/x86/include/asm/paravirt.h Juergen Gross 2018-08-28 765 return PVOP_CALLEE0(unsigned long, irq.save_fl);
139ec7c416248b9 include/asm-i386/paravirt.h Rusty Russell 2006-12-07 766 }
139ec7c416248b9 include/asm-i386/paravirt.h Rusty Russell 2006-12-07 767

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuil...@lists.01.org
.config.gz

Marco Elver

unread,
Aug 7, 2020, 5:01:42 AM8/7/20
to Peter Zijlstra, Borislav Petkov, Dave Hansen, fengh...@intel.com, H. Peter Anvin, LKML, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, Luck, Tony, the arch/x86 maintainers, yu-ch...@intel.com, jgr...@suse.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasan-dev, syzbot, Paul E. McKenney
On Thu, 6 Aug 2020 at 18:06, Marco Elver <el...@google.com> wrote:
> On Thu, 6 Aug 2020 at 15:17, Marco Elver <el...@google.com> wrote:
> > On Thu, Aug 06, 2020 at 01:32PM +0200, pet...@infradead.org wrote:
> > > On Thu, Aug 06, 2020 at 09:47:23AM +0200, Marco Elver wrote:
> > > > Testing my hypothesis that raw then nested non-raw
> > > > local_irq_save/restore() breaks IRQ state tracking -- see the reproducer
> > > > below. This is at least 1 case I can think of that we're bound to hit.
> > ...
> > >
> > > /me goes ponder things...
> > >
> > > How's something like this then?
> > >
> > > ---
> > > include/linux/sched.h | 3 ---
> > > kernel/kcsan/core.c | 62 ++++++++++++++++++++++++++++++++++++---------------
> > > 2 files changed, 44 insertions(+), 21 deletions(-)
> >
> > Thank you! That approach seems to pass syzbot (also with
> > CONFIG_PARAVIRT) and kcsan-test tests.
> >
> > I had to modify it some, so that report.c's use of the restore logic
> > works and not mess up the IRQ trace printed on KCSAN reports (with
> > CONFIG_KCSAN_VERBOSE).
> >
> > I still need to fully convince myself all is well now and we don't end
> > up with more fixes. :-) If it passes further testing, I'll send it as a
> > real patch (I want to add you as Co-developed-by, but would need your
> > Signed-off-by for the code you pasted, I think.)

I let it run on syzbot through the night, and it's fine without
PARAVIRT (see below). I have sent the patch (need your Signed-off-by
as it's based on your code, thank you!):
https://lkml.kernel.org/r/20200807090031....@google.com

> With CONFIG_PARAVIRT=y (without the notrace->noinstr patch), I still
> get lockdep DEBUG_LOCKS_WARN_ON(!lockdep_hardirqs_enabled()), although
> it takes longer for syzbot to hit them. But I think that's expected
> because we can still get the recursion that I pointed out, and will
> need that patch.

Never mind, I get these warnings even if I don't turn on KCSAN
(CONFIG_KCSAN=n). Something else is going on with PARAVIRT=y that
throws off IRQ state tracking. :-/

Thanks,
-- Marco

Jürgen Groß

unread,
Aug 7, 2020, 5:24:28 AM8/7/20
to Marco Elver, Peter Zijlstra, Borislav Petkov, Dave Hansen, fengh...@intel.com, H. Peter Anvin, LKML, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, Luck, Tony, the arch/x86 maintainers, yu-ch...@intel.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasan-dev, syzbot, Paul E. McKenney
What are the settings of CONFIG_PARAVIRT_XXL and
CONFIG_PARAVIRT_SPINLOCKS in this case?


Juergen

Marco Elver

unread,
Aug 7, 2020, 5:50:42 AM8/7/20
to Jürgen Groß, Peter Zijlstra, Borislav Petkov, Dave Hansen, fengh...@intel.com, H. Peter Anvin, LKML, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, Luck, Tony, the arch/x86 maintainers, yu-ch...@intel.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasan-dev, syzbot, Paul E. McKenney
I attached a config.

$> grep PARAVIRT .config
CONFIG_PARAVIRT=y
CONFIG_PARAVIRT_XXL=y
# CONFIG_PARAVIRT_DEBUG is not set
CONFIG_PARAVIRT_SPINLOCKS=y
# CONFIG_PARAVIRT_TIME_ACCOUNTING is not set
CONFIG_PARAVIRT_CLOCK=y

Thanks,
-- Marco
.config

Jürgen Groß

unread,
Aug 7, 2020, 6:35:23 AM8/7/20
to Marco Elver, Peter Zijlstra, Borislav Petkov, Dave Hansen, fengh...@intel.com, H. Peter Anvin, LKML, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, Luck, Tony, the arch/x86 maintainers, yu-ch...@intel.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasan-dev, syzbot, Paul E. McKenney
Anything special I need to do to reproduce the problem? Or would you be
willing to do some more rounds with different config settings?

I think CONFIG_PARAVIRT_XXL shouldn't matter, but I'm not completely
sure about that. CONFIG_PARAVIRT_SPINLOCKS would be my primary suspect.


Juergen

Marco Elver

unread,
Aug 7, 2020, 7:38:47 AM8/7/20
to Jürgen Groß, Peter Zijlstra, Borislav Petkov, Dave Hansen, fengh...@intel.com, H. Peter Anvin, LKML, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, Luck, Tony, the arch/x86 maintainers, yu-ch...@intel.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasan-dev, syzbot, Paul E. McKenney
I can only test it with syzkaller, but that probably doesn't help if you
don't already have it set up. It can't seem to find a C reproducer.

I did some more rounds with different configs.

> I think CONFIG_PARAVIRT_XXL shouldn't matter, but I'm not completely
> sure about that. CONFIG_PARAVIRT_SPINLOCKS would be my primary suspect.

Yes, PARAVIRT_XXL doesn't make a different. When disabling
PARAVIRT_SPINLOCKS, however, the warnings go away.

Thanks,
-- Marco

Jürgen Groß

unread,
Aug 7, 2020, 8:04:24 AM8/7/20
to Marco Elver, Peter Zijlstra, Borislav Petkov, Dave Hansen, fengh...@intel.com, H. Peter Anvin, LKML, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, Luck, Tony, the arch/x86 maintainers, yu-ch...@intel.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasan-dev, syzbot, Paul E. McKenney
Thanks for testing!

I take it you are doing the tests in a KVM guest?

If so I have a gut feeling that the use of local_irq_save() and
local_irq_restore() in kvm_wait() might be fishy. I might be completely
wrong here, though.

BTW, I think Xen's variant of pv spinlocks is fine (no playing with IRQ
on/off).

Hyper-V seems to do the same as KVM, and kicking another vcpu could be
problematic as well, as it is just using IPI.


Juergen

Marco Elver

unread,
Aug 7, 2020, 8:08:25 AM8/7/20
to Jürgen Groß, Peter Zijlstra, Borislav Petkov, Dave Hansen, fengh...@intel.com, H. Peter Anvin, LKML, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, Luck, Tony, the arch/x86 maintainers, yu-ch...@intel.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasan-dev, syzbot, Paul E. McKenney
Yes, correct.

> If so I have a gut feeling that the use of local_irq_save() and
> local_irq_restore() in kvm_wait() might be fishy. I might be completely
> wrong here, though.

Happy to help debug more, although I might need patches or pointers
what to play with.

Marco Elver

unread,
Aug 7, 2020, 11:19:11 AM8/7/20
to Jürgen Groß, Peter Zijlstra, Borislav Petkov, Dave Hansen, fengh...@intel.com, H. Peter Anvin, LKML, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, Luck, Tony, the arch/x86 maintainers, yu-ch...@intel.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasan-dev, syzbot, Paul E. McKenney
I experimented a bit more, and the below patch seems to solve the
warnings. However, that was based on your pointer about kvm_wait(), and
I can't quite tell if it is the right solution.

My hypothesis here is simply that kvm_wait() may be called in a place
where we get the same case I mentioned to Peter,

raw_local_irq_save(); /* or other IRQs off without tracing */
...
kvm_wait() /* IRQ state tracing gets confused */
...
raw_local_irq_restore();

and therefore, using raw variants in kvm_wait() works. It's also safe
because it doesn't call any other libraries that would result in corrupt
IRQ state AFAIK.

Thanks,
-- Marco

------ >8 ------

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 233c77d056c9..1d412d1466f0 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -797,7 +797,7 @@ static void kvm_wait(u8 *ptr, u8 val)
if (in_nmi())
return;

- local_irq_save(flags);
+ raw_local_irq_save(flags);

if (READ_ONCE(*ptr) != val)
goto out;
@@ -810,10 +810,10 @@ static void kvm_wait(u8 *ptr, u8 val)
if (arch_irqs_disabled_flags(flags))
halt();
else
- safe_halt();
+ raw_safe_halt();

out:
- local_irq_restore(flags);
+ raw_local_irq_restore(flags);
}

#ifdef CONFIG_X86_32

Marco Elver

unread,
Aug 11, 2020, 3:00:45 AM8/11/20
to Jürgen Groß, Peter Zijlstra, Borislav Petkov, Dave Hansen, fengh...@intel.com, H. Peter Anvin, LKML, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, Luck, Tony, the arch/x86 maintainers, yu-ch...@intel.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasan-dev, syzbot, Paul E. McKenney
On Fri, 7 Aug 2020 at 17:19, Marco Elver <el...@google.com> wrote:
> On Fri, Aug 07, 2020 at 02:08PM +0200, Marco Elver wrote:
> > On Fri, 7 Aug 2020 at 14:04, Jürgen Groß <jgr...@suse.com> wrote:
> > >
> > > On 07.08.20 13:38, Marco Elver wrote:
> > > > On Fri, Aug 07, 2020 at 12:35PM +0200, Jürgen Groß wrote:
...
Just to follow-up, it'd still be nice to fix this. Suggestions?

I could send the below as a patch, but can only go off my above
hypothesis and the fact that syzbot is happier, so not entirely
convincing.

Jürgen Groß

unread,
Aug 11, 2020, 3:04:50 AM8/11/20
to Marco Elver, Peter Zijlstra, Borislav Petkov, Dave Hansen, fengh...@intel.com, H. Peter Anvin, LKML, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, Luck, Tony, the arch/x86 maintainers, yu-ch...@intel.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasan-dev, syzbot, Paul E. McKenney, Wei Liu
Peter has told me via IRC he will look soon further into this.

Your finding suggests that the pv-lock implementation for Hyper-V
needs some tweaking, too. For that purpose I'm adding Wei to Cc.


Juergen

Peter Zijlstra

unread,
Aug 11, 2020, 3:41:39 AM8/11/20
to Marco Elver, Jürgen Groß, Borislav Petkov, Dave Hansen, fengh...@intel.com, H. Peter Anvin, LKML, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, Luck, Tony, the arch/x86 maintainers, yu-ch...@intel.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasan-dev, syzbot, Paul E. McKenney
On Fri, Aug 07, 2020 at 05:19:03PM +0200, Marco Elver wrote:

> My hypothesis here is simply that kvm_wait() may be called in a place
> where we get the same case I mentioned to Peter,
>
> raw_local_irq_save(); /* or other IRQs off without tracing */
> ...
> kvm_wait() /* IRQ state tracing gets confused */
> ...
> raw_local_irq_restore();
>
> and therefore, using raw variants in kvm_wait() works. It's also safe
> because it doesn't call any other libraries that would result in corrupt

Yes, this is definitely an issue.

Tracing, we also musn't call into tracing when using raw_local_irq_*().
Because then we re-intoduce this same issue all over again.

Both halt() and safe_halt() are more paravirt calls, but given we're in
a KVM paravirt call already, I suppose we can directly use native_*()
here.

Something like so then... I suppose, but then the Xen variants need TLC
too.

---
arch/x86/include/asm/irqflags.h | 4 ++--
arch/x86/include/asm/kvm_para.h | 18 +++++++++---------
arch/x86/kernel/kvm.c | 14 +++++++-------
3 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 02a0cf547d7b..7c614db25274 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -54,13 +54,13 @@ static __always_inline void native_irq_enable(void)
asm volatile("sti": : :"memory");
}

-static inline __cpuidle void native_safe_halt(void)
+static __always_inline __cpuidle void native_safe_halt(void)
{
mds_idle_clear_cpu_buffers();
asm volatile("sti; hlt": : :"memory");
}

-static inline __cpuidle void native_halt(void)
+static __always_inline __cpuidle void native_halt(void)
{
mds_idle_clear_cpu_buffers();
asm volatile("hlt": : :"memory");
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 49d3a9edb06f..90f7ea58ebb0 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -30,7 +30,7 @@ static inline bool kvm_check_and_clear_guest_paused(void)
* noted by the particular hypercall.
*/

-static inline long kvm_hypercall0(unsigned int nr)
+static __always_inline long kvm_hypercall0(unsigned int nr)
{
long ret;
asm volatile(KVM_HYPERCALL
@@ -40,7 +40,7 @@ static inline long kvm_hypercall0(unsigned int nr)
return ret;
}

-static inline long kvm_hypercall1(unsigned int nr, unsigned long p1)
+static __always_inline long kvm_hypercall1(unsigned int nr, unsigned long p1)
{
long ret;
asm volatile(KVM_HYPERCALL
@@ -50,8 +50,8 @@ static inline long kvm_hypercall1(unsigned int nr, unsigned long p1)
return ret;
}

-static inline long kvm_hypercall2(unsigned int nr, unsigned long p1,
- unsigned long p2)
+static __always_inline long kvm_hypercall2(unsigned int nr, unsigned long p1,
+ unsigned long p2)
{
long ret;
asm volatile(KVM_HYPERCALL
@@ -61,8 +61,8 @@ static inline long kvm_hypercall2(unsigned int nr, unsigned long p1,
return ret;
}

-static inline long kvm_hypercall3(unsigned int nr, unsigned long p1,
- unsigned long p2, unsigned long p3)
+static __always_inline long kvm_hypercall3(unsigned int nr, unsigned long p1,
+ unsigned long p2, unsigned long p3)
{
long ret;
asm volatile(KVM_HYPERCALL
@@ -72,9 +72,9 @@ static inline long kvm_hypercall3(unsigned int nr, unsigned long p1,
return ret;
}

-static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
- unsigned long p2, unsigned long p3,
- unsigned long p4)
+static __always_inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
+ unsigned long p2, unsigned long p3,
+ unsigned long p4)
{
long ret;
asm volatile(KVM_HYPERCALL
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 233c77d056c9..15f8dfd8812d 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -779,7 +779,7 @@ arch_initcall(kvm_alloc_cpumask);
#ifdef CONFIG_PARAVIRT_SPINLOCKS

/* Kick a cpu by its apicid. Used to wake up a halted vcpu */
-static void kvm_kick_cpu(int cpu)
+static notrace kvm_kick_cpu(int cpu)
{
int apicid;
unsigned long flags = 0;
@@ -790,14 +790,14 @@ static void kvm_kick_cpu(int cpu)

#include <asm/qspinlock.h>

-static void kvm_wait(u8 *ptr, u8 val)
+static notrace kvm_wait(u8 *ptr, u8 val)
{
unsigned long flags;

if (in_nmi())
return;

- local_irq_save(flags);
+ raw_local_irq_save(flags);

if (READ_ONCE(*ptr) != val)
goto out;
@@ -808,16 +808,16 @@ static void kvm_wait(u8 *ptr, u8 val)
* in irq spinlock slowpath and no spurious interrupt occur to save us.
*/
if (arch_irqs_disabled_flags(flags))
- halt();
+ native_halt();
else
- safe_halt();
+ native_safe_halt();

out:
- local_irq_restore(flags);
+ raw_local_irq_restore(flags);
}

#ifdef CONFIG_X86_32
-__visible bool __kvm_vcpu_is_preempted(long cpu)
+__visible notrace bool __kvm_vcpu_is_preempted(long cpu)
{
struct kvm_steal_time *src = &per_cpu(steal_time, cpu);


Jürgen Groß

unread,
Aug 11, 2020, 3:57:57 AM8/11/20
to Peter Zijlstra, Marco Elver, Borislav Petkov, Dave Hansen, fengh...@intel.com, H. Peter Anvin, LKML, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, Luck, Tony, the arch/x86 maintainers, yu-ch...@intel.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasan-dev, syzbot, Paul E. McKenney, Wei Liu
On 11.08.20 09:41, Peter Zijlstra wrote:
> On Fri, Aug 07, 2020 at 05:19:03PM +0200, Marco Elver wrote:
>
>> My hypothesis here is simply that kvm_wait() may be called in a place
>> where we get the same case I mentioned to Peter,
>>
>> raw_local_irq_save(); /* or other IRQs off without tracing */
>> ...
>> kvm_wait() /* IRQ state tracing gets confused */
>> ...
>> raw_local_irq_restore();
>>
>> and therefore, using raw variants in kvm_wait() works. It's also safe
>> because it doesn't call any other libraries that would result in corrupt
>
> Yes, this is definitely an issue.
>
> Tracing, we also musn't call into tracing when using raw_local_irq_*().
> Because then we re-intoduce this same issue all over again.
>
> Both halt() and safe_halt() are more paravirt calls, but given we're in
> a KVM paravirt call already, I suppose we can directly use native_*()
> here.
>
> Something like so then... I suppose, but then the Xen variants need TLC
> too.

Just to be sure I understand you correct:

You mean that xen_qlock_kick() and xen_qlock_wait() and all functions
called by those should gain the "notrace" attribute, right?

I am not sure why the kick variants need it, though. IMO those are
called only after the lock has been released, so they should be fine
without notrace.

And again: we shouldn't forget the Hyper-V variants.


Juergen

Peter Zijlstra

unread,
Aug 11, 2020, 4:12:12 AM8/11/20
to Jürgen Groß, Marco Elver, Borislav Petkov, Dave Hansen, fengh...@intel.com, H. Peter Anvin, LKML, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, Luck, Tony, the arch/x86 maintainers, yu-ch...@intel.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasan-dev, syzbot, Paul E. McKenney, Wei Liu
The issue happens when someone uses arch_spinlock_t under
raw_local_irq_*().

> And again: we shouldn't forget the Hyper-V variants.

Bah, my grep failed :/ Also *groan*, that's calling apic->send_IPI().


Jürgen Groß

unread,
Aug 11, 2020, 4:18:53 AM8/11/20
to Peter Zijlstra, Marco Elver, Borislav Petkov, Dave Hansen, fengh...@intel.com, H. Peter Anvin, LKML, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, Luck, Tony, the arch/x86 maintainers, yu-ch...@intel.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasan-dev, syzbot, Paul E. McKenney, Wei Liu
Ah, okay.

>
>> And again: we shouldn't forget the Hyper-V variants.
>
> Bah, my grep failed :/ Also *groan*, that's calling apic->send_IPI().

I've seen that, too. :-(


Juergen

Jürgen Groß

unread,
Aug 11, 2020, 4:38:52 AM8/11/20
to Peter Zijlstra, Marco Elver, Borislav Petkov, Dave Hansen, fengh...@intel.com, H. Peter Anvin, LKML, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, Luck, Tony, the arch/x86 maintainers, yu-ch...@intel.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasan-dev, syzbot, Paul E. McKenney, Wei Liu
On 11.08.20 10:12, Peter Zijlstra wrote:
In case you don't want to do it I can send the patch for the Xen
variants.


Juergen

pet...@infradead.org

unread,
Aug 11, 2020, 5:21:02 AM8/11/20
to Jürgen Groß, Marco Elver, Borislav Petkov, Dave Hansen, fengh...@intel.com, H. Peter Anvin, LKML, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, Luck, Tony, the arch/x86 maintainers, yu-ch...@intel.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasan-dev, syzbot, Paul E. McKenney, Wei Liu
On Tue, Aug 11, 2020 at 10:38:50AM +0200, Jürgen Groß wrote:
> In case you don't want to do it I can send the patch for the Xen
> variants.

I might've opened a whole new can of worms here. I'm not sure we
can/want to fix the entire fallout this release :/

Let me ponder this a little, because the more I look at things, the more
problems I keep finding... bah bah bah.

pet...@infradead.org

unread,
Aug 11, 2020, 5:47:02 AM8/11/20
to Jürgen Groß, Marco Elver, Borislav Petkov, Dave Hansen, fengh...@intel.com, H. Peter Anvin, LKML, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, Luck, Tony, the arch/x86 maintainers, yu-ch...@intel.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasan-dev, syzbot, Paul E. McKenney, Wei Liu, Steven Rostedt
That is, most of these irq-tracking problem are new because commit:

859d069ee1dd ("lockdep: Prepare for NMI IRQ state tracking")

changed irq-tracking to ignore the lockdep recursion count.

This then allows:

lock_acquire()
raw_local_irq_save();
current->lockdep_recursion++;
trace_lock_acquire()
... tracing ...
#PF under raw_local_irq_*()

__lock_acquire()
arch_spin_lock(&graph_lock)
pv-spinlock-wait()
local_irq_save() under raw_local_irq_*()

However afaict that just made a bad situation worse. There already were
issues, take for example:

trace_clock_global()
raw_local_irq_save();
arch_spin_lock()
pv-spinlock-wait
local_irq_save()

And that has no lockdep_recursion to 'save' the say.

The tracing recursion does however avoid some of the obvious fails
there, like trace_clock calling into paravirt which then calls back into
tracing. But still, that would've caused IRQ tracking problems even with
the old code.

And in that respect, this is all the exact same problem as that other
set of patches has ( 202008071923...@infradead.org ).

Now, on the flip side, it does find actual problems, the trace_lock_*()
things were using RCU in RCU-disabled code, and here I found that
trace_clock_global() thinkg (and I suspect there's more of that).

But at this point I'm not entirelty sure how best to proceed... tracing
uses arch_spinlock_t, which means all spinlock implementations should be
notrace, but then that drops into paravirt and all hell breaks loose
because Hyper-V then calls into the APIC code etc.. etc..

At that rate we'll have the entire kernel marked notrace, and I'm fairly
sure that's not a solution either.

So let me once again see if I can't find a better solution for this all.
Clearly it needs one :/

pet...@infradead.org

unread,
Aug 11, 2020, 4:18:11 PM8/11/20
to Jürgen Groß, Marco Elver, Borislav Petkov, Dave Hansen, fengh...@intel.com, H. Peter Anvin, LKML, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, Luck, Tony, the arch/x86 maintainers, yu-ch...@intel.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasan-dev, syzbot, Paul E. McKenney, Wei Liu, Steven Rostedt
On Tue, Aug 11, 2020 at 11:46:51AM +0200, pet...@infradead.org wrote:

> So let me once again see if I can't find a better solution for this all.
> Clearly it needs one :/

So the below boots without triggering the debug code from Marco -- it
should allow nesting local_irq_save/restore under raw_local_irq_*().

I tried unconditional counting, but there's some _reallly_ wonky /
asymmetric code that wrecks that and I've not been able to come up with
anything useful.

This one starts counting when local_irq_save() finds it didn't disable
IRQs while lockdep though it did. At that point, local_irq_restore()
will decrement and enable things again when it reaches 0.

This assumes local_irq_save()/local_irq_restore() are nested sane, which
is mostly true.

This leaves #PF, which I fixed in these other patches, but I realized it
needs fixing for all architectures :-( No bright ideas there yet.

---
arch/x86/entry/thunk_32.S | 5 ----
include/linux/irqflags.h | 45 +++++++++++++++++++-------------
init/main.c | 16 ++++++++++++
kernel/locking/lockdep.c | 58 +++++++++++++++++++++++++++++++++++++++++
kernel/trace/trace_preemptirq.c | 33 +++++++++++++++++++++++
5 files changed, 134 insertions(+), 23 deletions(-)

diff --git a/arch/x86/entry/thunk_32.S b/arch/x86/entry/thunk_32.S
index 3a07ce3ec70b..f1f96d4d8cd6 100644
--- a/arch/x86/entry/thunk_32.S
+++ b/arch/x86/entry/thunk_32.S
@@ -29,11 +29,6 @@ SYM_CODE_START_NOALIGN(\name)
SYM_CODE_END(\name)
.endm

-#ifdef CONFIG_TRACE_IRQFLAGS
- THUNK trace_hardirqs_on_thunk,trace_hardirqs_on_caller,1
- THUNK trace_hardirqs_off_thunk,trace_hardirqs_off_caller,1
-#endif
-
#ifdef CONFIG_PREEMPTION
THUNK preempt_schedule_thunk, preempt_schedule
THUNK preempt_schedule_notrace_thunk, preempt_schedule_notrace
diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index bd5c55755447..67e2a16d3846 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -23,12 +23,16 @@
extern void lockdep_hardirqs_on_prepare(unsigned long ip);
extern void lockdep_hardirqs_on(unsigned long ip);
extern void lockdep_hardirqs_off(unsigned long ip);
+ extern void lockdep_hardirqs_save(unsigned long ip, unsigned long flags);
+ extern void lockdep_hardirqs_restore(unsigned long ip, unsigned long flags);
#else
static inline void lockdep_softirqs_on(unsigned long ip) { }
static inline void lockdep_softirqs_off(unsigned long ip) { }
static inline void lockdep_hardirqs_on_prepare(unsigned long ip) { }
static inline void lockdep_hardirqs_on(unsigned long ip) { }
static inline void lockdep_hardirqs_off(unsigned long ip) { }
+ static inline void lockdep_hardirqs_save(unsigned long ip, unsigned long flags) { }
+ static inline void lockdep_hardirqs_restore(unsigned long ip, unsigned long flags) { }
#endif

#ifdef CONFIG_TRACE_IRQFLAGS
@@ -49,10 +53,13 @@ struct irqtrace_events {
DECLARE_PER_CPU(int, hardirqs_enabled);
DECLARE_PER_CPU(int, hardirq_context);

- extern void trace_hardirqs_on_prepare(void);
- extern void trace_hardirqs_off_finish(void);
- extern void trace_hardirqs_on(void);
- extern void trace_hardirqs_off(void);
+extern void trace_hardirqs_on_prepare(void);
+extern void trace_hardirqs_off_finish(void);
+extern void trace_hardirqs_on(void);
+extern void trace_hardirqs_off(void);
+extern void trace_hardirqs_save(unsigned long flags);
+extern void trace_hardirqs_restore(unsigned long flags);
+
# define lockdep_hardirq_context() (this_cpu_read(hardirq_context))
# define lockdep_softirq_context(p) ((p)->softirq_context)
# define lockdep_hardirqs_enabled() (this_cpu_read(hardirqs_enabled))
@@ -120,17 +127,19 @@ do { \
#else
# define trace_hardirqs_on_prepare() do { } while (0)
# define trace_hardirqs_off_finish() do { } while (0)
-# define trace_hardirqs_on() do { } while (0)
-# define trace_hardirqs_off() do { } while (0)
-# define lockdep_hardirq_context() 0
-# define lockdep_softirq_context(p) 0
-# define lockdep_hardirqs_enabled() 0
-# define lockdep_softirqs_enabled(p) 0
-# define lockdep_hardirq_enter() do { } while (0)
-# define lockdep_hardirq_threaded() do { } while (0)
-# define lockdep_hardirq_exit() do { } while (0)
-# define lockdep_softirq_enter() do { } while (0)
-# define lockdep_softirq_exit() do { } while (0)
+# define trace_hardirqs_on() do { } while (0)
+# define trace_hardirqs_off() do { } while (0)
+# define trace_hardirqs_save(flags) do { } while (0)
+# define trace_hardirqs_restore(flags) do { } while (0)
+# define lockdep_hardirq_context() 0
+# define lockdep_softirq_context(p) 0
+# define lockdep_hardirqs_enabled() 0
+# define lockdep_softirqs_enabled(p) 0
+# define lockdep_hardirq_enter() do { } while (0)
+# define lockdep_hardirq_threaded() do { } while (0)
+# define lockdep_hardirq_exit() do { } while (0)
+# define lockdep_softirq_enter() do { } while (0)
+# define lockdep_softirq_exit() do { } while (0)
# define lockdep_hrtimer_enter(__hrtimer) false
# define lockdep_hrtimer_exit(__context) do { } while (0)
# define lockdep_posixtimer_enter() do { } while (0)
@@ -185,18 +194,18 @@ do { \
do { trace_hardirqs_on(); raw_local_irq_enable(); } while (0)
#define local_irq_disable() \
do { raw_local_irq_disable(); trace_hardirqs_off(); } while (0)
+
#define local_irq_save(flags) \
do { \
raw_local_irq_save(flags); \
- trace_hardirqs_off(); \
+ trace_hardirqs_save(flags); \
} while (0)

-
#define local_irq_restore(flags) \
do { \
if (raw_irqs_disabled_flags(flags)) { \
raw_local_irq_restore(flags); \
- trace_hardirqs_off(); \
+ trace_hardirqs_restore(flags); \
} else { \
trace_hardirqs_on(); \
raw_local_irq_restore(flags); \
diff --git a/init/main.c b/init/main.c
index 15bd0efff3df..0873319dcff4 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1041,6 +1041,22 @@ asmlinkage __visible void __init start_kernel(void)
sfi_init_late();
kcsan_init();

+ /* DEBUG CODE */
+ lockdep_assert_irqs_enabled(); /* Pass. */
+ {
+ unsigned long flags1;
+ raw_local_irq_save(flags1);
+ {
+ unsigned long flags2;
+ lockdep_assert_irqs_enabled(); /* Pass - expectedly blind. */
+ local_irq_save(flags2);
+ lockdep_assert_irqs_disabled(); /* Pass. */
+ local_irq_restore(flags2);
+ }
+ raw_local_irq_restore(flags1);
+ }
+ lockdep_assert_irqs_enabled(); /* FAIL! */
+
/* Do the rest non-__init'ed, we're now alive */
arch_call_rest_init();

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 3617abb08e31..2c88574b817c 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3763,6 +3763,30 @@ void noinstr lockdep_hardirqs_on(unsigned long ip)
}
EXPORT_SYMBOL_GPL(lockdep_hardirqs_on);

+static DEFINE_PER_CPU(int, hardirqs_disabled);
+
+void lockdep_hardirqs_restore(unsigned long ip, unsigned long flags)
+{
+ if (unlikely(!debug_locks))
+ return;
+
+ if (in_nmi()) {
+ if (!IS_ENABLED(CONFIG_TRACE_IRQFLAGS_NMI))
+ return;
+ } else if (current->lockdep_recursion & LOCKDEP_RECURSION_MASK)
+ return;
+
+ if (__this_cpu_read(hardirqs_disabled) &&
+ __this_cpu_dec_return(hardirqs_disabled) == 0) {
+
+ lockdep_hardirqs_on_prepare(ip);
+ lockdep_hardirqs_on(ip);
+ } else {
+ lockdep_hardirqs_off(ip);
+ }
+}
+EXPORT_SYMBOL_GPL(lockdep_hardirqs_restore);
+
/*
* Hardirqs were disabled:
*/
@@ -3805,6 +3829,40 @@ void noinstr lockdep_hardirqs_off(unsigned long ip)
}
EXPORT_SYMBOL_GPL(lockdep_hardirqs_off);

+void lockdep_hardirqs_save(unsigned long ip, unsigned long flags)
+{
+ if (unlikely(!debug_locks))
+ return;
+
+ /*
+ * Matching lockdep_hardirqs_on(), allow NMIs in the middle of lockdep;
+ * they will restore the software state. This ensures the software
+ * state is consistent inside NMIs as well.
+ */
+ if (in_nmi()) {
+ if (!IS_ENABLED(CONFIG_TRACE_IRQFLAGS_NMI))
+ return;
+ } else if (current->lockdep_recursion & LOCKDEP_RECURSION_MASK)
+ return;
+
+ /*
+ * If IRQs were disabled, but IRQ tracking state said enabled we
+ * 'missed' an update (or are nested inside raw_local_irq_*()) and
+ * cannot rely on IRQ state to tell us when we should enable again.
+ *
+ * Count our way through this.
+ */
+ if (raw_irqs_disabled_flags(flags)) {
+ if (__this_cpu_read(hardirqs_enabled)) {
+ WARN_ON_ONCE(__this_cpu_read(hardirqs_disabled) != 0);
+ __this_cpu_write(hardirqs_disabled, 1);
+ } else if (__this_cpu_read(hardirqs_disabled))
+ __this_cpu_inc(hardirqs_disabled);
+ }
+ lockdep_hardirqs_off(ip);
+}
+EXPORT_SYMBOL_GPL(lockdep_hardirqs_save);
+
/*
* Softirqs will be enabled:
*/
diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
index f10073e62603..080deaa1d9c9 100644
--- a/kernel/trace/trace_preemptirq.c
+++ b/kernel/trace/trace_preemptirq.c
@@ -85,6 +85,36 @@ void trace_hardirqs_off(void)
EXPORT_SYMBOL(trace_hardirqs_off);
NOKPROBE_SYMBOL(trace_hardirqs_off);

+void trace_hardirqs_save(unsigned long flags)
+{
+ lockdep_hardirqs_save(CALLER_ADDR0, flags);
+
+ if (!this_cpu_read(tracing_irq_cpu)) {
+ this_cpu_write(tracing_irq_cpu, 1);
+ tracer_hardirqs_off(CALLER_ADDR0, CALLER_ADDR1);
+ if (!in_nmi())
+ trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+ }
+}
+EXPORT_SYMBOL(trace_hardirqs_save);
+NOKPROBE_SYMBOL(trace_hardirqs_save);
+
+void trace_hardirqs_restore(unsigned long flags)
+{
+ if (this_cpu_read(tracing_irq_cpu)) {
+ if (!in_nmi())
+ trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+ tracer_hardirqs_on(CALLER_ADDR0, CALLER_ADDR1);
+ this_cpu_write(tracing_irq_cpu, 0);
+ }
+
+ lockdep_hardirqs_restore(CALLER_ADDR0, flags);
+}
+EXPORT_SYMBOL(trace_hardirqs_restore);
+NOKPROBE_SYMBOL(trace_hardirqs_restore);
+
+#ifdef __s390__
+
__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
{
if (this_cpu_read(tracing_irq_cpu)) {
@@ -113,6 +143,9 @@ __visible void trace_hardirqs_off_caller(unsigned long caller_addr)
}
EXPORT_SYMBOL(trace_hardirqs_off_caller);
NOKPROBE_SYMBOL(trace_hardirqs_off_caller);
+
+#endif /* __s390__ */
+
#endif /* CONFIG_TRACE_IRQFLAGS */

#ifdef CONFIG_TRACE_PREEMPT_TOGGLE

Marco Elver

unread,
Aug 12, 2020, 4:06:59 AM8/12/20
to pet...@infradead.org, Jürgen Groß, Borislav Petkov, Dave Hansen, fengh...@intel.com, H. Peter Anvin, LKML, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, Luck, Tony, the arch/x86 maintainers, yu-ch...@intel.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasan-dev, syzbot, Paul E. McKenney, Wei Liu, Steven Rostedt
On Tue, Aug 11, 2020 at 10:17PM +0200, pet...@infradead.org wrote:
> On Tue, Aug 11, 2020 at 11:46:51AM +0200, pet...@infradead.org wrote:
>
> > So let me once again see if I can't find a better solution for this all.
> > Clearly it needs one :/
>
> So the below boots without triggering the debug code from Marco -- it
> should allow nesting local_irq_save/restore under raw_local_irq_*().
>
> I tried unconditional counting, but there's some _reallly_ wonky /
> asymmetric code that wrecks that and I've not been able to come up with
> anything useful.
>
> This one starts counting when local_irq_save() finds it didn't disable
> IRQs while lockdep though it did. At that point, local_irq_restore()
> will decrement and enable things again when it reaches 0.
>
> This assumes local_irq_save()/local_irq_restore() are nested sane, which
> is mostly true.
>
> This leaves #PF, which I fixed in these other patches, but I realized it
> needs fixing for all architectures :-( No bright ideas there yet.
>
> ---
> arch/x86/entry/thunk_32.S | 5 ----
> include/linux/irqflags.h | 45 +++++++++++++++++++-------------
> init/main.c | 16 ++++++++++++
> kernel/locking/lockdep.c | 58 +++++++++++++++++++++++++++++++++++++++++
> kernel/trace/trace_preemptirq.c | 33 +++++++++++++++++++++++
> 5 files changed, 134 insertions(+), 23 deletions(-)

Testing this again with syzkaller produced some new reports:

BUG: stack guard page was hit in error_entry
BUG: stack guard page was hit in exc_int3
PANIC: double fault in error_entry
PANIC: double fault in exc_int3

Most of them have corrupted reports, but this one might be useful:

BUG: stack guard page was hit at 000000001fab0982 (stack is 00000000063f33dc..00000000bf04b0d8)
BUG: stack guard page was hit at 00000000ca97ac69 (stack is 00000000af3e6c84..000000001597e1bf)
kernel stack overflow (double-fault): 0000 [#1] PREEMPT SMP
CPU: 1 PID: 4709 Comm: kworker/1:1H Not tainted 5.8.0+ #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
Workqueue: events_highpri snd_vmidi_output_work
RIP: 0010:exc_int3+0x5/0xf0 arch/x86/kernel/traps.c:636
Code: c9 85 4d 89 e8 31 c0 e8 a9 7d 68 fd e9 90 fe ff ff e8 0f 35 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 55 53 48 89 fb <e8> 76 0e 00 00 85 c0 74 03 5b 5d c3 f6 83 88 00 00 00 03 74 7e 48
RSP: 0018:ffffc90008114000 EFLAGS: 00010083
RAX: 0000000084e00e17 RBX: ffffc90008114018 RCX: ffffffff84e00e17
RDX: 0000000000000000 RSI: ffffffff84e00a39 RDI: ffffc90008114018
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88807dc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc90008113ff8 CR3: 000000002dae4006 CR4: 0000000000770ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 00000000
Call Trace:
asm_exc_int3+0x31/0x40 arch/x86/include/asm/idtentry.h:537
RIP: 0010:arch_static_branch include/trace/events/preemptirq.h:40 [inline]
RIP: 0010:static_key_false include/linux/jump_label.h:200 [inline]
RIP: 0010:trace_irq_enable_rcuidle+0xd/0x120 include/trace/events/preemptirq.h:40
Code: 24 08 48 89 df e8 43 8d ef ff 48 89 df 5b e9 4a 2e 99 03 66 2e 0f 1f 84 00 00 00 00 00 55 41 56 53 48 89 fb e8 84 1a fd ff cc <1f> 44 00 00 5b 41 5e 5d c3 65 8b 05 ab 74 c3 7e 89 c0 31 f6 48 0f
RSP: 0018:ffffc900081140f8 EFLAGS: 00000093
RAX: ffffffff813d9e8c RBX: ffffffff81314dd3 RCX: ffff888076ce6000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff81314dd3
RBP: 0000000000000000 R08: ffffffff813da3d4 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000082 R14: 0000000000000000 R15: ffff888076ce6000
trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074
trace_irq_enable_rcuidle+0x87/0x120 include/trace/events/preemptirq.h:40
trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074
trace_irq_enable_rcuidle+0x87/0x120 include/trace/events/preemptirq.h:40
trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074
trace_irq_enable_rcuidle+0x87/0x120 include/trace/events/preemptirq.h:40
trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074

<... repeated many many times ...>

trace_irq_enable_rcuidle+0x87/0x120 include/trace/events/preemptirq.h:40
trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074
Lost 500 message(s)!
BUG: stack guard page was hit at 00000000cab483ba (stack is 00000000b1442365..00000000c26f9ad3)
BUG: stack guard page was hit at 00000000318ff8d8 (stack is 00000000fd87d656..0000000058100136)
---[ end trace 4157e0bb4a65941a ]---

.config is attached. Unfortunately no C-reproducer yet.

Thanks,
-- Marco
.config

pet...@infradead.org

unread,
Aug 12, 2020, 4:21:24 AM8/12/20
to Marco Elver, Jürgen Groß, Borislav Petkov, Dave Hansen, fengh...@intel.com, H. Peter Anvin, LKML, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, Luck, Tony, the arch/x86 maintainers, yu-ch...@intel.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasan-dev, syzbot, Paul E. McKenney, Wei Liu, Steven Rostedt
Wheee... recursion! Let me try and see if I can make something of that.

pet...@infradead.org

unread,
Aug 12, 2020, 4:57:32 AM8/12/20
to Marco Elver, Jürgen Groß, Borislav Petkov, Dave Hansen, fengh...@intel.com, H. Peter Anvin, LKML, Ingo Molnar, syzkaller-bugs, Thomas Gleixner, Luck, Tony, the arch/x86 maintainers, yu-ch...@intel.com, sd...@vmware.com, virtual...@lists.linux-foundation.org, kasan-dev, syzbot, Paul E. McKenney, Wei Liu, Steven Rostedt
On Wed, Aug 12, 2020 at 10:18:32AM +0200, pet...@infradead.org wrote:
> > trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
> > rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074
> > trace_irq_enable_rcuidle+0x87/0x120 include/trace/events/preemptirq.h:40
> > trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
> > rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074
> > trace_irq_enable_rcuidle+0x87/0x120 include/trace/events/preemptirq.h:40
> > trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
> > rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074
> > trace_irq_enable_rcuidle+0x87/0x120 include/trace/events/preemptirq.h:40
> > trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
> > rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074
> >
> > <... repeated many many times ...>
> >
> > trace_irq_enable_rcuidle+0x87/0x120 include/trace/events/preemptirq.h:40
> > trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
> > rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074
> > Lost 500 message(s)!
> > BUG: stack guard page was hit at 00000000cab483ba (stack is 00000000b1442365..00000000c26f9ad3)
> > BUG: stack guard page was hit at 00000000318ff8d8 (stack is 00000000fd87d656..0000000058100136)
> > ---[ end trace 4157e0bb4a65941a ]---
>
> Wheee... recursion! Let me try and see if I can make something of that.

All that's needed is enabling the preemptirq tracepoints. Lemme go fix.

Big Budsupply

unread,
Aug 12, 2020, 7:44:07 AM8/12/20
to pet...@infradead.org, Borislav Petkov, Dave Hansen, H. Peter Anvin, Ingo Molnar, Jürgen Groß, LKML, Luck, Tony, Marco Elver, Paul E. McKenney, Steven Rostedt, Thomas Gleixner, Wei Liu, fengh...@intel.com, kasan-dev, sd...@vmware.com, syzbot, syzkaller-bugs, the arch/x86 maintainers, virtual...@lists.linux-foundation.org, yu-ch...@intel.com
Hello guys hope you are doing good! we are Bigbudsupply we grow and sell the best medical marijuana product, we are looking for long time customers, you can Email us /Bigbud...@gmail.com
Looking forward to working with you guys

--

You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.

To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bug...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/20200812085717.GJ35926%40hirez.programming.kicks-ass.net.

syzbot

unread,
Sep 9, 2020, 4:06:15 PM9/9/20
to syzkall...@googlegroups.com
Auto-closing this bug as obsolete.
Crashes did not happen for a while, no reproducer and no activity.
Reply all
Reply to author
Forward
0 new messages