We get a page fault immediately (next instruction) after returning from
the guest when running with oprofile. The page fault address does not
match anything the instruction does, so presumably it is one of the
accesses the processor performs in order to service an NMI (ordinary
interrupts are masked; and the fact that it happens with oprofile
strengthens this assumption).
If this is correct, the fault is not in the NMI handler itself, but in
one of the memory areas the cpu looks in to vector the NMI, which can be:
- the IDT
- the GDT
- the TSS
- the NMI stack
Except for the IDT these are per-cpu structures, though I don't know
whether they are allocated with the percpu infrastructure.
Here is the code in question:
> 3ae7: 75 05                   jne    3aee <vmx_vcpu_run+0x26a>
> 3ae9: 0f 01 c2                vmlaunch
> 3aec: eb 03                   jmp    3af1 <vmx_vcpu_run+0x26d>
> 3aee: 0f 01 c3                vmresume
> 3af1: 48 87 0c 24             xchg   %rcx,(%rsp)
^^^ fault, but not at (%rsp)
> 3af5: 48 89 81 18 01 00 00    mov    %rax,0x118(%rcx)
> 3afc: 48 89 99 30 01 00 00    mov    %rbx,0x130(%rcx)
--
error compiling committee.c: too many arguments to function
--
Avi Kivity wrote:
> We get a page fault immediately (next instruction) after returning from
> the guest when running with oprofile. The page fault address does not
> match anything the instruction does, so presumably it is one of the
> accesses the processor performs in order to service an NMI (ordinary
> interrupts are masked; and the fact that it happens with oprofile
> strengthens this assumption).
Ah... okay, that's tricky but IIRC faults like that can be
distinguished from regular ones via processor state, right?
> If this is correct, the fault is not in the NMI handler itself, but in
> one of the memory areas the cpu looks in to vector the NMI, which can be:
>
> - the IDT
> - the GDT
> - the TSS
> - the NMI stack
>
> Except for the IDT these are per-cpu structures, though I don't know
> whether they are allocated with the percpu infrastructure.
Don't know where the NMI stack is, but all the rest are percpu.
> Here is the code in question:
>
>> 3ae7: 75 05                   jne    3aee <vmx_vcpu_run+0x26a>
>> 3ae9: 0f 01 c2                vmlaunch
>> 3aec: eb 03                   jmp    3af1 <vmx_vcpu_run+0x26d>
>> 3aee: 0f 01 c3                vmresume
>> 3af1: 48 87 0c 24             xchg   %rcx,(%rsp)
>
> ^^^ fault, but not at (%rsp)
Can you please post the full oops (including kernel debug messages
during boot) or give me a pointer to the original message? Also, does
the faulting address coincide with any symbol?
Thanks.
--
tejun
Not on x86. But given that the fault address is different from %rsp
(which is what the instruction accesses) and %rip, there aren't many
alternatives.
>> Here is the code in question:
>>
>>
>>> 3ae7: 75 05                   jne    3aee <vmx_vcpu_run+0x26a>
>>> 3ae9: 0f 01 c2                vmlaunch
>>> 3aec: eb 03                   jmp    3af1 <vmx_vcpu_run+0x26d>
>>> 3aee: 0f 01 c3                vmresume
>>> 3af1: 48 87 0c 24             xchg   %rcx,(%rsp)
>>>
>> ^^^ fault, but not at (%rsp)
>>
> Can you please post the full oops (including kernel debug messages
> during boot) or give me a pointer to the original message?
http://www.mail-archive.com/k...@vger.kernel.org/msg23458.html
> Also, does
> the faulting address coincide with any symbol?
>
No (at least, not in System.map).
--
error compiling committee.c: too many arguments to function
--
On 11/01/2009 08:31 PM, Avi Kivity wrote:
>>> Here is the code in question:
>>>
>>>
>>>> 3ae7: 75 05                   jne    3aee <vmx_vcpu_run+0x26a>
>>>> 3ae9: 0f 01 c2                vmlaunch
>>>> 3aec: eb 03                   jmp    3af1 <vmx_vcpu_run+0x26d>
>>>> 3aee: 0f 01 c3                vmresume
>>>> 3af1: 48 87 0c 24             xchg   %rcx,(%rsp)
>>>>
>>> ^^^ fault, but not at (%rsp)
>>>
>> Can you please post the full oops (including kernel debug messages
>> during boot) or give me a pointer to the original message?
>
> http://www.mail-archive.com/k...@vger.kernel.org/msg23458.html
>
>> Also, does
>> the faulting address coincide with any symbol?
>>
>
> No (at least, not in System.map).
Has there been any progress? Is kvm + oprofile still broken?
Thanks.
--
tejun
I just tried testing the tip of kvm.git, but unfortunately I think I might
be hitting a different problem, where processes run 100% in kernel mode.
In my case, cpus 9 and 13 were stuck, running qemu processes. Stack
backtraces for both cpus are below. FWIW, kernel.org 2.6.32-rc7 does not
have this problem, or the original problem.
> NMI backtrace for cpu 9
> CPU 9:
> Modules linked in: tun sunrpc af_packet bridge stp ipv6 binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod kvm_intel kvm uinput sr_mod cdrom ata_generic pata_acpi ata_piix joydev libata ide_pci_generic usbhid ide_core hid serio_raw cdc_ether usbnet mii matroxfb_base matroxfb_DAC1064 matroxfb_accel matroxfb_Ti3026 matroxfb_g450 g450_pll matroxfb_misc iTCO_wdt i2c_i801 i2c_core pcspkr iTCO_vendor_support ioatdma thermal rtc_cmos rtc_core bnx2 rtc_lib dca thermal_sys hwmon sg button shpchp pci_hotplug qla2xxx scsi_transport_fc scsi_tgt sd_mod scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd usbcore [last unloaded: processor]
> Pid: 5687, comm: qemu-system-x86 Not tainted 2.6.32-rc7-5e8cb552cb8b48244b6d07bff984b3c4080d4bc9-autokern1 #1 -[7947AC1]-
> RIP: 0010:[<ffffffff810b802b>] [<ffffffff810b802b>] fire_user_return_notifiers+0x31/0x36
> RSP: 0018:ffff88095024df08 EFLAGS: 00000246
> RAX: 0000000000000000 RBX: 0000000000000800 RCX: ffff88095024c000
> RDX: ffff880028340000 RSI: 0000000000000000 RDI: ffff88095024df58
> RBP: ffff88095024df18 R08: 0000000000000000 R09: 0000000000000001
> R10: 000000caf1fff62d R11: ffff8805b584de40 R12: 00007fffae48e0f0
> R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
> FS: 00007f45c69d57c0(0000) GS:ffff880028340000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: fffff9800121056e CR3: 0000000953d36000 CR4: 00000000000026e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Call Trace:
> <#DB[1]> <<EOE>> Pid: 5687, comm: qemu-system-x86 Not tainted 2.6.32-rc7-5e8cb552cb8b48244b6d07bff984b3c4080d4bc9-autokern1 #1
> Call Trace:
> <NMI> [<ffffffff8100af53>] ? show_regs+0x44/0x49
> [<ffffffff812e57b2>] nmi_watchdog_tick+0xc2/0x1b9
> [<ffffffff812e4e73>] do_nmi+0xb0/0x252
> [<ffffffff812e48a0>] nmi+0x20/0x30
> [<ffffffff810b802b>] ? fire_user_return_notifiers+0x31/0x36
> <<EOE>> [<ffffffff8100b844>] do_notify_resume+0x62/0x69
> [<ffffffff8100bf48>] ? int_check_syscall_exit_work+0x9/0x3d
> [<ffffffff8100bf8e>] int_signal+0x12/0x17
> NMI backtrace for cpu 13
> CPU 13:
> Modules linked in: tun sunrpc af_packet bridge stp ipv6 binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath scsi_dh dm_mod kvm_intel kvm uinput sr_mod cdrom ata_generic pata_acpi ata_piix joydev libata ide_pci_generic usbhid ide_core hid serio_raw cdc_ether usbnet mii matroxfb_base matroxfb_DAC1064 matroxfb_accel matroxfb_Ti3026 matroxfb_g450 g450_pll matroxfb_misc iTCO_wdt i2c_i801 i2c_core pcspkr iTCO_vendor_support ioatdma thermal rtc_cmos rtc_core bnx2 rtc_lib dca thermal_sys hwmon sg button shpchp pci_hotplug qla2xxx scsi_transport_fc scsi_tgt sd_mod scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd usbcore [last unloaded: processor]
> Pid: 5792, comm: qemu-system-x86 Not tainted 2.6.32-rc7-5e8cb552cb8b48244b6d07bff984b3c4080d4bc9-autokern1 #1 -[7947AC1]-
> RIP: 0010:[<ffffffff8100bfb0>] [<ffffffff8100bfb0>] int_restore_rest+0x1d/0x3d
> RSP: 0018:ffff88124f491f58 EFLAGS: 00000292
> RAX: 0000000000000800 RBX: 00007fff9df852e0 RCX: ffff88124f490000
> RDX: ffff88099ff40000 RSI: 0000000000000000 RDI: 000000000000fe2e
> RBP: 00007fff9df85260 R08: ffff88124f490000 R09: 0000000000000000
> R10: 0000000000000005 R11: ffff880954971da0 R12: 00007fff9df851e0
> R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
> FS: 00007f73b5b1d7c0(0000) GS:ffff88099ff40000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007f8d5a8de9d0 CR3: 0000000eb34d7000 CR4: 00000000000026e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Call Trace:
> <#DB[1]> <<EOE>> Pid: 5792, comm: qemu-system-x86 Not tainted 2.6.32-rc7-5e8cb552cb8b48244b6d07bff984b3c4080d4bc9-autokern1 #1
> Call Trace:
> <NMI> [<ffffffff8100af53>] ? show_regs+0x44/0x49
> [<ffffffff812e57b2>] nmi_watchdog_tick+0xc2/0x1b9
> [<ffffffff812e4e73>] do_nmi+0xb0/0x252
> [<ffffffff812e48a0>] nmi+0x20/0x30
> [<ffffffff8100bfb0>] ? int_restore_rest+0x1d/0x3d
> <<EOE>>
-Andrew
On 11/26/2009 10:35 AM, Andrew Theurer wrote:
> I just tried testing tip of kvm.git, but unfortunately I think I might
> be hitting a different problem, where processes run 100% in kernel mode.
> In my case, cpus 9 and 13 were stuck, running qemu processes. A stack
> backtrace for both cpus are below. FWIW, kernel.org 2.6.32-rc7 does not
> have this problem, or the original problem.
2.6.32-rc7 doesn't have the problem with kvm + oprofile? If the original
analysis was right, I can't think of anything that could have changed
that between the merge commit and 2.6.32-rc7.
Thanks.
--
tejun
That's a bug with the new user return notifiers. Is your host kernel
preemptible?
I think I saw this once but I'm not sure. I can't reproduce with a host
kernel build, some silly guest workload, and 'perf top' to generate an
nmi load.
--
error compiling committee.c: too many arguments to function
--
preempt is off.
>
> I think I saw this once but I'm not sure. I can't reproduce with a host
> kernel build, some silly guest workload, and 'perf top' to generate an
> nmi load.
>
-Andrew
I just posted a patch fixing this, titled "[PATCH tip:x86/entry] core:
fix user return notifier on fork()".
--
error compiling committee.c: too many arguments to function
--