intel_pstate divide error with v3.13-rc4-256-gb7000ad

Josh Boyer

unread,

Dec 24, 2013, 9:40:01 AM12/24/13

to

Hi All,

We've had a report [1] that the pstate driver causes KVM guests to
fail to boot because of a divide error. See the backtrace below.

4.839784] Intel P-state driver initializing.
[ 4.859972] Intel pstate controlling: cpu 0
[ 4.867653] cpufreq: __cpufreq_add_dev: ->get() failed
[ 4.877269] divide error: 0000 [#1] SMP
[ 4.878127] Modules linked in:
[ 4.878127] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
3.13.0-0.rc4.git5.1.fc21.x86_64 #1
[ 4.878127] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 4.878127] task: ffff88001ea20000 ti: ffff88001e9bc000 task.ti:
ffff88001e9bc000
[ 4.878127] RIP: 0010:[<ffffffff815c551d>] [<ffffffff815c551d>]
intel_pstate_timer_func+0x11d/0x2b0
[ 4.878127] RSP: 0000:ffff88001ee03e18 EFLAGS: 00010246
[ 4.878127] RAX: 0000000000000000 RBX: ffff88001a454348 RCX: 0000000000006100
[ 4.878127] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 4.878127] RBP: ffff88001ee03e38 R08: 0000000000000000 R09: 0000000000000000
[ 4.878127] R10: ffff88001ea20000 R11: 0000000000000000 R12: 00000c0a1ea20000
[ 4.878127] R13: 1ea200001ea20000 R14: ffffffff815c5400 R15: ffff88001a454348
[ 4.878127] FS: 0000000000000000(0000) GS:ffff88001ee00000(0000)
knlGS:0000000000000000
[ 4.878127] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 4.878127] CR2: 0000000000000000 CR3: 0000000001c0c000 CR4: 00000000000006f0
[ 4.878127] Stack:
[ 4.878127] fffffffb1a454390 ffffffff821a4500 ffff88001a454390
0000000000000100
[ 4.878127] ffff88001ee03ea8 ffffffff81083e9a ffffffff81083e15
ffffffff82d5ed40
[ 4.878127] ffffffff8258cc60 0000000000000000 ffffffff81ac39de
0000000000000000
[ 4.878127] Call Trace:
[ 4.878127] <IRQ>
[ 4.878127] [<ffffffff81083e9a>] call_timer_fn+0x8a/0x310
[ 4.878127] [<ffffffff81083e15>] ? call_timer_fn+0x5/0x310
[ 4.878127] [<ffffffff815c5400>] ? pid_param_set+0x130/0x130
[ 4.878127] [<ffffffff81084354>] run_timer_softirq+0x234/0x380
[ 4.878127] [<ffffffff8107aee4>] __do_softirq+0x104/0x430
[ 4.878127] [<ffffffff8107b5fd>] irq_exit+0xcd/0xe0
[ 4.878127] [<ffffffff81770645>] smp_apic_timer_interrupt+0x45/0x60
[ 4.878127] [<ffffffff8176efb2>] apic_timer_interrupt+0x72/0x80
[ 4.878127] <EOI>
[ 4.878127] [<ffffffff810e15cd>] ? vprintk_emit+0x1dd/0x5e0
[ 4.878127] [<ffffffff81757719>] printk+0x67/0x69
[ 4.878127] [<ffffffff815c1493>] __cpufreq_add_dev.isra.13+0x883/0x8d0
[ 4.878127] [<ffffffff815c14f0>] cpufreq_add_dev+0x10/0x20
[ 4.878127] [<ffffffff814a14d1>] subsys_interface_register+0xb1/0xf0
[ 4.878127] [<ffffffff815bf5cf>] cpufreq_register_driver+0x9f/0x210
[ 4.878127] [<ffffffff81fb19af>] intel_pstate_init+0x27d/0x3be
[ 4.878127] [<ffffffff81761e3e>] ? mutex_unlock+0xe/0x10
[ 4.878127] [<ffffffff81fb1732>] ? cpufreq_gov_dbs_init+0x12/0x12
[ 4.878127] [<ffffffff8100214a>] do_one_initcall+0xfa/0x1b0
[ 4.878127] [<ffffffff8109dbf5>] ? parse_args+0x225/0x3f0
[ 4.878127] [<ffffffff81f64193>] kernel_init_freeable+0x1fc/0x287
[ 4.878127] [<ffffffff81f638d0>] ? do_early_param+0x88/0x88
[ 4.878127] [<ffffffff8174b530>] ? rest_init+0x150/0x150
[ 4.878127] [<ffffffff8174b53e>] kernel_init+0xe/0x130
[ 4.878127] [<ffffffff8176e27c>] ret_from_fork+0x7c/0xb0
[ 4.878127] [<ffffffff8174b530>] ? rest_init+0x150/0x150
[ 4.878127] Code: c1 e0 05 48 63 bc 03 10 01 00 00 48 63 83 d0 00
00 00 48 63 d6 48 c1 e2 08 c1 e1 08 4c 63 c2 48 c1 e0 08 48 98 48 c1
e0 08 48 99 <49> f7 f8 48 98 48 0f af f8 48 c1 ff 08 29 f9 89 ca c1 fa
1f 89
[ 4.878127] RIP [<ffffffff815c551d>] intel_pstate_timer_func+0x11d/0x2b0
[ 4.878127] RSP <ffff88001ee03e18>
[ 5.438189] ---[ end trace f166110ed22cc37a ]---
[ 5.446428] Kernel panic - not syncing: Fatal exception in interrupt
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Viresh Kumar

unread,

Dec 24, 2013, 11:10:02 AM12/24/13

to

Adding Dirk..

On 24 December 2013 20:06, Josh Boyer <jwb...@fedoraproject.org> wrote:
> Hi All,
>
> We've had a report [1] that the pstate driver causes KVM guests to
> fail to boot because of a divide error. See the backtrace below.
>
> 4.839784] Intel P-state driver initializing.
> [ 4.859972] Intel pstate controlling: cpu 0
> [ 4.867653] cpufreq: __cpufreq_add_dev: ->get() failed

After a call to ->init(), ->get() is supposed to work.
@Dirk: Any idea why it failed?

And then I don't know what made this divide by zero to happen :)

One Thousand Gnomes

unread,

Dec 27, 2013, 7:30:02 AM12/27/13

to

On Tue, 24 Dec 2013 21:36:01 +0530
Viresh Kumar <viresh...@linaro.org> wrote:

> Adding Dirk..
>
> On 24 December 2013 20:06, Josh Boyer <jwb...@fedoraproject.org> wrote:
> > Hi All,
> >
> > We've had a report [1] that the pstate driver causes KVM guests to
> > fail to boot because of a divide error. See the backtrace below.
> >
> > 4.839784] Intel P-state driver initializing.
> > [ 4.859972] Intel pstate controlling: cpu 0
> > [ 4.867653] cpufreq: __cpufreq_add_dev: ->get() failed
>
> After a call to ->init(), ->get() is supposed to work.
> @Dirk: Any idea why it failed?
>
> And then I don't know what made this divide by zero to happen :)

> > [ 4.878127] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011

Emulator bug perhaps ?

Gleb Natapov

unread,

Dec 27, 2013, 7:50:01 AM12/27/13

to

On Fri, Dec 27, 2013 at 12:24:22PM +0000, One Thousand Gnomes wrote:
> On Tue, 24 Dec 2013 21:36:01 +0530
> Viresh Kumar <viresh...@linaro.org> wrote:
>
> > Adding Dirk..
> >
> > On 24 December 2013 20:06, Josh Boyer <jwb...@fedoraproject.org> wrote:
> > > Hi All,
> > >
> > > We've had a report [1] that the pstate driver causes KVM guests to
> > > fail to boot because of a divide error. See the backtrace below.
> > >
> > > 4.839784] Intel P-state driver initializing.
> > > [ 4.859972] Intel pstate controlling: cpu 0
> > > [ 4.867653] cpufreq: __cpufreq_add_dev: ->get() failed
> >
> > After a call to ->init(), ->get() is supposed to work.
> > @Dirk: Any idea why it failed?
> >
> > And then I don't know what made this divide by zero to happen :)
>
> > > [ 4.878127] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>
> Emulator bug perhaps ?
>

KVM does not emulate P-states at all. intel_pstate_init() calls
intel_pstate_msrs_not_valid() before printing "Intel P-state driver
initializing." which suppose to fail since it checks that two reads of
MSR_IA32_APERF return different values, but KVM does not emulate this msr
at all, so both calls should return zero (KVM suppose to inject #GP, all rdmsrl
are patched to be rdmsrl_safe in a guest).

Anything interesting in host dmesg? Is it reproducible?

--
Gleb.

Josh Boyer

unread,

Dec 27, 2013, 8:50:01 AM12/27/13

to

On Fri, Dec 27, 2013 at 7:47 AM, Gleb Natapov <gl...@kernel.org> wrote:
> On Fri, Dec 27, 2013 at 12:24:22PM +0000, One Thousand Gnomes wrote:
>> On Tue, 24 Dec 2013 21:36:01 +0530
>> Viresh Kumar <viresh...@linaro.org> wrote:
>>
>> > Adding Dirk..
>> >
>> > On 24 December 2013 20:06, Josh Boyer <jwb...@fedoraproject.org> wrote:
>> > > Hi All,
>> > >
>> > > We've had a report [1] that the pstate driver causes KVM guests to
>> > > fail to boot because of a divide error. See the backtrace below.
>> > >
>> > > 4.839784] Intel P-state driver initializing.
>> > > [ 4.859972] Intel pstate controlling: cpu 0
>> > > [ 4.867653] cpufreq: __cpufreq_add_dev: ->get() failed
>> >
>> > After a call to ->init(), ->get() is supposed to work.
>> > @Dirk: Any idea why it failed?
>> >
>> > And then I don't know what made this divide by zero to happen :)
>>
>> > > [ 4.878127] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>>
>> Emulator bug perhaps ?
>>
> KVM does not emulate P-states at all. intel_pstate_init() calls
> intel_pstate_msrs_not_valid() before printing "Intel P-state driver
> initializing." which suppose to fail since it checks that two reads of
> MSR_IA32_APERF return different values, but KVM does not emulate this msr
> at all, so both calls should return zero (KVM suppose to inject #GP, all rdmsrl
> are patched to be rdmsrl_safe in a guest).
>
> Anything interesting in host dmesg? Is it reproducible?

Seems reproducible. I forgot to link to the actual bug in my original
report, but you can find it here:

https://bugzilla.redhat.com/show_bug.cgi?id=1046317

Adding Richard and Kashyap on CC.

josh

Kashyap Chamarthy

unread,

Dec 27, 2013, 9:20:01 AM12/27/13

to

[. . .]

>> KVM does not emulate P-states at all. intel_pstate_init() calls
>> intel_pstate_msrs_not_valid() before printing "Intel P-state driver
>> initializing." which suppose to fail since it checks that two reads of
>> MSR_IA32_APERF return different values, but KVM does not emulate this msr
>> at all, so both calls should return zero (KVM suppose to inject #GP, all rdmsrl
>> are patched to be rdmsrl_safe in a guest).
>>
>> Anything interesting in host dmesg?

Heya Gleb,

Here's the relevant dmesg snippet (full dmesg, refer the attachment below):
=====
.
.
.
[ 3.462741] Intel pstate controlling: cpu 0
[ 3.463972] cpufreq: __cpufreq_add_dev: ->get() failed
[ 3.465399] Intel pstate controlling: cpu 1
[ 3.466505] cpufreq: __cpufreq_add_dev: ->get() failed
[ 3.468056] Intel pstate controlling: cpu 2
[ 3.469208] cpufreq: __cpufreq_add_dev: ->get() failed
[ 3.470751] Intel pstate controlling: cpu 3
[ 3.471870] cpufreq: __cpufreq_add_dev: ->get() failed
[ 3.473289] Intel pstate controlling: cpu 4
[ 3.474545] cpufreq: __cpufreq_add_dev: ->get() failed
[ 3.475957] Intel pstate controlling: cpu 5
[ 3.477135] cpufreq: __cpufreq_add_dev: ->get() failed
[ 3.478544] Intel pstate controlling: cpu 6
[ 3.479670] cpufreq: __cpufreq_add_dev: ->get() failed
[ 3.481124] Intel pstate controlling: cpu 7
[ 3.482288] cpufreq: __cpufreq_add_dev: ->get() failed
[ 3.483752] Intel pstate controlling: cpu 8
[ 3.484986] cpufreq: __cpufreq_add_dev: ->get() failed
[ 3.486396] Intel pstate controlling: cpu 9
[ 3.488000] cpufreq: __cpufreq_add_dev: ->get() failed
[ 3.489684] Intel pstate controlling: cpu 10
[ 3.491027] cpufreq: __cpufreq_add_dev: ->get() failed
[ 3.492647] Intel pstate controlling: cpu 11
[ 3.494026] cpufreq: __cpufreq_add_dev: ->get() failed
[ 3.495662] Intel pstate controlling: cpu 12
[ 3.497065] cpufreq: __cpufreq_add_dev: ->get() failed
[ 3.498733] Intel pstate controlling: cpu 13
[ 3.500071] cpufreq: __cpufreq_add_dev: ->get() failed
[ 3.501787] Intel pstate controlling: cpu 14
[ 3.503105] cpufreq: __cpufreq_add_dev: ->get() failed
[ 3.504809] Intel pstate controlling: cpu 15
[ 3.506142] cpufreq: __cpufreq_add_dev: ->get() failed
[ 3.507825] Intel pstate controlling: cpu 16
[ 3.509175] cpufreq: __cpufreq_add_dev: ->get() failed
[ 3.510780] Intel pstate controlling: cpu 17
[ 3.512158] cpufreq: __cpufreq_add_dev: ->get() failed
[ 3.513765] Intel pstate controlling: cpu 18
[ 3.515225] cpufreq: __cpufreq_add_dev: ->get() failed
[ 3.516898] Intel pstate controlling: cpu 19
.
.
.
=====

Full dmesg output: https://bugzilla.kernel.org/attachment.cgi?id=119701

(Kernel bug: https://bugzilla.kernel.org/show_bug.cgi?id=67761)

>> Is it reproducible?

Yes, just now I was able to reproduce it again. It's trivial with
libguestfs-test-tool:

$ export LIBGUESTFS_BACKEND=direct

$ guestfish get-backend
direct

$ libguestfs-test-tool

You see the Kernel panic on stdout.

NOTE: You can reproduce the issue without setting LIBGUESTFS_BACKEND=direct,
in which case, it'll default to 'libvirt' back-end.

Version:

$ uname -r; rpm -q libguestfs libvirt qemu-system-x86
3.13.0-0.rc4.git5.1.fc21.x86_64
libguestfs-1.25.18-1.fc21.x86_64
libvirt-1.1.3.1-2.fc20.x86_64
qemu-system-x86-1.6.1-2.fc20.x86_64

>
> Seems reproducible. I forgot to link to the actual bug in my original
> report, but you can find it here:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1046317
>
> Adding Richard and Kashyap on CC.
>
> josh
>

--

/kashyap

Rafael J. Wysocki

unread,

Dec 27, 2013, 9:30:02 AM12/27/13

to

On Tuesday, December 24, 2013 09:36:01 PM Viresh Kumar wrote:
> Adding Dirk..
>
> On 24 December 2013 20:06, Josh Boyer <jwb...@fedoraproject.org> wrote:
> > Hi All,
> >
> > We've had a report [1] that the pstate driver causes KVM guests to
> > fail to boot because of a divide error. See the backtrace below.
> >
> > 4.839784] Intel P-state driver initializing.
> > [ 4.859972] Intel pstate controlling: cpu 0
> > [ 4.867653] cpufreq: __cpufreq_add_dev: ->get() failed
>
> After a call to ->init(), ->get() is supposed to work.
> @Dirk: Any idea why it failed?

Well, it looks like sample->freq is 0 in intel_pstate_get().

> And then I don't know what made this divide by zero to happen :)

From code inspection it looks like that is caused by the
intel_pstate_get_scaled_busy() called from intel_pstate_adjust_busy_pstate(),
so it appears that cpu->pstate.current_state is 0 at that point.

I'm wondering if something like the (untested) patch below helps, then?

Rafael

---
drivers/cpufreq/intel_pstate.c | 5 +++++
1 file changed, 5 insertions(+)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -652,6 +652,11 @@ static int intel_pstate_init_cpu(unsigne
cpu = all_cpu_data[cpunum];

intel_pstate_get_cpu_pstates(cpu);
+ if (!cpu->pstate.current_state) {
+ all_cpu_data[cpunum] = NULL;
+ kfree(cpu);
+ return -ENODATA;
+ }

cpu->cpu = cpunum;

Richard W.M. Jones

unread,

Dec 27, 2013, 9:40:02 AM12/27/13

to

Probably the qemu command line is more interesting, which is in this
comment and reproduced below.

https://bugzilla.redhat.com/show_bug.cgi?id=1046317#c1

/usr/bin/qemu-kvm \
-global virtio-blk-pci.scsi=off \
-nodefconfig \
-enable-fips \
-nodefaults \
-display none \
-machine accel=kvm:tcg \
-cpu host,+kvmclock \
-m 500 \
-no-reboot \
-no-hpet \
-kernel /var/tmp/.guestfs-0/kernel.32370 \
-initrd /var/tmp/.guestfs-0/initrd.32370 \
-device virtio-scsi-pci,id=scsi \
-drive file=/tmp/libguestfswnVOx7/scratch.1,cache=unsafe,format=raw,id=hd0,if=none \
-device scsi-hd,drive=hd0 \
-drive file=/var/tmp/.guestfs-0/root.32370,snapshot=on,id=appliance,cache=unsafe,if=none \
-device scsi-hd,drive=appliance \
-device virtio-serial-pci \
-serial stdio \
-device sga \
-chardev socket,path=/tmp/libguestfswnVOx7/guestfsd.sock,id=channel0 \
-device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 \
-append 'panic=1 console=ttyS0 udevtimeout=600 no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 guestfs_verbose=1 TERM=screen'

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW

Gleb Natapov

unread,

Dec 27, 2013, 12:00:01 PM12/27/13

to

On Fri, Dec 27, 2013 at 03:15:39PM +0100, Kashyap Chamarthy wrote:
> [. . .]
>
> >> KVM does not emulate P-states at all. intel_pstate_init() calls
> >> intel_pstate_msrs_not_valid() before printing "Intel P-state driver
> >> initializing." which suppose to fail since it checks that two reads of
> >> MSR_IA32_APERF return different values, but KVM does not emulate this msr
> >> at all, so both calls should return zero (KVM suppose to inject #GP, all rdmsrl
> >> are patched to be rdmsrl_safe in a guest).
> >>
> >> Anything interesting in host dmesg?
>
> Heya Gleb,
>
> Here's the relevant dmesg snippet (full dmesg, refer the attachment below):

That's guest dmesg. What about host one? Can you ftrace the failure?

--
Gleb.

Gleb Natapov

unread,

Dec 27, 2013, 12:10:03 PM12/27/13

to

On Fri, Dec 27, 2013 at 06:52:48PM +0200, Gleb Natapov wrote:
> On Fri, Dec 27, 2013 at 03:15:39PM +0100, Kashyap Chamarthy wrote:
> > [. . .]
> >
> > >> KVM does not emulate P-states at all. intel_pstate_init() calls
> > >> intel_pstate_msrs_not_valid() before printing "Intel P-state driver
> > >> initializing." which suppose to fail since it checks that two reads of
> > >> MSR_IA32_APERF return different values, but KVM does not emulate this msr
> > >> at all, so both calls should return zero (KVM suppose to inject #GP, all rdmsrl
> > >> are patched to be rdmsrl_safe in a guest).
> > >>
> > >> Anything interesting in host dmesg?
> >
> > Heya Gleb,
> >
> > Here's the relevant dmesg snippet (full dmesg, refer the attachment below):
> That's guest dmesg. What about host one? Can you ftrace the failure?
>

Ugh, it looks like guest dmesg but there are KVM messages there too ("[
281.443662] kvm [2452]: vcpu0 unhandled rdmsr: 0xe8" is unhandled access
to MSR_IA32_APERF I was talking about above), so I guess this is nested
guest invocation? Does it happen in non nested guest?

Kashyap Chamarthy

unread,

Dec 27, 2013, 12:30:03 PM12/27/13

to

On 12/27/2013 06:01 PM, Gleb Natapov wrote:
> On Fri, Dec 27, 2013 at 06:52:48PM +0200, Gleb Natapov wrote:
>> On Fri, Dec 27, 2013 at 03:15:39PM +0100, Kashyap Chamarthy wrote:
>>> [. . .]
>>>
>>>>> KVM does not emulate P-states at all. intel_pstate_init() calls
>>>>> intel_pstate_msrs_not_valid() before printing "Intel P-state driver
>>>>> initializing." which suppose to fail since it checks that two reads of
>>>>> MSR_IA32_APERF return different values, but KVM does not emulate this msr
>>>>> at all, so both calls should return zero (KVM suppose to inject #GP, all rdmsrl
>>>>> are patched to be rdmsrl_safe in a guest).
>>>>>
>>>>> Anything interesting in host dmesg?
>>>
>>> Heya Gleb,
>>>
>>> Here's the relevant dmesg snippet (full dmesg, refer the attachment below):
>> That's guest dmesg. What about host one?

Here's host dmesg: https://bugzilla.kernel.org/attachment.cgi?id=119751

>> Can you ftrace the failure?

Can try, need some time (rest of the day I'll be away travelling,
will try to do it over the weekend, and update the Kernel
bugzilla with observations).

>>
> Ugh, it looks like guest dmesg but there are KVM messages there too ("[
> 281.443662] kvm [2452]: vcpu0 unhandled rdmsr: 0xe8" is unhandled access
> to MSR_IA32_APERF I was talking about above), so I guess this is nested
> guest invocation?

Yeah -- sorry, I forgot to note it's in a nested environment :(

> Does it happen in non nested guest?

I need to that.

Note to self: Also try with a newer Kernel on the host.

--
/kashyap

Richard W.M. Jones

unread,

Dec 27, 2013, 12:30:03 PM12/27/13

to

On Fri, Dec 27, 2013 at 07:01:48PM +0200, Gleb Natapov wrote:
> On Fri, Dec 27, 2013 at 06:52:48PM +0200, Gleb Natapov wrote:
> > On Fri, Dec 27, 2013 at 03:15:39PM +0100, Kashyap Chamarthy wrote:
> > > [. . .]
> > >
> > > >> KVM does not emulate P-states at all. intel_pstate_init() calls
> > > >> intel_pstate_msrs_not_valid() before printing "Intel P-state driver
> > > >> initializing." which suppose to fail since it checks that two reads of
> > > >> MSR_IA32_APERF return different values, but KVM does not emulate this msr
> > > >> at all, so both calls should return zero (KVM suppose to inject #GP, all rdmsrl
> > > >> are patched to be rdmsrl_safe in a guest).
> > > >>
> > > >> Anything interesting in host dmesg?
> > >
> > > Heya Gleb,
> > >
> > > Here's the relevant dmesg snippet (full dmesg, refer the attachment below):
> > That's guest dmesg. What about host one? Can you ftrace the failure?
> >
> Ugh, it looks like guest dmesg but there are KVM messages there too ("[
> 281.443662] kvm [2452]: vcpu0 unhandled rdmsr: 0xe8" is unhandled access
> to MSR_IA32_APERF I was talking about above), so I guess this is nested
> guest invocation? Does it happen in non nested guest?

FWIW I could not reproduce the reported bug. I am also using a guest
for testing, but *not* using nested KVM.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones

Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)

Rafael J. Wysocki

unread,

Dec 27, 2013, 4:40:02 PM12/27/13

to

Please try the patch I posted earlier today when you're at it:

https://patchwork.kernel.org/patch/3411991/

Rafael

Kashyap Chamarthy

unread,

Dec 29, 2013, 7:20:01 AM12/29/13

to

[. . .]

>> Here's host dmesg: https://bugzilla.kernel.org/attachment.cgi?id=119751
>>
>>>> Can you ftrace the failure?
>>
>> Can try, need some time (rest of the day I'll be away travelling,
>> will try to do it over the weekend, and update the Kernel
>> bugzilla with observations).
>>
>>>>
>>> Ugh, it looks like guest dmesg but there are KVM messages there too ("[
>>> 281.443662] kvm [2452]: vcpu0 unhandled rdmsr: 0xe8" is unhandled access
>>> to MSR_IA32_APERF I was talking about above), so I guess this is nested
>>> guest invocation?
>>
>> Yeah -- sorry, I forgot to note it's in a nested environment :(
>>
>>> Does it happen in non nested guest?
>>
>> I need to that.
>>
>> Note to self: Also try with a newer Kernel on the host.
>
> Please try the patch I posted earlier today when you're at it:
>
> https://patchwork.kernel.org/patch/3411991/

I applied the patch & tried to build the Kernel, it failed with:

=======
.
.
.
Generating a 4096 bit RSA private key
............................................................................................................drivers/cpufreq/intel_pstate.c:
In function 'intel_pstate_init_cpu':
drivers/cpufreq/intel_pstate.c:617:18: error: 'struct pstate_data' has no member named
'current_state'
if (!cpu->pstate.current_state) {
^
...make[2]: *** [drivers/cpufreq/intel_pstate.o] Error 1
make[1]: *** [drivers/cpufreq] Error 2
make[1]: *** Waiting for unfinished jobs....
.....................................make: *** [drivers] Error 2
make: *** Waiting for unfinished jobs....
...................................++
=======

I'll try it on Fedora Kernel git, do a test build and re-try again.

--
/kashyap

Rafael J. Wysocki

unread,

Dec 29, 2013, 10:00:01 AM12/29/13

to

My bad, that should have been current_pstate. Updated patch is appended.

Thanks,

Rafael

---
drivers/cpufreq/intel_pstate.c | 5 +++++
1 file changed, 5 insertions(+)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c

@@ -614,6 +614,11 @@ static int intel_pstate_init_cpu(unsigne

cpu = all_cpu_data[cpunum];

intel_pstate_get_cpu_pstates(cpu);

+ if (!cpu->pstate.current_pstate) {

+ all_cpu_data[cpunum] = NULL;
+ kfree(cpu);
+ return -ENODATA;
+ }

cpu->cpu = cpunum;

Kashyap Chamarthy

unread,

Dec 30, 2013, 10:00:02 AM12/30/13

to

Thanks Rafel, I can confirm this patch helps.

I re-built the Kernel on L0 (physical host) and L1 (guest hypervisor) with
the above patch, and re-ran the libguestfs test (which invokes an L2 appliance),
it now successfully completes:

http://kashyapc.fedorapeople.org/temp/libguestfs-test-tool-stdout.txt

Here's a Fedora Kernel scratch build (not retained for more than 2 weeks)
with the above patch I used -- http://koji.fedoraproject.org/koji/taskinfo?taskID=6342414

--
/kashyap

Rafael J. Wysocki

unread,

Dec 30, 2013, 9:00:01 PM12/30/13

to

Awesome, thanks!

Below is an official version with a changelog. I'll queue it up as a fix
for 3.13.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rafael.j...@intel.com>
Subject: intel_pstate: Fail initialization if P-state information is missing

If pstate.current_pstate is 0 after the initial
intel_pstate_get_cpu_pstates(), this means that we were unable to
obtain any useful P-state information and there is no reason to
continue, so free memory and return an error in that case.

This fixes the following divide error occuring in a nested KVM
guest:

Intel P-state driver initializing.

Intel pstate controlling: cpu 0

cpufreq: __cpufreq_add_dev: ->get() failed

divide error: 0000 [#1] SMP

Modules linked in:

CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-0.rc4.git5.1.fc21.x86_64 #1

Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011

task: ffff88001ea20000 ti: ffff88001e9bc000 task.ti: ffff88001e9bc000

RIP: 0010:[<ffffffff815c551d>] [<ffffffff815c551d>] intel_pstate_timer_func+0x11d/0x2b0

RSP: 0000:ffff88001ee03e18 EFLAGS: 00010246

RAX: 0000000000000000 RBX: ffff88001a454348 RCX: 0000000000006100

RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000

RBP: ffff88001ee03e38 R08: 0000000000000000 R09: 0000000000000000

R10: ffff88001ea20000 R11: 0000000000000000 R12: 00000c0a1ea20000

R13: 1ea200001ea20000 R14: ffffffff815c5400 R15: ffff88001a454348

FS: 0000000000000000(0000) GS:ffff88001ee00000(0000) knlGS:0000000000000000

CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b

CR2: 0000000000000000 CR3: 0000000001c0c000 CR4: 00000000000006f0

Stack:
fffffffb1a454390 ffffffff821a4500 ffff88001a454390 0000000000000100
ffff88001ee03ea8 ffffffff81083e9a ffffffff81083e15 ffffffff82d5ed40
ffffffff8258cc60 0000000000000000 ffffffff81ac39de 0000000000000000
Call Trace:
<IRQ>
[<ffffffff81083e9a>] call_timer_fn+0x8a/0x310
[<ffffffff81083e15>] ? call_timer_fn+0x5/0x310
[<ffffffff815c5400>] ? pid_param_set+0x130/0x130
[<ffffffff81084354>] run_timer_softirq+0x234/0x380
[<ffffffff8107aee4>] __do_softirq+0x104/0x430
[<ffffffff8107b5fd>] irq_exit+0xcd/0xe0
[<ffffffff81770645>] smp_apic_timer_interrupt+0x45/0x60
[<ffffffff8176efb2>] apic_timer_interrupt+0x72/0x80
<EOI>
[<ffffffff810e15cd>] ? vprintk_emit+0x1dd/0x5e0
[<ffffffff81757719>] printk+0x67/0x69
[<ffffffff815c1493>] __cpufreq_add_dev.isra.13+0x883/0x8d0
[<ffffffff815c14f0>] cpufreq_add_dev+0x10/0x20
[<ffffffff814a14d1>] subsys_interface_register+0xb1/0xf0
[<ffffffff815bf5cf>] cpufreq_register_driver+0x9f/0x210
[<ffffffff81fb19af>] intel_pstate_init+0x27d/0x3be
[<ffffffff81761e3e>] ? mutex_unlock+0xe/0x10
[<ffffffff81fb1732>] ? cpufreq_gov_dbs_init+0x12/0x12
[<ffffffff8100214a>] do_one_initcall+0xfa/0x1b0
[<ffffffff8109dbf5>] ? parse_args+0x225/0x3f0
[<ffffffff81f64193>] kernel_init_freeable+0x1fc/0x287
[<ffffffff81f638d0>] ? do_early_param+0x88/0x88
[<ffffffff8174b530>] ? rest_init+0x150/0x150
[<ffffffff8174b53e>] kernel_init+0xe/0x130
[<ffffffff8176e27c>] ret_from_fork+0x7c/0xb0
[<ffffffff8174b530>] ? rest_init+0x150/0x150

Code: c1 e0 05 48 63 bc 03 10 01 00 00 48 63 83 d0 00 00 00 48 63 d6 48 c1 e2 08 c1 e1 08 4c 63 c2 48 c1 e0 08 48 98 48 c1 e0 08 48 99 <49> f7 f8 48 98 48 0f af f8 48 c1 ff 08 29 f9 89 ca c1 fa 1f 89

RIP [<ffffffff815c551d>] intel_pstate_timer_func+0x11d/0x2b0
RSP <ffff88001ee03e18>

---[ end trace f166110ed22cc37a ]---

Kernel panic - not syncing: Fatal exception in interrupt

Reported-and-tested-by: Kashyap Chamarthy <kcha...@redhat.com>
Cc: Josh Boyer <jwb...@fedoraproject.org>
Signed-off-by: Rafael J. Wysocki <rafael.j...@intel.com>

---
drivers/cpufreq/intel_pstate.c | 5 +++++
1 file changed, 5 insertions(+)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -614,6 +614,11 @@ static int intel_pstate_init_cpu(unsigne
cpu = all_cpu_data[cpunum];

intel_pstate_get_cpu_pstates(cpu);
+ if (!cpu->pstate.current_pstate) {
+ all_cpu_data[cpunum] = NULL;
+ kfree(cpu);
+ return -ENODATA;
+ }

cpu->cpu = cpunum;

Dirk Brandewie

unread,

Jan 3, 2014, 12:40:03 PM1/3/14

to

Hi All,

Sorry for being late to the party but I just got back from vacation.

There is something deeply wrong here. We should have never gotten to
intel_pstate_init_cpu(). The VM had to have returned value from the read
of the max pstate at driver init time and 0 when the CPU was being brought up.

intel_pstate_msrs_not_valid() was added to solve this issue early on
if I remember correctly it was Josh that reported it then. Is there
a definative way to detect whether we are running in a VM?

Can some one tell me how the nested environment differs in regards to
reading MSRs?

TIA
--Dirk

Gleb Natapov

unread,

Jan 3, 2014, 1:10:05 PM1/3/14

to

On Fri, Jan 03, 2014 at 09:30:28AM -0800, Dirk Brandewie wrote:
> Hi All,
>
> Sorry for being late to the party but I just got back from vacation.
>
> There is something deeply wrong here. We should have never gotten to
> intel_pstate_init_cpu(). The VM had to have returned value from the read
> of the max pstate at driver init time and 0 when the CPU was being brought up.
>
> intel_pstate_msrs_not_valid() was added to solve this issue early on
> if I remember correctly it was Josh that reported it then. Is there
> a definative way to detect whether we are running in a VM?
>

Checking for VM is a wrong thing to do here. KVM should behave like it
does not support p-state.

> Can some one tell me how the nested environment differs in regards to
> reading MSRs?
>

It shouldn't differ, but there may be bug somewhere in nested emulation.
We shouldn't try and hind the bug by doing more checks in Linux but
rather fixing KVM bug that causes Linux to behave incorrectly.

Gleb.

Dirk Brandewie

unread,

Jan 3, 2014, 3:10:02 PM1/3/14

to

On 01/03/2014 10:04 AM, Gleb Natapov wrote:
> On Fri, Jan 03, 2014 at 09:30:28AM -0800, Dirk Brandewie wrote:
>> Hi All,
>>
>> Sorry for being late to the party but I just got back from vacation.
>>
>> There is something deeply wrong here. We should have never gotten to
>> intel_pstate_init_cpu(). The VM had to have returned value from the read
>> of the max pstate at driver init time and 0 when the CPU was being brought up.
>>
>> intel_pstate_msrs_not_valid() was added to solve this issue early on
>> if I remember correctly it was Josh that reported it then. Is there
>> a definative way to detect whether we are running in a VM?
>>
> Checking for VM is a wrong thing to do here. KVM should behave like it
> does not support p-state.
>
>> Can some one tell me how the nested environment differs in regards to
>> reading MSRs?
>>
> It shouldn't differ, but there may be bug somewhere in nested emulation.
> We shouldn't try and hind the bug by doing more checks in Linux but
> rather fixing KVM bug that causes Linux to behave incorrectly.

Based on the unhandled MSR messages in the host dmesg the following patch
should make sure the correct values are returned for PLAT_FORM_INFO, APERF
and MPERF. intel_pstate and turbostat are the only users of these registers.

Could you try the folloinw patch minus Rafael's patch please.
Compile tested only.

TIA
--Dirk

commit 5594b89bee7f83200c1a70bf95d50ac35e4fe3f8
Author: Dirk Brandewie <dirk.j.b...@intel.com>
Date: Fri Jan 3 11:44:15 2014 -0800

x86/kvm: Handle MSR_PLATFORM_INFO, MSR_IA32_MPERF and MSR_IA32_APERF MSRs

Handle MSRs correctly when read from a nested KVM

Signed-off-by: Dirk Brandewie <dirk.j.b...@intel.com>
---
arch/x86/kvm/x86.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5d004da..390ef27 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2336,6 +2336,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64
*pdata)

switch (msr) {
case MSR_IA32_PLATFORM_ID:
+ case MSR_IA32_MPERF:
+ case MSR_IA32_APERF:
+ case MSR_PLATFORM_INFO:
case MSR_IA32_EBL_CR_POWERON:
case MSR_IA32_DEBUGCTLMSR:
case MSR_IA32_LASTBRANCHFROMIP:

Paolo Bonzini

unread,

Jan 3, 2014, 5:10:02 PM1/3/14

to

Il 03/01/2014 21:00, Dirk Brandewie ha scritto:
> + case MSR_IA32_MPERF:
> + case MSR_IA32_APERF:

These should never be accessed. A KVM VM will always have
CPUID[06H].ECX = 0, and the Intel manual says that the MSRs are only
present if CPUID returns that value with bit 0 set.

I think the actual bug is that intel_pstate_init does not check the
feature bits in CPUID (either manually or via x86_match_cpu).

Paolo

Rafael J. Wysocki

unread,

Jan 3, 2014, 5:40:02 PM1/3/14

to

On Friday, January 03, 2014 08:04:36 PM Gleb Natapov wrote:
> On Fri, Jan 03, 2014 at 09:30:28AM -0800, Dirk Brandewie wrote:
> > Hi All,
> >
> > Sorry for being late to the party but I just got back from vacation.
> >
> > There is something deeply wrong here. We should have never gotten to
> > intel_pstate_init_cpu(). The VM had to have returned value from the read
> > of the max pstate at driver init time and 0 when the CPU was being brought up.
> >
> > intel_pstate_msrs_not_valid() was added to solve this issue early on
> > if I remember correctly it was Josh that reported it then. Is there
> > a definative way to detect whether we are running in a VM?
> >
> Checking for VM is a wrong thing to do here. KVM should behave like it
> does not support p-state.
>
> > Can some one tell me how the nested environment differs in regards to
> > reading MSRs?
> >
> It shouldn't differ, but there may be bug somewhere in nested emulation.
> We shouldn't try and hind the bug by doing more checks in Linux but
> rather fixing KVM bug that causes Linux to behave incorrectly.

Well, fixing the KVM bug is surely welcome.

That said, adding checks to ensure that your assumptions are valid is rarely
wrong, especially if they are done once per kernel boot. And the kernel only
should panic if it cannot continue to run, which isn't the case here.

Thanks,
Rafael

Paolo Bonzini

unread,

Jan 4, 2014, 3:40:01 AM1/4/14

to

Il 03/01/2014 23:46, Rafael J. Wysocki ha scritto:
> Well, fixing the KVM bug is surely welcome.
>
> That said, adding checks to ensure that your assumptions are valid is rarely
> wrong, especially if they are done once per kernel boot. And the kernel only
> should panic if it cannot continue to run, which isn't the case here.

I agree, but I suspect even your check is already late. Your patch is
welcome but perhaps it should have a WARN_ON too.

Paolo

Rafael J. Wysocki

unread,

Jan 4, 2014, 9:30:01 AM1/4/14

to

On Saturday, January 04, 2014 09:35:58 AM Paolo Bonzini wrote:
> Il 03/01/2014 23:46, Rafael J. Wysocki ha scritto:
> > Well, fixing the KVM bug is surely welcome.
> >
> > That said, adding checks to ensure that your assumptions are valid is rarely
> > wrong, especially if they are done once per kernel boot. And the kernel only
> > should panic if it cannot continue to run, which isn't the case here.
>
> I agree, but I suspect even your check is already late.

Well, it's just a sanity check and it makes the problem go away for the reporter.

> Your patch is welcome but perhaps it should have a WARN_ON too.

It has been pulled in already, so the WARN_ON() can only be added via a separate
patch now. Would you like to prepare that patch?

Rafael

Paolo Bonzini

unread,

Jan 4, 2014, 12:40:02 PM1/4/14

to

Il 04/01/2014 15:38, Rafael J. Wysocki ha scritto:
> Well, it's just a sanity check and it makes the problem go away for the reporter.
>
>> > Your patch is welcome but perhaps it should have a WARN_ON too.
> It has been pulled in already, so the WARN_ON() can only be added via a separate
> patch now. Would you like to prepare that patch?

Yes, I'll add it together with the CPUID check. I'll send the patch so
that it can get into 3.14.

Paolo

Gleb Natapov

unread,

Jan 4, 2014, 12:50:02 PM1/4/14

to

On Sat, Jan 04, 2014 at 06:38:59PM +0100, Paolo Bonzini wrote:
> Il 04/01/2014 15:38, Rafael J. Wysocki ha scritto:
> > Well, it's just a sanity check and it makes the problem go away for the reporter.
> >
> >> > Your patch is welcome but perhaps it should have a WARN_ON too.
> > It has been pulled in already, so the WARN_ON() can only be added via a separate
> > patch now. Would you like to prepare that patch?
>
> Yes, I'll add it together with the CPUID check. I'll send the patch so
> that it can get into 3.14.
>

CPUID check, while correct, will sweep the problem under the rug. Current
Linux logic should detect non working pstate in KVM. We should look into
why this is not happening for nested.

--
Gleb.

Rafael J. Wysocki

unread,

Jan 4, 2014, 4:30:02 PM1/4/14

to

On Saturday, January 04, 2014 07:48:13 PM Gleb Natapov wrote:
> On Sat, Jan 04, 2014 at 06:38:59PM +0100, Paolo Bonzini wrote:
> > Il 04/01/2014 15:38, Rafael J. Wysocki ha scritto:
> > > Well, it's just a sanity check and it makes the problem go away for the reporter.
> > >
> > >> > Your patch is welcome but perhaps it should have a WARN_ON too.
> > > It has been pulled in already, so the WARN_ON() can only be added via a separate
> > > patch now. Would you like to prepare that patch?
> >
> > Yes, I'll add it together with the CPUID check. I'll send the patch so
> > that it can get into 3.14.
> >
> CPUID check, while correct, will sweep the problem under the rug. Current
> Linux logic should detect non working pstate in KVM. We should look into
> why this is not happening for nested.

I agree. It's better not to use CPUID for that in my opinion.

Thanks!

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Rafael J. Wysocki

unread,

Jan 6, 2014, 6:30:01 AM1/6/14

to

On Monday, January 06, 2014 12:20:32 PM Paolo Bonzini wrote:

> Il 04/01/2014 22:38, Rafael J. Wysocki ha scritto:
> > On Saturday, January 04, 2014 07:48:13 PM Gleb Natapov wrote:
> >> On Sat, Jan 04, 2014 at 06:38:59PM +0100, Paolo Bonzini wrote:
> >>> Il 04/01/2014 15:38, Rafael J. Wysocki ha scritto:
> >>>> Well, it's just a sanity check and it makes the problem go away for the reporter.
> >>>>
> >>>>>> Your patch is welcome but perhaps it should have a WARN_ON too.
> >>>> It has been pulled in already, so the WARN_ON() can only be added via a separate
> >>>> patch now. Would you like to prepare that patch?
> >>>
> >>> Yes, I'll add it together with the CPUID check. I'll send the patch so
> >>> that it can get into 3.14.
> >>>
> >> CPUID check, while correct, will sweep the problem under the rug. Current
> >> Linux logic should detect non working pstate in KVM. We should look into
> >> why this is not happening for nested.
> >
> > I agree. It's better not to use CPUID for that in my opinion.
>

> Among hypervisors, RHEL5's Xen is probably one of the oldest in actual
> use with new hardware and new kernels, and the CPUID bit has been fixed
> in 2011. Older versions wouldn't run new kernels due to other CPUID
> bits not being cleared properly in VMs.
>
> Is there real hardware that has the CPUID bit set and non-working
> pstate? If there's no such real hardware, CPUID is what the SDM says
> you should use to detect presence of the APERF/MPERF msrs.

OK

> Having extra safety checks is fine on top of what the SDM says, but IMO
> they should be WARN_ONs. Otherwise you are sweeping bugs under the rug
> just as much.

As I said I'm not against adding WARN_ON()s there. :-)

Paolo Bonzini

unread,

Jan 6, 2014, 6:30:02 AM1/6/14

to

Il 04/01/2014 22:38, Rafael J. Wysocki ha scritto:
> On Saturday, January 04, 2014 07:48:13 PM Gleb Natapov wrote:
>> On Sat, Jan 04, 2014 at 06:38:59PM +0100, Paolo Bonzini wrote:
>>> Il 04/01/2014 15:38, Rafael J. Wysocki ha scritto:
>>>> Well, it's just a sanity check and it makes the problem go away for the reporter.
>>>>
>>>>>> Your patch is welcome but perhaps it should have a WARN_ON too.
>>>> It has been pulled in already, so the WARN_ON() can only be added via a separate
>>>> patch now. Would you like to prepare that patch?
>>>
>>> Yes, I'll add it together with the CPUID check. I'll send the patch so
>>> that it can get into 3.14.
>>>
>> CPUID check, while correct, will sweep the problem under the rug. Current
>> Linux logic should detect non working pstate in KVM. We should look into
>> why this is not happening for nested.
>
> I agree. It's better not to use CPUID for that in my opinion.

Among hypervisors, RHEL5's Xen is probably one of the oldest in actual
use with new hardware and new kernels, and the CPUID bit has been fixed
in 2011. Older versions wouldn't run new kernels due to other CPUID
bits not being cleared properly in VMs.

Is there real hardware that has the CPUID bit set and non-working
pstate? If there's no such real hardware, CPUID is what the SDM says
you should use to detect presence of the APERF/MPERF msrs.

Having extra safety checks is fine on top of what the SDM says, but IMO
they should be WARN_ONs. Otherwise you are sweeping bugs under the rug
just as much.

Paolo

Dirk Brandewie

unread,

Jan 6, 2014, 12:20:02 PM1/6/14

to

On 01/03/2014 02:06 PM, Paolo Bonzini wrote:
> Il 03/01/2014 21:00, Dirk Brandewie ha scritto:
>> + case MSR_IA32_MPERF:
>> + case MSR_IA32_APERF:
>

OK I will spin the patch to only add MSR_PLATFORM_INFO.

> These should never be accessed. A KVM VM will always have
> CPUID[06H].ECX = 0, and the Intel manual says that the MSRs are only
> present if CPUID returns that value with bit 0 set.
>
> I think the actual bug is that intel_pstate_init does not check the
> feature bits in CPUID (either manually or via x86_match_cpu).

I will add the feature check.

What are the differences between the first and the nested KVM's?
At load time intel_pstate checks that APERF and MPERF are incrementing
and that PLATFORM_INFO has some value. Somehow these checks pass
in the nested environment and we fall over when the CPU is being added
by cpufreq.

--Dirk

Dirk Brandewie

unread,

Jan 6, 2014, 1:50:02 PM1/6/14

to

The patch below adds a feature check for APERF/MPERF. With this patch
you should NOT see "Intel P-state driver initializing." in dmesg for KVM.

commit 4279f36818bd3ac42f077de114b17eb27d81d482
Author: Dirk Brandewie <dirk.j.b...@intel.com>
Date: Mon Jan 6 10:19:38 2014 -0800

intel_pstate: Add X86_FEATURE_APERFMPERF to cpu match parameters.

KVM environments do not support APERF/MPERF MSRs. intel_pstate cannot
operate without these registers.

The previous validity checks in intel_pstate_msrs_not_valid() are
insufficent in nested KVMs.

Fixes:
https://bugzilla.redhat.com/show_bug.cgi?id=1046317

Signed-off-by: Dirk Brandewie <dirk.j.b...@intel.com>
---

drivers/cpufreq/intel_pstate.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 0f63f5d..fe91dad 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -619,7 +619,8 @@ static void intel_pstate_timer_func(unsigned long __data)
}

#define ICPU(model, policy) \
- { X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY, (unsigned long)&policy }
+ { X86_VENDOR_INTEL, 6, model, X86_FEATURE_APERFMPERF,\
+ (unsigned long)&policy }

static const struct x86_cpu_id intel_pstate_cpu_ids[] = {
ICPU(0x2a, core_params),

Gleb Natapov

unread,

Jan 7, 2014, 11:20:01 AM1/7/14

to

On Mon, Jan 06, 2014 at 09:18:44AM -0800, Dirk Brandewie wrote:
> On 01/03/2014 02:06 PM, Paolo Bonzini wrote:
> >Il 03/01/2014 21:00, Dirk Brandewie ha scritto:
> >>+ case MSR_IA32_MPERF:
> >>+ case MSR_IA32_APERF:
> >
> OK I will spin the patch to only add MSR_PLATFORM_INFO.
>
> >These should never be accessed. A KVM VM will always have
> >CPUID[06H].ECX = 0, and the Intel manual says that the MSRs are only
> >present if CPUID returns that value with bit 0 set.
> >
> >I think the actual bug is that intel_pstate_init does not check the
> >feature bits in CPUID (either manually or via x86_match_cpu).
>
> I will add the feature check.
>
> What are the differences between the first and the nested KVM's?

There shouldn't be any. There is a bug in nested emulation probably.

> At load time intel_pstate checks that APERF and MPERF are incrementing
> and that PLATFORM_INFO has some value. Somehow these checks pass
> in the nested environment and we fall over when the CPU is being added
> by cpufreq.
>

KVM does not emulate either of those and inject #GP if one is accessed. Linux
catches those #GPs and fixs up rdmsr to return zero. It would be interesting to
see ftrace for nested kvm run.

--
Gleb.