Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[PATCH] Option to disable AMD C1E (allows dynticks to work)

267 views
Skip to first unread message

Eduard-Gabriel Munteanu

unread,
Dec 13, 2007, 3:00:31 PM12/13/07
to
Some multiprocessor 64-bit AMD systems don't allow the user to disable
the C1E C-state. The kernel detects C1E and marks the LAPIC as
broken, thereby disabling dynticks. This patch adds an option to
disable C1E when detected. It also allows the user to enable this
processor feature even if that means disabling dynticks, which is
useful in case C1E might provide better power savings (e.g.: C-states
beyond C1 don't work). Tested on a Turion X2 TL-56 laptop. Thanks to
Mikhail Kshevetskiy and FreeBSD for pointing out the relevant AMD docs.

Signed-off-by: Eduard-Gabriel Munteanu <eduard....@linux360.ro>

---
Documentation/kernel-parameters.txt | 6 ++++++
arch/x86/Kconfig | 16 ++++++++++++++++
arch/x86/kernel/setup_64.c | 27 +++++++++++++++++++++++++--
3 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 33121d6..0deee7a 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -643,6 +643,12 @@ and is between 256 and 4096 characters. It is defined in the file
floppy= [HW]
See Documentation/floppy.txt.

+ force_amd_c1e [KNL,SMP,HW,BUGS=X86-64]
+ Don't disable C1E on AMD systems even if this means
+ disabling nohz. This is _not_ automatically implied by
+ any other parameters, such as "nohz=off".
+ Depends on CONFIG_X86_AMD_C1E_WORKAROUND.
+
gamecon.map[2|3]=
[HW,JOY] Multisystem joystick and NES/SNES/PSX pad
support via parallel port (up to 5 devices per port)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 368864d..8b9bb49 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -198,6 +198,22 @@ config SMP

If you don't know what to do here, say N.

+config X86_AMD_C1E_WORKAROUND
+ bool "Disable C1E on AMD systems to make dynticks work"
+ default y
+ depends on X86_64 && SMP && NO_HZ
+ ---help---
+ On some systems, the C1E C-state is enabled by default and cannot be
+ disabled from the CMOS setup. Local APICs don't behave as they should
+ in this case. If you say Y here, C1E will be disabled to allow
+ dynamic ticks to work. It's safe to enable this option even if
+ your system doesn't have an AMD CPU (there are no side-effects if
+ such a CPU isn't detected).
+
+ You can pass the "force_amd_c1e" boot parameter to the kernel to
+ disable this workaround without recompiling.
+ See Documentation/kernel-parameters.txt for more details.
+
choice
prompt "Subarchitecture Type"
default X86_PC
diff --git a/arch/x86/kernel/setup_64.c b/arch/x86/kernel/setup_64.c
index 30d94d1..15556a0 100644
--- a/arch/x86/kernel/setup_64.c
+++ b/arch/x86/kernel/setup_64.c
@@ -583,6 +583,17 @@ static void __init amd_detect_cmp(struct cpuinfo_x86 *c)
#endif
}

+#ifdef CONFIG_X86_AMD_C1E_WORKAROUND
+static int __cpuinit disable_amd_c1e = 1;
+
+static int __cpuinit force_amd_c1e(char *str) {
+ disable_amd_c1e = 0;
+ return 1;
+}
+
+__setup("force_amd_c1e", force_amd_c1e);
+#endif /* CONFIG_X86_AMD_C1E_WORKAROUND */
+
#define ENABLE_C1E_MASK 0x18000000
#define CPUID_PROCESSOR_SIGNATURE 1
#define CPUID_XFAM 0x0ff00000
@@ -597,6 +608,7 @@ static __cpuinit int amd_apic_timer_broken(void)
{
u32 lo, hi;
u32 eax = cpuid_eax(CPUID_PROCESSOR_SIGNATURE);
+
switch (eax & CPUID_XFAM) {
case CPUID_XFAM_K8:
if ((eax & CPUID_XMOD) < CPUID_XMOD_REV_F)
@@ -604,8 +616,19 @@ static __cpuinit int amd_apic_timer_broken(void)
case CPUID_XFAM_10H:
case CPUID_XFAM_11H:
rdmsr(MSR_K8_ENABLE_C1E, lo, hi);
- if (lo & ENABLE_C1E_MASK)
- return 1;
+#ifdef CONFIG_X86_AMD_C1E_WORKAROUND
+ if ((lo & ENABLE_C1E_MASK) && disable_amd_c1e) {
+ printk(KERN_INFO "Disabling AMD C1E on CPU %d\n",
+ smp_processor_id());
+ /*
+ * See AMD's "BIOS and Kernel Developer's Guide for AMD
+ * NPT Family 0Fh Processors", publication #32559,
+ * for details.
+ */
+ wrmsr(MSR_K8_ENABLE_C1E, lo & ~ENABLE_C1E_MASK, hi);
+ } else
+#endif /* CONFIG_X86_AMD_C1E_WORKAROUND */
+ if (lo & ENABLE_C1E_MASK) return 1;
break;
default:
/* err on the side of caution */
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Eduard-Gabriel Munteanu

unread,
Dec 13, 2007, 3:10:22 PM12/13/07
to

Andi Kleen

unread,
Dec 13, 2007, 5:40:11 PM12/13/07
to
Eduard-Gabriel Munteanu <eduard....@linux360.ro> writes:
>
> + force_amd_c1e [KNL,SMP,HW,BUGS=X86-64]
> + Don't disable C1E on AMD systems even if this means

The description/option is not correct. The mainline kernel never disables C1e.
Some distribution kernels and Xen do, perhaps you're confusing this
with them.

You would rather need a "force_disable_c1e" option if anything.

Anyways this should be near all obsolete with forced HPET. With HPET
dynticks can be used even with C1e. So in most cases you can just
use hpet=force instead and get dynticks and C1e together.

-Andi

Johannes Weiner

unread,
Dec 13, 2007, 6:10:16 PM12/13/07
to
Hi,

Eduard-Gabriel Munteanu <eduard....@linux360.ro> writes:

I would find this way more readable:

if (lo & ENABLE_C1E_MASK) {
#ifdef CONFIG_X86_AMD_C1E_WORKAROUND
if (disable_amd_c1e) {
...
#else
return 1;
#endif
}

Why does it require to be enabled by compile-time AND run-time? Is that
something you might want to decide on every boot? Could we make it
settable at boot-time xor at compile-time?

Hannes

Mikhail Kshevetskiy

unread,
Dec 13, 2007, 7:20:07 PM12/13/07
to
Hello Eduard-Gabriel,

You write:

> + dynamic ticks to work. It's safe to enable this option even if
> + your system doesn't have an AMD CPU (there are no side-effects if
> + such a CPU isn't detected).

It's definitely not safe. There are the computers with totally broken
BIOSes, they setup C1e flags for one core only. If you try to disable
C1e on such machine, you most likely kill the kernel.

From my point of view, you patch should work in both directions, i.e.
1) user should be allowed to force disable C1e for all cores.
2) user should be allowed to force enable C1e for both cores.

and if the user don't use this flag, the C1e state should be untouched.

Mikhail Kshevetskiy

Eduard-Gabriel Munteanu

unread,
Dec 14, 2007, 4:50:09 AM12/14/07
to
On Thu, 13 Dec 2007 23:33:07 +0100
Andi Kleen <an...@firstfloor.org> wrote:

> The description/option is not correct. The mainline kernel never
> disables C1e. Some distribution kernels and Xen do, perhaps you're
> confusing this with them.
>
> You would rather need a "force_disable_c1e" option if anything.

The option I added (which is set to Y by default, but that's another
matter) disables C1E without any other kernel parameter. In my opinion,
this should be the normal behavior: the kernel has both SMP and NO_HZ
enabled, so do whatever is necessary to enable dynticks. But it would
also be useful, for benchmarking purposes, to prevent the kernel from
disabling C1E using a kernel parameter; that's what force_amd_c1e does.

> Anyways this should be near all obsolete with forced HPET. With HPET
> dynticks can be used even with C1e. So in most cases you can just
> use hpet=force instead and get dynticks and C1e together.

On my system, hpet=force does not enable dynticks:
$ dmesg | grep -E "(not functional|hpet)"
Command line: ro hpet=force
Kernel command line: ro hpet=force
hpet clockevent registered
hpet0: at MMIO 0xfed00000, IRQs 2, 8, 31
hpet0: 3 32-bit timers, 25000000 Hz
Time: hpet clocksource has been installed.
Clockevents: could not switch to one-shot mode: lapic is not functional.
Clockevents: could not switch to one-shot mode: lapic is not functional.
hpet_resources: 0xfed00000 is busy

But, using this patch, the kernel enables dynticks.

Eduard-Gabriel Munteanu

unread,
Dec 14, 2007, 5:00:11 AM12/14/07
to
On Thu, 13 Dec 2007 23:58:50 +0100
Johannes Weiner <hannes...@saeurebad.de> wrote:
> I would find this way more readable:
>
> if (lo & ENABLE_C1E_MASK) {
> #ifdef CONFIG_X86_AMD_C1E_WORKAROUND
> if (disable_amd_c1e) {
> ...
> #else
> return 1;
> #endif
> }

I wanted to avoid using too many tabs, as that would require the lines
inside the if block to be split too many times. I'll resubmit the patch
if you think it's necessary.

> Why does it require to be enabled by compile-time AND run-time? Is
> that something you might want to decide on every boot? Could we make
> it settable at boot-time xor at compile-time?

Making it settable only at boot-time means compiling in unnecessary
code. Also, making it settable only at compile-time would be bad for
distros that distribute binary kernels, as some users may experience
worse power savings with dynticks than with C1E (especially when C2, C3
and higher C-states are not available, due to brain-dead firmware).
This allows them to opt for leaving C1E enabled.

It is not required to enable it at both compile-time and boot-time. You
enable it at compile-time and then you _can disable_ it at boot-time.

Andi Kleen

unread,
Dec 14, 2007, 5:20:06 AM12/14/07
to
>so do whatever is necessary to enable dynticks.

dynticks' main purpose is to save power, but C1e saves more power.
Disabling C1e for dynticks would be a fairly useless default
trade off.

-Andi

Eduard-Gabriel Munteanu

unread,
Dec 14, 2007, 5:50:10 AM12/14/07
to
On Fri, 14 Dec 2007 11:17:21 +0100
Andi Kleen <an...@firstfloor.org> wrote:

> >so do whatever is necessary to enable dynticks.
>
> dynticks' main purpose is to save power, but C1e saves more power.
> Disabling C1e for dynticks would be a fairly useless default
> trade off.

I see. Also, AMD specs say that either higher C-states are enabled, or
C1E, but not both at the same time. So if the BIOS doesn't offer an
option to disable C1E, it can't provide _CST or P_LVL* ACPI
objects for higher C-states.

Dynticks also provides lower latencies as far as I know. I think this
workaround should be merged, even if as a non-default config option.

I'll resubmit a patch if you agree.

(By the way, sorry for messing up the "Cc:" fields in the previous
message.)

Andi Kleen

unread,
Dec 14, 2007, 7:30:17 AM12/14/07
to
On Fri, Dec 14, 2007 at 03:41:06PM +0200, Eduard-Gabriel Munteanu wrote:
> On Fri, 14 Dec 2007 11:17:21 +0100
> Andi Kleen <an...@firstfloor.org> wrote:
>
> > >so do whatever is necessary to enable dynticks.
> >
> > dynticks' main purpose is to save power, but C1e saves more power.
> > Disabling C1e for dynticks would be a fairly useless default
> > trade off.
>
> I see. Also, AMD specs say that either higher C-states are enabled, or
> C1E, but not both at the same time. So if the BIOS doesn't offer an

AMD doesn't support states deeper than C1 on multi core currently, so
in general they don't matter much right now.

> Dynticks also provides lower latencies as far as I know.

The better solution there is to use HPET instead. Newer systems
generally have HPET already enabled in the BIOS and for older systems
hpet=force gains more and more support. So try that.

-Andi

Eduard-Gabriel Munteanu

unread,
Dec 14, 2007, 8:10:07 AM12/14/07
to
On Fri, 14 Dec 2007 13:20:48 +0100
Andi Kleen <an...@firstfloor.org> wrote:

> AMD doesn't support states deeper than C1 on multi core currently, so
> in general they don't matter much right now.

Thanks for the info, I wasn't aware of this.

> The better solution there is to use HPET instead. Newer systems
> generally have HPET already enabled in the BIOS and for older systems
> hpet=force gains more and more support. So try that.

Dynticks won't use the HPET, even if enabled. IIRC, HPET is enabled on
my system (NVIDIA MCP51) even without "hpet=force". Here's dmesg's
output on Linux 2.6.24-rc5:

$ dmesg | grep -Ei "(lapic|hpet|disabling)"Command line: ro hpet=force
ACPI: HPET 3FFA9730, 0038 (r1 A M I OEMHPET0 2000727 MSFT 97)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
ACPI: HPET id: 0x10de8201 base: 0xfed00000


Kernel command line: ro hpet=force
hpet clockevent registered

TSC calibrated against HPET
Disabling APIC timer


hpet0: at MMIO 0xfed00000, IRQs 2, 8, 31
hpet0: 3 32-bit timers, 25000000 Hz
Time: hpet clocksource has been installed.
Clockevents: could not switch to one-shot mode: lapic is not functional.
Clockevents: could not switch to one-shot mode: lapic is not functional.

Unpacking initramfs...<6>Clockevents: could not switch to one-shot mode:<6>Clockevents: could not switch to one-shot mode: lapic is not functional.


lapic is not functional.
hpet_resources: 0xfed00000 is busy

LAPIC is seemingly disabled (C1E detection code does this), but
clockevents still tries to use it, instead of relying on HPET. I'll
look into this, but please give me a heads up if you know more about
what's happening. Looks like fixing this is better than using LAPIC for
dynticks (and disabling C1E) on such systems.

Thomas Gleixner

unread,
Dec 14, 2007, 1:20:09 PM12/14/07
to
On Fri, 14 Dec 2007, Eduard-Gabriel Munteanu wrote:

> LAPIC is seemingly disabled (C1E detection code does this), but
> clockevents still tries to use it, instead of relying on HPET.

It relies on HPET. The LAPIC is just used as a mechanism which allows
us to broadcast the tick to both cores.

> I'll look into this, but please give me a heads up if you know more
> about what's happening. Looks like fixing this is better than using
> LAPIC for dynticks (and disabling C1E) on such systems.

Yes, it's definitely worth fixing, but it's not trivial:

For Highres/dyntick we need per CPU clock event devices to avoid
serialization and broadcasting overhead in the normal operation mode.

The LAPIC timer is per CPU, fast and the best thing we have, except
for C1E enabled AMD systems.

The perfect solution for those systems would be to use the HPET
channels seperately as per CPU clock event devices. Venki tried this
some time ago, but it's hard to resolve simply because none of the
BIOSes gives us an idea to which interrupts we can route the HPET
channel interrupts. The only choice we have is to use the legacy
interrupts, which gives us the headache of emulating the RTC via the
HPET and some other stupid legacy issues.

We can not utilize the broadcast mechanism of the cpuidle code because
we do not have an idea that we are going into C1E as it is done
magically in the SMM code. To work around this is we would need to add
the broadcast notification to the halt(), safe_halt(), pm_idle_halt()
variants which float around in the kernel and make this conditional on
the C1E detection. That's nasty, but it seems the only solution for
now.

Thanks,

tglx

Andi Kleen

unread,
Dec 14, 2007, 1:20:10 PM12/14/07
to
> LAPIC is seemingly disabled (C1E detection code does this), but

It should only disable the LAPIC timer, but not the full use of the
LAPIC.

-Andi

Andi Kleen

unread,
Dec 14, 2007, 1:20:11 PM12/14/07
to
> magically in the SMM code. To work around this is we would need to add
> the broadcast notification to the halt(), safe_halt(), pm_idle_halt()
> variants which float around in the kernel and make this conditional on
> the C1E detection. That's nasty, but it seems the only solution for
> now.

On 64bit it would be easy using the idle notifiers. Perhaps they need
to be extended to pass in the sleep state though.

-Andi

Thomas Gleixner

unread,
Dec 14, 2007, 3:00:20 PM12/14/07
to
On Fri, 14 Dec 2007, Andi Kleen wrote:

> > LAPIC is seemingly disabled (C1E detection code does this), but
>
> It should only disable the LAPIC timer, but not the full use of the
> LAPIC.

That's what it does. The LAPIC timer is invalidated and registered as
a per CPU broadcast dummy source (CLOCK_EVT_FEAT_DUMMY).

Thanks,

tglx

Eduard-Gabriel Munteanu

unread,
Dec 14, 2007, 3:00:23 PM12/14/07
to

Thanks to both of you for shedding some light on this matter. I'll look
into HPET-related efforts; it looks like a better solution than my
patch.

Thomas Gleixner

unread,
Dec 14, 2007, 3:00:24 PM12/14/07
to
On Fri, 14 Dec 2007, Andi Kleen wrote:

> > magically in the SMM code. To work around this is we would need to add
> > the broadcast notification to the halt(), safe_halt(), pm_idle_halt()
> > variants which float around in the kernel and make this conditional on
> > the C1E detection. That's nasty, but it seems the only solution for
> > now.
>
> On 64bit it would be easy using the idle notifiers. Perhaps they need
> to be extended to pass in the sleep state though.

Well, that would interfere with the acpi-idle code.

Anyway the idle notifiers is a pretty artificial interface which is on
my get rid of it list anyway.

Thanks,

tglx

Message has been deleted

Chuck Ebbert

unread,
Dec 14, 2007, 5:50:13 PM12/14/07
to
On 12/14/2007 05:17 AM, Andi Kleen wrote:
>> so do whatever is necessary to enable dynticks.
>
> dynticks' main purpose is to save power, but C1e saves more power.
> Disabling C1e for dynticks would be a fairly useless default
> trade off.
>

What about machines where the BIOS has disabled C1e on CPU 0 but
left it enabled on CPU 1 ??

Eduard-Gabriel Munteanu

unread,
Dec 14, 2007, 7:20:07 PM12/14/07
to
On Fri, 14 Dec 2007 17:35:13 -0500
Chuck Ebbert <ceb...@redhat.com> wrote:

> On 12/14/2007 05:17 AM, Andi Kleen wrote:
> >> so do whatever is necessary to enable dynticks.
> >
> > dynticks' main purpose is to save power, but C1e saves more power.
> > Disabling C1e for dynticks would be a fairly useless default
> > trade off.
> >
>
> What about machines where the BIOS has disabled C1e on CPU 0 but
> left it enabled on CPU 1 ??

Do you mean Linux should enable C1E on CPU 0 if it's detected on CPU 1?
C3 + dynticks make up a better power saver than simply C1E, as far as I
know. Higher C-states should be enabled on such CPUs, as AMD docs say
firmware should either enable C1E or C2 & C3 (it must provide one of
these mutually exclusive options). I take having C1E on the second CPU
but not the first as an attempt on BIOS's part to provide higher
C-states instead of the former. How broken is it, really?

But maybe someone with access to such hardware can tell us what
happens: does he get C2/C3 power states under such circumstances?

Andi Kleen

unread,
Dec 15, 2007, 7:50:11 AM12/15/07
to
On Fri, Dec 14, 2007 at 05:35:13PM -0500, Chuck Ebbert wrote:
> On 12/14/2007 05:17 AM, Andi Kleen wrote:
> >> so do whatever is necessary to enable dynticks.
> >
> > dynticks' main purpose is to save power, but C1e saves more power.
> > Disabling C1e for dynticks would be a fairly useless default
> > trade off.
> >
>
> What about machines where the BIOS has disabled C1e on CPU 0 but
> left it enabled on CPU 1 ??

It's a BIOS bug. We handle this and threat it like C1e always enabled.

The right fix would be to enable it on both CPUs, but
then that's not strictly needed for correct operation and I'm not
sure it would be worth special fixup code in Linux. Probably not.

-Andi

Eduard-Gabriel Munteanu

unread,
Dec 17, 2007, 7:50:11 AM12/17/07
to
On Sun, 16 Dec 2007 06:29:57 -0800 (PST)
nihi...@gmail.com wrote:

> On Dec 15, 2:20 am, Eduard-Gabriel Munteanu


> <eduard.munte...@linux360.ro> wrote:
> > On Fri, 14 Dec 2007 17:35:13 -0500
> >

> > But maybe someone with access to such hardware can tell us what
> > happens: does he get C2/C3 power states under such circumstances?
>

> Hi,
>
> I have a nx6325 laptop with Turion 64 X2 CPU. So far I have had no
> success
> in getting NO_HZ to work on my box.
>
> I am a C coder, but have no experience in kernel hacking. However, I
> _really_ would
> like to help resolve this issue.
>
> It would be nice if You could give me any information on what/where to
> look for. Also,
> I would be happy to test any patches for this specific problem.
>
> Thanks,
> Cubris Zukilla, nihi...@gmail.com

Hi,

The patch I submitted gets NOHZ working, but it doesn't improve power
savings. The power usage actually increases by 1 watt or so on my
laptop, because there are no other C-states higher than C1 available
and C1E needs to be disabled.

I wasn't asking for testers for this patch. Instead, my question was
related to those laptops that disable C1E on the first CPU, but not on
the second. Does the BIOS offer an option to disable C1E? Does the BIOS
fill higher C-states? Andi Kleen provided an answer to my question,
saying that AMD doesn't currently support higher C-states on multi-core
CPUs, but I was wondering whether this was an attempt on the laptop
vendor's part to provide this functionality or just an ugly bug.

There isn't much you could do. The problem is rooted quite deeply
inside Linux and I don't think this would be a beginner's task.
Moreover, there are lots of hardware quirks one must account for.

Pavel Machek

unread,
Dec 18, 2007, 10:50:09 AM12/18/07
to
On Fri 2007-12-14 00:47:10, Eduard-Gabriel Munteanu wrote:
> Some multiprocessor 64-bit AMD systems don't allow the user to disable
> the C1E C-state. The kernel detects C1E and marks the LAPIC as
> broken, thereby disabling dynticks. This patch adds an option to
> disable C1E when detected. It also allows the user to enable this
> processor feature even if that means disabling dynticks, which is
> useful in case C1E might provide better power savings (e.g.: C-states
> beyond C1 don't work). Tested on a Turion X2 TL-56 laptop. Thanks to
> Mikhail Kshevetskiy and FreeBSD for pointing out the relevant AMD docs.
...

> +static int __cpuinit force_amd_c1e(char *str) {

{ on new line, please.

> + disable_amd_c1e = 0;
> + return 1;
> +}

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Eduard-Gabriel Munteanu

unread,
Dec 18, 2007, 12:20:10 PM12/18/07
to
On Tue, 18 Dec 2007 15:43:52 +0000
Pavel Machek <pa...@ucw.cz> wrote:

> On Fri 2007-12-14 00:47:10, Eduard-Gabriel Munteanu wrote:
>
> > +static int __cpuinit force_amd_c1e(char *str) {
>
> { on new line, please.
>
> > + disable_amd_c1e = 0;
> > + return 1;
> > +}

Sorry, I know it's a bad habit. I'll be more careful next time. Thanks
for pointing it out, though.

Anyway, there's no point in resubmitting a corrected patch, as we
already discussed that this approach isn't helpful.

0 new messages