linux-next: build failure after merge of the tip tree

Stephen Rothwell

unread,

Jan 13, 2014, 10:30:02 PM1/13/14

to

Hi all,

After merging the tip tree, today's linux-next build (x86_64 allmodconfig)
failed like this:

drivers/cpufreq/speedstep-lib.c: In function 'speedstep_get_freqs':
drivers/cpufreq/speedstep-lib.c:467:2: error: implicit declaration of function 'preempt_check_resched' [-Werror=implicit-function-declaration]
preempt_check_resched();
^

Caused by commit 62b94a08da1b ("sched/preempt: Take away
preempt_enable_no_resched() from modules") interacting with commit
24e1937b2386 ("speedstep-smi: enable interrupts when waiting") from the
pm tree.

For now, I have reverted that pm tree commit. Please sort this out.
--
Cheers,
Stephen Rothwell s...@canb.auug.org.au

Peter Zijlstra

unread,

Jan 14, 2014, 9:20:01 AM1/14/14

to

On Tue, Jan 14, 2014 at 02:26:27PM +1100, Stephen Rothwell wrote:
> Hi all,
>
> After merging the tip tree, today's linux-next build (x86_64 allmodconfig)
> failed like this:
>
> drivers/cpufreq/speedstep-lib.c: In function 'speedstep_get_freqs':
> drivers/cpufreq/speedstep-lib.c:467:2: error: implicit declaration of function 'preempt_check_resched' [-Werror=implicit-function-declaration]
> preempt_check_resched();
> ^
>
> Caused by commit 62b94a08da1b ("sched/preempt: Take away
> preempt_enable_no_resched() from modules") interacting with commit
> 24e1937b2386 ("speedstep-smi: enable interrupts when waiting") from the
> pm tree.

I think that pm commit is a very good example of why the sched/preempt
patch is a very good idea.

Also that Changelog fails to explain why enabling interrupts helps. What
interrupt is required for progress, and how does it make the progress
happen.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Mikulas Patocka

unread,

Jan 14, 2014, 2:10:01 PM1/14/14

to

On Tue, 14 Jan 2014, Peter Zijlstra wrote:

> On Tue, Jan 14, 2014 at 02:26:27PM +1100, Stephen Rothwell wrote:
> > Hi all,
> >
> > After merging the tip tree, today's linux-next build (x86_64 allmodconfig)
> > failed like this:
> >
> > drivers/cpufreq/speedstep-lib.c: In function 'speedstep_get_freqs':
> > drivers/cpufreq/speedstep-lib.c:467:2: error: implicit declaration of function 'preempt_check_resched' [-Werror=implicit-function-declaration]
> > preempt_check_resched();
> > ^
> >
> > Caused by commit 62b94a08da1b ("sched/preempt: Take away
> > preempt_enable_no_resched() from modules") interacting with commit
> > 24e1937b2386 ("speedstep-smi: enable interrupts when waiting") from the
> > pm tree.

Try adding #include <linux/preempt.h> to speedstep-lib.c. Does it help?

> I think that pm commit is a very good example of why the sched/preempt
> patch is a very good idea.
>
> Also that Changelog fails to explain why enabling interrupts helps. What
> interrupt is required for progress, and how does it make the progress
> happen.

There is no explanation. It's hardware issue and I have no documentation
for the hardware.

The general problem is that if there are bus-master transfers running (or
possibly for other hardware reasons), the CPU refuses to change frequency.
You can wait a little bit and retry and maybe you succeed changing the
frequency next time.

If you enable interrupts, wait, disable interrupts and retry, you may
succeed. If you keep interrupts disabled and retry, you never succeed, no
matter how long do you wait. I found it experimentally, I don't know
reason for that.

Mikulas

Peter Zijlstra

unread,

Jan 14, 2014, 3:10:02 PM1/14/14

to

On Tue, Jan 14, 2014 at 02:06:57PM -0500, Mikulas Patocka wrote:
> On Tue, 14 Jan 2014, Peter Zijlstra wrote:
> > > Caused by commit 62b94a08da1b ("sched/preempt: Take away
> > > preempt_enable_no_resched() from modules")

Read these two lines, then note that:

> Try adding #include <linux/preempt.h> to speedstep-lib.c. Does it help?

this obviously will not work as preempt_check_resched() and
preempt_enable_no_resched() are no longer available to modules.

> > I think that pm commit is a very good example of why the sched/preempt
> > patch is a very good idea.
> >
> > Also that Changelog fails to explain why enabling interrupts helps. What
> > interrupt is required for progress, and how does it make the progress
> > happen.
>
> There is no explanation. It's hardware issue and I have no documentation
> for the hardware.

Rafael works for Intel, he ought to be able to figure out wtf the
hardware does/needs.

> The general problem is that if there are bus-master transfers running (or
> possibly for other hardware reasons), the CPU refuses to change frequency.
> You can wait a little bit and retry and maybe you succeed changing the
> frequency next time.
>
> If you enable interrupts, wait, disable interrupts and retry, you may
> succeed. If you keep interrupts disabled and retry, you never succeed, no
> matter how long do you wait. I found it experimentally, I don't know
> reason for that.

Sounds like magic goo..

In any case, try the below, it does the same but is far less horrid.

---
drivers/cpufreq/speedstep-smi.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/cpufreq/speedstep-smi.c b/drivers/cpufreq/speedstep-smi.c
index 0f5326d6f79f..57d31538c248 100644
--- a/drivers/cpufreq/speedstep-smi.c
+++ b/drivers/cpufreq/speedstep-smi.c
@@ -188,6 +188,7 @@ static void speedstep_set_state(unsigned int state)
return;

/* Disable IRQs */
+ preempt_disable();
local_irq_save(flags);

command = (smi_sig & 0xffffff00) | (smi_cmd & 0xff);
@@ -200,7 +201,9 @@ static void speedstep_set_state(unsigned int state)
if (retry) {
pr_debug("retry %u, previous result %u, waiting...\n",
retry, result);
+ local_irq_restore(flags);
mdelay(retry * 50);
+ local_irq_save(flags);
}
retry++;
__asm__ __volatile__(
@@ -217,6 +220,7 @@ static void speedstep_set_state(unsigned int state)

/* enable IRQs */
local_irq_restore(flags);
+ preempt_enable();

if (new_state == state)
pr_debug("change to %u MHz succeeded after %u tries "

Rafael J. Wysocki

unread,

Jan 14, 2014, 4:50:02 PM1/14/14

to

On Tuesday, January 14, 2014 04:43:43 PM Mikulas Patocka wrote:
>
> On Tue, 14 Jan 2014, Peter Zijlstra wrote:
>
> > On Tue, Jan 14, 2014 at 02:06:57PM -0500, Mikulas Patocka wrote:
> > > On Tue, 14 Jan 2014, Peter Zijlstra wrote:
> > > > > Caused by commit 62b94a08da1b ("sched/preempt: Take away
> > > > > preempt_enable_no_resched() from modules")
> >
> > Read these two lines, then note that:
> >
> > > Try adding #include <linux/preempt.h> to speedstep-lib.c. Does it help?
> >
> > this obviously will not work as preempt_check_resched() and
> > preempt_enable_no_resched() are no longer available to modules.
>

> I see, you added commit 62b94a08da1bae9d187d49dfcd6665af393750f8 to
> linux-next, that broke my patch.

> ^^^ this is wrong, because the function speedstep_set_state may already be
> called with interrupts disabled from speedstep_get_freqs. So, you need to
> enable interrupts unconditionally, even if they were disabled at the
> beginning of the function speedstep_set_state.
>
> I know it's dirty to enable interrupts in a function that was called with
> disabled interrupts, but here it must be so (you could rewrite
> speedstep_get_freqs to not disable interrupts if you want to avoid this
> dirtiness).

>
> > mdelay(retry * 50);
> > + local_irq_save(flags);
> > }
> > retry++;
> > __asm__ __volatile__(
> > @@ -217,6 +220,7 @@ static void speedstep_set_state(unsigned int state)
> >
> > /* enable IRQs */
> > local_irq_restore(flags);
> > + preempt_enable();
> >
> > if (new_state == state)
> > pr_debug("change to %u MHz succeeded after %u tries "
>

> You need also preempt_disable/enable in speedstep_get_freqs.
>
>
> Here I'm resending the patch, to account for
> 62b94a08da1bae9d187d49dfcd6665af393750f8.

Do I think correctly that this should work regardless of whether or not
62b94a08da1bae9d187d49dfcd6665af393750f8 is applied?

> From: Mikulas Patocka <mpat...@redhat.com>
>
> speedstep-smi: enable interrupts when waiting
>

> On Dell Latitude C600 laptop with Pentium 3 850MHz processor, the
> speedstep-smi driver sometimes loads and sometimes doesn't load with
> "change to state X failed" message.
>
> I found out that we need to enable interrupts while waiting. When we
> enable interrupts, the blockage that prevents frequency transition
> resolves and the transition is possible. With disabled interrupts, the
> blockage doesn't resolve (no matter how long do we wait).
>
> This patch enables interrupts in the function speedstep_set_state that can
> be called with disabled interrupts. However, this function is called with
> disabled interrupts only from speedstep_get_freqs, so it shouldn't cause
> any problem.
>
> Signed-off-by: Mikulas Patocka <mpat...@redhat.com>
>
> ---
> drivers/cpufreq/speedstep-lib.c | 3 +++
> drivers/cpufreq/speedstep-smi.c | 12 ++++++++++++
> 2 files changed, 15 insertions(+)
>
> Index: linux-next/drivers/cpufreq/speedstep-smi.c
> ===================================================================
> --- linux-next.orig/drivers/cpufreq/speedstep-smi.c 2014-01-14 22:26:59.000000000 +0100
> +++ linux-next/drivers/cpufreq/speedstep-smi.c 2014-01-14 22:35:11.000000000 +0100
> @@ -156,6 +156,7 @@ static void speedstep_set_state(unsigned

> return;
>
> /* Disable IRQs */
> + preempt_disable();
> local_irq_save(flags);
>
> command = (smi_sig & 0xffffff00) | (smi_cmd & 0xff);

> @@ -166,9 +167,19 @@ static void speedstep_set_state(unsigned
>
> do {
> if (retry) {
> + /*
> + * We need to enable interrupts, otherwise the blockage
> + * won't resolve.
> + *
> + * We disable preemption so that other processes don't
> + * run. If other processes were running, they could
> + * submit more DMA requests, making the blockage worse.
> + */

> pr_debug("retry %u, previous result %u, waiting...\n",
> retry, result);

> + local_irq_enable();
> mdelay(retry * 50);
> + local_irq_disable();
> }
> retry++;
> __asm__ __volatile__(
> @@ -185,6 +196,7 @@ static void speedstep_set_state(unsigned

>
> /* enable IRQs */
> local_irq_restore(flags);
> + preempt_enable();
>
> if (new_state == state)
> pr_debug("change to %u MHz succeeded after %u tries "

> Index: linux-next/drivers/cpufreq/speedstep-lib.c
> ===================================================================
> --- linux-next.orig/drivers/cpufreq/speedstep-lib.c 2014-01-14 22:29:07.000000000 +0100
> +++ linux-next/drivers/cpufreq/speedstep-lib.c 2014-01-14 22:31:04.000000000 +0100
> @@ -400,6 +400,7 @@ unsigned int speedstep_get_freqs(enum sp
>
> pr_debug("previous speed is %u\n", prev_speed);
>
> + preempt_disable();
> local_irq_save(flags);
>
> /* switch to low state */
> @@ -464,6 +465,8 @@ unsigned int speedstep_get_freqs(enum sp
>
> out:
> local_irq_restore(flags);
> + preempt_enable();
> +
> return ret;
> }
> EXPORT_SYMBOL_GPL(speedstep_get_freqs);
> --
> To unsubscribe from this list: send the line "unsubscribe cpufreq" in

> the body of a message to majo...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Mikulas Patocka

unread,

Jan 14, 2014, 4:50:02 PM1/14/14

to

On Tue, 14 Jan 2014, Peter Zijlstra wrote:

> On Tue, Jan 14, 2014 at 02:06:57PM -0500, Mikulas Patocka wrote:
> > On Tue, 14 Jan 2014, Peter Zijlstra wrote:
> > > > Caused by commit 62b94a08da1b ("sched/preempt: Take away
> > > > preempt_enable_no_resched() from modules")
>
> Read these two lines, then note that:
>
> > Try adding #include <linux/preempt.h> to speedstep-lib.c. Does it help?
>
> this obviously will not work as preempt_check_resched() and
> preempt_enable_no_resched() are no longer available to modules.

I see, you added commit 62b94a08da1bae9d187d49dfcd6665af393750f8 to
linux-next, that broke my patch.

^^^ this is wrong, because the function speedstep_set_state may already be
called with interrupts disabled from speedstep_get_freqs. So, you need to
enable interrupts unconditionally, even if they were disabled at the
beginning of the function speedstep_set_state.

I know it's dirty to enable interrupts in a function that was called with
disabled interrupts, but here it must be so (you could rewrite
speedstep_get_freqs to not disable interrupts if you want to avoid this
dirtiness).

> mdelay(retry * 50);
> + local_irq_save(flags);
> }
> retry++;
> __asm__ __volatile__(
> @@ -217,6 +220,7 @@ static void speedstep_set_state(unsigned int state)
>
> /* enable IRQs */
> local_irq_restore(flags);
> + preempt_enable();
>
> if (new_state == state)
> pr_debug("change to %u MHz succeeded after %u tries "

You need also preempt_disable/enable in speedstep_get_freqs.

Here I'm resending the patch, to account for
62b94a08da1bae9d187d49dfcd6665af393750f8.

From: Mikulas Patocka <mpat...@redhat.com>

speedstep-smi: enable interrupts when waiting

On Dell Latitude C600 laptop with Pentium 3 850MHz processor, the
speedstep-smi driver sometimes loads and sometimes doesn't load with
"change to state X failed" message.

I found out that we need to enable interrupts while waiting. When we
enable interrupts, the blockage that prevents frequency transition
resolves and the transition is possible. With disabled interrupts, the
blockage doesn't resolve (no matter how long do we wait).

This patch enables interrupts in the function speedstep_set_state that can
be called with disabled interrupts. However, this function is called with
disabled interrupts only from speedstep_get_freqs, so it shouldn't cause
any problem.

Signed-off-by: Mikulas Patocka <mpat...@redhat.com>

---
drivers/cpufreq/speedstep-lib.c | 3 +++
drivers/cpufreq/speedstep-smi.c | 12 ++++++++++++
2 files changed, 15 insertions(+)

Index: linux-next/drivers/cpufreq/speedstep-smi.c
===================================================================
--- linux-next.orig/drivers/cpufreq/speedstep-smi.c 2014-01-14 22:26:59.000000000 +0100
+++ linux-next/drivers/cpufreq/speedstep-smi.c 2014-01-14 22:35:11.000000000 +0100
@@ -156,6 +156,7 @@ static void speedstep_set_state(unsigned

return;

/* Disable IRQs */
+ preempt_disable();
local_irq_save(flags);

command = (smi_sig & 0xffffff00) | (smi_cmd & 0xff);

@@ -166,9 +167,19 @@ static void speedstep_set_state(unsigned

do {
if (retry) {
+ /*
+ * We need to enable interrupts, otherwise the blockage
+ * won't resolve.
+ *
+ * We disable preemption so that other processes don't
+ * run. If other processes were running, they could
+ * submit more DMA requests, making the blockage worse.
+ */

pr_debug("retry %u, previous result %u, waiting...\n",
retry, result);

+ local_irq_enable();
mdelay(retry * 50);
+ local_irq_disable();
}
retry++;
__asm__ __volatile__(
@@ -185,6 +196,7 @@ static void speedstep_set_state(unsigned

/* enable IRQs */
local_irq_restore(flags);
+ preempt_enable();

if (new_state == state)
pr_debug("change to %u MHz succeeded after %u tries "

Index: linux-next/drivers/cpufreq/speedstep-lib.c
===================================================================
--- linux-next.orig/drivers/cpufreq/speedstep-lib.c 2014-01-14 22:29:07.000000000 +0100
+++ linux-next/drivers/cpufreq/speedstep-lib.c 2014-01-14 22:31:04.000000000 +0100
@@ -400,6 +400,7 @@ unsigned int speedstep_get_freqs(enum sp

pr_debug("previous speed is %u\n", prev_speed);

+ preempt_disable();
local_irq_save(flags);

/* switch to low state */
@@ -464,6 +465,8 @@ unsigned int speedstep_get_freqs(enum sp

out:
local_irq_restore(flags);
+ preempt_enable();
+
return ret;
}
EXPORT_SYMBOL_GPL(speedstep_get_freqs);

Mikulas Patocka

unread,

Jan 14, 2014, 5:00:03 PM1/14/14

to

Yes.

Mikulas

Rafael J. Wysocki

unread,

Jan 14, 2014, 5:10:02 PM1/14/14

to

OK

I'll replace your original patch with this version, then.

Thanks!

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Stephen Rothwell

unread,

Jan 15, 2014, 11:00:02 PM1/15/14

to

Hi all,

After merging the tip tree, today's linux-next build (x86_64 allmodconfig)
failed like this:

arch/x86/kernel/process.c: In function 'mwait_idle':
/scratch/sfr/next/arch/x86/kernel/process.c:434:3: error: implicit declaration of function '__monitor' [-Werror=implicit-function-declaration]
__monitor((void *)&current_thread_info()->flags, 0, 0);
^
arch/x86/kernel/process.c:437:4: error: implicit declaration of function '__sti_mwait' [-Werror=implicit-function-declaration]
__sti_mwait(0, 0);
^

Caused by commit 16824255394f ("x86, acpi, idle: Restructure the mwait
idle routines") interacting with commit 7760518cce95 ("x86 idle: restore
mwait_idle()") from the idle tree.

I am not sure how to fix this so I just reverted the idle tree commit for
now (since it reverted cleanly). Please let me know if there is a better
solution.

Peter Zijlstra

unread,

Jan 16, 2014, 7:20:02 AM1/16/14

to

On Tue, Jan 14, 2014 at 04:43:43PM -0500, Mikulas Patocka wrote:
> > @@ -200,7 +201,9 @@ static void speedstep_set_state(unsigned int state)
> > if (retry) {
> > pr_debug("retry %u, previous result %u, waiting...\n",
> > retry, result);
> > + local_irq_restore(flags);
>
> ^^^ this is wrong, because the function speedstep_set_state may already be
> called with interrupts disabled from speedstep_get_freqs. So, you need to
> enable interrupts unconditionally, even if they were disabled at the
> beginning of the function speedstep_set_state.
>
> I know it's dirty to enable interrupts in a function that was called with
> disabled interrupts, but here it must be so (you could rewrite
> speedstep_get_freqs to not disable interrupts if you want to avoid this
> dirtiness).

Egads; I think you had better, this is vile beyond reason.

> > mdelay(retry * 50);
> > + local_irq_save(flags);
> > }
> > retry++;
> > __asm__ __volatile__(
> > @@ -217,6 +220,7 @@ static void speedstep_set_state(unsigned int state)
> >
> > /* enable IRQs */
> > local_irq_restore(flags);
> > + preempt_enable();
> >
> > if (new_state == state)
> > pr_debug("change to %u MHz succeeded after %u tries "
>
> You need also preempt_disable/enable in speedstep_get_freqs.

Argh I see, this is really horrid.

Anyway, its Rafael's call, its his subsystem he gets to fix it when it
explodes.

/me shudders

Peter Zijlstra

unread,

Jan 16, 2014, 7:30:02 AM1/16/14

to

I think the below ought to work

---
arch/x86/kernel/process.c | 35 +++++++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 3fb8d95ab8b5..220df9b2f22a 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -28,6 +28,7 @@
#include <asm/fpu-internal.h>
#include <asm/debugreg.h>
#include <asm/nmi.h>
+#include <asm/mwait.h>

/*
* per-CPU TSS segments. Threads are completely 'soft' on Linux,
@@ -398,6 +399,37 @@ static void amd_e400_idle(void)
default_idle();
}

+/*
+ * Intel Core2 and older machines prefer MWAIT over HALT for C1.
+ * We can't rely on cpuidle installing MWAIT, because it will not load
+ * on systems that support only C1 -- so the boot default must be MWAIT.
+ *
+ * Some AMD machines are the opposite, they depend on using HALT.
+ *
+ * So for default C1, which is used during boot until cpuidle loads,
+ * use MWAIT-C1 on Intel HW that has it, else use HALT.
+ */
+static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
+{
+ if (c->x86_vendor != X86_VENDOR_INTEL)
+ return 0;
+
+ if (!cpu_has(c, X86_FEATURE_MWAIT))
+ return 0;
+
+ return 1;
+}
+
+/*
+ * MONITOR/MWAIT with no hints, used for default default C1 state.
+ * This invokes MWAIT with interrutps enabled and no flags,
+ * which is backwards compatible with the original MWAIT implementation.
+ */
+static void mwait_idle(void)
+{
+ mwait_idle_with_hints(0, 0);
+}
+
void select_idle_routine(const struct cpuinfo_x86 *c)
{
#ifdef CONFIG_SMP
@@ -411,6 +443,9 @@ void select_idle_routine(const struct cpuinfo_x86 *c)
/* E400: APIC timer interrupt does not wake up CPU from C1e */
pr_info("using AMD E400 aware idle routine\n");
x86_idle = amd_e400_idle;
+ } else if (prefer_mwait_c1_over_halt(c)) {
+ pr_info("using mwait in idle threads\n");
+ x86_idle = mwait_idle;
} else
x86_idle = default_idle;

Mikulas Patocka

unread,

Jan 16, 2014, 2:40:02 PM1/16/14

to

On Thu, 16 Jan 2014, Peter Zijlstra wrote:

> > > retry++;
> > > __asm__ __volatile__(
> > > @@ -217,6 +220,7 @@ static void speedstep_set_state(unsigned int state)
> > >
> > > /* enable IRQs */
> > > local_irq_restore(flags);
> > > + preempt_enable();
> > >
> > > if (new_state == state)
> > > pr_debug("change to %u MHz succeeded after %u tries "
> >
> > You need also preempt_disable/enable in speedstep_get_freqs.
>
> Argh I see, this is really horrid.
>
>
> Anyway, its Rafael's call, its his subsystem he gets to fix it when it
> explodes.
>
> /me shudders

speedstep_get_freqs disables the interrupts to measure the transition
latency. It is called from speedstep-ich.c (that requires the latency) and
from speedstep-smi.c (that passes NULL as a pointer to latency, because it
doesn't need it).

So, you could disable interrupts in speedstep_get_freqs only when the
transition_latency pointer is non-NULL.

I think speedstep_set_state doesn't need to disable interrupts at all -
all that it does is one "out" instruction, you don't need to synchronize
that "out" instruction with anything, so there is no need to disable
interrupts. Or - does some specification say that interrupts must be
disabled there?

I am sending this patch to clean up the mess to be applied on the top of
my previous patch.

Mikulas

From: Mikulas Patocka <mpat...@redhat.com>
Subject: speedstep: clean up interrupt disabling

This patch cleans up interrupt disabling in speedstep-smi and
speedstep-lib.

speedstep_get_freqs in speedstep-lib may be called from speedstep-smi or
speedstep-ich. When it is called from speedstep-ich, it needs to calculate
transition latency. When it is called from speedstep-smi, transition
latency doesn't have to be calculated.

The function speedstep_set_state in speedstep-smi needs to enable
interrupts. Previously it enabled interrupts even if it was called with
disabled interrupts, but it is dirty.

This patch changes speedstep_get_freqs so that it disables interrupts only
when it is called from speedstep-ich and when it is measuring the
transition latency. This avoids much of the code dirtiness.

Signed-off-by: Mikulas Patocka <mpat...@redhat.com>

---

drivers/cpufreq/speedstep-lib.c | 13 ++++++-------
drivers/cpufreq/speedstep-smi.c | 11 ++++-------
2 files changed, 10 insertions(+), 14 deletions(-)

Index: linux-3.4.75/drivers/cpufreq/speedstep-lib.c
===================================================================
--- linux-3.4.75.orig/drivers/cpufreq/speedstep-lib.c 2014-01-16 18:51:16.000000000 +0100
+++ linux-3.4.75/drivers/cpufreq/speedstep-lib.c 2014-01-16 18:51:22.000000000 +0100
@@ -400,9 +400,6 @@ unsigned int speedstep_get_freqs(enum sp

pr_debug("previous speed is %u\n", prev_speed);

- preempt_disable();
- local_irq_save(flags);
-

/* switch to low state */

set_state(SPEEDSTEP_LOW);
*low_speed = speedstep_get_frequency(processor);
@@ -414,15 +411,19 @@ unsigned int speedstep_get_freqs(enum sp
pr_debug("low speed is %u\n", *low_speed);

/* start latency measurement */
- if (transition_latency)
+ if (transition_latency) {
+ local_irq_save(flags);
do_gettimeofday(&tv1);
+ }

/* switch to high state */
set_state(SPEEDSTEP_HIGH);

/* end latency measurement */
- if (transition_latency)
+ if (transition_latency) {
do_gettimeofday(&tv2);
+ local_irq_restore(flags);
+ }

*high_speed = speedstep_get_frequency(processor);
if (!*high_speed) {
@@ -464,8 +465,6 @@ unsigned int speedstep_get_freqs(enum sp
}

out:
- local_irq_restore(flags);
- preempt_enable();

return ret;
}
Index: linux-3.4.75/drivers/cpufreq/speedstep-smi.c
===================================================================
--- linux-3.4.75.orig/drivers/cpufreq/speedstep-smi.c 2014-01-16 18:51:16.000000000 +0100
+++ linux-3.4.75/drivers/cpufreq/speedstep-smi.c 2014-01-16 18:51:22.000000000 +0100
@@ -180,16 +180,16 @@ static int speedstep_get_state(void)

static void speedstep_set_state(unsigned int state)

{
unsigned int result = 0, command, new_state, dummy;
- unsigned long flags;
unsigned int function = SET_SPEEDSTEP_STATE;
unsigned int retry = 0;

if (state > 0x1)
return;

- /* Disable IRQs */
+ WARN_ON_ONCE(irqs_disabled());
+
+ /* Disable preemption so that other processes don't run */
preempt_disable();
- local_irq_save(flags);

command = (smi_sig & 0xffffff00) | (smi_cmd & 0xff);

@@ -209,9 +209,7 @@ static void speedstep_set_state(unsigned
*/

pr_debug("retry %u, previous result %u, waiting...\n",
retry, result);

- local_irq_enable();
mdelay(retry * 50);
- local_irq_disable();
}
retry++;
__asm__ __volatile__(
@@ -226,8 +224,7 @@ static void speedstep_set_state(unsigned
);
} while ((new_state != state) && (retry <= SMI_TRIES));

- /* enable IRQs */
- local_irq_restore(flags);
+ /* enable preemption */

preempt_enable();

if (new_state == state)

Stephen Rothwell

unread,

Jan 16, 2014, 3:50:01 PM1/16/14

to

Hi Peter,

On Thu, 16 Jan 2014 13:19:55 +0100 Peter Zijlstra <pet...@infradead.org> wrote:
>
> I think the below ought to work

To be clear, all you did was replace the body of mwait_idle() with

mwait_idle_with_hints(0, 0);

(and the comment above it)? I need to apply in incremental patch in the
merge commit.

H. Peter Anvin

unread,

Jan 16, 2014, 4:00:01 PM1/16/14

to

On 01/16/2014 04:19 AM, Peter Zijlstra wrote:
> + * MONITOR/MWAIT with no hints, used for default default C1 state.

The default default?

-hpa

Peter Zijlstra

unread,

Jan 16, 2014, 5:30:02 PM1/16/14

to

On Fri, Jan 17, 2014 at 07:46:28AM +1100, Stephen Rothwell wrote:
> Hi Peter,
>
> On Thu, 16 Jan 2014 13:19:55 +0100 Peter Zijlstra <pet...@infradead.org> wrote:
> >
> > I think the below ought to work
>
> To be clear, all you did was replace the body of mwait_idle() with
>
> mwait_idle_with_hints(0, 0);

Pretty much, and add the asm/mwait.h include, otherwise you'll end up
with a compile fail.

> (and the comment above it)? I need to apply in incremental patch in the
> merge commit.

I don't think I touched the comment at all.

Stephen Rothwell

unread,

Jan 16, 2014, 5:40:02 PM1/16/14

to

Hi Peter,

On Thu, 16 Jan 2014 23:25:36 +0100 Peter Zijlstra <pet...@infradead.org> wrote:
>
> On Fri, Jan 17, 2014 at 07:46:28AM +1100, Stephen Rothwell wrote:
> >
> > On Thu, 16 Jan 2014 13:19:55 +0100 Peter Zijlstra <pet...@infradead.org> wrote:
> > >
> > > I think the below ought to work
> >
> > To be clear, all you did was replace the body of mwait_idle() with
> >
> > mwait_idle_with_hints(0, 0);
>
> Pretty much, and add the asm/mwait.h include, otherwise you'll end up
> with a compile fail.
>
> > (and the comment above it)? I need to apply in incremental patch in the
> > merge commit.
>
> I don't think I touched the comment at all.

Thanks.

H. Peter Anvin

unread,

Jan 16, 2014, 6:00:02 PM1/16/14

to

In retrospect this bit probably should have gone through the idle
tree. That was my bad, I need to coordinate with Len better.

-hpa

Rafael J. Wysocki

unread,

Jan 16, 2014, 8:10:02 PM1/16/14

to

Well, I really don't appreciate sending me stuff like this two days before a
merge window. So I've dropped the speedstep_smi patch and please send me
something I can queue up for 3.15 without making Peter shudder.

Thanks!

I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Stephen Rothwell

unread,

Jan 16, 2014, 10:50:02 PM1/16/14

to

Hi all,

On Thu, 16 Jan 2014 14:51:08 -0800 "H. Peter Anvin" <h...@zytor.com> wrote:
>
> On 01/16/2014 02:34 PM, Stephen Rothwell wrote:
> >
> > On Thu, 16 Jan 2014 23:25:36 +0100 Peter Zijlstra
> > <pet...@infradead.org> wrote:
> >>
> >> On Fri, Jan 17, 2014 at 07:46:28AM +1100, Stephen Rothwell
> >> wrote:
> >>>
> >>> On Thu, 16 Jan 2014 13:19:55 +0100 Peter Zijlstra
> >>> <pet...@infradead.org> wrote:
> >>>>
> >>>> I think the below ought to work
> >>>
> >>> To be clear, all you did was replace the body of mwait_idle()
> >>> with
> >>>
> >>> mwait_idle_with_hints(0, 0);
> >>
> >> Pretty much, and add the asm/mwait.h include, otherwise you'll
> >> end up with a compile fail.
> >>
> >>> (and the comment above it)? I need to apply in incremental
> >>> patch in the merge commit.
> >>
> >> I don't think I touched the comment at all.
> >
>
> In retrospect this bit probably should have gone through the idle
> tree. That was my bad, I need to coordinate with Len better.

So this is what I added as a merge fix patch. Someone just needs to make
sure Linus gets this when the latter of the tow trees gets merged.

From: Stephen Rothwell <s...@canb.auug.org.au>
Date: Fri, 17 Jan 2014 14:42:06 +1100
Subject: [PATCH] x86 idle: mwait_idle merge update

Signed-off-by: Stephen Rothwell <s...@canb.auug.org.au>
---
arch/x86/kernel/process.c | 14 ++------------
1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index db471a87fdd8..4da840f01561 100644

--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -28,6 +28,7 @@
#include <asm/fpu-internal.h>
#include <asm/debugreg.h>
#include <asm/nmi.h>
+#include <asm/mwait.h>

/*
* per-CPU TSS segments. Threads are completely 'soft' on Linux,

@@ -427,18 +428,7 @@ static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)

static void mwait_idle(void)
{
- if (!need_resched()) {
- if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
- clflush((void *)&current_thread_info()->flags);
-
- __monitor((void *)&current_thread_info()->flags, 0, 0);
- smp_mb();
- if (!need_resched())
- __sti_mwait(0, 0);
- else
- local_irq_enable();
- } else
- local_irq_enable();
+ mwait_idle_with_hints(0, 0);

}

void select_idle_routine(const struct cpuinfo_x86 *c)

--
1.8.5.2

Mike Galbraith

unread,

Jan 18, 2014, 4:50:01 AM1/18/14

to

On Fri, 2014-01-17 at 14:45 +1100, Stephen Rothwell wrote:
> Hi all,
>
> On Thu, 16 Jan 2014 14:51:08 -0800 "H. Peter Anvin" <h...@zytor.com> wrote:
> >
> > On 01/16/2014 02:34 PM, Stephen Rothwell wrote:
> > >
> > > On Thu, 16 Jan 2014 23:25:36 +0100 Peter Zijlstra
> > > <pet...@infradead.org> wrote:
> > >>
> > >> On Fri, Jan 17, 2014 at 07:46:28AM +1100, Stephen Rothwell
> > >> wrote:
> > >>>
> > >>> On Thu, 16 Jan 2014 13:19:55 +0100 Peter Zijlstra
> > >>> <pet...@infradead.org> wrote:
> > >>>>
> > >>>> I think the below ought to work
> > >>>
> > >>> To be clear, all you did was replace the body of mwait_idle()
> > >>> with
> > >>>
> > >>> mwait_idle_with_hints(0, 0);
> > >>
> > >> Pretty much, and add the asm/mwait.h include, otherwise you'll
> > >> end up with a compile fail.
> > >>
> > >>> (and the comment above it)? I need to apply in incremental
> > >>> patch in the merge commit.
> > >>
> > >> I don't think I touched the comment at all.
> > >
> >
> > In retrospect this bit probably should have gone through the idle
> > tree. That was my bad, I need to coordinate with Len better.
>
> So this is what I added as a merge fix patch. Someone just needs to make
> sure Linus gets this when the latter of the tow trees gets merged.

I hope it doesn't look quite like that, next-20140117 is -ENOBOOT on
Q6600 box. See below for an alternative.

idle: kill unnecessary mwait_idle() resched IPIs

Set/clear polling instead.

Q6600, pipe-test scheduling cross core:
3.8.13 487.2 KHz 1.000
3.13.0-master 415.5 KHz .852
3.13.0-master+ 415.2 KHz .852 + restore mwait_idle
3.13.0-master++ 488.5 KHz 1.002 + restore mwait_idle + IPI fix
3.13.0-next-20140117 -ENOBOOT
3.13.0-next-20140117+ 531.4 KHz 1.090 + IPI fix

Signed-off-by: Mike Galbraith <bitb...@online.de>
---
arch/x86/include/asm/processor.h | 8 ++++++++
arch/x86/kernel/process.c | 10 +++++++---
2 files changed, 15 insertions(+), 3 deletions(-)

--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -723,6 +723,14 @@ static inline void sync_core(void)
#endif
}

+static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
+{
+ trace_hardirqs_on();
+ /* "mwait %eax, %ecx;" */
+ asm volatile("sti; .byte 0x0f, 0x01, 0xc9;"
+ :: "a" (eax), "c" (ecx));
+}
+
extern void select_idle_routine(const struct cpuinfo_x86 *c);
extern void init_amd_e400_c1e_mask(void);

--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -28,6 +28,7 @@
#include <asm/fpu-internal.h>
#include <asm/debugreg.h>
#include <asm/nmi.h>
+#include <asm/mwait.h>

/*
* per-CPU TSS segments. Threads are completely 'soft' on Linux,

@@ -427,18 +428,21 @@ static int prefer_mwait_c1_over_halt(con

static void mwait_idle(void)
{
- if (!need_resched()) {
- if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))

+ if (!current_set_polling_and_test()) {
+ if (static_cpu_has(X86_FEATURE_CLFLUSH_MONITOR)) {
+ mb();
clflush((void *)&current_thread_info()->flags);
+ mb();
+ }

__monitor((void *)&current_thread_info()->flags, 0, 0);
- smp_mb();

if (!need_resched())
__sti_mwait(0, 0);
else
local_irq_enable();
} else
local_irq_enable();
+ current_clr_polling();

}

void select_idle_routine(const struct cpuinfo_x86 *c)

--

Peter Zijlstra

unread,

Jan 18, 2014, 7:50:01 AM1/18/14

to

On Sat, Jan 18, 2014 at 10:46:06AM +0100, Mike Galbraith wrote:
>
> I hope it doesn't look quite like that, next-20140117 is -ENOBOOT on
> Q6600 box. See below for an alternative.

Urgh, I see, we call the idle arch_cpu_idle() callback with irqs
disabled.

Could something like this work?

local_irq_enable();
mwait_idle_with_hints(0,0);

The interrupt enable window is slightly larger, but I'm not immediately
seeing a problem with that.

Sabrina Dubroca

unread,

Jan 18, 2014, 9:10:01 AM1/18/14

to

2014-01-18, 13:44:51 +0100, Peter Zijlstra wrote:
> On Sat, Jan 18, 2014 at 10:46:06AM +0100, Mike Galbraith wrote:
> >
> > I hope it doesn't look quite like that, next-20140117 is -ENOBOOT on
> > Q6600 box. See below for an alternative.
>
> Urgh, I see, we call the idle arch_cpu_idle() callback with irqs
> disabled.
>
> Could something like this work?
>
> local_irq_enable();
> mwait_idle_with_hints(0,0);
>
> The interrupt enable window is slightly larger, but I'm not immediately
> seeing a problem with that.

next-20140117 doesn't boot on my T7300 laptop either. I've tried the
two fixes, they both work for me.

Thanks,

--
Sabrina

Mike Galbraith

unread,

Jan 18, 2014, 10:30:02 AM1/18/14

to

On Sat, 2014-01-18 at 13:44 +0100, Peter Zijlstra wrote:
> On Sat, Jan 18, 2014 at 10:46:06AM +0100, Mike Galbraith wrote:
> >
> > I hope it doesn't look quite like that, next-20140117 is -ENOBOOT on
> > Q6600 box. See below for an alternative.
>
> Urgh, I see, we call the idle arch_cpu_idle() callback with irqs
> disabled.
>
> Could something like this work?
>
> local_irq_enable();
> mwait_idle_with_hints(0,0);
>
> The interrupt enable window is slightly larger, but I'm not immediately
> seeing a problem with that.

Yup, works just fine. Less is more.

Nice to see a _progression_ in the pipe too btw.

-Mike

H. Peter Anvin

unread,

Jan 18, 2014, 2:20:01 PM1/18/14

to

On 01/18/2014 07:21 AM, Mike Galbraith wrote:
> On Sat, 2014-01-18 at 13:44 +0100, Peter Zijlstra wrote:
>> On Sat, Jan 18, 2014 at 10:46:06AM +0100, Mike Galbraith wrote:
>>>
>>> I hope it doesn't look quite like that, next-20140117 is -ENOBOOT on
>>> Q6600 box. See below for an alternative.
>>
>> Urgh, I see, we call the idle arch_cpu_idle() callback with irqs
>> disabled.
>>
>> Could something like this work?
>>
>> local_irq_enable();
>> mwait_idle_with_hints(0,0);
>>
>> The interrupt enable window is slightly larger, but I'm not immediately
>> seeing a problem with that.
>
> Yup, works just fine. Less is more.
>
> Nice to see a _progression_ in the pipe too btw.
>

This means an interrupt window is open and we can take an interrupt
between checking need_resched and the MWAIT, which couldn't happen with
__sti_mwait().

Are we sure that is actually safe?

-hpa

Peter Zijlstra

unread,

Jan 18, 2014, 5:00:02 PM1/18/14

to

On Sat, Jan 18, 2014 at 11:14:57AM -0800, H. Peter Anvin wrote:
> >> Could something like this work?
> >>
> >> local_irq_enable();
> >> mwait_idle_with_hints(0,0);
> >>

> This means an interrupt window is open and we can take an interrupt
> between checking need_resched and the MWAIT, which couldn't happen with
> __sti_mwait().
>
> Are we sure that is actually safe?

current_set_polling_and_test() vs resched_task() should be good that
way, but I've got a terrible head-ache today so don't rely on anything
much I say.

Len Brown

unread,

Jan 19, 2014, 8:10:01 PM1/19/14

to

IMO, a regression fix (restore mwait_idle()) is more important than a clean up
(restructure mwait routines), and the clean-up should take a back seat;
in -tip, in -next, upstream, and in -stable.

Also, I'm wondering if that clean-up went too far -- as not all users of mwait
are necessarily under the same conditions...

thanks,
Len Brown, Intel Open Source Technology Center

H. Peter Anvin

unread,

Jan 19, 2014, 8:10:02 PM1/19/14

to

On 01/19/2014 05:00 PM, Len Brown wrote:
> On Wed, Jan 15, 2014 at 10:58 PM, Stephen Rothwell <s...@canb.auug.org.au> wrote:
>> Hi all,
>>
>> After merging the tip tree, today's linux-next build (x86_64 allmodconfig)
>> failed like this:
>>
>> arch/x86/kernel/process.c: In function 'mwait_idle':
>> /scratch/sfr/next/arch/x86/kernel/process.c:434:3: error: implicit declaration of function '__monitor' [-Werror=implicit-function-declaration]
>> __monitor((void *)&current_thread_info()->flags, 0, 0);
>> ^
>> arch/x86/kernel/process.c:437:4: error: implicit declaration of function '__sti_mwait' [-Werror=implicit-function-declaration]
>> __sti_mwait(0, 0);
>> ^
>>
>> Caused by commit 16824255394f ("x86, acpi, idle: Restructure the mwait
>> idle routines") interacting with commit 7760518cce95 ("x86 idle: restore
>> mwait_idle()") from the idle tree.
>>
>> I am not sure how to fix this so I just reverted the idle tree commit for
>> now (since it reverted cleanly). Please let me know if there is a better
>> solution.
>
> IMO, a regression fix (restore mwait_idle()) is more important than a clean up
> (restructure mwait routines), and the clean-up should take a back seat;
> in -tip, in -next, upstream, and in -stable.
>
> Also, I'm wondering if that clean-up went too far -- as not all users of mwait
> are necessarily under the same conditions...
>

Sounds like a NAK to me, in which case that bit should probably be
deferred and reintroduced after fixing via the idle tree?

-hpa

Stephen Rothwell

unread,

Jan 19, 2014, 11:00:03 PM1/19/14

to

On Sat, 18 Jan 2014 10:46:06 +0100 Mike Galbraith <bitb...@online.de> wrote:
>
> I hope it doesn't look quite like that, next-20140117 is -ENOBOOT on
> Q6600 box. See below for an alternative.
>

> idle: kill unnecessary mwait_idle() resched IPIs

OK, so despite even further discussion, I have applied this as a merge
fix patch for today. Let me know when it is all sorted out.

Len Brown

unread,

Jan 19, 2014, 11:50:01 PM1/19/14

to

+static void mwait_idle(void)
+{
+ mwait_idle_with_hints(0, 0);
+}
+

The reason the patch above will crash Core2 machines is because
core2 machines don't support mwait_idle_with_hints().

The calling sequence for old and new MWAIT instructions is different.
The former must be invoked with interrupts enabled,
and the later can be invoked with interrupts disabled,
which is a feature that Linux takes advantage of.

thanks,
-Len

Peter Zijlstra

unread,

Jan 20, 2014, 3:30:03 AM1/20/14

to

On Sun, Jan 19, 2014 at 11:45:43PM -0500, Len Brown wrote:
> +static void mwait_idle(void)
> +{
> + mwait_idle_with_hints(0, 0);
> +}
> +
>
> The reason the patch above will crash Core2 machines is because
> core2 machines don't support mwait_idle_with_hints().
>
> The calling sequence for old and new MWAIT instructions is different.
> The former must be invoked with interrupts enabled,
> and the later can be invoked with interrupts disabled,
> which is a feature that Linux takes advantage of.

What old and new? They're the same byte sequence: 0x0f 0x01 0xc9

And your 'old' __sti_mwait(0,0) has the exact same arguments as
mwait_idle_with_hints(0,0).

Peter Zijlstra

unread,

Jan 20, 2014, 3:40:02 AM1/20/14

to

On Sun, Jan 19, 2014 at 08:00:19PM -0500, Len Brown wrote:
> On Wed, Jan 15, 2014 at 10:58 PM, Stephen Rothwell <s...@canb.auug.org.au> wrote:
> > Hi all,
> >
> > After merging the tip tree, today's linux-next build (x86_64 allmodconfig)
> > failed like this:
> >
> > arch/x86/kernel/process.c: In function 'mwait_idle':
> > /scratch/sfr/next/arch/x86/kernel/process.c:434:3: error: implicit declaration of function '__monitor' [-Werror=implicit-function-declaration]
> > __monitor((void *)&current_thread_info()->flags, 0, 0);
> > ^
> > arch/x86/kernel/process.c:437:4: error: implicit declaration of function '__sti_mwait' [-Werror=implicit-function-declaration]
> > __sti_mwait(0, 0);
> > ^
> >
> > Caused by commit 16824255394f ("x86, acpi, idle: Restructure the mwait
> > idle routines") interacting with commit 7760518cce95 ("x86 idle: restore
> > mwait_idle()") from the idle tree.
> >
> > I am not sure how to fix this so I just reverted the idle tree commit for
> > now (since it reverted cleanly). Please let me know if there is a better
> > solution.
>
> IMO, a regression fix (restore mwait_idle()) is more important than a clean up
> (restructure mwait routines), and the clean-up should take a back seat;
> in -tip, in -next, upstream, and in -stable.

It was part of that other regression fix, the 50+ watt thingy for your
broken EX chips.

It was also written much earlier and widely mailed and published before
you did the core2 thing.

> Also, I'm wondering if that clean-up went too far -- as not all users of mwait
> are necessarily under the same conditions...

Then make them so. The fact was that most of the mwait idle sites
were bloody broken. And the single mwait_idle_with_hints() function
presents a single nice function that does all the required magics.

Sedat Dilek

unread,

Jan 20, 2014, 3:50:02 AM1/20/14

to

On Mon, Jan 20, 2014 at 4:51 AM, Stephen Rothwell <s...@canb.auug.org.au> wrote:
> On Sat, 18 Jan 2014 10:46:06 +0100 Mike Galbraith <bitb...@online.de> wrote:
>>
>> I hope it doesn't look quite like that, next-20140117 is -ENOBOOT on
>> Q6600 box. See below for an alternative.
>>
>> idle: kill unnecessary mwait_idle() resched IPIs
>
> OK, so despite even further discussion, I have applied this as a merge
> fix patch for today. Let me know when it is all sorted out.
>

Where is this fix?
( Browsing Linux-next remote GIT repository online. )
2x NOPE for me.

- Sedat -

[1] http://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/log/?id=next-20140120&qt=grep&q=mwait_idle
[2] http://git.kernel.org/cgit/linux/kernel/git/sfr/next-fixes.git

> --
> Cheers,
> Stephen Rothwell s...@canb.auug.org.au

Sedat Dilek

unread,

Jan 20, 2014, 3:50:03 AM1/20/14

to

On Mon, Jan 20, 2014 at 9:42 AM, Sedat Dilek <sedat...@gmail.com> wrote:
> On Mon, Jan 20, 2014 at 4:51 AM, Stephen Rothwell <s...@canb.auug.org.au> wrote:
>> On Sat, 18 Jan 2014 10:46:06 +0100 Mike Galbraith <bitb...@online.de> wrote:
>>>
>>> I hope it doesn't look quite like that, next-20140117 is -ENOBOOT on
>>> Q6600 box. See below for an alternative.
>>>
>>> idle: kill unnecessary mwait_idle() resched IPIs
>>
>> OK, so despite even further discussion, I have applied this as a merge
>> fix patch for today. Let me know when it is all sorted out.
>>
>
> Where is this fix?
> ( Browsing Linux-next remote GIT repository online. )
> 2x NOPE for me.
>
> - Sedat -
>
> [1] http://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/log/?id=next-20140120&qt=grep&q=mwait_idle
> [2] http://git.kernel.org/cgit/linux/kernel/git/sfr/next-fixes.git
>

Hmmm... Found this in Next/merge.log

+$ git am -3 ../patches/0001-x86-idle-mwait_idle-merge-update.patch
+Applying: idle: kill unnecessary mwait_idle() resched IPIs
+$ git reset HEAD^
+Unstaged changes after reset:
+M arch/x86/include/asm/processor.h
+M arch/x86/kernel/process.c

Is this a local patch not shipped in the Linux-next (remote) GIT repo?
Why is this not in your next-fixes GIT repo?

A bit confused about your -next policies,
- Sedat -

H. Peter Anvin

unread,

Jan 20, 2014, 4:00:03 AM1/20/14

to

On 01/20/2014 12:26 AM, Peter Zijlstra wrote:
> On Sun, Jan 19, 2014 at 11:45:43PM -0500, Len Brown wrote:
>> +static void mwait_idle(void)
>> +{
>> + mwait_idle_with_hints(0, 0);
>> +}
>> +
>>
>> The reason the patch above will crash Core2 machines is because
>> core2 machines don't support mwait_idle_with_hints().
>>
>> The calling sequence for old and new MWAIT instructions is different.
>> The former must be invoked with interrupts enabled,
>> and the later can be invoked with interrupts disabled,
>> which is a feature that Linux takes advantage of.
>
> What old and new? They're the same byte sequence: 0x0f 0x01 0xc9
>
> And your 'old' __sti_mwait(0,0) has the exact same arguments as
> mwait_idle_with_hints(0,0).
>

The difference is the STI!

-hpa

Mike Galbraith

unread,

Jan 20, 2014, 4:20:02 AM1/20/14

to

On Mon, 2014-01-20 at 09:42 +0100, Sedat Dilek wrote:
> On Mon, Jan 20, 2014 at 4:51 AM, Stephen Rothwell <s...@canb.auug.org.au> wrote:
> > On Sat, 18 Jan 2014 10:46:06 +0100 Mike Galbraith <bitb...@online.de> wrote:
> >>
> >> I hope it doesn't look quite like that, next-20140117 is -ENOBOOT on
> >> Q6600 box. See below for an alternative.
> >>
> >> idle: kill unnecessary mwait_idle() resched IPIs
> >
> > OK, so despite even further discussion, I have applied this as a merge
> > fix patch for today. Let me know when it is all sorted out.
> >
>
> Where is this fix?

If you pull next-20140120, the fix is in it.

> ( Browsing Linux-next remote GIT repository online. )
> 2x NOPE for me.

Probably because it's a temporary conflict fix.

-Mike

Peter Zijlstra

unread,

Jan 20, 2014, 4:20:02 AM1/20/14

to

On Mon, Jan 20, 2014 at 12:56:47AM -0800, H. Peter Anvin wrote:
> On 01/20/2014 12:26 AM, Peter Zijlstra wrote:
> > On Sun, Jan 19, 2014 at 11:45:43PM -0500, Len Brown wrote:
> >> +static void mwait_idle(void)
> >> +{
> >> + mwait_idle_with_hints(0, 0);
> >> +}
> >> +
> >>
> >> The reason the patch above will crash Core2 machines is because
> >> core2 machines don't support mwait_idle_with_hints().
> >>
> >> The calling sequence for old and new MWAIT instructions is different.
> >> The former must be invoked with interrupts enabled,
> >> and the later can be invoked with interrupts disabled,
> >> which is a feature that Linux takes advantage of.
> >
> > What old and new? They're the same byte sequence: 0x0f 0x01 0xc9
> >
> > And your 'old' __sti_mwait(0,0) has the exact same arguments as
> > mwait_idle_with_hints(0,0).
> >
>
> The difference is the STI!

So do the local_irq_enable(); mwait_idle_with_hints(0,0); thing.

But that's entirely different from saying that core2 doesn't support
mwait_idle_with_hints because its a different instruction.

Sedat Dilek

unread,

Jan 20, 2014, 4:30:02 AM1/20/14

to

On Mon, Jan 20, 2014 at 10:17 AM, Mike Galbraith <bitb...@online.de> wrote:
> On Mon, 2014-01-20 at 09:42 +0100, Sedat Dilek wrote:
>> On Mon, Jan 20, 2014 at 4:51 AM, Stephen Rothwell <s...@canb.auug.org.au> wrote:
>> > On Sat, 18 Jan 2014 10:46:06 +0100 Mike Galbraith <bitb...@online.de> wrote:
>> >>
>> >> I hope it doesn't look quite like that, next-20140117 is -ENOBOOT on
>> >> Q6600 box. See below for an alternative.
>> >>
>> >> idle: kill unnecessary mwait_idle() resched IPIs
>> >
>> > OK, so despite even further discussion, I have applied this as a merge
>> > fix patch for today. Let me know when it is all sorted out.
>> >
>>
>> Where is this fix?
>
> If you pull next-20140120, the fix is in it.
>
>> ( Browsing Linux-next remote GIT repository online. )
>> 2x NOPE for me.
>
> Probably because it's a temporary conflict fix.
>

It's about the handling of fixes for -next.
For such kind of "special" tweaks Stephen introduced *next-fixes* (see
his email on linux-next ML and [1]).
Such make-my-system-boot-again and other critical fixes belong there.

BTW, I found the merge hunk (see my other email).

- Sedat -

[1] http://git.kernel.org/cgit/linux/kernel/git/sfr/next-fixes.git

Ingo Molnar

unread,

Jan 20, 2014, 4:30:02 AM1/20/14

to

* H. Peter Anvin <h...@zytor.com> wrote:

> On 01/20/2014 01:16 AM, Peter Zijlstra wrote:
> >>
> >> The difference is the STI!
> >
> > So do the local_irq_enable(); mwait_idle_with_hints(0,0); thing.
> >
>

> No, that doesn't work. The point of __sti_mwait() is that the STI
> is the instruction immediately before the MWAIT, just like the
> combination STI;HLT. Since the execution of STI is always delayed
> by one instruction, these two instructions form an atomic unit,
> which means interrupts are enabled "after" we have entered MWAIT or
> HLT.

>
> > But that's entirely different from saying that core2 doesn't
> > support mwait_idle_with_hints because its a different instruction.
>

> If you think of STI;MWAIT as a "compound instruction" it kind of is.
> Newer CPUs don't have to play that trick anymore, because there is a
> flag to MWAIT which breaks us out of MWAIT on a pending interrupt
> without having to actually enable interrupts at the point of the
> MWAIT.

As a side note, at minimum the semantic and compatibility difference
needs to be _very_ clearly present in the naming. Something like
mwait_old_() or mwait_core2_(). That way such dependencies and
assumptions don't get lost in code restructuring, etc.

Thanks,

Ingo

H. Peter Anvin

unread,

Jan 20, 2014, 4:30:03 AM1/20/14

to

On 01/20/2014 01:16 AM, Peter Zijlstra wrote:
>>

>> The difference is the STI!
>
> So do the local_irq_enable(); mwait_idle_with_hints(0,0); thing.
>

No, that doesn't work. The point of __sti_mwait() is that the STI is
the instruction immediately before the MWAIT, just like the combination
STI;HLT. Since the execution of STI is always delayed by one
instruction, these two instructions form an atomic unit, which means
interrupts are enabled "after" we have entered MWAIT or HLT.

> But that's entirely different from saying that core2 doesn't support
> mwait_idle_with_hints because its a different instruction.

If you think of STI;MWAIT as a "compound instruction" it kind of is.
Newer CPUs don't have to play that trick anymore, because there is a
flag to MWAIT which breaks us out of MWAIT on a pending interrupt
without having to actually enable interrupts at the point of the MWAIT.

-hpa

Mike Galbraith

unread,

Jan 20, 2014, 5:00:02 AM1/20/14

to

On Mon, 2014-01-20 at 10:25 +0100, Sedat Dilek wrote:

> It's about the handling of fixes for -next.

Ah, it was a gripe in query form. My bad.

-Mike

Peter Zijlstra

unread,

Jan 20, 2014, 5:00:02 AM1/20/14

to

On Mon, Jan 20, 2014 at 01:23:00AM -0800, H. Peter Anvin wrote:
> On 01/20/2014 01:16 AM, Peter Zijlstra wrote:
> >>
> >> The difference is the STI!
> >
> > So do the local_irq_enable(); mwait_idle_with_hints(0,0); thing.
> >
>
> No, that doesn't work. The point of __sti_mwait() is that the STI is
> the instruction immediately before the MWAIT, just like the combination
> STI;HLT. Since the execution of STI is always delayed by one
> instruction, these two instructions form an atomic unit, which means
> interrupts are enabled "after" we have entered MWAIT or HLT.
>
> > But that's entirely different from saying that core2 doesn't support
> > mwait_idle_with_hints because its a different instruction.
>
> If you think of STI;MWAIT as a "compound instruction" it kind of is.
> Newer CPUs don't have to play that trick anymore, because there is a
> flag to MWAIT which breaks us out of MWAIT on a pending interrupt
> without having to actually enable interrupts at the point of the MWAIT.

Ok, so I still don't get the problem of enabling interrupts early.

If we enable them early we can get interrupts; which afaict fall into
two groups, those that do and do not set NEED_RESCHED.

For those that do not set NEED_RESCHED, we'd have woken from MWAIT/HLT
and looped right back into it, so receiving those early -- before
actually calling MWAIT/HLT seems like a NO-OP.

For those setting NEED_RESCHED, we test NEED_RESCHED in all the right
places.

- current_set_polling_and_test(), we test need_resched after telling
remote CPUs they don't need to send interrupts because we're polling
for it -- the remote cpus set NEED_RESCHED before testing if we're
polling for it.

- we test NEED_RESCHED after setting up the monitor and before calling
MWAIT. Therefore, if an interrupt would happen right before we call
MWAIT, the monitor is already set and the MWAIT does an immediate
exit.

AFAICT we simply cannot get stuck and miss a NEED_RESCHED this way.

Peter Zijlstra

unread,

Jan 20, 2014, 5:20:02 AM1/20/14

to

On Mon, Jan 20, 2014 at 09:30:21AM +0100, Peter Zijlstra wrote:
> Then make them so. The fact was that most of the mwait idle sites
> were bloody broken. And the single mwait_idle_with_hints() function
> presents a single nice function that does all the required magics.

To stress this a bit more; have a look see at mwwait_idle_with_hints();
it does a whole lot of subtle magic.

- current_{set,clr}_polling*(), these are crucial in not missing and
wrecking NEED_RESCHED state.

- X86_FEATURE_CLFLUSH_MONTIOR quirk

- Does the monitor(); if (!need_resched()) mwait() thing.

All of those are required for a correct and functional idle loop. And
I've seen sites where any or all of the above were missing/broken.

Not unifying the lot into a simple usable function is just stupid --
history has shown people simply cannot be trusted to get this right.

H. Peter Anvin

unread,

Jan 20, 2014, 5:30:01 AM1/20/14

to

On 01/20/2014 02:13 AM, Peter Zijlstra wrote:
> On Mon, Jan 20, 2014 at 09:30:21AM +0100, Peter Zijlstra wrote:
>> Then make them so. The fact was that most of the mwait idle sites
>> were bloody broken. And the single mwait_idle_with_hints() function
>> presents a single nice function that does all the required magics.
>
> To stress this a bit more; have a look see at mwwait_idle_with_hints();
> it does a whole lot of subtle magic.
>
> - current_{set,clr}_polling*(), these are crucial in not missing and
> wrecking NEED_RESCHED state.
>
> - X86_FEATURE_CLFLUSH_MONTIOR quirk
>
> - Does the monitor(); if (!need_resched()) mwait() thing.
>
> All of those are required for a correct and functional idle loop. And
> I've seen sites where any or all of the above were missing/broken.
>
> Not unifying the lot into a simple usable function is just stupid --
> history has shown people simply cannot be trusted to get this right.
>

I don't think anyone is arguing that. The question is rather if the
implementation is correct, and if it is ready for the merge window.

-hpa

H. Peter Anvin

unread,

Jan 20, 2014, 6:10:02 AM1/20/14

to

On 01/20/2014 01:55 AM, Peter Zijlstra wrote:
>
> Ok, so I still don't get the problem of enabling interrupts early.
>
> If we enable them early we can get interrupts; which afaict fall into
> two groups, those that do and do not set NEED_RESCHED.
>
> For those that do not set NEED_RESCHED, we'd have woken from MWAIT/HLT
> and looped right back into it, so receiving those early -- before
> actually calling MWAIT/HLT seems like a NO-OP.

The description for commit d331e739f5ad seems to indicate otherwise:

Idle callbacks has some races when enter_idle() sets isidle and
subsequent
interrupts that can happen on that CPU, before CPU goes to idle. Due
to this,
an IDLE_END can get called before IDLE_START. To avoid these races,
disable
interrupts before enter_idle and make sure that all idle routines do not
enable interrupts before entering idle.

This implies to me that once we have set isidle, if we take an interrupt
we *have* to drop out of the idle routine.

> For those setting NEED_RESCHED, we test NEED_RESCHED in all the right
> places.
>
> - current_set_polling_and_test(), we test need_resched after telling
> remote CPUs they don't need to send interrupts because we're polling
> for it -- the remote cpus set NEED_RESCHED before testing if we're
> polling for it.
>
> - we test NEED_RESCHED after setting up the monitor and before calling
> MWAIT. Therefore, if an interrupt would happen right before we call
> MWAIT, the monitor is already set and the MWAIT does an immediate
> exit.
>
> AFAICT we simply cannot get stuck and miss a NEED_RESCHED this way.
>

Well, it is obviously needed for the HLT case. For MWAIT it seems like
the MONITOR should have gotten disarmed and therefore MWAIT shouldn't
sleep... I don't know off the top of my head if there are any errata in
that department and/or if there are any other issues.

-hpa

Peter Zijlstra

unread,

Jan 20, 2014, 6:10:02 AM1/20/14

to

On Mon, Jan 20, 2014 at 03:00:29AM -0800, H. Peter Anvin wrote:
> On 01/20/2014 01:55 AM, Peter Zijlstra wrote:
> >
> > Ok, so I still don't get the problem of enabling interrupts early.
> >
> > If we enable them early we can get interrupts; which afaict fall into
> > two groups, those that do and do not set NEED_RESCHED.
> >
> > For those that do not set NEED_RESCHED, we'd have woken from MWAIT/HLT
> > and looped right back into it, so receiving those early -- before
> > actually calling MWAIT/HLT seems like a NO-OP.
>
> The description for commit d331e739f5ad seems to indicate otherwise:
>
> Idle callbacks has some races when enter_idle() sets isidle and
> subsequent
> interrupts that can happen on that CPU, before CPU goes to idle. Due
> to this,
> an IDLE_END can get called before IDLE_START. To avoid these races,
> disable
> interrupts before enter_idle and make sure that all idle routines do not
> enable interrupts before entering idle.
>
> This implies to me that once we have set isidle, if we take an interrupt
> we *have* to drop out of the idle routine.

I don't think that applies anymore; the generic idle loop calls
arch_cpu_idle_enter() before calling arch_cpu_idle() where we would do
the enable.

So in that sense its impossible to get arch_cpu_idle_exit() -- or rather
exit_idle() as called from the interrupts -- to happen before
arch_cpu_idle_enter().

Peter Zijlstra

unread,

Jan 20, 2014, 6:10:02 AM1/20/14

to

On Mon, Jan 20, 2014 at 02:19:30AM -0800, H. Peter Anvin wrote:
> On 01/20/2014 02:13 AM, Peter Zijlstra wrote:
> > On Mon, Jan 20, 2014 at 09:30:21AM +0100, Peter Zijlstra wrote:
> >> Then make them so. The fact was that most of the mwait idle sites
> >> were bloody broken. And the single mwait_idle_with_hints() function
> >> presents a single nice function that does all the required magics.
> >
> > To stress this a bit more; have a look see at mwwait_idle_with_hints();
> > it does a whole lot of subtle magic.
> >
> > - current_{set,clr}_polling*(), these are crucial in not missing and
> > wrecking NEED_RESCHED state.
> >
> > - X86_FEATURE_CLFLUSH_MONTIOR quirk
> >
> > - Does the monitor(); if (!need_resched()) mwait() thing.
> >
> > All of those are required for a correct and functional idle loop. And
> > I've seen sites where any or all of the above were missing/broken.
> >
> > Not unifying the lot into a simple usable function is just stupid --
> > history has shown people simply cannot be trusted to get this right.
> >
>
> I don't think anyone is arguing that. The question is rather if the
> implementation is correct, and if it is ready for the merge window.

I've yet to hear an argument against it other than vaguaries.

Stephen Rothwell

unread,

Jan 20, 2014, 8:20:02 AM1/20/14

to

Hi Sedat,

On Mon, 20 Jan 2014 09:46:55 +0100 Sedat Dilek <sedat...@gmail.com> wrote:
>
> On Mon, Jan 20, 2014 at 9:42 AM, Sedat Dilek <sedat...@gmail.com> wrote:
> > On Mon, Jan 20, 2014 at 4:51 AM, Stephen Rothwell <s...@canb.auug.org.au> wrote:
> >> On Sat, 18 Jan 2014 10:46:06 +0100 Mike Galbraith <bitb...@online.de> wrote:
> >>>
> >>> I hope it doesn't look quite like that, next-20140117 is -ENOBOOT on
> >>> Q6600 box. See below for an alternative.
> >>>
> >>> idle: kill unnecessary mwait_idle() resched IPIs
> >>
> >> OK, so despite even further discussion, I have applied this as a merge
> >> fix patch for today. Let me know when it is all sorted out.
> >>
> >
> > Where is this fix?
> > ( Browsing Linux-next remote GIT repository online. )
> > 2x NOPE for me.
> >
> > - Sedat -
> >
> > [1] http://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/log/?id=next-20140120&qt=grep&q=mwait_idle
> > [2] http://git.kernel.org/cgit/linux/kernel/git/sfr/next-fixes.git
> >
>
> Hmmm... Found this in Next/merge.log
>
> +$ git am -3 ../patches/0001-x86-idle-mwait_idle-merge-update.patch
> +Applying: idle: kill unnecessary mwait_idle() resched IPIs
> +$ git reset HEAD^
> +Unstaged changes after reset:
> +M arch/x86/include/asm/processor.h
> +M arch/x86/kernel/process.c

You missed the next three lines:

$ git add -A .
$ git commit -v -a --amend
[master 65d9a14a9a41] Merge remote-tracking branch 'tip/auto-latest'

> Is this a local patch not shipped in the Linux-next (remote) GIT repo?
> Why is this not in your next-fixes GIT repo?

Its part of the conflict resolution for the merge of the tip tree. It
cannot go into my fixes tree - that is for fixes to bugs in Linus' tree
until they are integrated there. The tip and pm trees are both fine on
their own, but combined they don't. So this fix has to go into the actual
merge commit for the merge of the later tree. When Linus' merges the
later of these trees he will also need this fix - or a better one - which
is what is still under discussion.

> A bit confused about your -next policies,

Any better?

Sedat Dilek

unread,

Jan 20, 2014, 10:30:01 AM1/20/14

to

[ I was absent for a while from Linux-next, so I am asking for clarification. ]
[ I might be wrong. ]

What does that mean?
AFAICS you applied an important fix by yourself on top of tip/auto-latest?

>> Is this a local patch not shipped in the Linux-next (remote) GIT repo?
>> Why is this not in your next-fixes GIT repo?
>
> Its part of the conflict resolution for the merge of the tip tree. It
> cannot go into my fixes tree - that is for fixes to bugs in Linus' tree
> until they are integrated there. The tip and pm trees are both fine on
> their own, but combined they don't. So this fix has to go into the actual
> merge commit for the merge of the later tree. When Linus' merges the
> later of these trees he will also need this fix - or a better one - which
> is what is still under discussion.
>

I was asking in general about next-fixes to have a "bootable" (aka
working) Linux-next kernel.

You see next-fixes as a place to fix Linus-tree, seriously?

The question here in this special case seems to be a "logical"
(not-working-together) problem between tip and pm.

And we are in a merge-window...

>> A bit confused about your -next policies,
>
> Any better?
>

Not really.
You should clarify on what you are doing in your next-fixes tree.
Your daily report for Linux-next releases even does not mention next-fixes.

Looking through my INBOX Thierry had the initial idea of "fixes for
linux-next" when you were on vacation and he took over maintainership.

Len Brown

unread,

Jan 20, 2014, 4:50:02 PM1/20/14

to

> As a side note, at minimum the semantic and compatibility difference
> needs to be _very_ clearly present in the naming. Something like
> mwait_old_() or mwait_core2_(). That way such dependencies and
> assumptions don't get lost in code restructuring, etc.

Agreed.
We started with mwait_idle() -- which was erroneously removed
and is now being restored under it original name.

The "new" function is mwait_idle_with_hints() -- which uses
the additional hints that were not available w/ the original MWAIT instruction.
Where "new" is Core Duo and later -- all the processor that can use
MWAIT for C-states deeper than C1.

Len Brown, Intel Open Source Technology Center

Peter Zijlstra

unread,

Jan 20, 2014, 5:00:03 PM1/20/14

to

On Mon, Jan 20, 2014 at 04:39:45PM -0500, Len Brown wrote:
> > As a side note, at minimum the semantic and compatibility difference
> > needs to be _very_ clearly present in the naming. Something like
> > mwait_old_() or mwait_core2_(). That way such dependencies and
> > assumptions don't get lost in code restructuring, etc.
>
> Agreed.
> We started with mwait_idle() -- which was erroneously removed
> and is now being restored under it original name.
>
> The "new" function is mwait_idle_with_hints() -- which uses
> the additional hints that were not available w/ the original MWAIT instruction.
> Where "new" is Core Duo and later -- all the processor that can use
> MWAIT for C-states deeper than C1.

I'm still waiting for someone to explain what's wrong with:

static inline void mwait_idle(void)
{
local_irq_enable();
mwait_idle_with_hints(0, 0);

Mike Galbraith

unread,

Jan 20, 2014, 10:30:03 PM1/20/14

to

On Mon, 2014-01-20 at 22:51 +0100, Peter Zijlstra wrote:

> I'm still waiting for someone to explain what's wrong with:
>
> static inline void mwait_idle(void)
> {
> local_irq_enable();
> mwait_idle_with_hints(0, 0);
> }

How about just do that going forward, it work, and can always be fixed
if something turns up, and the below for stable once it hits mainline?

Q6600 box is happy camper in all trees.

From: Len Brown <len....@intel.com>

x86 idle: restore mwait_idle()

In Linux-3.9 we removed the mwait_idle() loop:
'x86 idle: remove mwait_idle() and "idle=mwait" cmdline param'
(69fb3676df3329a7142803bb3502fa59dc0db2e3)

The reasoning was that modern machines should be sufficiently
happy during the boot process using the default_idle() HALT loop,
until cpuidle loads and either acpi_idle or intel_idle
invoke the newer MWAIT-with-hints idle loop.

But two machines reported problems:
1. Certain Core2-era machines support MWAIT-C1 and HALT only.
MWAIT-C1 is preferred for optimal power and performance.
But if they support just C1, cpuidle never loads and
so they use the boot-time default idle loop forever.

2. Some laptops will boot-hang if HALT is used,
but will boot successfully if MWAIT is used.
This appears to be a hidden assumption in BIOS SMI,
that is presumably valid on the proprietary OS
where the BIOS was validated.

https://bugzilla.kernel.org/show_bug.cgi?id=60770

So here we effectively revert the patch above, restoring
the mwait_idle() loop. However, we don't bother restoring
the idle=mwait cmdline parameter, since it appears to add
no value.

Maintainer notes:
For 3.9, simply revert 69fb3676df
for 3.10, patch -F3 applies, fuzz needed due to __cpuinit use in context
For 3.11, 3.12, 3.13, this patch applies cleanly

Mike: reinstate polling, and add clflush barriers.

Cc: Mike Galbraith <bitb...@online.de>
Cc: Ian Malone <ibma...@gmail.com>
Cc: Josh Boyer <jwb...@redhat.com>
Cc: <sta...@vger.kernel.org> # 3.9, 3.10, 3.11, 3.12, 3.13
Signed-off-by: Mike Galbraith <bitb...@online.de>
Signed-off-by: Len Brown <len....@intel.com>
---
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 3fb8d95ab8b5..c5db2a43e730 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -398,6 +398,52 @@ static void amd_e400_idle(void)
default_idle();
}

+/*
+ * Intel Core2 and older machines prefer MWAIT over HALT for C1.
+ * We can't rely on cpuidle installing MWAIT, because it will not load
+ * on systems that support only C1 -- so the boot default must be MWAIT.
+ *
+ * Some AMD machines are the opposite, they depend on using HALT.
+ *
+ * So for default C1, which is used during boot until cpuidle loads,
+ * use MWAIT-C1 on Intel HW that has it, else use HALT.
+ */
+static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
+{
+ if (c->x86_vendor != X86_VENDOR_INTEL)
+ return 0;
+
+ if (!cpu_has(c, X86_FEATURE_MWAIT))
+ return 0;
+
+ return 1;
+}
+
+/*
+ * MONITOR/MWAIT with no hints, used for default default C1 state.
+ * This invokes MWAIT with interrutps enabled and no flags,
+ * which is backwards compatible with the original MWAIT implementation.
+ */
+
+static void mwait_idle(void)
+{
+ if (!current_set_polling_and_test()) {
+ if (static_cpu_has(X86_FEATURE_CLFLUSH_MONITOR)) {
+ mb();
+ clflush((void *)&current_thread_info()->flags);
+ mb();
+ }
+
+ __monitor((void *)&current_thread_info()->flags, 0, 0);
+ if (!need_resched())
+ __sti_mwait(0, 0);
+ else
+ local_irq_enable();
+ } else
+ local_irq_enable();
+ __current_clr_polling();
+}
+
void select_idle_routine(const struct cpuinfo_x86 *c)
{
#ifdef CONFIG_SMP
@@ -411,6 +457,9 @@ void select_idle_routine(const struct cpuinfo_x86 *c)
/* E400: APIC timer interrupt does not wake up CPU from C1e */
pr_info("using AMD E400 aware idle routine\n");
x86_idle = amd_e400_idle;
+ } else if (prefer_mwait_c1_over_halt(c)) {
+ pr_info("using mwait in idle threads\n");
+ x86_idle = mwait_idle;
} else
x86_idle = default_idle;

Ingo Molnar

unread,

Jan 21, 2014, 5:50:01 AM1/21/14

to

* Peter Zijlstra <pet...@infradead.org> wrote:

> On Mon, Jan 20, 2014 at 04:39:45PM -0500, Len Brown wrote:
> > > As a side note, at minimum the semantic and compatibility difference
> > > needs to be _very_ clearly present in the naming. Something like
> > > mwait_old_() or mwait_core2_(). That way such dependencies and
> > > assumptions don't get lost in code restructuring, etc.
> >
> > Agreed.
> > We started with mwait_idle() -- which was erroneously removed
> > and is now being restored under it original name.
> >
> > The "new" function is mwait_idle_with_hints() -- which uses the
> > additional hints that were not available w/ the original MWAIT
> > instruction. Where "new" is Core Duo and later -- all the
> > processor that can use MWAIT for C-states deeper than C1.
>
> I'm still waiting for someone to explain what's wrong with:
>
> static inline void mwait_idle(void)
> {
> local_irq_enable();
> mwait_idle_with_hints(0, 0);
> }

Absolutely agreed, we don't want to carry it on 'just because', the
compatibility aspect needs to be documented - otherwise we degrade
into cargo cult programming.

Thanks,

Ingo