[RFC][PATCH] cpuidle: avoid singing capacitors

Pierre Ossman

unread,

Feb 29, 2008, 1:50:21 PM2/29/08

to

Many devices today are of a less than stellar quality, and singing
transistors are a common problem. A high-pitch noise is created, caused
by power fluctuations as the processor enters and leaves deep sleep at
a high frequency.

Instead of just disabling the deep sleep (which wastes power). This
patch merely reduces the number of times it is entered so that the
frequency doesn't exceed 500 Hz. That should make sure the problem is
inaudible.

Signed-off-by: Pierre Ossman <drz...@drzeus.cx>
--

The basic idea is above, but the implementation doesn't quite work. It seems jiffies is not a good time source in this scenario. The time spent in C3 takes a much bigger hit than I expect. The fact that powertop says that it has an average residency of 200 ms in C2 tells me that those should have been spent in C3.

So, pointers on what else to do?

(Patch also needs an on/off switch, but that's a later problem)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 78d77c5..171d838 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -16,6 +16,12 @@

#define BREAK_FUZZ 4 /* 4 us */

+/*
+ * The minimum number of ticks needed to not oscillate faster than
+ * 500 Hz.
+ */
+#define MIN_DEEP_INTERVAL (HZ / 500)
+
struct menu_device {
int last_state_idx;

@@ -23,6 +29,8 @@ struct menu_device {
unsigned int predicted_us;
unsigned int last_measured_us;
unsigned int elapsed_us;
+
+ unsigned long last_deep_jif;
};

static DEFINE_PER_CPU(struct menu_device, menu_devices);
@@ -50,9 +58,16 @@ static int menu_select(struct cpuidle_device *dev)
break;
if (s->exit_latency > pm_qos_requirement(PM_QOS_CPU_DMA_LATENCY))
break;
+ if ((dev->states[i].flags & CPUIDLE_FLAG_DEEP) &&
+ time_before_eq(jiffies, data->last_deep_jif + MIN_DEEP_INTERVAL))
+ break;
}

data->last_state_idx = i - 1;
+
+ if (dev->states[i - 1].flags & CPUIDLE_FLAG_DEEP)
+ data->last_deep_jif = jiffies;
+
return i - 1;
}

--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Alan Jenkins

unread,

Feb 29, 2008, 4:10:13 PM2/29/08

to

Pierre Ossman wrote:
> Many devices today are of a less than stellar quality, and singing
> transistors are a common problem. A high-pitch noise is created, caused
> by power fluctuations as the processor enters and leaves deep sleep at
> a high frequency.

Capacitors or transistors? The subject and the description disagree.

Alan

Lennart Sorensen

unread,

Feb 29, 2008, 4:50:20 PM2/29/08

to

On Fri, Feb 29, 2008 at 07:38:12PM +0100, Pierre Ossman wrote:
> +/*
> + * The minimum number of ticks needed to not oscillate faster than
> + * 500 Hz.
> + */
> +#define MIN_DEEP_INTERVAL (HZ / 500)

What happens here if HZ < 500? Or does the fact that you have less than
500HZ jiffies automatically imply that you can't go to sleep more than
the jiffy rate times per second?

--
Len Sorensen

Pierre Ossman

unread,

Mar 1, 2008, 7:40:10 AM3/1/08

to

On Fri, 29 Feb 2008 13:09:05 -0800 (PST)
Alan Jenkins <alan-j...@tuffmail.co.uk> wrote:

> Pierre Ossman wrote:
> > Many devices today are of a less than stellar quality, and singing
> > transistors are a common problem. A high-pitch noise is created, caused
> > by power fluctuations as the processor enters and leaves deep sleep at
> > a high frequency.
>
> Capacitors or transistors? The subject and the description disagree.
>

That should teach me to write commit messages when I'm tired... Capacitors is of course the right answer. :)

--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

Pierre Ossman

unread,

Mar 1, 2008, 7:40:14 AM3/1/08

to

On Fri, 29 Feb 2008 16:44:07 -0500
lsor...@csclub.uwaterloo.ca (Lennart Sorensen) wrote:

> On Fri, Feb 29, 2008 at 07:38:12PM +0100, Pierre Ossman wrote:
> > +/*
> > + * The minimum number of ticks needed to not oscillate faster than
> > + * 500 Hz.
> > + */
> > +#define MIN_DEEP_INTERVAL (HZ / 500)
>
> What happens here if HZ < 500? Or does the fact that you have less than
> 500HZ jiffies automatically imply that you can't go to sleep more than
> the jiffy rate times per second?

A low HZ will still go to sleep very often (provided NO_HZ is in effect). But a HZ < 500 makes that number up there turn to zero. But the check further down makes sure that at least 1 tick passes. So that means it will not enter C3 more often than min(HZ, 500) Hz. Another reason to stop using jiffies.

--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

Pierre Ossman

unread,

Mar 1, 2008, 8:50:09 AM3/1/08

to

On Fri, 29 Feb 2008 19:38:12 +0100
Pierre Ossman <drzeu...@drzeus.cx> wrote:

> @@ -50,9 +58,16 @@ static int menu_select(struct cpuidle_device *dev)
> break;
> if (s->exit_latency > pm_qos_requirement(PM_QOS_CPU_DMA_LATENCY))
> break;
> + if ((dev->states[i].flags & CPUIDLE_FLAG_DEEP) &&
> + time_before_eq(jiffies, data->last_deep_jif + MIN_DEEP_INTERVAL))
> + break;
> }
>

I guess another approach would be to refuse to enter deep sleep if the sleep time is less than 2 ms. That would mean we would not lose the long sleeps, but if it is just doing short sleeps then we would never enter C3...

Is there a decent way of testing which approach is actually doing the least damage?

Rgds

Lee Revell

unread,

Mar 1, 2008, 9:30:09 PM3/1/08

to

On Fri, Feb 29, 2008 at 1:38 PM, Pierre Ossman <drzeu...@drzeus.cx> wrote:
> Many devices today are of a less than stellar quality, and singing
> transistors are a common problem. A high-pitch noise is created, caused
> by power fluctuations as the processor enters and leaves deep sleep at
> a high frequency.
>
> Instead of just disabling the deep sleep (which wastes power). This
> patch merely reduces the number of times it is entered so that the
> frequency doesn't exceed 500 Hz. That should make sure the problem is
> inaudible.

Should there be a comment stating the motivation for the change?

Thanks for trying to address this problem, I know many users who are afflicted.

Lee

Pierre Ossman

unread,

Mar 2, 2008, 9:30:19 AM3/2/08

to

On Sat, 1 Mar 2008 21:27:09 -0500
"Lee Revell" <rlre...@joe-job.com> wrote:

>
> Should there be a comment stating the motivation for the change?

In the code you mean? Yes, that will be added.

>
> Thanks for trying to address this problem, I know many users who are afflicted.
>

As always, the quickest way to get things done is to make sure a kernel developer suffers from the same issue. ;)

--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

Andi Kleen

unread,

Mar 3, 2008, 7:40:12 AM3/3/08

to

Pierre Ossman <drzeu...@drzeus.cx> writes:

> Many devices today are of a less than stellar quality, and singing
> transistors are a common problem. A high-pitch noise is created, caused
> by power fluctuations as the processor enters and leaves deep sleep at
> a high frequency.
>
> Instead of just disabling the deep sleep (which wastes power). This
> patch merely reduces the number of times it is entered so that the
> frequency doesn't exceed 500 Hz. That should make sure the problem is
> inaudible.
>
> Signed-off-by: Pierre Ossman <drz...@drzeus.cx>
> --
>
> The basic idea is above, but the implementation doesn't quite work. It seems jiffies is not a good time source in this scenario. The time spent in C3 takes a much bigger hit than I expect. The fact that powertop says that it has an average residency of 200 ms in C2 tells me that those should have been spent in C3.

I like the patch in principle, although I think the threshold should
be configurable, not hard coded.

I haven't checked in detail if that is the case, but you need to make
sure that you only reference jiffies on idle exit after the clock
source has caught up with lost jiffies after idle. That might be
the cause of your problem.

-Andi

Pierre Ossman

unread,

Mar 3, 2008, 3:30:23 PM3/3/08

to

Many devices today are of a less than stellar quality, and singing

capacitors are a common problem. A high-pitch noise is created, caused

by power fluctuations as the processor enters and leaves deep sleep at
a high frequency.

Instead of just disabling the deep sleep (which wastes power). This
patch merely reduces the number of times it is entered so that the
frequency doesn't exceed 500 Hz. That should make sure the problem is
inaudible.

Signed-off-by: Pierre Ossman <drz...@drzeus.cx>
--

I've been playing around with this for a bit, and the jiffies approach
just had too many gotchas. So I tried using the information that was
already present in the menu governor. This is what I came up with.

It solves the primary problem of getting rid of the noise. I've also
tried some crude power measurements and I couldn't get any difference
with and without the patch. Powertop also nicely shows that the CPU is
still spending almost all of its time in C3. Therefore, I'm letting the
effects of it be enabled by default.

So I'm taking the [RFC] off it. Please still test and see that it
solves the issue on other machines, and the it does not cause any big
power surges.

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 78d77c5..d9c43e3 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -16,6 +16,8 @@

#define BREAK_FUZZ 4 /* 4 us */

+static unsigned int min_deep_sleep = 2000;

+
struct menu_device {
int last_state_idx;

@@ -50,6 +52,19 @@ static int menu_select(struct cpuidle_device *dev)

break;
if (s->exit_latency > pm_qos_requirement(PM_QOS_CPU_DMA_LATENCY))
break;
+

+ /*
+ * In order to avoid the problem of "singing capacitors",
+ * don't enter a deep sleep for short durations (by
+ * default, nothing shorter than 2 ms). This will,
+ * hopefully, keep the problem inaudible.
+ */
+ if (s->flags & CPUIDLE_FLAG_DEEP) {
+ if (min_deep_sleep > data->expected_us)
+ break;
+ if (min_deep_sleep > data->predicted_us)
+ break;
+ }

}

data->last_state_idx = i - 1;

@@ -132,6 +147,9 @@ static void __exit exit_menu(void)
cpuidle_unregister_governor(&menu_governor);
}

+module_param(min_deep_sleep, uint, 0644)
+MODULE_PARM_DESC(min_deep_sleep, "min time (us) to spend in deep sleep to avoid noise")
+
MODULE_LICENSE("GPL");
module_init(init_menu);
module_exit(exit_menu);

Pavel Machek

unread,

Mar 3, 2008, 3:50:08 PM3/3/08

to

Hi!

> I've been playing around with this for a bit, and the jiffies approach
> just had too many gotchas. So I tried using the information that was
> already present in the menu governor. This is what I came up with.
>
> It solves the primary problem of getting rid of the noise. I've also
> tried some crude power measurements and I couldn't get any difference
> with and without the patch. Powertop also nicely shows that the CPU is
> still spending almost all of its time in C3. Therefore, I'm letting the
> effects of it be enabled by default.
>
> So I'm taking the [RFC] off it. Please still test and see that it
> solves the issue on other machines, and the it does not cause any big
> power surges.
>
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index 78d77c5..d9c43e3 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -16,6 +16,8 @@
>
> #define BREAK_FUZZ 4 /* 4 us */
>
> +static unsigned int min_deep_sleep = 2000;
> +

Well, why not, but I believe we should default to old behaviour... not
all machines are cheaply-build.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Pierre Ossman

unread,

Mar 3, 2008, 4:10:10 PM3/3/08

to

On Mon, 3 Mar 2008 21:46:03 +0100
Pavel Machek <pa...@ucw.cz> wrote:

> >
> > +static unsigned int min_deep_sleep = 2000;
> > +
>
> Well, why not, but I believe we should default to old behaviour... not
> all machines are cheaply-build.

One would hope. ;)

But the problem is that most people will not be able to find this
option (or even know such an option exists). I'd guess the distros will
just end up having this on by default anyway. And since I could not
measure any extra power drain, I believe it's hard to justify having it
off by default (more than by pure principle).

(Then there's also the whole "But Windows doesn't have this problem!"
line of reasoning...)

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

Pavel Machek

unread,

Mar 3, 2008, 4:10:17 PM3/3/08

to

On Mon 2008-03-03 22:03:10, Pierre Ossman wrote:
> On Mon, 3 Mar 2008 21:46:03 +0100
> Pavel Machek <pa...@ucw.cz> wrote:
>
> > >
> > > +static unsigned int min_deep_sleep = 2000;
> > > +
> >
> > Well, why not, but I believe we should default to old behaviour... not
> > all machines are cheaply-build.
>
> One would hope. ;)
>
> But the problem is that most people will not be able to find this
> option (or even know such an option exists). I'd guess the distros will
> just end up having this on by default anyway. And since I could not
> measure any extra power drain, I believe it's hard to justify having it
> off by default (more than by pure principle).

So just leave it off by default, and let distros break their own
kernels ;-).

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Pallipadi, Venkatesh

unread,

Mar 3, 2008, 4:20:12 PM3/3/08

to

I prefer leaving it off my default and enabling it on faulty hardware by
some blacklist.

Thanks,
Venki

Pierre Ossman

unread,

Mar 3, 2008, 4:20:16 PM3/3/08

to

On Mon, 3 Mar 2008 13:14:28 -0800
"Pallipadi, Venkatesh" <venkatesh...@intel.com> wrote:

>
> I prefer leaving it off my default and enabling it on faulty hardware by
> some blacklist.
>

But are these kind of shoddy components really trackable by model? I'd suspect it has more to do with the shipment of parts the day when the computer was manufactured.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

Pallipadi, Venkatesh

unread,

Mar 3, 2008, 5:10:16 PM3/3/08

to

>-----Original Message-----
>From: Pierre Ossman [mailto:drzeu...@drzeus.cx]
>Sent: Monday, March 03, 2008 1:18 PM
>To: Pallipadi, Venkatesh
>Cc: Pavel Machek; Adam Belay;
>linu...@lists.linux-foundation.org; LKML; Andi Kleen; Lee Revell
>Subject: Re: [PATCH] cpuidle: avoid singing capacitors
>

>On Mon, 3 Mar 2008 13:14:28 -0800
>"Pallipadi, Venkatesh" <venkatesh...@intel.com> wrote:
>
>>
>> I prefer leaving it off my default and enabling it on faulty
>hardware by
>> some blacklist.
>>
>
>But are these kind of shoddy components really trackable by
>model? I'd suspect it has more to do with the shipment of
>parts the day when the computer was manufactured.
>

But, with this patch:
- we are penalizing good hardware and making them less power efficient
to match the bad ones.
- There may also be server systems which first may not have these sort
of power fluctuations and even when buggy and have this noise, system
may be in some corner of some lab with fans making more noise than the
capacitors.

Thanks,
Venki

Alan Stern

unread,

Mar 3, 2008, 6:10:12 PM3/3/08

to

On Mon, 3 Mar 2008, Pallipadi, Venkatesh wrote:

> But, with this patch:
> - we are penalizing good hardware and making them less power efficient
> to match the bad ones.
> - There may also be server systems which first may not have these sort
> of power fluctuations and even when buggy and have this noise, system
> may be in some corner of some lab with fans making more noise than the
> capacitors.

Can you make it configurable through sysfs? Default to disabled, but
allow the user to turn it on if the machine makes too much noise.

Alan Stern

Andi Kleen

unread,

Mar 3, 2008, 6:10:18 PM3/3/08

to

> But, with this patch:
> - we are penalizing good hardware and making them less power efficient
> to match the bad ones.

The question is how much it would be penalized. Most likely the penalty
is very little. I would agree with Pierre's reasoning for enabling
it by default: distributions will enable it anyways and it's better
to have the same defaults in the mainline kernel as with distros.

It might be worth researching a good default value though. Perhaps
it can be a lot shorter than the current one, lessing the penalty
even more?

-Andi

Andi Kleen

unread,

Mar 3, 2008, 6:10:18 PM3/3/08

to

On Mon, Mar 03, 2008 at 06:05:35PM -0500, Alan Stern wrote:
> On Mon, 3 Mar 2008, Pallipadi, Venkatesh wrote:
>
> > But, with this patch:
> > - we are penalizing good hardware and making them less power efficient
> > to match the bad ones.
> > - There may also be server systems which first may not have these sort
> > of power fluctuations and even when buggy and have this noise, system
> > may be in some corner of some lab with fans making more noise than the
> > capacitors.
>
> Can you make it configurable through sysfs?

It already is, through a writable module_parm()

> Default to disabled, but
> allow the user to turn it on if the machine makes too much noise.

99+% of the users wouldn't be able to figure that out.

-Andi

Dave Jones

unread,

Mar 3, 2008, 11:10:09 PM3/3/08

to

On Tue, Mar 04, 2008 at 12:10:33AM +0100, Andi Kleen wrote:
> On Mon, Mar 03, 2008 at 06:05:35PM -0500, Alan Stern wrote:
> > On Mon, 3 Mar 2008, Pallipadi, Venkatesh wrote:
> >
> > > But, with this patch:
> > > - we are penalizing good hardware and making them less power efficient
> > > to match the bad ones.
> > > - There may also be server systems which first may not have these sort
> > > of power fluctuations and even when buggy and have this noise, system
> > > may be in some corner of some lab with fans making more noise than the
> > > capacitors.
> >
> > Can you make it configurable through sysfs?
>
> It already is, through a writable module_parm()
>
> > Default to disabled, but
> > allow the user to turn it on if the machine makes too much noise.
>
> 99+% of the users wouldn't be able to figure that out.

99+% of users don't have singing capacitors. (Or don't care enough to complain)
For those that do can't figure out what to do from google,
we have a documentation problem.

Dave

--
http://www.codemonkey.org.uk

Pierre Ossman

unread,

Mar 4, 2008, 1:20:09 AM3/4/08

to

Many devices today are of a less than stellar quality, and singing
capacitors are a common problem. A high-pitch noise is created, caused
by power fluctuations as the processor enters and leaves deep sleep at
a high frequency.

Instead of just disabling the deep sleep (which wastes power), this
patch allows you to reduces the number of times it is entered so that
the frequency can be kept inaudible.

Signed-off-by: Pierre Ossman <drz...@drzeus.cx>
--

I'm not religious about the default value, and since Dave Jones piped
in I guess one of my major arguments are gone. :)

Here's the same patch with the default set to 0 (effectively disabling
the patch).

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 78d77c5..d9c43e3 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -16,6 +16,8 @@

#define BREAK_FUZZ 4 /* 4 us */

+static unsigned int min_deep_sleep = 0;

+
struct menu_device {
int last_state_idx;

@@ -50,6 +52,19 @@ static int menu_select(struct cpuidle_device *dev)
break;
if (s->exit_latency > pm_qos_requirement(PM_QOS_CPU_DMA_LATENCY))
break;
+
+ /*
+ * In order to avoid the problem of "singing capacitors",

+ * don't enter a deep sleep for short durations (a value
+ * of 2 ms is usually sufficient). This will, hopefully,
+ * keep the problem inaudible.

+ */
+ if (s->flags & CPUIDLE_FLAG_DEEP) {
+ if (min_deep_sleep > data->expected_us)
+ break;
+ if (min_deep_sleep > data->predicted_us)
+ break;
+ }
}

data->last_state_idx = i - 1;
@@ -132,6 +147,9 @@ static void __exit exit_menu(void)
cpuidle_unregister_governor(&menu_governor);
}

+module_param(min_deep_sleep, uint, 0644)
+MODULE_PARM_DESC(min_deep_sleep, "min time (us) to spend in deep sleep to avoid noise")
+
MODULE_LICENSE("GPL");
module_init(init_menu);
module_exit(exit_menu);

--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

Andi Kleen

unread,

Mar 4, 2008, 4:40:16 AM3/4/08

to

On Mon, Mar 03, 2008 at 11:00:48PM -0500, Dave Jones wrote:
> On Tue, Mar 04, 2008 at 12:10:33AM +0100, Andi Kleen wrote:
> > On Mon, Mar 03, 2008 at 06:05:35PM -0500, Alan Stern wrote:
> > > On Mon, 3 Mar 2008, Pallipadi, Venkatesh wrote:
> > >
> > > > But, with this patch:
> > > > - we are penalizing good hardware and making them less power efficient
> > > > to match the bad ones.
> > > > - There may also be server systems which first may not have these sort
> > > > of power fluctuations and even when buggy and have this noise, system
> > > > may be in some corner of some lab with fans making more noise than the
> > > > capacitors.
> > >
> > > Can you make it configurable through sysfs?
> >
> > It already is, through a writable module_parm()
> >
> > > Default to disabled, but
> > > allow the user to turn it on if the machine makes too much noise.
> >
> > 99+% of the users wouldn't be able to figure that out.
>
> 99+% of users don't have singing capacitors. (Or don't care enough to complain)
> For those that do can't figure out what to do from google,
> we have a documentation problem.

The big question is if there is a measurable difference in power consumption
from a reasonable default value. I doubt it actually. The normal rule
of thumb is that if you average sleeps are long enough (and they still
stay this way even with the patch enabled) you're doing ok. If there is
no or not significant penalty there is no reason to not enable it by default.

-Andi

Pierre Ossman

unread,

Mar 4, 2008, 12:30:18 PM3/4/08

to

On Tue, 4 Mar 2008 07:14:23 +0100
Pierre Ossman <drzeu...@drzeus.cx> wrote:

>
> Many devices today are of a less than stellar quality, and singing
> capacitors are a common problem. A high-pitch noise is created, caused
> by power fluctuations as the processor enters and leaves deep sleep at
> a high frequency.
>
> Instead of just disabling the deep sleep (which wastes power), this
> patch allows you to reduces the number of times it is entered so that
> the frequency can be kept inaudible.
>
> Signed-off-by: Pierre Ossman <drz...@drzeus.cx>
> --

Hold off on this. It turns out it still has major problems. The menu algorithm now and then gets really bad at predicting sleep times, completely killing this avoidance scheme.

(On that subject, does anyone except Adam understand that algorithm? Some comments wouldn't hurt...)

So for now, I'm back to thinking that measuring the interval between deep sleeps is the better approach. I could use some ideas for a good clock source though. I haven't dug much deeper than jiffies when it comes to kernel timekeeping.

Rgds

Andi Kleen

unread,

Mar 4, 2008, 12:30:20 PM3/4/08

to

> So for now, I'm back to thinking that measuring the interval between deep sleeps is the better approach. I could use some ideas for a good clock source though. I haven't dug much deeper than jiffies when it comes to kernel timekeeping.

jiffies should work, you just need to make sure you measure them at the
right place. In particular there is some code in dyntick that catches
up on jiffies after the deep sleep when the normal timer handler didn't run
and jiffies are only usable again after that code executed.

-ANdi

Pierre Ossman

unread,

Mar 4, 2008, 12:40:08 PM3/4/08

to

On Tue, 4 Mar 2008 18:29:18 +0100
Andi Kleen <an...@firstfloor.org> wrote:

> > So for now, I'm back to thinking that measuring the interval between deep sleeps is the better approach. I could use some ideas for a good clock source though. I haven't dug much deeper than jiffies when it comes to kernel timekeeping.
>
> jiffies should work, you just need to make sure you measure them at the
> right place. In particular there is some code in dyntick that catches
> up on jiffies after the deep sleep when the normal timer handler didn't run
> and jiffies are only usable again after that code executed.
>

And how can I tell if this handler has run? Not that it really matters I guess, as I don't think running it from the cpuidle governor is very sane.

--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

Andi Kleen

unread,

Mar 4, 2008, 12:50:08 PM3/4/08

to

> And how can I tell if this handler has run? Not that it really matters I guess, as I don't think running it from the cpuidle governor is very sane.

After irq_enter->tick_nohz_update_jiffies()

You might need a new callback into the idle governours for this though.

-Andi

Pierre Ossman

unread,

Mar 4, 2008, 1:10:16 PM3/4/08

to

On Tue, 4 Mar 2008 18:43:15 +0100
Andi Kleen <an...@firstfloor.org> wrote:

> > And how can I tell if this handler has run? Not that it really matters I guess, as I don't think running it from the cpuidle governor is very sane.
>
> After irq_enter->tick_nohz_update_jiffies()
>
> You might need a new callback into the idle governours for this though.
>

I don't think I quite see the point. If we can't force an update, then
it doesn't really help us knowing if the jiffies value is stale or not.

Anyway, I'm running the following patch for now. I'll keep it on my
machine for a few days and see if I still hear crickets.

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 78d77c5..73f954d 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -16,6 +16,9 @@

#define BREAK_FUZZ 4 /* 4 us */

+static unsigned int min_deep_sleep = 0;

+static unsigned int failed_deep = 0;

+
struct menu_device {
int last_state_idx;

@@ -23,6 +26,8 @@ struct menu_device {
unsigned int predicted_us;
unsigned int last_measured_us;
unsigned int elapsed_us;
+
+ unsigned long last_deep;
};

static DEFINE_PER_CPU(struct menu_device, menu_devices);
@@ -50,9 +55,33 @@ static int menu_select(struct cpuidle_device *dev)

break;
if (s->exit_latency > pm_qos_requirement(PM_QOS_CPU_DMA_LATENCY))
break;
+
+ /*
+ * In order to avoid the problem of "singing capacitors",

+ * don't enter a deep sleep for short durations (2 ms seems
+ * to do the trick). This will, hopefully, keep the problem
+ * inaudible.
+ */
+ if (min_deep_sleep && (s->flags & CPUIDLE_FLAG_DEEP)) {
+ if (time_before_eq(jiffies, data->last_deep +
+ HZ * min_deep_sleep / USEC_PER_SEC))
+ break;
+#if 0

+ if (min_deep_sleep > data->expected_us)
+ break;
+ if (min_deep_sleep > data->predicted_us)
+ break;

+ if (min_deep_sleep > data->last_measured_us)
+ break;
+#endif

+ }
}

data->last_state_idx = i - 1;

+
+ if (dev->states[i - 1].flags & CPUIDLE_FLAG_DEEP)
+ data->last_deep = jiffies;
+
return i - 1;
}

@@ -80,6 +109,10 @@ static void menu_reflect(struct cpuidle_device *dev)
if (!(target->flags & CPUIDLE_FLAG_TIME_VALID))
measured_us = USEC_PER_SEC / HZ;

+ if ((target->flags & CPUIDLE_FLAG_DEEP) &&
+ (cpuidle_get_last_residency(dev) < min_deep_sleep))
+ failed_deep++;
+
/* Predict time remaining until next break event */
if (measured_us + BREAK_FUZZ < data->expected_us - target->exit_latency) {
data->predicted_us = max(measured_us, data->last_measured_us);
@@ -103,6 +136,11 @@ static int menu_enable_device(struct cpuidle_device *dev)
struct menu_device *data = &per_cpu(menu_devices, dev->cpu);

memset(data, 0, sizeof(struct menu_device));
+ /*
+ * 0 might be slightly ahead of the current jiffies count, so
+ * it's a bad initial value.
+ */
+ data->last_deep = jiffies;

return 0;
}
@@ -132,6 +170,11 @@ static void __exit exit_menu(void)

cpuidle_unregister_governor(&menu_governor);
}

+module_param(min_deep_sleep, uint, 0644)
+MODULE_PARM_DESC(min_deep_sleep, "min time (us) to spend in deep sleep to avoid noise")
+

+module_param(failed_deep, uint, 0644)

+
MODULE_LICENSE("GPL");
module_init(init_menu);
module_exit(exit_menu);

--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

Andi Kleen

unread,

Mar 4, 2008, 1:40:11 PM3/4/08

to

On Tue, Mar 04, 2008 at 07:04:46PM +0100, Pierre Ossman wrote:
> On Tue, 4 Mar 2008 18:43:15 +0100
> Andi Kleen <an...@firstfloor.org> wrote:
>
> > > And how can I tell if this handler has run? Not that it really matters I guess, as I don't think running it from the cpuidle governor is very sane.
> >
> > After irq_enter->tick_nohz_update_jiffies()
> >
> > You might need a new callback into the idle governours for this though.
> >
>
> I don't think I quite see the point. If we can't force an update, then
> it doesn't really help us knowing if the jiffies value is stale or not.

What do you mean with force an update? The system always does an
jiffies update automatically after idle wakeup. It's just that if you add
code to the code path that uses jiffies you have to make sure it is after the
standard update, otherwise you'll see a stale value.

-Andi

Pallipadi, Venkatesh

unread,

Mar 4, 2008, 2:20:14 PM3/4/08

to

>-----Original Message-----
>From: Pierre Ossman [mailto:drzeu...@drzeus.cx]

Prediction is based on cumulative time till "non expected wakeup". So,
prediction will come into play only when there are very short wakeups
due to "unexpected wakeups".

>So for now, I'm back to thinking that measuring the interval
>between deep sleeps is the better approach. I could use some
>ideas for a good clock source though. I haven't dug much
>deeper than jiffies when it comes to kernel timekeeping.

I think best solution is to use get_last_residency that is already
there. If the last residency or expected_us is very low, you can avoid
deep idle states. That way you don't have to depend on jiffies being
updated at the time you are checking it.

Thanks,
Venki

Pierre Ossman

unread,

Mar 5, 2008, 1:10:09 AM3/5/08

to

On Tue, 4 Mar 2008 11:01:28 -0800
"Pallipadi, Venkatesh" <venkatesh...@intel.com> wrote:

> >
> >(On that subject, does anyone except Adam understand that
> >algorithm? Some comments wouldn't hurt...)
>
> Prediction is based on cumulative time till "non expected wakeup". So,
> prediction will come into play only when there are very short wakeups
> due to "unexpected wakeups".
>

Ah, so the measured_us and elapsed_us are trying to keep track of time between non-timer wakeups?

> >So for now, I'm back to thinking that measuring the interval
> >between deep sleeps is the better approach. I could use some
> >ideas for a good clock source though. I haven't dug much
> >deeper than jiffies when it comes to kernel timekeeping.
>
> I think best solution is to use get_last_residency that is already
> there. If the last residency or expected_us is very low, you can avoid
> deep idle states. That way you don't have to depend on jiffies being
> updated at the time you are checking it.
>

I tried using predicted_us and last_measured_us, and those didn't work (see the #if 0 code in my last patch). And since cpuidle_get_last_residency() is part of predicted_us, I don't think it is reporting useful values.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

Pierre Ossman

unread,

Mar 5, 2008, 1:10:10 AM3/5/08

to

On Tue, 4 Mar 2008 19:34:06 +0100
Andi Kleen <an...@firstfloor.org> wrote:

> On Tue, Mar 04, 2008 at 07:04:46PM +0100, Pierre Ossman wrote:
> >
> > I don't think I quite see the point. If we can't force an update, then
> > it doesn't really help us knowing if the jiffies value is stale or not.
>
> What do you mean with force an update? The system always does an
> jiffies update automatically after idle wakeup. It's just that if you add
> code to the code path that uses jiffies you have to make sure it is after the
> standard update, otherwise you'll see a stale value.
>

I still don't follow. Perhaps you can express it in pseudo code? If I have a stale value that I cannot refresh, knowing that it is stale doesn't change anything.

--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

Pierre Ossman

unread,

Mar 5, 2008, 3:50:09 AM3/5/08

to

On Wed, 5 Mar 2008 07:02:01 +0100
Pierre Ossman <drzeu...@drzeus.cx> wrote:

>
> I tried using predicted_us and last_measured_us, and those didn't work (see the #if 0 code in my last patch). And since cpuidle_get_last_residency() is part of predicted_us, I don't think it is reporting useful values.
>

I take this back. They might be working just fine. It seems I've been looking at a too small piece of the puzzle. This machine has a dual core processor, and the governors control each core independently. Unfortunately it's the power fluctuations of the entire socket that causes noise, not just each processor.

So I need to build some global algorithm instead of one per core. Ideas are welcome.

From what I can tell, disabling one core makes the noise go away. So I guess both cores need to go into C3 (or perhaps one C2 and one C3) at the same time to cause the problem. I'm not 100% sure of this as the damn noise comes and goes, but I've been running for an hour or so now with one core disabled and without my anti-noise patch.

Pavel Machek

unread,

Mar 5, 2008, 7:40:23 AM3/5/08

to

On Wed 2008-03-05 09:40:23, Pierre Ossman wrote:
> On Wed, 5 Mar 2008 07:02:01 +0100
> Pierre Ossman <drzeu...@drzeus.cx> wrote:
>
> >
> > I tried using predicted_us and last_measured_us, and those didn't work (see the #if 0 code in my last patch). And since cpuidle_get_last_residency() is part of predicted_us, I don't think it is reporting useful values.
> >
>
> I take this back. They might be working just fine. It seems I've been looking at a too small piece of the puzzle. This machine has a dual core processor, and the governors control each core independently. Unfortunately it's the power fluctuations of the entire socket that causes noise, not just each processor.
>
> So I need to build some global algorithm instead of one per core. Ideas are welcome.
>
> From what I can tell, disabling one core makes the noise go away. So I guess both cores need to go into C3 (or perhaps one C2 and one C3) at the same time to cause the problem. I'm not 100% sure of this as the damn noise comes and goes, but I've been running for an hour or so now with one core disabled and without my anti-noise patch.
>

Actually, there are more uncertaininties. Suspecting some machines
seemed to produce noise whenever they were in low-power state. Not
with fluctuations: when I forced them to low-power, it just produced
steady beep.

(Thinkpad 560X and toshiba 4030cdt, IIRC).

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Pierre Ossman

unread,

Mar 5, 2008, 8:50:16 AM3/5/08

to

On Wed, 5 Mar 2008 10:03:04 +0100
Pavel Machek <pa...@ucw.cz> wrote:

>
> Actually, there are more uncertaininties. Suspecting some machines
> seemed to produce noise whenever they were in low-power state. Not
> with fluctuations: when I forced them to low-power, it just produced
> steady beep.
>

Ouch. How do I quickly test that here?

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

Pavel Machek

unread,

Mar 5, 2008, 8:50:17 AM3/5/08

to

On Wed 2008-03-05 14:42:05, Pierre Ossman wrote:
> On Wed, 5 Mar 2008 10:03:04 +0100
> Pavel Machek <pa...@ucw.cz> wrote:
>
> >
> > Actually, there are more uncertaininties. Suspecting some machines
> > seemed to produce noise whenever they were in low-power state. Not
> > with fluctuations: when I forced them to low-power, it just produced
> > steady beep.
> >
>
> Ouch. How do I quickly test that here?

Setting cpufreq to lowest frequency reproduced it for me, but I guess
notebooks are different.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Pierre Ossman

unread,

Mar 5, 2008, 9:00:14 AM3/5/08

to

On Wed, 5 Mar 2008 14:47:38 +0100
Pavel Machek <pa...@ucw.cz> wrote:

> On Wed 2008-03-05 14:42:05, Pierre Ossman wrote:
> >
> > Ouch. How do I quickly test that here?
>
> Setting cpufreq to lowest frequency reproduced it for me, but I guess
> notebooks are different.
>

That has no effect here unfortunately.

--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

Andi Kleen

unread,

Mar 5, 2008, 10:50:12 AM3/5/08

to

On Wed, Mar 05, 2008 at 07:04:54AM +0100, Pierre Ossman wrote:
> I still don't follow. Perhaps you can express it in pseudo code? If I have a stale value that I cannot refresh, knowing that it is stale doesn't change anything.

Start: You discovered at some point where you currently have code a variable
is not updated yet.
Fact: You have some new code that runs before that point
Information: The variable is updated later eventually in the idle exit path.
Fact II: You require the variable to be updated in your new code
Possible solutions:
(1) you move your new code in a point of the idle exit path after the variable
is updated
(2) you move the code that updates the variable earlier before your code
Solution: I described the first variant which is likely easier.
How: I told you where it is updated, so that shouldn't be too difficult.
Action: Implement solution (1) or (2)
Action: Test if it works
Check: If test succeeded exit
Otherwise: Restart at Start

-Andi

Pierre Ossman

unread,

Mar 5, 2008, 12:00:11 PM3/5/08

to

On Wed, 5 Mar 2008 16:48:56 +0100
Andi Kleen <an...@firstfloor.org> wrote:

>
> Start: You discovered at some point where you currently have code a variable
> is not updated yet.
> Fact: You have some new code that runs before that point
> Information: The variable is updated later eventually in the idle exit path.
> Fact II: You require the variable to be updated in your new code

For some value of "require". The code will do what it's supposed to do anyway, just in a sub-optimal way (it will avoid a deep sleep even though it didn't need to).

> Possible solutions:
> (1) you move your new code in a point of the idle exit path after the variable
> is updated

But I don't use jiffies in the idle exit path, only the entry path.

> (2) you move the code that updates the variable earlier before your code

Which basically leaves this option. I.e. guarantee that jiffies are updated between one cpuidle reflect and the subsequent select.

> Solution: I described the first variant which is likely easier.
> How: I told you where it is updated, so that shouldn't be too difficult.
> Action: Implement solution (1) or (2)
> Action: Test if it works
> Check: If test succeeded exit
> Otherwise: Restart at Start
>

Now you're just being a smart-ass. :)

--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

Andi Kleen

unread,

Mar 5, 2008, 12:40:12 PM3/5/08

to

> But I don't use jiffies in the idle exit path, only the entry path.

On entry jiffies should be always uptodate. Only on exit there
is the small code window where it isn't.

> > Action: Test if it works
> > Check: If test succeeded exit
> > Otherwise: Restart at Start
> >
>
> Now you're just being a smart-ass. :)

Well you asked for pseudo code ...

-Andi

Jeremy Fitzhardinge

unread,

Mar 5, 2008, 12:40:16 PM3/5/08

to

Pierre Ossman wrote:
> On Fri, 29 Feb 2008 13:09:05 -0800 (PST)
> Alan Jenkins <alan-j...@tuffmail.co.uk> wrote:

>
>
>> Pierre Ossman wrote:
>>
>>> Many devices today are of a less than stellar quality, and singing

>>> transistors are a common problem. A high-pitch noise is created, caused

>>> by power fluctuations as the processor enters and leaves deep sleep at
>>> a high frequency.
>>>

>> Capacitors or transistors? The subject and the description disagree.
>>
>>
>
> That should teach me to write commit messages when I'm tired... Capacitors is of course the right answer. :)

More likely inductors, I think. The coils can vibrate against the coil
if they haven't been properly potted in something. Capacitors don't
really have anything which can "sing".

J

Pierre Ossman

unread,

Mar 5, 2008, 1:10:16 PM3/5/08

to

On Wed, 05 Mar 2008 09:32:15 -0800
Jeremy Fitzhardinge <jer...@goop.org> wrote:

>
> More likely inductors, I think. The coils can vibrate against the coil
> if they haven't been properly potted in something. Capacitors don't
> really have anything which can "sing".
>

I have no idea. Previous reports of this behaviour have always been described as "singing" capacitors and explained by some piezoelectric effect.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

Pierre Ossman

unread,

Mar 6, 2008, 3:30:28 AM3/6/08

to

On Wed, 5 Mar 2008 09:40:23 +0100
Pierre Ossman <drzeu...@drzeus.cx> wrote:

>
> So I need to build some global algorithm instead of one per core. Ideas are welcome.
>

I'm currently trying this variant. No noise yet, but the time spent in C3 seems to take a massive hit. I'll see if I can get around to some kind of power measurement (ideas for doing that in a sane way are still welcome btw).

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 78d77c5..960aa39 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -16,6 +16,11 @@

#define BREAK_FUZZ 4 /* 4 us */

+static unsigned int min_deep_sleep = 0;

+static unsigned int failed_deep = 0;
+

+static unsigned long global_last_deep;

+
struct menu_device {
int last_state_idx;

@@ -23,6 +28,8 @@ struct menu_device {

unsigned int predicted_us;
unsigned int last_measured_us;
unsigned int elapsed_us;
+
+ unsigned long last_deep;
};

static DEFINE_PER_CPU(struct menu_device, menu_devices);

@@ -50,9 +57,51 @@ static int menu_select(struct cpuidle_device *dev)

break;
if (s->exit_latency > pm_qos_requirement(PM_QOS_CPU_DMA_LATENCY))
break;
+
+ /*
+ * In order to avoid the problem of "singing capacitors",

+ * don't enter a deep sleep for short durations (2 ms seems
+ * to do the trick). This will, hopefully, keep the problem

+ * inaudible.
+ */

+ if (min_deep_sleep && (s->flags & CPUIDLE_FLAG_DEEP)) {
+ if (time_before_eq(jiffies, data->last_deep +
+ HZ * min_deep_sleep / USEC_PER_SEC))
+ break;
+

+ rmb();
+
+ if (time_before_eq(jiffies, global_last_deep +

+ HZ * min_deep_sleep / USEC_PER_SEC))
+ break;
+

+#if 0

+ if (min_deep_sleep > data->expected_us)
+ break;

+ if (min_deep_sleep > cpuidle_get_last_residency(dev))
+ break;
+#endif
+#if 0

+ if (min_deep_sleep > data->predicted_us)
+ break;

+ if (min_deep_sleep > data->last_measured_us)
+ break;
+#endif

+ }
}

data->last_state_idx = i - 1;

+
+ if (dev->states[i - 1].flags & CPUIDLE_FLAG_DEEP) {
+#if 0
+ printk("Sleep interval: %ld jiffies\n",
+ jiffies - data->last_deep);
+#endif

+ data->last_deep = jiffies;

+ global_last_deep = jiffies;
+ wmb();
+ }

+
return i - 1;
}

@@ -80,6 +129,15 @@ static void menu_reflect(struct cpuidle_device *dev)

if (!(target->flags & CPUIDLE_FLAG_TIME_VALID))
measured_us = USEC_PER_SEC / HZ;

+ if ((target->flags & CPUIDLE_FLAG_DEEP) &&

+ (cpuidle_get_last_residency(dev) < min_deep_sleep)) {
+#if 0
+ printk("F: %d / %d,%d\n", cpuidle_get_last_residency(dev),
+ data->expected_us,data->predicted_us);
+#endif
+ failed_deep++;
+ }

+
/* Predict time remaining until next break event */
if (measured_us + BREAK_FUZZ < data->expected_us - target->exit_latency) {
data->predicted_us = max(measured_us, data->last_measured_us);

@@ -102,7 +160,14 @@ static int menu_enable_device(struct cpuidle_device *dev)

{
struct menu_device *data = &per_cpu(menu_devices, dev->cpu);

+ printk("Attaching menu governor...\n");
+

memset(data, 0, sizeof(struct menu_device));
+ /*
+ * 0 might be slightly ahead of the current jiffies count, so
+ * it's a bad initial value.
+ */
+ data->last_deep = jiffies;

return 0;
}

@@ -121,6 +186,8 @@ static struct cpuidle_governor menu_governor = {
*/
static int __init init_menu(void)
{
+ global_last_deep = jiffies;
+
return cpuidle_register_governor(&menu_governor);
}

@@ -132,6 +199,11 @@ static void __exit exit_menu(void)

cpuidle_unregister_governor(&menu_governor);
}

+module_param(min_deep_sleep, uint, 0644)
+MODULE_PARM_DESC(min_deep_sleep, "min time (us) to spend in deep sleep to avoid noise")
+

+module_param(failed_deep, uint, 0644)

+
MODULE_LICENSE("GPL");
module_init(init_menu);
module_exit(exit_menu);

Pierre Ossman

unread,

Mar 9, 2008, 10:30:16 AM3/9/08

to

I'm beginning to think this is a lost cause. I've tried several variants, all without satisfactory results.

In case anyone else has any more ideas, I'll detail what I've found influences the noise:

1. C state

This is the big one. There is no noise as long as C3 is avoided (processor.max_cstate).

2. uhci_hcd driver

USB is somehow involved in this problem. Unloading the uhci_hcd driver almost entirely kills the noise on a 1000 HZ NO_HZ kernel. On a 100 HZ, no NO_HZ kernel, the effect is very small, but still there.

3. Low speed USB devices

Related, the noise goes away if I insert a USB mouse (low speed). A high-speed device does not effect the noise, neither does the two built-in low speed devices (a fingerprint reader and a bluetooth host).

4. Battery and AC

The noise increases with the battery present and also when the AC supply is removed.

5. Second core

Disabling the second core makes the noise go away. This might be a subset of 1., as I've been told that a stopped core enters C1.

Changing HZ and NO_HZ has no noticeable effect on the problem (except the odd behaviour in 2.). This is further supported by the fact that Windows also has the problem (which should behave close to 100 HZ without NO_HZ).

So for now, the only viable workaround is processor.max_cstate....

Rafael J. Wysocki

unread,

Mar 9, 2008, 2:30:23 PM3/9/08

to

On Sunday, 9 of March 2008, Pierre Ossman wrote:
> I'm beginning to think this is a lost cause. I've tried several variants, all without satisfactory results.
>
> In case anyone else has any more ideas, I'll detail what I've found influences the noise:
>
> 1. C state
>
> This is the big one. There is no noise as long as C3 is avoided (processor.max_cstate).
>
> 2. uhci_hcd driver
>
> USB is somehow involved in this problem. Unloading the uhci_hcd driver almost
> entirely kills the noise on a 1000 HZ NO_HZ kernel. On a 100 HZ, no NO_HZ
> kernel, the effect is very small, but still there.
>
> 3. Low speed USB devices
>
> Related, the noise goes away if I insert a USB mouse (low speed).
> A high-speed device does not effect the noise, neither does the two built-in
> low speed devices (a fingerprint reader and a bluetooth host).
>
> 4. Battery and AC
>
> The noise increases with the battery present and also when the AC supply is
> removed.
>
> 5. Second core
>
> Disabling the second core makes the noise go away. This might be
> a subset of 1., as I've been told that a stopped core enters C1.
>
>
> Changing HZ and NO_HZ has no noticeable effect on the problem (except
> the odd behaviour in 2.). This is further supported by the fact that Windows
> also has the problem (which should behave close to 100 HZ without NO_HZ).
>
>
> So for now, the only viable workaround is processor.max_cstate....

Well, there may be some users willing to trade some battery life for having a
quiet box. :-)

Perhaps it's worth documenting?

Thanks,
Rafael

Alan Stern

unread,

Mar 9, 2008, 3:00:12 PM3/9/08

to

On Sun, 9 Mar 2008, Pierre Ossman wrote:

> I'm beginning to think this is a lost cause. I've tried several variants, all without satisfactory results.
>
> In case anyone else has any more ideas, I'll detail what I've found influences the noise:
>
> 1. C state
>
> This is the big one. There is no noise as long as C3 is avoided (processor.max_cstate).
>
> 2. uhci_hcd driver
>
> USB is somehow involved in this problem. Unloading the uhci_hcd driver almost entirely kills the noise on a 1000 HZ NO_HZ kernel. On a 100 HZ, no NO_HZ kernel, the effect is very small, but still there.
>
> 3. Low speed USB devices
>
> Related, the noise goes away if I insert a USB mouse (low speed). A high-speed device does not effect the noise, neither does the two built-in low speed devices (a fingerprint reader and a bluetooth host).

The relation to UHCI probably has to do with ongoing DMA. The amount
of DMA varies according to whether or not the USB devices are
suspended, whether or not they have drivers, and the speed at which
they run.

You can test whether suspending the onboard USB devices changes
anything. See Documentation/usb/power-management.txt.

Alan Stern

Henrique de Moraes Holschuh

unread,

Mar 9, 2008, 3:40:13 PM3/9/08

to

On Sun, 09 Mar 2008, Pierre Ossman wrote:
> I'm beginning to think this is a lost cause. I've tried several variants, all without satisfactory results.

THIS will work:
http://www.nordichardware.com/image3.php?id=1446

ThinkPad owners have known this for a while...

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh

Pierre Ossman

unread,

Mar 9, 2008, 4:20:10 PM3/9/08

to

On Sun, 9 Mar 2008 16:30:09 -0300
Henrique de Moraes Holschuh <h...@hmh.eng.br> wrote:

> On Sun, 09 Mar 2008, Pierre Ossman wrote:
> > I'm beginning to think this is a lost cause. I've tried several variants, all without satisfactory results.
>
> THIS will work:
> http://www.nordichardware.com/image3.php?id=1446
>
> ThinkPad owners have known this for a while...
>

Odd. The thinkpad sites I've been looking at all agree that glue doesn't help at all. Do you have some more info about that "fix"?

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

Henrique de Moraes Holschuh

unread,

Mar 9, 2008, 4:50:15 PM3/9/08

to

On Sun, 09 Mar 2008, Pierre Ossman wrote:
> On Sun, 9 Mar 2008 16:30:09 -0300
> Henrique de Moraes Holschuh <h...@hmh.eng.br> wrote:
> > On Sun, 09 Mar 2008, Pierre Ossman wrote:
> > > I'm beginning to think this is a lost cause. I've tried several variants, all without satisfactory results.
> >
> > THIS will work:
> > http://www.nordichardware.com/image3.php?id=1446
> >
> > ThinkPad owners have known this for a while...
>
> Odd. The thinkpad sites I've been looking at all agree that glue doesn't help at all. Do you have some more info about that "fix"?

I asked for some pictures for thinkwiki, but nobody sent them in yet :(

You also need to know what glue to use (it needs to be one that is *not* too
hard when cured or it will just propagate the high frequencies instead of
dampening them), and where to apply it (it may be the chip, the capacitor,
or one of the inductors, etc).

As soon as the warranty on my T43 is over, I will do some heavy research on
the subject and hardware-mod it with Arcric ice, a PLL/ICH heatsink and
noise dampening.

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh

Henrique de Moraes Holschuh

unread,

Mar 9, 2008, 5:00:09 PM3/9/08

to

On Sun, 09 Mar 2008, Henrique de Moraes Holschuh wrote:
> As soon as the warranty on my T43 is over, I will do some heavy research on
> the subject and hardware-mod it with Arcric ice, a PLL/ICH heatsink and
> noise dampening.

I mean Arctic Silver, the thermal-transfer paste (must be the heat here
playing tricks with my mind :p).

It is better than what Lenovo used on the T43 (but not by much, unless you
did get a defective part to begin with).

Pavel Machek

unread,

Mar 10, 2008, 7:10:08 AM3/10/08

to

On Sun 2008-03-09 15:16:59, Pierre Ossman wrote:
> I'm beginning to think this is a lost cause. I've tried several variants, all without satisfactory results.
>
> In case anyone else has any more ideas, I'll detail what I've found influences the noise:
>
> 1. C state
>
> This is the big one. There is no noise as long as C3 is avoided (processor.max_cstate).
>
> 2. uhci_hcd driver
>
> USB is somehow involved in this problem. Unloading the uhci_hcd driver almost entirely kills the noise on a 1000 HZ NO_HZ kernel. On a 100 HZ, no NO_HZ kernel, the effect is very small, but still there.

USB keeps processor out of C3 in many cases.

> 4. Battery and AC
>
> The noise increases with the battery present and also when the AC supply is removed.
>

On battery, some machines swap C3 for "secret" C4, which is slower but
saves power.
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Pierre Ossman

unread,

Mar 10, 2008, 9:00:12 AM3/10/08

to

On Mon, 10 Mar 2008 11:00:08 +0100
Pavel Machek <pa...@ucw.cz> wrote:

> On Sun 2008-03-09 15:16:59, Pierre Ossman wrote:
> > 2. uhci_hcd driver
> >
> > USB is somehow involved in this problem. Unloading the uhci_hcd driver almost entirely kills the noise on a 1000 HZ NO_HZ kernel. On a 100 HZ, no NO_HZ kernel, the effect is very small, but still there.
>
> USB keeps processor out of C3 in many cases.
>

I figured that was the case. But I did not see any difference in powertop.

> > 4. Battery and AC
> >
> > The noise increases with the battery present and also when the AC supply is removed.
> >
>
> On battery, some machines swap C3 for "secret" C4, which is slower but
> saves power.

I guess the only way to find out is to disassemble the DSDT. Exposing myself to such concentrations of ACPI is not an appealing project. :/

--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

Andi Kleen

unread,

Mar 10, 2008, 9:10:16 AM3/10/08

to

> I guess the only way to find out is to disassemble the DSDT. Exposing myself to such concentrations of ACPI is not an appealing project. :/

Normally such things are not visible in the DSDT, but hidden in SMM traps.

-Andi

Pierre Ossman

unread,

Mar 10, 2008, 9:40:10 AM3/10/08

to

On Mon, 10 Mar 2008 14:04:15 +0100
Andi Kleen <an...@firstfloor.org> wrote:

>
> > I guess the only way to find out is to disassemble the DSDT. Exposing myself to such concentrations of ACPI is not an appealing project. :/
>
> Normally such things are not visible in the DSDT, but hidden in SMM traps.
>

Delightful. There is something about the lower layers of PC hardware that just makes me want to cry...

--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

Pierre Ossman

unread,

Mar 11, 2008, 4:00:22 AM3/11/08

to

I had a sudden surge of inspiration and decided to follow up on this
one a bit:

>
> 5. Second core
>
> Disabling the second core makes the noise go away. This might be a subset of 1., as I've been told that a stopped core enters C1.
>

I have now found a very hacky workaround that is slightly better than
disabling C3 altogether; making C3 exclusive to one core at a time. It
seems to kill the noise and the system now spends 50% in C3, instead of
0%.

I read somewhere that the Intel Duo:s have an extra low power mode that
is entered if both cores are suspended at the same time (disabling the
shared cache or something like that I guess). So perhaps that mode is
the culprit. I'm hoping there's a way to disable just that feature. Do
any of you know a way? Venkatesh perhaps?

Here's the patch I'm currently running at least:

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 78d77c5..9a859fb 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -16,6 +16,8 @@

#define BREAK_FUZZ 4 /* 4 us */

+static atomic_t cur_deep_cpu = ATOMIC_INIT(-1);

+
struct menu_device {
int last_state_idx;

@@ -50,6 +52,13 @@ static int menu_select(struct cpuidle_device *dev)

break;
if (s->exit_latency > pm_qos_requirement(PM_QOS_CPU_DMA_LATENCY))
break;
+

+ if (s->flags & CPUIDLE_FLAG_DEEP) {
+ rmb();
+ if (atomic_cmpxchg(&cur_deep_cpu, -1, dev->cpu) != -1)
+ break;
+ wmb();

+ }
}

data->last_state_idx = i - 1;

@@ -80,6 +89,11 @@ static void menu_reflect(struct cpuidle_device *dev)

if (!(target->flags & CPUIDLE_FLAG_TIME_VALID))
measured_us = USEC_PER_SEC / HZ;

+ if (atomic_read(&cur_deep_cpu) == dev->cpu) {
+ atomic_set(&cur_deep_cpu, -1);
+ wmb();

+ }
+
/* Predict time remaining until next break event */
if (measured_us + BREAK_FUZZ < data->expected_us - target->exit_latency) {
data->predicted_us = max(measured_us, data->last_measured_us);

Andi Kleen

unread,

Mar 11, 2008, 6:50:09 AM3/11/08

to

> I have now found a very hacky workaround that is slightly better than
> disabling C3 altogether; making C3 exclusive to one core at a time. It
> seems to kill the noise and the system now spends 50% in C3, instead of
> 0%.

I suspect it won't do much of C3 if only one core is idle this way.
Most likely you just disabled C3 this way in a way invisible to software.

-Andi

Pierre Ossman

unread,

Mar 11, 2008, 11:30:11 AM3/11/08

to

On Tue, 11 Mar 2008 11:48:22 +0100
Andi Kleen <an...@firstfloor.org> wrote:

> > I have now found a very hacky workaround that is slightly better than
> > disabling C3 altogether; making C3 exclusive to one core at a time. It
> > seems to kill the noise and the system now spends 50% in C3, instead of
> > 0%.
>
> I suspect it won't do much of C3 if only one core is idle this way.
> Most likely you just disabled C3 this way in a way invisible to software.
>

What a letdown... I'll pull out the old watt meter again and see if I can confirm that.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

Pierre Ossman

unread,

Mar 11, 2008, 1:40:15 PM3/11/08

to

On Tue, 11 Mar 2008 16:20:26 +0100
Pierre Ossman <drzeu...@drzeus.cx> wrote:

> On Tue, 11 Mar 2008 11:48:22 +0100
> Andi Kleen <an...@firstfloor.org> wrote:
>
> > > I have now found a very hacky workaround that is slightly better than
> > > disabling C3 altogether; making C3 exclusive to one core at a time. It
> > > seems to kill the noise and the system now spends 50% in C3, instead of
> > > 0%.
> >
> > I suspect it won't do much of C3 if only one core is idle this way.
> > Most likely you just disabled C3 this way in a way invisible to software.
> >
>
> What a letdown... I'll pull out the old watt meter again and see if I can confirm that.
>

It seems that you are correct Andi. Power usage is in line with C3 completely disabled. Bummer :/

Len Brown

unread,

Mar 12, 2008, 3:20:09 PM3/12/08

to

On Tuesday 11 March 2008, Andi Kleen wrote:
> > I have now found a very hacky workaround that is slightly better than
> > disabling C3 altogether; making C3 exclusive to one core at a time. It
> > seems to kill the noise and the system now spends 50% in C3, instead of
> > 0%.
>
> I suspect it won't do much of C3 if only one core is idle this way.
> Most likely you just disabled C3 this way in a way invisible to software.

Andi is correct.
C-states are coordinated in hardware.
ie. if both cores enter core-C3, then the hardware allows the package
to enter package C3.

Package C-states are a big deal -- that is where most of the power savings is.

So preventing all the cores from entering C3 at the same time is
the opposite of what we want to be doing.

-Len

Len Brown

unread,

Mar 12, 2008, 3:20:10 PM3/12/08

to

> > USB keeps processor out of C3 in many cases.
>
> I figured that was the case. But I did not see any difference in powertop.

Modern Intel mobile processors have a feature called "C2 popup"
that allows the processor to retire DMA from C3 without
breaking into C0. Instead the processor pops up to C2
where the cache snoop can allow the DMA to retire --
then it returns to C3, all transparent to software.

That is why powertop does not see it.

> > > 4. Battery and AC
> > >
> > > The noise increases with the battery present and also when the AC supply is removed.
> > >
> >
> > On battery, some machines swap C3 for "secret" C4, which is slower but
> > saves power.

ak> Normally such things are not visible in the DSDT, but hidden in SMM traps.

no, this mechanism is totally visible to the OS via a _CST re-evaluation on AC/DC transition.

This is very commmon, as the reference BIOS code does it.
It isn't a secret. There are two common techniques -- either
re-define C3 to enter hardware C4, or simply add C4 as 4th C-state.

> I guess the only way to find out is to disassemble the DSDT. Exposing myself to such concentrations of ACPI is not an appealing project. :/

the latest cpuidle code actually publishes the details of the C-states being used.

$ pwd
/sys/devices/system/cpu/cpu0/cpuidle/state3
$ grep . *
desc:ACPI FFH INTEL MWAIT 0x20
latency:17
name:C3
power:250
time:1862590932
usage:9028970

You'll see "desc" change if ACPI pulls a _CST change on you.

> Disabling the second core makes the noise go away. This might be a subset of 1., as I've been told that a stopped core enters C1.

if you disable it at run-time, Linux puts it in C1.
If you never boot it in the first place (eg. maxcpusp=1), the BIOS leaves it in the deepest available C-sate.

The former will prevent the package from entering a deep package C-state,
as c-states are coordinated in hardware. The later will allow the package
to enter whatever C-state the running core chooses.

> Changing HZ and NO_HZ has no noticeable effect on the problem (except the odd behaviour in 2.).
> This is further supported by the fact that Windows also has the problem (which should behave close to 100 HZ without NO_HZ).
>
>
> So for now, the only viable workaround is processor.max_cstate....

yup, that's why we put it in. Is it insufficient?

-Len

Len Brown

unread,

Mar 12, 2008, 4:40:18 PM3/12/08

to

> I'll see if I can get around to some kind of power measurement (ideas for doing that in a sane way are still welcome btw).

Run BLTK.

http://www.lesswatts.org/projects/bltk/

Pavel Machek

unread,

Mar 13, 2008, 6:20:16 AM3/13/08

to

Hi!

> > > USB keeps processor out of C3 in many cases.
> >
> > I figured that was the case. But I did not see any difference in powertop.
>
> Modern Intel mobile processors have a feature called "C2 popup"
> that allows the processor to retire DMA from C3 without
> breaking into C0. Instead the processor pops up to C2
> where the cache snoop can allow the DMA to retire --
> then it returns to C3, all transparent to software.

Does that mean we should go to C3 on modern intels, even with
busmaster going on, so that cpu can keep going C2..C3..C2 as needed?

> if you disable it at run-time, Linux puts it in C1. If you never
> boot it in the first place (eg. maxcpusp=1), the BIOS leaves it in
> the deepest available C-sate.

It would be nice to fix this BTW. offlined cpu in C1 eats too much
power, and makes measurements on the second core impossible...

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Andi Kleen

unread,

Mar 13, 2008, 6:50:07 AM3/13/08

to

On Thu, Mar 13, 2008 at 09:10:48AM +0100, Pavel Machek wrote:
> Hi!
>
> > > > USB keeps processor out of C3 in many cases.
> > >
> > > I figured that was the case. But I did not see any difference in powertop.
> >
> > Modern Intel mobile processors have a feature called "C2 popup"
> > that allows the processor to retire DMA from C3 without
> > breaking into C0. Instead the processor pops up to C2
> > where the cache snoop can allow the DMA to retire --
> > then it returns to C3, all transparent to software.
>
> Does that mean we should go to C3 on modern intels, even with
> busmaster going on, so that cpu can keep going C2..C3..C2 as needed?

C3 is still more expensive power wise to enter, so entering C3 just
to let it immediately go back to C2 for bus mastering would be likely
still a loss over staying at C2.

-Andi

Pierre Ossman

unread,

Mar 13, 2008, 12:40:09 PM3/13/08

to

On Wed, 12 Mar 2008 15:11:17 -0400
Len Brown <le...@kernel.org> wrote:

> > I guess the only way to find out is to disassemble the DSDT. Exposing myself to such concentrations of ACPI is not an appealing project. :/
>
> the latest cpuidle code actually publishes the details of the C-states being used.
>
> $ pwd
> /sys/devices/system/cpu/cpu0/cpuidle/state3
> $ grep . *
> desc:ACPI FFH INTEL MWAIT 0x20
> latency:17
> name:C3
> power:250
> time:1862590932
> usage:9028970
>
> You'll see "desc" change if ACPI pulls a _CST change on you.
>

It does not. But my C3 desc looks like this:

state3/desc:ACPI FFH INTEL MWAIT 0x50

What's the meaning of the last number?

I also have different latency and power:

state3/latency:162
state3/power:100

> > Disabling the second core makes the noise go away. This might be a subset of 1., as I've been told that a stopped core enters C1.
>
> if you disable it at run-time, Linux puts it in C1.
> If you never boot it in the first place (eg. maxcpusp=1), the BIOS leaves it in the deepest available C-sate.
>

Ah. Didn't know that. I've confirmed this as the noise is there when booting with maxcpus.

> >
> > So for now, the only viable workaround is processor.max_cstate....
>
> yup, that's why we put it in. Is it insufficient?
>

For some value of insufficient. It is sub-optimal. I'm hoping there is a way to enter C3 in a pattern that avoids the noise and still gives a reduced power usage.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

Pallipadi, Venkatesh

unread,

Mar 13, 2008, 1:10:17 PM3/13/08

to

>-----Original Message-----
>From: linux-ker...@vger.kernel.org
>[mailto:linux-ker...@vger.kernel.org] On Behalf Of Pierre Ossman
>Sent: Thursday, March 13, 2008 9:35 AM
>To: Len Brown
>Cc: linu...@lists.linux-foundation.org; Pavel Machek; LKML;
>Adam Belay; Andi Kleen; Lee Revell
>Subject: Re: [linux-pm] [PATCH] cpuidle: avoid singing capacitors
>
>On Wed, 12 Mar 2008 15:11:17 -0400
>Len Brown <le...@kernel.org> wrote:
>
>> >
>> > So for now, the only viable workaround is processor.max_cstate....
>>
>> yup, that's why we put it in. Is it insufficient?
>>
>
>For some value of insufficient. It is sub-optimal. I'm hoping
>there is a way to enter C3 in a pattern that avoids the noise
>and still gives a reduced power usage.
>

Can you try using /dev/cpu_dma_latency from an user space application.
Refer Documentation/pm_qos_interface.txt for usage. With this interface
you can change menu governor behavior from userspace, by dynamically
allowing it to use C3 or not. I am thinking that something like allowing
C3 for few seconds and blocking it for few seconds may help.

Thanks,
Venki

Pierre Ossman

unread,

Mar 13, 2008, 1:50:08 PM3/13/08

to

On Thu, 13 Mar 2008 09:47:30 -0700
"Pallipadi, Venkatesh" <venkatesh...@intel.com> wrote:

>
> Can you try using /dev/cpu_dma_latency from an user space application.
> Refer Documentation/pm_qos_interface.txt for usage. With this interface
> you can change menu governor behavior from userspace, by dynamically
> allowing it to use C3 or not. I am thinking that something like allowing
> C3 for few seconds and blocking it for few seconds may help.
>

I don't quite follow. Is the theory that you get a window of a few seconds where the system stops making a noise, even as it toggles C3?

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org

Pierre Ossman

unread,

Mar 13, 2008, 2:00:34 PM3/13/08

to

On Thu, 13 Mar 2008 17:34:37 +0100
Pierre Ossman <drzeu...@drzeus.cx> wrote:

>
> It does not. But my C3 desc looks like this:
>
> state3/desc:ACPI FFH INTEL MWAIT 0x50
>
> What's the meaning of the last number?
>

I've surfed Intel's web site for a bit, and found the MWAIT instruction in the arch docs. Unfortunately, it does not explain the value 0x50. According to the docs, that would be C6, which is not listed in the regular set of states for the Core 2. I did find this however:

http://www.trustedreviews.com/cpu-memory/review/2007/03/30/Intel-Processor-Roadmap-Penryn-Nehalem-and-the-Future/p2

It says that C6 is some ultra low power mode where more or less the entire chip is powered down.

Len, Venkatesh, do you have some insight? And is it harmless to move to more power hungry modes (i.e. C4 or C3).

Len Brown

unread,

Mar 14, 2008, 12:20:08 AM3/14/08

to

On Thursday 13 March 2008, Andi Kleen wrote:
> On Thu, Mar 13, 2008 at 09:10:48AM +0100, Pavel Machek wrote:
> > Hi!
> >
> > > > > USB keeps processor out of C3 in many cases.
> > > >
> > > > I figured that was the case. But I did not see any difference in powertop.
> > >
> > > Modern Intel mobile processors have a feature called "C2 popup"
> > > that allows the processor to retire DMA from C3 without
> > > breaking into C0. Instead the processor pops up to C2
> > > where the cache snoop can allow the DMA to retire --
> > > then it returns to C3, all transparent to software.
> >
> > Does that mean we should go to C3 on modern intels, even with
> > busmaster going on, so that cpu can keep going C2..C3..C2 as needed?

That decision has already been made for us.
BM_STS has been made a no-op on recent processors.
It reports bus activity only for a small sub-set of
south-bridge devices. Otherwise it tells us there
is none and that we should proceed into C3.

> C3 is still more expensive power wise to enter, so entering C3 just
> to let it immediately go back to C2 for bus mastering would be likely
> still a loss over staying at C2.

The newer the processor, the less exposed we area to this scenario.

cheers,
-Len

Pierre Ossman

unread,

Mar 14, 2008, 3:50:16 PM3/14/08

to

On Thu, 13 Mar 2008 17:34:37 +0100
Pierre Ossman <drzeu...@drzeus.cx> wrote:

> On Wed, 12 Mar 2008 15:11:17 -0400
> Len Brown <le...@kernel.org> wrote:
>
> >
> > You'll see "desc" change if ACPI pulls a _CST change on you.
> >
>
> It does not. But my C3 desc looks like this:
>
> state3/desc:ACPI FFH INTEL MWAIT 0x50
>

I must have done something wrong. I now see a switch between C6 and C3
when I play with the AC cord.

On that theme, I've tested fiddling with the real C-states. I've added
a new max_hwcstate that makes ACPI downgrade MWAIT hints. I also made
some odd discoveries:

C3: More or less completely silent (I haven't tested it in a really
quiet environment yet).
C4: Constant noise
C5: Constant noise
C6: Intermittent noise

When I say constant, I mean that the noise is not generated as a result
of switching between modes (to any extent I can see at least). The
average time spent in C3 (as reported by Powertop) is over 200 ms. So
that would give a frequency of around 5 Hz, not a consistent tone of
several kHz.

Here's said patch. Please comment as I hope this can be merged:

diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c
index 8ca3557..389ea8b 100644
--- a/arch/x86/kernel/acpi/cstate.c
+++ b/arch/x86/kernel/acpi/cstate.c
@@ -47,6 +47,9 @@ EXPORT_SYMBOL(acpi_processor_power_init_bm_check);

/* The code below handles cstate entry with monitor-mwait pair on Intel*/

+static unsigned int max_hwcstate __read_mostly = -1;
+module_param(max_hwcstate, uint, 0644);
+
struct cstate_entry {
struct {
unsigned int eax;
@@ -80,6 +83,7 @@ int acpi_processor_ffh_cstate_probe(unsigned int cpu,
unsigned int edx_part;
unsigned int cstate_type; /* C-state type and not ACPI C-state type */
unsigned int num_cstate_subtype;
+ unsigned int hint;

if (!cpu_cstate_entry || c->cpuid_level < CPUID_MWAIT_LEAF )
return -1;
@@ -100,16 +104,40 @@ int acpi_processor_ffh_cstate_probe(unsigned int cpu,
cpuid(CPUID_MWAIT_LEAF, &eax, &ebx, &ecx, &edx);

/* Check whether this particular cx_type (in CST) is supported or not */
- cstate_type = (cx->address >> MWAIT_SUBSTATE_SIZE) + 1;
- edx_part = edx >> (cstate_type * MWAIT_SUBSTATE_SIZE);
- num_cstate_subtype = edx_part & MWAIT_SUBSTATE_MASK;
+ hint = cx->address;
+ for (;;) {
+ /* Compute main C-state */
+ cstate_type = (hint >> MWAIT_SUBSTATE_SIZE) + 1;
+
+ /* Determine number of sub-states for this C-state */
+ edx_part = edx >> (cstate_type * MWAIT_SUBSTATE_SIZE);
+ num_cstate_subtype = edx_part & MWAIT_SUBSTATE_MASK;
+
+ /* Check if it's within constraints, and supported */
+ if ((cstate_type > max_hwcstate) ||
+ (num_cstate_subtype <=
+ (hint & MWAIT_SUBSTATE_MASK))) {
+ /* Move down a C-state and drop sub-state */
+ cstate_type--;
+ hint = (cstate_type - 1) << MWAIT_SUBSTATE_SIZE;
+ /* Out of states, abort */
+ if (cstate_type == 0) {
+ retval = -1;
+ goto out;
+ }
+ } else {
+ break;
+ }
+ }

- retval = 0;
- if (num_cstate_subtype < (cx->address & MWAIT_SUBSTATE_MASK)) {
- retval = -1;
- goto out;
+ if (hint != cx->address) {
+ printk(KERN_DEBUG "ACPI: Downgrading hardware C%d to C%d\n",
+ (cx->address >> MWAIT_SUBSTATE_SIZE) + 1,
+ (hint >> MWAIT_SUBSTATE_SIZE) + 1);
}

+ retval = 0;
+
/* mwait ecx extensions INTERRUPT_BREAK should be supported for C2/C3 */
if (!(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED) ||
!(ecx & CPUID5_ECX_INTERRUPT_BREAK)) {
@@ -119,15 +147,15 @@ int acpi_processor_ffh_cstate_probe(unsigned int cpu,
percpu_entry->states[cx->index].ecx = MWAIT_ECX_INTERRUPT_BREAK;

/* Use the hint in CST */
- percpu_entry->states[cx->index].eax = cx->address;
+ percpu_entry->states[cx->index].eax = hint;

if (!mwait_supported[cstate_type]) {
mwait_supported[cstate_type] = 1;
printk(KERN_DEBUG "Monitor-Mwait will be used to enter C-%d "
"state\n", cx->type);
}
- snprintf(cx->desc, ACPI_CX_DESC_LEN, "ACPI FFH INTEL MWAIT 0x%x",
- cx->address);
+ snprintf(cx->desc, ACPI_CX_DESC_LEN, "ACPI FFH INTEL MWAIT 0x%x (0x%x)",
+ hint, cx->address);

out:
set_cpus_allowed(current, saved_mask);

Pallipadi, Venkatesh

unread,

Mar 14, 2008, 5:20:12 PM3/14/08

to

>-----Original Message-----
>From: Pierre Ossman [mailto:drzeu...@drzeus.cx]
>Sent: Friday, March 14, 2008 12:41 PM
>To: Len Brown; Pallipadi, Venkatesh
>Cc: linu...@lists.linux-foundation.org; Pavel Machek; LKML;
>Adam Belay; Andi Kleen; Lee Revell
>Subject: Re: [linux-pm] [PATCH] cpuidle: avoid singing capacitors
>

>On Thu, 13 Mar 2008 17:34:37 +0100
>Pierre Ossman <drzeu...@drzeus.cx> wrote:
>
>> On Wed, 12 Mar 2008 15:11:17 -0400
>> Len Brown <le...@kernel.org> wrote:
>>
>> >
>> > You'll see "desc" change if ACPI pulls a _CST change on you.
>> >
>>
>> It does not. But my C3 desc looks like this:
>>
>> state3/desc:ACPI FFH INTEL MWAIT 0x50
>>
>
>I must have done something wrong. I now see a switch between C6 and C3
>when I play with the AC cord.
>

BIOSes do that on some platforms, exposing deeper C-states on battery.

Why go only one C-state down. Why not directly drop to max_hwcstate? We
don't need to loop through that way.

There are few other concerns which make me feel that the patch will be
not as simple as this.
1) BIOS may already be exporting the lower C-states as separate states.
In which case we just have to ignore this deep C-state and return. I
mean, on your system if BIOS exports C1, C3 and C6 hardware C-states and
you give max_hwcstate as C3, then we don't want to have C1, C3 and C3 in
our list.

2) On same lines the other information in ACPI like (low power of 100
and higher latency for C6 on your system) corresponds to hardware C6
state. If we change the hardware C-state here to C3, then continue to
use latency info from ACPI for hw C6, that may lead to inefficient state
selection in the governor. Also, ther are assumptions related DMA,
lapic, TSC etc in upper level ACPI based on "ACPI C-state" and changing
underlying hardware C-state to C1 for example will change some of those
behavior.

Thanks,
Venki

Pierre Ossman

unread,

Mar 14, 2008, 8:50:09 PM3/14/08

to

On Fri, 14 Mar 2008 14:15:46 -0700
"Pallipadi, Venkatesh" <venkatesh...@intel.com> wrote:

>
> Why go only one C-state down. Why not directly drop to max_hwcstate? We
> don't need to loop through that way.
>

Good point. But we can't just jump there and be done with it. That state might not be available (then again, we could require the user to put in a valid max_hwcstate, but that isn't as user friendly).

> There are few other concerns which make me feel that the patch will be
> not as simple as this.
> 1) BIOS may already be exporting the lower C-states as separate states.
> In which case we just have to ignore this deep C-state and return. I
> mean, on your system if BIOS exports C1, C3 and C6 hardware C-states and
> you give max_hwcstate as C3, then we don't want to have C1, C3 and C3 in
> our list.
>

Indeed. But the current system doesn't easily allows us to handle that. Any ideas?

> 2) On same lines the other information in ACPI like (low power of 100
> and higher latency for C6 on your system) corresponds to hardware C6
> state. If we change the hardware C-state here to C3, then continue to
> use latency info from ACPI for hw C6, that may lead to inefficient state
> selection in the governor. Also, ther are assumptions related DMA,
> lapic, TSC etc in upper level ACPI based on "ACPI C-state" and changing
> underlying hardware C-state to C1 for example will change some of those
> behavior.
>

It is still way better going to C3 less than possible than not at all (i.e. the previous workaround of processor.max_cstate). As far as I can tell from the documentation, every C-state includes all of the negative side effects of the ones preceding it. So it should just be a matter of sub-optimal selection by the governor. And I don't see how we can fix that without a big table for each processor and possibly chipset.

Rgds
--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org