[PATCH] i386: Selectable Frequency of the Timer Interrupt.


christoph

May 16, 2005, 4:46:31 PM
to linux-...@vger.kernel.org, sh...@scalex86.org, ak...@osdl.org
Make the timer frequency selectable. The timer interrupt may cause bus
and memory contention in large NUMA systems since the interrupt occurs
on each processor HZ times per second.

Signed-off-by: Christoph Lameter <chri...@scale86.org>
Signed-off-by: Shai Fultheim <sh...@scalex86.org>

Index: linux-2.6.11/arch/i386/Kconfig
===================================================================
--- linux-2.6.11.orig/arch/i386/Kconfig 2005-05-16 12:07:31.000000000 -0700
+++ linux-2.6.11/arch/i386/Kconfig 2005-05-16 12:39:48.000000000 -0700
@@ -939,6 +939,20 @@ config SECCOMP

If unsure, say Y. Only embedded should say N here.

+config HZ
+ int "Frequency of the Timer Interrupt (1000 or 100)"
+ range 100 1000
+ default 1000
+ help
+ Allows the configuration of the timer frequency. It is customary
+ to have the timer interrupt run at 1000 HZ but 100 HZ may be more
+ beneficial for servers and NUMA systems that do not need to have
+ a fast response for user interaction and that may experience bus
+ contention and cacheline bounces as a result of timer interrupts.
+ Note that the timer interrupt occurs on each processor in an SMP
+ environment leading to NR_CPUS * HZ number of timer interrupts
+ per second.
+
endmenu


Index: linux-2.6.11/include/asm-i386/param.h
===================================================================
--- linux-2.6.11.orig/include/asm-i386/param.h 2005-05-16 12:07:25.000000000 -0700
+++ linux-2.6.11/include/asm-i386/param.h 2005-05-16 12:09:04.000000000 -0700
@@ -2,7 +2,7 @@
#define _ASMi386_PARAM_H

#ifdef __KERNEL__
-# define HZ 1000 /* Internal kernel timer frequency */
+# define HZ CONFIG_HZ /* Internal kernel timer frequency */
# define USER_HZ 100 /* .. some user interfaces are in "ticks" */
# define CLOCKS_PER_SEC (USER_HZ) /* like times() */
#endif

Lee Revell

May 16, 2005, 4:53:17 PM
to christoph, linux-...@vger.kernel.org, sh...@scalex86.org, ak...@osdl.org
On Mon, 2005-05-16 at 12:45 -0700, christoph wrote:
> Make the timer frequency selectable. The timer interrupt may cause bus
> and memory contention in large NUMA systems since the interrupt occurs
> on each processor HZ times per second.

Isn't there already a patch in the -ac kernel that allows HZ to be
selected at runtime?

Lee

Andrew Morton

May 16, 2005, 6:13:59 PM
to christoph, linux-...@vger.kernel.org, sh...@scalex86.org, Linus Torvalds
christoph <chri...@scalex86.org> wrote:
>
> Make the timer frequency selectable. The timer interrupt may cause bus
> and memory contention in large NUMA systems since the interrupt occurs
> on each processor HZ times per second.
>
> Signed-off-by: Christoph Lameter <chri...@scale86.org>
> Signed-off-by: Shai Fultheim <sh...@scalex86.org>
>
> Index: linux-2.6.11/arch/i386/Kconfig
> ===================================================================
> --- linux-2.6.11.orig/arch/i386/Kconfig 2005-05-16 12:07:31.000000000 -0700
> +++ linux-2.6.11/arch/i386/Kconfig 2005-05-16 12:39:48.000000000 -0700
> @@ -939,6 +939,20 @@ config SECCOMP
>
> If unsure, say Y. Only embedded should say N here.
>
> +config HZ
> + int "Frequency of the Timer Interrupt (1000 or 100)"
> + range 100 1000

Linus spat this patch back a couple of years ago. Last time we discussed
it, a year ago, he said

On Fri, 21 May 2004, Andrew Morton wrote:
>
> Len, do you have any numbers on this? Do you think we need to address
> this? If so, is there any sane alternative to CONFIG_HZ?

100Hz is too little for a number of users, and yes, 1kHz is too high - I
selected it partly because it made it oh-so-much-more-obvious when
some pieces weren't converted.

1kHz is also good in that it makes it easy to convert both to USER_HZ and
to ms/ns. But maybe something like 250Hz would be better - still high
enough that things like multimedia (which really wants higher frequencies
in order to be able to sleep for fractional video-frames) should be happy,
low enough that we use less CPU.

(The issue being that the latency of entering ACPI low-power mode is of the
order of one millisecond on some machines, so HZ=1000 whacks the battery).

So yes, the time has come around again to work out what we're going to do
about this. I'd be a bit worried about allowing users to set HZ=724,
simply because nobody tests with that, and it might expose odd timing
relationships and unfortunate arithmetic rounding artifacts. So if we're
going to do this thing it might be better to just offer 100, 250 and 1000.
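
To illustrate the kind of rounding artifact an odd value such as HZ=724 could expose, here is a minimal user-space sketch in C. It is not the kernel's timekeeping code: the helper names `tick_usec` and `drift_usec_per_sec` are made up for this example, and the only assumption is that a tick length is kept as a whole number of microseconds.

```c
/* Illustrative only: hypothetical helpers, not kernel functions. */

/* Length of one tick in whole microseconds; integer division truncates. */
static unsigned long tick_usec(unsigned long hz)
{
    return 1000000UL / hz;
}

/*
 * Microseconds per second that go unaccounted for when hz ticks of
 * tick_usec(hz) each are added up. Zero means hz divides one second evenly.
 */
static unsigned long drift_usec_per_sec(unsigned long hz)
{
    return 1000000UL - tick_usec(hz) * hz;
}
```

Under that assumption, 100, 250 and 1000 all divide one second evenly, while 724 yields a 1381 us tick and leaves 156 us per second unaccounted for.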

christoph

May 16, 2005, 9:55:53 PM
to Lee Revell, linux-...@vger.kernel.org, sh...@scalex86.org, ak...@osdl.org
On Mon, 16 May 2005, Lee Revell wrote:

> On Mon, 2005-05-16 at 12:45 -0700, christoph wrote:
> > Make the timer frequency selectable. The timer interrupt may cause bus
> > and memory contention in large NUMA systems since the interrupt occurs
> > on each processor HZ times per second.
>
> Isn't there already a patch in the -ac kernel that allows HZ to be
> selected at runtime?

Runtime? That seems to be a bad idea. It would be better to rewrite the
timer subsystem to be able to work tickless.

Christoph Lameter

May 16, 2005, 10:40:49 PM
to Andrew Morton, linux-...@vger.kernel.org, sh...@scalex86.org, a...@suse.de, Linus Torvalds
On Mon, 16 May 2005, Andrew Morton wrote:

> So yes, the time has come around again to work out what we're going to do
> about this. I'd be a bit worried about allowing users to set HZ=724,
> simply because nobody tests with that, and it might expose odd timing
> relationships and unfortunate arithmetic rounding artifacts. So if we're
> going to do this thing it might be better to just offer 100, 250 and 1000.

Ok. Here is the patch allowing 100, 250 and 1000 HZ for i386 and x86_64:

-----

Make the timer frequency selectable. The timer interrupt may cause bus
and memory contention in large NUMA systems since the interrupt occurs
on each processor HZ times per second.

Signed-off-by: Christoph Lameter <chri...@lameter.com>
Signed-off-by: Shai Fultheim <sh...@scalex86.org>

Index: linux-2.6.12-rc4/arch/i386/Kconfig
===================================================================
--- linux-2.6.12-rc4.orig/arch/i386/Kconfig 2005-05-17 02:19:55.000000000 +0000
+++ linux-2.6.12-rc4/arch/i386/Kconfig 2005-05-17 02:27:12.000000000 +0000
@@ -1133,6 +1133,20 @@
a work-around for a number of buggy BIOSes. Switch this option on if
your computer crashes instead of powering off properly.

+config HZ
+ int "Frequency of the Timer Interrupt (100, 250 or 1000 per second)"
+ range 100 1000
+ default 1000
+ help
+ Allows the configuration of the timer frequency. It is customary
+ to have the timer interrupt run at 1000 HZ but 100 HZ may be more
+ beneficial for servers and NUMA systems that do not need to have
+ a fast response for user interaction and that may experience bus
+ contention and cacheline bounces as a result of timer interrupts.
+ Note that the timer interrupt occurs on each processor in an SMP
+ environment leading to NR_CPUS * HZ number of timer interrupts
+ per second.
+
endmenu

source "arch/i386/kernel/cpu/cpufreq/Kconfig"
Index: linux-2.6.12-rc4/include/asm-i386/param.h
===================================================================
--- linux-2.6.12-rc4.orig/include/asm-i386/param.h 2005-05-17 02:15:57.000000000 +0000
+++ linux-2.6.12-rc4/include/asm-i386/param.h 2005-05-17 02:30:22.000000000 +0000
@@ -1,8 +1,19 @@
+#include <linux/config.h>
+
#ifndef _ASMi386_PARAM_H
#define _ASMi386_PARAM_H

#ifdef __KERNEL__
-# define HZ 1000 /* Internal kernel timer frequency */
+#if CONFIG_HZ == 1000
+#define HZ 1000 /* Internal kernel timer frequency */
+#elif CONFIG_HZ == 250
+#define CONFIG_HZ 250
+#elif CONFIG_HZ == 100
+#define HZ 100
+#else
+#error "Invalid Timer Interrupt Frequency"
+#endif
+
# define USER_HZ 100 /* .. some user interfaces are in "ticks" */
# define CLOCKS_PER_SEC (USER_HZ) /* like times() */
#endif

Index: linux-2.6.12-rc4/arch/x86_64/Kconfig
===================================================================
--- linux-2.6.12-rc4.orig/arch/x86_64/Kconfig 2005-05-17 02:19:54.000000000 +0000
+++ linux-2.6.12-rc4/arch/x86_64/Kconfig 2005-05-17 02:26:38.000000000 +0000
@@ -410,6 +410,20 @@

If unsure, say Y. Only embedded should say N here.

+config HZ
+ int "Frequency of the Timer Interrupt (100, 250 or 1000 per second)"
+ range 100 1000
+ default 1000
+ help
+ Allows the configuration of the timer frequency. It is customary
+ to have the timer interrupt run at 1000 HZ but 100 HZ may be more
+ beneficial for servers and NUMA systems that do not need to have
+ a fast response for user interaction and that may experience bus
+ contention and cacheline bounces as a result of timer interrupts.
+ Note that the timer interrupt occurs on each processor in an SMP
+ environment leading to NR_CPUS * HZ number of timer interrupts
+ per second.
+
endmenu

#
Index: linux-2.6.12-rc4/include/asm-x86_64/param.h
===================================================================
--- linux-2.6.12-rc4.orig/include/asm-x86_64/param.h 2005-03-02 07:38:10.000000000 +0000
+++ linux-2.6.12-rc4/include/asm-x86_64/param.h 2005-05-17 02:28:04.000000000 +0000
@@ -1,8 +1,20 @@
+#include <linux/config.h>
+
#ifndef _ASMx86_64_PARAM_H
#define _ASMx86_64_PARAM_H

#ifdef __KERNEL__
-# define HZ 1000 /* Internal kernel timer frequency */
+
+#if CONFIG_HZ == 1000
+# define HZ 1000 /* Internal kernel timer frequency */
+#elif CONFIG_HZ == 250
+# define HZ 250
+#elif CONFIG_HZ == 100
+# define HZ 100
+#else
+#error "Invalid Timer Interrupt Frequency"
+#endif
+
# define USER_HZ 100 /* .. some user interfaces are in "ticks */
#define CLOCKS_PER_SEC (USER_HZ) /* like times() */
#endif

randy_dunlap

May 16, 2005, 10:48:46 PM
to Christoph Lameter, ak...@osdl.org, linux-...@vger.kernel.org, sh...@scalex86.org, a...@suse.de, torv...@osdl.org

How about using choice / endchoice so that an improper HZ value
cannot be entered at all? (instead of using range M N)


---
~Randy

Christoph Lameter

May 16, 2005, 10:57:30 PM
to randy_dunlap, ak...@osdl.org, linux-...@vger.kernel.org, sh...@scalex86.org, a...@suse.de, torv...@osdl.org
On Mon, 16 May 2005, randy_dunlap wrote:

> | endmenu
>
> How about using choice / endchoice so that an improper HZ value
> cannot be entered at all? (instead of using range M N)

That would not allow to set the value of CONFIG_HZ to a numeric value.
I would have CONFIG_HZ_100 CONFIG_HZ_250 etc. Gets a bit complicated
to handle.

randy_dunlap

May 16, 2005, 11:03:08 PM
to Christoph Lameter, ak...@osdl.org, linux-...@vger.kernel.org, sh...@scalex86.org, a...@suse.de, torv...@osdl.org
On Mon, 16 May 2005 19:55:42 -0700 (PDT) Christoph Lameter wrote:

| On Mon, 16 May 2005, randy_dunlap wrote:
|
| > | endmenu
| >
| > How about using choice / endchoice so that an improper HZ value
| > cannot be entered at all? (instead of using range M N)
|
| That would not allow to set the value of CONFIG_HZ to a numeric value.
| I would have CONFIG_HZ_100 CONFIG_HZ_250 etc. Gets a bit complicated
| to handle.

Ack, I see.

Thanks for explaining.

---
~Randy

Coywolf Qi Hunt

May 16, 2005, 11:03:54 PM
to Christoph Lameter, Andrew Morton, linux-...@vger.kernel.org, sh...@scalex86.org, a...@suse.de, Linus Torvalds

You mean #define HZ 250 here.
--
Coywolf Qi Hunt
http://sosdg.org/~coywolf/

Linus Torvalds

May 16, 2005, 11:37:24 PM
to Christoph Lameter, randy_dunlap, ak...@osdl.org, linux-...@vger.kernel.org, sh...@scalex86.org, a...@suse.de

On Mon, 16 May 2005, Christoph Lameter wrote:
>
> That would not allow to set the value of CONFIG_HZ to a numeric value.
> I would have CONFIG_HZ_100 CONFIG_HZ_250 etc. Gets a bit complicated
> to handle.

I don't think it gets any more complex to handle than the stuff you need
to do now (#ifdef's, and the #define HZ CONFIG_HZ games).

Also, I think you can do it in the Kconfig file, which at least makes it a
fairly localized thing:

choice
prompt "Timer frequency"
default HZ_250

config HZ_100
bool "100 Hz"

config HZ_250
bool "250 Hz"

config HZ_1000
bool "1000 Hz"

endchoice

config HZ
int
default 100 if HZ_100
default 250 if HZ_250
default 1000 if HZ_1000

and now you can just do

#define HZ CONFIG_HZ

or something. You can even make the Kconfig parts be a separate Kconfig.HZ
file, and have both the x86 and x86-64 Kconfig files just include the
common part (since it's a generic issue, not even PC-related: we might
want to allow things like 60Hz frequencies for CONFIG_EMBEDDED etc, and
these choices are really valid on any system that allows for the timer to
be reprogrammed)

The above is obviously totally untested, but it doesn't look any more
complex than having a fairly ugly (and much less user-friendly) check at
compile-time.

Linus

Andi Kleen

May 16, 2005, 11:42:21 PM
to Linus Torvalds, Christoph Lameter, randy_dunlap, ak...@osdl.org, linux-...@vger.kernel.org, sh...@scalex86.org, a...@suse.de
> choice
> prompt "Timer frequency"
> default HZ_250
>
> config HZ_100
> bool "100 Hz"
>
> config HZ_250
> bool "250 Hz"
>
> config HZ_1000
> bool "1000 Hz"

I would add a

config HZ_10 if EMBEDDED
bool "10 Hz"

that is useful for compute servers (although it will violate the TCP
specification). EMBEDDED would ensure only people who know what they're
doing set it.

-Andi

Christoph Lameter

May 17, 2005, 1:33:13 AM
to Linus Torvalds, randy_dunlap, ak...@osdl.org, linux-...@vger.kernel.org, sh...@scalex86.org, a...@suse.de
On Mon, 16 May 2005, Linus Torvalds wrote:

> or something. You can even make the Kconfig parts be a separate Kconfig.HZ
> file, and have both the x86 and x86-64 Kconfig files just include the
> common part (since it's a generic issue, not even PC-related: we might
> want to allow things like 60Hz frequencies for CONFIG_EMBEDDED etc, and
> these choices are really valid on any system that allows for the timer to
> be reprogrammed)

Ok. Here is the patch redone. The location for Kconfig.hz is in the
kernel directory since the other timer related stuff is there too:

---

Make the timer frequency selectable. The timer interrupt may cause bus
and memory contention in large NUMA systems since the interrupt occurs
on each processor HZ times per second.

Signed-off-by: Christoph Lameter <chri...@lameter.com>
Signed-off-by: Shai Fultheim <sh...@scalex86.org>

Index: linux-2.6.12-rc4/arch/i386/Kconfig
===================================================================
--- linux-2.6.12-rc4.orig/arch/i386/Kconfig 2005-05-17 02:19:55.000000000 +0000
+++ linux-2.6.12-rc4/arch/i386/Kconfig 2005-05-17 05:27:31.000000000 +0000
@@ -1133,6 +1133,8 @@
a work-around for a number of buggy BIOSes. Switch this option on if
your computer crashes instead of powering off properly.

+source kernel/Kconfig.hz
+
endmenu

source "arch/i386/kernel/cpu/cpufreq/Kconfig"
Index: linux-2.6.12-rc4/include/asm-i386/param.h
===================================================================
--- linux-2.6.12-rc4.orig/include/asm-i386/param.h 2005-05-17 05:08:56.000000000 +0000
+++ linux-2.6.12-rc4/include/asm-i386/param.h 2005-05-17 05:10:08.000000000 +0000
@@ -1,8 +1,10 @@
+#include <linux/config.h>
+
#ifndef _ASMi386_PARAM_H
#define _ASMi386_PARAM_H

#ifdef __KERNEL__
-# define HZ 1000 /* Internal kernel timer frequency */
+# define HZ CONFIG_HZ /* Internal kernel timer frequency */
# define USER_HZ 100 /* .. some user interfaces are in "ticks" */
# define CLOCKS_PER_SEC (USER_HZ) /* like times() */
#endif
Index: linux-2.6.12-rc4/arch/x86_64/Kconfig
===================================================================
--- linux-2.6.12-rc4.orig/arch/x86_64/Kconfig 2005-05-17 02:19:54.000000000 +0000
+++ linux-2.6.12-rc4/arch/x86_64/Kconfig 2005-05-17 05:20:49.000000000 +0000
@@ -410,6 +410,8 @@

If unsure, say Y. Only embedded should say N here.

+source kernel/Kconfig.hz
+
endmenu

#
Index: linux-2.6.12-rc4/include/asm-x86_64/param.h
===================================================================
--- linux-2.6.12-rc4.orig/include/asm-x86_64/param.h 2005-05-17 05:08:52.000000000 +0000
+++ linux-2.6.12-rc4/include/asm-x86_64/param.h 2005-05-17 05:09:42.000000000 +0000
@@ -1,9 +1,11 @@
+#include <linux/config.h>
+
#ifndef _ASMx86_64_PARAM_H
#define _ASMx86_64_PARAM_H

#ifdef __KERNEL__
-# define HZ 1000 /* Internal kernel timer frequency */
-# define USER_HZ 100 /* .. some user interfaces are in "ticks */
+# define HZ CONFIG_HZ /* Internal kernel timer frequency */
+# define USER_HZ 100 /* .. some user interfaces are in "ticks */
#define CLOCKS_PER_SEC (USER_HZ) /* like times() */
#endif

Index: linux-2.6.12-rc4/kernel/Kconfig.hz
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.12-rc4/kernel/Kconfig.hz 2005-05-17 05:24:01.000000000 +0000
@@ -0,0 +1,46 @@
+#
+# Timer Interrupt Frequency Configuration
+#
+
+choice
+ prompt "Timer frequency"
+ default HZ_250
+ help
+ Allows the configuration of the timer frequency. It is customary
+ to have the timer interrupt run at 1000 HZ but 100 HZ may be more
+ beneficial for servers and NUMA systems that do not need to have
+ a fast response for user interaction and that may experience bus
+ contention and cacheline bounces as a result of timer interrupts.
+ Note that the timer interrupt occurs on each processor in an SMP
+ environment leading to NR_CPUS * HZ number of timer interrupts
+ per second.
+
+
+ config HZ_100
+ bool "100 HZ"
+ help
+ 100 HZ is a typical choice for servers, SMP and NUMA systems
+ with lots of processors that may show reduced performance if
+ too many timer interrupts are occurring.
+
+ config HZ_250
+ bool "250 HZ"
+ help
+ 250 HZ is a good compromise choice allowing server performance
+ while also showing good interactive responsiveness even
+ on SMP and NUMA systems.
+
+ config HZ_1000
+ bool "1000 HZ"
+ help
+ 1000 HZ is the preferred choice for desktop systems and other
+ systems requiring fast interactive responses to events.
+
+endchoice
+
+config HZ
+ int
+ default 100 if HZ_100
+ default 250 if HZ_250
+ default 1000 if HZ_1000
+

Paulo Marques

May 17, 2005, 7:40:46 AM
to Andi Kleen, Linus Torvalds, Christoph Lameter, randy_dunlap, ak...@osdl.org, linux-...@vger.kernel.org, sh...@scalex86.org
Andi Kleen wrote:
>>[...]

> I would add a
>
> config HZ_10 if EMBEDDED
> bool "10 Hz"
>
> that is useful for compute servers (although it will violate the TCP
> specification). EMBEDDED would ensure only people who know what they're
> doing set it.

I thought the lowest frequency the PIT timer would give was around 18 Hz.

Am I wrong, or are you thinking of other timing devices / different
platforms?
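
The roughly 18 Hz floor follows from the PIT's 16-bit counter. A minimal sketch in C, assuming the commonly cited PIT input clock of about 1.193182 MHz and a maximum divisor of 65536 (a programmed count of 0 is treated as 65536); `pit_divisor` is a hypothetical helper written for this note, not a kernel function:

```c
/* Illustrative only: hypothetical helper, constants are the usual nominal values. */
#define PIT_INPUT_HZ    1193182UL  /* nominal PIT input clock in Hz */
#define PIT_MAX_DIVISOR 65536UL    /* 16-bit counter; a count of 0 means 65536 */

/*
 * Return the divisor (rounded to nearest) that makes the PIT fire about
 * hz times per second, or 0 if no 16-bit divisor can produce that rate.
 */
static unsigned long pit_divisor(unsigned long hz)
{
    unsigned long div;

    if (hz == 0)
        return 0;
    div = (PIT_INPUT_HZ + hz / 2) / hz;
    if (div < 1 || div > PIT_MAX_DIVISOR)
        return 0;
    return div;
}
```

With these constants the slowest achievable rate is 1193182 / 65536, about 18.2 interrupts per second, so a request for 10 Hz has no valid divisor while 100, 250 and 1000 Hz all fit.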

--
Paulo Marques - www.grupopie.com

An expert is a person who has made all the mistakes that can be
made in a very narrow field.
Niels Bohr (1885 - 1962)

Andi Kleen

May 17, 2005, 9:22:06 AM
to Paulo Marques, Andi Kleen, Linus Torvalds, Christoph Lameter, randy_dunlap, ak...@osdl.org, linux-...@vger.kernel.org, sh...@scalex86.org
On Tue, May 17, 2005 at 11:51:03AM +0100, Paulo Marques wrote:
> Andi Kleen wrote:
> >>[...]
> >I would add a
> >
> > config HZ_10 if EMBEDDED
> > bool "10 Hz"
> >
> >that is useful for compute servers (although it will violate the TCP
> >specification). EMBEDDED would ensure only people who know what they're
> >doing set it.
>
> I thought the lowest frequency the PIT timer would give was around 18 Hz.
>
> Am I wrong, or are you thinking of other timing devices / different
> platforms?

I was thinking of HPET. You're right it would probably not work with
PIT.

Oh well, I guess it wasn't that great an idea anyways. I merely
suggested it because I know some people do it already.

-Andi

Joe Korty

May 17, 2005, 9:47:29 AM
to Christoph Lameter, Linus Torvalds, randy_dunlap, ak...@osdl.org, linux-...@vger.kernel.org, sh...@scalex86.org, a...@suse.de
On Mon, May 16, 2005 at 10:31:25PM -0700, Christoph Lameter wrote:
> + help
> + 100 HZ is a typical choice for servers, SMP and NUMA systems
> + with lots of processors that may show reduced performance if
> + too many timer interrupts are occurring.

One of the options should mention the power savings benefit on laptops.
How about:

help

100 HZ, the lowest setting, is the best choice for any system
where the servicing of interrupts is expensive. This includes:
systems with so many processors that the mere execution of timer
interrupts on each and every processor degrades performance,
virtual systems, where Linux is not running on bare hardware
but is instead a guest operating system running on top of a
virtualization layer, and laptops, where each interrupt causes
a processor that is in low power mode to power up in order
to service the interrupt, and after the interrupt is complete,
might take up to one millisecond to power back down again.

Regards,
Joe
--
"Money can buy bandwidth, but latency is forever" -- John Mashey

Andi Kleen

May 17, 2005, 9:59:59 AM
to Joe Korty, Christoph Lameter, Linus Torvalds, randy_dunlap, ak...@osdl.org, linux-...@vger.kernel.org, sh...@scalex86.org, a...@suse.de
> > + help
> > + 100 HZ is a typical choice for servers, SMP and NUMA systems
> > + with lots of processors that may show reduced performance if
> > + too many timer interrupts are occurring.
>
> One of the options should mention the power savings benefit on laptops.
> How about:

Actually it is not 100% clear. The ACPI idle code relies on
the timer right now to go from C1 to C2/C3. It basically
goes down in a staircase, first staying in C1, then when woken
up and still idle go down lower etc.

With HZ=100 the minimal latency (assuming no other interrupts) to go from C1
to C2 is 10ms, not 1ms, which might be even a power loss in some workloads.

-Andi

P.S.: The SUSE 2.4 kernels had for some time variable HZ, settable at boot.
It surprisingly didn't cause too much slowdown or code bloat and only
needed minor fixes over the tree. Might be worth considering at least
as a CONFIG

Christoph Lameter

May 17, 2005, 11:47:59 AM
to Joe Korty, Linus Torvalds, randy_dunlap, ak...@osdl.org, linux-...@vger.kernel.org, sh...@scalex86.org, a...@suse.de
On Tue, 17 May 2005, Joe Korty wrote:

> One of the options should mention the power savings benefit on laptops.
> How about:

I am not an expert on that. Submit a patch.

Lee Revell

May 17, 2005, 7:30:24 PM
to christoph, George Anzinger, linux-kernel, sh...@scalex86.org, ak...@osdl.org
On Mon, 2005-05-16 at 17:55 -0700, christoph wrote:
>
> Runtime? That seems to be a bad idea. It would be better to rewrite
> the timer subsystem to be able to work tickless.
>

I agree 100%, I think it's especially crazy to allow selecting 100, 250,
500, etc, whether at runtime or compile time. Might as well just go
tickless.

How do you expect application developers to handle not being able to
count on the resolution of nanosleep()? Currently they can at least
assume 10ms on 2.4, 1ms on 2.6. Seems to me that if you are no longer
guaranteed to be able to sleep 5ms on 2.6, you would just have to
busywait. Is it me, or does that way lie madness?

Lee

Nish Aravamudan

May 17, 2005, 7:59:17 PM
to Lee Revell, christoph, George Anzinger, linux-kernel, sh...@scalex86.org, ak...@osdl.org
On 5/17/05, Lee Revell <rlre...@joe-job.com> wrote:
> On Mon, 2005-05-16 at 17:55 -0700, christoph wrote:
> >
> > Runtime? That seems to be a bad idea. It would be better to rewrite
> > the timer subsystem to be able to work tickless.
> >
>
> I agree 100%, I think it's especially crazy to allow selecting 100, 250,
> 500, etc, whether at runtime or compile time. Might as well just go
> tickless.
>
> How do you expect application developers to handle not being able to
> count on the resolution of nanosleep()? Currently they can at least
> assume 10ms on 2.4, 1ms on 2.6. Seems to me that if you are no longer
> guaranteed to be able to sleep 5ms on 2.6, you would just have to
> busywait. Is it me, or does that way lie madness?

From my meager understanding of sys_nanosleep() in 2.6 -- we'd round
up currently. If you request a microsecond of sleep, we'll sleep for a
jiffy + 1 (or 2, maybe). I am not sure we want a syscall that allows
busy-waiting, but I'm not certain. If you're interested, my patch
(just posted again to LKML) tries to divorce HZ and soft-timers
somewhat.

-Nish

Valdis.K...@vt.edu

May 17, 2005, 8:08:40 PM
to Lee Revell, christoph, George Anzinger, linux-kernel, sh...@scalex86.org, ak...@osdl.org
On Tue, 17 May 2005 19:25:41 EDT, Lee Revell said:

> How do you expect application developers to handle not being able to
> count on the resolution of nanosleep()? Currently they can at least
> assume 10ms on 2.4, 1ms on 2.6. Seems to me that if you are no longer
> guaranteed to be able to sleep 5ms on 2.6, you would just have to
> busywait. Is it me, or does that way lie madness?

If you're running tickless, wouldn't a 'sleep 5ms' cause a timer event to be
queued, and we wake up (approx) 5ms later?

Lee Revell

May 17, 2005, 8:15:15 PM
to Valdis.K...@vt.edu, christoph, George Anzinger, linux-kernel, sh...@scalex86.org, ak...@osdl.org

Yes, exactly. This is why I think going tickless is a good solution,
and CONFIG_HZ is bad, because with HZ=100 "sleep 5ms" would cause us to
sleep for 10ms.
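
The arithmetic behind that claim can be sketched in C. This is a simplified model, not the kernel's sys_nanosleep() path: `ms_to_ticks_roundup` and `ticks_to_ms` are hypothetical helpers, and the extra jiffy that the real code may add (as noted earlier in the thread) is deliberately left out.

```c
/* Illustrative only: hypothetical helpers modelling tick-granular sleeps. */

/* Round a requested sleep of ms milliseconds up to whole ticks at rate hz. */
static unsigned long ms_to_ticks_roundup(unsigned long ms, unsigned long hz)
{
    return (ms * hz + 999) / 1000;
}

/* Convert a whole number of ticks at rate hz back to milliseconds. */
static unsigned long ticks_to_ms(unsigned long ticks, unsigned long hz)
{
    return ticks * 1000 / hz;
}
```

In this model a 5 ms request becomes 1 tick (10 ms) at HZ=100, 2 ticks (8 ms) at HZ=250, and 5 ticks (5 ms) at HZ=1000.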

Lee

Pavel Machek

May 18, 2005, 2:53:23 PM
to Linus Torvalds, Christoph Lameter, randy_dunlap, ak...@osdl.org, linux-...@vger.kernel.org, sh...@scalex86.org, a...@suse.de
Hi!

> config HZ
> int
> default 100 if HZ_100
> default 250 if HZ_250
> default 1000 if HZ_1000
>
> and now you can just do
>
> #define HZ CONFIG_HZ
>
> or something. You can even make the Kconfig parts be a separate Kconfig.HZ
> file, and have both the x86 and x86-64 Kconfig files just include the
> common part (since it's a generic issue, not even PC-related: we might
> want to allow things like 60Hz frequencies for CONFIG_EMBEDDED etc, and
> these choices are really valid on any system that allows for the timer to
> be reprogrammed)
>
> The above is obviously totally untested, but it doesn't look any more
> complex than having a fairly ugly (and much less user-friendly) check at
> compile-time.

Please don't do this, CONFIG_NO_IDLE_HZ patches are better solution,
and they worked okay last time I tried them.
Pavel

Lee Revell

May 18, 2005, 3:06:36 PM
to Pavel Machek, Linus Torvalds, Christoph Lameter, randy_dunlap, ak...@osdl.org, linux-...@vger.kernel.org, sh...@scalex86.org, a...@suse.de
On Wed, 2005-05-18 at 20:50 +0200, Pavel Machek wrote:
> Please don't do this, CONFIG_NO_IDLE_HZ patches are better solution,
> and they worked okay last time I tried them.

Last time the dynamic tick patches were posted, you reported they worked
fine. The next question is, when do they get merged?

Lee

Linus Torvalds

May 18, 2005, 3:26:10 PM
to Pavel Machek, Christoph Lameter, randy_dunlap, ak...@osdl.org, linux-...@vger.kernel.org, sh...@scalex86.org, a...@suse.de

On Wed, 18 May 2005, Pavel Machek wrote:
>
> Please don't do this, CONFIG_NO_IDLE_HZ patches are better solution,
> and they worked okay last time I tried them.

.. and they have nothing to do with this.

A number of people who want lower tick frequency are apparently _server_
people. Not because it makes any difference to idle time, but because it
can lessen the impact of the timer interrupt under load.

I don't know why, but I've actually gotten most of the complaints about
the 1kHz timer from ia64 people, who use a 1024Hz timer. Somebody from
Intel claimed a several percent reduction in performance between 1kHz and
100Hz under some load, apparently because of bad cache interaction.
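
For a sense of scale, the NR_CPUS * HZ figure from the Kconfig help text earlier in the thread can be worked out directly. A small C sketch; the helper name `timer_irqs_per_sec` and the 512-processor machine in the note below are assumptions chosen for illustration, not measurements from this thread:

```c
/* Illustrative only: hypothetical helper for back-of-the-envelope numbers. */

/* System-wide timer interrupts per second when every CPU takes hz ticks. */
static unsigned long timer_irqs_per_sec(unsigned long ncpus, unsigned long hz)
{
    return ncpus * hz;
}
```

On a hypothetical 512-processor system this gives 512000 timer interrupts per second at HZ=1000, 128000 at HZ=250, and 51200 at HZ=100.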

At the same time, 100Hz really is too low for some desktop-like soft-RT
stuff, where you want to delay until the next frame (and humans notice
jitter at some fraction of a tenth of a second). With the 100Hz
granularity, and the uncertainty on where the jiffy tick ends up being,
you effectively have a ~50Hz clock you can depend on, which together with
worries about synchronizing with the video refresh rate etc seems to make
people unhappy.

So this thing has nothing to do with "idle".

And the truly-variable-HZ stuff just makes me nervous, but regardless of
that, you actually do want a "limit HZ to some value" configuration option
anyway.

Even with fully variable HZ, you need a limit just to say "this is the
highest precision we'll ever use", because otherwise you'll just be
wasting a lot of time on timers.

Linus

Tony Lindgren

May 18, 2005, 4:43:53 PM
to Lee Revell, Pavel Machek, Linus Torvalds, Christoph Lameter, randy_dunlap, ak...@osdl.org, linux-...@vger.kernel.org, sh...@scalex86.org, a...@suse.de
* Lee Revell <rlre...@joe-job.com> [050518 12:06]:

> On Wed, 2005-05-18 at 20:50 +0200, Pavel Machek wrote:
> > Please don't do this, CONFIG_NO_IDLE_HZ patches are better solution,
> > and they worked okay last time I tried them.
>
> Last time the dynamic tick patches were posted, you reported they worked
> fine. The next question is, when do they get merged?

Uh, I've been meaning to do some clean-up on the x86 patch, but been
distracted every time I've tried... I'll try to do an updated patch
soon... But meanwhile, I believe the dyn-tick patch works reliably
on all machines if DYN_TICK_USE_APIC is not set in Kconfig.

Tony

Tony Lindgren

May 18, 2005, 4:50:27 PM
to Linus Torvalds, Pavel Machek, Christoph Lameter, randy_dunlap, ak...@osdl.org, linux-...@vger.kernel.org, sh...@scalex86.org, a...@suse.de
* Linus Torvalds <torv...@osdl.org> [050518 12:28]:

>
>
> On Wed, 18 May 2005, Pavel Machek wrote:
> >
> > Please don't do this, CONFIG_NO_IDLE_HZ patches are better solution,
> > and they worked okay last time I tried them.
>
> .. and they have nothing to do with this.

>
> A number of people who want lower tick frequency are apparently _server_
> people. Not because it makes any difference to idle time, but because it
> can lessen the impact of the timer interrupt under load.
>
> I don't know why, but I've actually gotten most of the complaints about
> the 1kHz timer from ia64 people, who use a 1024Hz timer. Somebody from
> Intel claimed a several percent reduction in performance between 1kHz and
> 100Hz under some load, apparently because of bad cache interaction.
>
> At the same time, 100Hz really is too low for some desktop-like soft-RT
> stuff, where you want to delay until the next frame (and humans notice
> jitter at some fraction of a tenth of a second). With the 100Hz
> granularity, and the uncertainty on where the jiffy tick ends up being,
> you effectively have a ~50Hz clock you can depend on, which together with
> worries about synchronizing with the video refresh rate etc seems to make
> people unhappy.
>
> So this thing has nothing to do with "idle".

Yes, that's right. Setting HZ would just limit the max frequency
with dyn-tick patch when system is busy. On OMAP, we're using HZ=128
with dyn-tick.

> And the truly-variable-HZ stuff just makes me nervous, but regardless of
> that, you actually do want a "limit HZ to some value" configuration option
> anyway.

The dyn-tick patch skips ticks only during idle, and the system is
not doing anything at that point, so it should be safe. When the
system is under load, there is normal HZ tick and timer is not being
reprogrammed.

> Even with fully variable HZ, you need a limit just to say "this is the
> highest precision we'll ever use", because otherwise you'll just be
> wasting a lot of time on timers.

Yeah.

Tony
