How how latent should non-preemptive scheduling be?

Sitsofe Wheeler

unread,

Sep 17, 2008, 5:18:45 PM9/17/08

to linux-...@vger.kernel.org, Ingo Molnar

Hi,

I have an EeePC 900 (Intel Celeron 900Mhz) and it seems to be skipping
while playing sound through various desktop apps with a 2.6.27rc6
kernel. It is running off an SD card which really shows up slow writes
but the sound is seemingly skipping even when ext3 is not being used.
When look at latencytop I often see results similar to this:

> Cause Maximum Percentage
> Scheduler: waiting for cpu 247.2 msec 64.6 %
> futex_wait do_futex sys_futex sysenter_do_call 5.0 msec 24.8 %
> do_select core_sys_select sys_select sysenter_do_c 4.7 msec 0.7 %
> do_sys_poll sys_poll sysenter_do_call 4.5 msec 5.6 %
> do_sys_poll sys_ppoll sysenter_do_call 4.2 msec 1.7 %
> rt_mutex_timed_lock futex_lock_pi do_futex sys_fut 3.5 msec 0.5 %
> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_re 1.9 msec 1.3 %
> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_bu 1.9 msec 0.4 %
> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_bu 1.9 msec 0.5 %
> on acpi_ex_field_datum_io acpi_ex_extract_from_field acpi_ex_read_data_from_field acpi_ex_resolve_node_to_value acpi_ex_resolve_to_va
> lue
>
>
>
> Process rhythmbox (5694) Total: 358.4 msec
> Scheduler: waiting for cpu 221.8 msec 69.9 %
> futex_wait do_futex sys_futex sysenter_do_call 4.5 msec 26.3 %
> do_sys_poll sys_poll sysenter_do_call 3.8 msec 3.2 %
> do_select core_sys_select sys_select sysenter_do_c 1.0 msec 0.4 %
> rt_mutex_timed_lock futex_lock_pi do_futex sys_fut 0.2 msec 0.1 %

> Cause Maximum Percentage
> Scheduler: waiting for cpu 234.8 msec 31.1 %
> futex_wait do_futex sys_futex sysenter_do_call 5.0 msec 34.4 %
> do_select core_sys_select sys_select sysenter_do_c 5.0 msec 10.3 %
> do_sys_poll sys_ppoll sysenter_do_call 5.0 msec 6.2 %
> do_sys_poll sys_poll sysenter_do_call 5.0 msec 16.4 %
> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_re 1.9 msec 0.9 %
> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_bu 1.9 msec 0.3 %
> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_bu 1.9 msec 0.3 %
> blk_execute_rq scsi_execute scsi_execute_req scsi_ 1.0 msec 0.0 %
>
>
> Process firefox-bin (5756) Total: 482.2 msec
> Scheduler: waiting for cpu 234.8 msec 95.5 %
> do_sys_poll sys_poll sysenter_do_call 4.1 msec 3.5 %
> futex_wait do_futex sys_futex sysenter_do_call 3.9 msec 1.0 %
> do_select core_sys_select sys_select sysenter_do_c 0.2 msec 0.0 %

top seems to show that there is CPU time still available. The kernel is
compiled with voluntary preemption, dynticks and a HZ of 1000.

Is the scheduler waiting for CPU anything to do with the skips? If so
what sort of maximum should I be expecting? If you are listing to music
is it expected that you have preemption on? Would preemption even help?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Sitsofe Wheeler

unread,

Sep 17, 2008, 5:28:52 PM9/17/08

to linux-...@vger.kernel.org, Ingo Molnar, Arjan van de Ven

Sitsofe Wheeler wrote:
> I have an EeePC 900 (Intel Celeron 900Mhz) and it seems to be skipping
> while playing sound through various desktop apps with a 2.6.27rc6
> kernel. It is running off an SD card which really shows up slow writes
> but the sound is seemingly skipping even when ext3 is not being used.
> When look at latencytop I often see results similar to this:

One other small datapoint. When the skipping starts to occur regularly
it seems to be exactly 30 seconds apart. ALSA doesn't report any xruns
either...

Arjan van de Ven

unread,

Sep 17, 2008, 5:34:49 PM9/17/08

to Sitsofe Wheeler, linux-...@vger.kernel.org, Ingo Molnar

On Wed, 17 Sep 2008 22:18:18 +0100
Sitsofe Wheeler <sit...@yahoo.com> wrote:

> Hi,
>
> I have an EeePC 900 (Intel Celeron 900Mhz) and it seems to be
> skipping while playing sound through various desktop apps with a
> 2.6.27rc6 kernel. It is running off an SD card which really shows up
> slow writes but the sound is seemingly skipping even when ext3 is not
> being used. When look at latencytop I often see results similar to
> this:
>
> > Cause Maximum
> > Percentage Scheduler: waiting for cpu 247.2
> > msec 64.6 % futex_wait do_futex sys_futex
> > sysenter_do_call 5.0 msec 24.8 % do_select
> > core_sys_select sys_select sysenter_do_c 4.7 msec 0.7 %
> > do_sys_poll sys_poll sysenter_do_call 4.5

this says you haven't done "make install" on the latencytop directory
so it's not translating things for you.. can you do that please?

Sitsofe Wheeler

unread,

Sep 17, 2008, 5:49:17 PM9/17/08

to Arjan van de Ven, linux-...@vger.kernel.org, Ingo Molnar

Arjan van de Ven wrote:
> this says you haven't done "make install" on the latencytop directory
> so it's not translating things for you.. can you do that please?

> Cause Maximum Percentage
> Scheduler: waiting for cpu 208.3 msec 59.4 %
> Userspace lock contention 4.9 msec 29.7 %
> Waiting for event (poll) 4.8 msec 7.1 %
> Waiting for event (select) 4.5 msec 0.6 %
> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_re 1.9 msec 2.0 %
> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_bu 1.9 msec 0.7 %
> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_bu 1.9 msec 0.6 %
> Executing raw SCSI command 0.7 msec 0.0 %
> ex_resolve_to_value
>
>
>
>
>
> Process firefox-bin (5756) Total: 317.2 msec
> Scheduler: waiting for cpu 208.3 msec 96.4 %
> Waiting for event (poll) 3.7 msec 3.0 %
> Userspace lock contention 1.0 msec 0.7 %
>
>
>
>
>
>
>
>
> pdflush gnome-vfs-daemo bash top bash pulseaudio gconf-helper bash rhythmbox firefox run-mozilla.sh firefox-bin

Better?

Arjan van de Ven

unread,

Sep 17, 2008, 5:54:26 PM9/17/08

to Sitsofe Wheeler, linux-...@vger.kernel.org, Ingo Molnar, pet...@infradead.org

On Wed, 17 Sep 2008 22:48:55 +0100
Sitsofe Wheeler <sit...@yahoo.com> wrote:

> Arjan van de Ven wrote:
> > this says you haven't done "make install" on the latencytop
> > directory so it's not translating things for you.. can you do that
> > please?
>
> > Cause Maximum
> >
Percentage

Scheduler: waiting for cpu 208 msec 59.4 %

you're rather CPU bound, and your process was woken up but didn't run for over 200 milliseconds..
that sounds like a scheduler fairness issue!

Sitsofe Wheeler

unread,

Sep 17, 2008, 6:29:50 PM9/17/08

to Arjan van de Ven, linux-...@vger.kernel.org, Ingo Molnar, pet...@infradead.org

Arjan van de Ven wrote:
> On Wed, 17 Sep 2008 22:48:55 +0100
> Sitsofe Wheeler <sit...@yahoo.com> wrote:
>
>> Arjan van de Ven wrote:
>>> this says you haven't done "make install" on the latencytop
>>> directory so it's not translating things for you.. can you do that
>>> please?
>>> Cause Maximum
>>>
> Percentage
>
> Scheduler: waiting for cpu 208 msec 59.4 %
>
>
> you're rather CPU bound, and your process was woken up but didn't run for over 200 milliseconds..
> that sounds like a scheduler fairness issue!

I'm at a bit of a loss to explain this. I'm not using the group
scheduler and the load average is less than 0.4. CPU usage does seem to
spike a fair bit (I'm not sure why rhythmbox needs 50%+ for decoding
oggs from time to time) but there always seems to be 20% CPU free...

I don't know where to start debugging this one and I'm suspicious of the
way the problem would happen every 30 seconds too... (this laptop is
using ath5k for its wifi)

Peter Zijlstra

unread,

Sep 17, 2008, 10:42:52 PM9/17/08

to Arjan van de Ven, Sitsofe Wheeler, linux-...@vger.kernel.org, Ingo Molnar

On Wed, 2008-09-17 at 14:54 -0700, Arjan van de Ven wrote:
> On Wed, 17 Sep 2008 22:48:55 +0100
> Sitsofe Wheeler <sit...@yahoo.com> wrote:
>
> > Arjan van de Ven wrote:
> > > this says you haven't done "make install" on the latencytop
> > > directory so it's not translating things for you.. can you do that
> > > please?
> >
> > > Cause Maximum
> > >
> Percentage
>
> Scheduler: waiting for cpu 208 msec 59.4 %
>
>
> you're rather CPU bound, and your process was woken up but didn't run for over 200 milliseconds..
> that sounds like a scheduler fairness issue!

Really hard subject. Perfect fairness requires 0 latency - which with a
CPU only being able to run one thing at a time is impossible. So what
latency ends up being is a measure for the convergence towards fairness.

Anyway - 200ms isn't too weird depending on the circumstances. We start
out with a 20ms latency for UP, we then multiply with 1+log2(nr_cpus)
which in say a quad core machine ends up with 60ms. That ought to mean
that under light load the max latency should not exceed twice that
(basically a consequence of the Nyquist-Shannon sampling theorem IIRC).

Now, if you get get under some load (by default: nr_running > 5) the
expected latency starts to linearly grow with nr_running.

>From what I gather from the reply to this email the machine was not
doing much (and after having looked up the original email I see its a
eeeeeeeee atom - which is dual cpu iirc, so that yields 40ms default) -
so 200 is definately on the high side.

What you can do to investigate this, is use the sched_wakeup tracer from
ftrace, that should give a function trace of the highest wakeup latency
showing what the kernel is doing.

Sitsofe Wheeler

unread,

Sep 18, 2008, 3:26:40 AM9/18/08

to Peter Zijlstra, Arjan van de Ven, linux-...@vger.kernel.org, Ingo Molnar

Peter Zijlstra wrote:
> On Wed, 2008-09-17 at 14:54 -0700, Arjan van de Ven wrote:
>> On Wed, 17 Sep 2008 22:48:55 +0100
>>

>>>> Cause Maximum
>>>>
>> Percentage
>>
>> Scheduler: waiting for cpu 208 msec 59.4 %
>>
>>
>> you're rather CPU bound, and your process was woken up but didn't run for over 200 milliseconds..
>> that sounds like a scheduler fairness issue!
>
> Really hard subject. Perfect fairness requires 0 latency - which with a
> CPU only being able to run one thing at a time is impossible. So what
> latency ends up being is a measure for the convergence towards fairness.
>
> Anyway - 200ms isn't too weird depending on the circumstances. We start
> out with a 20ms latency for UP, we then multiply with 1+log2(nr_cpus)
> which in say a quad core machine ends up with 60ms. That ought to mean
> that under light load the max latency should not exceed twice that
> (basically a consequence of the Nyquist-Shannon sampling theorem IIRC).
>
> Now, if you get get under some load (by default: nr_running > 5) the
> expected latency starts to linearly grow with nr_running.
>
>>From what I gather from the reply to this email the machine was not
> doing much (and after having looked up the original email I see its a
> eeeeeeeee atom - which is dual cpu iirc, so that yields 40ms default) -
> so 200 is definately on the high side.

No, it's not an eeeeee atom. It's an eee celeron M (900Mhz) so it's
definitely a single CPU with no hyperthreading (and SMP is not enabled
in the kernel config either). It has less grunt that the atom and can't
do cpu scaling either (although it seems to have C states).

The load average is less than 0.5 but obviously I don't know if it is
periodically spiking over 5 and then smoothing out.

> What you can do to investigate this, is use the sched_wakeup tracer from
> ftrace, that should give a function trace of the highest wakeup latency
> showing what the kernel is doing.

Thanks for the hint - I was wondering where to look next.

Sitsofe Wheeler

unread,

Sep 18, 2008, 2:25:44 PM9/18/08

to Peter Zijlstra, Arjan van de Ven, linux-...@vger.kernel.org, Ingo Molnar

Peter Zijlstra wrote:
> What you can do to investigate this, is use the sched_wakeup tracer from
> ftrace, that should give a function trace of the highest wakeup latency
> showing what the kernel is doing.

I struggled to find documentation of ftrace because it's quite new. I
have come across
http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_MRG/1.0/html/Realtime_Tuning_Guide/sect-Realtime_Tuning_Guide-Realtime_Specific_Tuning-Using_the_ftrace_Utility_for_Tracing_Latencies.html
and
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/ftrace.txt;h=d330fe3103da9c9a3cb8f888ac7255ce48e666d4;hb=45e9c0de2e86485f8b6633fd64ab19cfbff167f6
.

Thanks to those I started up the debugfs filesystem and went to the
trace directories but the on tracers in available_tracers are
ftrace sched_switch none

I can't see anything in the code that would disable wakeup... Any ideas
on what might be wrong? I'm using a 2.6.27rc6 kernel.

Additionally I think I found a trigger - unplugging the power cable from
the EeePC and having it run on battery seems to then set off this
periodic stall every 30 seconds... There's no CPU frequency scaling
enabled either (Celeron M's seemingly don't have P states and support
for cpufreq is configured out).

Ingo Molnar

unread,

Sep 19, 2008, 4:44:42 AM9/19/08

to Sitsofe Wheeler, Peter Zijlstra, Arjan van de Ven, linux-...@vger.kernel.org, Steven Rostedt

* Sitsofe Wheeler <sit...@yahoo.com> wrote:

> Peter Zijlstra wrote:
>> What you can do to investigate this, is use the sched_wakeup tracer from
>> ftrace, that should give a function trace of the highest wakeup latency
>> showing what the kernel is doing.
>
> I struggled to find documentation of ftrace because it's quite new. I
> have come across
> http://www.redhat.com/docs/en-US/Red_Hat_Enterprise_MRG/1.0/html/Realtime_Tuning_Guide/sect-Realtime_Tuning_Guide-Realtime_Specific_Tuning-Using_the_ftrace_Utility_for_Tracing_Latencies.html
> and
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/ftrace.txt;h=d330fe3103da9c9a3cb8f888ac7255ce48e666d4;hb=45e9c0de2e86485f8b6633fd64ab19cfbff167f6
> .
>
> Thanks to those I started up the debugfs filesystem and went to the
> trace directories but the on tracers in available_tracers are
> ftrace sched_switch none
>
> I can't see anything in the code that would disable wakeup... Any ideas
> on what might be wrong? I'm using a 2.6.27rc6 kernel.

here's two quick howtos:

http://redhat.com/~mingo/sched-devel.git/readme-tracer.txt
http://redhat.com/~mingo/sched-devel.git/howto-trace-latencies.txt

you need to enable:

CONFIG_SCHED_TRACER=y
CONFIG_CONTEXT_SWITCH_TRACER=y

it's not particularly well named though. Why doesnt it say
LATENCY_TRACER or something?

> Additionally I think I found a trigger - unplugging the power cable
> from the EeePC and having it run on battery seems to then set off this
> periodic stall every 30 seconds... There's no CPU frequency scaling
> enabled either (Celeron M's seemingly don't have P states and support
> for cpufreq is configured out).

sounds like potential SMM triggered latencies.

Ingo

Sitsofe Wheeler

unread,

Sep 19, 2008, 7:55:31 AM9/19/08

to Ingo Molnar, Peter Zijlstra, Arjan van de Ven, linux-...@vger.kernel.org, Steven Rostedt

Ingo Molnar wrote:
> here's two quick howtos:
>
> http://redhat.com/~mingo/sched-devel.git/readme-tracer.txt
> http://redhat.com/~mingo/sched-devel.git/howto-trace-latencies.txt

These two files appear to be identical. Is this intentional?

Anyway after following your instructions a putting together a small
script to dice the output, I collated the 10 switches which took the
longest:

# Top ten longest switches
# Rel TS Process Abs TS
0.122161 hald-3423 1867.821170 ***
0.039438 <idle>-0 1867.379054
0.036318 hald-3423 1867.669009
0.031362 <idle>-0 1868.002762
0.030000 hald-3423 1867.699009
0.028933 <idle>-0 1867.529238
0.028539 <idle>-0 1867.228861
0.028196 <idle>-0 1867.128731
0.027763 <idle>-0 1868.101449
0.027513 <idle>-0 1867.028606

# tracer: sched_switch from around longest switch
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
<idle>-0 [00] 1867.608017: 0:140:R + 3:115:S
<idle>-0 [00] 1867.608038: 0:140:R + 3423:120:D
<idle>-0 [00] 1867.608045: 0:140:R ==> 3:115:R
ksoftirqd/0-3 [00] 1867.608048: 3:115:S ==> 3423:120:R
hald-3423 [00] 1867.629350: 3423:120:R + 6096:120:S
hald-3423 [00] 1867.632691: 3423:120:R + 3827:120:S
hald-3423 [00] 1867.669009: 3423:120:R + 3998:120:S
hald-3423 [00] 1867.699009: 3423:120:R + 6097:120:S
***hald-3423 [00] 1867.821170: 3423:120:R ==> 6096:120:R
rhythmbox-6096 [00] 1867.821219: 6096:120:S ==> 6097:120:R
rhythmbox-6097 [00] 1867.821262: 6097:120:R + 3826:120:S
rhythmbox-6097 [00] 1867.821289: 6097:120:S ==> 3826:120:R
pulseaudio-3826 [00] 1867.821332: 3826:120:R + 6097:120:S
pulseaudio-3826 [00] 1867.821374: 3826:120:S ==> 6097:120:R
rhythmbox-6097 [00] 1867.821380: 6097:120:S ==> 3998:120:R
rhythmbox-3998 [00] 1867.821709: 3998:120:S ==> 3827:120:R
pulseaudio-3827 [00] 1867.824041: 3827:120:R + 3826:120:S

> LatencyTOP version 0.4 (C) 2008 Intel Corporation
>
> Cause Maximum Percentage
> Scheduler: waiting for cpu 152.4 msec 13.8 %
> Userspace lock contention 5.0 msec 68.0 %
> Waiting for event (poll) 5.0 msec 14.3 %
> Waiting for event (select) 4.9 msec 3.5 %
> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_bu 1.9 msec 0.1 %
> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_re 1.9 msec 0.2 %
> msleep acpi_ec_wait acpi_ec_transaction acpi_ec_bu 1.9 msec 0.1 %
> Executing raw SCSI command 1.1 msec 0.0 %
> Waiting for TTY to finish sending 0.4 msec 0.0 %
>
>
> Process rhythmbox (3998) Total: 328.3 msec
> Scheduler: waiting for cpu 152.4 msec 80.0 %
> Userspace lock contention 4.1 msec 15.7 %
> Waiting for event (poll) 2.7 msec 4.1 %

> you need to enable:
>
> CONFIG_SCHED_TRACER=y
> CONFIG_CONTEXT_SWITCH_TRACER=y

I actually have both of these enabled but there's still no wakeup tracer
(as mentioned in the git kernel documentation http://tinyurl.com/4f9s4l
). The good news is that your instructions don't need the wakeup tracer.

> it's not particularly well named though. Why doesnt it say
> LATENCY_TRACER or something?

I agree it would be nicer if it had a better name.

>> Additionally I think I found a trigger - unplugging the power cable
>> from the EeePC and having it run on battery seems to then set off this
>> periodic stall every 30 seconds... There's no CPU frequency scaling
>> enabled either (Celeron M's seemingly don't have P states and support
>> for cpufreq is configured out).
>
> sounds like potential SMM triggered latencies.

I have just gone away and read about the SMM (
http://blogs.msdn.com/carmencr/archive/2005/08/31/458609.aspx ). If
you're right there is pretty much nothing that can be done about the
problem : (

Peter Zijlstra

unread,

Sep 19, 2008, 10:20:46 AM9/19/08

to Sitsofe Wheeler, Ingo Molnar, Arjan van de Ven, linux-...@vger.kernel.org, Steven Rostedt

Its actually function tracer output I'm interested in.. that shows what
all its doing to make it take 120+ms.

I thought we had a wakeup latency tracer exacty like we have preempt and
irq off latency tracers, Steve, where'd that go?

> > sounds like potential SMM triggered latencies.
>
> I have just gone away and read about the SMM (
> http://blogs.msdn.com/carmencr/archive/2005/08/31/458609.aspx ). If
> you're right there is pretty much nothing that can be done about the
> problem : (

Yeah, SMM/SMI is nasty stuff :-(

Sitsofe Wheeler

unread,

Sep 20, 2008, 6:11:01 AM9/20/08

to Peter Zijlstra, Ingo Molnar, Arjan van de Ven, linux-...@vger.kernel.org, Steven Rostedt, le...@kernel.org, astari...@suse.de

Peter Zijlstra wrote:
> Its actually function tracer output I'm interested in.. that shows what
> all its doing to make it take 120+ms.

I switched the tracer to ftrace and waited for the problem to occur.
When stall did happen it was far worse than usual with the tracing on
(instead of looping sound of < 0.5 second it looped it for about 2-3
seconds). Looking atthe trace it was filled with hald events. Killing
off all of hald made the 30 second stalls go away.

A quick strace showed that what hal was doing every 30 seconds was
reading various battery stats from /sys. Doing a simple
cat /sys/class/power_supply/BAT0/manufacturer
is enough to provoke a stall. The stalls only occur if you are on
battery or are on AC and the battery is not reporting that it is 100%
charged (but in the latter case the stalls are less pronounced).

I can reliably hear the stalls at runlevel 1 by running
speaker-test -b75000
and
watch --interval=1 cat /proc/acpi/battery/BAT0/info
within separate terminals within screen.

A 5 second ftrace of the stall being provoked is provided on
<http://sucs.org/~sits/test/eeepc-ftrace.txt.gz> (it's 6Mbytes
uncompressed).

Putting the alsa buffer size up to 100000 allows you to still hear
stalls but far less frequently. Putting to 150000 will stop you from
hearing stalls. Even though xruns is set to 2 alsa not report any buffer
underruns to syslog.

Another way of seeing the stalls is to run glxgears and every second the
gears' spinning will jump (even on an unloaded system).

(I guess this works as the stalls are over 100ms)

The xandros provided 2.6.21.4 kernel does not exhibit this problem at
all but my hand compiled 2.6.24something and ubuntu 2.6.24 kernels do.

For some reason latencytop did not really finger ACPI as a cause of
stalls (although some acpi stuff does show up but never in the top
spot). Is this simply a part of the kernel that latencytop does not trace?

> I thought we had a wakeup latency tracer exacty like we have preempt and
> irq off latency tracers, Steve, where'd that go?

I rebuilt my kernel after a make clean and wakeup was still not there.
It might be a good idea to modify the kernel documentation currently
provided with 2.6.27 if it has gone away for now (or document the extra
switches needed to turn it on if that's why it didn't show up for me)...

--
Sitsofe | http://sucs.org/~sits/

Alexey Starikovskiy

unread,

Sep 20, 2008, 6:26:06 AM9/20/08

to Sitsofe Wheeler, Peter Zijlstra, Ingo Molnar, Arjan van de Ven, linux-...@vger.kernel.org, Steven Rostedt, le...@kernel.org

Hi Sitsofe,

eeePC is known to be affected by EC GPE storm bug, and there is a patch for
2.6.27 to workaround the problem. please look at
http://bugzilla.kernel.org/show_bug.cgi?id=11549
for latest patch and bugs 10724/9998 for problem discussion.

Thanks,
Alex.

Sitsofe Wheeler

unread,

Sep 20, 2008, 6:52:22 AM9/20/08

to Alexey Starikovskiy, Peter Zijlstra, Ingo Molnar, Arjan van de Ven, linux-...@vger.kernel.org, Steven Rostedt, le...@kernel.org

Hi Alexey,

Alexey Starikovskiy wrote:
> eeePC is known to be affected by EC GPE storm bug, and there is a patch
> for 2.6.27 to workaround the problem. please look at
> http://bugzilla.kernel.org/show_bug.cgi?id=11549
> for latest patch and bugs 10724/9998 for problem discussion.

Even with http://bugzilla.kernel.org/attachment.cgi?id=17845&action=view
(fast transaction) applied the stalls are still present (and seemingly
just as bad as before).

Sitsofe Wheeler

unread,

Sep 20, 2008, 6:55:56 AM9/20/08

to Alexey Starikovskiy, linux-...@vger.kernel.org, le...@kernel.org

Sitsofe Wheeler wrote:
> Hi Alexey,
>
> Alexey Starikovskiy wrote:
>> eeePC is known to be affected by EC GPE storm bug, and there is a
>> patch for 2.6.27 to workaround the problem. please look at
>> http://bugzilla.kernel.org/show_bug.cgi?id=11549
>> for latest patch and bugs 10724/9998 for problem discussion.
>
> Even with http://bugzilla.kernel.org/attachment.cgi?id=17845&action=view
> (fast transaction) applied the stalls are still present (and seemingly
> just as bad as before).

I should note that the patch DOES fix the slow volume hotkey repeat rate
that I was seeing.

Alexey Starikovskiy

unread,

Sep 20, 2008, 7:14:59 AM9/20/08

to Sitsofe Wheeler, linux-...@vger.kernel.org, le...@kernel.org

Sitsofe Wheeler wrote:
> Sitsofe Wheeler wrote:
>> Hi Alexey,
>>
>> Alexey Starikovskiy wrote:
>>> eeePC is known to be affected by EC GPE storm bug, and there is a
>>> patch for 2.6.27 to workaround the problem. please look at
>>> http://bugzilla.kernel.org/show_bug.cgi?id=11549
>>> for latest patch and bugs 10724/9998 for problem discussion.
>>
>> Even with
>> http://bugzilla.kernel.org/attachment.cgi?id=17845&action=view (fast
>> transaction) applied the stalls are still present (and seemingly just
>> as bad as before).
>
> I should note that the patch DOES fix the slow volume hotkey repeat rate
> that I was seeing.

Could you cat /proc/interrupts ?

Sitsofe Wheeler

unread,

Sep 20, 2008, 7:23:55 AM9/20/08

to Alexey Starikovskiy, linux-...@vger.kernel.org, le...@kernel.org

Alexey Starikovskiy wrote:
> Could you cat /proc/interrupts ?

CPU0
0: 106942 IO-APIC-edge timer
1: 426 IO-APIC-edge i8042
8: 41 IO-APIC-edge rtc0
9: 27346 IO-APIC-fasteoi acpi
12: 6840 IO-APIC-edge i8042
14: 0 IO-APIC-edge ata_piix
15: 203 IO-APIC-edge ata_piix
16: 92357 IO-APIC-fasteoi uhci_hcd:usb5, HDA Intel,
i915@pci:0000:00:02.0
18: 38195 IO-APIC-fasteoi ath, uhci_hcd:usb4
19: 0 IO-APIC-fasteoi uhci_hcd:usb3
23: 25150 IO-APIC-fasteoi ehci_hcd:usb1, uhci_hcd:usb2
NMI: 0 Non-maskable interrupts
LOC: 142112 Local timer interrupts
TRM: 0 Thermal event interrupts
SPU: 0 Spurious interrupts
ERR: 0
MIS: 0

Sitsofe Wheeler

unread,

Sep 20, 2008, 7:42:09 AM9/20/08

to Alexey Starikovskiy, linux-...@vger.kernel.org, le...@kernel.org

Sitsofe Wheeler wrote:
> Alexey Starikovskiy wrote:
>> Could you cat /proc/interrupts ?
>
> CPU0
> 0: 106942 IO-APIC-edge timer
> 1: 426 IO-APIC-edge i8042
> 8: 41 IO-APIC-edge rtc0
> 9: 27346 IO-APIC-fasteoi acpi
> 12: 6840 IO-APIC-edge i8042
> 14: 0 IO-APIC-edge ata_piix
> 15: 203 IO-APIC-edge ata_piix
> 16: 92357 IO-APIC-fasteoi uhci_hcd:usb5, HDA Intel,
> i915@pci:0000:00:02.0
> 18: 38195 IO-APIC-fasteoi ath, uhci_hcd:usb4
> 19: 0 IO-APIC-fasteoi uhci_hcd:usb3
> 23: 25150 IO-APIC-fasteoi ehci_hcd:usb1, uhci_hcd:usb2
> NMI: 0 Non-maskable interrupts
> LOC: 142112 Local timer interrupts
> TRM: 0 Thermal event interrupts
> SPU: 0 Spurious interrupts
> ERR: 0
> MIS: 0

and here are the interrupts from the (seemingly working) EeePC's Xandros
2.6.21.4 install:

CPU0
0: 6486 XT-PIC-XT timer
1: 107 XT-PIC-XT i8042
2: 0 XT-PIC-XT cascade
3: 0 XT-PIC-XT uhci_hcd:usb2
5: 735 XT-PIC-XT uhci_hcd:usb1, ehci_hcd:usb5
9: 219 XT-PIC-XT acpi
10: 8171 XT-PIC-XT uhci_hcd:usb3, pciehp, wifi0
11: 107 XT-PIC-XT HDA Intel, i915@pci:0000:00:02.0,
uhci_hcd:usb4, pciehp
12: 236 XT-PIC-XT i8042
14: 0 XT-PIC-XT libata
15: 3475 XT-PIC-XT libata
NMI: 0
ERR: 0

and here is the kernel cmdline that the Xandros install uses:
quiet rw vga=785 irqpoll root=/dev/sda1

I've got a feeling I know what direction this is going in...

--
Sitsofe | http://sucs.org/~sits/

Alexey Starikovskiy

unread,

Sep 20, 2008, 8:34:50 AM9/20/08

to Sitsofe Wheeler, linux-...@vger.kernel.org, le...@kernel.org

Right, could you please check if irqpoll is the key to success?

Thanks,
Alex.

Sitsofe Wheeler

unread,

Sep 20, 2008, 8:50:34 AM9/20/08

to Alexey Starikovskiy, linux-...@vger.kernel.org, le...@kernel.org

Alexey Starikovskiy wrote:

> Sitsofe Wheeler wrote:
>> and here is the kernel cmdline that the Xandros install uses:
>> quiet rw vga=785 irqpoll root=/dev/sda1
>>
>> I've got a feeling I know what direction this is going in...
>>
> Right, could you please check if irqpoll is the key to success?

So I tried just irqpoll (along with nolapic irqpool which changed the
output in /proc/interrupts) but it made no difference to the stalling.
I'm baffled.

--
Sitsofe | http://sucs.org/~sits/

Alexey Starikovskiy

unread,

Sep 20, 2008, 9:07:29 AM9/20/08

to Sitsofe Wheeler, linux-...@vger.kernel.org, le...@kernel.org

Sitsofe Wheeler wrote:
> Alexey Starikovskiy wrote:
>> Could you cat /proc/interrupts ?
>
> CPU0
> 0: 106942 IO-APIC-edge timer
> 1: 426 IO-APIC-edge i8042
> 8: 41 IO-APIC-edge rtc0
> 9: 27346 IO-APIC-fasteoi acpi
> 12: 6840 IO-APIC-edge i8042
> 14: 0 IO-APIC-edge ata_piix
> 15: 203 IO-APIC-edge ata_piix
> 16: 92357 IO-APIC-fasteoi uhci_hcd:usb5, HDA Intel,
> i915@pci:0000:00:02.0
> 18: 38195 IO-APIC-fasteoi ath, uhci_hcd:usb4
> 19: 0 IO-APIC-fasteoi uhci_hcd:usb3
> 23: 25150 IO-APIC-fasteoi ehci_hcd:usb1, uhci_hcd:usb2
> NMI: 0 Non-maskable interrupts
> LOC: 142112 Local timer interrupts
> TRM: 0 Thermal event interrupts
> SPU: 0 Spurious interrupts
> ERR: 0
> MIS: 0

Could you please try following patch on top?

more_storm.patch

Sitsofe Wheeler

unread,

Sep 20, 2008, 9:32:21 AM9/20/08

to Alexey Starikovskiy, linux-...@vger.kernel.org, le...@kernel.org

Unfortunately it still stalls (hotkeys still seem to be working
correctly though).

--
Sitsofe | http://sucs.org/~sits/

Steven Rostedt

unread,

Sep 20, 2008, 10:00:00 AM9/20/08

to Sitsofe Wheeler, Peter Zijlstra, Ingo Molnar, Arjan van de Ven, linux-...@vger.kernel.org, le...@kernel.org, astari...@suse.de

On Sat, 20 Sep 2008, Sitsofe Wheeler wrote:
>
> > I thought we had a wakeup latency tracer exacty like we have preempt and
> > irq off latency tracers, Steve, where'd that go?
>
> I rebuilt my kernel after a make clean and wakeup was still not there. It
> might be a good idea to modify the kernel documentation currently
> provided with 2.6.27 if it has gone away for now (or document the extra
> switches needed to turn it on if that's why it didn't show up for me)...

The wake up tracing is CONFIG_SCHED_TRACER. If you do not see
"wakeup" as one of the available tracers in available_tracers, then check
your dmesg and see if you have CONFIG_FTRACE_STARTUP_TEST enabled. There
was a known bug that would cause the wakeup tracing to fail the test. When
a trace fails the startup test, it is disabled.

-- Steve

Sitsofe Wheeler

unread,

Sep 20, 2008, 10:11:58 AM9/20/08

to Steven Rostedt, Peter Zijlstra, Ingo Molnar, Arjan van de Ven, linux-...@vger.kernel.org, le...@kernel.org, astari...@suse.de

Steven Rostedt wrote:
> The wake up tracing is CONFIG_SCHED_TRACER. If you do not see
> "wakeup" as one of the available tracers in available_tracers, then check
> your dmesg and see if you have CONFIG_FTRACE_STARTUP_TEST enabled. There
> was a known bug that would cause the wakeup tracing to fail the test. When
> a trace fails the startup test, it is disabled.

This is exactly the case (dmesg reports the ftrace wakeup startup test
failed). That's what I get for not reading my dmesg more carefully...

--
Sitsofe | http://sucs.org/~sits/

Steven Rostedt

unread,

Sep 20, 2008, 10:37:44 AM9/20/08

to Sitsofe Wheeler, Peter Zijlstra, Ingo Molnar, Arjan van de Ven, LKML, le...@kernel.org, astari...@suse.de

On Sat, 20 Sep 2008, Sitsofe Wheeler wrote:

> Steven Rostedt wrote:
> > The wake up tracing is CONFIG_SCHED_TRACER. If you do not see "wakeup" as
> > one of the available tracers in available_tracers, then check
> > your dmesg and see if you have CONFIG_FTRACE_STARTUP_TEST enabled. There was
> > a known bug that would cause the wakeup tracing to fail the test. When a
> > trace fails the startup test, it is disabled.
>
> This is exactly the case (dmesg reports the ftrace wakeup startup test
> failed). That's what I get for not reading my dmesg more carefully...

OK, that is probably the known bug you are hitting. Simply disable the
CONFIG_FTRACE_STARTUP_TEST and you should have the wakeup tracer. The bug
is with the test, not the tracer, so it should not hurt you.

-- Steve

Sitsofe Wheeler

unread,

Sep 20, 2008, 1:16:46 PM9/20/08

to Steven Rostedt, Peter Zijlstra, Ingo Molnar, Arjan van de Ven, LKML, le...@kernel.org, astari...@suse.de

Steven Rostedt wrote:
> OK, that is probably the known bug you are hitting. Simply disable the
> CONFIG_FTRACE_STARTUP_TEST and you should have the wakeup tracer. The bug
> is with the test, not the tracer, so it should not hurt you.

Thanks - this made the wakeup tracer appear a you said. I have put two
wakeup traces up:

http://sucs.org/~sits/test/eeepc-debug/20080920/latency_trace.txt.gz
http://sucs.org/~sits/test/eeepc-debug/20080920/trace.txt.gz
(each file is around 6Mbytes uncompressed)

Here's a small extract of latency_trace.txt:

> # tracer: wakeup
> #
> wakeup latency trace v1.1.5 on 2.6.27-rc6skw-dirty
> --------------------------------------------------------------------
> latency: 3232905 us, #65620/6180619, CPU#0 | (M:desktop VP:0, KP:0, SP:0 HP:0)
> -----------------
> | task: speaker-test-5074 (uid:1000 nice:0 policy:1 rt_prio:25)
> -----------------
>
> # _------=> CPU#
> # / _-----=> irqs-off
> # | / _----=> need-resched
> # || / _---=> hardirq/softirq
> # ||| / _--=> preempt-depth
> # |||| /
> # ||||| delay
> # cmd pid ||||| time | caller
> # \ / ||||| \ | /
> cat-6743 0.N.. 3198412us : acpi_ut_update_ref_count (acpi_ut_update_object_reference)
> cat-6743 0.N.. 3198413us : acpi_ut_pop_generic_state (acpi_ut_update_object_reference)
> cat-6743 0.N.. 3198413us : acpi_ut_delete_generic_state (acpi_ut_update_object_reference)
> cat-6743 0.N.. 3198414us : acpi_os_release_object (acpi_ut_delete_generic_state)
> cat-6743 0.N.. 3198414us : kmem_cache_free (acpi_os_release_object)
[...]
> cat-6743 0.N.. 3232893us : acpi_ut_create_update_state (acpi_ut_create_update_state_and_push)
> cat-6743 0.N.. 3232893us : acpi_ut_create_generic_state (acpi_ut_create_update_state)
> cat-6743 0.N.. 3232894us : kmem_cache_alloc (acpi_ut_create_generic_state)
> cat-6743 0dN.. 3232895us : __slab_alloc (kmem_cache_alloc)
> cat-6743 0dN.. 3232895us : deactivate_slab (__slab_alloc)
> cat-6743 0.N.. 3232896us+: __alloc_pages_internal (__slab_alloc)
> cat-6743 0d... 3232904us+: marker_probe_cb (schedule)
> cat-6743 0d... 3232905us!: schedule (__cond_resched)
> cat-6743 0.N.. 3198412us : acpi_ut_push_generic_state (acpi_ut_create_update_state_and_push)

Is it intentional that the last event has a time earlier closer to that
of the first event?

--
Sitsofe | http://sucs.org/~sits/

Steven Rostedt

unread,

Sep 20, 2008, 5:53:40 PM9/20/08

to Sitsofe Wheeler, Peter Zijlstra, Ingo Molnar, Arjan van de Ven, LKML, le...@kernel.org, astari...@suse.de

On Sat, 20 Sep 2008, Sitsofe Wheeler wrote:

> Steven Rostedt wrote:
> > OK, that is probably the known bug you are hitting. Simply disable the
> > CONFIG_FTRACE_STARTUP_TEST and you should have the wakeup tracer. The bug is
> > with the test, not the tracer, so it should not hurt you.
>
> Thanks - this made the wakeup tracer appear a you said. I have put two wakeup
> traces up:
>
> http://sucs.org/~sits/test/eeepc-debug/20080920/latency_trace.txt.gz
> http://sucs.org/~sits/test/eeepc-debug/20080920/trace.txt.gz
> (each file is around 6Mbytes uncompressed)
>
> Here's a small extract of latency_trace.txt:
>
> > # tracer: wakeup
> > #
> > wakeup latency trace v1.1.5 on 2.6.27-rc6skw-dirty
> > --------------------------------------------------------------------
> > latency: 3232905 us, #65620/6180619, CPU#0 | (M:desktop VP:0, KP:0, SP:0

Peter, these times are crazy, mainly due to the cpu_clock. He probably
wants to use the sched_clock. Below is a patch to use it instead.

Sitsofe, I notice that the trace states "desktop". This means that you
are running with CONFIG_PREEMPT_VOLUNTARY. You want
CONFIG_PREEMPT.

[...]

>
> Is it intentional that the last event has a time earlier closer to that of the
> first event?
>

Change the config, and see what you get with this patch:

Note this is not compiled tested:

---
kernel/trace/trace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linus.git/kernel/trace/trace.c
===================================================================
--- linus.git.orig/kernel/trace/trace.c 2008-09-20 17:49:00.000000000 -0400
+++ linus.git/kernel/trace/trace.c 2008-09-20 17:49:35.000000000 -0400
@@ -60,7 +60,7 @@ ns2usecs(cycle_t nsec)

cycle_t ftrace_now(int cpu)
{
- return cpu_clock(cpu);
+ return sched_clock();
}

/*

-- Steve

Sitsofe Wheeler

unread,

Sep 20, 2008, 7:19:03 PM9/20/08

to Steven Rostedt, Peter Zijlstra, Ingo Molnar, Arjan van de Ven, LKML, le...@kernel.org, astari...@suse.de

Steven Rostedt wrote:
> On Sat, 20 Sep 2008, Sitsofe Wheeler wrote:
>
>> Thanks - this made the wakeup tracer appear a you said. I have put two wakeup
>> traces up:
>>
>> http://sucs.org/~sits/test/eeepc-debug/20080920/latency_trace.txt.gz
>> http://sucs.org/~sits/test/eeepc-debug/20080920/trace.txt.gz
>> (each file is around 6Mbytes uncompressed)
>>
>> Here's a small extract of latency_trace.txt:
>>
>>> # tracer: wakeup
>>> #
>>> wakeup latency trace v1.1.5 on 2.6.27-rc6skw-dirty
>>> --------------------------------------------------------------------
>>> latency: 3232905 us, #65620/6180619, CPU#0 | (M:desktop VP:0, KP:0, SP:0
>
> Peter, these times are crazy, mainly due to the cpu_clock. He probably
> wants to use the sched_clock. Below is a patch to use it instead.
>
> Sitsofe, I notice that the trace states "desktop". This means that you
> are running with CONFIG_PREEMPT_VOLUNTARY. You want
> CONFIG_PREEMPT.

Dagnabit I keep confusing people. This was actually intentional because
I wanted to know whether the latency I was seeing should have been
present in a non-preempt (but voluntary) kernel. You can see the subject
at that start of this thread: http://tinyurl.com/4akxa5 (How how latent
should non-preemptive scheduling be?). I'm not running a sound studio
where I need the lowest possible latency at all costs. Further my
current understanding is that the desktop distros don't tend to ship
"regular desktop kernels" with preemption (I know Ubuntu 8.04 and Fedora
9 didn't).

Basically I have the following queries: Do you have to have preemption
on if you are listening to music (without noticeable skips) and playing
the odd game (without noticeable pauses) on a desktop? What's the
allowed highest latency going to be over a few minutes in such kernels?
Is it simply the case that if it's a non-preemptive kernel latency no
longer matters?

Just for the record I just tested a preempt kernel I had lying around
and the speakter-test -b75000 while looking at battery did not stall.
However latencytop still reported a "waiting for cpu" value of 75ms.

> [...]
>> Is it intentional that the last event has a time earlier closer to that of the
>> first event?
>>
>
> Change the config, and see what you get with this patch:

I'll see if I can test the patch tomorrow. Does the config change also
have to be made or the timestamps to be "narrower"?

--
Sitsofe | http://sucs.org/~sits/

Ingo Molnar

unread,

Sep 21, 2008, 3:10:30 AM9/21/08

to Sitsofe Wheeler, Steven Rostedt, Peter Zijlstra, Arjan van de Ven, LKML, le...@kernel.org, astari...@suse.de

* Sitsofe Wheeler <sit...@yahoo.com> wrote:

> Dagnabit I keep confusing people. This was actually intentional
> because I wanted to know whether the latency I was seeing should have
> been present in a non-preempt (but voluntary) kernel. You can see the
> subject at that start of this thread: http://tinyurl.com/4akxa5 (How
> how latent should non-preemptive scheduling be?). I'm not running a
> sound studio where I need the lowest possible latency at all costs.
> Further my current understanding is that the desktop distros don't
> tend to ship "regular desktop kernels" with preemption (I know Ubuntu
> 8.04 and Fedora 9 didn't).
>
> Basically I have the following queries: Do you have to have preemption
> on if you are listening to music (without noticeable skips) and
> playing the odd game (without noticeable pauses) on a desktop? What's
> the allowed highest latency going to be over a few minutes in such
> kernels? Is it simply the case that if it's a non-preemptive kernel
> latency no longer matters?

milliseconds of stalls is definitely excessive under a non-preempt
kernel - and you have up to 500 msecs of stalls, right?

The simplest way you can fix such latencies is to look at the function
trace, figure out which loop in the kernel takes so long to execute, and
add a cond_resched() call to it. [there are other situations where a
more complicated fix is needed, but this seems like a simpler scenario.]

Ingo

Matt Keenan

unread,

Sep 21, 2008, 5:00:23 PM9/21/08

to Sitsofe Wheeler, linux-...@vger.kernel.org, Ingo Molnar

On Wed, 2008-09-17 at 22:18 +0100, Sitsofe Wheeler wrote:
> Hi,
>
> I have an EeePC 900 (Intel Celeron 900Mhz) and it seems to be skipping
> while playing sound through various desktop apps with a 2.6.27rc6
> kernel. It is running off an SD card which really shows up slow writes
> but the sound is seemingly skipping even when ext3 is not being used.

[snip snip]

If you use ALSA you could write a custom .asoundrc file and increase
ALSA's default buffer size, this gives the hardware a longer buffer to
play which may work around the problem. This used to do the trick for me
with the old scheduler with slow machines. Not a fix but if nothing else
works...

Matt

Sitsofe Wheeler

unread,

Sep 22, 2008, 2:25:13 AM9/22/08

to Steven Rostedt, Peter Zijlstra, Ingo Molnar, Arjan van de Ven, LKML, le...@kernel.org, astari...@suse.de

OK I've made the changes you suggested. Without preempt enabled the last
event will have a stamp closer to the first event and the times are very
high. With preempt enabled that beahviour has gone. Here are results
with preempt enabled:

latency: 19657 us, #3268/3268, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0):
http://sucs.org/~sits/test/eeepc-debug/20080922/preempt/ath5k/latency_trace.txt

latency: 104 us, #182/182, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0)
http://sucs.org/~sits/test/eeepc-debug/20080922/preempt/unplug/latency_trace.txt

latency: 95 us, #134/134, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0):
http://sucs.org/~sits/test/eeepc-debug/20080922/preempt/acpi/latency_trace.txt

latency: 91 us, #152/152, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0)
http://sucs.org/~sits/test/eeepc-debug/20080922/preempt/touchpad/latency_trace.txt

Are these the type of latencies to be expected with preemption on?

--
Sitsofe | http://sucs.org/~sits/

Sitsofe Wheeler

unread,

Sep 22, 2008, 2:50:34 AM9/22/08

to Matt Keenan, linux-...@vger.kernel.org, Ingo Molnar

Matt Keenan wrote:
> If you use ALSA you could write a custom .asoundrc file and increase
> ALSA's default buffer size, this gives the hardware a longer buffer to
> play which may work around the problem. This used to do the trick for me
> with the old scheduler with slow machines. Not a fix but if nothing else
> works...

In all fairness turning on preemption also helps (I don't hear any drops
or see any stalls once it's on) but I'm trying to work out if this is an
underlying problem that can be fixed (and then there is the case of the
default desktop distros).

--
Sitsofe | http://sucs.org/~sits/

Ingo Molnar

unread,

Sep 22, 2008, 7:58:26 AM9/22/08

to Sitsofe Wheeler, Peter Zijlstra, Arjan van de Ven, linux-...@vger.kernel.org, Steven Rostedt

* Sitsofe Wheeler <sit...@yahoo.com> wrote:

well, since they went away after you enabled CONFIG_PREEMPT=y, they are
definitely in-kernel latencies, not any external SMM latencies.

I.e. they are inherently fixable. Could you enable:

CONFIG_DYNAMIC_FTRACE=y
CONFIG_FTRACE_MCOUNT_RECORD=y

that should make the traces a lot more verbose - every kernel function
executed in the latency path will be logged. That way we'll be able to
say which one takes that long.

note, you might have to increase /debug/tracing/trace_entries to get a
long enough trace to capture the relevant portion of the latency.

Ingo

Steven Rostedt

unread,

Sep 22, 2008, 8:07:33 AM9/22/08

to Ingo Molnar, Sitsofe Wheeler, Peter Zijlstra, Arjan van de Ven, linux-...@vger.kernel.org

Also note, to modify trace_entries, you must be in the none (nop?) tracer,
otherwise the size will not be effected.

If you find the trace is also too big, you can echo a list of functions
into:

/debug/tracing/set_ftrace_notrace

to not trace those functions. using '>' will remove any existing function
in that file, but using '>>' will append functions to the file.

For a list of functions that you can add, see:

/debug/tracing/available_filter_functions

-- Steve

Sitsofe Wheeler

unread,

Sep 23, 2008, 2:33:44 AM9/23/08

to Ingo Molnar, Peter Zijlstra, Arjan van de Ven, linux-...@vger.kernel.org, Steven Rostedt

Ingo Molnar wrote:
> well, since they went away after you enabled CONFIG_PREEMPT=y, they are
> definitely in-kernel latencies, not any external SMM latencies.
>
> I.e. they are inherently fixable. Could you enable:
>
> CONFIG_DYNAMIC_FTRACE=y
> CONFIG_FTRACE_MCOUNT_RECORD=y
>
> that should make the traces a lot more verbose - every kernel function
> executed in the latency path will be logged. That way we'll be able to
> say which one takes that long.

I do not appear to have the CONFIG_FTRACE_MCOUNT_RECORD option in
2.6.27rc7. Is it an option that is only in -tip ?

--
Sitsofe | http://sucs.org/~sits/

Ingo Molnar

unread,

Sep 23, 2008, 7:53:46 AM9/23/08

to Sitsofe Wheeler, Peter Zijlstra, Arjan van de Ven, linux-...@vger.kernel.org, Steven Rostedt

* Sitsofe Wheeler <sit...@yahoo.com> wrote:

> Ingo Molnar wrote:
>> well, since they went away after you enabled CONFIG_PREEMPT=y, they are
>> definitely in-kernel latencies, not any external SMM latencies.
>>
>> I.e. they are inherently fixable. Could you enable:
>>
>> CONFIG_DYNAMIC_FTRACE=y
>> CONFIG_FTRACE_MCOUNT_RECORD=y
>>
>> that should make the traces a lot more verbose - every kernel function
>> executed in the latency path will be logged. That way we'll be able to
>> say which one takes that long.
>
> I do not appear to have the CONFIG_FTRACE_MCOUNT_RECORD option in
> 2.6.27rc7. Is it an option that is only in -tip ?

yeah - it's a new ftrace feature queued up for v2.6.28.

Ingo

Sitsofe Wheeler

unread,

Sep 23, 2008, 12:31:01 PM9/23/08

to Ingo Molnar, Peter Zijlstra, Arjan van de Ven, linux-...@vger.kernel.org, Steven Rostedt

Ingo Molnar wrote:
> * Sitsofe Wheeler <sit...@yahoo.com> wrote:
>
>> Ingo Molnar wrote:
>>> well, since they went away after you enabled CONFIG_PREEMPT=y, they are
>>> definitely in-kernel latencies, not any external SMM latencies.
>>>
>>> I.e. they are inherently fixable. Could you enable:
>>>
>>> CONFIG_DYNAMIC_FTRACE=y
>>> CONFIG_FTRACE_MCOUNT_RECORD=y
>>>
>>> that should make the traces a lot more verbose - every kernel function
>>> executed in the latency path will be logged. That way we'll be able to
>>> say which one takes that long.
>> I do not appear to have the CONFIG_FTRACE_MCOUNT_RECORD option in
>> 2.6.27rc7. Is it an option that is only in -tip ?
>
> yeah - it's a new ftrace feature queued up for v2.6.28.

I've been struggling to boot -tip/master - currently it blows up just
after printing SLUB information saying:

BUG: unable to handle kernel NULL pointer dereference at 00000004
IP: [<c0120078>] account_system_time+0x48/0x120
*pde = 00000000
Thread overran stack, or stack corrupted
Oops: 0002 [#1] PREEMPT

--
Sitsofe | http://sucs.org/~sits/

Sitsofe Wheeler

unread,

Sep 23, 2008, 3:40:25 PM9/23/08

to linux-...@vger.kernel.org, Ingo Molnar, Peter Zijlstra, Arjan van de Ven, linux-...@vger.kernel.org, Steven Rostedt

Sitsofe Wheeler wrote:
> I've been struggling to boot -tip/master - currently it blows up just
> after printing SLUB information saying:
>
> BUG: unable to handle kernel NULL pointer dereference at 00000004
> IP: [<c0120078>] account_system_time+0x48/0x120
> *pde = 00000000
> Thread overran stack, or stack corrupted
> Oops: 0002 [#1] PREEMPT

OK this BUG seems to have gone away in the past few hours.

Ingo when you were asking for the ftrace report I presume that would be
for a non preempt kernel (as a preemptive one only showed a very worst
latency of 19657 us in one exceptional case)?

Sitsofe Wheeler

unread,

Sep 23, 2008, 6:01:29 PM9/23/08

to linux-...@vger.kernel.org, Ingo Molnar, Peter Zijlstra, Arjan van de Ven, Steven Rostedt

Sitsofe Wheeler wrote:
> Ingo when you were asking for the ftrace report I presume that would be
> for a non preempt kernel (as a preemptive one only showed a very worst
> latency of 19657 us in one exceptional case)?

With a non preempt kernel I found the logs would simply become too big
without filtering. On Peter's suggestion I used the following to remove
the most frequently called functions:

echo acpi_os_release_object > set_ftrace_notrace && echo kmem_cache_* >>
set_ftrace_notrace && echo acpi_ut_* >> set_ftrace_notrace

By doing counts across multiple runs I would say that the most
frequently called functions are the following (in most frequently called
order). The counts are definitely approximate but are reasonable
relative to each other.

475325 acpi_os_release_object (acpi_ut_delete_generic_state)
406895 kmem_cache_free (acpi_os_release_object)
402838 kmem_cache_alloc (acpi_ut_create_generic_state)
132968 acpi_ut_update_ref_count (acpi_ut_update_object_reference)
131041 acpi_ut_pop_generic_state (acpi_ut_update_object_reference)
131036 acpi_ut_delete_generic_state (acpi_ut_update_object_reference)
131025 acpi_ut_create_generic_state (acpi_ut_create_update_state)
131023 acpi_ut_create_update_state_and_push
(acpi_ut_update_object_reference)
131020 acpi_ut_create_update_state (acpi_ut_create_update_state_and_push)
131018 acpi_ut_push_generic_state (acpi_ut_create_update_state_and_push)
60147 acpi_ns_get_next_node (acpi_ns_delete_namespace_by_owner)
28974 acpi_ns_get_next_valid_node (acpi_ns_get_next_node)

Logs with the filtering on can be seen here (15Mbytes decompressed each):
http://sucs.org/~sits/test/eeepc-debug/20080923/latency_trace.gz
http://sucs.org/~sits/test/eeepc-debug/20080923/trace.txt.gz

It looks like lots of acpi state is created and deleted...

Ingo Molnar

unread,

Sep 27, 2008, 4:49:05 PM9/27/08

to Sitsofe Wheeler, linux-...@vger.kernel.org, Peter Zijlstra, Arjan van de Ven, Steven Rostedt

* Sitsofe Wheeler <sit...@yahoo.com> wrote:

> Logs with the filtering on can be seen here (15Mbytes decompressed each):
> http://sucs.org/~sits/test/eeepc-debug/20080923/latency_trace.gz
> http://sucs.org/~sits/test/eeepc-debug/20080923/trace.txt.gz
>
> It looks like lots of acpi state is created and deleted...

yeah. The latency starts here:

cat-5901 0dNh. 1us : default_wake_function (__wake_up_common)
cat-5901 0.Nh. 2us : kill_fasync (snd_pcm_period_elapsed)
[...]

and ends here:

[...]
cat-5901 0.N.. 270501us+: mutex_lock (acpi_ec_transaction)
cat-5901 0d... 270507us : __cond_resched (_cond_resched)

270 _milliseconds_ later. That's excessive.

The main overhead is starting here:

cat-5901 0.N.. 167us : acpi_ds_result_push (acpi_ds_exec_end_op)

lots of ACPI code executed ...

does it get better if you have CONFIG_PREEMPT_VOLUNTARY=y enabled? That
_should_ break up this section neatly. If it doesnt then please add a
might_sleep() check to kernel/kernel/semaphore.c's down_timeout()
function - that is called a number of times in this trace.

Ingo

Sitsofe Wheeler

unread,

Sep 28, 2008, 4:56:50 PM9/28/08

to Ingo Molnar, linux-...@vger.kernel.org, Peter Zijlstra, Arjan van de Ven, Steven Rostedt

Ingo Molnar wrote:
> does it get better if you have CONFIG_PREEMPT_VOLUNTARY=y enabled? That

CONFIG_PREEMPT_VOLUNTARY=y has always been enabled in these results.

> _should_ break up this section neatly. If it doesnt then please add a
> might_sleep() check to kernel/kernel/semaphore.c's down_timeout()
> function - that is called a number of times in this trace.

I added might_sleep() to the start of down_timeout() but it neither
printed anything to dmesg nor changed the latency issue...

Strangely stalling only seems to turn up in linux-tip kernels with very
little debugging options set within them. I have a linux-tip that has
lots of extra debugging options set and this problem doesn't show up...

I've also asked about this on the ACPI mailing list and had a huge
amount of help from finding a good point for a cond_resched -
http://marc.info/?l=linux-acpi&m=122236450105747&w=2 . Certain things
(like games) still stutter when the battery is read but it's enough to
play back sound in rhythmbox without stuttering.

Any ideas why the issue would go away with a debug kernel though?

--
Sitsofe | http://sucs.org/~sits/

Ingo Molnar

unread,

Sep 29, 2008, 4:37:43 AM9/29/08

to Sitsofe Wheeler, linux-...@vger.kernel.org, Peter Zijlstra, Arjan van de Ven, Steven Rostedt

* Sitsofe Wheeler <sit...@yahoo.com> wrote:

> Ingo Molnar wrote:
>> does it get better if you have CONFIG_PREEMPT_VOLUNTARY=y enabled? That
>>
>
> CONFIG_PREEMPT_VOLUNTARY=y has always been enabled in these results.

okay - then this definitely needs fixed.

>> _should_ break up this section neatly. If it doesnt then please add a
>> might_sleep() check to kernel/kernel/semaphore.c's down_timeout()
>> function - that is called a number of times in this trace.
>
> I added might_sleep() to the start of down_timeout() but it neither
> printed anything to dmesg nor changed the latency issue...
>
> Strangely stalling only seems to turn up in linux-tip kernels with
> very little debugging options set within them. I have a linux-tip that
> has lots of extra debugging options set and this problem doesn't show
> up...
>
> I've also asked about this on the ACPI mailing list and had a huge
> amount of help from finding a good point for a cond_resched -
> http://marc.info/?l=linux-acpi&m=122236450105747&w=2 . Certain things
> (like games) still stutter when the battery is read but it's enough to
> play back sound in rhythmbox without stuttering.
>
> Any ideas why the issue would go away with a debug kernel though?

hm, that's weird indeed. You could try to trace the full battery readout
itself, via a script like this:

cat /debug/tracing/trace > /dev/null # drain trace
cat /proc/acpi/whatever
cat /debug/tracing/trace > trace.txt

this way you should be seeing a full trace of the whole thing. (with the
'ftrace' tracer plugin set in current_tracer)

then, assuming the trace.txt is complete, you can check where it
reschedules and why. Search for 'schedule' calls. Compare the debug
versus non-debug traces.

You can even turn on stack backtrace tracing feature, via:

echo stacktrace > /debug/tracing/iter_ctrl

this adds extra trace entries that show the call path of every
reschedule. [this attribute is default-disabled.]

Ingo

Sitsofe Wheeler

unread,

Sep 29, 2008, 7:11:27 PM9/29/08

to Ingo Molnar, linux-...@vger.kernel.org, Peter Zijlstra, Arjan van de Ven, Steven Rostedt

Ingo Molnar wrote:
> * Sitsofe Wheeler <sit...@yahoo.com> wrote:
>
>> Any ideas why the issue would go away with a debug kernel though?
>
> hm, that's weird indeed. You could try to trace the full battery readout
> itself, via a script like this:
>
> cat /debug/tracing/trace > /dev/null # drain trace
> cat /proc/acpi/whatever
> cat /debug/tracing/trace > trace.txt
>

> You can even turn on stack backtrace tracing feature, via:
>
> echo stacktrace > /debug/tracing/iter_ctrl
>
> this adds extra trace entries that show the call path of every
> reschedule. [this attribute is default-disabled.]

(Something bad seems to be happening with my kernels as I have been
finding strange problems like network-manager detecting the wifi as a
wired interface unless I did a make clean on my kernel sources before
compiling). The stalling issue seems to keep coming and going and is
currently showing up in both debug and non debug kernels.

There are traces when looking at battery information on both AC (where
the problem is never there) and battery (where the problem always is
there) here:
http://sucs.org/~sits/test/eeepc-debug/20080929/trace-on-ac.txt.bz2
http://sucs.org/~sits/test/eeepc-debug/20080929/trace-on-battery.txt.bz2
(15Mbytes uncompressed)

I wasn't quite sure how to tell when a reschedule was done. I simply did
a grep schedule on the file and most of it seemed reasonable. One part
that caught my eye was the following:

speaker-test-3891 [000] 1392.751699: __switch_to <-schedule
<idle>-0 [000] 1392.752070: account_scheduler_latency
<-enqueue_task_fair
<idle>-0 [000] 1392.752091: __switch_to <-schedule
cat-7717 [000] 1392.752092: del_timer <-schedule_timeout
cat-7717 [000] 1392.772263: account_scheduler_latency
<-enqueue_task_fair
cat-7717 [000] 1392.773224: __tasklet_schedule <-kbd_event
cat-7717 [000] 1392.773225: schedule_console_callback
<-kbd_event
cat-7717 [000] 1392.773225: schedule_work
<-schedule_console_callback
cat-7717 [000] 1392.773226: queue_work <-schedule_work
cat-7717 [000] 1392.773232: account_scheduler_latency
<-enqueue_task_fair
cat-7717 [000] 1392.773240: account_scheduler_latency
<-enqueue_task_fair
cat-7717 [000] 1392.773251: schedule_delayed_work
<-put_queue
cat-7717 [000] 1392.773251: queue_delayed_work
<-schedule_delayed_work
cat-7717 [000] 1392.773258: schedule_console_callback
<-kbd_event
cat-7717 [000] 1392.773258: schedule_work
<-schedule_console_callback
cat-7717 [000] 1392.773259: queue_work <-schedule_work
cat-7717 [000] 1392.773267: schedule_console_callback
<-kbd_event
cat-7717 [000] 1392.773268: schedule_work
<-schedule_console_callback
cat-7717 [000] 1392.773268: queue_work <-schedule_work
cat-7717 [000] 1392.807931: account_scheduler_latency
<-enqueue_task_fair
cat-7717 [000] 1392.839464: __tasklet_schedule <-ath5k_intr
cat-7717 [000] 1392.839506: account_scheduler_latency
<-enqueue_task_fair
cat-7717 [000] 1392.941867: __tasklet_schedule <-ath5k_intr
cat-7717 [000] 1393.005963: __tasklet_schedule <-ath5k_intr
cat-7717 [000] 1393.033222: __switch_to <-schedule

Here a schedule seemingly doesn't happen for a few hundredths of a second...

--
Sitsofe | http://sucs.org/~sits/

Ingo Molnar

unread,

Sep 30, 2008, 7:23:11 AM9/30/08

to Sitsofe Wheeler, linux-...@vger.kernel.org, Peter Zijlstra, Arjan van de Ven, Steven Rostedt

that's about 60 msecs. ath5k_intr IRQ handling overhead? IRQ handlers
are non-preemptible. (even under CONFIG_PREEMPT=y)

Ingo

Sitsofe Wheeler

unread,

Sep 30, 2008, 9:18:31 AM9/30/08

to Ingo Molnar, linux-...@vger.kernel.org, Peter Zijlstra, Arjan van de Ven, Steven Rostedt

Ingo Molnar wrote:
> * Sitsofe Wheeler <sit...@yahoo.com> wrote:
>
>> cat-7717 [000] 1392.941867: __tasklet_schedule <-ath5k_intr
>> cat-7717 [000] 1393.005963: __tasklet_schedule <-ath5k_intr
>> cat-7717 [000] 1393.033222: __switch_to <-schedule
>>
>> Here a schedule seemingly doesn't happen for a few hundredths of a second...
>
> that's about 60 msecs. ath5k_intr IRQ handling overhead? IRQ handlers
> are non-preemptible. (even under CONFIG_PREEMPT=y)

What about the piece leading up to that:

Here's a snippet from
http://sucs.org/~sits/test/eeepc-debug/20080929/trace-on-battery.txt.bz2 :

cat-7717 [000] 1392.752092: del_timer <-schedule_timeout

cat-7717 [000] 1392.752092: acpi_ec_check_status
<-acpi_ec_wait
cat-7717 [000] 1392.752093: constant_test_bit
<-acpi_ec_check_status
cat-7717 [000] 1392.752095: mutex_unlock
<-acpi_ec_transaction
cat-7717 [000] 1392.752096: acpi_ex_enter_interpreter
<-acpi_ev_address_space_dispatch
cat-7717 [000] 1392.752097: acpi_os_wait_semaphore
<-acpi_ut_acquire_mutex
cat-7717 [000] 1392.752097: down_timeout
<-acpi_os_wait_semaphore
cat-7717 [000] 1392.752098: acpi_ex_release_global_lock
<-acpi_ex_read_data_from_field
cat-7717 [000] 1392.752099: acpi_ex_release_mutex_object
<-acpi_ex_release_global_lock
cat-7717 [000] 1392.752100: acpi_ev_release_global_lock
<-acpi_ex_release_mutex_object
cat-7717 [000] 1392.752100: __acpi_release_global_lock
<-acpi_ev_release_global_lock
cat-7717 [000] 1392.752101: acpi_os_signal_semaphore
<-acpi_ev_release_global_lock
cat-7717 [000] 1392.752102: up <-acpi_os_signal_semaphore
..
cat-7717 [000] 1392.839447:
acpi_ex_truncate_for32bit_table <-acpi_ds_exec_end_op
cat-7717 [000] 1392.839447: acpi_ps_next_parse_state
<-acpi_ps_parse_loop
cat-7717 [000] 1392.839448: acpi_ps_complete_op
<-acpi_ps_parse_loop
cat-7717 [000] 1392.839448: acpi_ps_complete_this_op
<-acpi_ps_complete_op
cat-7717 [000] 1392.839449: acpi_ps_delete_parse_tree
<-acpi_ps_complete_this_op
cat-7717 [000] 1392.839449: acpi_ps_get_arg
<-acpi_ps_delete_parse_tree
cat-7717 [000] 1392.839450: acpi_ps_get_opcode_info
<-acpi_ps_get_arg
cat-7717 [000] 1392.839450: acpi_ps_get_arg
<-acpi_ps_delete_parse_tree
cat-7717 [000] 1392.839451: acpi_ps_get_opcode_info
<-acpi_ps_get_arg
cat-7717 [000] 1392.839452: do_IRQ <-common_interrupt
cat-7717 [000] 1392.839453: irq_enter <-do_IRQ
cat-7717 [000] 1392.839453: irq_to_desc <-do_IRQ
cat-7717 [000] 1392.839454: handle_fasteoi_irq <-do_IRQ
cat-7717 [000] 1392.839455: handle_IRQ_event
<-handle_fasteoi_irq
cat-7717 [000] 1392.839455: usb_hcd_irq <-handle_IRQ_event
cat-7717 [000] 1392.839456: uhci_irq <-usb_hcd_irq
cat-7717 [000] 1392.839458: ath5k_intr <-handle_IRQ_event
cat-7717 [000] 1392.839459: ath5k_hw_is_intr_pending
<-ath5k_intr
cat-7717 [000] 1392.839461: ath5k_hw_get_isr <-ath5k_intr

cat-7717 [000] 1392.839464: __tasklet_schedule <-ath5k_intr

Why didn't another process cut in from 1392.752092 to 1392.839464? In
the schedule grep normally runs for cat are half as long e.g.:

<idle>-0 [000] 1392.747102: __switch_to <-schedule
cat-7717 [000] 1392.747103: del_timer <-schedule_timeout
cat-7717 [000] 1392.747107:
schedule_timeout_uninterruptible <-msleep
cat-7717 [000] 1392.747107: schedule_timeout
<-schedule_timeout_uninterruptible
cat-7717 [000] 1392.747108: init_timer <-schedule_timeout
cat-7717 [000] 1392.747108: __mod_timer <-schedule_timeout
cat-7717 [000] 1392.747111: __switch_to <-schedule
<idle>-0 [000] 1392.749067: account_scheduler_latency
<-enqueue_task_fair

Further why (after taking a break to process the ath5k interrupt) did it
not schedule another process other than cat until 1393.033226:

cat-7717 [000] 1393.033215: set_normalized_timespec
<-ktime_get_ts
cat-7717 [000] 1393.033216: ktime_add_safe
<-hrtimer_start_range_ns
cat-7717 [000] 1393.033217: ktime_add_safe
<-hrtimer_start_range_ns
cat-7717 [000] 1393.033217: enqueue_hrtimer
<-hrtimer_start_range_ns

cat-7717 [000] 1393.033222: __switch_to <-schedule

hald-addon-inpu-3237 [000] 1393.033226: fget_light <-do_sys_poll

--
Sitsofe | http://sucs.org/~sits/

Sitsofe Wheeler

unread,

Oct 4, 2008, 6:50:54 AM10/4/08

to Ingo Molnar, linux-...@vger.kernel.org, Peter Zijlstra, Arjan van de Ven, Steven Rostedt, Christoph Lameter

Sitsofe Wheeler wrote:
> I have an EeePC 900 (Intel Celeron 900Mhz) and it seems to be
> skipping while playing sound through various desktop apps with a
> 2.6.27rc6 kernel. It is running off an SD card which really shows up

After weeks of head scratching as to why these latencies didn't occur
using the xandros 2.6.21.4 kernel (but keeping the same userspace) when
my own kernels would always show this problem I finally found the answer
after reading
http://tservice.net.ru/~s0mbre/blog/devel/other/2008_10_01 on kernel
planet - SLUB can cause regressions compared to SLAB. Switching from
SLUB to SLAB made the problem more or less disappear (which I guess
makes sense given the large number of kmem_* calls that are made when
reading the battery).

My understanding with the memory allocators is that SLOB is the smallest
and simplest. Its forté is that it uses very little memory which makes
it ideal for embedded systems and is the easiest allocator to
understand. The downside is that it might tend towards memory
fragmentation the longer a system runs and is apparently a little bit
slower than SLAB. The SLAB allocator is something that the linux kernel
has had for many years and was a good performer until the pressures to
perform with large SMP systems started to come in to play (at which
point lots of memory would be used up on fixed structures -
http://lwn.net/Articles/229984/ ). SLUB is the newest allocator, scales
up well and has finer grained diagnostics that SLAB.

Prior to today my understanding was that SLUB would be able to replace
SLAB in all scenarios and perform just as well or better. However now
I'm not so sure (SLAB appears to be less latent than SLUB). Other than
SLUB's newness are there cases where SLAB should be chosen instead of
SLUB (e.g. uniprocessor desktop systems with hundreds of megabytes of
RAM)? Will SLAB exist as an alternative to SLUB for certain setups?

--
Sitsofe | http://sucs.org/~sits