Example: when using
taskset -c 0 echo <new_frequency> > /sys/devices/system/cpu/cpu1/cpufreq/scaling_setspeed
cpu 0 is traced, instead of cpu 1
Signed of by Robert Schoene <robert....@tu-dresden.de>
diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
index 1b1920f..0a47f10 100644
--- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
+++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
@@ -174,6 +174,7 @@ static void do_drv_write(void *_cmd)
switch (cmd->type) {
case SYSTEM_INTEL_MSR_CAPABLE:
+ trace_power_frequency(POWER_PSTATE, cmd->val);
rdmsr(cmd->addr.msr.reg, lo, hi);
lo = (lo & ~INTEL_MSR_RANGE) | (cmd->val & INTEL_MSR_RANGE);
wrmsr(cmd->addr.msr.reg, lo, hi);
@@ -363,7 +364,6 @@ static int acpi_cpufreq_target(struct cpufreq_policy *policy,
}
}
- trace_power_frequency(POWER_PSTATE, data->freq_table[next_state].frequency);
switch (data->cpu_feature) {
case SYSTEM_INTEL_MSR_CAPABLE:
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
are you sure this is right?
it's moving something from outside a switch statement to inside only one prong of a switch statement...
I'm pretty sure, since I'm moving it from function acpi_cpufreq_target(...) to do_drv_write(...)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majo...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
--
Robert Schoene
Technische Universitaet Dresden
Zentrum fuer Informationsdienste und Hochleistungsrechnen
01062 Dresden
Tel.: (0351) 463-42483, Fax: (0351) 463-37773
E-Mail: Robert....@tu-dresden.de
I expect Arjan is right.
You now only trace MSR based and not IO based frequency switching.
I don't know the tracing stuff, but it seems the cpu that executes
trace_power_frequency shows up in the statistics as the one on which the
frequency change happened which currently is wrong and you try to fix this?
What exactly is the reason you do not add
trace_power_frequency(..);
also in the
SYSTEM_IO_CAPABLE:
branch in do_drv_write()?
Thomas
Thomas
diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
index 1b1920f..4803883 100644
--- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
+++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
@@ -174,11 +174,13 @@ static void do_drv_write(void *_cmd)
switch (cmd->type) {
case SYSTEM_INTEL_MSR_CAPABLE:
+ trace_power_frequency(POWER_PSTATE, cmd->val);
rdmsr(cmd->addr.msr.reg, lo, hi);
lo = (lo & ~INTEL_MSR_RANGE) | (cmd->val & INTEL_MSR_RANGE);
wrmsr(cmd->addr.msr.reg, lo, hi);
break;
case SYSTEM_IO_CAPABLE:
+ trace_power_frequency(POWER_PSTATE, cmd->val);
acpi_os_write_port((acpi_io_address)cmd->addr.io.port,
cmd->val,
(u32)cmd->addr.io.bit_width);
@@ -363,7 +365,6 @@ static int acpi_cpufreq_target(struct cpufreq_policy *policy,
}
}
- trace_power_frequency(POWER_PSTATE, data->freq_table[next_state].frequency);
switch (data->cpu_feature) {
case SYSTEM_INTEL_MSR_CAPABLE:
> Am Montag, den 15.03.2010, 11:51 +0100 schrieb Thomas Renninger:
> > On Friday 12 March 2010 16:41:46 Robert Sch??ne wrote:
> > > Am Freitag, den 12.03.2010, 06:52 -0800 schrieb Arjan van de Ven:
Please send a changelogged version with everyone Cc:-ed once the dust settles
and the acks are in.
Thanks,
Ingo
But something else...:
What exactly is the power tracer good for and what is it
capable of which cpufreq_stats is not capable to do?
Beside the fact that it is an ugly macro you cannot grep for,
acpi-cpufreq really seem to be the only place it gets used in
the whole kernel:
grep trace_power_frequency * -rl
arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
Robert: If you want to get proper cpufreq tracing/statistics,
compile with:
CONFIG_CPU_FREQ_STAT=y
and do:
modprobe cpufreq_stats
cat /sys/devices/system/cpu/cpu*/cpufreq/stats/*
Below patch fixes the problem.
This time submitted on the right mailing list,
it looks like the trace_power_frequency stuff never hit
the cpufreq list, even the maintainer wasn't CC'ed on
any trace_power_frequency submission.
For the trace people: To do it right, you have to hook
your trace function into cpufreq_stats. You also have
to pass the cpu on which the frequency change happened.
---
cpufreq: Remove broken trace_power_frequency
cpufreq_stats is used for frequency statistics and supports *all*
frequency switching drivers/HW.
The trace_power_frequency interface:
- only supports one cpufreq driver (acpi-cpufreq)
- has no additional capabilities compared to cpufreq_stats
- is broken and traces wrong CPUs on frequency switches
(cmp. with mail thread:
trace power_frequency events on the correct cpu
on the cpu...@vger.kernel.org list)
Signed-off-by: Thomas Renninger <tr...@suse.de>
diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
index 1b1920f..1808284 100644
--- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
+++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
@@ -33,7 +33,6 @@
#include <linux/cpufreq.h>
#include <linux/compiler.h>
#include <linux/dmi.h>
-#include <trace/events/power.h>
#include <linux/acpi.h>
#include <linux/io.h>
@@ -363,8 +362,6 @@ static int acpi_cpufreq_target(struct cpufreq_policy *policy,
}
}
- trace_power_frequency(POWER_PSTATE, data->freq_table[next_state].frequency);
-
switch (data->cpu_feature) {
case SYSTEM_INTEL_MSR_CAPABLE:
cmd.type = SYSTEM_INTEL_MSR_CAPABLE;
diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index c4efe9b..82b2b99 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -42,13 +42,6 @@ DEFINE_EVENT(power, power_start,
TP_ARGS(type, state)
);
-DEFINE_EVENT(power, power_frequency,
-
- TP_PROTO(unsigned int type, unsigned int state),
-
- TP_ARGS(type, state)
-);
-
TRACE_EVENT(power_end,
TP_PROTO(int dummy),
diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
index 9f4f565..705d926 100644
--- a/kernel/trace/power-traces.c
+++ b/kernel/trace/power-traces.c
@@ -13,6 +13,3 @@
#define CREATE_TRACE_POINTS
#include <trace/events/power.h>
-
-EXPORT_TRACEPOINT_SYMBOL_GPL(power_frequency);
-
look at timechart for example.....
it's extremely useful to have this for us that do power tuning...
cpufreq_stats is nice but not nearly good enough since you only get averages,
not time behavior.
> cpufreq_stats is nice but not nearly good enough since you only get
> averages, not time behavior.
As said, try to hook it into cpufreq_stats.
Mark the cpufreq_stats sysfs interface deprecated, etc.
I like the idea of having one central trace utility, which probably
makes it easier for people to find such things.
But please do it properly and CC the cpufreq list in the future.
Still, as this is totally broken:
- by design -> only one of a dozen cpufreq drivers is supported
- by implementation -> wrong CPUs are tracked
please submit my patch and remove this again until a proper
implementation is provided.
Thanks,
Thomas
the one I care about is supported. The others... anyone can add support for their favorite driver.
> please submit my patch and remove this again until a proper
> implementation is provided.
how about not. it works right now, and is in active use right now.
feel free to add support to other drivers if you care about those....
> The others... anyone can add
> support for their favorite driver.
Or everyone can write his own debug facility for every driver...
As said, there already is a debug facility. If you need timings,
improve the existing interface, suggest how it could get connected
to the tracing facility or why you see problems doing so.
> > please submit my patch and remove this again until a proper
> > implementation is provided.
> how about not. it works right now,
It does not. Robert's fix (if he had taken the frequency) is correct.
That means you could get totally wrong values
when you put some load on the machine and the scheduler moves around
processes. You'd better compared your results with cpufreq_stats...
> and is in active use right now.
It's obviously not powertop, there I get some statistics on an AMD
machine as well. So where does it get used?
> feel free to add support to other drivers if you care about those....
It's easy to fix up acpi-cpufreq itself and implement this into
powernow-k8. Not sure about possible cpufreq drivers which can switch
frequency on cores without executing the code on this core. I expect
trace(_power_frequency) is not made for such?
I assume everybody who did work on cpufreq interfaces would have given
you a not-ack'ed or at least had asked the questions I did above.
People know you well and x86 maintainers just pushed it, I also doubt
you intentionally did not CC the cpufreq list.
But now things are broken and should get reverted.
Grmpfl, I can have a look at it, but not the next couple of days...
Thomas
why don't you provide the others then?
>> how about not. it works right now,
> It does not. Robert's fix (if he had taken the frequency) is correct.
> That means you could get totally wrong values
> when you put some load on the machine and the scheduler moves around
> processes.
the only case where the bug can hit is the userspace governor, yes.
it works correct for everything else. Yes it wants fixing. No it does not make
all data from it worthless.
>> and is in active use right now.
> It's obviously not powertop, there I get some statistics on an AMD
> machine as well. So where does it get used?
as I wrote in my previous mails... timechart.
Thomas
Otherwise the following might happen (4 cpu system):
cpu 3 wants to change freq of cpu 0:
...
switch to cpu 0 (some overhead)
write new frequency
switch back (some overhead)
switch to cpu 0 (some overhead)
write trace
switch back (some overhead)
...
It's easy to see, that writing of the frequency and writing the trace
should be called directly one after another. It increases the accuracy
of the trace and decreases the overhead.
Robert
Thomas
> It also seem to be (hopefully) a minor feature for timechart, so this should
> not hurt that much (yet).
It's actually a major feature for timechart, and one of the key things I and a bunch of others
inside Intel use timechart for.
--
Robert Schoene
Technische Universitaet Dresden
Zentrum fuer Informationsdienste und Hochleistungsrechnen
01062 Dresden
Tel.: (0351) 463-42483, Fax: (0351) 463-37773
E-Mail: Robert....@tu-dresden.de
--
post change would work... that gets frequency afaik..
no
hooking into the post frequency change callback that gets done..
which is guaranteed to be on the right cpu afaics.
But what assures that this is executed on the correct cpu, which seem
to be necessary with the current trace function?
Thomas
b) adding an item to the cpufreq_transition_notifier_list
c) adding it to cpufreq_stats.c/cpufreq_stat_notifier_trans
which would imply the usage of smp_call_function_single(...)
The next problem where current implementation is unfixable broken with
the tracer just passing the frequency is the fact that several CPUs
could get switched with one MSR write to a depending CPU (SW_ANY).
The same btw applies to C-states for which the tracer is used in
the same way (compare with 8.4.2.2 _CSD (C-State Dependency) of a
current ACPI spec).
No idea what the impact on userspace tools is, if I find some time
I can have a look at timechart how trace data gets read/used.
But I fear the Cstate tracing is used in some more tools already?
It would be great to get feedback/suggestions from people making use
of it already.
Below is still broken, but should make things at least a bit better:
---
X86 cpufreq: Fix powertracer in acpi-cpufreq and exec it on the correct cpu(s)
Several things are broken with the tracer currently.
This patch fixes:
- With the userspace governor the wrong cpu could get tracked if the target
function is executed on a CPU which does not get switched
- In SW_ALL CPU dependency case (CPUFREQ_SHARED_TYPE_ALL) only one CPU got
tracked. Now all CPUs that depend on each other are tracked.
What this patch does not fix:
- In SW_ANY case (CPUFREQ_SHARED_TYPE_ANY) it's enough to write to a MSR of
one of the depending CPUs. The power trace macro misses the ability
to pass the cpu. Thus only one of the depending CPUs gets tracked correctly.
To be able to fix this the power trace macro must get extended.
Signed-off-by: Thomas Renninger <tr...@suse.de>
CC: Robert Schöne <robert....@tu-dresden.de>
CC: x...@kernel.org
CC: cpufreq <cpu...@vger.kernel.org>
CC: Arjan van de Ven <ar...@linux.intel.com>
CC: sta...@kernel.org
---
arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
index 1b1920f..259c49e 100644
--- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
+++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
@@ -144,6 +144,7 @@ struct drv_cmd {
struct io_addr io;
} addr;
u32 val;
+ unsigned int frequency;
};
/* Called via smp_call_function_single(), on the target CPU */
@@ -177,11 +178,13 @@ static void do_drv_write(void *_cmd)
rdmsr(cmd->addr.msr.reg, lo, hi);
lo = (lo & ~INTEL_MSR_RANGE) | (cmd->val & INTEL_MSR_RANGE);
wrmsr(cmd->addr.msr.reg, lo, hi);
+ trace_power_frequency(POWER_PSTATE, cmd->frequency);
break;
case SYSTEM_IO_CAPABLE:
acpi_os_write_port((acpi_io_address)cmd->addr.io.port,
cmd->val,
(u32)cmd->addr.io.bit_width);
+ trace_power_frequency(POWER_PSTATE, cmd->frequency);
break;
default:
break;
@@ -363,7 +366,7 @@ static int acpi_cpufreq_target(struct cpufreq_policy *policy,
}
}
- trace_power_frequency(POWER_PSTATE, data->freq_table[next_state].frequency);
+ cmd.frequency = data->freq_table[next_state].frequency;
switch (data->cpu_feature) {
case SYSTEM_INTEL_MSR_CAPABLE:
--
Robert Schoene
Technische Universitaet Dresden
Zentrum fuer Informationsdienste und Hochleistungsrechnen
01062 Dresden
Tel.: (0351) 463-42483, Fax: (0351) 463-37773
E-Mail: Robert....@tu-dresden.de
--