
[tip:tracing/tasks] tracing: fix command line to pid reverse map


Carsten Emde

Mar 18, 2009, 5:30:10 AM
Commit-ID: a635cf0497342978d417cae19d4a4823932977ff
Gitweb: http://git.kernel.org/tip/a635cf0497342978d417cae19d4a4823932977ff
Author: Carsten Emde <Carste...@osadl.org>
AuthorDate: Wed, 18 Mar 2009 09:00:41 +0100
Commit: Ingo Molnar <mi...@elte.hu>
CommitDate: Wed, 18 Mar 2009 10:10:18 +0100

tracing: fix command line to pid reverse map

Impact: fix command line to pid mapping

map_cmdline_to_pid[] is checked in trace_save_cmdline(), but never
updated. This results in stale pid to command line mappings and the
tracer output will associate the wrong comm string.
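The fix keeps the forward and reverse maps consistent. A minimal user-space model of the cache (array sizes and helper names are illustrative, not the kernel's) shows why the reverse map must be written whenever a slot gets a new owner:

```c
#include <string.h>

/* Simplified user-space model of the cmdline cache; sizes are
 * illustrative, not the kernel's. */
#define SAVED_CMDLINES	4
#define MAX_PID		64
#define NO_CMDLINE_MAP	((unsigned)-1)

static unsigned map_pid_to_cmdline[MAX_PID];
static unsigned map_cmdline_to_pid[SAVED_CMDLINES];
static char saved_cmdlines[SAVED_CMDLINES][16];
static unsigned cmdline_idx;

static void init_cmdlines(void)
{
	memset(map_pid_to_cmdline, -1, sizeof(map_pid_to_cmdline));
	memset(map_cmdline_to_pid, -1, sizeof(map_cmdline_to_pid));
	cmdline_idx = 0;
}

static void save_cmdline(unsigned pid, const char *comm)
{
	unsigned idx = map_pid_to_cmdline[pid];

	if (idx == NO_CMDLINE_MAP) {
		unsigned old_pid;

		idx = (cmdline_idx + 1) % SAVED_CMDLINES;

		/* Evict the pid that owned this slot, if any */
		old_pid = map_cmdline_to_pid[idx];
		if (old_pid != NO_CMDLINE_MAP)
			map_pid_to_cmdline[old_pid] = NO_CMDLINE_MAP;

		/* The line the patch adds: record the new owner, so the
		 * next eviction of this slot clears the right pid */
		map_cmdline_to_pid[idx] = pid;

		map_pid_to_cmdline[pid] = idx;
		cmdline_idx = idx;
	}
	strncpy(saved_cmdlines[idx], comm, sizeof(saved_cmdlines[idx]) - 1);
}
```

Without the `map_cmdline_to_pid[idx] = pid;` update, a later eviction of the slot would clear the mapping of whatever pid the reverse map last recorded, leaving the real owner pointing at someone else's comm string.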

Signed-off-by: Carsten Emde <Carste...@osadl.org>
Signed-off-by: Thomas Gleixner <tg...@linutronix.de>
Cc: Steven Rostedt <sros...@redhat.com>
Cc: Frederic Weisbecker <fwei...@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
kernel/trace/trace.c | 16 +++++++++++-----
1 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 06c69a2..305c562 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -738,8 +738,7 @@ void trace_stop_cmdline_recording(void);

static void trace_save_cmdline(struct task_struct *tsk)
{
- unsigned map;
- unsigned idx;
+ unsigned pid, idx;

if (!tsk->pid || unlikely(tsk->pid > PID_MAX_DEFAULT))
return;
@@ -757,10 +756,17 @@ static void trace_save_cmdline(struct task_struct *tsk)
if (idx == NO_CMDLINE_MAP) {
idx = (cmdline_idx + 1) % SAVED_CMDLINES;

- map = map_cmdline_to_pid[idx];
- if (map != NO_CMDLINE_MAP)
- map_pid_to_cmdline[map] = NO_CMDLINE_MAP;
+ /*
+ * Check whether the cmdline buffer at idx has a pid
+ * mapped. We are going to overwrite that entry so we
+ * need to clear the map_pid_to_cmdline. Otherwise we
+ * would read the new comm for the old pid.
+ */
+ pid = map_cmdline_to_pid[idx];
+ if (pid != NO_CMDLINE_MAP)
+ map_pid_to_cmdline[pid] = NO_CMDLINE_MAP;

+ map_cmdline_to_pid[idx] = tsk->pid;
map_pid_to_cmdline[tsk->pid] = idx;

cmdline_idx = idx;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Thomas Gleixner

Mar 18, 2009, 5:30:15 AM
Commit-ID: 18aecd362a1c991fbf5f7919ae051a77532ba2f8
Gitweb: http://git.kernel.org/tip/18aecd362a1c991fbf5f7919ae051a77532ba2f8
Author: Thomas Gleixner <tg...@linutronix.de>
AuthorDate: Wed, 18 Mar 2009 08:56:58 +0100
Commit: Ingo Molnar <mi...@elte.hu>
CommitDate: Wed, 18 Mar 2009 10:10:16 +0100

tracing: stop command line recording when tracing is disabled

Impact: prevent overwrite of command line entries

When the tracer is stopped the command line recording continues to
record. The check for tracing_is_on() is not sufficient here as the
ringbuffer status is not affected by setting
debug/tracing/tracing_enabled to 0. On a non idle system this can
result in the loss of the command line information for the stopped
trace, which makes the trace harder to read and analyse.

Check tracer_enabled to allow further recording.
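The gate can be modelled in user space; the plain int flags below are stand-ins for the kernel's trace_record_cmdline_disabled, tracer_enabled and tracing_is_on() state, not the real variables:

```c
/* User-space model of the recording gate; the three flags stand in for
 * trace_record_cmdline_disabled, tracer_enabled and tracing_is_on(). */
static int record_cmdline_disabled;
static int tracer_enabled = 1;
static int ring_buffer_on = 1;
static int cmdlines_recorded;

static void record_cmdline(void)
{
	/* After the patch, a stopped tracer also blocks recording */
	if (record_cmdline_disabled || !tracer_enabled || !ring_buffer_on)
		return;
	cmdlines_recorded++;
}
```

The point of the extra condition: the ring buffer stays "on" when tracing_enabled is set to 0, so only the tracer_enabled check prevents a busy system from overwriting the saved command lines of the stopped trace.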

Signed-off-by: Thomas Gleixner <tg...@linutronix.de>
Cc: Steven Rostedt <sros...@redhat.com>
Cc: Frederic Weisbecker <fwei...@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
kernel/trace/trace.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 1ce6208..7b6043e 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -797,7 +797,8 @@ void trace_find_cmdline(int pid, char comm[])

void tracing_record_cmdline(struct task_struct *tsk)
{
- if (atomic_read(&trace_record_cmdline_disabled) || !tracing_is_on())
+ if (atomic_read(&trace_record_cmdline_disabled) || !tracer_enabled ||
+ !tracing_is_on())
return;

trace_save_cmdline(tsk);

Thomas Gleixner

Mar 18, 2009, 5:30:16 AM
Commit-ID: 2c7eea4c62ba090b7f4583c3d7337ea0019be900
Gitweb: http://git.kernel.org/tip/2c7eea4c62ba090b7f4583c3d7337ea0019be900
Author: Thomas Gleixner <tg...@linutronix.de>
AuthorDate: Wed, 18 Mar 2009 09:03:19 +0100
Commit: Ingo Molnar <mi...@elte.hu>
CommitDate: Wed, 18 Mar 2009 10:10:17 +0100

tracing: replace the crude (unsigned) -1 hackery

Impact: cleanup

The command line recorder uses (unsigned) -1 to mark non mapped
entries in the pid to command line maps. The validity check is
completely unintuitive: idx >= SAVED_CMDLINES

There is no need for such casting games. Use a constant to mark
unmapped entries and check for that constant to make the code readable
and understandable.
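The constant works with the existing memset()-based initialization because filling every byte of an unsigned int with 0xff yields UINT_MAX. A quick check (array size illustrative):

```c
#include <limits.h>
#include <string.h>

#define NO_CMDLINE_MAP UINT_MAX

/* memset(-1) writes 0xff into every byte, so each unsigned element
 * reads back as UINT_MAX, i.e. NO_CMDLINE_MAP. */
static int all_entries_unmapped(void)
{
	unsigned map[8];
	size_t i;

	memset(map, -1, sizeof(map));

	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++)
		if (map[i] != NO_CMDLINE_MAP)
			return 0;
	return 1;
}
```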

Signed-off-by: Thomas Gleixner <tg...@linutronix.de>
Cc: Steven Rostedt <sros...@redhat.com>
Cc: Frederic Weisbecker <fwei...@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
kernel/trace/trace.c | 13 +++++++------
1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 7b6043e..ca673c4 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -633,6 +633,7 @@ void tracing_reset_online_cpus(struct trace_array *tr)
}

#define SAVED_CMDLINES 128
+#define NO_CMDLINE_MAP UINT_MAX
static unsigned map_pid_to_cmdline[PID_MAX_DEFAULT+1];
static unsigned map_cmdline_to_pid[SAVED_CMDLINES];
static char saved_cmdlines[SAVED_CMDLINES][TASK_COMM_LEN];
@@ -644,8 +645,8 @@ static atomic_t trace_record_cmdline_disabled __read_mostly;

static void trace_init_cmdlines(void)
{
- memset(&map_pid_to_cmdline, -1, sizeof(map_pid_to_cmdline));
- memset(&map_cmdline_to_pid, -1, sizeof(map_cmdline_to_pid));
+ memset(&map_pid_to_cmdline, NO_CMDLINE_MAP, sizeof(map_pid_to_cmdline));
+ memset(&map_cmdline_to_pid, NO_CMDLINE_MAP, sizeof(map_cmdline_to_pid));
cmdline_idx = 0;
}

@@ -753,12 +754,12 @@ static void trace_save_cmdline(struct task_struct *tsk)
return;

idx = map_pid_to_cmdline[tsk->pid];
- if (idx >= SAVED_CMDLINES) {
+ if (idx == NO_CMDLINE_MAP) {
 idx = (cmdline_idx + 1) % SAVED_CMDLINES;

 map = map_cmdline_to_pid[idx];
- if (map <= PID_MAX_DEFAULT)
- map_pid_to_cmdline[map] = (unsigned)-1;
+ if (map != NO_CMDLINE_MAP)
+ map_pid_to_cmdline[map] = NO_CMDLINE_MAP;

 map_pid_to_cmdline[tsk->pid] = idx;

@@ -786,7 +787,7 @@ void trace_find_cmdline(int pid, char comm[])

__raw_spin_lock(&trace_cmdline_lock);
map = map_pid_to_cmdline[pid];
- if (map >= SAVED_CMDLINES)
+ if (map == NO_CMDLINE_MAP)
goto out;

strcpy(comm, saved_cmdlines[map]);

Thomas Gleixner

Mar 18, 2009, 5:30:15 AM
Commit-ID: 50d88758a3f9787cbdbdbc030560b815721eab4b
Gitweb: http://git.kernel.org/tip/50d88758a3f9787cbdbdbc030560b815721eab4b
Author: Thomas Gleixner <tg...@linutronix.de>
AuthorDate: Wed, 18 Mar 2009 08:58:44 +0100

Commit: Ingo Molnar <mi...@elte.hu>
CommitDate: Wed, 18 Mar 2009 10:10:17 +0100

tracing: fix trace_find_cmdline()

Impact: prevent stale command line output

In case there is no valid command line mapping for a pid
trace_find_cmdline() returns without updating the comm buffer. The
trace dump keeps the previous entry which results in confusing trace
output:

<idle>-0 [000] 280.702056 ....
<idle>-23456 [000] 280.702080 ....

Update the comm buffer with "<...>" when no mapping is found.
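A user-space sketch of the fixed lookup path (array sizes are illustrative and the surrounding raw-spinlock is omitted):

```c
#include <string.h>

#define NO_CMDLINE_MAP	((unsigned)-1)
#define MAX_PID		64
#define TASK_COMM_LEN	16

static unsigned map_pid_to_cmdline[MAX_PID];
static char saved_cmdlines[4][TASK_COMM_LEN];

/* After the fix, comm is always written: either with the saved name or
 * with the "<...>" placeholder, never left holding stale contents. */
static void find_cmdline(int pid, char comm[])
{
	unsigned map = map_pid_to_cmdline[pid];

	if (map != NO_CMDLINE_MAP)
		strcpy(comm, saved_cmdlines[map]);
	else
		strcpy(comm, "<...>");
}
```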

Signed-off-by: Thomas Gleixner <tg...@linutronix.de>
Cc: Steven Rostedt <sros...@redhat.com>
Cc: Frederic Weisbecker <fwei...@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
kernel/trace/trace.c | 9 ++++-----
1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index ca673c4..06c69a2 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -787,12 +787,11 @@ void trace_find_cmdline(int pid, char comm[])

 __raw_spin_lock(&trace_cmdline_lock);
 map = map_pid_to_cmdline[pid];
- if (map == NO_CMDLINE_MAP)
- goto out;
-
- strcpy(comm, saved_cmdlines[map]);
+ if (map != NO_CMDLINE_MAP)
+ strcpy(comm, saved_cmdlines[map]);
+ else
+ strcpy(comm, "<...>");

- out:
 __raw_spin_unlock(&trace_cmdline_lock);

Ingo Molnar

Mar 29, 2009, 6:30:15 PM
Commit-ID: 56aea8468746e673a4bf50b6a13d97b2d1cbe1e8
Gitweb: http://git.kernel.org/tip/56aea8468746e673a4bf50b6a13d97b2d1cbe1e8
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Sun, 29 Mar 2009 23:47:48 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Mon, 30 Mar 2009 00:10:22 +0200

x86/mm: further cleanups of fault.c's include file section

Impact: cleanup

Eliminate more than 20 unnecessary #include lines in fault.c

Also fix include file dependency bug in asm/traps.h. (this was
masked before, by implicit inclusion)

Cc: Linus Torvalds <torv...@linux-foundation.org>
Signed-off-by: Ingo Molnar <mi...@elte.hu>
LKML-Reference: <new-submission>
Acked-by: H. Peter Anvin <h...@linux.intel.com>


---
arch/x86/include/asm/traps.h | 1 +
arch/x86/mm/fault.c | 41 ++++++++---------------------------------
2 files changed, 9 insertions(+), 33 deletions(-)

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 0d53425..37fb07a 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -2,6 +2,7 @@
#define _ASM_X86_TRAPS_H

#include <asm/debugreg.h>
+#include <asm/siginfo.h> /* TRAP_TRACE, ... */

#ifdef CONFIG_X86_32
#define dotraplinkage
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index a03b727..f3c4d03 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -3,40 +3,15 @@
* Copyright (C) 2001, 2002 Andi Kleen, SuSE Labs.
* Copyright (C) 2008-2009, Red Hat Inc., Ingo Molnar
*/
-#include <linux/interrupt.h>
-#include <linux/mmiotrace.h>
-#include <linux/bootmem.h>
-#include <linux/compiler.h>
-#include <linux/highmem.h>
-#include <linux/kprobes.h>
-#include <linux/uaccess.h>
-#include <linux/vmalloc.h>
-#include <linux/vt_kern.h>
-#include <linux/signal.h>
-#include <linux/kernel.h>
-#include <linux/ptrace.h>
-#include <linux/string.h>
-#include <linux/module.h>
-#include <linux/kdebug.h>
-#include <linux/errno.h>
-#include <linux/magic.h>
-#include <linux/sched.h>
-#include <linux/types.h>
-#include <linux/init.h>
-#include <linux/mman.h>
-#include <linux/tty.h>
-#include <linux/smp.h>
-#include <linux/mm.h>
-
-#include <asm-generic/sections.h>
-
-#include <asm/tlbflush.h>
-#include <asm/pgalloc.h>
-#include <asm/segment.h>
-#include <asm/system.h>
-#include <asm/proto.h>
-#include <asm/traps.h>
-#include <asm/desc.h>
+#include <linux/magic.h> /* STACK_END_MAGIC */
+#include <linux/kdebug.h> /* oops_begin/end, ... */
+#include <linux/module.h> /* search_exception_table */
+#include <linux/bootmem.h> /* max_low_pfn */
+#include <linux/kprobes.h> /* __kprobes, ... */
+#include <linux/mmiotrace.h> /* kmmio_handler, ... */
+
+#include <asm/traps.h> /* dotraplinkage, ... */
+#include <asm/pgalloc.h> /* pgd_*(), ... */

/*
* Page fault error code bits:

Ingo Molnar

Mar 30, 2009, 8:10:13 AM
Commit-ID: a2bcd4731f77cb77ae4b5e4a3d7f5471cf346c33
Gitweb: http://git.kernel.org/tip/a2bcd4731f77cb77ae4b5e4a3d7f5471cf346c33

Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Sun, 29 Mar 2009 23:47:48 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Mon, 30 Mar 2009 14:02:02 +0200

x86/mm: further cleanups of fault.c's include file section

Impact: cleanup

Eliminate more than 20 unnecessary #include lines in fault.c

Also fix include file dependency bug in asm/traps.h. (this was
masked before, by implicit inclusion)

Signed-off-by: Ingo Molnar <mi...@elte.hu>
LKML-Reference: <tip-56aea8468746e673a4...@git.kernel.org>


Acked-by: H. Peter Anvin <h...@linux.intel.com>


---
arch/x86/include/asm/traps.h | 1 +
arch/x86/mm/fault.c | 42 +++++++++---------------------------------
2 files changed, 10 insertions(+), 33 deletions(-)

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index 0d53425..37fb07a 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -2,6 +2,7 @@
#define _ASM_X86_TRAPS_H

#include <asm/debugreg.h>
+#include <asm/siginfo.h> /* TRAP_TRACE, ... */

#ifdef CONFIG_X86_32
#define dotraplinkage
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c

index a03b727..24a36a6 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -3,40 +3,16 @@

+#include <linux/sched.h> /* test_thread_flag(), ... */

Mike Galbraith

Apr 1, 2009, 6:30:14 AM
Commit-ID: f0e36dc28173b65df2216dfae7109645d97a1bd9
Gitweb: http://git.kernel.org/tip/f0e36dc28173b65df2216dfae7109645d97a1bd9
Author: Mike Galbraith <efa...@gmx.de>
AuthorDate: Fri, 27 Mar 2009 12:13:43 +0100
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Wed, 1 Apr 2009 12:08:51 +0200

perf_counter tools: kerneltop: add real-time data acquisition thread

Decouple kerneltop display from event acquisition by introducing
a separate data acquisition thread. This fixes annoying kerneltop
display refresh jitter and missed events.

Also add a -r <prio> option, to switch the data acquisition thread
to real-time priority.
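The decoupling pattern can be sketched in plain pthreads (function names here are illustrative stand-ins; kerneltop's real display loop does more work): a display thread wakes on its own period while the calling acquisition thread only drains events, and a nonzero priority switches the acquisition side to SCHED_FIFO.

```c
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/* Illustrative sketch of the decoupling: a separate thread refreshes the
 * display on its own period, so event acquisition never blocks on it. */
static void *display_thread(void *arg)
{
	int *delay_secs = arg;

	while (!sleep(*delay_secs))
		printf("refresh\n");	/* stand-in for print_sym_table() */
	return NULL;
}

/* Returns 0 on success; with rt_prio > 0 the calling (acquisition)
 * thread is switched to SCHED_FIFO, which normally needs privileges. */
static int start_display(int delay_secs, int rt_prio)
{
	static int secs;
	pthread_t t;

	secs = delay_secs;
	if (pthread_create(&t, NULL, display_thread, &secs))
		return -1;

	if (rt_prio) {
		struct sched_param param = { .sched_priority = rt_prio };

		if (sched_setscheduler(0, SCHED_FIFO, &param))
			return -1;
	}
	return 0;
}
```

Keeping the display on an ordinary priority while only the acquisition thread runs SCHED_FIFO matches the patch's intent: missed events are a correctness problem, display jitter is merely cosmetic.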

Signed-off-by: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Peter Zijlstra <a.p.zi...@chello.nl>


LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/kerneltop.c | 57 ++++++++++++++++++++++----------
1 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/Documentation/perf_counter/kerneltop.c b/Documentation/perf_counter/kerneltop.c
index 430810d..33b4fcf 100644
--- a/Documentation/perf_counter/kerneltop.c
+++ b/Documentation/perf_counter/kerneltop.c
@@ -77,6 +77,8 @@
#include <errno.h>
#include <ctype.h>
#include <time.h>
+#include <sched.h>
+#include <pthread.h>

#include <sys/syscall.h>
#include <sys/ioctl.h>
@@ -181,6 +183,7 @@ static int tid = -1;
static int profile_cpu = -1;
static int nr_cpus = 0;
static int nmi = 1;
+static unsigned int realtime_prio = 0;
static int group = 0;
static unsigned int page_size;
static unsigned int mmap_pages = 16;
@@ -334,6 +337,7 @@ static void display_help(void)
" -l # show scale factor for RR events\n"
" -d delay --delay=<seconds> # sampling/display delay [default: 2]\n"
" -f CNT --filter=CNT # min-event-count filter [default: 100]\n\n"
+ " -r prio --realtime=<prio> # event acquisition runs with SCHED_FIFO policy\n"
" -s symbol --symbol=<symbol> # function to be showed annotated one-shot\n"
" -x path --vmlinux=<path> # the vmlinux binary, required for -s use\n"
" -z --zero # zero counts after display\n"
@@ -620,7 +624,6 @@ static int compare(const void *__sym1, const void *__sym2)
return sym_weight(sym1) < sym_weight(sym2);
}

-static time_t last_refresh;
static long events;
static long userspace_events;
static const char CONSOLE_CLEAR[] = " [H [2J";
@@ -634,6 +637,7 @@ static void print_sym_table(void)
float events_per_sec = events/delay_secs;
float kevents_per_sec = (events-userspace_events)/delay_secs;

+ events = userspace_events = 0;
memcpy(tmp, sym_table, sizeof(sym_table[0])*sym_table_count);
qsort(tmp, sym_table_count, sizeof(tmp[0]), compare);

@@ -714,8 +718,6 @@ static void print_sym_table(void)
if (sym_filter_entry)
show_details(sym_filter_entry);

- last_refresh = time(NULL);
-
{
struct pollfd stdin_poll = { .fd = 0, .events = POLLIN };

@@ -726,6 +728,16 @@ static void print_sym_table(void)
}
}

+static void *display_thread(void *arg)
+{
+ printf("KernelTop refresh period: %d seconds\n", delay_secs);
+
+ while (!sleep(delay_secs))
+ print_sym_table();
+
+ return NULL;
+}
+
static int read_symbol(FILE *in, struct sym_entry *s)
{
static int filter_match = 0;
@@ -1081,19 +1093,20 @@ static void process_options(int argc, char *argv[])
{"filter", required_argument, NULL, 'f'},
{"group", required_argument, NULL, 'g'},
{"help", no_argument, NULL, 'h'},
- {"scale", no_argument, NULL, 'l'},
{"nmi", required_argument, NULL, 'n'},
+ {"mmap_info", no_argument, NULL, 'M'},
+ {"mmap_pages", required_argument, NULL, 'm'},
+ {"munmap_info", no_argument, NULL, 'U'},
{"pid", required_argument, NULL, 'p'},
- {"vmlinux", required_argument, NULL, 'x'},
+ {"realtime", required_argument, NULL, 'r'},
+ {"scale", no_argument, NULL, 'l'},
{"symbol", required_argument, NULL, 's'},
{"stat", no_argument, NULL, 'S'},
+ {"vmlinux", required_argument, NULL, 'x'},
{"zero", no_argument, NULL, 'z'},
- {"mmap_pages", required_argument, NULL, 'm'},
- {"mmap_info", no_argument, NULL, 'M'},
- {"munmap_info", no_argument, NULL, 'U'},
{NULL, 0, NULL, 0 }
};
- int c = getopt_long(argc, argv, "+:ac:C:d:De:f:g:hln:m:p:s:Sx:zMU",
+ int c = getopt_long(argc, argv, "+:ac:C:d:De:f:g:hln:m:p:r:s:Sx:zMU",
long_options, &option_index);
if (c == -1)
break;
@@ -1127,6 +1140,7 @@ static void process_options(int argc, char *argv[])
profile_cpu = -1;
}
tid = atoi(optarg); break;
+ case 'r': realtime_prio = atoi(optarg); break;
case 's': sym_filter = strdup(optarg); break;
case 'S': run_perfstat = 1; break;
case 'x': vmlinux = strdup(optarg); break;
@@ -1289,6 +1303,7 @@ int main(int argc, char *argv[])
struct pollfd event_array[MAX_NR_CPUS * MAX_COUNTERS];
struct mmap_data mmap_array[MAX_NR_CPUS][MAX_COUNTERS];
struct perf_counter_hw_event hw_event;
+ pthread_t thread;
int i, counter, group_fd, nr_poll = 0;
unsigned int cpu;
int ret;
@@ -1363,8 +1378,20 @@ int main(int argc, char *argv[])
}
}

- printf("KernelTop refresh period: %d seconds\n", delay_secs);
- last_refresh = time(NULL);
+ if (pthread_create(&thread, NULL, display_thread, NULL)) {
+ printf("Could not create display thread.\n");
+ exit(-1);
+ }
+
+ if (realtime_prio) {
+ struct sched_param param;
+
+ param.sched_priority = realtime_prio;
+ if (sched_setscheduler(0, SCHED_FIFO, &param)) {
+ printf("Could not set realtime priority.\n");
+ exit(-1);
+ }
+ }

while (1) {
int hits = events;
@@ -1374,14 +1401,8 @@ int main(int argc, char *argv[])
mmap_read(&mmap_array[i][counter]);
}

- if (time(NULL) >= last_refresh + delay_secs) {
- print_sym_table();
- events = userspace_events = 0;
- }
-
if (hits == events)
- ret = poll(event_array, nr_poll, 1000);
- hits = events;
+ ret = poll(event_array, nr_poll, 100);
}

return 0;

tip-bot for Ingo Molnar

May 4, 2009, 1:40:10 PM
Commit-ID: 1dce8d99b85aba6eddb8b8260baea944922e6fe7
Gitweb: http://git.kernel.org/tip/1dce8d99b85aba6eddb8b8260baea944922e6fe7
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Mon, 4 May 2009 19:23:18 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Mon, 4 May 2009 19:30:42 +0200

perf_counter: convert perf_resource_mutex to a spinlock

Now percpu counters can be initialized very early. But the init
sequence uses mutex_lock(). Fortunately, perf_resource_mutex should
be a spinlock anyway, so convert it.

[ Impact: fix crash due to early init mutex use ]

LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
kernel/perf_counter.c | 16 ++++++++--------
1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
index fcdafa2..5f86a11 100644
--- a/kernel/perf_counter.c
+++ b/kernel/perf_counter.c
@@ -46,9 +46,9 @@ static atomic_t nr_comm_tracking __read_mostly;
int sysctl_perf_counter_priv __read_mostly; /* do we need to be privileged */

/*
- * Mutex for (sysadmin-configurable) counter reservations:
+ * Lock for (sysadmin-configurable) counter reservations:
*/
-static DEFINE_MUTEX(perf_resource_mutex);
+static DEFINE_SPINLOCK(perf_resource_lock);

/*
* Architecture provided APIs - weak aliases:
@@ -3207,9 +3207,9 @@ static void __cpuinit perf_counter_init_cpu(int cpu)
cpuctx = &per_cpu(perf_cpu_context, cpu);
__perf_counter_init_context(&cpuctx->ctx, NULL);

- mutex_lock(&perf_resource_mutex);
+ spin_lock(&perf_resource_lock);
cpuctx->max_pertask = perf_max_counters - perf_reserved_percpu;
- mutex_unlock(&perf_resource_mutex);
+ spin_unlock(&perf_resource_lock);

hw_perf_counter_setup(cpu);
}
@@ -3292,7 +3292,7 @@ perf_set_reserve_percpu(struct sysdev_class *class,
if (val > perf_max_counters)
return -EINVAL;

- mutex_lock(&perf_resource_mutex);
+ spin_lock(&perf_resource_lock);
perf_reserved_percpu = val;
for_each_online_cpu(cpu) {
cpuctx = &per_cpu(perf_cpu_context, cpu);
@@ -3302,7 +3302,7 @@ perf_set_reserve_percpu(struct sysdev_class *class,
cpuctx->max_pertask = mpt;
spin_unlock_irq(&cpuctx->ctx.lock);
}
- mutex_unlock(&perf_resource_mutex);
+ spin_unlock(&perf_resource_lock);

return count;
}
@@ -3324,9 +3324,9 @@ perf_set_overcommit(struct sysdev_class *class, const char *buf, size_t count)
if (val > 1)
return -EINVAL;

- mutex_lock(&perf_resource_mutex);
+ spin_lock(&perf_resource_lock);
perf_overcommit = val;
- mutex_unlock(&perf_resource_mutex);
+ spin_unlock(&perf_resource_lock);

return count;

tip-bot for Ingo Molnar

May 4, 2009, 1:40:09 PM
Commit-ID: 0d905bca23aca5c86a10ee101bcd3b1abbd40b25
Gitweb: http://git.kernel.org/tip/0d905bca23aca5c86a10ee101bcd3b1abbd40b25
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Mon, 4 May 2009 19:13:30 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Mon, 4 May 2009 19:30:32 +0200

perf_counter: initialize the per-cpu context earlier

percpu scheduling for perfcounters wants to take the context lock,
but that lock first needs to be initialized. Currently it is an
early_initcall() - but that is too late, the task tick runs much
sooner than that.

Call it explicitly from the scheduler init sequence instead.
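The ordering fix can be modelled abstractly: replace a deferred hook with an explicit call from the boot path so the state exists before the first tick. All names below are stand-ins, not the kernel functions:

```c
/* Toy model of the init-ordering bug: the tick needs state that used to
 * be set up by a late initcall; calling the init explicitly from the
 * boot path guarantees it runs first. All names are stand-ins. */
static int ctx_initialized;

static void perf_counter_init_model(void)
{
	ctx_initialized = 1;	/* was deferred via early_initcall() */
}

static int perf_counter_task_tick_model(void)
{
	return ctx_initialized ? 0 : -1;	/* -1 models the crash */
}

static int sched_init_model(void)
{
	perf_counter_init_model();	/* explicit call, before any tick */
	return perf_counter_task_tick_model();
}
```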

[ Impact: fix access-before-init crash ]

LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
include/linux/perf_counter.h | 5 ++++-
kernel/perf_counter.c | 5 +----
kernel/sched.c | 5 ++++-
3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/include/linux/perf_counter.h b/include/linux/perf_counter.h
index f776851..a356fa6 100644
--- a/include/linux/perf_counter.h
+++ b/include/linux/perf_counter.h
@@ -573,6 +573,8 @@ extern struct perf_callchain_entry *perf_callchain(struct pt_regs *regs);

extern int sysctl_perf_counter_priv;

+extern void perf_counter_init(void);
+
#else
static inline void
perf_counter_task_sched_in(struct task_struct *task, int cpu) { }
@@ -600,9 +602,10 @@ perf_counter_mmap(unsigned long addr, unsigned long len,

static inline void
perf_counter_munmap(unsigned long addr, unsigned long len,
- unsigned long pgoff, struct file *file) { }
+ unsigned long pgoff, struct file *file) { }

static inline void perf_counter_comm(struct task_struct *tsk) { }
+static inline void perf_counter_init(void) { }
#endif

#endif /* __KERNEL__ */
diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
index b9679c3..fcdafa2 100644
--- a/kernel/perf_counter.c
+++ b/kernel/perf_counter.c
@@ -3265,15 +3265,12 @@ static struct notifier_block __cpuinitdata perf_cpu_nb = {
.notifier_call = perf_cpu_notify,
};

-static int __init perf_counter_init(void)
+void __init perf_counter_init(void)
{
perf_cpu_notify(&perf_cpu_nb, (unsigned long)CPU_UP_PREPARE,
(void *)(long)smp_processor_id());
register_cpu_notifier(&perf_cpu_nb);
-
- return 0;
}
-early_initcall(perf_counter_init);

static ssize_t perf_show_reserve_percpu(struct sysdev_class *class, char *buf)
{
diff --git a/kernel/sched.c b/kernel/sched.c
index 2f600e3..a728976 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -39,6 +39,7 @@
#include <linux/completion.h>
#include <linux/kernel_stat.h>
#include <linux/debug_locks.h>
+#include <linux/perf_counter.h>
#include <linux/security.h>
#include <linux/notifier.h>
#include <linux/profile.h>
@@ -8996,7 +8997,7 @@ void __init sched_init(void)
* 1024) and two child groups A0 and A1 (of weight 1024 each),
* then A0's share of the cpu resource is:
*
- * A0's bandwidth = 1024 / (10*1024 + 1024 + 1024) = 8.33%
+ * A0's bandwidth = 1024 / (10*1024 + 1024 + 1024) = 8.33%
*
* We achieve this by letting init_task_group's tasks sit
* directly in rq->cfs (i.e init_task_group->se[] = NULL).
@@ -9097,6 +9098,8 @@ void __init sched_init(void)
alloc_bootmem_cpumask_var(&cpu_isolated_map);
#endif /* SMP */

+ perf_counter_init();
+
scheduler_running = 1;

tip-bot for Ingo Molnar

May 4, 2009, 1:40:14 PM
Commit-ID: b82914ce33146186d554b0f5c41e4e13693614ce
Gitweb: http://git.kernel.org/tip/b82914ce33146186d554b0f5c41e4e13693614ce
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Mon, 4 May 2009 18:54:32 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Mon, 4 May 2009 19:29:57 +0200

perf_counter: round-robin per-CPU counters too

This used to be unstable when we had the rq->lock dependencies,
but now that they are a thing of the past we can turn on percpu
counter RR too.

[ Impact: handle counter over-commit for per-CPU counters too ]

LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
kernel/perf_counter.c | 10 +++-------
1 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
index 8660ae5..b9679c3 100644
--- a/kernel/perf_counter.c
+++ b/kernel/perf_counter.c
@@ -1069,18 +1069,14 @@ void perf_counter_task_tick(struct task_struct *curr, int cpu)
{
struct perf_cpu_context *cpuctx = &per_cpu(perf_cpu_context, cpu);
struct perf_counter_context *ctx = &curr->perf_counter_ctx;
- const int rotate_percpu = 0;

- if (rotate_percpu)
- perf_counter_cpu_sched_out(cpuctx);
+ perf_counter_cpu_sched_out(cpuctx);
perf_counter_task_sched_out(curr, cpu);

- if (rotate_percpu)
- rotate_ctx(&cpuctx->ctx);
+ rotate_ctx(&cpuctx->ctx);
rotate_ctx(ctx);

- if (rotate_percpu)
- perf_counter_cpu_sched_in(cpuctx, cpu);
+ perf_counter_cpu_sched_in(cpuctx, cpu);
perf_counter_task_sched_in(curr, cpu);

Steven Rostedt

May 6, 2009, 10:30:21 AM

On Wed, 6 May 2009, tip-bot for Jaswinder Singh Rajput wrote:

> Commit-ID: 48dd0fed90e2b1f1ba87401439b85942181c6df3
> Gitweb: http://git.kernel.org/tip/48dd0fed90e2b1f1ba87401439b85942181c6df3
> Author: Jaswinder Singh Rajput <jasw...@kernel.org>
> AuthorDate: Wed, 6 May 2009 15:45:45 +0530
> Committer: Ingo Molnar <mi...@elte.hu>
> CommitDate: Wed, 6 May 2009 14:19:16 +0200
>
> tracing: trace_output.c, fix false positive compiler warning
>
> This compiler warning:
>
> CC kernel/trace/trace_output.o
> kernel/trace/trace_output.c: In function 'register_ftrace_event':
> kernel/trace/trace_output.c:544: warning: 'list' may be used uninitialized in this function
>
> Is wrong as 'list' is always initialized - but GCC (4.3.2) does not
> recognize this relationship properly.
>
> Work around the warning by initializing the variable to NULL.
>
> [ Impact: fix false positive compiler warning ]
>
> Signed-off-by: Jaswinder Singh Rajput <jaswind...@gmail.com>
> Acked-by: Steven Rostedt <ros...@goodmis.org>


> LKML-Reference: <new-submission>
> Signed-off-by: Ingo Molnar <mi...@elte.hu>
>
>
> ---

> kernel/trace/trace_output.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> index 5fc51f0..8bd9a2c 100644
> --- a/kernel/trace/trace_output.c
> +++ b/kernel/trace/trace_output.c
> @@ -541,7 +541,7 @@ int register_ftrace_event(struct trace_event *event)
> INIT_LIST_HEAD(&event->list);
>
> if (!event->type) {
> - struct list_head *list;
> + struct list_head *list = NULL;

Actually this is the wrong place to initialize it. The correct place is in
the function that is expected to do the initialization.

>
> if (next_event_type > FTRACE_MAX_EVENT) {
>

Could you test this patch instead:

tracing: quiet gcc compile warning

Some versions of gcc cannot see that the list variable in
register_ftrace_event is always initialized. There's one place in the logic
that is a bit complex: the trace_search_list() function that initializes the
list will not initialize it on error. But that's OK, because the caller
checks the return value for error and will not use the list variable in
that case. Some versions of gcc miss this.

[ Impact: quiet gcc from complaining about an unused variable ]

Signed-off-by: Steven Rostedt <ros...@goodmis.org>

diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 5fc51f0..e949cf6 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -506,8 +506,10 @@ static int trace_search_list(struct list_head **list)
}

/* Did we used up all 65 thousand events??? */
- if ((last + 1) > FTRACE_MAX_EVENT)
+ if ((last + 1) > FTRACE_MAX_EVENT) {
+ *list = NULL;
return 0;
+ }

*list = &e->list;
return last + 1;

Ingo Molnar

May 6, 2009, 10:40:17 AM

does it really matter? It's far more robust to initialize at the
definition site, because there we can be sure there's no
side-effects. This one:

> /* Did we used up all 65 thousand events??? */
> - if ((last + 1) > FTRACE_MAX_EVENT)
> + if ((last + 1) > FTRACE_MAX_EVENT) {
> + *list = NULL;
> return 0;
> + }

Is correct too but needs a semantic check (and ongoing maintenance,
etc.).

Ingo

Steven Rostedt

May 6, 2009, 11:00:23 AM

Actually, to answer this, we need to look at the code as a whole. Just looking
at the changes in a patch does not show the big picture.

The original code is:

static int trace_search_list(struct list_head **list)

{
struct trace_event *e;
int last = __TRACE_LAST_TYPE;

if (list_empty(&ftrace_event_list)) {
*list = &ftrace_event_list;
return last + 1;
}

/*
* We used up all possible max events,
* lets see if somebody freed one.
*/
list_for_each_entry(e, &ftrace_event_list, list) {
if (e->type != last + 1)
break;
last++;
}

/* Did we used up all 65 thousand events??? */

if ((last + 1) > FTRACE_MAX_EVENT)

return 0;

*list = &e->list;
return last + 1;
}

[...]
struct list_head *list;

if (next_event_type > FTRACE_MAX_EVENT) {

event->type = trace_search_list(&list);
if (!event->type)
goto out;

} else {

event->type = next_event_type++;
list = &ftrace_event_list;
}

if (WARN_ON(ftrace_find_event(event->type)))
goto out;

list_add_tail(&event->list, list);

The caller is:

struct list_head *list;

if () {
event->type = trace_search_list(&list);
} else {
[...]
list = &ftrace_event_list;
}

This code shows that list is initialized either by trace_search_list() or
set manually.

Thus, my change makes trace_search_list() always initialize the list
variable. That way, if trace_search_list() is used someplace else, it will
not cause this error again.

If gcc cannot figure out that trace_search_list() initializes the variable
(as it cannot from the original code), the

struct list_head *list = NULL;

will always be performed, because gcc thinks that's the only way to
guarantee that it will be initialized.

With my solution, gcc can easily see that trace_search_list() always
initializes list, and will not set it needlessly to NULL.
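The pattern under discussion boils down to this minimal reproduction (not the tracing code itself): the callee stores through an out-pointer only on success, and the caller never reads it on failure. The code is correct, yet some gcc versions still warn.

```c
/* Minimal model of the pattern: on error the out-pointer is untouched,
 * but the caller checks the return value and never reads it then. */
static int search(int key, int **out)
{
	static int slot = 42;

	if (key < 0)
		return 0;	/* error path: *out deliberately untouched */

	*out = &slot;
	return key + 1;
}

static int lookup(int key)
{
	int *list;	/* gcc may claim: "may be used uninitialized" */
	int type = search(key, &list);

	if (!type)
		return -1;	/* never dereferences list on error */

	return *list;
}
```

Steven's fix makes search() write the out-pointer on every path; Ingo's preference initializes list at its definition. Both silence the warning, and the thread is about which invariant is cheaper to maintain.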

-- Steve

tip-bot for Christoph Hellwig

May 6, 2009, 11:00:34 AM
Commit-ID: 35cf723e99c0e26ddf51f037dffaa4ff2c2c9106
Gitweb: http://git.kernel.org/tip/35cf723e99c0e26ddf51f037dffaa4ff2c2c9106
Author: Christoph Hellwig <h...@lst.de>
AuthorDate: Wed, 6 May 2009 12:33:38 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Wed, 6 May 2009 16:48:56 +0200

tracing: small trace_events sample Makefile cleanup

Use -I$(src) to add the current directory to the include path.

[ Impact: cleanup ]

Signed-off-by: Christoph Hellwig <h...@lst.de>


Acked-by: Steven Rostedt <ros...@goodmis.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
samples/trace_events/Makefile | 4 +---
1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/samples/trace_events/Makefile b/samples/trace_events/Makefile
index 06c6dea..0d428dc 100644
--- a/samples/trace_events/Makefile
+++ b/samples/trace_events/Makefile
@@ -1,8 +1,6 @@
# builds the trace events example kernel modules;
# then to use one (as root): insmod <module_name.ko>

-PWD := $(shell pwd)
-
-CFLAGS_trace-events-sample.o := -I$(PWD)/samples/trace_events/
+CFLAGS_trace-events-sample.o := -I$(src)

obj-$(CONFIG_SAMPLE_TRACE_EVENTS) += trace-events-sample.o

tip-bot for Ingo Molnar

May 7, 2009, 5:30:20 AM
Commit-ID: 643bec956544d376b7c2a80a3d5c3d0bf94da8d3
Gitweb: http://git.kernel.org/tip/643bec956544d376b7c2a80a3d5c3d0bf94da8d3
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Thu, 7 May 2009 09:12:50 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Thu, 7 May 2009 09:32:10 +0200

x86: clean up arch/x86/kernel/tsc_sync.c a bit

- remove unused define
- make the lock variable definition stand out some more
- convert KERN_* to pr_info() / pr_warning()

[ Impact: cleanup ]

LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
arch/x86/kernel/tsc_sync.c | 14 ++++++--------
1 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index bf36328..027b5b4 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -34,6 +34,7 @@ static __cpuinitdata atomic_t stop_count;
* of a critical section, to be able to prove TSC time-warps:
*/
static __cpuinitdata raw_spinlock_t sync_lock = __RAW_SPIN_LOCK_UNLOCKED;
+
static __cpuinitdata cycles_t last_tsc;
static __cpuinitdata cycles_t max_warp;
static __cpuinitdata int nr_warps;
@@ -113,13 +114,12 @@ void __cpuinit check_tsc_sync_source(int cpu)
return;

if (boot_cpu_has(X86_FEATURE_TSC_RELIABLE)) {
- printk(KERN_INFO
- "Skipping synchronization checks as TSC is reliable.\n");
+ pr_info("Skipping synchronization checks as TSC is reliable.\n");
return;
}

- printk(KERN_INFO "checking TSC synchronization [CPU#%d -> CPU#%d]:",
- smp_processor_id(), cpu);
+ pr_info("checking TSC synchronization [CPU#%d -> CPU#%d]:",
+ smp_processor_id(), cpu);

/*
* Reset it - in case this is a second bootup:
@@ -143,8 +143,8 @@ void __cpuinit check_tsc_sync_source(int cpu)

if (nr_warps) {
printk("\n");
- printk(KERN_WARNING "Measured %Ld cycles TSC warp between CPUs,"
- " turning off TSC clock.\n", max_warp);
+ pr_warning("Measured %Ld cycles TSC warp between CPUs, "
+ "turning off TSC clock.\n", max_warp);
mark_tsc_unstable("check_tsc_sync_source failed");
} else {
printk(" passed.\n");
@@ -195,5 +195,3 @@ void __cpuinit check_tsc_sync_target(void)
while (atomic_read(&stop_count) != cpus)
cpu_relax();
}
-#undef NR_LOOPS
-

tip-bot for Yinghai Lu

May 11, 2009, 6:00:27 AM
Commit-ID: 61fe91e1319556f32bebfd7ed2c68ef02e2c17f7
Gitweb: http://git.kernel.org/tip/61fe91e1319556f32bebfd7ed2c68ef02e2c17f7
Author: Yinghai Lu <yin...@kernel.org>
AuthorDate: Sat, 9 May 2009 23:47:42 -0700
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Mon, 11 May 2009 10:52:40 +0200

x86: apic: Check rev 3 fadt correctly for physical_apic bit

Impact: fix fadt version checking

FADT2_REVISION_ID has the value 3, i.e. a rev 3 FADT, so we need to use
>= instead of >, as other places in the code do.

[ Impact: extend scope of APIC boot quirk ]

Signed-off-by: Yinghai Lu <yin...@kernel.org>


LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
arch/x86/kernel/apic/apic_flat_64.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/apic/apic_flat_64.c b/arch/x86/kernel/apic/apic_flat_64.c
index 306e5e8..744e6d8 100644
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -235,7 +235,7 @@ static int physflat_acpi_madt_oem_check(char *oem_id, char *oem_table_id)
* regardless of how many processors are present (x86_64 ES7000
* is an example).
*/
- if (acpi_gbl_FADT.header.revision > FADT2_REVISION_ID &&
+ if (acpi_gbl_FADT.header.revision >= FADT2_REVISION_ID &&
(acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL)) {
printk(KERN_DEBUG "system APIC only can use physical flat");
return 1;
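The one-character fix above can be modeled as a standalone predicate; the flag bit value below is illustrative (the real constant comes from the ACPI headers):

```c
#include <assert.h>

#define FADT2_REVISION_ID   3           /* a rev 3 FADT, per the changelog */
#define FADT_APIC_PHYSICAL  (1u << 19)  /* illustrative flag bit */

/* Minimal model of the fixed predicate: with ">=" a revision-3 table
 * (exactly FADT2_REVISION_ID) honors the physical-APIC flag, where the
 * old ">" silently ignored it. */
static int wants_physflat(unsigned int revision, unsigned int flags)
{
    return revision >= FADT2_REVISION_ID &&
           (flags & FADT_APIC_PHYSICAL) != 0;
}
```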

tip-bot for Yinghai Lu

May 11, 2009, 6:00:34 AM
Commit-ID: 3e0c373749d7eb5b354ac0b043f2b2cdf84eefef
Gitweb: http://git.kernel.org/tip/3e0c373749d7eb5b354ac0b043f2b2cdf84eefef

Author: Yinghai Lu <yin...@kernel.org>
AuthorDate: Sat, 9 May 2009 23:47:42 -0700
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Mon, 11 May 2009 10:57:24 +0200

x86: clean up and fix setup_clear/force_cpu_cap handling

setup_force_cpu_cap() only has one user (the Xen guest code), but it
should not reuse cleared_cpu_caps, otherwise it will have problems
on SMP.

We need a separate cpu_caps_set array too, for forced-on flags, beyond
the forced-off flags.

This handling also needs to happen before all the CPUs' caps are combined.

[ Impact: fix the forced-set CPU feature flag logic ]

Cc: H. Peter Anvin <h...@linux.intel.com>
Cc: Jeremy Fitzhardinge <jeremy.fi...@citrix.com>
Cc: Rusty Russell <ru...@rustcorp.com.au>
Signed-off-by: Yinghai Lu <yingh...@kernel.org>


LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
arch/x86/include/asm/cpufeature.h | 4 ++--
arch/x86/include/asm/processor.h | 3 ++-
arch/x86/kernel/cpu/common.c | 17 ++++++++++++-----
3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index ccc1061..13cc6a5 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -192,11 +192,11 @@ extern const char * const x86_power_flags[32];
#define clear_cpu_cap(c, bit) clear_bit(bit, (unsigned long *)((c)->x86_capability))
#define setup_clear_cpu_cap(bit) do { \
clear_cpu_cap(&boot_cpu_data, bit); \
- set_bit(bit, (unsigned long *)cleared_cpu_caps); \
+ set_bit(bit, (unsigned long *)cpu_caps_cleared); \
} while (0)
#define setup_force_cpu_cap(bit) do { \
set_cpu_cap(&boot_cpu_data, bit); \
- clear_bit(bit, (unsigned long *)cleared_cpu_caps); \
+ set_bit(bit, (unsigned long *)cpu_caps_set); \
} while (0)

#define cpu_has_fpu boot_cpu_has(X86_FEATURE_FPU)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index c2cceae..fed93fe 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -135,7 +135,8 @@ extern struct cpuinfo_x86 boot_cpu_data;
extern struct cpuinfo_x86 new_cpu_data;

extern struct tss_struct doublefault_tss;
-extern __u32 cleared_cpu_caps[NCAPINTS];
+extern __u32 cpu_caps_cleared[NCAPINTS];
+extern __u32 cpu_caps_set[NCAPINTS];

#ifdef CONFIG_SMP
DECLARE_PER_CPU_SHARED_ALIGNED(struct cpuinfo_x86, cpu_info);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index c4f6678..e7fd5c4 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -292,7 +292,8 @@ static const char *__cpuinit table_lookup_model(struct cpuinfo_x86 *c)
return NULL; /* Not found */
}

-__u32 cleared_cpu_caps[NCAPINTS] __cpuinitdata;
+__u32 cpu_caps_cleared[NCAPINTS] __cpuinitdata;
+__u32 cpu_caps_set[NCAPINTS] __cpuinitdata;

void load_percpu_segment(int cpu)
{
@@ -806,6 +807,16 @@ static void __cpuinit identify_cpu(struct cpuinfo_x86 *c)
#endif

init_hypervisor(c);
+
+ /*
+ * Clear/Set all flags overriden by options, need do it
+ * before following smp all cpus cap AND.
+ */
+ for (i = 0; i < NCAPINTS; i++) {
+ c->x86_capability[i] &= ~cpu_caps_cleared[i];
+ c->x86_capability[i] |= cpu_caps_set[i];
+ }
+
/*
* On SMP, boot_cpu_data holds the common feature set between
* all CPUs; so make sure that we indicate which features are
@@ -818,10 +829,6 @@ static void __cpuinit identify_cpu(struct cpuinfo_x86 *c)
boot_cpu_data.x86_capability[i] &= c->x86_capability[i];
}

- /* Clear all flags overriden by options */
- for (i = 0; i < NCAPINTS; i++)
- c->x86_capability[i] &= ~cleared_cpu_caps[i];
-
#ifdef CONFIG_X86_MCE
/* Init Machine Check Exception if available. */
mcheck_init(c);
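The combining step can be modeled in a few lines; NCAPINTS is shortened here and apply_forced_caps() is a hypothetical helper mirroring the loop the patch adds to identify_cpu():

```c
#include <assert.h>

#define NCAPINTS 2  /* shortened for the sketch; the kernel uses more words */

typedef unsigned int u32;

static u32 cpu_caps_cleared[NCAPINTS];
static u32 cpu_caps_set[NCAPINTS];

/* Hypothetical helper mirroring the loop added to identify_cpu():
 * forced-off bits are masked out, forced-on bits are ORed in, and this
 * must happen before the per-CPU words are ANDed into boot_cpu_data. */
static void apply_forced_caps(u32 *caps)
{
    int i;

    for (i = 0; i < NCAPINTS; i++) {
        caps[i] &= ~cpu_caps_cleared[i];
        caps[i] |= cpu_caps_set[i];
    }
}
```

With only the old single cleared array, a forced-on bit had nowhere to survive the SMP AND of all CPUs' capability words; the separate set array fixes that.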

tip-bot for Mike Galbraith

May 11, 2009, 6:20:14 AM
Commit-ID: 8823392360dc4992f87bf4c623834d315f297493
Gitweb: http://git.kernel.org/tip/8823392360dc4992f87bf4c623834d315f297493
Author: Mike Galbraith <efa...@gmx.de>
AuthorDate: Sun, 10 May 2009 10:53:05 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Mon, 11 May 2009 12:04:30 +0200

perf_counter, x86: clean up throttling printk

s/PERFMON/perfcounters for perfcounter interrupt throttling warning.

'perfmon' is the CPU feature name that is Intel-only, while we do
throttling in a generic way.

[ Impact: cleanup ]

Signed-off-by: Mike Galbraith <efa...@gmx.de>
Cc: Robert Richter <robert....@amd.com>
Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Paul Mackerras <pau...@samba.org>


LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
arch/x86/kernel/cpu/perf_counter.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_counter.c b/arch/x86/kernel/cpu/perf_counter.c
index a6878b0..da27419 100644
--- a/arch/x86/kernel/cpu/perf_counter.c
+++ b/arch/x86/kernel/cpu/perf_counter.c
@@ -814,7 +814,7 @@ void perf_counter_unthrottle(void)
cpuc = &__get_cpu_var(cpu_hw_counters);
if (cpuc->interrupts >= PERFMON_MAX_INTERRUPTS) {
if (printk_ratelimit())
- printk(KERN_WARNING "PERFMON: max interrupts exceeded!\n");
+ printk(KERN_WARNING "perfcounters: max interrupts exceeded!\n");
hw_perf_restore(cpuc->throttle_ctrl);
}
cpuc->interrupts = 0;

tip-bot for Ingo Molnar

May 12, 2009, 10:40:10 AM
Commit-ID: bbcc73ee0c8684b11cce247ad98d6929a0bdfcaf
Gitweb: http://git.kernel.org/tip/bbcc73ee0c8684b11cce247ad98d6929a0bdfcaf
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Tue, 12 May 2009 16:29:13 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Tue, 12 May 2009 16:32:17 +0200

lockdep: increase MAX_LOCKDEP_ENTRIES

Now that lockdep coverage has increased it has become easier to
run out of entries:

[ 21.401387] BUG: MAX_LOCKDEP_ENTRIES too low!
[ 21.402007] turning off the locking correctness validator.
[ 21.402007] Pid: 1555, comm: S99local Not tainted 2.6.30-rc5-tip #2
[ 21.402007] Call Trace:
[ 21.402007] [<ffffffff81069789>] add_lock_to_list+0x53/0xba
[ 21.402007] [<ffffffff810eb615>] ? lookup_mnt+0x19/0x53
[ 21.402007] [<ffffffff8106be14>] check_prev_add+0x14b/0x1c7
[ 21.402007] [<ffffffff8106c304>] validate_chain+0x474/0x52a
[ 21.402007] [<ffffffff8106c6fc>] __lock_acquire+0x342/0x3c7
[ 21.402007] [<ffffffff8106c842>] lock_acquire+0xc1/0xe5
[ 21.402007] [<ffffffff810eb615>] ? lookup_mnt+0x19/0x53
[ 21.402007] [<ffffffff8153aedc>] _spin_lock+0x31/0x66

Double the size - as we've done in the past.

[ Impact: allow lockdep to cover more locks ]

Acked-by: Peter Zijlstra <a.p.zi...@chello.nl>


LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
kernel/lockdep_internals.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/lockdep_internals.h b/kernel/lockdep_internals.h
index a2cc7e9..ce523ac 100644
--- a/kernel/lockdep_internals.h
+++ b/kernel/lockdep_internals.h
@@ -54,7 +54,7 @@ enum {
* table (if it's not there yet), and we check it for lock order
* conflicts and deadlocks.
*/
-#define MAX_LOCKDEP_ENTRIES 8192UL
+#define MAX_LOCKDEP_ENTRIES 16384UL

#define MAX_LOCKDEP_CHAINS_BITS 14
#define MAX_LOCKDEP_CHAINS (1UL << MAX_LOCKDEP_CHAINS_BITS)

tip-bot for Ingo Molnar

May 12, 2009, 2:30:22 PM
Commit-ID: d80c19df5fcceb8c741e96f09f275c2da719efef
Gitweb: http://git.kernel.org/tip/d80c19df5fcceb8c741e96f09f275c2da719efef

Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Tue, 12 May 2009 16:29:13 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Tue, 12 May 2009 19:59:52 +0200

lockdep: increase MAX_LOCKDEP_ENTRIES and MAX_LOCKDEP_CHAINS

Now that lockdep coverage has increased it has become easier to
run out of entries:

[ 21.401387] BUG: MAX_LOCKDEP_ENTRIES too low!
[ 21.402007] turning off the locking correctness validator.
[ 21.402007] Pid: 1555, comm: S99local Not tainted 2.6.30-rc5-tip #2
[ 21.402007] Call Trace:
[ 21.402007] [<ffffffff81069789>] add_lock_to_list+0x53/0xba
[ 21.402007] [<ffffffff810eb615>] ? lookup_mnt+0x19/0x53
[ 21.402007] [<ffffffff8106be14>] check_prev_add+0x14b/0x1c7
[ 21.402007] [<ffffffff8106c304>] validate_chain+0x474/0x52a
[ 21.402007] [<ffffffff8106c6fc>] __lock_acquire+0x342/0x3c7
[ 21.402007] [<ffffffff8106c842>] lock_acquire+0xc1/0xe5
[ 21.402007] [<ffffffff810eb615>] ? lookup_mnt+0x19/0x53
[ 21.402007] [<ffffffff8153aedc>] _spin_lock+0x31/0x66

Double the size - as we've done in the past.

[ Impact: allow lockdep to cover more locks ]

Acked-by: Peter Zijlstra <a.p.zi...@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
kernel/lockdep_internals.h | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/lockdep_internals.h b/kernel/lockdep_internals.h
index a2cc7e9..699a2ac 100644
--- a/kernel/lockdep_internals.h
+++ b/kernel/lockdep_internals.h
@@ -54,9 +54,9 @@ enum {
* table (if it's not there yet), and we check it for lock order
* conflicts and deadlocks.
*/
-#define MAX_LOCKDEP_ENTRIES 8192UL
+#define MAX_LOCKDEP_ENTRIES 16384UL

-#define MAX_LOCKDEP_CHAINS_BITS 14
+#define MAX_LOCKDEP_CHAINS_BITS 15
#define MAX_LOCKDEP_CHAINS (1UL << MAX_LOCKDEP_CHAINS_BITS)

#define MAX_LOCKDEP_CHAIN_HLOCKS (MAX_LOCKDEP_CHAINS*5)

tip-bot for Peter Zijlstra

May 13, 2009, 2:30:21 AM
Commit-ID: 5bb9efe33ea4001a17ab98186a40a134a3061d67
Gitweb: http://git.kernel.org/tip/5bb9efe33ea4001a17ab98186a40a134a3061d67
Author: Peter Zijlstra <a.p.zi...@chello.nl>
AuthorDate: Wed, 13 May 2009 08:12:51 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Wed, 13 May 2009 08:17:37 +0200

perf_counter: fix print debug irq disable

inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
bash/15802 [HC0[0]:SC0[0]:HE1:SE1] takes:
(sysrq_key_table_lock){?.....},

Don't unconditionally enable interrupts in the perf_counter_print_debug()
path.

[ Impact: fix potential deadlock pointed out by lockdep ]

LKML-Reference: <new-submission>
Reported-by: Ingo Molnar <mi...@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zi...@chello.nl>


---
arch/x86/kernel/cpu/perf_counter.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_counter.c b/arch/x86/kernel/cpu/perf_counter.c
index da27419..f7772ff 100644
--- a/arch/x86/kernel/cpu/perf_counter.c
+++ b/arch/x86/kernel/cpu/perf_counter.c
@@ -621,12 +621,13 @@ void perf_counter_print_debug(void)
{
u64 ctrl, status, overflow, pmc_ctrl, pmc_count, prev_left, fixed;
struct cpu_hw_counters *cpuc;
+ unsigned long flags;
int cpu, idx;

if (!x86_pmu.num_counters)
return;

- local_irq_disable();
+ local_irq_save(flags);

cpu = smp_processor_id();
cpuc = &per_cpu(cpu_hw_counters, cpu);
@@ -664,7 +665,7 @@ void perf_counter_print_debug(void)
pr_info("CPU#%d: fixed-PMC%d count: %016llx\n",
cpu, idx, pmc_count);
}
- local_irq_enable();
+ local_irq_restore(flags);
}

static void x86_pmu_disable(struct perf_counter *counter)
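Why the save/restore pairing is the right fix can be shown with a toy model of the interrupt-enable flag; the sim_* helpers below only mimic the kernel's local_irq_save()/local_irq_restore() for illustration and are not real kernel APIs:

```c
#include <assert.h>

/* Toy model of the CPU interrupt-enable flag. */
static int irqs_enabled = 1;

static void sim_irq_save(unsigned long *flags)
{
    *flags = irqs_enabled;  /* remember the caller's state */
    irqs_enabled = 0;       /* then disable */
}

static void sim_irq_restore(unsigned long flags)
{
    irqs_enabled = (int)flags;  /* put back whatever the caller had */
}

/* Modeled on the fixed perf_counter_print_debug(): it must not
 * unconditionally re-enable interrupts, because it may be called with
 * them already disabled. */
static void print_debug(void)
{
    unsigned long flags;

    sim_irq_save(&flags);
    /* ... dump counter state here ... */
    sim_irq_restore(flags);
}
```

With the old unconditional enable, a caller that entered with interrupts off would leave with them on, which is exactly the inconsistency lockdep flagged.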

tip-bot for Randy Dunlap

May 13, 2009, 10:00:16 AM
Commit-ID: 44408ad7368906c84000e87a99c14a16dbb867fd
Gitweb: http://git.kernel.org/tip/44408ad7368906c84000e87a99c14a16dbb867fd
Author: Randy Dunlap <randy....@oracle.com>
AuthorDate: Tue, 12 May 2009 13:31:40 -0700
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Wed, 13 May 2009 15:43:55 +0200

xen: use header for EXPORT_SYMBOL_GPL

mmu.c needs to #include module.h to prevent these warnings:

arch/x86/xen/mmu.c:239: warning: data definition has no type or storage class
arch/x86/xen/mmu.c:239: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL'
arch/x86/xen/mmu.c:239: warning: parameter names (without types) in function declaration

[ Impact: cleanup ]

Signed-off-by: Randy Dunlap <randy....@oracle.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fi...@citrix.com>
Signed-off-by: Andrew Morton <ak...@linux-foundation.org>


LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
arch/x86/xen/mmu.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index e25a78e..fba55b1 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -42,6 +42,7 @@
#include <linux/highmem.h>
#include <linux/debugfs.h>
#include <linux/bug.h>
+#include <linux/module.h>

#include <asm/pgtable.h>
#include <asm/tlbflush.h>

tip-bot for Ingo Molnar

May 15, 2009, 4:50:14 AM
Commit-ID: 9029a5e3801f1cc7cdaab80169d82427acf928d8
Gitweb: http://git.kernel.org/tip/9029a5e3801f1cc7cdaab80169d82427acf928d8
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Fri, 15 May 2009 08:26:20 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Fri, 15 May 2009 09:47:06 +0200

perf_counter: x86: Protect against infinite loops in intel_pmu_handle_irq()

intel_pmu_handle_irq() can lock up in an infinite loop if the hardware
does not allow the acking of irqs. Alas, this happened in testing so
make this robust and emit a warning if it happens in the future.

Also, clean up the IRQ handlers a bit.
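The guard can be sketched as a standalone bounded loop; pmu_status/get_status() and the warned flag below are illustrative stand-ins for the status register, intel_pmu_get_status() and WARN_ONCE():

```c
#include <assert.h>

/* Simulated PMU status register; nonzero means "events pending". */
static unsigned long pmu_status = 1;  /* hardware that never acks */

static unsigned long get_status(void)
{
    return pmu_status;
}

/* Sketch of the loop guard added to intel_pmu_handle_irq(): instead of
 * re-reading status forever, give up after a fixed bound and flag it
 * (the kernel uses WARN_ONCE() where we set *warned). */
static int handle_irq(int *warned)
{
    int loops = 0;
    unsigned long status = get_status();

    while (status) {
        if (++loops > 100) {
            *warned = 1;
            return 1;
        }
        /* ... ack and process the pending events ... */
        status = get_status();
    }
    return 1;
}
```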

[ Impact: improve perfcounter irq/nmi handling robustness ]

Acked-by: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>


LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
arch/x86/kernel/cpu/perf_counter.c | 25 ++++++++++++++++++-------
1 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_counter.c b/arch/x86/kernel/cpu/perf_counter.c
index 46a82d1..5a7f718 100644
--- a/arch/x86/kernel/cpu/perf_counter.c
+++ b/arch/x86/kernel/cpu/perf_counter.c
@@ -722,9 +722,13 @@ static void intel_pmu_save_and_restart(struct perf_counter *counter)
*/
static int intel_pmu_handle_irq(struct pt_regs *regs, int nmi)
{
- int bit, cpu = smp_processor_id();
+ struct cpu_hw_counters *cpuc;
+ struct cpu_hw_counters;
+ int bit, cpu, loops;
u64 ack, status;
- struct cpu_hw_counters *cpuc = &per_cpu(cpu_hw_counters, cpu);
+
+ cpu = smp_processor_id();
+ cpuc = &per_cpu(cpu_hw_counters, cpu);

perf_disable();
status = intel_pmu_get_status();
@@ -733,7 +737,13 @@ static int intel_pmu_handle_irq(struct pt_regs *regs, int nmi)
return 0;
}

+ loops = 0;
again:
+ if (++loops > 100) {
+ WARN_ONCE(1, "perfcounters: irq loop stuck!\n");
+ return 1;
+ }
+
inc_irq_stat(apic_perf_irqs);
ack = status;
for_each_bit(bit, (unsigned long *)&status, X86_PMC_IDX_MAX) {
@@ -765,13 +775,14 @@ again:

static int amd_pmu_handle_irq(struct pt_regs *regs, int nmi)
{
- int cpu = smp_processor_id();
- struct cpu_hw_counters *cpuc = &per_cpu(cpu_hw_counters, cpu);
- u64 val;
- int handled = 0;
+ int cpu, idx, throttle = 0, handled = 0;
+ struct cpu_hw_counters *cpuc;
struct perf_counter *counter;
struct hw_perf_counter *hwc;
- int idx, throttle = 0;
+ u64 val;
+
+ cpu = smp_processor_id();
+ cpuc = &per_cpu(cpu_hw_counters, cpu);

if (++cpuc->interrupts == PERFMON_MAX_INTERRUPTS) {
throttle = 1;

tip-bot for Peter Zijlstra

May 15, 2009, 4:50:10 AM
Commit-ID: a026dfecc035f213c1cfa0bf6407ce3155f6a9df
Gitweb: http://git.kernel.org/tip/a026dfecc035f213c1cfa0bf6407ce3155f6a9df
Author: Peter Zijlstra <a.p.zi...@chello.nl>
AuthorDate: Wed, 13 May 2009 10:02:57 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Fri, 15 May 2009 09:46:57 +0200

perf_counter: x86: Allow unprivileged use of NMIs

Apply sysctl_perf_counter_priv to NMIs. Also, fail the counter
creation instead of silently down-grading to regular interrupts.

[ Impact: allow wider perf-counter usage ]

Signed-off-by: Peter Zijlstra <a.p.zi...@chello.nl>


Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
arch/x86/kernel/cpu/perf_counter.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_counter.c b/arch/x86/kernel/cpu/perf_counter.c
index 88ae8ce..c19e927 100644
--- a/arch/x86/kernel/cpu/perf_counter.c
+++ b/arch/x86/kernel/cpu/perf_counter.c
@@ -280,8 +280,11 @@ static int __hw_perf_counter_init(struct perf_counter *counter)
* If privileged enough, allow NMI events:
*/
hwc->nmi = 0;
- if (capable(CAP_SYS_ADMIN) && hw_event->nmi)
+ if (hw_event->nmi) {
+ if (sysctl_perf_counter_priv && !capable(CAP_SYS_ADMIN))
+ return -EACCES;
hwc->nmi = 1;
+ }

hwc->irq_period = hw_event->irq_period;
if ((s64)hwc->irq_period <= 0 || hwc->irq_period > x86_pmu.max_period)

tip-bot for Ingo Molnar

May 15, 2009, 4:50:13 AM
Commit-ID: 1c80f4b598d9b075a2a0be694e28be93a6702bcc
Gitweb: http://git.kernel.org/tip/1c80f4b598d9b075a2a0be694e28be93a6702bcc
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Fri, 15 May 2009 08:25:22 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Fri, 15 May 2009 09:47:05 +0200

perf_counter: x86: Disallow interval of 1

On certain CPUs I have observed a stuck PMU if interval was set to
1 and NMIs were used. The PMU had PMC0 set in MSR_CORE_PERF_GLOBAL_STATUS,
but it was not possible to ack it via MSR_CORE_PERF_GLOBAL_OVF_CTRL,
and the NMI loop got stuck infinitely.

[ Impact: fix rare hangs during high perfcounter load ]

Signed-off-by: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
arch/x86/kernel/cpu/perf_counter.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_counter.c b/arch/x86/kernel/cpu/perf_counter.c
index 1dcf670..46a82d1 100644
--- a/arch/x86/kernel/cpu/perf_counter.c
+++ b/arch/x86/kernel/cpu/perf_counter.c
@@ -473,6 +473,11 @@ x86_perf_counter_set_period(struct perf_counter *counter,
left += period;
atomic64_set(&hwc->period_left, left);
}
+ /*
+ * Quirk: certain CPUs dont like it if just 1 event is left:
+ */
+ if (unlikely(left < 2))
+ left = 2;

per_cpu(prev_left[idx], smp_processor_id()) = left;

tip-bot for Peter Zijlstra

May 15, 2009, 4:50:14 AM
Commit-ID: 9e35ad388bea89f7d6f375af4c0ae98803688666
Gitweb: http://git.kernel.org/tip/9e35ad388bea89f7d6f375af4c0ae98803688666
Author: Peter Zijlstra <a.p.zi...@chello.nl>
AuthorDate: Wed, 13 May 2009 16:21:38 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Fri, 15 May 2009 09:47:02 +0200

perf_counter: Rework the perf counter disable/enable

The current disable/enable mechanism is:

token = hw_perf_save_disable();
...
/* do bits */
...
hw_perf_restore(token);

This works well, provided that the use nests properly. Except we don't.

x86 NMI/INT throttling has non-nested use of this, breaking things. Therefore
provide a reference counter disable/enable interface, where the first disable
disables the hardware, and the last enable enables the hardware again.
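The reference-counted interface can be sketched in a few lines; this is a simplification where a flag stands in for the hardware state, and the kernel calls its per-arch hw_perf_disable()/hw_perf_enable() at the edges instead:

```c
#include <assert.h>

/* Reference-counted disable/enable: the first disable turns the
 * hardware off, only the last matching enable turns it back on. */
static int disable_count;
static int hw_enabled = 1;

static void perf_disable(void)
{
    if (disable_count++ == 0)
        hw_enabled = 0;   /* edge: first disable */
}

static void perf_enable(void)
{
    if (--disable_count == 0)
        hw_enabled = 1;   /* edge: last enable */
}
```

Nested and non-nested users now compose: two disables require two enables before the PMU runs again, which the token-based save/restore scheme could not guarantee.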

[ Impact: refactor, simplify the PMU disable/enable logic ]

Signed-off-by: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
arch/powerpc/kernel/perf_counter.c | 24 ++++----
arch/x86/kernel/cpu/perf_counter.c | 113 +++++++++++++----------------------
drivers/acpi/processor_idle.c | 6 +-
include/linux/perf_counter.h | 10 ++-
kernel/perf_counter.c | 76 +++++++++++++++---------
5 files changed, 109 insertions(+), 120 deletions(-)

diff --git a/arch/powerpc/kernel/perf_counter.c b/arch/powerpc/kernel/perf_counter.c
index 15cdc8e..bb1b463 100644
--- a/arch/powerpc/kernel/perf_counter.c
+++ b/arch/powerpc/kernel/perf_counter.c
@@ -386,7 +386,7 @@ static void write_mmcr0(struct cpu_hw_counters *cpuhw, unsigned long mmcr0)
* Disable all counters to prevent PMU interrupts and to allow
* counters to be added or removed.
*/
-u64 hw_perf_save_disable(void)
+void hw_perf_disable(void)
{
struct cpu_hw_counters *cpuhw;
unsigned long ret;
@@ -428,7 +428,6 @@ u64 hw_perf_save_disable(void)
mb();
}
local_irq_restore(flags);
- return ret;
}

/*
@@ -436,7 +435,7 @@ u64 hw_perf_save_disable(void)
* If we were previously disabled and counters were added, then
* put the new config on the PMU.
*/
-void hw_perf_restore(u64 disable)
+void hw_perf_enable(void)
{
struct perf_counter *counter;
struct cpu_hw_counters *cpuhw;
@@ -448,9 +447,12 @@ void hw_perf_restore(u64 disable)
int n_lim;
int idx;

- if (disable)
- return;
local_irq_save(flags);
+ if (!cpuhw->disabled) {
+ local_irq_restore(flags);
+ return;
+ }
+
cpuhw = &__get_cpu_var(cpu_hw_counters);
cpuhw->disabled = 0;

@@ -649,19 +651,18 @@ int hw_perf_group_sched_in(struct perf_counter *group_leader,
/*
* Add a counter to the PMU.
* If all counters are not already frozen, then we disable and
- * re-enable the PMU in order to get hw_perf_restore to do the
+ * re-enable the PMU in order to get hw_perf_enable to do the
* actual work of reconfiguring the PMU.
*/
static int power_pmu_enable(struct perf_counter *counter)
{
struct cpu_hw_counters *cpuhw;
unsigned long flags;
- u64 pmudis;
int n0;
int ret = -EAGAIN;

local_irq_save(flags);
- pmudis = hw_perf_save_disable();
+ perf_disable();

/*
* Add the counter to the list (if there is room)
@@ -685,7 +686,7 @@ static int power_pmu_enable(struct perf_counter *counter)

ret = 0;
out:
- hw_perf_restore(pmudis);
+ perf_enable();
local_irq_restore(flags);
return ret;
}
@@ -697,11 +698,10 @@ static void power_pmu_disable(struct perf_counter *counter)
{
struct cpu_hw_counters *cpuhw;
long i;
- u64 pmudis;
unsigned long flags;

local_irq_save(flags);
- pmudis = hw_perf_save_disable();
+ perf_disable();

power_pmu_read(counter);

@@ -735,7 +735,7 @@ static void power_pmu_disable(struct perf_counter *counter)
cpuhw->mmcr[0] &= ~(MMCR0_PMXE | MMCR0_FCECE);
}

- hw_perf_restore(pmudis);
+ perf_enable();
local_irq_restore(flags);
}

diff --git a/arch/x86/kernel/cpu/perf_counter.c b/arch/x86/kernel/cpu/perf_counter.c
index 7601c01..313638c 100644
--- a/arch/x86/kernel/cpu/perf_counter.c
+++ b/arch/x86/kernel/cpu/perf_counter.c
@@ -31,7 +31,6 @@ struct cpu_hw_counters {
unsigned long used_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
unsigned long active_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
unsigned long interrupts;
- u64 throttle_ctrl;
int enabled;
};

@@ -42,8 +41,8 @@ struct x86_pmu {
const char *name;
int version;
int (*handle_irq)(struct pt_regs *, int);
- u64 (*save_disable_all)(void);
- void (*restore_all)(u64);
+ void (*disable_all)(void);
+ void (*enable_all)(void);
void (*enable)(struct hw_perf_counter *, int);
void (*disable)(struct hw_perf_counter *, int);
unsigned eventsel;
@@ -56,6 +55,7 @@ struct x86_pmu {
int counter_bits;
u64 counter_mask;
u64 max_period;
+ u64 intel_ctrl;
};

static struct x86_pmu x86_pmu __read_mostly;
@@ -311,22 +311,19 @@ static int __hw_perf_counter_init(struct perf_counter *counter)
return 0;
}

-static u64 intel_pmu_save_disable_all(void)
+static void intel_pmu_disable_all(void)
{
- u64 ctrl;
-
- rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, ctrl);
wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
-
- return ctrl;
}

-static u64 amd_pmu_save_disable_all(void)
+static void amd_pmu_disable_all(void)
{
struct cpu_hw_counters *cpuc = &__get_cpu_var(cpu_hw_counters);
- int enabled, idx;
+ int idx;
+
+ if (!cpuc->enabled)
+ return;

- enabled = cpuc->enabled;
cpuc->enabled = 0;
/*
* ensure we write the disable before we start disabling the
@@ -334,8 +331,6 @@ static u64 amd_pmu_save_disable_all(void)
* right thing.
*/
barrier();
- if (!enabled)
- goto out;

for (idx = 0; idx < x86_pmu.num_counters; idx++) {
u64 val;
@@ -348,37 +343,31 @@ static u64 amd_pmu_save_disable_all(void)
val &= ~ARCH_PERFMON_EVENTSEL0_ENABLE;
wrmsrl(MSR_K7_EVNTSEL0 + idx, val);
}
-
-out:
- return enabled;
}

-u64 hw_perf_save_disable(void)
+void hw_perf_disable(void)
{
if (!x86_pmu_initialized())
- return 0;
- return x86_pmu.save_disable_all();
+ return;
+ return x86_pmu.disable_all();
}
-/*
- * Exported because of ACPI idle
- */
-EXPORT_SYMBOL_GPL(hw_perf_save_disable);

-static void intel_pmu_restore_all(u64 ctrl)
+static void intel_pmu_enable_all(void)
{
- wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, ctrl);
+ wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, x86_pmu.intel_ctrl);
}

-static void amd_pmu_restore_all(u64 ctrl)
+static void amd_pmu_enable_all(void)
{
struct cpu_hw_counters *cpuc = &__get_cpu_var(cpu_hw_counters);
int idx;

- cpuc->enabled = ctrl;
- barrier();
- if (!ctrl)
+ if (cpuc->enabled)
return;

+ cpuc->enabled = 1;
+ barrier();
+
for (idx = 0; idx < x86_pmu.num_counters; idx++) {
u64 val;

@@ -392,16 +381,12 @@ static void amd_pmu_restore_all(u64 ctrl)
}
}

-void hw_perf_restore(u64 ctrl)
+void hw_perf_enable(void)
{
if (!x86_pmu_initialized())
return;
- x86_pmu.restore_all(ctrl);
+ x86_pmu.enable_all();
}
-/*
- * Exported because of ACPI idle
- */
-EXPORT_SYMBOL_GPL(hw_perf_restore);

static inline u64 intel_pmu_get_status(void)
{
@@ -735,15 +720,14 @@ static int intel_pmu_handle_irq(struct pt_regs *regs, int nmi)
int bit, cpu = smp_processor_id();
u64 ack, status;
struct cpu_hw_counters *cpuc = &per_cpu(cpu_hw_counters, cpu);
- int ret = 0;
-
- cpuc->throttle_ctrl = intel_pmu_save_disable_all();
-
+ perf_disable();
status = intel_pmu_get_status();
- if (!status)
- goto out;
+ if (!status) {
+ perf_enable();
+ return 0;
+ }

- ret = 1;
again:
inc_irq_stat(apic_perf_irqs);
ack = status;
@@ -767,19 +751,11 @@ again:
status = intel_pmu_get_status();
if (status)
goto again;
-out:
- /*
- * Restore - do not reenable when global enable is off or throttled:
- */
- if (cpuc->throttle_ctrl) {
- if (++cpuc->interrupts < PERFMON_MAX_INTERRUPTS) {
- intel_pmu_restore_all(cpuc->throttle_ctrl);
- } else {
- pr_info("CPU#%d: perfcounters: max interrupt rate exceeded! Throttle on.\n", smp_processor_id());
- }
- }

- return ret;
+ if (++cpuc->interrupts != PERFMON_MAX_INTERRUPTS)
+ perf_enable();
+
+ return 1;
}

static int amd_pmu_handle_irq(struct pt_regs *regs, int nmi)

@@ -792,13 +768,11 @@ static int amd_pmu_handle_irq(struct pt_regs *regs, int nmi)
struct hw_perf_counter *hwc;
int idx, throttle = 0;

- cpuc->throttle_ctrl = cpuc->enabled;
- cpuc->enabled = 0;
- barrier();
-
- if (cpuc->throttle_ctrl) {
- if (++cpuc->interrupts >= PERFMON_MAX_INTERRUPTS)
- throttle = 1;
+ if (++cpuc->interrupts == PERFMON_MAX_INTERRUPTS) {
+ throttle = 1;
+ __perf_disable();
+ cpuc->enabled = 0;
+ barrier();
}

for (idx = 0; idx < x86_pmu.num_counters; idx++) {
@@ -824,9 +798,6 @@ next:
amd_pmu_disable_counter(hwc, idx);
}

- if (cpuc->throttle_ctrl && !throttle)
- cpuc->enabled = 1;
-
return handled;
}

@@ -839,13 +810,11 @@ void perf_counter_unthrottle(void)
cpuc = &__get_cpu_var(cpu_hw_counters);
if (cpuc->interrupts >= PERFMON_MAX_INTERRUPTS) {

- pr_info("CPU#%d: perfcounters: throttle off.\n", smp_processor_id());
-
/*
* Clear them before re-enabling irqs/NMIs again:
*/
cpuc->interrupts = 0;
- hw_perf_restore(cpuc->throttle_ctrl);
+ perf_enable();
} else {
cpuc->interrupts = 0;
}
@@ -931,8 +900,8 @@ static __read_mostly struct notifier_block perf_counter_nmi_notifier = {
static struct x86_pmu intel_pmu = {
.name = "Intel",
.handle_irq = intel_pmu_handle_irq,
- .save_disable_all = intel_pmu_save_disable_all,
- .restore_all = intel_pmu_restore_all,
+ .disable_all = intel_pmu_disable_all,
+ .enable_all = intel_pmu_enable_all,
.enable = intel_pmu_enable_counter,
.disable = intel_pmu_disable_counter,
.eventsel = MSR_ARCH_PERFMON_EVENTSEL0,
@@ -951,8 +920,8 @@ static struct x86_pmu intel_pmu = {
static struct x86_pmu amd_pmu = {
.name = "AMD",
.handle_irq = amd_pmu_handle_irq,
- .save_disable_all = amd_pmu_save_disable_all,
- .restore_all = amd_pmu_restore_all,
+ .disable_all = amd_pmu_disable_all,
+ .enable_all = amd_pmu_enable_all,
.enable = amd_pmu_enable_counter,
.disable = amd_pmu_disable_counter,
.eventsel = MSR_K7_EVNTSEL0,
@@ -1003,6 +972,8 @@ static int intel_pmu_init(void)
x86_pmu.counter_bits = eax.split.bit_width;
x86_pmu.counter_mask = (1ULL << eax.split.bit_width) - 1;

+ rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, x86_pmu.intel_ctrl);
+
return 0;
}

diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index d2830f3..9645758 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -763,11 +763,9 @@ static int acpi_idle_bm_check(void)
*/
static inline void acpi_idle_do_entry(struct acpi_processor_cx *cx)
{
- u64 perf_flags;
-
/* Don't trace irqs off for idle */
stop_critical_timings();
- perf_flags = hw_perf_save_disable();
+ perf_disable();
if (cx->entry_method == ACPI_CSTATE_FFH) {
/* Call into architectural FFH based C-state */
acpi_processor_ffh_cstate_enter(cx);
@@ -782,7 +780,7 @@ static inline void acpi_idle_do_entry(struct acpi_processor_cx *cx)
gets asserted in time to freeze execution properly. */
unused = inl(acpi_gbl_FADT.xpm_timer_block.address);
}
- hw_perf_restore(perf_flags);
+ perf_enable();
start_critical_timings();
}

diff --git a/include/linux/perf_counter.h b/include/linux/perf_counter.h
index 614f921..e543ecc 100644
--- a/include/linux/perf_counter.h
+++ b/include/linux/perf_counter.h
@@ -544,8 +544,10 @@ extern void perf_counter_exit_task(struct task_struct *child);
extern void perf_counter_do_pending(void);
extern void perf_counter_print_debug(void);
extern void perf_counter_unthrottle(void);
-extern u64 hw_perf_save_disable(void);
-extern void hw_perf_restore(u64 ctrl);
+extern void __perf_disable(void);
+extern bool __perf_enable(void);
+extern void perf_disable(void);
+extern void perf_enable(void);
extern int perf_counter_task_disable(void);
extern int perf_counter_task_enable(void);
extern int hw_perf_group_sched_in(struct perf_counter *group_leader,
@@ -600,8 +602,8 @@ static inline void perf_counter_exit_task(struct task_struct *child) { }
static inline void perf_counter_do_pending(void) { }
static inline void perf_counter_print_debug(void) { }
static inline void perf_counter_unthrottle(void) { }
-static inline void hw_perf_restore(u64 ctrl) { }
-static inline u64 hw_perf_save_disable(void) { return 0; }
+static inline void perf_disable(void) { }
+static inline void perf_enable(void) { }
static inline int perf_counter_task_disable(void) { return -EINVAL; }
static inline int perf_counter_task_enable(void) { return -EINVAL; }

diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
index 985be0b..e814ff0 100644
--- a/kernel/perf_counter.c
+++ b/kernel/perf_counter.c
@@ -60,8 +60,9 @@ extern __weak const struct pmu *hw_perf_counter_init(struct perf_counter *counte
return NULL;
}

-u64 __weak hw_perf_save_disable(void) { return 0; }
-void __weak hw_perf_restore(u64 ctrl) { barrier(); }
+void __weak hw_perf_disable(void) { barrier(); }
+void __weak hw_perf_enable(void) { barrier(); }
+
void __weak hw_perf_counter_setup(int cpu) { barrier(); }
int __weak hw_perf_group_sched_in(struct perf_counter *group_leader,
struct perf_cpu_context *cpuctx,
@@ -72,6 +73,32 @@ int __weak hw_perf_group_sched_in(struct perf_counter *group_leader,

void __weak perf_counter_print_debug(void) { }

+static DEFINE_PER_CPU(int, disable_count);
+
+void __perf_disable(void)
+{
+ __get_cpu_var(disable_count)++;
+}
+
+bool __perf_enable(void)
+{
+ return !--__get_cpu_var(disable_count);
+}
+
+void perf_disable(void)
+{
+ __perf_disable();
+ hw_perf_disable();
+}
+EXPORT_SYMBOL_GPL(perf_disable); /* ACPI idle */
+
+void perf_enable(void)
+{
+ if (__perf_enable())
+ hw_perf_enable();
+}
+EXPORT_SYMBOL_GPL(perf_enable); /* ACPI idle */
+
static void
list_add_counter(struct perf_counter *counter, struct perf_counter_context *ctx)
{
@@ -170,7 +197,6 @@ static void __perf_counter_remove_from_context(void *info)
struct perf_counter *counter = info;
struct perf_counter_context *ctx = counter->ctx;
unsigned long flags;
- u64 perf_flags;

/*
* If this is a task context, we need to check whether it is
@@ -191,9 +217,9 @@ static void __perf_counter_remove_from_context(void *info)
* Protect the list operation against NMI by disabling the
* counters on a global level. NOP for non NMI based counters.
*/
- perf_flags = hw_perf_save_disable();
+ perf_disable();
list_del_counter(counter, ctx);
- hw_perf_restore(perf_flags);
+ perf_enable();

if (!ctx->task) {
/*
@@ -538,7 +564,6 @@ static void __perf_install_in_context(void *info)
struct perf_counter *leader = counter->group_leader;
int cpu = smp_processor_id();
unsigned long flags;
- u64 perf_flags;
int err;

/*
@@ -556,7 +581,7 @@ static void __perf_install_in_context(void *info)
* Protect the list operation against NMI by disabling the
* counters on a global level. NOP for non NMI based counters.
*/
- perf_flags = hw_perf_save_disable();
+ perf_disable();

add_counter_to_ctx(counter, ctx);

@@ -596,7 +621,7 @@ static void __perf_install_in_context(void *info)
cpuctx->max_pertask--;

unlock:
- hw_perf_restore(perf_flags);
+ perf_enable();

spin_unlock_irqrestore(&ctx->lock, flags);
}
@@ -663,7 +688,6 @@ static void __perf_counter_enable(void *info)
struct perf_cpu_context *cpuctx = &__get_cpu_var(perf_cpu_context);
struct perf_counter_context *ctx = counter->ctx;
struct perf_counter *leader = counter->group_leader;
- unsigned long pmuflags;
unsigned long flags;
int err;

@@ -693,14 +717,14 @@ static void __perf_counter_enable(void *info)
if (!group_can_go_on(counter, cpuctx, 1)) {
err = -EEXIST;
} else {
- pmuflags = hw_perf_save_disable();
+ perf_disable();
if (counter == leader)
err = group_sched_in(counter, cpuctx, ctx,
smp_processor_id());
else
err = counter_sched_in(counter, cpuctx, ctx,
smp_processor_id());
- hw_perf_restore(pmuflags);
+ perf_enable();
}

if (err) {
@@ -795,7 +819,6 @@ void __perf_counter_sched_out(struct perf_counter_context *ctx,
struct perf_cpu_context *cpuctx)
{
struct perf_counter *counter;
- u64 flags;

spin_lock(&ctx->lock);
ctx->is_active = 0;
@@ -803,12 +826,12 @@ void __perf_counter_sched_out(struct perf_counter_context *ctx,
goto out;
update_context_time(ctx);

- flags = hw_perf_save_disable();
+ perf_disable();
if (ctx->nr_active) {
list_for_each_entry(counter, &ctx->counter_list, list_entry)
group_sched_out(counter, cpuctx, ctx);
}
- hw_perf_restore(flags);
+ perf_enable();
out:
spin_unlock(&ctx->lock);
}
@@ -860,7 +883,6 @@ __perf_counter_sched_in(struct perf_counter_context *ctx,
struct perf_cpu_context *cpuctx, int cpu)
{
struct perf_counter *counter;
- u64 flags;
int can_add_hw = 1;

spin_lock(&ctx->lock);
@@ -870,7 +892,7 @@ __perf_counter_sched_in(struct perf_counter_context *ctx,

ctx->timestamp = perf_clock();

- flags = hw_perf_save_disable();
+ perf_disable();

/*
* First go through the list and put on any pinned groups
@@ -917,7 +939,7 @@ __perf_counter_sched_in(struct perf_counter_context *ctx,
can_add_hw = 0;
}
}
- hw_perf_restore(flags);
+ perf_enable();
out:
spin_unlock(&ctx->lock);
}
@@ -955,7 +977,6 @@ int perf_counter_task_disable(void)
 struct perf_counter_context *ctx = &curr->perf_counter_ctx;
 struct perf_counter *counter;
 unsigned long flags;
- u64 perf_flags;

if (likely(!ctx->nr_counters))
return 0;
@@ -969,7 +990,7 @@ int perf_counter_task_disable(void)
/*
* Disable all the counters:
*/
- perf_flags = hw_perf_save_disable();
+ perf_disable();

list_for_each_entry(counter, &ctx->counter_list, list_entry) {
if (counter->state != PERF_COUNTER_STATE_ERROR) {
@@ -978,7 +999,7 @@ int perf_counter_task_disable(void)
}
}

- hw_perf_restore(perf_flags);
+ perf_enable();

spin_unlock_irqrestore(&ctx->lock, flags);

@@ -991,7 +1012,6 @@ int perf_counter_task_enable(void)
 struct perf_counter_context *ctx = &curr->perf_counter_ctx;
 struct perf_counter *counter;
 unsigned long flags;
- u64 perf_flags;
int cpu;

if (likely(!ctx->nr_counters))
@@ -1007,7 +1027,7 @@ int perf_counter_task_enable(void)
/*
* Disable all the counters:
*/
- perf_flags = hw_perf_save_disable();
+ perf_disable();

list_for_each_entry(counter, &ctx->counter_list, list_entry) {
if (counter->state > PERF_COUNTER_STATE_OFF)
@@ -1017,7 +1037,7 @@ int perf_counter_task_enable(void)
ctx->time - counter->total_time_enabled;
counter->hw_event.disabled = 0;
}
- hw_perf_restore(perf_flags);
+ perf_enable();

spin_unlock(&ctx->lock);

@@ -1034,7 +1054,6 @@ int perf_counter_task_enable(void)
static void rotate_ctx(struct perf_counter_context *ctx)
{
struct perf_counter *counter;
- u64 perf_flags;

if (!ctx->nr_counters)
return;
@@ -1043,12 +1062,12 @@ static void rotate_ctx(struct perf_counter_context *ctx)
/*
* Rotate the first entry last (works just fine for group counters too):
*/
- perf_flags = hw_perf_save_disable();
+ perf_disable();
list_for_each_entry(counter, &ctx->counter_list, list_entry) {
list_move_tail(&counter->list_entry, &ctx->counter_list);
break;
}
- hw_perf_restore(perf_flags);
+ perf_enable();

spin_unlock(&ctx->lock);
}
@@ -3194,7 +3213,6 @@ __perf_counter_exit_task(struct task_struct *child,
} else {
struct perf_cpu_context *cpuctx;
unsigned long flags;
- u64 perf_flags;

/*
* Disable and unlink this counter.
@@ -3203,7 +3221,7 @@ __perf_counter_exit_task(struct task_struct *child,
* could still be processing it:
*/
local_irq_save(flags);
- perf_flags = hw_perf_save_disable();
+ perf_disable();

cpuctx = &__get_cpu_var(perf_cpu_context);

@@ -3214,7 +3232,7 @@ __perf_counter_exit_task(struct task_struct *child,

child_ctx->nr_counters--;

- hw_perf_restore(perf_flags);
+ perf_enable();
local_irq_restore(flags);
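The semantics of the new pair — the first disable turns the hardware off, the last enable turns it back on — can be sketched in plain user-space C. This is a model, not kernel code: plain globals stand in for the per-CPU `disable_count`, and `hw_on` stands in for the PMU state toggled by the weak `hw_perf_disable()`/`hw_perf_enable()` hooks:

```c
#include <assert.h>

/* Model of the reference-counted disable/enable interface above.
 * A plain global stands in for the kernel's per-CPU disable_count,
 * and hw_on stands in for the PMU state toggled by the weak
 * hw_perf_disable()/hw_perf_enable() hooks. */
static int disable_count;
static int hw_on = 1;

static void model_perf_disable(void)
{
	if (disable_count++ == 0)	/* first disable: hardware off */
		hw_on = 0;
}

static void model_perf_enable(void)
{
	if (--disable_count == 0)	/* last enable: hardware back on */
		hw_on = 1;
}
```

This is the nesting behaviour Paul Mackerras asks about further down the thread: an inner disable/enable pair leaves the hardware off until the outermost enable runs.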

tip-bot for Ingo Molnar
May 15, 2009, 4:50:17 AM
Commit-ID: 251e8e3c7235f5944805a64f24c79fc4696793f1
Gitweb: http://git.kernel.org/tip/251e8e3c7235f5944805a64f24c79fc4696793f1
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Thu, 14 May 2009 05:16:59 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Fri, 15 May 2009 09:47:07 +0200

perf_counter: Remove ACPI quirk

We had a disable/enable around acpi_idle_do_entry() due to an erratum
in an early prototype CPU I had access to. That erratum has been fixed
in the BIOS so remove the quirk.

The quirk also kept us from profiling interrupts that hit the ACPI idle
instruction - so this is an improvement as well, beyond a cleanup and
a micro-optimization.

[ Impact: improve profiling scope, cleanup, micro-optimization ]

Signed-off-by: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
drivers/acpi/processor_idle.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 9645758..f7ca8c5 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -765,7 +765,6 @@ static inline void acpi_idle_do_entry(struct acpi_processor_cx *cx)
 {
 /* Don't trace irqs off for idle */
 stop_critical_timings();
- perf_disable();
 if (cx->entry_method == ACPI_CSTATE_FFH) {
/* Call into architectural FFH based C-state */
acpi_processor_ffh_cstate_enter(cx);

@@ -780,7 +779,6 @@ static inline void acpi_idle_do_entry(struct acpi_processor_cx *cx)
 gets asserted in time to freeze execution properly. */
 unused = inl(acpi_gbl_FADT.xpm_timer_block.address);
 }
- perf_enable();
 start_critical_timings();

tip-bot for Peter Zijlstra
May 15, 2009, 4:50:19 AM
Commit-ID: 53020fe81eecd0b7be295868ce5850ef8f41074e
Gitweb: http://git.kernel.org/tip/53020fe81eecd0b7be295868ce5850ef8f41074e
Author: Peter Zijlstra <a.p.zi...@chello.nl>
AuthorDate: Wed, 13 May 2009 21:26:19 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Fri, 15 May 2009 09:46:59 +0200

perf_counter: Fix perf_output_copy() WARN to account for overflow

The simple reservation test in perf_output_copy() failed to take
unsigned int overflow into account, fix this.

[ Impact: fix false positive warning with more than 4GB of profiling data ]

Signed-off-by: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
kernel/perf_counter.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
index ff166c1..985be0b 100644
--- a/kernel/perf_counter.c
+++ b/kernel/perf_counter.c
@@ -1927,7 +1927,11 @@ static void perf_output_copy(struct perf_output_handle *handle,

handle->offset = offset;

- WARN_ON_ONCE(handle->offset > handle->head);
+ /*
+ * Check we didn't copy past our reservation window, taking the
+ * possible unsigned int wrap into account.
+ */
+ WARN_ON_ONCE(((int)(handle->head - handle->offset)) < 0);
}
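The cast trick can be seen in isolation: with free-running unsigned positions, the naive `offset > head` comparison misfires once more than 4 GB has been copied, while the signed test on the wrapped difference keeps working. A small sketch, using plain `unsigned int` positions (the real field widths may differ):

```c
#include <assert.h>

/* head and offset are free-running positions that wrap modulo 2^32.
 * After a wrap, head is numerically smaller than offset even though
 * it is logically ahead of it. */

/* old check: direct comparison, false positive after the wrap */
static int naive_overrun(unsigned int head, unsigned int offset)
{
	return offset > head;
}

/* new check: take the difference modulo 2^32 first, then sign-test */
static int wrapsafe_overrun(unsigned int head, unsigned int offset)
{
	return ((int)(head - offset)) < 0;
}
```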

#define perf_output_put(handle, x) \

tip-bot for Peter Zijlstra
May 15, 2009, 4:50:20 AM
Commit-ID: a4016a79fcbd139e7378944c0d86a39fdbc70ecc
Gitweb: http://git.kernel.org/tip/a4016a79fcbd139e7378944c0d86a39fdbc70ecc
Author: Peter Zijlstra <a.p.zi...@chello.nl>
AuthorDate: Thu, 14 May 2009 14:52:17 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Fri, 15 May 2009 09:47:03 +0200

perf_counter: x86: Robustify interrupt handling

Two consecutive NMIs could daze and confuse the machine when the
first would handle the overflow of both counters.

[ Impact: fix false-positive syslog messages under multi-session profiling ]

Signed-off-by: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
arch/x86/kernel/cpu/perf_counter.c | 16 +++++++++++++---
1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_counter.c b/arch/x86/kernel/cpu/perf_counter.c
index 313638c..1dcf670 100644
--- a/arch/x86/kernel/cpu/perf_counter.c
+++ b/arch/x86/kernel/cpu/perf_counter.c
@@ -783,6 +783,10 @@ static int amd_pmu_handle_irq(struct pt_regs *regs, int nmi)

counter = cpuc->counters[idx];
hwc = &counter->hw;
+
+ if (counter->hw_event.nmi != nmi)
+ goto next;
+
val = x86_perf_counter_update(counter, hwc, idx);
if (val & (1ULL << (x86_pmu.counter_bits - 1)))
goto next;
@@ -869,7 +873,6 @@ perf_counter_nmi_handler(struct notifier_block *self,
{
struct die_args *args = __args;
struct pt_regs *regs;
- int ret;

if (!atomic_read(&active_counters))
return NOTIFY_DONE;
@@ -886,9 +889,16 @@ perf_counter_nmi_handler(struct notifier_block *self,
regs = args->regs;

apic_write(APIC_LVTPC, APIC_DM_NMI);
- ret = x86_pmu.handle_irq(regs, 1);
+ /*
+ * Can't rely on the handled return value to say it was our NMI, two
+ * counters could trigger 'simultaneously' raising two back-to-back NMIs.
+ *
+ * If the first NMI handles both, the latter will be empty and daze
+ * the CPU.
+ */
+ x86_pmu.handle_irq(regs, 1);

- return ret ? NOTIFY_STOP : NOTIFY_OK;
+ return NOTIFY_STOP;
 }

static __read_mostly struct notifier_block perf_counter_nmi_notifier = {

tip-bot for Ingo Molnar
May 15, 2009, 4:50:21 AM
Commit-ID: f5a5a2f6e69e88647ae12da39f0ff3a510bcf0a6
Gitweb: http://git.kernel.org/tip/f5a5a2f6e69e88647ae12da39f0ff3a510bcf0a6
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Wed, 13 May 2009 12:54:01 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Fri, 15 May 2009 09:46:56 +0200

perf_counter: x86: Fix throttling

If counters are disabled globally when a perfcounter IRQ/NMI hits,
and if we throttle in that case, we'll promote the '0' value to
the next lapic IRQ and disable all perfcounters at that point,
permanently ...

Fix it.

[ Impact: fix hung perfcounters under load ]

Acked-by: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
arch/x86/kernel/cpu/perf_counter.c | 20 +++++++++++++++-----
1 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_counter.c b/arch/x86/kernel/cpu/perf_counter.c
index 3a92a2b..88ae8ce 100644
--- a/arch/x86/kernel/cpu/perf_counter.c
+++ b/arch/x86/kernel/cpu/perf_counter.c
@@ -765,8 +765,13 @@ out:
 /*
 * Restore - do not reenable when global enable is off or throttled:
 */
- if (++cpuc->interrupts < PERFMON_MAX_INTERRUPTS)
- intel_pmu_restore_all(cpuc->throttle_ctrl);
+ if (cpuc->throttle_ctrl) {
+ if (++cpuc->interrupts < PERFMON_MAX_INTERRUPTS) {
+ intel_pmu_restore_all(cpuc->throttle_ctrl);
+ } else {
+ pr_info("CPU#%d: perfcounters: max interrupt rate exceeded! Throttle on.\n", smp_processor_id());
+ }
+ }

return ret;
}
@@ -817,11 +822,16 @@ void perf_counter_unthrottle(void)

 cpuc = &__get_cpu_var(cpu_hw_counters);
 if (cpuc->interrupts >= PERFMON_MAX_INTERRUPTS) {
- if (printk_ratelimit())
- printk(KERN_WARNING "perfcounters: max interrupts exceeded!\n");
+ pr_info("CPU#%d: perfcounters: throttle off.\n", smp_processor_id());
+
+ /*
+ * Clear them before re-enabling irqs/NMIs again:
+ */
+ cpuc->interrupts = 0;
hw_perf_restore(cpuc->throttle_ctrl);
+ } else {
+ cpuc->interrupts = 0;
}
- cpuc->interrupts = 0;
}

void smp_perf_counter_interrupt(struct pt_regs *regs)
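The fixed control flow can be modeled in a few lines of user-space C. Names and the interrupt limit here are illustrative, not the kernel's exact code: the handler saves the global-enable state, counts interrupts only when counters were actually enabled, and unthrottle restores the saved state only when the throttle really kicked in — which is what prevents a saved '0' from being "restored" and disabling the PMU permanently.

```c
#include <assert.h>

#define MODEL_MAX_INTERRUPTS 4	/* illustrative, not the kernel's value */

struct cpu_hw {
	int enabled;		/* global-enable state */
	int throttle_ctrl;	/* state saved on IRQ entry */
	int interrupts;		/* IRQs seen since last unthrottle */
};

static void model_handle_irq(struct cpu_hw *c)
{
	c->throttle_ctrl = c->enabled;	/* save global-enable state */
	c->enabled = 0;
	if (c->throttle_ctrl) {
		if (++c->interrupts < MODEL_MAX_INTERRUPTS)
			c->enabled = c->throttle_ctrl;	/* restore */
		/* else: rate exceeded, stay disabled (throttled) */
	}
	/* counters globally off: do not count the IRQ, or we would
	 * later "restore" a saved 0 and stay disabled forever */
}

static void model_unthrottle(struct cpu_hw *c)
{
	if (c->interrupts >= MODEL_MAX_INTERRUPTS)
		c->enabled = c->throttle_ctrl;	/* throttle off */
	c->interrupts = 0;
}
```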

tip-bot for Peter Zijlstra
May 15, 2009, 4:50:22 AM
Commit-ID: ec3232bdf8518bea8410f0027f870b24d3aa8753
Gitweb: http://git.kernel.org/tip/ec3232bdf8518bea8410f0027f870b24d3aa8753
Author: Peter Zijlstra <a.p.zi...@chello.nl>
AuthorDate: Wed, 13 May 2009 09:45:19 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Fri, 15 May 2009 09:46:54 +0200

perf_counter: x86: More accurate counter update

Take the counter width into account instead of assuming 32 bits.

In particular Nehalem has 44 bit wide counters, and all
arithmetics should happen on a 44-bit signed integer basis.

[ Impact: fix rare event imprecision, warning message on Nehalem ]

Signed-off-by: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
arch/x86/kernel/cpu/perf_counter.c | 9 ++++++---
1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_counter.c b/arch/x86/kernel/cpu/perf_counter.c
index f7772ff..3a92a2b 100644
--- a/arch/x86/kernel/cpu/perf_counter.c
+++ b/arch/x86/kernel/cpu/perf_counter.c
@@ -138,7 +138,9 @@ static u64
x86_perf_counter_update(struct perf_counter *counter,
struct hw_perf_counter *hwc, int idx)
{
- u64 prev_raw_count, new_raw_count, delta;
+ int shift = 64 - x86_pmu.counter_bits;
+ u64 prev_raw_count, new_raw_count;
+ s64 delta;

/*
* Careful: an NMI might modify the previous counter value.
@@ -161,9 +163,10 @@ again:
* (counter-)time and add that to the generic counter.
*
* Careful, not all hw sign-extends above the physical width
- * of the count, so we do that by clipping the delta to 32 bits:
+ * of the count.
*/
- delta = (u64)(u32)((s32)new_raw_count - (s32)prev_raw_count);
+ delta = (new_raw_count << shift) - (prev_raw_count << shift);
+ delta >>= shift;

atomic64_add(delta, &counter->count);
atomic64_sub(delta, &hwc->period_left);
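The shift pair is worth unpacking: shifting both raw values up to bit 63 and subtracting makes the difference wrap at the counter width, and the arithmetic right shift then sign-extends it back down. A sketch with a hypothetical 44-bit counter (the Nehalem width mentioned above):

```c
#include <assert.h>
#include <stdint.h>

/* Compute new - prev for a counter that is only `bits` wide.
 * Shifting both values to the top of a 64-bit word makes the
 * subtraction wrap at the counter width; the arithmetic right
 * shift sign-extends the result back down. */
static int64_t counter_delta(uint64_t prev_raw, uint64_t new_raw, int bits)
{
	int shift = 64 - bits;
	int64_t delta;

	delta = (int64_t)((new_raw << shift) - (prev_raw << shift));
	delta >>= shift;	/* relies on arithmetic shift of signed ints */
	return delta;
}
```

Note the right shift of a negative signed value is implementation-defined in ISO C; like the kernel, this sketch assumes the universal arithmetic-shift behaviour.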

tip-bot for Peter Zijlstra
May 15, 2009, 4:50:23 AM
Commit-ID: 962bf7a66edca4d36a730a38ff8410a67f560e40
Gitweb: http://git.kernel.org/tip/962bf7a66edca4d36a730a38ff8410a67f560e40
Author: Peter Zijlstra <a.p.zi...@chello.nl>
AuthorDate: Wed, 13 May 2009 13:21:36 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Fri, 15 May 2009 09:47:01 +0200

perf_counter: x86: Fix up the amd NMI/INT throttle

perf_counter_unthrottle() restores throttle_ctrl, but it's never set.
Also, we fail to disable all counters when throttling.

[ Impact: fix rare stuck perf-counters when they are throttled ]

Signed-off-by: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
arch/x86/kernel/cpu/perf_counter.c | 38 ++++++++++++++++++++++++-----------
1 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_counter.c b/arch/x86/kernel/cpu/perf_counter.c
index c19e927..7601c01 100644
--- a/arch/x86/kernel/cpu/perf_counter.c
+++ b/arch/x86/kernel/cpu/perf_counter.c
@@ -334,6 +334,8 @@ static u64 amd_pmu_save_disable_all(void)
 * right thing.
 */
 barrier();
+ if (!enabled)
+ goto out;

for (idx = 0; idx < x86_pmu.num_counters; idx++) {
u64 val;

@@ -347,6 +349,7 @@ static u64 amd_pmu_save_disable_all(void)
wrmsrl(MSR_K7_EVNTSEL0 + idx, val);
}

+out:
return enabled;
}

@@ -787,32 +790,43 @@ static int amd_pmu_handle_irq(struct pt_regs *regs, int nmi)
 int handled = 0;
 struct perf_counter *counter;
 struct hw_perf_counter *hwc;
- int idx;
+ int idx, throttle = 0;
+
+ cpuc->throttle_ctrl = cpuc->enabled;
+ cpuc->enabled = 0;
+ barrier();
+
+ if (cpuc->throttle_ctrl) {
+ if (++cpuc->interrupts >= PERFMON_MAX_INTERRUPTS)
+ throttle = 1;
+ }

- ++cpuc->interrupts;

 for (idx = 0; idx < x86_pmu.num_counters; idx++) {
+ int disable = 0;
+
 if (!test_bit(idx, cpuc->active_mask))
 continue;
+
 counter = cpuc->counters[idx];
hwc = &counter->hw;

val = x86_perf_counter_update(counter, hwc, idx);
if (val & (1ULL << (x86_pmu.counter_bits - 1)))

- continue;
+ goto next;
+
/* counter overflow */
x86_perf_counter_set_period(counter, hwc, idx);
handled = 1;
inc_irq_stat(apic_perf_irqs);
- if (perf_counter_overflow(counter, nmi, regs, 0))
- amd_pmu_disable_counter(hwc, idx);
- else if (cpuc->interrupts >= PERFMON_MAX_INTERRUPTS)
- /*
- * do not reenable when throttled, but reload
- * the register
- */
+ disable = perf_counter_overflow(counter, nmi, regs, 0);
+
+next:
+ if (disable || throttle)
amd_pmu_disable_counter(hwc, idx);
- else if (counter->state == PERF_COUNTER_STATE_ACTIVE)
- amd_pmu_enable_counter(hwc, idx);
}
+
+ if (cpuc->throttle_ctrl && !throttle)
+ cpuc->enabled = 1;
+
 return handled;

tip-bot for Ingo Molnar
May 15, 2009, 6:20:17 AM
Commit-ID: 58d7e993b16b62d30f8ef27757614056fe4def11
Gitweb: http://git.kernel.org/tip/58d7e993b16b62d30f8ef27757614056fe4def11
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Fri, 15 May 2009 11:03:23 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Fri, 15 May 2009 12:09:54 +0200

perf stat: handle Ctrl-C

Before this change, if a long-running perf stat workload was Ctrl-C-ed,
the utility exited without displaying statistics.

After the change, the Ctrl-C gets propagated into the workload (and
causes its early exit there), but perf stat itself will still continue
to run and will display counter results.

This is useful to run open-ended workloads, let them run for
a while, then Ctrl-C them to get the stats.

[ Impact: extend perf stat with new functionality ]

Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/builtin-stat.c | 16 ++++++++++++++++
1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/Documentation/perf_counter/builtin-stat.c b/Documentation/perf_counter/builtin-stat.c
index cf575c3..03518d7 100644
--- a/Documentation/perf_counter/builtin-stat.c
+++ b/Documentation/perf_counter/builtin-stat.c
@@ -538,8 +538,14 @@ static void process_options(int argc, char **argv)
}
}

+static void skip_signal(int signo)
+{
+}
+
int cmd_stat(int argc, char **argv, const char *prefix)
{
+ sigset_t blocked;
+
page_size = sysconf(_SC_PAGE_SIZE);

process_options(argc, argv);
@@ -548,5 +554,15 @@ int cmd_stat(int argc, char **argv, const char *prefix)
assert(nr_cpus <= MAX_NR_CPUS);
assert(nr_cpus >= 0);

+ /*
+ * We dont want to block the signals - that would cause
+ * child tasks to inherit that and Ctrl-C would not work.
+ * What we want is for Ctrl-C to work in the exec()-ed
+ * task, but being ignored by perf stat itself:
+ */
+ signal(SIGINT, skip_signal);
+ signal(SIGALRM, skip_signal);
+ signal(SIGABRT, skip_signal);
+
return do_perfstat(argc, argv);
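The trick relies on POSIX exec semantics: caught signals are reset to their default disposition across exec(), so the do-nothing handler protects only perf stat itself while the exec()-ed child still dies on Ctrl-C. A minimal sketch of the parent-side behaviour (function names here are illustrative):

```c
#include <assert.h>
#include <signal.h>

/* A do-nothing handler: SIGINT interrupts blocking calls in the
 * parent but no longer terminates it. Handlers do not survive
 * exec(), so a child started afterwards keeps the default
 * (terminating) disposition. */
static volatile sig_atomic_t got_signal;

static void skip_signal(int signo)
{
	got_signal = signo;	/* remember it; otherwise do nothing */
}

static int install_and_poke(void)
{
	signal(SIGINT, skip_signal);
	raise(SIGINT);		/* with SIG_DFL this would terminate us */
	return got_signal;	/* we survived: the handler ran instead */
}
```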

Paul Mackerras
May 15, 2009, 7:10:16 AM
tip-bot for Peter Zijlstra writes:

> x86 NMI/INT throttling has non-nested use of this, breaking things. Therefore
> provide a reference counter disable/enable interface, where the first disable
> disables the hardware, and the last enable enables the hardware again.

It looks to me like what you've done for powerpc enables the hardware
again on the first enable, not the last one:

> @@ -436,7 +435,7 @@ u64 hw_perf_save_disable(void)
> * If we were previously disabled and counters were added, then
> * put the new config on the PMU.
> */
> -void hw_perf_restore(u64 disable)
> +void hw_perf_enable(void)
> {
> struct perf_counter *counter;
> struct cpu_hw_counters *cpuhw;
> @@ -448,9 +447,12 @@ void hw_perf_restore(u64 disable)
> int n_lim;
> int idx;
>
> - if (disable)
> - return;
> local_irq_save(flags);
> + if (!cpuhw->disabled) {
> + local_irq_restore(flags);
> + return;
> + }
> +
> cpuhw = &__get_cpu_var(cpu_hw_counters);
> cpuhw->disabled = 0;

I do rely on nesting the disable/enable calls and only having the
hardware re-enabled on the last enable. I can't see anything here
that detects the last enable. Have I missed it somewhere?

Paul.

Peter Zijlstra
May 15, 2009, 7:30:12 AM

+void perf_disable(void)
+{
+ __perf_disable();
+ hw_perf_disable();
+}
+
+void perf_enable(void)
+{
+ if (__perf_enable())
+ hw_perf_enable();
+}

--

tip-bot for Ingo Molnar
May 18, 2009, 3:50:14 AM
Commit-ID: b68f1d2e7aa21029d73c7d453a8046e95d351740
Gitweb: http://git.kernel.org/tip/b68f1d2e7aa21029d73c7d453a8046e95d351740
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Sun, 17 May 2009 19:37:25 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Mon, 18 May 2009 09:37:09 +0200

perf_counter, x86: speed up the scheduling fast-path

We have to set up the LVT entry only at counter init time, not at
every switch-in time.

There's friction between NMI and non-NMI use here - we'll probably
remove the per-counter configurability of it - but until then, don't
slow down things ...

[ Impact: micro-optimization ]

Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Srivatsa Vaddagiri <va...@in.ibm.com>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
arch/x86/kernel/cpu/perf_counter.c | 5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_counter.c b/arch/x86/kernel/cpu/perf_counter.c
index 5bfd30a..c109819 100644
--- a/arch/x86/kernel/cpu/perf_counter.c
+++ b/arch/x86/kernel/cpu/perf_counter.c
@@ -285,6 +285,7 @@ static int __hw_perf_counter_init(struct perf_counter *counter)
 return -EACCES;
 hwc->nmi = 1;
 }
+ perf_counters_lapic_init(hwc->nmi);

if (!hwc->irq_period)
hwc->irq_period = x86_pmu.max_period;
@@ -603,8 +604,6 @@ try_generic:
hwc->counter_base = x86_pmu.perfctr;
}

- perf_counters_lapic_init(hwc->nmi);
-
x86_pmu.disable(hwc, idx);

cpuc->counters[idx] = counter;
@@ -1054,7 +1053,7 @@ void __init init_hw_perf_counters(void)

pr_info("... counter mask: %016Lx\n", perf_counter_mask);

- perf_counters_lapic_init(0);
+ perf_counters_lapic_init(1);
register_die_notifier(&perf_counter_nmi_notifier);

tip-bot for Ingo Molnar
May 20, 2009, 2:20:07 PM
Commit-ID: 34adc8062227f41b04ade0ff3fbd1dbe3002669e
Gitweb: http://git.kernel.org/tip/34adc8062227f41b04ade0ff3fbd1dbe3002669e
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Wed, 20 May 2009 20:13:28 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Wed, 20 May 2009 20:12:54 +0200

perf_counter: Fix context removal deadlock

Disable the PMU globally before removing a counter from a
context. This fixes the following lockup:

[22081.741922] ------------[ cut here ]------------
[22081.746668] WARNING: at arch/x86/kernel/cpu/perf_counter.c:803 intel_pmu_handle_irq+0x9b/0x24e()
[22081.755624] Hardware name: X8DTN
[22081.758903] perfcounters: irq loop stuck!
[22081.762985] Modules linked in:
[22081.766136] Pid: 11082, comm: perf Not tainted 2.6.30-rc6-tip #226
[22081.772432] Call Trace:
[22081.774940] <NMI> [<ffffffff81019aed>] ? intel_pmu_handle_irq+0x9b/0x24e
[22081.781993] [<ffffffff81019aed>] ? intel_pmu_handle_irq+0x9b/0x24e
[22081.788368] [<ffffffff8104505c>] ? warn_slowpath_common+0x77/0xa3
[22081.794649] [<ffffffff810450d3>] ? warn_slowpath_fmt+0x40/0x45
[22081.800696] [<ffffffff81019aed>] ? intel_pmu_handle_irq+0x9b/0x24e
[22081.807080] [<ffffffff814d1a72>] ? perf_counter_nmi_handler+0x3f/0x4a
[22081.813751] [<ffffffff814d2d09>] ? notifier_call_chain+0x58/0x86
[22081.819951] [<ffffffff8105b250>] ? notify_die+0x2d/0x32
[22081.825392] [<ffffffff814d1414>] ? do_nmi+0x8e/0x242
[22081.830538] [<ffffffff814d0f0a>] ? nmi+0x1a/0x20
[22081.835342] [<ffffffff8117e102>] ? selinux_file_free_security+0x0/0x1a
[22081.842105] [<ffffffff81018793>] ? x86_pmu_disable_counter+0x15/0x41
[22081.848673] <<EOE>> [<ffffffff81018f3d>] ? x86_pmu_disable+0x86/0x103
[22081.855512] [<ffffffff8108fedd>] ? __perf_counter_remove_from_context+0x0/0xfe
[22081.862926] [<ffffffff8108fcbc>] ? counter_sched_out+0x30/0xce
[22081.868909] [<ffffffff8108ff36>] ? __perf_counter_remove_from_context+0x59/0xfe
[22081.876382] [<ffffffff8106808a>] ? smp_call_function_single+0x6c/0xe6
[22081.882955] [<ffffffff81091b96>] ? perf_release+0x86/0x14c
[22081.888600] [<ffffffff810c4c84>] ? __fput+0xe7/0x195
[22081.893718] [<ffffffff810c213e>] ? filp_close+0x5b/0x62
[22081.899107] [<ffffffff81046a70>] ? put_files_struct+0x64/0xc2
[22081.905031] [<ffffffff8104841a>] ? do_exit+0x1e2/0x6ef
[22081.910360] [<ffffffff814d0a60>] ? _spin_lock_irqsave+0x9/0xe
[22081.916292] [<ffffffff8104898e>] ? do_group_exit+0x67/0x93
[22081.921953] [<ffffffff810489cc>] ? sys_exit_group+0x12/0x16
[22081.927759] [<ffffffff8100baab>] ? system_call_fastpath+0x16/0x1b
[22081.934076] ---[ end trace 3a3936ce3e1b4505 ]---

And could potentially also fix the lockup reported by Marcelo Tosatti.

Also, print more debug info in case of a detected lockup.

[ Impact: fix lockup ]

Reported-by: Marcelo Tosatti <mtos...@redhat.com>
Acked-by: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
arch/x86/kernel/cpu/perf_counter.c | 1 +
kernel/perf_counter.c | 12 ++++++------
2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_counter.c b/arch/x86/kernel/cpu/perf_counter.c
index c109819..6cc1660 100644
--- a/arch/x86/kernel/cpu/perf_counter.c
+++ b/arch/x86/kernel/cpu/perf_counter.c
@@ -740,6 +740,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs, int nmi)
again:
 if (++loops > 100) {
 WARN_ONCE(1, "perfcounters: irq loop stuck!\n");
+ perf_counter_print_debug();
 return 1;
 }

diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
index 69d4de8..08584c1 100644
--- a/kernel/perf_counter.c
+++ b/kernel/perf_counter.c
@@ -208,18 +208,17 @@ static void __perf_counter_remove_from_context(void *info)
return;

spin_lock_irqsave(&ctx->lock, flags);
+ /*
+ * Protect the list operation against NMI by disabling the
+ * counters on a global level.
+ */
+ perf_disable();

counter_sched_out(counter, cpuctx, ctx);

counter->task = NULL;

- /*
- * Protect the list operation against NMI by disabling the
- * counters on a global level. NOP for non NMI based counters.
- */
- perf_disable();
list_del_counter(counter, ctx);
- perf_enable();

if (!ctx->task) {
/*
@@ -231,6 +230,7 @@ static void __perf_counter_remove_from_context(void *info)
 perf_max_counters - perf_reserved_percpu);
 }

+ perf_enable();
spin_unlock_irqrestore(&ctx->lock, flags);
}

tip-bot for Ingo Molnar
May 22, 2009, 12:30:20 PM
Commit-ID: c6eb13847ba081552d2af644219bddeff7110caf
Gitweb: http://git.kernel.org/tip/c6eb13847ba081552d2af644219bddeff7110caf
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Fri, 22 May 2009 18:18:28 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Fri, 22 May 2009 18:18:28 +0200

perf_counter tools: increase limits

I tried to run with 300 active counters and the tools bailed out
because our limit was at 64. So increase the counter limit to 1024
and the CPU limit to 4096.

Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>
---
Documentation/perf_counter/perf.h | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/perf_counter/perf.h b/Documentation/perf_counter/perf.h
index 6fa3656..81a7374 100644
--- a/Documentation/perf_counter/perf.h
+++ b/Documentation/perf_counter/perf.h
@@ -54,8 +54,8 @@ sys_perf_counter_open(struct perf_counter_hw_event *hw_event_uptr,
group_fd, flags);
}

-#define MAX_COUNTERS 64
-#define MAX_NR_CPUS 256
+#define MAX_COUNTERS 1024
+#define MAX_NR_CPUS 4096

#define EID(type, id) (((__u64)(type) << PERF_COUNTER_TYPE_SHIFT) | (id))

tip-bot for Mike Galbraith

May 24, 2009, 3:10:13 AM
Commit-ID: c2990a2a582d73562d4dcf2502c39892a19a691d
Gitweb: http://git.kernel.org/tip/c2990a2a582d73562d4dcf2502c39892a19a691d
Author: Mike Galbraith <efa...@gmx.de>
AuthorDate: Sun, 24 May 2009 08:35:49 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Sun, 24 May 2009 08:57:08 +0200

perf top: fix segfault

Commit c6eb13 increased stack usage such that perf top now croaks on startup.

Take event_array and mmap_array off the stack to prevent segfault on boxen
with smallish ulimit -s setting.

Signed-off-by: Mike Galbraith <efa...@gmx.de>
Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>
---
Documentation/perf_counter/builtin-top.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/perf_counter/builtin-top.c b/Documentation/perf_counter/builtin-top.c
index a3216a6..74021ac 100644
--- a/Documentation/perf_counter/builtin-top.c
+++ b/Documentation/perf_counter/builtin-top.c
@@ -1035,10 +1035,11 @@ static void mmap_read(struct mmap_data *md)
md->prev = old;
}

+static struct pollfd event_array[MAX_NR_CPUS * MAX_COUNTERS];
+static struct mmap_data mmap_array[MAX_NR_CPUS][MAX_COUNTERS];
+
int cmd_top(int argc, char **argv, const char *prefix)
{
- struct pollfd event_array[MAX_NR_CPUS * MAX_COUNTERS];
- struct mmap_data mmap_array[MAX_NR_CPUS][MAX_COUNTERS];
struct perf_counter_hw_event hw_event;
pthread_t thread;
int i, counter, group_fd, nr_poll = 0;

tip-bot for Ingo Molnar

May 24, 2009, 11:50:06 PM
Commit-ID: a3862d3f814ce7dfca9eed56ac23d29db3aee8d5
Gitweb: http://git.kernel.org/tip/a3862d3f814ce7dfca9eed56ac23d29db3aee8d5
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Sun, 24 May 2009 09:02:37 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Sun, 24 May 2009 09:02:37 +0200

perf_counter: Increase mmap limit

In a default 'perf top' run the tool will create a counter for
each online CPU. With enough CPUs this will eventually exhaust
the default limit.

So scale it up with the number of online CPUs.

Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: John Kacur <jka...@redhat.com>
Cc: Mike Galbraith <efa...@gmx.de>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>
---
kernel/perf_counter.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
index cb40625..6cdf824 100644
--- a/kernel/perf_counter.c
+++ b/kernel/perf_counter.c
@@ -1704,6 +1704,12 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)

user_extra = nr_pages + 1;
user_lock_limit = sysctl_perf_counter_mlock >> (PAGE_SHIFT - 10);
+
+ /*
+ * Increase the limit linearly with more CPUs:
+ */
+ user_lock_limit *= num_online_cpus();
+
user_locked = atomic_long_read(&user->locked_vm) + user_extra;

extra = 0;

tip-bot for Ingo Molnar

May 25, 2009, 4:10:16 AM
Commit-ID: 85a9f9200226ddffc2ea50dae6a8df04c033ecd4
Gitweb: http://git.kernel.org/tip/85a9f9200226ddffc2ea50dae6a8df04c033ecd4
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Mon, 25 May 2009 09:59:50 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Mon, 25 May 2009 09:59:50 +0200

perf_counter tools: increase limits, fix

NR_CPUS and NR_COUNTERS scale the arrays up quadratically ... 1024x4096 was a
far too ambitious upper limit - go for 256x256, which is still plenty.

[ Impact: reduce perf tool memory consumption ]

Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: John Kacur <jka...@redhat.com>
Cc: Mike Galbraith <efa...@gmx.de>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>
---
Documentation/perf_counter/perf.h | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/perf_counter/perf.h b/Documentation/perf_counter/perf.h
index a517683..5a2520b 100644
--- a/Documentation/perf_counter/perf.h
+++ b/Documentation/perf_counter/perf.h
@@ -61,8 +61,8 @@ sys_perf_counter_open(struct perf_counter_hw_event *hw_event_uptr,
group_fd, flags);
}

-#define MAX_COUNTERS 1024
-#define MAX_NR_CPUS 4096
+#define MAX_COUNTERS 256
+#define MAX_NR_CPUS 256

#define EID(type, id) (((__u64)(type) << PERF_COUNTER_TYPE_SHIFT) | (id))

tip-bot for Ingo Molnar

May 25, 2009, 7:10:18 AM
Commit-ID: e4cbb4e3ac8b09fdb11e39e5a5611bfab0a7cd1a
Gitweb: http://git.kernel.org/tip/e4cbb4e3ac8b09fdb11e39e5a5611bfab0a7cd1a
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Tue, 19 May 2009 15:50:30 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Mon, 25 May 2009 13:05:06 +0200

perf_counter: Move child perfcounter init to after scheduler init

Initialize a task's perfcounters (inherit from parent, etc.) after
the child task's scheduler fields have been initialized.

[ Impact: cleanup ]

Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: John Kacur <jka...@redhat.com>
Cc: Mike Galbraith <efa...@gmx.de>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>
---
kernel/fork.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index e72a09f..675e01e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -984,7 +984,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
goto fork_out;

rt_mutex_init_task(p);
- perf_counter_init_task(p);

#ifdef CONFIG_PROVE_LOCKING
DEBUG_LOCKS_WARN_ON(!p->hardirqs_enabled);
@@ -1096,6 +1095,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,

/* Perform scheduler related setup. Assign this task to a CPU. */
sched_fork(p, clone_flags);
+ perf_counter_init_task(p);

if ((retval = audit_alloc(p)))
goto bad_fork_cleanup_policy;

tip-bot for Mike Galbraith

May 25, 2009, 7:10:23 AM
Commit-ID: d94b943054721c346b0881865d645f000cd19880
Gitweb: http://git.kernel.org/tip/d94b943054721c346b0881865d645f000cd19880
Author: Mike Galbraith <efa...@gmx.de>
AuthorDate: Mon, 25 May 2009 09:57:56 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Mon, 25 May 2009 13:01:17 +0200

perf top: Reduce display overhead

Iterate over the symbol table once per display interval, and
copy/sort/tally/decay only those symbols which are active.

Before:

top - 10:14:53 up 4:08, 17 users, load average: 1.17, 1.53, 1.49
Tasks: 273 total, 5 running, 268 sleeping, 0 stopped, 0 zombie
Cpu(s): 6.9%us, 38.2%sy, 0.0%ni, 19.9%id, 0.0%wa, 0.0%hi, 35.0%si, 0.0%st

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND
28504 root 20 0 1044 260 164 S 58 0.0 0:04.19 2 netserver
28499 root 20 0 1040 412 316 R 51 0.0 0:04.15 0 netperf
28500 root 20 0 1040 408 316 R 50 0.0 0:04.14 1 netperf
28503 root 20 0 1044 260 164 S 50 0.0 0:04.01 1 netserver
28501 root 20 0 1044 260 164 S 49 0.0 0:03.99 0 netserver
28502 root 20 0 1040 412 316 S 43 0.0 0:03.96 2 netperf
28468 root 20 0 1892m 325m 972 S 16 10.8 0:10.50 3 perf
28467 root 20 0 1892m 325m 972 R 2 10.8 0:00.72 3 perf

After:

top - 10:16:30 up 4:10, 17 users, load average: 2.27, 1.88, 1.62
Tasks: 273 total, 6 running, 267 sleeping, 0 stopped, 0 zombie
Cpu(s): 2.5%us, 39.7%sy, 0.0%ni, 24.6%id, 0.0%wa, 0.0%hi, 33.3%si, 0.0%st

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND
28590 root 20 0 1040 412 316 S 54 0.0 0:07.85 2 netperf
28589 root 20 0 1044 260 164 R 54 0.0 0:07.84 0 netserver
28588 root 20 0 1040 412 316 R 50 0.0 0:07.89 1 netperf
28591 root 20 0 1044 256 164 S 50 0.0 0:07.82 1 netserver
28587 root 20 0 1040 408 316 R 47 0.0 0:07.61 0 netperf
28592 root 20 0 1044 260 164 R 47 0.0 0:07.85 2 netserver
28378 root 20 0 8732 1300 860 R 2 0.0 0:01.81 3 top
28577 root 20 0 1892m 165m 972 R 2 5.5 0:00.48 3 perf
28578 root 20 0 1892m 165m 972 S 2 5.5 0:00.04 3 perf

[ Impact: optimization ]

Signed-off-by: Mike Galbraith <efa...@gmx.de>
Acked-by: Peter Zijlstra <a.p.zi...@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>
---
Documentation/perf_counter/builtin-top.c | 56 +++++++++++++++--------------
1 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/Documentation/perf_counter/builtin-top.c b/Documentation/perf_counter/builtin-top.c
index 74021ac..4bed265 100644
--- a/Documentation/perf_counter/builtin-top.c
+++ b/Documentation/perf_counter/builtin-top.c
@@ -374,18 +374,26 @@ static struct sym_entry tmp[MAX_SYMS];

static void print_sym_table(void)
{
- int i, printed;
+ int i, j, active_count, printed;
int counter;
float events_per_sec = events/delay_secs;
float kevents_per_sec = (events-userspace_events)/delay_secs;
float sum_kevents = 0.0;

events = userspace_events = 0;
- memcpy(tmp, sym_table, sizeof(sym_table[0])*sym_table_count);
- qsort(tmp, sym_table_count, sizeof(tmp[0]), compare);

- for (i = 0; i < sym_table_count && tmp[i].count[0]; i++)
- sum_kevents += tmp[i].count[0];
+ /* Iterate over symbol table and copy/tally/decay active symbols. */
+ for (i = 0, active_count = 0; i < sym_table_count; i++) {
+ if (sym_table[i].count[0]) {
+ tmp[active_count++] = sym_table[i];
+ sum_kevents += sym_table[i].count[0];
+
+ for (j = 0; j < nr_counters; j++)
+ sym_table[i].count[j] = zero ? 0 : sym_table[i].count[j] * 7 / 8;
+ }
+ }
+
+ qsort(tmp, active_count + 1, sizeof(tmp[0]), compare);

write(1, CONSOLE_CLEAR, strlen(CONSOLE_CLEAR));

@@ -433,29 +441,23 @@ static void print_sym_table(void)
" ______ ______ _____ ________________ _______________\n\n"
);

- for (i = 0, printed = 0; i < sym_table_count; i++) {
+ for (i = 0, printed = 0; i < active_count; i++) {
float pcnt;
- int count;

- if (printed <= 18 && tmp[i].count[0] >= count_filter) {
- pcnt = 100.0 - (100.0*((sum_kevents-tmp[i].count[0])/sum_kevents));
-
- if (nr_counters == 1)
- printf("%19.2f - %4.1f%% - %016llx : %s\n",
- sym_weight(tmp + i),
- pcnt, tmp[i].addr, tmp[i].sym);
- else
- printf("%8.1f %10ld - %4.1f%% - %016llx : %s\n",
- sym_weight(tmp + i),
- tmp[i].count[0],
- pcnt, tmp[i].addr, tmp[i].sym);
- printed++;
- }
- /*
- * Add decay to the counts:
- */
- for (count = 0; count < nr_counters; count++)
- sym_table[i].count[count] = zero ? 0 : sym_table[i].count[count] * 7 / 8;
+ if (++printed > 18 || tmp[i].count[0] < count_filter)
+ break;
+
+ pcnt = 100.0 - (100.0*((sum_kevents-tmp[i].count[0])/sum_kevents));
+
+ if (nr_counters == 1)
+ printf("%19.2f - %4.1f%% - %016llx : %s\n",
+ sym_weight(tmp + i),
+ pcnt, tmp[i].addr, tmp[i].sym);
+ else
+ printf("%8.1f %10ld - %4.1f%% - %016llx : %s\n",
+ sym_weight(tmp + i),
+ tmp[i].count[0],
+ pcnt, tmp[i].addr, tmp[i].sym);
}

if (sym_filter_entry)

tip-bot for Ingo Molnar

May 25, 2009, 8:50:14 AM
Commit-ID: d3f4b3855ba87caff8f35e738c7e7e3bad0a6ab1
Gitweb: http://git.kernel.org/tip/d3f4b3855ba87caff8f35e738c7e7e3bad0a6ab1
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Mon, 25 May 2009 14:40:01 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Mon, 25 May 2009 14:40:01 +0200

perf stat: flip around ':k' and ':u' flags

This output:

$ perf stat -e 0:1:k -e 0:1:u ./hello
Performance counter stats for './hello':
140131 instructions (events)
1906968 instructions (events)

This is quite confusing - as-is, ':k' means "user instructions" and ':u'
means "kernel instructions".

Flip them around - as the 'exclude' property is not intuitive in
the flag naming.

Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: John Kacur <jka...@redhat.com>
Cc: Mike Galbraith <efa...@gmx.de>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>
---
Documentation/perf_counter/builtin-stat.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/perf_counter/builtin-stat.c b/Documentation/perf_counter/builtin-stat.c
index 8ae01d5..88c70be 100644
--- a/Documentation/perf_counter/builtin-stat.c
+++ b/Documentation/perf_counter/builtin-stat.c
@@ -266,9 +266,9 @@ static __u64 match_event_symbols(char *str)

switch (sscanf(str, "%d:%llu:%2s", &type, &id, mask_str)) {
case 3:
- if (strchr(mask_str, 'u'))
- event_mask[nr_counters] |= EVENT_MASK_USER;
if (strchr(mask_str, 'k'))
+ event_mask[nr_counters] |= EVENT_MASK_USER;
+ if (strchr(mask_str, 'u'))
event_mask[nr_counters] |= EVENT_MASK_KERNEL;
case 2:
return EID(type, id);

tip-bot for Ingo Molnar

May 26, 2009, 4:00:09 AM
Commit-ID: 79202ba9ff8cf570a75596f42e011167734d1c4b
Gitweb: http://git.kernel.org/tip/79202ba9ff8cf570a75596f42e011167734d1c4b
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Tue, 26 May 2009 08:10:00 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Tue, 26 May 2009 09:49:28 +0200

perf_counter, x86: Fix APIC NMI programming

My Nehalem box locks up in certain situations (with an
always-asserted NMI causing a lockup) if the PMU LVT
entry is programmed between NMI and IRQ mode with a
high frequency.

Standardize exclusively on NMIs instead.

[ Impact: fix lockup ]

Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>
---
arch/x86/kernel/cpu/perf_counter.c | 16 +++-------------
1 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_counter.c b/arch/x86/kernel/cpu/perf_counter.c
index 189bf9d..ece3813 100644
--- a/arch/x86/kernel/cpu/perf_counter.c
+++ b/arch/x86/kernel/cpu/perf_counter.c
@@ -285,14 +285,10 @@ static int __hw_perf_counter_init(struct perf_counter *counter)
hwc->config |= ARCH_PERFMON_EVENTSEL_OS;

/*
- * If privileged enough, allow NMI events:
+ * Use NMI events all the time:
*/
- hwc->nmi = 0;
- if (hw_event->nmi) {
- if (sysctl_perf_counter_priv && !capable(CAP_SYS_ADMIN))
- return -EACCES;
- hwc->nmi = 1;
- }
+ hwc->nmi = 1;
+ hw_event->nmi = 1;

if (!hwc->irq_period)
hwc->irq_period = x86_pmu.max_period;

@@ -553,9 +549,6 @@ fixed_mode_idx(struct perf_counter *counter, struct hw_perf_counter *hwc)
if (!x86_pmu.num_counters_fixed)
return -1;

- if (unlikely(hwc->nmi))
- return -1;
-
event = hwc->config & ARCH_PERFMON_EVENT_MASK;

if (unlikely(event == x86_pmu.event_map(PERF_COUNT_INSTRUCTIONS)))
@@ -806,9 +799,6 @@ static int amd_pmu_handle_irq(struct pt_regs *regs, int nmi)
counter = cpuc->counters[idx];
hwc = &counter->hw;

- if (counter->hw_event.nmi != nmi)
- continue;

val = x86_perf_counter_update(counter, hwc, idx);
if (val & (1ULL << (x86_pmu.counter_bits - 1)))
continue;

tip-bot for Ingo Molnar

May 26, 2009, 4:00:13 AM
Commit-ID: 329d876d6fd326109f191ae0fb2798b8834fb70b
Gitweb: http://git.kernel.org/tip/329d876d6fd326109f191ae0fb2798b8834fb70b

Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Tue, 26 May 2009 08:10:00 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Tue, 26 May 2009 09:54:13 +0200

perf_counter: Initialize ->oncpu properly

This shouldn't matter normally (and I have not seen any
misbehavior), because active counters always have a
proper ->oncpu value - but nevertheless initialize the
field properly to -1.

[ Impact: cleanup ]

Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>
---
kernel/perf_counter.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
index 070f92d..367299f 100644
--- a/kernel/perf_counter.c
+++ b/kernel/perf_counter.c
@@ -3122,6 +3122,8 @@ perf_counter_alloc(struct perf_counter_hw_event *hw_event,
counter->group_leader = group_leader;
counter->pmu = NULL;
counter->ctx = ctx;
+ counter->oncpu = -1;
+
get_ctx(ctx);

counter->state = PERF_COUNTER_STATE_INACTIVE;

tip-bot for Ingo Molnar

May 26, 2009, 4:00:12 AM
Commit-ID: aaba98018b8295dfa2119345d17f833d74448cd0
Gitweb: http://git.kernel.org/tip/aaba98018b8295dfa2119345d17f833d74448cd0

Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Tue, 26 May 2009 08:10:00 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Tue, 26 May 2009 09:52:03 +0200

perf_counter, x86: Make NMI lockups more robust

We have a debug check that detects stuck NMIs and returns with
the PMU disabled in the global ctrl MSR - but i managed to trigger
a situation where this was not enough to deassert the NMI.

So clear/reset the full PMU and keep the disable count balanced when
exiting from here. This way the box produces a debug warning but
stays up and is more debuggable.

[ Impact: in case of PMU related bugs, recover more gracefully ]

Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>
---
arch/x86/kernel/cpu/perf_counter.c | 26 ++++++++++++++++++++++++++
1 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_counter.c b/arch/x86/kernel/cpu/perf_counter.c
index ece3813..2eeaa99 100644
--- a/arch/x86/kernel/cpu/perf_counter.c
+++ b/arch/x86/kernel/cpu/perf_counter.c
@@ -724,6 +724,30 @@ static void intel_pmu_save_and_restart(struct perf_counter *counter)
intel_pmu_enable_counter(hwc, idx);
}

+static void intel_pmu_reset(void)
+{
+ unsigned long flags;
+ int idx;
+
+ if (!x86_pmu.num_counters)
+ return;
+
+ local_irq_save(flags);
+
+ printk("clearing PMU state on CPU#%d\n", smp_processor_id());
+
+ for (idx = 0; idx < x86_pmu.num_counters; idx++) {
+ checking_wrmsrl(x86_pmu.eventsel + idx, 0ull);
+ checking_wrmsrl(x86_pmu.perfctr + idx, 0ull);
+ }
+ for (idx = 0; idx < x86_pmu.num_counters_fixed; idx++) {
+ checking_wrmsrl(MSR_ARCH_PERFMON_FIXED_CTR0 + idx, 0ull);
+ }
+
+ local_irq_restore(flags);
+}
+
+
/*
* This handler is triggered by the local APIC, so the APIC IRQ handling
* rules apply:
@@ -750,6 +774,8 @@ again:
if (++loops > 100) {
WARN_ONCE(1, "perfcounters: irq loop stuck!\n");
perf_counter_print_debug();
+ intel_pmu_reset();
+ perf_enable();
return 1;

tip-bot for Ingo Molnar

May 26, 2009, 6:40:12 AM
Commit-ID: 4e97ddf09ee3ce715fc334399bae4cc0c0a13057
Gitweb: http://git.kernel.org/tip/4e97ddf09ee3ce715fc334399bae4cc0c0a13057
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Tue, 26 May 2009 10:07:44 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Tue, 26 May 2009 10:08:19 +0200

perf stat: Remove unused variable

[ Impact: cleanup ]

Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>
---
Documentation/perf_counter/builtin-stat.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/Documentation/perf_counter/builtin-stat.c b/Documentation/perf_counter/builtin-stat.c
index 88c70be..c1053d8 100644
--- a/Documentation/perf_counter/builtin-stat.c
+++ b/Documentation/perf_counter/builtin-stat.c
@@ -541,8 +541,6 @@ static void skip_signal(int signo)

int cmd_stat(int argc, char **argv, const char *prefix)
{
- sigset_t blocked;
-
page_size = sysconf(_SC_PAGE_SIZE);

process_options(argc, argv);

tip-bot for Ingo Molnar

May 26, 2009, 6:40:11 AM
Commit-ID: 0e9b20b8a1cab6c6ab4f98f917a2d98783103969
Gitweb: http://git.kernel.org/tip/0e9b20b8a1cab6c6ab4f98f917a2d98783103969
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Tue, 26 May 2009 09:17:18 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Tue, 26 May 2009 11:26:32 +0200

perf record: Convert to Git option parsing

Remove getopt usage and use Git's much more advanced and more compact
command option library.

Git's library (util/parse-options.[ch]) constructs help texts and
error messages automatically, and has a number of other convenience
features as well.

Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>
---
Documentation/perf_counter/builtin-record.c | 372 +++++++++++++--------------
Documentation/perf_counter/builtin-top.c | 3 +
2 files changed, 177 insertions(+), 198 deletions(-)

diff --git a/Documentation/perf_counter/builtin-record.c b/Documentation/perf_counter/builtin-record.c
index f225efa..f12a782 100644
--- a/Documentation/perf_counter/builtin-record.c
+++ b/Documentation/perf_counter/builtin-record.c
@@ -2,6 +2,8 @@

#include "perf.h"
#include "util/util.h"
+#include "util/parse-options.h"
+#include "util/exec_cmd.h"

#include <sys/types.h>
#include <sys/stat.h>
@@ -11,7 +13,6 @@
#include <stdlib.h>
#include <string.h>
#include <limits.h>
-#include <getopt.h>
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
@@ -33,8 +34,8 @@

-#define ALIGN(x,a) __ALIGN_MASK(x,(typeof(x))(a)-1)
-#define __ALIGN_MASK(x,mask) (((x)+(mask))&~(mask))
+#define ALIGN(x, a) __ALIGN_MASK(x, (typeof(x))(a)-1)
+#define __ALIGN_MASK(x, mask) (((x)+(mask))&~(mask))

static int nr_counters = 0;
static __u64 event_id[MAX_COUNTERS] = { };
@@ -45,7 +46,7 @@ static int nr_cpus = 0;
static unsigned int page_size;
static unsigned int mmap_pages = 16;
static int output;
-static char *output_name = "output.perf";
+static const char *output_name = "output.perf";
static int group = 0;
static unsigned int realtime_prio = 0;
static int system_wide = 0;
@@ -62,192 +63,6 @@ const unsigned int default_count[] = {
10000,
};

-struct event_symbol {
- __u64 event;
- char *symbol;
-};
-
-static struct event_symbol event_symbols[] = {
- {EID(PERF_TYPE_HARDWARE, PERF_COUNT_CPU_CYCLES), "cpu-cycles", },
- {EID(PERF_TYPE_HARDWARE, PERF_COUNT_CPU_CYCLES), "cycles", },
- {EID(PERF_TYPE_HARDWARE, PERF_COUNT_INSTRUCTIONS), "instructions", },
- {EID(PERF_TYPE_HARDWARE, PERF_COUNT_CACHE_REFERENCES), "cache-references", },
- {EID(PERF_TYPE_HARDWARE, PERF_COUNT_CACHE_MISSES), "cache-misses", },
- {EID(PERF_TYPE_HARDWARE, PERF_COUNT_BRANCH_INSTRUCTIONS), "branch-instructions", },
- {EID(PERF_TYPE_HARDWARE, PERF_COUNT_BRANCH_INSTRUCTIONS), "branches", },
- {EID(PERF_TYPE_HARDWARE, PERF_COUNT_BRANCH_MISSES), "branch-misses", },
- {EID(PERF_TYPE_HARDWARE, PERF_COUNT_BUS_CYCLES), "bus-cycles", },
-
- {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CPU_CLOCK), "cpu-clock", },
- {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_TASK_CLOCK), "task-clock", },
- {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_PAGE_FAULTS), "page-faults", },
- {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_PAGE_FAULTS), "faults", },
- {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_PAGE_FAULTS_MIN), "minor-faults", },
- {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_PAGE_FAULTS_MAJ), "major-faults", },
- {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CONTEXT_SWITCHES), "context-switches", },
- {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CONTEXT_SWITCHES), "cs", },
- {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CPU_MIGRATIONS), "cpu-migrations", },
- {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CPU_MIGRATIONS), "migrations", },
-};
-
-/*
- * Each event can have multiple symbolic names.
- * Symbolic names are (almost) exactly matched.
- */
-static __u64 match_event_symbols(char *str)
-{
- __u64 config, id;
- int type;
- unsigned int i;
-
- if (sscanf(str, "r%llx", &config) == 1)
- return config | PERF_COUNTER_RAW_MASK;
-
- if (sscanf(str, "%d:%llu", &type, &id) == 2)
- return EID(type, id);
-
- for (i = 0; i < ARRAY_SIZE(event_symbols); i++) {
- if (!strncmp(str, event_symbols[i].symbol,
- strlen(event_symbols[i].symbol)))
- return event_symbols[i].event;
- }
-
- return ~0ULL;
-}
-
-static int parse_events(char *str)
-{
- __u64 config;
-
-again:
- if (nr_counters == MAX_COUNTERS)
- return -1;
-
- config = match_event_symbols(str);
- if (config == ~0ULL)
- return -1;
-
- event_id[nr_counters] = config;
- nr_counters++;
-
- str = strstr(str, ",");
- if (str) {
- str++;
- goto again;
- }
-
- return 0;
-}
-
-#define __PERF_COUNTER_FIELD(config, name) \
- ((config & PERF_COUNTER_##name##_MASK) >> PERF_COUNTER_##name##_SHIFT)
-
-#define PERF_COUNTER_RAW(config) __PERF_COUNTER_FIELD(config, RAW)
-#define PERF_COUNTER_CONFIG(config) __PERF_COUNTER_FIELD(config, CONFIG)
-#define PERF_COUNTER_TYPE(config) __PERF_COUNTER_FIELD(config, TYPE)
-#define PERF_COUNTER_ID(config) __PERF_COUNTER_FIELD(config, EVENT)
-
-static void display_events_help(void)
-{
- unsigned int i;
- __u64 e;
-
- printf(
- " -e EVENT --event=EVENT # symbolic-name abbreviations");
-
- for (i = 0; i < ARRAY_SIZE(event_symbols); i++) {
- int type, id;
-
- e = event_symbols[i].event;
- type = PERF_COUNTER_TYPE(e);
- id = PERF_COUNTER_ID(e);
-
- printf("\n %d:%d: %-20s",
- type, id, event_symbols[i].symbol);
- }
-
- printf("\n"
- " rNNN: raw PMU events (eventsel+umask)\n\n");
-}
-
-static void display_help(void)
-{
- printf(
- "Usage: perf-record [<options>] <cmd>\n"
- "perf-record Options (up to %d event types can be specified at once):\n\n",
- MAX_COUNTERS);
-
- display_events_help();
-
- printf(
- " -c CNT --count=CNT # event period to sample\n"
- " -m pages --mmap_pages=<pages> # number of mmap data pages\n"
- " -o file --output=<file> # output file\n"
- " -p pid --pid=<pid> # record events on existing pid\n"
- " -r prio --realtime=<prio> # use RT prio\n"
- " -s --system # system wide profiling\n"
- );
-
- exit(0);
-}
-
-static void process_options(int argc, char * const argv[])
-{
- int error = 0, counter;
-
- for (;;) {
- int option_index = 0;
- /** Options for getopt */
- static struct option long_options[] = {
- {"count", required_argument, NULL, 'c'},
- {"event", required_argument, NULL, 'e'},
- {"mmap_pages", required_argument, NULL, 'm'},
- {"output", required_argument, NULL, 'o'},
- {"pid", required_argument, NULL, 'p'},
- {"realtime", required_argument, NULL, 'r'},
- {"system", no_argument, NULL, 's'},
- {"inherit", no_argument, NULL, 'i'},
- {"nmi", no_argument, NULL, 'n'},
- {NULL, 0, NULL, 0 }
- };
- int c = getopt_long(argc, argv, "+:c:e:m:o:p:r:sin",
- long_options, &option_index);
- if (c == -1)
- break;
-
- switch (c) {
- case 'c': default_interval = atoi(optarg); break;
- case 'e': error = parse_events(optarg); break;
- case 'm': mmap_pages = atoi(optarg); break;
- case 'o': output_name = strdup(optarg); break;
- case 'p': target_pid = atoi(optarg); break;
- case 'r': realtime_prio = atoi(optarg); break;
- case 's': system_wide ^= 1; break;
- case 'i': inherit ^= 1; break;
- case 'n': nmi ^= 1; break;
- default: error = 1; break;
- }
- }
-
- if (argc - optind == 0 && target_pid == -1)
- error = 1;
-
- if (error)
- display_help();
-
- if (!nr_counters) {
- nr_counters = 1;
- event_id[0] = 0;
- }
-
- for (counter = 0; counter < nr_counters; counter++) {
- if (event_count[counter])
- continue;
-
- event_count[counter] = default_interval;
- }
-}
-
struct mmap_data {
int counter;
void *base;
@@ -538,16 +353,13 @@ static void open_counters(int cpu, pid_t pid)
nr_cpu++;
}

-int cmd_record(int argc, char * const argv[])
+static int __cmd_record(int argc, const char **argv)
{
int i, counter;
pid_t pid;
int ret;

page_size = sysconf(_SC_PAGE_SIZE);
-
- process_options(argc, argv);
-
nr_cpus = sysconf(_SC_NPROCESSORS_ONLN);
assert(nr_cpus <= MAX_NR_CPUS);
assert(nr_cpus >= 0);

@@ -558,9 +370,6 @@ int cmd_record(int argc, char * const argv[])
exit(-1);
}

- argc -= optind;
- argv += optind;
-
if (!system_wide) {
open_counters(-1, target_pid != -1 ? target_pid : 0);
} else for (i = 0; i < nr_cpus; i++)
@@ -575,7 +384,7 @@ int cmd_record(int argc, char * const argv[])
perror("failed to fork");

if (!pid) {
- if (execvp(argv[0], argv)) {
+ if (execvp(argv[0], (char **)argv)) {
perror(argv[0]);
exit(-1);
}
@@ -610,3 +419,170 @@ int cmd_record(int argc, char * const argv[])

return 0;
}
+
+struct event_symbol {
+ __u64 event;
+ char *symbol;
+};
+
+static struct event_symbol event_symbols[] = {
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_CPU_CYCLES), "cpu-cycles", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_CPU_CYCLES), "cycles", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_INSTRUCTIONS), "instructions", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_CACHE_REFERENCES), "cache-references", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_CACHE_MISSES), "cache-misses", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_BRANCH_INSTRUCTIONS), "branch-instructions", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_BRANCH_INSTRUCTIONS), "branches", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_BRANCH_MISSES), "branch-misses", },
+ {EID(PERF_TYPE_HARDWARE, PERF_COUNT_BUS_CYCLES), "bus-cycles", },
+
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CPU_CLOCK), "cpu-clock", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_TASK_CLOCK), "task-clock", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_PAGE_FAULTS), "page-faults", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_PAGE_FAULTS), "faults", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_PAGE_FAULTS_MIN), "minor-faults", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_PAGE_FAULTS_MAJ), "major-faults", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CONTEXT_SWITCHES), "context-switches", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CONTEXT_SWITCHES), "cs", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CPU_MIGRATIONS), "cpu-migrations", },
+ {EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CPU_MIGRATIONS), "migrations", },
+};
+
+/*
+ * Each event can have multiple symbolic names.
+ * Symbolic names are (almost) exactly matched.
+ */
+static __u64 match_event_symbols(const char *str)
+{
+ __u64 config, id;
+ int type;
+ unsigned int i;
+
+ if (sscanf(str, "r%llx", &config) == 1)
+ return config | PERF_COUNTER_RAW_MASK;
+
+ if (sscanf(str, "%d:%llu", &type, &id) == 2)
+ return EID(type, id);
+
+ for (i = 0; i < ARRAY_SIZE(event_symbols); i++) {
+ if (!strncmp(str, event_symbols[i].symbol,
+ strlen(event_symbols[i].symbol)))
+ return event_symbols[i].event;
+ }
+
+ return ~0ULL;
+}
+
+static int parse_events(const struct option *opt, const char *str, int unset)
+{
+ __u64 config;
+
+again:
+ if (nr_counters == MAX_COUNTERS)
+ return -1;
+
+ config = match_event_symbols(str);
+ if (config == ~0ULL)
+ return -1;
+
+ event_id[nr_counters] = config;
+ nr_counters++;
+
+ str = strstr(str, ",");
+ if (str) {
+ str++;
+ goto again;
+ }
+
+ return 0;
+}
+
+static char events_help[100000];
+
+#define __PERF_COUNTER_FIELD(config, name) \
+ ((config & PERF_COUNTER_##name##_MASK) >> PERF_COUNTER_##name##_SHIFT)
+
+#define PERF_COUNTER_RAW(config) __PERF_COUNTER_FIELD(config, RAW)
+#define PERF_COUNTER_CONFIG(config) __PERF_COUNTER_FIELD(config, CONFIG)
+#define PERF_COUNTER_TYPE(config) __PERF_COUNTER_FIELD(config, TYPE)
+#define PERF_COUNTER_ID(config) __PERF_COUNTER_FIELD(config, EVENT)
+
+
+
+static void create_events_help(void)
+{
+ unsigned int i;
+ char *str;
+ __u64 e;
+
+ str = events_help;
+
+ str += sprintf(str,
+ "event name: [");
+
+ for (i = 0; i < ARRAY_SIZE(event_symbols); i++) {
+ int type, id;
+
+ e = event_symbols[i].event;
+ type = PERF_COUNTER_TYPE(e);
+ id = PERF_COUNTER_ID(e);
+
+ if (i)
+ str += sprintf(str, "|");
+
+ str += sprintf(str, "%s",
+ event_symbols[i].symbol);
+ }
+
+ str += sprintf(str, "|rNNN]");
+}
+
+static const char * const record_usage[] = {
+ "perf record [<options>] <command>",
+ NULL
+};
+
+const struct option options[] = {
+ OPT_CALLBACK('e', "event", NULL, "event",
+ events_help, parse_events),
+ OPT_INTEGER('c', "count", &default_interval,
+ "event period to sample"),
+ OPT_INTEGER('m', "mmap-pages", &mmap_pages,
+ "number of mmap data pages"),
+ OPT_STRING('o', "output", &output_name, "file",
+ "output file name"),
+ OPT_BOOLEAN('i', "inherit", &inherit,
+ "child tasks inherit counters"),
+ OPT_INTEGER('p', "pid", &target_pid,
+ "record events on existing pid"),
+ OPT_INTEGER('r', "realtime", &realtime_prio,
+ "collect data with this RT SCHED_FIFO priority"),
+ OPT_BOOLEAN('a', "all-cpus", &system_wide,
+ "system-wide collection from all CPUs"),
+ OPT_END()
+};
+
+int cmd_record(int argc, const char **argv, const char *prefix)
+{
+ int counter;
+
+ create_events_help();
+
+ argc = parse_options(argc, argv, options, record_usage, 0);
+ if (!argc)
+ usage_with_options(record_usage, options);
+
+ if (!nr_counters) {
+ nr_counters = 1;
+ event_id[0] = 0;
+ }
+
+ for (counter = 0; counter < nr_counters; counter++) {
+ if (event_count[counter])
+ continue;
+
+ event_count[counter] = default_interval;
+ }
+
+ return __cmd_record(argc, argv);
+}
diff --git a/Documentation/perf_counter/builtin-top.c b/Documentation/perf_counter/builtin-top.c
index 4bed265..626b320 100644
--- a/Documentation/perf_counter/builtin-top.c
+++ b/Documentation/perf_counter/builtin-top.c
@@ -42,13 +42,16 @@
* Released under the GPL v2. (and only v2, not any later version)
*/

+
#include "perf.h"
#include "util/util.h"

#include <getopt.h>
#include <assert.h>
#include <fcntl.h>
+
#include <stdio.h>
+
#include <errno.h>
#include <time.h>
#include <sched.h>

tip-bot for Ingo Molnar

May 26, 2009, 6:40:14 AM
Commit-ID: 69aa48ab82e17299efe2be6c21795945731a6c17
Gitweb: http://git.kernel.org/tip/69aa48ab82e17299efe2be6c21795945731a6c17
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Tue, 26 May 2009 09:02:27 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Tue, 26 May 2009 10:05:56 +0200

perf record: Straighten out argv types

[ Impact: cleanup ]

Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>

Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/builtin-record.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/perf_counter/builtin-record.c b/Documentation/perf_counter/builtin-record.c
index 1b19f18..f225efa 100644
--- a/Documentation/perf_counter/builtin-record.c
+++ b/Documentation/perf_counter/builtin-record.c
@@ -191,7 +191,7 @@ static void display_help(void)
exit(0);
}

-static void process_options(int argc, const char *argv[])
+static void process_options(int argc, char * const argv[])
{
int error = 0, counter;

@@ -538,7 +538,7 @@ static void open_counters(int cpu, pid_t pid)
nr_cpu++;
}

-int cmd_record(int argc, const char **argv)
+int cmd_record(int argc, char * const argv[])
{
int i, counter;
pid_t pid;

tip-bot for Ingo Molnar

May 26, 2009, 6:40:16 AM
Commit-ID: 8ad8db3788fd9a449941fb2392ca85af4ee1cde1
Gitweb: http://git.kernel.org/tip/8ad8db3788fd9a449941fb2392ca85af4ee1cde1
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Tue, 26 May 2009 11:10:09 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Tue, 26 May 2009 11:26:34 +0200

perf_counter tools: Librarize event string parsing

Extract the event string parser from builtin-record.c, and
librarize it - to be reused in other commands.

Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>

Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/Makefile | 2 +
Documentation/perf_counter/builtin-record.c | 154 +-----------------------
Documentation/perf_counter/util/parse-events.c | 127 +++++++++++++++++++
Documentation/perf_counter/util/parse-events.h | 10 ++
4 files changed, 145 insertions(+), 148 deletions(-)

diff --git a/Documentation/perf_counter/Makefile b/Documentation/perf_counter/Makefile
index 481e4c2..45daa72 100644
--- a/Documentation/perf_counter/Makefile
+++ b/Documentation/perf_counter/Makefile
@@ -290,6 +290,7 @@ LIB_H += ../../include/linux/perf_counter.h
LIB_H += perf.h
LIB_H += util/levenshtein.h
LIB_H += util/parse-options.h
+LIB_H += util/parse-events.h
LIB_H += util/quote.h
LIB_H += util/util.h
LIB_H += util/help.h
@@ -304,6 +305,7 @@ LIB_OBJS += util/exec_cmd.o
LIB_OBJS += util/help.o
LIB_OBJS += util/levenshtein.o
LIB_OBJS += util/parse-options.o
+LIB_OBJS += util/parse-events.o
LIB_OBJS += util/path.o
LIB_OBJS += util/run-command.o
LIB_OBJS += util/quote.o
diff --git a/Documentation/perf_counter/builtin-record.c b/Documentation/perf_counter/builtin-record.c
index f12a782..6fa6ed6 100644
--- a/Documentation/perf_counter/builtin-record.c
+++ b/Documentation/perf_counter/builtin-record.c
@@ -3,44 +3,17 @@
#include "perf.h"
#include "util/util.h"
#include "util/parse-options.h"
+#include "util/parse-events.h"
#include "util/exec_cmd.h"

-#include <sys/types.h>
-#include <sys/stat.h>
-#include <sys/time.h>
-#include <unistd.h>
-#include <stdint.h>
-#include <stdlib.h>
-#include <string.h>
-#include <limits.h>
-#include <assert.h>
-#include <fcntl.h>
-#include <stdio.h>
-#include <errno.h>
-#include <time.h>
#include <sched.h>
-#include <pthread.h>
-
-#include <sys/syscall.h>
-#include <sys/ioctl.h>
-#include <sys/poll.h>
-#include <sys/prctl.h>
-#include <sys/wait.h>
-#include <sys/uio.h>
-#include <sys/mman.h>
-
-#include <linux/unistd.h>
-#include <linux/types.h>
-
-

#define ALIGN(x, a) __ALIGN_MASK(x, (typeof(x))(a)-1)
#define __ALIGN_MASK(x, mask) (((x)+(mask))&~(mask))

-static int nr_counters = 0;
-static __u64 event_id[MAX_COUNTERS] = { };
static int default_interval = 100000;
static int event_count[MAX_COUNTERS];
+
static int fd[MAX_NR_CPUS][MAX_COUNTERS];


static int nr_cpus = 0;
static unsigned int page_size;

@@ -420,131 +393,16 @@ static int __cmd_record(int argc, const char **argv)
return 0;
}

-static __u64 match_event_symbols(const char *str)
-{
- __u64 config, id;
- int type;
- unsigned int i;
-
- if (sscanf(str, "r%llx", &config) == 1)
- return config | PERF_COUNTER_RAW_MASK;
-
- if (sscanf(str, "%d:%llu", &type, &id) == 2)
- return EID(type, id);
-
- for (i = 0; i < ARRAY_SIZE(event_symbols); i++) {
- if (!strncmp(str, event_symbols[i].symbol,
- strlen(event_symbols[i].symbol)))
- return event_symbols[i].event;
- }
-
- return ~0ULL;
-}
-
-static int parse_events(const struct option *opt, const char *str, int unset)
-{
- __u64 config;
-
-again:
- if (nr_counters == MAX_COUNTERS)
- return -1;
-
- config = match_event_symbols(str);
- if (config == ~0ULL)
- return -1;
-
- event_id[nr_counters] = config;
- nr_counters++;
-
- str = strstr(str, ",");
- if (str) {
- str++;
- goto again;
- }
-
- return 0;
-}
-
-static char events_help[100000];
-
-#define __PERF_COUNTER_FIELD(config, name) \
- ((config & PERF_COUNTER_##name##_MASK) >> PERF_COUNTER_##name##_SHIFT)
-
-#define PERF_COUNTER_RAW(config) __PERF_COUNTER_FIELD(config, RAW)
-#define PERF_COUNTER_CONFIG(config) __PERF_COUNTER_FIELD(config, CONFIG)
-#define PERF_COUNTER_TYPE(config) __PERF_COUNTER_FIELD(config, TYPE)
-#define PERF_COUNTER_ID(config) __PERF_COUNTER_FIELD(config, EVENT)
-
-static void create_events_help(void)
-{
- unsigned int i;
- char *str;
- __u64 e;
-
- str = events_help;
-
- str += sprintf(str,
- "event name: [");
-
- for (i = 0; i < ARRAY_SIZE(event_symbols); i++) {
- int type, id;
-
- e = event_symbols[i].event;
- type = PERF_COUNTER_TYPE(e);
- id = PERF_COUNTER_ID(e);
-
- if (i)
- str += sprintf(str, "|");
-
- str += sprintf(str, "%s",
- event_symbols[i].symbol);
- }
-
- str += sprintf(str, "|rNNN]");
-}
-
static const char * const record_usage[] = {
"perf record [<options>] <command>",
NULL
};

+static char events_help_msg[EVENTS_HELP_MAX];
+
const struct option options[] = {
OPT_CALLBACK('e', "event", NULL, "event",
- events_help, parse_events),
+ events_help_msg, parse_events),
OPT_INTEGER('c', "count", &default_interval,
"event period to sample"),
OPT_INTEGER('m', "mmap-pages", &mmap_pages,
@@ -566,7 +424,7 @@ int cmd_record(int argc, const char **argv, const char *prefix)
{
int counter;

- create_events_help();
+ create_events_help(events_help_msg);

argc = parse_options(argc, argv, options, record_usage, 0);
if (!argc)
diff --git a/Documentation/perf_counter/util/parse-events.c b/Documentation/perf_counter/util/parse-events.c
new file mode 100644
index 0000000..77d0917
--- /dev/null
+++ b/Documentation/perf_counter/util/parse-events.c
@@ -0,0 +1,127 @@
+
+#include "../perf.h"
+#include "util.h"
+#include "parse-options.h"
+#include "parse-events.h"
+#include "exec_cmd.h"
+
+int nr_counters;
+
+__u64 event_id[MAX_COUNTERS] = { };
+
+int parse_events(const struct option *opt, const char *str, int unset)
+{
+ __u64 config;
+
+again:
+ if (nr_counters == MAX_COUNTERS)
+ return -1;
+
+ config = match_event_symbols(str);
+ if (config == ~0ULL)
+ return -1;
+
+ event_id[nr_counters] = config;
+ nr_counters++;
+
+ str = strstr(str, ",");
+ if (str) {
+ str++;
+ goto again;
+ }
+
+ return 0;
+}
+
+#define __PERF_COUNTER_FIELD(config, name) \
+ ((config & PERF_COUNTER_##name##_MASK) >> PERF_COUNTER_##name##_SHIFT)
+
+#define PERF_COUNTER_RAW(config) __PERF_COUNTER_FIELD(config, RAW)
+#define PERF_COUNTER_CONFIG(config) __PERF_COUNTER_FIELD(config, CONFIG)
+#define PERF_COUNTER_TYPE(config) __PERF_COUNTER_FIELD(config, TYPE)
+#define PERF_COUNTER_ID(config) __PERF_COUNTER_FIELD(config, EVENT)
+
+/*
+ * Create the help text for the event symbols:
+ */
+void create_events_help(char *events_help_msg)
+{
+ unsigned int i;
+ char *str;
+ __u64 e;
+
+ str = events_help_msg;
+
+ str += sprintf(str,
+ "event name: [");
+
+ for (i = 0; i < ARRAY_SIZE(event_symbols); i++) {
+ int type, id;
+
+ e = event_symbols[i].event;
+ type = PERF_COUNTER_TYPE(e);
+ id = PERF_COUNTER_ID(e);
+
+ if (i)
+ str += sprintf(str, "|");
+
+ str += sprintf(str, "%s",
+ event_symbols[i].symbol);
+ }
+
+ str += sprintf(str, "|rNNN]");
+}
diff --git a/Documentation/perf_counter/util/parse-events.h b/Documentation/perf_counter/util/parse-events.h
new file mode 100644
index 0000000..6e2ebe5
--- /dev/null
+++ b/Documentation/perf_counter/util/parse-events.h
@@ -0,0 +1,10 @@
+
+extern int nr_counters;
+extern __u64 event_id[MAX_COUNTERS];
+
+extern int parse_events(const struct option *opt, const char *str, int unset);
+
+#define EVENTS_HELP_MAX (128*1024)
+
+extern void create_events_help(char *help_msg);
+

tip-bot for Ingo Molnar

May 26, 2009, 7:10:13 AM
Commit-ID: 5242519b0296d128425368fc6ab17f541d5fa775
Gitweb: http://git.kernel.org/tip/5242519b0296d128425368fc6ab17f541d5fa775
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Tue, 26 May 2009 09:17:18 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Tue, 26 May 2009 11:59:34 +0200

perf stat: Convert to Git option parsing

Remove getopt usage and use Git's much more advanced and more compact
command option library.

Extend the event parser library with the extensions that were in
perf-stat before.

Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/builtin-record.c | 3 +-
Documentation/perf_counter/builtin-stat.c | 414 ++++--------------------
Documentation/perf_counter/util/parse-events.c | 82 ++++-
Documentation/perf_counter/util/parse-events.h | 10 +
4 files changed, 145 insertions(+), 364 deletions(-)

diff --git a/Documentation/perf_counter/builtin-record.c b/Documentation/perf_counter/builtin-record.c
index 6fa6ed6..ec2b787 100644
--- a/Documentation/perf_counter/builtin-record.c
+++ b/Documentation/perf_counter/builtin-record.c
@@ -4,7 +4,6 @@
#include "util/util.h"
#include "util/parse-options.h"
#include "util/parse-events.h"
-#include "util/exec_cmd.h"

#include <sched.h>

@@ -400,7 +399,7 @@ static const char * const record_usage[] = {

static char events_help_msg[EVENTS_HELP_MAX];

-const struct option options[] = {
+static const struct option options[] = {
OPT_CALLBACK('e', "event", NULL, "event",
events_help_msg, parse_events),
OPT_INTEGER('c', "count", &default_interval,

diff --git a/Documentation/perf_counter/builtin-stat.c b/Documentation/perf_counter/builtin-stat.c
index c1053d8..e7cb941 100644
--- a/Documentation/perf_counter/builtin-stat.c
+++ b/Documentation/perf_counter/builtin-stat.c
@@ -1,35 +1,5 @@
/*
- * kerneltop.c: show top kernel functions - performance counters showcase
-
- Build with:
-
- cc -O6 -Wall -c -o kerneltop.o kerneltop.c -lrt
-
- Sample output:
-
-------------------------------------------------------------------------------
- KernelTop: 2669 irqs/sec [NMI, cache-misses/cache-refs], (all, cpu: 2)
-------------------------------------------------------------------------------
-
- weight RIP kernel function
- ______ ________________ _______________
-
- 35.20 - ffffffff804ce74b : skb_copy_and_csum_dev
- 33.00 - ffffffff804cb740 : sock_alloc_send_skb
- 31.26 - ffffffff804ce808 : skb_push
- 22.43 - ffffffff80510004 : tcp_established_options
- 19.00 - ffffffff8027d250 : find_get_page
- 15.76 - ffffffff804e4fc9 : eth_type_trans
- 15.20 - ffffffff804d8baa : dst_release
- 14.86 - ffffffff804cf5d8 : skb_release_head_state
- 14.00 - ffffffff802217d5 : read_hpet
- 12.00 - ffffffff804ffb7f : __ip_local_out
- 11.97 - ffffffff804fc0c8 : ip_local_deliver_finish
- 8.54 - ffffffff805001a3 : ip_queue_xmit
- */
-
-/*
- * perfstat: /usr/bin/time -alike performance counter statistics utility
+ * perf stat: /usr/bin/time -alike performance counter statistics utility

It summarizes the counter events of all tasks (and child tasks),
covering all CPUs that the command (or workload) executes on.
@@ -38,59 +8,38 @@

Sample output:

- $ ./perfstat -e 1 -e 3 -e 5 ls -lR /usr/include/ >/dev/null
+ $ perf stat -e 1 -e 3 -e 5 ls -lR /usr/include/ >/dev/null

Performance counter stats for 'ls':

163516953 instructions
2295 cache-misses
2855182 branch-misses
+ *
+ * Copyright (C) 2008, Red Hat Inc, Ingo Molnar <mi...@redhat.com>
+ *
+ * Improvements and fixes by:
+ *
+ * Arjan van de Ven <ar...@linux.intel.com>
+ * Yanmin Zhang <yanmin...@intel.com>
+ * Wu Fengguang <fenggu...@intel.com>
+ * Mike Galbraith <efa...@gmx.de>
+ * Paul Mackerras <pau...@samba.org>
+ *
+ * Released under the GPL v2. (and only v2, not any later version)
*/

- /*
- * Copyright (C) 2008, Red Hat Inc, Ingo Molnar <mi...@redhat.com>
- *
- * Improvements and fixes by:
- *
- * Arjan van de Ven <ar...@linux.intel.com>
- * Yanmin Zhang <yanmin...@intel.com>
- * Wu Fengguang <fenggu...@intel.com>
- * Mike Galbraith <efa...@gmx.de>
- * Paul Mackerras <pau...@samba.org>
- *
- * Released under the GPL v2. (and only v2, not any later version)
- */
-


#include "perf.h"
#include "util/util.h"
+#include "util/parse-options.h"
+#include "util/parse-events.h"

-#include <getopt.h>
-#include <assert.h>
-#include <fcntl.h>
-#include <stdio.h>
-#include <errno.h>
-#include <time.h>
-#include <sched.h>
-#include <pthread.h>
-
-#include <sys/syscall.h>
-#include <sys/ioctl.h>
-#include <sys/poll.h>
#include <sys/prctl.h>
-#include <sys/wait.h>
-#include <sys/uio.h>
-#include <sys/mman.h>
-
-#include <linux/unistd.h>
-#include <linux/types.h>
-

-#define EVENT_MASK_KERNEL 1
-#define EVENT_MASK_USER 2
-
static int system_wide = 0;
+static int inherit = 1;

-static int nr_counters = 0;
-static __u64 event_id[MAX_COUNTERS] = {
+static __u64 default_event_id[MAX_COUNTERS] = {
EID(PERF_TYPE_SOFTWARE, PERF_COUNT_TASK_CLOCK),
EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CONTEXT_SWITCHES),
EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CPU_MIGRATIONS),
@@ -101,20 +50,15 @@ static __u64 event_id[MAX_COUNTERS] = {
EID(PERF_TYPE_HARDWARE, PERF_COUNT_CACHE_REFERENCES),
EID(PERF_TYPE_HARDWARE, PERF_COUNT_CACHE_MISSES),
};
+

static int default_interval = 100000;
static int event_count[MAX_COUNTERS];

static int fd[MAX_NR_CPUS][MAX_COUNTERS];

-static int event_mask[MAX_COUNTERS];
-
-static int tid = -1;
-static int profile_cpu = -1;
+static int target_pid = -1;

static int nr_cpus = 0;
-static int nmi = 1;
-static int group = 0;
static unsigned int page_size;
-static int zero;
-
static int scale = 1;

static const unsigned int default_count[] = {
@@ -126,197 +70,6 @@ static const unsigned int default_count[] = {
10000,
};

-static char *hw_event_names[] = {
- "CPU cycles",
- "instructions",
- "cache references",
- "cache misses",
- "branches",
- "branch misses",
- "bus cycles",
-};
-
-static char *sw_event_names[] = {
- "cpu clock ticks",
- "task clock ticks",
- "pagefaults",
- "context switches",
- "CPU migrations",
- "minor faults",
- "major faults",
-};
-

-#define __PERF_COUNTER_FIELD(config, name) \
- ((config & PERF_COUNTER_##name##_MASK) >> PERF_COUNTER_##name##_SHIFT)
-
-#define PERF_COUNTER_RAW(config) __PERF_COUNTER_FIELD(config, RAW)
-#define PERF_COUNTER_CONFIG(config) __PERF_COUNTER_FIELD(config, CONFIG)
-#define PERF_COUNTER_TYPE(config) __PERF_COUNTER_FIELD(config, TYPE)
-#define PERF_COUNTER_ID(config) __PERF_COUNTER_FIELD(config, EVENT)
-
-static void display_events_help(void)
-{
- unsigned int i;
- __u64 e;
-
- printf(
- " -e EVENT --event=EVENT # symbolic-name abbreviations");
-
- for (i = 0; i < ARRAY_SIZE(event_symbols); i++) {
- int type, id;
-
- e = event_symbols[i].event;
- type = PERF_COUNTER_TYPE(e);
- id = PERF_COUNTER_ID(e);
-
- printf("\n %d:%d: %-20s",
- type, id, event_symbols[i].symbol);
- }
-
- printf("\n"
- " rNNN: raw PMU events (eventsel+umask)\n\n");
-}
-
-static void display_help(void)
-{
- printf(
- "Usage: perfstat [<events...>] <cmd...>\n\n"
- "PerfStat Options (up to %d event types can be specified):\n\n",
- MAX_COUNTERS);
-
- display_events_help();
-
- printf(
- " -l # scale counter values\n"
- " -a # system-wide collection\n");
- exit(0);
-}
-
-static char *event_name(int ctr)
-{
- __u64 config = event_id[ctr];
- int type = PERF_COUNTER_TYPE(config);
- int id = PERF_COUNTER_ID(config);
- static char buf[32];
-
- if (PERF_COUNTER_RAW(config)) {
- sprintf(buf, "raw 0x%llx", PERF_COUNTER_CONFIG(config));
- return buf;
- }
-
- switch (type) {
- case PERF_TYPE_HARDWARE:
- if (id < PERF_HW_EVENTS_MAX)
- return hw_event_names[id];
- return "unknown-hardware";
-
- case PERF_TYPE_SOFTWARE:
- if (id < PERF_SW_EVENTS_MAX)
- return sw_event_names[id];
- return "unknown-software";
-
- default:
- break;
- }
-
- return "unknown";
-}
-
-/*
- * Each event can have multiple symbolic names.
- * Symbolic names are (almost) exactly matched.
- */
-static __u64 match_event_symbols(char *str)
-{
- __u64 config, id;
- int type;
- unsigned int i;
- char mask_str[4];
-
- if (sscanf(str, "r%llx", &config) == 1)
- return config | PERF_COUNTER_RAW_MASK;
-
- switch (sscanf(str, "%d:%llu:%2s", &type, &id, mask_str)) {
- case 3:
- if (strchr(mask_str, 'k'))
- event_mask[nr_counters] |= EVENT_MASK_USER;
- if (strchr(mask_str, 'u'))
- event_mask[nr_counters] |= EVENT_MASK_KERNEL;
- case 2:
- return EID(type, id);
-
- default:
- break;
- }
-
- for (i = 0; i < ARRAY_SIZE(event_symbols); i++) {
- if (!strncmp(str, event_symbols[i].symbol,
- strlen(event_symbols[i].symbol)))
- return event_symbols[i].event;
- }
-
- return ~0ULL;
-}
-
-static int parse_events(char *str)
-{
- __u64 config;
-
-again:
- if (nr_counters == MAX_COUNTERS)
- return -1;
-
- config = match_event_symbols(str);
- if (config == ~0ULL)
- return -1;
-
- event_id[nr_counters] = config;
- nr_counters++;
-
- str = strstr(str, ",");
- if (str) {
- str++;
- goto again;
- }
-
- return 0;
-}
-
-/*
- * perfstat
- */
-
-char fault_here[1000000];
-
static void create_perfstat_counter(int counter)
{
struct perf_counter_hw_event hw_event;
@@ -324,7 +77,7 @@ static void create_perfstat_counter(int counter)
memset(&hw_event, 0, sizeof(hw_event));
hw_event.config = event_id[counter];
hw_event.record_type = 0;
- hw_event.nmi = 0;
+ hw_event.nmi = 1;
hw_event.exclude_kernel = event_mask[counter] & EVENT_MASK_KERNEL;
hw_event.exclude_user = event_mask[counter] & EVENT_MASK_USER;

@@ -343,7 +96,7 @@ static void create_perfstat_counter(int counter)
}
}
} else {
- hw_event.inherit = 1;
+ hw_event.inherit = inherit;
hw_event.disabled = 1;

fd[0][counter] = sys_perf_counter_open(&hw_event, 0, -1, -1, 0);
@@ -355,7 +108,7 @@ static void create_perfstat_counter(int counter)
}
}

-int do_perfstat(int argc, char *argv[])
+int do_perfstat(int argc, const char **argv)
{
unsigned long long t0, t1;
int counter;
@@ -369,12 +122,6 @@ int do_perfstat(int argc, char *argv[])

for (counter = 0; counter < nr_counters; counter++)
create_perfstat_counter(counter);

- argc -= optind;
- argv += optind;
-
- if (!argc)
- display_help();
-
/*
* Enable counters and exec the command:
*/
@@ -384,7 +131,7 @@ int do_perfstat(int argc, char *argv[])
if ((pid = fork()) < 0)
perror("failed to fork");
if (!pid) {
- if (execvp(argv[0], argv)) {
+ if (execvp(argv[0], (char **)argv)) {
perror(argv[0]);
exit(-1);
}

@@ -458,70 +205,45 @@ int do_perfstat(int argc, char *argv[])
return 0;
}

-static void process_options(int argc, char **argv)
+static void skip_signal(int signo)
{
- int error = 0, counter;
-
- for (;;) {
- int option_index = 0;
- /** Options for getopt */
- static struct option long_options[] = {
- {"count", required_argument, NULL, 'c'},
- {"cpu", required_argument, NULL, 'C'},
- {"delay", required_argument, NULL, 'd'},
- {"dump_symtab", no_argument, NULL, 'D'},
- {"event", required_argument, NULL, 'e'},
- {"filter", required_argument, NULL, 'f'},
- {"group", required_argument, NULL, 'g'},
- {"help", no_argument, NULL, 'h'},
- {"nmi", required_argument, NULL, 'n'},
- {"munmap_info", no_argument, NULL, 'U'},
- {"pid", required_argument, NULL, 'p'},
- {"realtime", required_argument, NULL, 'r'},
- {"scale", no_argument, NULL, 'l'},
- {"symbol", required_argument, NULL, 's'},
- {"stat", no_argument, NULL, 'S'},
- {"vmlinux", required_argument, NULL, 'x'},
- {"zero", no_argument, NULL, 'z'},
- {NULL, 0, NULL, 0 }
- };
- int c = getopt_long(argc, argv, "+:ac:C:d:De:f:g:hln:m:p:r:s:Sx:zMU",
- long_options, &option_index);
- if (c == -1)
- break;
-
- switch (c) {
- case 'a': system_wide = 1; break;
- case 'c': default_interval = atoi(optarg); break;
- case 'C':
- /* CPU and PID are mutually exclusive */
- if (tid != -1) {
- printf("WARNING: CPU switch overriding PID\n");
- sleep(1);
- tid = -1;
- }
- profile_cpu = atoi(optarg); break;
-
- case 'e': error = parse_events(optarg); break;
-
- case 'g': group = atoi(optarg); break;
- case 'h': display_help(); break;
- case 'l': scale = 1; break;
- case 'n': nmi = atoi(optarg); break;
- case 'p':
- /* CPU and PID are mutually exclusive */
- if (profile_cpu != -1) {
- printf("WARNING: PID switch overriding CPU\n");
- sleep(1);
- profile_cpu = -1;
- }
- tid = atoi(optarg); break;
- case 'z': zero = 1; break;
- default: error = 1; break;
- }
- }
-
- if (error)
- display_help();
+}
+
+static const char * const stat_usage[] = {
+ "perf stat [<options>] <command>",
+ NULL
+};
+
+static char events_help_msg[EVENTS_HELP_MAX];
+
+static const struct option options[] = {
+ OPT_CALLBACK('e', "event", NULL, "event",
+ events_help_msg, parse_events),
+ OPT_INTEGER('c', "count", &default_interval,
+ "event period to sample"),
+ OPT_BOOLEAN('i', "inherit", &inherit,
+ "child tasks inherit counters"),
+ OPT_INTEGER('p', "pid", &target_pid,
+ "stat events on existing pid"),
+ OPT_BOOLEAN('a', "all-cpus", &system_wide,
+ "system-wide collection from all CPUs"),
+ OPT_BOOLEAN('l', "scale", &scale,
+ "scale/normalize counters"),
+ OPT_END()
+};
+
+int cmd_stat(int argc, const char **argv, const char *prefix)
+{
+ int counter;
+
+ page_size = sysconf(_SC_PAGE_SIZE);
+
+ create_events_help(events_help_msg);
+ memcpy(event_id, default_event_id, sizeof(default_event_id));
+
+ argc = parse_options(argc, argv, options, stat_usage, 0);
+ if (!argc)
+ usage_with_options(stat_usage, options);

if (!nr_counters) {
nr_counters = 8;
@@ -533,18 +255,6 @@ static void process_options(int argc, char **argv)

event_count[counter] = default_interval;
}
-}
-
-static void skip_signal(int signo)
-{
-}
-
-int cmd_stat(int argc, char **argv, const char *prefix)
-{
- page_size = sysconf(_SC_PAGE_SIZE);
-
- process_options(argc, argv);
-
nr_cpus = sysconf(_SC_NPROCESSORS_ONLN);
assert(nr_cpus <= MAX_NR_CPUS);
assert(nr_cpus >= 0);

diff --git a/Documentation/perf_counter/util/parse-events.c b/Documentation/perf_counter/util/parse-events.c
index 77d0917..88c903e 100644
--- a/Documentation/perf_counter/util/parse-events.c
+++ b/Documentation/perf_counter/util/parse-events.c
@@ -8,6 +8,7 @@
int nr_counters;

__u64 event_id[MAX_COUNTERS] = { };
+int event_mask[MAX_COUNTERS];

struct event_symbol {
__u64 event;
@@ -37,6 +38,64 @@ static struct event_symbol event_symbols[] = {
{EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CPU_MIGRATIONS), "migrations", },
};

+#define __PERF_COUNTER_FIELD(config, name) \
+ ((config & PERF_COUNTER_##name##_MASK) >> PERF_COUNTER_##name##_SHIFT)
+
+#define PERF_COUNTER_RAW(config) __PERF_COUNTER_FIELD(config, RAW)
+#define PERF_COUNTER_CONFIG(config) __PERF_COUNTER_FIELD(config, CONFIG)
+#define PERF_COUNTER_TYPE(config) __PERF_COUNTER_FIELD(config, TYPE)
+#define PERF_COUNTER_ID(config) __PERF_COUNTER_FIELD(config, EVENT)
+
+static char *hw_event_names[] = {
+ "CPU cycles",
+ "instructions",
+ "cache references",
+ "cache misses",
+ "branches",
+ "branch misses",
+ "bus cycles",
+};
+
+static char *sw_event_names[] = {
+ "cpu clock ticks",
+ "task clock ticks",
+ "pagefaults",
+ "context switches",
+ "CPU migrations",
+ "minor faults",
+ "major faults",
+};
+
+char *event_name(int ctr)
+{
+ __u64 config = event_id[ctr];
+ int type = PERF_COUNTER_TYPE(config);
+ int id = PERF_COUNTER_ID(config);
+ static char buf[32];
+
+ if (PERF_COUNTER_RAW(config)) {
+ sprintf(buf, "raw 0x%llx", PERF_COUNTER_CONFIG(config));
+ return buf;
+ }
+
+ switch (type) {
+ case PERF_TYPE_HARDWARE:
+ if (id < PERF_HW_EVENTS_MAX)
+ return hw_event_names[id];
+ return "unknown-hardware";
+
+ case PERF_TYPE_SOFTWARE:
+ if (id < PERF_SW_EVENTS_MAX)
+ return sw_event_names[id];
+ return "unknown-software";
+
+ default:
+ break;
+ }
+
+ return "unknown";
+}
+
/*
* Each event can have multiple symbolic names.
* Symbolic names are (almost) exactly matched.
@@ -46,12 +105,23 @@ static __u64 match_event_symbols(const char *str)
__u64 config, id;
int type;
unsigned int i;
+ char mask_str[4];

if (sscanf(str, "r%llx", &config) == 1)
return config | PERF_COUNTER_RAW_MASK;

- if (sscanf(str, "%d:%llu", &type, &id) == 2)
- return EID(type, id);
+ switch (sscanf(str, "%d:%llu:%2s", &type, &id, mask_str)) {
+ case 3:
+ if (strchr(mask_str, 'k'))
+ event_mask[nr_counters] |= EVENT_MASK_USER;
+ if (strchr(mask_str, 'u'))
+ event_mask[nr_counters] |= EVENT_MASK_KERNEL;
+ case 2:
+ return EID(type, id);
+
+ default:
+ break;
+ }

for (i = 0; i < ARRAY_SIZE(event_symbols); i++) {
if (!strncmp(str, event_symbols[i].symbol,
@@ -86,14 +156,6 @@ again:
return 0;
}

-#define __PERF_COUNTER_FIELD(config, name) \
- ((config & PERF_COUNTER_##name##_MASK) >> PERF_COUNTER_##name##_SHIFT)
-
-#define PERF_COUNTER_RAW(config) __PERF_COUNTER_FIELD(config, RAW)
-#define PERF_COUNTER_CONFIG(config) __PERF_COUNTER_FIELD(config, CONFIG)
-#define PERF_COUNTER_TYPE(config) __PERF_COUNTER_FIELD(config, TYPE)
-#define PERF_COUNTER_ID(config) __PERF_COUNTER_FIELD(config, EVENT)
-
/*
* Create the help text for the event symbols:
*/
diff --git a/Documentation/perf_counter/util/parse-events.h b/Documentation/perf_counter/util/parse-events.h
index 6e2ebe5..0da306b 100644
--- a/Documentation/perf_counter/util/parse-events.h
+++ b/Documentation/perf_counter/util/parse-events.h
@@ -1,6 +1,16 @@

+/*
+ * Parse symbolic events/counts passed in as options:
+ */
+
extern int nr_counters;
extern __u64 event_id[MAX_COUNTERS];
+extern int event_mask[MAX_COUNTERS];
+
+#define EVENT_MASK_KERNEL 1
+#define EVENT_MASK_USER 2
+
+extern char *event_name(int ctr);

extern int parse_events(const struct option *opt, const char *str, int unset);

tip-bot for Ingo Molnar

May 26, 2009, 7:40:12 AM
Commit-ID: b456bae0ff4f3cf91639dd32b2bfc49b1c30b4b0
Gitweb: http://git.kernel.org/tip/b456bae0ff4f3cf91639dd32b2bfc49b1c30b4b0

Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Tue, 26 May 2009 09:17:18 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Tue, 26 May 2009 12:33:04 +0200

perf top: Convert to Git option parsing

Remove getopt usage and use Git's much more advanced and more compact
command option library.

Cc: Peter Zijlstra <a.p.zi...@chello.nl>


Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/builtin-top.c | 559 ++++++------------------------
1 files changed, 105 insertions(+), 454 deletions(-)

diff --git a/Documentation/perf_counter/builtin-top.c b/Documentation/perf_counter/builtin-top.c
index 626b320..87b925c 100644
--- a/Documentation/perf_counter/builtin-top.c
+++ b/Documentation/perf_counter/builtin-top.c
@@ -45,8 +45,10 @@

#include "perf.h"
#include "util/util.h"
+#include "util/util.h"
+#include "util/parse-options.h"
+#include "util/parse-events.h"

-#include <getopt.h>
#include <assert.h>
#include <fcntl.h>

@@ -70,8 +72,7 @@



static int system_wide = 0;

-static int nr_counters = 0;
-static __u64 event_id[MAX_COUNTERS] = {
+static __u64 default_event_id[MAX_COUNTERS] = {
EID(PERF_TYPE_SOFTWARE, PERF_COUNT_TASK_CLOCK),
EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CONTEXT_SWITCHES),
EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CPU_MIGRATIONS),

@@ -88,7 +89,7 @@ static int fd[MAX_NR_CPUS][MAX_COUNTERS];

static __u64 count_filter = 100;

-static int tid = -1;
+static int target_pid = -1;
static int profile_cpu = -1;


static int nr_cpus = 0;

static int nmi = 1;
@@ -100,8 +101,6 @@ static int use_mmap = 0;
static int use_munmap = 0;
static int freq = 0;

-static char *vmlinux;
-
static char *sym_filter;
static unsigned long filter_start;
static unsigned long filter_end;
@@ -110,18 +109,6 @@ static int delay_secs = 2;
static int zero;
static int dump_symtab;

-static int scale;
-
-struct source_line {
- uint64_t EIP;
- unsigned long count;
- char *line;
- struct source_line *next;
-};
-
-static struct source_line *lines;
-static struct source_line **lines_tail;
-


static const unsigned int default_count[] = {

1000000,
1000000,
@@ -131,194 +118,6 @@ static const unsigned int default_count[] = {

- "Usage: kerneltop [<options>]\n"
- " Or: kerneltop -S [<options>] COMMAND [ARGS]\n\n"
- "KernelTop Options (up to %d event types can be specified at once):\n\n",


- MAX_COUNTERS);
-
- display_events_help();
-
- printf(

- " -c CNT --count=CNT # event period to sample\n\n"
- " -C CPU --cpu=CPU # CPU (-1 for all) [default: -1]\n"
- " -p PID --pid=PID # PID of sampled task (-1 for all) [default: -1]\n\n"
- " -l # show scale factor for RR events\n"
- " -d delay --delay=<seconds> # sampling/display delay [default: 2]\n"
- " -f CNT --filter=CNT # min-event-count filter [default: 100]\n\n"
- " -r prio --realtime=<prio> # event acquisition runs with SCHED_FIFO policy\n"
- " -s symbol --symbol=<symbol> # function to be showed annotated one-shot\n"
- " -x path --vmlinux=<path> # the vmlinux binary, required for -s use\n"
- " -z --zero # zero counts after display\n"
- " -D --dump_symtab # dump symbol table to stderr on startup\n"


- " -m pages --mmap_pages=<pages> # number of mmap data pages\n"

- " -M --mmap_info # print mmap info stream\n"
- " -U --munmap_info # print munmap info stream\n"
- );
-

- if (sscanf(str, "r%llx", &config) == 1)
- return config | PERF_COUNTER_RAW_MASK;
-

- if (sscanf(str, "%d:%llu", &type, &id) == 2)
- return EID(type, id);

/*
* Symbols
*/
@@ -331,7 +130,6 @@ struct sym_entry {
char *sym;
unsigned long count[MAX_COUNTERS];
int skip;
- struct source_line *source;
};

#define MAX_SYMS 100000
@@ -342,8 +140,6 @@ struct sym_entry *sym_filter_entry;

static struct sym_entry sym_table[MAX_SYMS];

-static void show_details(struct sym_entry *sym);
-
/*
* Ordering weight: count-1 * count-2 * ... / count-n
*/
@@ -419,15 +215,15 @@ static void print_sym_table(void)

printf( "], ");



- if (tid != -1)

- printf(" (tid: %d", tid);
+ if (target_pid != -1)
+ printf(" (target_pid: %d", target_pid);
else
printf(" (all");

if (profile_cpu != -1)
printf(", cpu: %d)\n", profile_cpu);
else {


- if (tid != -1)

+ if (target_pid != -1)
printf(")\n");
else
printf(", %d CPUs)\n", nr_cpus);
@@ -463,9 +259,6 @@ static void print_sym_table(void)


pcnt, tmp[i].addr, tmp[i].sym);
}

- if (sym_filter_entry)
- show_details(sym_filter_entry);
-
{
struct pollfd stdin_poll = { .fd = 0, .events = POLLIN };

@@ -628,134 +421,8 @@ static void parse_symbols(void)
}
}

-/*
- * Source lines
- */
-
-static void parse_vmlinux(char *filename)
-{
- FILE *file;
- char command[PATH_MAX*2];
- if (!filename)
- return;
-
- sprintf(command, "objdump --start-address=0x%016lx --stop-address=0x%016lx -dS %s", filter_start, filter_end, filename);
-
- file = popen(command, "r");
- if (!file)
- return;
-
- lines_tail = &lines;
- while (!feof(file)) {
- struct source_line *src;
- size_t dummy = 0;
- char *c;
-
- src = malloc(sizeof(struct source_line));
- assert(src != NULL);
- memset(src, 0, sizeof(struct source_line));
-
- if (getline(&src->line, &dummy, file) < 0)
- break;
- if (!src->line)
- break;
-
- c = strchr(src->line, '\n');
- if (c)
- *c = 0;
-
- src->next = NULL;
- *lines_tail = src;
- lines_tail = &src->next;
-
- if (strlen(src->line)>8 && src->line[8] == ':')
- src->EIP = strtoull(src->line, NULL, 16);
- if (strlen(src->line)>8 && src->line[16] == ':')
- src->EIP = strtoull(src->line, NULL, 16);
- }
- pclose(file);
-}
-
-static void record_precise_ip(uint64_t ip)
-{
- struct source_line *line;
-
- for (line = lines; line; line = line->next) {
- if (line->EIP == ip)
- line->count++;
- if (line->EIP > ip)


- break;
- }
-}
-

-static void lookup_sym_in_vmlinux(struct sym_entry *sym)
-{
- struct source_line *line;
- char pattern[PATH_MAX];
- sprintf(pattern, "<%s>:", sym->sym);
-
- for (line = lines; line; line = line->next) {
- if (strstr(line->line, pattern)) {
- sym->source = line;
- break;
- }
- }
-}

-
-static void show_lines(struct source_line *line_queue, int line_queue_count)
-{
- int i;
- struct source_line *line;
-
- line = line_queue;
- for (i = 0; i < line_queue_count; i++) {
- printf("%8li\t%s\n", line->count, line->line);
- line = line->next;
- }
-}
-
#define TRACE_COUNT 3

-static void show_details(struct sym_entry *sym)
-{
- struct source_line *line;
- struct source_line *line_queue = NULL;
- int displayed = 0;
- int line_queue_count = 0;
-
- if (!sym->source)
- lookup_sym_in_vmlinux(sym);
- if (!sym->source)
- return;
-
- printf("Showing details for %s\n", sym->sym);
-
- line = sym->source;
- while (line) {
- if (displayed && strstr(line->line, ">:"))
- break;
-
- if (!line_queue_count)
- line_queue = line;
- line_queue_count ++;
-
- if (line->count >= count_filter) {
- show_lines(line_queue, line_queue_count);
- line_queue_count = 0;
- line_queue = NULL;
- } else if (line_queue_count > TRACE_COUNT) {
- line_queue = line_queue->next;
- line_queue_count --;
- }
-
- line->count = 0;
- displayed++;
- if (displayed > 300)
- break;
- line = line->next;
- }
-}
-
/*
* Binary search in the histogram table and record the hit:
*/
@@ -764,8 +431,6 @@ static void record_ip(uint64_t ip, int counter)
int left_idx, middle_idx, right_idx, idx;
unsigned long left, middle, right;

- record_precise_ip(ip);
-
left_idx = 0;
right_idx = sym_table_count-1;
assert(ip <= max_ip && ip >= min_ip);
@@ -822,97 +487,6 @@ static void process_event(uint64_t ip, int counter)
record_ip(ip, counter);


}

-static void process_options(int argc, char **argv)
-{
- int error = 0, counter;
-
- for (;;) {
- int option_index = 0;
- /** Options for getopt */
- static struct option long_options[] = {
- {"count", required_argument, NULL, 'c'},
- {"cpu", required_argument, NULL, 'C'},
- {"delay", required_argument, NULL, 'd'},
- {"dump_symtab", no_argument, NULL, 'D'},
- {"event", required_argument, NULL, 'e'},
- {"filter", required_argument, NULL, 'f'},
- {"group", required_argument, NULL, 'g'},
- {"help", no_argument, NULL, 'h'},
- {"nmi", required_argument, NULL, 'n'},

- {"mmap_info", no_argument, NULL, 'M'},
- {"mmap_pages", required_argument, NULL, 'm'},


- {"munmap_info", no_argument, NULL, 'U'},
- {"pid", required_argument, NULL, 'p'},
- {"realtime", required_argument, NULL, 'r'},
- {"scale", no_argument, NULL, 'l'},
- {"symbol", required_argument, NULL, 's'},
- {"stat", no_argument, NULL, 'S'},
- {"vmlinux", required_argument, NULL, 'x'},
- {"zero", no_argument, NULL, 'z'},

- {"freq", required_argument, NULL, 'F'},


- {NULL, 0, NULL, 0 }
- };

- int c = getopt_long(argc, argv, "+:ac:C:d:De:f:g:hln:m:p:r:s:Sx:zMUF:",
- long_options, &option_index);
- if (c == -1)
- break;
-
- switch (c) {
- case 'a': system_wide = 1; break;
- case 'c': default_interval = atoi(optarg); break;
- case 'C':
- /* CPU and PID are mutually exclusive */
- if (tid != -1) {
- printf("WARNING: CPU switch overriding PID\n");
- sleep(1);
- tid = -1;
- }
- profile_cpu = atoi(optarg); break;

- case 'd': delay_secs = atoi(optarg); break;
- case 'D': dump_symtab = 1; break;
-
- case 'e': error = parse_events(optarg); break;
-

- case 'f': count_filter = atoi(optarg); break;
- case 'g': group = atoi(optarg); break;
- case 'h': display_help(); break;
- case 'l': scale = 1; break;
- case 'n': nmi = atoi(optarg); break;
- case 'p':
- /* CPU and PID are mutually exclusive */
- if (profile_cpu != -1) {
- printf("WARNING: PID switch overriding CPU\n");
- sleep(1);
- profile_cpu = -1;
- }
- tid = atoi(optarg); break;

- case 'r': realtime_prio = atoi(optarg); break;
- case 's': sym_filter = strdup(optarg); break;
- case 'x': vmlinux = strdup(optarg); break;
- case 'z': zero = 1; break;

- case 'm': mmap_pages = atoi(optarg); break;
- case 'M': use_mmap = 1; break;
- case 'U': use_munmap = 1; break;
- case 'F': freq = 1; default_interval = atoi(optarg); break;


- default: error = 1; break;
- }
- }
- if (error)
- display_help();

-
- if (!nr_counters) {
- nr_counters = 1;
- event_id[0] = 0;
- }
-
- for (counter = 0; counter < nr_counters; counter++) {
- if (event_count[counter])
- continue;
-

- event_count[counter] = default_interval;
- }
-}


-
struct mmap_data {
int counter;
void *base;

@@ -973,11 +547,11 @@ static void mmap_read(struct mmap_data *md)
struct ip_event {
struct perf_event_header header;
__u64 ip;
- __u32 pid, tid;
+ __u32 pid, target_pid;
};
struct mmap_event {
struct perf_event_header header;
- __u32 pid, tid;
+ __u32 pid, target_pid;
__u64 start;
__u64 len;
__u64 pgoff;
@@ -1043,7 +617,7 @@ static void mmap_read(struct mmap_data *md)


static struct pollfd event_array[MAX_NR_CPUS * MAX_COUNTERS];

static struct mmap_data mmap_array[MAX_NR_CPUS][MAX_COUNTERS];

-int cmd_top(int argc, char **argv, const char *prefix)
+static int __cmd_top(void)
{


struct perf_counter_hw_event hw_event;
pthread_t thread;

@@ -1051,27 +625,12 @@ int cmd_top(int argc, char **argv, const char *prefix)
unsigned int cpu;
int ret;



- page_size = sysconf(_SC_PAGE_SIZE);
-
- process_options(argc, argv);
-

- nr_cpus = sysconf(_SC_NPROCESSORS_ONLN);
- assert(nr_cpus <= MAX_NR_CPUS);
- assert(nr_cpus >= 0);
-
- if (tid != -1 || profile_cpu != -1)
- nr_cpus = 1;
-
- parse_symbols();
- if (vmlinux && sym_filter_entry)
- parse_vmlinux(vmlinux);
-
for (i = 0; i < nr_cpus; i++) {
group_fd = -1;
for (counter = 0; counter < nr_counters; counter++) {

cpu = profile_cpu;
- if (tid == -1 && profile_cpu == -1)
+ if (target_pid == -1 && profile_cpu == -1)
cpu = i;

memset(&hw_event, 0, sizeof(hw_event));
@@ -1083,7 +642,7 @@ int cmd_top(int argc, char **argv, const char *prefix)
hw_event.munmap = use_munmap;
hw_event.freq = freq;

- fd[i][counter] = sys_perf_counter_open(&hw_event, tid, cpu, group_fd, 0);
+ fd[i][counter] = sys_perf_counter_open(&hw_event, target_pid, cpu, group_fd, 0);
if (fd[i][counter] < 0) {
int err = errno;
printf("kerneltop error: syscall returned with %d (%s)\n",
@@ -1147,3 +706,95 @@ int cmd_top(int argc, char **argv, const char *prefix)

return 0;
}
+
+static const char * const top_usage[] = {
+ "perf top [<options>]",
+ NULL
+};
+
+static char events_help_msg[EVENTS_HELP_MAX];
+
+static const struct option options[] = {
+ OPT_CALLBACK('e', "event", NULL, "event",
+ events_help_msg, parse_events),
+ OPT_INTEGER('c', "count", &default_interval,
+ "event period to sample"),
+ OPT_INTEGER('p', "pid", &target_pid,
+ "profile events on existing pid"),
+ OPT_BOOLEAN('a', "all-cpus", &system_wide,
+ "system-wide collection from all CPUs"),
+ OPT_INTEGER('C', "CPU", &profile_cpu,
+ "CPU to profile on"),
+ OPT_INTEGER('m', "mmap-pages", &mmap_pages,
+ "number of mmap data pages"),
+ OPT_INTEGER('r', "realtime", &realtime_prio,
+ "collect data with this RT SCHED_FIFO priority"),
+ OPT_INTEGER('d', "delay", &delay_secs,
+ "number of seconds to delay between refreshes"),
+ OPT_BOOLEAN('D', "dump-symtab", &dump_symtab,
+ "dump the symbol table used for profiling"),
+ OPT_INTEGER('f', "count-filter", &count_filter,
+ "only display functions with more events than this"),
+ OPT_BOOLEAN('g', "group", &group,
+ "put the counters into a counter group"),
+ OPT_STRING('s', "sym-filter", &sym_filter, "pattern",
+ "only display symbols matching this pattern"),
+ OPT_BOOLEAN('z', "zero", &zero,
+ "zero history across updates"),
+ OPT_BOOLEAN('M', "use-mmap", &use_mmap,
+ "track mmap events"),
+ OPT_BOOLEAN('U', "use-munmap", &use_munmap,
+ "track munmap events"),
+ OPT_INTEGER('F', "freq", &freq,
+ "profile at this frequency"),
+ OPT_END()
+};
+
+int cmd_top(int argc, const char **argv, const char *prefix)
+{
+ int counter;
+
+ page_size = sysconf(_SC_PAGE_SIZE);
+
+ create_events_help(events_help_msg);
+ memcpy(event_id, default_event_id, sizeof(default_event_id));
+

+ argc = parse_options(argc, argv, options, top_usage, 0);
+ if (argc)
+ usage_with_options(top_usage, options);
+
+ if (freq) {
+ default_interval = freq;
+ freq = 1;
+ }
+
+ /* CPU and PID are mutually exclusive */
+ if (target_pid != -1 && profile_cpu != -1) {
+ printf("WARNING: PID switch overriding CPU\n");
+ sleep(1);
+ profile_cpu = -1;
+ }


+
+ if (!nr_counters) {
+ nr_counters = 1;
+ event_id[0] = 0;
+ }
+
+ for (counter = 0; counter < nr_counters; counter++) {
+ if (event_count[counter])
+ continue;
+
+ event_count[counter] = default_interval;
+ }
+

+ nr_cpus = sysconf(_SC_NPROCESSORS_ONLN);
+ assert(nr_cpus <= MAX_NR_CPUS);
+ assert(nr_cpus >= 0);
+
+ if (target_pid != -1 || profile_cpu != -1)
+ nr_cpus = 1;
+
+ parse_symbols();
+
+ return __cmd_top();
+}

tip-bot for Ingo Molnar
May 26, 2009, 8:20:08 AM
Commit-ID: 0bec253c813fbb067db4dfd9f5b6cec1bd2ef026
Gitweb: http://git.kernel.org/tip/0bec253c813fbb067db4dfd9f5b6cec1bd2ef026

Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Tue, 26 May 2009 09:17:18 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Tue, 26 May 2009 13:11:57 +0200

perf report: Add help/manpage

Add a (minimal) manpage for perf report.

Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
.../perf_counter/Documentation/perf-report.txt | 32 ++++++++++++++++++++
1 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/Documentation/perf_counter/Documentation/perf-report.txt b/Documentation/perf_counter/Documentation/perf-report.txt
new file mode 100644
index 0000000..64696a2
--- /dev/null
+++ b/Documentation/perf_counter/Documentation/perf-report.txt
@@ -0,0 +1,32 @@
+perf-report(1)
+==============
+
+NAME
+----
+perf-report - Read output.perf (created by perf record) and display the profile
+
+SYNOPSIS
+--------
+[verse]
+'perf report' [-i <file> | --input=file]
+
+DESCRIPTION
+-----------
+This command displays the performance counter profile information recorded
+via perf record.
+
+OPTIONS
+-------
+-i::
+--input=::
+ Input file name. (default: output.perf)
+
+Configuration
+-------------
+
+EXAMPLES
+--------
+
+SEE ALSO
+--------
+linkperf:perf-stat[1]

tip-bot for Arnaldo Carvalho de Melo
May 26, 2009, 8:20:11 AM
Commit-ID: ce7e43653b08db094326f378958bc293a68e8e5b
Gitweb: http://git.kernel.org/tip/ce7e43653b08db094326f378958bc293a68e8e5b
Author: Arnaldo Carvalho de Melo <ac...@redhat.com>
AuthorDate: Tue, 19 May 2009 09:30:23 -0300
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Tue, 26 May 2009 13:52:55 +0200

perf_counter: Use rb_tree for symhists and threads in report

Signed-off-by: Arnaldo Carvalho de Melo <ac...@redhat.com>
Acked-by: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>
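The find-or-insert walk that threads__findnew() performs in the diff below can be sketched on a plain binary search tree. This is a toy stand-in: the kernel's rb_link_node()/rb_insert_color() additionally rebalance the tree, and the names here (thread_node, threads_findnew) are invented for illustration:

```c
#include <assert.h>
#include <stdlib.h>

struct thread_node {
	struct thread_node *left, *right;
	int pid;
	int hits;	/* how often this pid was looked up */
};

static struct thread_node *troot;

/* Walk down keeping a pointer to the link we came through, exactly as
 * the patch walks rb_node **p; on a miss the new node is linked in at
 * that slot (the kernel then calls rb_insert_color() to rebalance). */
static struct thread_node *threads_findnew(int pid)
{
	struct thread_node **p = &troot;
	struct thread_node *th;

	while (*p != NULL) {
		th = *p;
		if (th->pid == pid) {
			th->hits++;
			return th;
		}
		p = pid < th->pid ? &th->left : &th->right;
	}

	th = calloc(1, sizeof(*th));
	if (th != NULL) {
		th->pid = pid;
		th->hits = 1;
		*p = th;
	}
	return th;
}
```

The design point of the patch is exactly this: one downward walk serves both lookup and insertion, replacing the O(n) list_for_each_entry() scans.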


---
Documentation/perf_counter/builtin-report.c | 178 +++++++++++---------------
1 files changed, 75 insertions(+), 103 deletions(-)

diff --git a/Documentation/perf_counter/builtin-report.c b/Documentation/perf_counter/builtin-report.c
index f63057f..e857201 100644
--- a/Documentation/perf_counter/builtin-report.c
+++ b/Documentation/perf_counter/builtin-report.c
@@ -479,23 +479,25 @@ static size_t map__fprintf(struct map *self, FILE *fp)
}

struct symhist {
- struct list_head node;
+ struct rb_node rb_node;
struct dso *dso;
struct symbol *sym;
+ uint64_t ip;
uint32_t count;
char level;
};

-static struct symhist *symhist__new(struct symbol *sym, struct dso *dso,
- char level)
+static struct symhist *symhist__new(struct symbol *sym, uint64_t ip,
+ struct dso *dso, char level)
{
struct symhist *self = malloc(sizeof(*self));

if (self != NULL) {
self->sym = sym;
+ self->ip = ip;
self->dso = dso;
self->level = level;
- self->count = 0;
+ self->count = 1;
}

return self;
@@ -506,12 +508,6 @@ static void symhist__delete(struct symhist *self)
free(self);
}

-static bool symhist__equal(struct symhist *self, struct symbol *sym,
- struct dso *dso, char level)
-{
- return self->level == level && self->sym == sym && self->dso == dso;
-}
-
static void symhist__inc(struct symhist *self)
{
++self->count;
@@ -519,7 +515,7 @@ static void symhist__inc(struct symhist *self)

static size_t symhist__fprintf(struct symhist *self, FILE *fp)
{
- size_t ret = fprintf(fp, "[%c] ", self->level);
+ size_t ret = fprintf(fp, "%#llx [%c] ", (unsigned long long)self->ip, self->level);

if (self->level != '.')
ret += fprintf(fp, "%s", self->sym->name);
@@ -531,9 +527,9 @@ static size_t symhist__fprintf(struct symhist *self, FILE *fp)
}

struct thread {
- struct list_head node;
+ struct rb_node rb_node;
struct list_head maps;
- struct list_head symhists;
+ struct rb_root symhists;
pid_t pid;
char *comm;
};
@@ -546,47 +542,43 @@ static struct thread *thread__new(pid_t pid)
self->pid = pid;
self->comm = NULL;
INIT_LIST_HEAD(&self->maps);
- INIT_LIST_HEAD(&self->symhists);
+ self->symhists = RB_ROOT;
}

return self;
}

-static void thread__insert_symhist(struct thread *self,
- struct symhist *symhist)
-{
- list_add_tail(&symhist->node, &self->symhists);
-}
-
-static struct symhist *thread__symhists_find(struct thread *self,
- struct symbol *sym,
- struct dso *dso, char level)
+static int thread__symbol_incnew(struct thread *self, struct symbol *sym,
+ uint64_t ip, struct dso *dso, char level)
{
- struct symhist *pos;
+ struct rb_node **p = &self->symhists.rb_node;
+ struct rb_node *parent = NULL;
+ struct symhist *sh;

- list_for_each_entry(pos, &self->symhists, node)
- if (symhist__equal(pos, sym, dso, level))
- return pos;
+ while (*p != NULL) {
+ parent = *p;
+ sh = rb_entry(parent, struct symhist, rb_node);

- return NULL;
-}
+ if (sh->sym == sym || ip == sh->ip) {
+ symhist__inc(sh);
+ return 0;
+ }

-static int thread__symbol_incnew(struct thread *self, struct symbol *sym,
- struct dso *dso, char level)
-{
- struct symhist *symhist = thread__symhists_find(self, sym, dso, level);
+ /* Handle unresolved symbols too */
+ const uint64_t start = !sh->sym ? sh->ip : sh->sym->start;

- if (symhist == NULL) {
- symhist = symhist__new(sym, dso, level);
- if (symhist == NULL)
- goto out_error;
- thread__insert_symhist(self, symhist);
+ if (ip < start)
+ p = &(*p)->rb_left;
+ else
+ p = &(*p)->rb_right;
}

- symhist__inc(symhist);
+ sh = symhist__new(sym, ip, dso, level);
+ if (sh == NULL)
+ return -ENOMEM;
+ rb_link_node(&sh->rb_node, parent, p);
+ rb_insert_color(&sh->rb_node, &self->symhists);
return 0;
-out_error:
- return -ENOMEM;
}

static int thread__set_comm(struct thread *self, const char *comm)
@@ -608,43 +600,44 @@ static size_t thread__maps_fprintf(struct thread *self, FILE *fp)

static size_t thread__fprintf(struct thread *self, FILE *fp)
{
- struct symhist *pos;
int ret = fprintf(fp, "thread: %d %s\n", self->pid, self->comm);
+ struct rb_node *nd;

- list_for_each_entry(pos, &self->symhists, node)
+ for (nd = rb_first(&self->symhists); nd; nd = rb_next(nd)) {
+ struct symhist *pos = rb_entry(nd, struct symhist, rb_node);
ret += symhist__fprintf(pos, fp);
+ }

return ret;
}

-static LIST_HEAD(threads);
+static struct rb_root threads = RB_ROOT;

-static void threads__add(struct thread *thread)
-{
- list_add_tail(&thread->node, &threads);
-}
-
-static struct thread *threads__find(pid_t pid)
+static struct thread *threads__findnew(pid_t pid)
{
- struct thread *pos;
+ struct rb_node **p = &threads.rb_node;
+ struct rb_node *parent = NULL;
+ struct thread *th;

- list_for_each_entry(pos, &threads, node)
- if (pos->pid == pid)
- return pos;
- return NULL;
-}
+ while (*p != NULL) {
+ parent = *p;
+ th = rb_entry(parent, struct thread, rb_node);

-static struct thread *threads__findnew(pid_t pid)
-{
- struct thread *thread = threads__find(pid);
+ if (th->pid == pid)
+ return th;

- if (thread == NULL) {
- thread = thread__new(pid);
- if (thread != NULL)
- threads__add(thread);
+ if (pid < th->pid)
+ p = &(*p)->rb_left;
+ else
+ p = &(*p)->rb_right;
}

- return thread;
+ th = thread__new(pid);
+ if (th != NULL) {
+ rb_link_node(&th->rb_node, parent, p);
+ rb_insert_color(&th->rb_node, &threads);
+ }
+ return th;
}

static void thread__insert_map(struct thread *self, struct map *map)
@@ -668,44 +661,13 @@ static struct map *thread__find_map(struct thread *self, uint64_t ip)

static void threads__fprintf(FILE *fp)
{
- struct thread *pos;
-
- list_for_each_entry(pos, &threads, node)
+ struct rb_node *nd;
+ for (nd = rb_first(&threads); nd; nd = rb_next(nd)) {
+ struct thread *pos = rb_entry(nd, struct thread, rb_node);
thread__fprintf(pos, fp);
+ }
}

-#if 0
-static std::string resolve_user_symbol(int pid, uint64_t ip)
-{
- std::string sym = "<unknown>";
-
- maps_t &m = maps[pid];
- maps_t::const_iterator mi = m.upper_bound(map(ip));
- if (mi == m.end())
- return sym;
-
- ip -= mi->start + mi->pgoff;
-
- symbols_t &s = dsos[mi->dso].syms;
- symbols_t::const_iterator si = s.upper_bound(symbol(ip));
-
- sym = mi->dso + ": <unknown>";
-
- if (si == s.begin())
- return sym;
- si--;
-
- if (si->start <= ip && ip < si->end)
- sym = mi->dso + ": " + si->name;
-#if 0
- else if (si->start <= ip)
- sym = mi->dso + ": ?" + si->name;
-#endif
-
- return sym;
-}
-#endif
-
static void display_help(void)
{
printf(
@@ -824,8 +786,11 @@ more:
struct dso *dso = NULL;
struct thread *thread = threads__findnew(event->ip.pid);

- if (thread == NULL)
+ if (thread == NULL) {
+ fprintf(stderr, "problem processing %d event, bailing out\n",
+ event->header.type);
goto done;
+ }

if (event->header.misc & PERF_EVENT_MISC_KERNEL) {
show = SHOW_KERNEL;
@@ -845,8 +810,11 @@ more:
if (show & show_mask) {
struct symbol *sym = dso__find_symbol(dso, event->ip.ip);

- if (thread__symbol_incnew(thread, sym, dso, level))
+ if (thread__symbol_incnew(thread, sym, event->ip.ip,
+ dso, level)) {
+ fprintf(stderr, "problem incrementing symbol count, bailing out\n");
goto done;
+ }
}
total++;
} else switch (event->header.type) {
@@ -854,8 +822,10 @@ more:
struct thread *thread = threads__findnew(event->mmap.pid);
struct map *map = map__new(&event->mmap);

- if (thread == NULL || map == NULL )
+ if (thread == NULL || map == NULL) {
+ fprintf(stderr, "problem processing PERF_EVENT_MMAP, bailing out\n");
goto done;
+ }
thread__insert_map(thread, map);
break;
}
@@ -863,8 +833,10 @@ more:
struct thread *thread = threads__findnew(event->comm.pid);

if (thread == NULL ||
- thread__set_comm(thread, event->comm.comm))
+ thread__set_comm(thread, event->comm.comm)) {
+ fprintf(stderr, "problem processing PERF_EVENT_COMM, bailing out\n");
goto done;
+ }
break;

tip-bot for Arnaldo Carvalho de Melo
May 26, 2009, 8:20:13 AM
Commit-ID: 040e6034124c504d536736ce08e4643e640cd7c2
Gitweb: http://git.kernel.org/tip/040e6034124c504d536736ce08e4643e640cd7c2

Author: Arnaldo Carvalho de Melo <ac...@redhat.com>
AuthorDate: Mon, 18 May 2009 16:25:31 -0300

Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Tue, 26 May 2009 13:52:55 +0200

perf_counter: Add our private copy of list.h

Signed-off-by: Arnaldo Carvalho de Melo <ac...@redhat.com>
Acked-by: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
.../perf_counter/util}/list.h | 238 ++++++--------------
1 files changed, 73 insertions(+), 165 deletions(-)

diff --git a/include/linux/list.h b/Documentation/perf_counter/util/list.h
similarity index 74%
copy from include/linux/list.h
copy to Documentation/perf_counter/util/list.h
index 969f6e9..e2548e8 100644
--- a/include/linux/list.h
+++ b/Documentation/perf_counter/util/list.h
@@ -1,10 +1,33 @@
#ifndef _LINUX_LIST_H
#define _LINUX_LIST_H
+/*
+ Copyright (C) Cast of dozens, comes from the Linux kernel
+
+ This program is free software; you can redistribute it and/or modify it
+ under the terms of version 2 of the GNU General Public License as
+ published by the Free Software Foundation.
+*/
+
+#include <stddef.h>
+
+/*
+ * These are non-NULL pointers that will result in page faults
+ * under normal circumstances, used to verify that nobody uses
+ * non-initialized list entries.
+ */
+#define LIST_POISON1 ((void *)0x00100100)
+#define LIST_POISON2 ((void *)0x00200200)

-#include <linux/stddef.h>
-#include <linux/poison.h>
-#include <linux/prefetch.h>
-#include <asm/system.h>
+/**
+ * container_of - cast a member of a structure out to the containing structure
+ * @ptr: the pointer to the member.
+ * @type: the type of the container struct this is embedded in.
+ * @member: the name of the member within the struct.
+ *
+ */
+#define container_of(ptr, type, member) ({ \
+ const typeof( ((type *)0)->member ) *__mptr = (ptr); \
+ (type *)( (char *)__mptr - offsetof(type,member) );})

/*
* Simple doubly linked list implementation.
@@ -37,7 +60,6 @@ static inline void INIT_LIST_HEAD(struct list_head *list)
* This is only for internal list manipulation where we know
* the prev/next entries already!
*/
-#ifndef CONFIG_DEBUG_LIST
static inline void __list_add(struct list_head *new,
struct list_head *prev,
struct list_head *next)
@@ -47,11 +69,6 @@ static inline void __list_add(struct list_head *new,
new->prev = prev;
prev->next = new;
}
-#else
-extern void __list_add(struct list_head *new,
- struct list_head *prev,
- struct list_head *next);
-#endif

/**
* list_add - add a new entry
@@ -66,7 +83,6 @@ static inline void list_add(struct list_head *new, struct list_head *head)
__list_add(new, head, head->next);
}

-
/**
* list_add_tail - add a new entry
* @new: new entry to be added
@@ -96,26 +112,35 @@ static inline void __list_del(struct list_head * prev, struct list_head * next)
/**
* list_del - deletes entry from list.
* @entry: the element to delete from the list.
- * Note: list_empty() on entry does not return true after this, the entry is
+ * Note: list_empty on entry does not return true after this, the entry is
* in an undefined state.
*/
-#ifndef CONFIG_DEBUG_LIST
static inline void list_del(struct list_head *entry)
{
__list_del(entry->prev, entry->next);
entry->next = LIST_POISON1;
entry->prev = LIST_POISON2;
}
-#else
-extern void list_del(struct list_head *entry);
-#endif
+
+/**
+ * list_del_range - deletes range of entries from list.
+ * @begin: first element in the range to delete from the list.
+ * @end: last element in the range to delete from the list.
+ * Note: list_empty on the range of entries does not return true after this,
+ * the entries are in an undefined state.
+ */
+static inline void list_del_range(struct list_head *begin,
+ struct list_head *end)
+{
+ begin->prev->next = end->next;
+ end->next->prev = begin->prev;
+}

/**
* list_replace - replace old entry by new one
* @old : the element to be replaced
* @new : the new element to insert
- *
- * If @old was empty, it will be overwritten.
+ * Note: if 'old' was empty, it will be overwritten.
*/
static inline void list_replace(struct list_head *old,
struct list_head *new)
@@ -150,8 +175,8 @@ static inline void list_del_init(struct list_head *entry)
*/
static inline void list_move(struct list_head *list, struct list_head *head)
{
- __list_del(list->prev, list->next);
- list_add(list, head);
+ __list_del(list->prev, list->next);
+ list_add(list, head);
}

/**
@@ -162,8 +187,8 @@ static inline void list_move(struct list_head *list, struct list_head *head)
static inline void list_move_tail(struct list_head *list,
struct list_head *head)
{
- __list_del(list->prev, list->next);
- list_add_tail(list, head);
+ __list_del(list->prev, list->next);
+ list_add_tail(list, head);
}

/**
@@ -205,91 +230,29 @@ static inline int list_empty_careful(const struct list_head *head)
return (next == head) && (next == head->prev);
}

-/**
- * list_is_singular - tests whether a list has just one entry.
- * @head: the list to test.
- */
-static inline int list_is_singular(const struct list_head *head)
-{
- return !list_empty(head) && (head->next == head->prev);
-}
-
-static inline void __list_cut_position(struct list_head *list,
- struct list_head *head, struct list_head *entry)
-{
- struct list_head *new_first = entry->next;
- list->next = head->next;
- list->next->prev = list;
- list->prev = entry;
- entry->next = list;
- head->next = new_first;
- new_first->prev = head;
-}
-
-/**
- * list_cut_position - cut a list into two
- * @list: a new list to add all removed entries
- * @head: a list with entries
- * @entry: an entry within head, could be the head itself
- * and if so we won't cut the list
- *
- * This helper moves the initial part of @head, up to and
- * including @entry, from @head to @list. You should
- * pass on @entry an element you know is on @head. @list
- * should be an empty list or a list you do not care about
- * losing its data.
- *
- */
-static inline void list_cut_position(struct list_head *list,
- struct list_head *head, struct list_head *entry)
-{
- if (list_empty(head))
- return;
- if (list_is_singular(head) &&
- (head->next != entry && head != entry))
- return;
- if (entry == head)
- INIT_LIST_HEAD(list);
- else
- __list_cut_position(list, head, entry);
-}
-
-static inline void __list_splice(const struct list_head *list,
- struct list_head *prev,
- struct list_head *next)
+static inline void __list_splice(struct list_head *list,
+ struct list_head *head)
{
struct list_head *first = list->next;
struct list_head *last = list->prev;
+ struct list_head *at = head->next;

- first->prev = prev;
- prev->next = first;
-
- last->next = next;
- next->prev = last;
-}
+ first->prev = head;
+ head->next = first;

-/**
- * list_splice - join two lists, this is designed for stacks
- * @list: the new list to add.
- * @head: the place to add it in the first list.
- */
-static inline void list_splice(const struct list_head *list,
- struct list_head *head)
-{
- if (!list_empty(list))
- __list_splice(list, head, head->next);
+ last->next = at;
+ at->prev = last;
}

/**
- * list_splice_tail - join two lists, each list being a queue
+ * list_splice - join two lists
* @list: the new list to add.
* @head: the place to add it in the first list.
*/
-static inline void list_splice_tail(struct list_head *list,
- struct list_head *head)
+static inline void list_splice(struct list_head *list, struct list_head *head)
{
if (!list_empty(list))
- __list_splice(list, head->prev, head);
+ __list_splice(list, head);
}

/**
@@ -303,24 +266,7 @@ static inline void list_splice_init(struct list_head *list,
struct list_head *head)
{
if (!list_empty(list)) {
- __list_splice(list, head, head->next);
- INIT_LIST_HEAD(list);
- }
-}
-
-/**
- * list_splice_tail_init - join two lists and reinitialise the emptied list
- * @list: the new list to add.
- * @head: the place to add it in the first list.
- *
- * Each of the lists is a queue.
- * The list at @list is reinitialised
- */
-static inline void list_splice_tail_init(struct list_head *list,
- struct list_head *head)
-{
- if (!list_empty(list)) {
- __list_splice(list, head->prev, head);
+ __list_splice(list, head);
INIT_LIST_HEAD(list);
}
}
@@ -336,9 +282,9 @@ static inline void list_splice_tail_init(struct list_head *list,

/**
* list_first_entry - get the first element from a list
- * @ptr: the list head to take the element from.
- * @type: the type of the struct this is embedded in.
- * @member: the name of the list_struct within the struct.
+ * @ptr: the list head to take the element from.
+ * @type: the type of the struct this is embedded in.
+ * @member: the name of the list_struct within the struct.
*
* Note, that list is expected to be not empty.
*/
@@ -351,7 +297,7 @@ static inline void list_splice_tail_init(struct list_head *list,
* @head: the head for your list.
*/
#define list_for_each(pos, head) \
- for (pos = (head)->next; prefetch(pos->next), pos != (head); \
+ for (pos = (head)->next; pos != (head); \
pos = pos->next)

/**
@@ -373,7 +319,7 @@ static inline void list_splice_tail_init(struct list_head *list,
* @head: the head for your list.
*/
#define list_for_each_prev(pos, head) \
- for (pos = (head)->prev; prefetch(pos->prev), pos != (head); \
+ for (pos = (head)->prev; pos != (head); \
pos = pos->prev)

/**
@@ -387,17 +333,6 @@ static inline void list_splice_tail_init(struct list_head *list,
pos = n, n = pos->next)

/**
- * list_for_each_prev_safe - iterate over a list backwards safe against removal of list entry
- * @pos: the &struct list_head to use as a loop cursor.
- * @n: another &struct list_head to use as temporary storage
- * @head: the head for your list.
- */
-#define list_for_each_prev_safe(pos, n, head) \
- for (pos = (head)->prev, n = pos->prev; \
- prefetch(pos->prev), pos != (head); \
- pos = n, n = pos->prev)
-
-/**
* list_for_each_entry - iterate over list of given type
* @pos: the type * to use as a loop cursor.
* @head: the head for your list.
@@ -405,7 +340,7 @@ static inline void list_splice_tail_init(struct list_head *list,
*/
#define list_for_each_entry(pos, head, member) \
for (pos = list_entry((head)->next, typeof(*pos), member); \
- prefetch(pos->member.next), &pos->member != (head); \
+ &pos->member != (head); \
pos = list_entry(pos->member.next, typeof(*pos), member))

/**
@@ -416,16 +351,16 @@ static inline void list_splice_tail_init(struct list_head *list,
*/
#define list_for_each_entry_reverse(pos, head, member) \
for (pos = list_entry((head)->prev, typeof(*pos), member); \
- prefetch(pos->member.prev), &pos->member != (head); \
+ &pos->member != (head); \
pos = list_entry(pos->member.prev, typeof(*pos), member))

/**
- * list_prepare_entry - prepare a pos entry for use in list_for_each_entry_continue()
+ * list_prepare_entry - prepare a pos entry for use in list_for_each_entry_continue
* @pos: the type * to use as a start point
* @head: the head of the list
* @member: the name of the list_struct within the struct.
*
- * Prepares a pos entry for use as a start point in list_for_each_entry_continue().
+ * Prepares a pos entry for use as a start point in list_for_each_entry_continue.
*/
#define list_prepare_entry(pos, head, member) \
((pos) ? : list_entry(head, typeof(*pos), member))
@@ -441,24 +376,10 @@ static inline void list_splice_tail_init(struct list_head *list,
*/
#define list_for_each_entry_continue(pos, head, member) \
for (pos = list_entry(pos->member.next, typeof(*pos), member); \
- prefetch(pos->member.next), &pos->member != (head); \
+ &pos->member != (head); \
pos = list_entry(pos->member.next, typeof(*pos), member))

/**
- * list_for_each_entry_continue_reverse - iterate backwards from the given point
- * @pos: the type * to use as a loop cursor.
- * @head: the head for your list.
- * @member: the name of the list_struct within the struct.
- *
- * Start to iterate over list of given type backwards, continuing after
- * the current position.
- */
-#define list_for_each_entry_continue_reverse(pos, head, member) \
- for (pos = list_entry(pos->member.prev, typeof(*pos), member); \
- prefetch(pos->member.prev), &pos->member != (head); \
- pos = list_entry(pos->member.prev, typeof(*pos), member))
-
-/**
* list_for_each_entry_from - iterate over list of given type from the current point
* @pos: the type * to use as a loop cursor.
* @head: the head for your list.
@@ -467,7 +388,7 @@ static inline void list_splice_tail_init(struct list_head *list,
* Iterate over list of given type, continuing from current position.
*/
#define list_for_each_entry_from(pos, head, member) \
- for (; prefetch(pos->member.next), &pos->member != (head); \
+ for (; &pos->member != (head); \
pos = list_entry(pos->member.next, typeof(*pos), member))

/**
@@ -619,23 +540,10 @@ static inline void hlist_add_after(struct hlist_node *n,
next->next->pprev = &next->next;
}

-/*
- * Move a list from one list head to another. Fixup the pprev
- * reference of the first entry if it exists.
- */
-static inline void hlist_move_list(struct hlist_head *old,
- struct hlist_head *new)
-{
- new->first = old->first;
- if (new->first)
- new->first->pprev = &new->first;
- old->first = NULL;
-}
-
#define hlist_entry(ptr, type, member) container_of(ptr,type,member)

#define hlist_for_each(pos, head) \
- for (pos = (head)->first; pos && ({ prefetch(pos->next); 1; }); \
+ for (pos = (head)->first; pos; \
pos = pos->next)

#define hlist_for_each_safe(pos, n, head) \
@@ -651,7 +559,7 @@ static inline void hlist_move_list(struct hlist_head *old,
*/
#define hlist_for_each_entry(tpos, pos, head, member) \
for (pos = (head)->first; \
- pos && ({ prefetch(pos->next); 1;}) && \
+ pos && \
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
pos = pos->next)

@@ -663,7 +571,7 @@ static inline void hlist_move_list(struct hlist_head *old,
*/
#define hlist_for_each_entry_continue(tpos, pos, member) \
for (pos = (pos)->next; \
- pos && ({ prefetch(pos->next); 1;}) && \
+ pos && \
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
pos = pos->next)

@@ -674,7 +582,7 @@ static inline void hlist_move_list(struct hlist_head *old,
* @member: the name of the hlist_node within the struct.
*/
#define hlist_for_each_entry_from(tpos, pos, member) \
- for (; pos && ({ prefetch(pos->next); 1;}) && \
+ for (; pos && \
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
pos = pos->next)

tip-bot for Arnaldo Carvalho de Melo
May 26, 2009, 8:20:15 AM

Commit-ID: 62eb93905b3b43cea407cfbc061cc7b40ae1c6e9
Gitweb: http://git.kernel.org/tip/62eb93905b3b43cea407cfbc061cc7b40ae1c6e9

Author: Arnaldo Carvalho de Melo <ac...@redhat.com>
AuthorDate: Mon, 18 May 2009 14:28:47 -0300
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Tue, 26 May 2009 13:52:53 +0200

perf_counter: Implement dso__load using libelf

Signed-off-by: Arnaldo Carvalho de Melo <ac...@redhat.com>
Acked-by: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/Makefile | 2 +-
Documentation/perf_counter/builtin-report.c | 122 ++++++++++++++++++++++++++-
2 files changed, 121 insertions(+), 3 deletions(-)

diff --git a/Documentation/perf_counter/Makefile b/Documentation/perf_counter/Makefile
index 49c601e..6bffa86 100644
--- a/Documentation/perf_counter/Makefile
+++ b/Documentation/perf_counter/Makefile
@@ -160,7 +160,7 @@ uname_V := $(shell sh -c 'uname -v 2>/dev/null || echo not')
# CFLAGS and LDFLAGS are for the users to override from the command line.

CFLAGS = -g -O2 -Wall
-LDFLAGS = -lpthread -lrt
+LDFLAGS = -lpthread -lrt -lelf
ALL_CFLAGS = $(CFLAGS)
ALL_LDFLAGS = $(LDFLAGS)
STRIP ?= strip
diff --git a/Documentation/perf_counter/builtin-report.c b/Documentation/perf_counter/builtin-report.c
index 864f68f..ad2f327 100644
--- a/Documentation/perf_counter/builtin-report.c
+++ b/Documentation/perf_counter/builtin-report.c
@@ -8,6 +8,9 @@


#include <stdlib.h>
#include <string.h>
#include <limits.h>

+#include <gelf.h>
+#include <elf.h>
+#include <libelf.h>
#include <fcntl.h>
#include <stdio.h>
#include <errno.h>
@@ -195,10 +198,123 @@ static struct symbol *dso__find_symbol(struct dso *self, uint64_t ip)
return NULL;
}

+/**
+ * elf_symtab__for_each_symbol - iterate thru all the symbols
+ *
+ * @self: struct elf_symtab instance to iterate
+ * @index: uint32_t index
+ * @sym: GElf_Sym iterator
+ */
+#define elf_symtab__for_each_symbol(syms, nr_syms, index, sym) \
+ for (index = 0, gelf_getsym(syms, index, &sym);\
+ index < nr_syms; \
+ index++, gelf_getsym(syms, index, &sym))
+
+static inline uint8_t elf_sym__type(const GElf_Sym *sym)
+{
+ return GELF_ST_TYPE(sym->st_info);
+}
+
+static inline bool elf_sym__is_function(const GElf_Sym *sym)
+{
+ return elf_sym__type(sym) == STT_FUNC &&
+ sym->st_name != 0 &&
+ sym->st_shndx != SHN_UNDEF;
+}
+
+static inline const char *elf_sym__name(const GElf_Sym *sym,
+ const Elf_Data *symstrs)
+{
+ return symstrs->d_buf + sym->st_name;
+}
+
+static Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
+ GElf_Shdr *shp, const char *name,
+ size_t *index)
+{
+ Elf_Scn *sec = NULL;
+ size_t cnt = 1;
+
+ while ((sec = elf_nextscn(elf, sec)) != NULL) {
+ char *str;
+
+ gelf_getshdr(sec, shp);
+ str = elf_strptr(elf, ep->e_shstrndx, shp->sh_name);
+ if (!strcmp(name, str)) {
+ if (index)
+ *index = cnt;
+ break;
+ }
+ ++cnt;
+ }
+
+ return sec;
+}
+
static int dso__load(struct dso *self)
{
- /* FIXME */
- return 0;
+ int fd = open(self->name, O_RDONLY), err = -1;
+
+ if (fd == -1)
+ return -1;
+
+ Elf *elf = elf_begin(fd, ELF_C_READ_MMAP, NULL);
+ if (elf == NULL) {
+ fprintf(stderr, "%s: cannot read %s ELF file.\n",
+ __func__, self->name);
+ goto out_close;
+ }
+
+ GElf_Ehdr ehdr;
+ if (gelf_getehdr(elf, &ehdr) == NULL) {
+ fprintf(stderr, "%s: cannot get elf header.\n", __func__);
+ goto out_elf_end;
+ }
+
+ GElf_Shdr shdr;
+ Elf_Scn *sec = elf_section_by_name(elf, &ehdr, &shdr, ".symtab", NULL);
+ if (sec == NULL)
+ sec = elf_section_by_name(elf, &ehdr, &shdr, ".dynsym", NULL);
+
+ if (sec == NULL)
+ goto out_elf_end;
+
+ if (gelf_getshdr(sec, &shdr) == NULL)
+ goto out_elf_end;
+
+ Elf_Data *syms = elf_getdata(sec, NULL);
+ if (syms == NULL)
+ goto out_elf_end;
+
+ sec = elf_getscn(elf, shdr.sh_link);
+ if (sec == NULL)
+ goto out_elf_end;
+
+ Elf_Data *symstrs = elf_getdata(sec, NULL);
+ if (symstrs == NULL)
+ goto out_elf_end;
+
+ const uint32_t nr_syms = shdr.sh_size / shdr.sh_entsize;
+
+ GElf_Sym sym;
+ uint32_t index;
+ elf_symtab__for_each_symbol(syms, nr_syms, index, sym) {
+ if (!elf_sym__is_function(&sym))
+ continue;
+ struct symbol *f = symbol__new(sym.st_value, sym.st_size,
+ elf_sym__name(&sym, symstrs));
+ if (f == NULL)
+ goto out_elf_end;
+
+ dso__insert_symbol(self, f);
+ }
+
+ err = 0;
+out_elf_end:
+ elf_end(elf);
+out_close:
+ close(fd);
+ return err;
}

static size_t dso__fprintf(struct dso *self, FILE *fp)
@@ -614,6 +730,8 @@ int cmd_report(int argc, char **argv)
int ret, rc = EXIT_FAILURE;
unsigned long total = 0;

+ elf_version(EV_CURRENT);
+
page_size = getpagesize();

process_options(argc, argv);

tip-bot for Ingo Molnar
May 26, 2009, 8:20:16 AM

Commit-ID: 53cb8bc2a3d976efd1a800c3de4640a7220afbb3
Gitweb: http://git.kernel.org/tip/53cb8bc2a3d976efd1a800c3de4640a7220afbb3

Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Tue, 26 May 2009 09:17:18 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Tue, 26 May 2009 13:05:27 +0200

perf record: Convert to Git option parsing

Remove getopt usage and use Git's much more advanced and more compact
command option library.

Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Mike Galbraith <efa...@gmx.de>


Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>

Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: John Kacur <jka...@redhat.com>


LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/builtin-report.c | 126 ++++++++-------------------
1 files changed, 38 insertions(+), 88 deletions(-)

diff --git a/Documentation/perf_counter/builtin-report.c b/Documentation/perf_counter/builtin-report.c
index 21386a8..9e59d60 100644
--- a/Documentation/perf_counter/builtin-report.c
+++ b/Documentation/perf_counter/builtin-report.c
@@ -1,52 +1,29 @@
-#define _GNU_SOURCE


-#include <sys/types.h>
-#include <sys/stat.h>
-#include <sys/time.h>
-#include <unistd.h>
-#include <stdint.h>

-#include <stdbool.h>


-#include <stdlib.h>
-#include <string.h>
-#include <limits.h>

+#include "util/util.h"
+
+#include <libelf.h>
#include <gelf.h>
#include <elf.h>
-#include <libelf.h>


-#include <fcntl.h>
-#include <stdio.h>
-#include <errno.h>

-#include <ctype.h>
-#include <time.h>
-#include <getopt.h>
-#include <assert.h>
-#include <search.h>
-


-#include <sys/ioctl.h>
-#include <sys/poll.h>
-#include <sys/prctl.h>
-#include <sys/wait.h>

-#include <sys/mman.h>
-#include <sys/types.h>
-#include <sys/stat.h>


-
-#include <linux/unistd.h>
-#include <linux/types.h>
-

-#include "../../include/linux/perf_counter.h"
+
#include "util/list.h"
#include "util/rbtree.h"

+#include "perf.h"
+
+#include "util/parse-options.h"
+#include "util/parse-events.h"
+
#define SHOW_KERNEL 1
#define SHOW_USER 2
#define SHOW_HV 4

-static char const *input_name = "output.perf";
+static char const *input_name = "output.perf";
static int input;
static int show_mask = SHOW_KERNEL | SHOW_USER | SHOW_HV;

static unsigned long page_size;
static unsigned long mmap_window = 32;

-static const char *perf_event_names[] = {
+const char *perf_event_names[] = {
[PERF_EVENT_MMAP] = " PERF_EVENT_MMAP",
[PERF_EVENT_MUNMAP] = " PERF_EVENT_MUNMAP",
[PERF_EVENT_COMM] = " PERF_EVENT_COMM",
@@ -86,7 +63,7 @@ struct section {
char name[0];
};

-static struct section *section__new(uint64_t start, uint64_t size,
+struct section *section__new(uint64_t start, uint64_t size,
uint64_t offset, char *name)
{
struct section *self = malloc(sizeof(*self) + strlen(name) + 1);
@@ -241,7 +218,7 @@ static inline uint8_t elf_sym__type(const GElf_Sym *sym)
return GELF_ST_TYPE(sym->st_info);
}

-static inline bool elf_sym__is_function(const GElf_Sym *sym)
+static inline int elf_sym__is_function(const GElf_Sym *sym)
{


return elf_sym__type(sym) == STT_FUNC &&

sym->st_name != 0 &&

@@ -393,7 +370,7 @@ out_delete_dso:
return NULL;
}

-static void dsos__fprintf(FILE *fp)
+void dsos__fprintf(FILE *fp)
{
struct dso *pos;

@@ -503,7 +480,7 @@ static struct symhist *symhist__new(struct symbol *sym, uint64_t ip,
return self;
}

-static void symhist__delete(struct symhist *self)
+void symhist__delete(struct symhist *self)
{
free(self);
}
@@ -587,7 +564,7 @@ static int thread__set_comm(struct thread *self, const char *comm)
return self->comm ? 0 : -ENOMEM;
}

-static size_t thread__maps_fprintf(struct thread *self, FILE *fp)
+size_t thread__maps_fprintf(struct thread *self, FILE *fp)
{
struct map *pos;
size_t ret = 0;
@@ -668,49 +645,7 @@ static void threads__fprintf(FILE *fp)


}
}

-static void display_help(void)
-{
- printf(

- "Usage: perf-report [<options>]\n"
- " -i file --input=<file> # input file\n"


- );
-
- exit(0);
-}
-

-static void process_options(int argc, char *argv[])
-{
- int error = 0;


-
- for (;;) {
- int option_index = 0;
- /** Options for getopt */
- static struct option long_options[] = {

- {"input", required_argument, NULL, 'i'},
- {"no-user", no_argument, NULL, 'u'},
- {"no-kernel", no_argument, NULL, 'k'},
- {"no-hv", no_argument, NULL, 'h'},


- {NULL, 0, NULL, 0 }
- };

- int c = getopt_long(argc, argv, "+:i:kuh",


- long_options, &option_index);
- if (c == -1)
- break;
-
- switch (c) {

- case 'i': input_name = strdup(optarg); break;
- case 'k': show_mask &= ~SHOW_KERNEL; break;
- case 'u': show_mask &= ~SHOW_USER; break;
- case 'h': show_mask &= ~SHOW_HV; break;


- default: error = 1; break;
- }
- }
-
- if (error)
- display_help();
-}
-

-int cmd_report(int argc, char **argv)
+static int __cmd_report(void)
{
unsigned long offset = 0;
unsigned long head = 0;
@@ -720,12 +655,6 @@ int cmd_report(int argc, char **argv)


int ret, rc = EXIT_FAILURE;
unsigned long total = 0;

- elf_version(EV_CURRENT);
-
- page_size = getpagesize();


-
- process_options(argc, argv);
-

input = open(input_name, O_RDONLY);
if (input < 0) {
perror("failed to open file");
@@ -867,3 +796,24 @@ done:
return rc;
}

+static const char * const report_usage[] = {
+ "perf report [<options>] <command>",
+ NULL
+};
+


+static const struct option options[] = {

+ OPT_STRING('i', "input", &input_name, "file",
+ "input file name"),
+ OPT_END()
+};
+
+int cmd_report(int argc, const char **argv, const char *prefix)
+{
+ elf_version(EV_CURRENT);
+
+ page_size = getpagesize();
+
+ parse_options(argc, argv, options, report_usage, 0);
+
+ return __cmd_report();
+}

tip-bot for Arnaldo Carvalho de Melo
May 26, 2009, 8:20:17 AM

Commit-ID: 8fa66bdcc81dd4fc2c91228074d0a4698120c5a2
Gitweb: http://git.kernel.org/tip/8fa66bdcc81dd4fc2c91228074d0a4698120c5a2

Author: Arnaldo Carvalho de Melo <ac...@redhat.com>
AuthorDate: Mon, 18 May 2009 12:45:42 -0300
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Tue, 26 May 2009 13:52:53 +0200

perf_counter: First part of 'perf report' conversion to C + elfutils

Integrate perf-report into 'perf', as builtin-report.c.

Signed-off-by: Arnaldo Carvalho de Melo <ac...@redhat.com>
Acked-by: Peter Zijlstra <a.p.zi...@chello.nl>


Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>

Cc: Thomas Gleixner <tg...@linutronix.de>


LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/Makefile | 6 +-
Documentation/perf_counter/builtin-report.c | 751 +++++++++++++++++++++++++++
Documentation/perf_counter/builtin.h | 1 +
Documentation/perf_counter/command-list.txt | 1 +
Documentation/perf_counter/perf.c | 1 +
5 files changed, 755 insertions(+), 5 deletions(-)

diff --git a/Documentation/perf_counter/Makefile b/Documentation/perf_counter/Makefile
index 45daa72..49c601e 100644
--- a/Documentation/perf_counter/Makefile
+++ b/Documentation/perf_counter/Makefile
@@ -228,7 +228,6 @@ COMPAT_CFLAGS =
COMPAT_OBJS =
LIB_H =
LIB_OBJS =
-PROGRAMS = perf-report
SCRIPT_PERL =
SCRIPT_SH =
TEST_PROGRAMS =
@@ -315,6 +314,7 @@ LIB_OBJS += util/wrapper.o

BUILTIN_OBJS += builtin-help.o
BUILTIN_OBJS += builtin-record.o
+BUILTIN_OBJS += builtin-report.o
BUILTIN_OBJS += builtin-stat.o
BUILTIN_OBJS += builtin-top.o

@@ -811,10 +811,6 @@ clean:
$(RM) $(htmldocs).tar.gz $(manpages).tar.gz
$(RM) PERF-VERSION-FILE PERF-CFLAGS PERF-BUILD-OPTIONS

-# temporary hack:
-perf-report: perf-report.cc ../../include/linux/perf_counter.h Makefile
- g++ -g -O2 -Wall -lrt -o $@ $<
-
.PHONY: all install clean strip
.PHONY: shell_compatibility_test please_set_SHELL_PATH_to_a_more_modern_shell
.PHONY: .FORCE-PERF-VERSION-FILE TAGS tags cscope .FORCE-PERF-CFLAGS
diff --git a/Documentation/perf_counter/builtin-report.c b/Documentation/perf_counter/builtin-report.c
new file mode 100644
index 0000000..864f68f
--- /dev/null
+++ b/Documentation/perf_counter/builtin-report.c
@@ -0,0 +1,751 @@
+#define _GNU_SOURCE
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <unistd.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <string.h>
+#include <limits.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <errno.h>
+#include <ctype.h>
+#include <time.h>
+#include <getopt.h>
+#include <assert.h>
+#include <search.h>
+
+#include <sys/ioctl.h>
+#include <sys/poll.h>
+#include <sys/prctl.h>
+#include <sys/wait.h>
+#include <sys/mman.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include <linux/unistd.h>
+#include <linux/types.h>
+
+#include "../../include/linux/perf_counter.h"
+#include "list.h"
+
+#define SHOW_KERNEL 1
+#define SHOW_USER 2
+#define SHOW_HV 4
+


+static char const *input_name = "output.perf";

+static int input;
+static int show_mask = SHOW_KERNEL | SHOW_USER | SHOW_HV;
+
+static unsigned long page_size;
+static unsigned long mmap_window = 32;
+
+static const char *perf_event_names[] = {
+ [PERF_EVENT_MMAP] = " PERF_EVENT_MMAP",
+ [PERF_EVENT_MUNMAP] = " PERF_EVENT_MUNMAP",
+ [PERF_EVENT_COMM] = " PERF_EVENT_COMM",
+};
+
+struct ip_event {
+ struct perf_event_header header;
+ __u64 ip;
+ __u32 pid, tid;
+};
+struct mmap_event {
+ struct perf_event_header header;
+ __u32 pid, tid;
+ __u64 start;
+ __u64 len;
+ __u64 pgoff;
+ char filename[PATH_MAX];
+};
+struct comm_event {
+ struct perf_event_header header;
+ __u32 pid,tid;
+ char comm[16];
+};
+
+typedef union event_union {
+ struct perf_event_header header;
+ struct ip_event ip;
+ struct mmap_event mmap;
+ struct comm_event comm;
+} event_t;
+
+struct section {
+ struct list_head node;
+ uint64_t start;
+ uint64_t end;
+ uint64_t offset;
+ char name[0];
+};
+
+static struct section *section__new(uint64_t start, uint64_t size,
+ uint64_t offset, char *name)
+{
+ struct section *self = malloc(sizeof(*self) + strlen(name) + 1);
+
+ if (self != NULL) {
+ self->start = start;
+ self->end = start + size;
+ self->offset = offset;
+ strcpy(self->name, name);
+ }
+
+ return self;
+}
+
+static void section__delete(struct section *self)
+{
+ free(self);
+}
+
+struct symbol {
+ struct list_head node;
+ uint64_t start;
+ uint64_t end;
+ char name[0];
+};
+
+static struct symbol *symbol__new(uint64_t start, uint64_t len, const char *name)
+{
+ struct symbol *self = malloc(sizeof(*self) + strlen(name) + 1);
+
+ if (self != NULL) {
+ self->start = start;
+ self->end = start + len;
+ strcpy(self->name, name);
+ }
+
+ return self;
+}
+
+static void symbol__delete(struct symbol *self)
+{
+ free(self);
+}
+
+static size_t symbol__fprintf(struct symbol *self, FILE *fp)
+{
+ return fprintf(fp, " %lx-%lx %s\n",
+ self->start, self->end, self->name);
+}
+
+struct dso {
+ struct list_head node;
+ struct list_head sections;
+ struct list_head syms;
+ char name[0];
+};
+
+static struct dso *dso__new(const char *name)
+{
+ struct dso *self = malloc(sizeof(*self) + strlen(name) + 1);
+
+ if (self != NULL) {
+ strcpy(self->name, name);
+ INIT_LIST_HEAD(&self->sections);
+ INIT_LIST_HEAD(&self->syms);
+ }
+
+ return self;
+}
+
+static void dso__delete_sections(struct dso *self)
+{
+ struct section *pos, *n;
+
+ list_for_each_entry_safe(pos, n, &self->sections, node)
+ section__delete(pos);
+}
+
+static void dso__delete_symbols(struct dso *self)
+{
+ struct symbol *pos, *n;
+
+ list_for_each_entry_safe(pos, n, &self->syms, node)
+ symbol__delete(pos);
+}
+
+static void dso__delete(struct dso *self)
+{
+ dso__delete_sections(self);
+ dso__delete_symbols(self);
+ free(self);
+}
+
+static void dso__insert_symbol(struct dso *self, struct symbol *sym)
+{
+ list_add_tail(&sym->node, &self->syms);
+}
+
+static struct symbol *dso__find_symbol(struct dso *self, uint64_t ip)
+{
+ if (self == NULL)
+ return NULL;
+
+ struct symbol *pos;
+
+ list_for_each_entry(pos, &self->syms, node)
+ if (ip >= pos->start && ip <= pos->end)
+ return pos;
+
+ return NULL;
+}
+
+static int dso__load(struct dso *self)
+{
+ /* FIXME */


+ return 0;
+}
+

+static size_t dso__fprintf(struct dso *self, FILE *fp)
+{
+ struct symbol *pos;
+ size_t ret = fprintf(fp, "dso: %s\n", self->name);
+
+ list_for_each_entry(pos, &self->syms, node)
+ ret += symbol__fprintf(pos, fp);
+
+ return ret;
+}
+
+static LIST_HEAD(dsos);
+static struct dso *kernel_dso;
+
+static void dsos__add(struct dso *dso)
+{
+ list_add_tail(&dso->node, &dsos);
+}
+
+static struct dso *dsos__find(const char *name)
+{
+ struct dso *pos;
+
+ list_for_each_entry(pos, &dsos, node)
+ if (strcmp(pos->name, name) == 0)
+ return pos;
+ return NULL;
+}
+
+static struct dso *dsos__findnew(const char *name)
+{
+ struct dso *dso = dsos__find(name);
+
+ if (dso == NULL) {
+ dso = dso__new(name);
+ if (dso != NULL && dso__load(dso) < 0)
+ goto out_delete_dso;
+
+ dsos__add(dso);
+ }
+
+ return dso;
+
+out_delete_dso:
+ dso__delete(dso);
+ return NULL;
+}
+
+static void dsos__fprintf(FILE *fp)
+{
+ struct dso *pos;
+
+ list_for_each_entry(pos, &dsos, node)
+ dso__fprintf(pos, fp);
+}
+
+static int load_kallsyms(void)
+{
+ kernel_dso = dso__new("[kernel]");
+ if (kernel_dso == NULL)
+ return -1;
+
+ FILE *file = fopen("/proc/kallsyms", "r");
+
+ if (file == NULL)
+ goto out_delete_dso;
+
+ char *line = NULL;
+ size_t n;
+
+ while (!feof(file)) {
+ unsigned long long start;
+ char c, symbf[4096];
+
+ if (getline(&line, &n, file) < 0)
+ break;
+
+ if (!line)
+ goto out_delete_dso;
+
+ if (sscanf(line, "%llx %c %s", &start, &c, symbf) == 3) {
+ struct symbol *sym = symbol__new(start, 0x1000000, symbf);
+
+ if (sym == NULL)
+ goto out_delete_dso;
+
+ dso__insert_symbol(kernel_dso, sym);
+ }
+ }
+
+ dsos__add(kernel_dso);
+ free(line);
+ fclose(file);
+ return 0;
+
+out_delete_dso:
+ dso__delete(kernel_dso);


+ return -1;
+}
+

+struct map {
+ struct list_head node;
+ uint64_t start;
+ uint64_t end;
+ uint64_t pgoff;
+ struct dso *dso;
+};
+
+static struct map *map__new(struct mmap_event *event)
+{
+ struct map *self = malloc(sizeof(*self));
+
+ if (self != NULL) {
+ self->start = event->start;
+ self->end = event->start + event->len;
+ self->pgoff = event->pgoff;
+
+ self->dso = dsos__findnew(event->filename);
+ if (self->dso == NULL)
+ goto out_delete;
+ }
+ return self;
+out_delete:
+ free(self);
+ return NULL;
+}
+
+static size_t map__fprintf(struct map *self, FILE *fp)
+{
+ return fprintf(fp, " %lx-%lx %lx %s\n",
+ self->start, self->end, self->pgoff, self->dso->name);
+}
+
+struct symhist {
+ struct list_head node;
+ struct dso *dso;
+ struct symbol *sym;
+ uint32_t count;
+ char level;
+};
+
+static struct symhist *symhist__new(struct symbol *sym, struct dso *dso,
+ char level)
+{
+ struct symhist *self = malloc(sizeof(*self));
+
+ if (self != NULL) {
+ self->sym = sym;
+ self->dso = dso;
+ self->level = level;
+ self->count = 0;
+ }
+
+ return self;
+}
+
+static void symhist__delete(struct symhist *self)
+{
+ free(self);
+}
+
+static bool symhist__equal(struct symhist *self, struct symbol *sym,


+ struct dso *dso, char level)

+{
+ return self->level == level && self->sym == sym && self->dso == dso;
+}
+
+static void symhist__inc(struct symhist *self)
+{
+ ++self->count;
+}
+
+static size_t symhist__fprintf(struct symhist *self, FILE *fp)
+{
+ size_t ret = fprintf(fp, "[%c] ", self->level);
+
+ if (self->level != '.')
+ ret += fprintf(fp, "%s", self->sym->name);
+ else
+ ret += fprintf(fp, "%s: %s",
+ self->dso ? self->dso->name : "<unknown",
+ self->sym ? self->sym->name : "<unknown>");
+ return ret + fprintf(fp, ": %u\n", self->count);
+}
+
+struct thread {
+ struct list_head node;
+ struct list_head maps;
+ struct list_head symhists;
+ pid_t pid;
+ char *comm;
+};
+
+static struct thread *thread__new(pid_t pid)
+{
+ struct thread *self = malloc(sizeof(*self));
+
+ if (self != NULL) {
+ self->pid = pid;
+ self->comm = NULL;
+ INIT_LIST_HEAD(&self->maps);
+ INIT_LIST_HEAD(&self->symhists);
+ }
+
+ return self;
+}
+
+static void thread__insert_symhist(struct thread *self,
+ struct symhist *symhist)
+{
+ list_add_tail(&symhist->node, &self->symhists);
+}
+
+static struct symhist *thread__symhists_find(struct thread *self,
+ struct symbol *sym,


+ struct dso *dso, char level)

+{
+ struct symhist *pos;
+
+ list_for_each_entry(pos, &self->symhists, node)
+ if (symhist__equal(pos, sym, dso, level))
+ return pos;
+
+ return NULL;
+}
+


+static int thread__symbol_incnew(struct thread *self, struct symbol *sym,

+ struct dso *dso, char level)

+{
+ struct symhist *symhist = thread__symhists_find(self, sym, dso, level);
+
+ if (symhist == NULL) {
+ symhist = symhist__new(sym, dso, level);
+ if (symhist == NULL)
+ goto out_error;
+ thread__insert_symhist(self, symhist);
+ }
+
+ symhist__inc(symhist);
+ return 0;
+out_error:
+ return -ENOMEM;
+}
+
+static int thread__set_comm(struct thread *self, const char *comm)
+{
+ self->comm = strdup(comm);
+ return self->comm ? 0 : -ENOMEM;
+}
+
+static size_t thread__maps_fprintf(struct thread *self, FILE *fp)
+{
+ struct map *pos;
+ size_t ret = 0;
+
+ list_for_each_entry(pos, &self->maps, node)
+ ret += map__fprintf(pos, fp);
+
+ return ret;
+}
+
+static size_t thread__fprintf(struct thread *self, FILE *fp)
+{
+ struct symhist *pos;
+ int ret = fprintf(fp, "thread: %d %s\n", self->pid, self->comm);
+
+ list_for_each_entry(pos, &self->symhists, node)
+ ret += symhist__fprintf(pos, fp);
+
+ return ret;
+}
+
+static LIST_HEAD(threads);
+
+static void threads__add(struct thread *thread)
+{
+ list_add_tail(&thread->node, &threads);
+}
+
+static struct thread *threads__find(pid_t pid)
+{
+ struct thread *pos;
+
+ list_for_each_entry(pos, &threads, node)
+ if (pos->pid == pid)
+ return pos;
+ return NULL;
+}
+


+static struct thread *threads__findnew(pid_t pid)

+{
+ struct thread *thread = threads__find(pid);
+


+ if (thread == NULL) {

+ thread = thread__new(pid);
+ if (thread != NULL)
+ threads__add(thread);
+ }
+
+ return thread;
+}
+
+static void thread__insert_map(struct thread *self, struct map *map)
+{
+ list_add_tail(&map->node, &self->maps);
+}
+
+static struct map *thread__find_map(struct thread *self, uint64_t ip)
+{
+ if (self == NULL)
+ return NULL;
+
+ struct map *pos;
+
+ list_for_each_entry(pos, &self->maps, node)
+ if (ip >= pos->start && ip <= pos->end)
+ return pos;
+
+ return NULL;
+}
+
+static void threads__fprintf(FILE *fp)
+{
+ struct thread *pos;
+
+ list_for_each_entry(pos, &threads, node)
+ thread__fprintf(pos, fp);
+}
+
+#if 0
+static std::string resolve_user_symbol(int pid, uint64_t ip)
+{
+ std::string sym = "<unknown>";
+
+ maps_t &m = maps[pid];
+ maps_t::const_iterator mi = m.upper_bound(map(ip));
+ if (mi == m.end())
+ return sym;
+
+ ip -= mi->start + mi->pgoff;
+
+ symbols_t &s = dsos[mi->dso].syms;
+ symbols_t::const_iterator si = s.upper_bound(symbol(ip));
+
+ sym = mi->dso + ": <unknown>";
+
+ if (si == s.begin())
+ return sym;
+ si--;
+
+ if (si->start <= ip && ip < si->end)
+ sym = mi->dso + ": " + si->name;
+#if 0
+ else if (si->start <= ip)
+ sym = mi->dso + ": ?" + si->name;
+#endif
+
+ return sym;
+}
+#endif
+
+static void display_help(void)
+{
+ printf(
+ "Usage: perf-report [<options>]\n"
+ " -i file --input=<file> # input file\n"
+ );
+
+ exit(0);
+}
+
+static void process_options(int argc, char *argv[])
+{
+ int error = 0;
+
+ for (;;) {
+ int option_index = 0;
+ /** Options for getopt */
+ static struct option long_options[] = {
+ {"input", required_argument, NULL, 'i'},
+ {"no-user", no_argument, NULL, 'u'},
+ {"no-kernel", no_argument, NULL, 'k'},
+ {"no-hv", no_argument, NULL, 'h'},
+ {NULL, 0, NULL, 0 }
+ };
+ int c = getopt_long(argc, argv, "+:i:kuh",
+ long_options, &option_index);
+ if (c == -1)
+ break;
+
+ switch (c) {
+ case 'i': input_name = strdup(optarg); break;
+ case 'k': show_mask &= ~SHOW_KERNEL; break;
+ case 'u': show_mask &= ~SHOW_USER; break;
+ case 'h': show_mask &= ~SHOW_HV; break;
+ default: error = 1; break;
+ }
+ }
+
+ if (error)
+ display_help();
+}
+
+int cmd_report(int argc, char **argv)
+{
+ unsigned long offset = 0;
+ unsigned long head = 0;
+ struct stat stat;
+ char *buf;
+ event_t *event;
+ int ret, rc = EXIT_FAILURE;
+ unsigned long total = 0;


+
+ page_size = getpagesize();
+

+ process_options(argc, argv);
+
+ input = open(input_name, O_RDONLY);
+ if (input < 0) {
+ perror("failed to open file");
+ exit(-1);
+ }
+
+ ret = fstat(input, &stat);
+ if (ret < 0) {
+ perror("failed to stat file");
+ exit(-1);
+ }
+
+ if (!stat.st_size) {
+ fprintf(stderr, "zero-sized file, nothing to do!\n");
+ exit(0);
+ }
+
+ if (load_kallsyms() < 0) {
+ perror("failed to open kallsyms");
+ return EXIT_FAILURE;
+ }
+
+remap:
+ buf = (char *)mmap(NULL, page_size * mmap_window, PROT_READ,
+ MAP_SHARED, input, offset);
+ if (buf == MAP_FAILED) {
+ perror("failed to mmap file");
+ exit(-1);
+ }
+
+more:
+ event = (event_t *)(buf + head);
+
+ if (head + event->header.size >= page_size * mmap_window) {
+ unsigned long shift = page_size * (head / page_size);
+ int ret;
+
+ ret = munmap(buf, page_size * mmap_window);
+ assert(ret == 0);
+
+ offset += shift;
+ head -= shift;
+ goto remap;
+ }
+
+
+ if (!event->header.size) {
+ fprintf(stderr, "zero-sized event at file offset %ld\n", offset + head);
+ fprintf(stderr, "skipping %ld bytes of events.\n", stat.st_size - offset - head);
+ goto done;
+ }
+
+ head += event->header.size;
+
+ if (event->header.misc & PERF_EVENT_MISC_OVERFLOW) {
+ char level;
+ int show = 0;
+ struct dso *dso = NULL;
+ struct thread *thread = threads__findnew(event->ip.pid);
+
+ if (thread == NULL)
+ goto done;
+
+ if (event->header.misc & PERF_EVENT_MISC_KERNEL) {
+ show = SHOW_KERNEL;
+ level = 'k';
+ dso = kernel_dso;
+ } else if (event->header.misc & PERF_EVENT_MISC_USER) {
+ show = SHOW_USER;
+ level = '.';
+ struct map *map = thread__find_map(thread, event->ip.ip);
+ if (map != NULL)
+ dso = map->dso;
+ } else {
+ show = SHOW_HV;
+ level = 'H';
+ }
+
+ if (show & show_mask) {
+ struct symbol *sym = dso__find_symbol(dso, event->ip.ip);
+
+ if (thread__symbol_incnew(thread, sym, dso, level))
+ goto done;
+ }
+ total++;
+ } else switch (event->header.type) {
+ case PERF_EVENT_MMAP: {
+ struct thread *thread = threads__findnew(event->mmap.pid);
+ struct map *map = map__new(&event->mmap);
+
+ if (thread == NULL || map == NULL )
+ goto done;
+ thread__insert_map(thread, map);
+ break;
+ }
+ case PERF_EVENT_COMM: {
+ struct thread *thread = threads__findnew(event->comm.pid);
+
+ if (thread == NULL ||
+ thread__set_comm(thread, event->comm.comm))
+ goto done;
+ break;
+ }
+ }
+
+ if (offset + head < stat.st_size)
+ goto more;
+
+ rc = EXIT_SUCCESS;
+done:
+ close(input);
+ //dsos__fprintf(stdout);
+ threads__fprintf(stdout);
+#if 0
+ std::map<std::string, int>::iterator hi = hist.begin();
+
+ while (hi != hist.end()) {
+ rev_hist.insert(std::pair<int, std::string>(hi->second, hi->first));
+ hist.erase(hi++);
+ }
+
+ std::multimap<int, std::string>::const_iterator ri = rev_hist.begin();
+
+ while (ri != rev_hist.end()) {
+ printf(" %5.2f %s\n", (100.0 * ri->first)/total, ri->second.c_str());
+ ri++;
+ }
+#endif
+ return rc;
+}
+
diff --git a/Documentation/perf_counter/builtin.h b/Documentation/perf_counter/builtin.h
index d32318a..5bfea57 100644
--- a/Documentation/perf_counter/builtin.h
+++ b/Documentation/perf_counter/builtin.h
@@ -16,6 +16,7 @@ extern int check_pager_config(const char *cmd);

extern int cmd_help(int argc, const char **argv, const char *prefix);
extern int cmd_record(int argc, const char **argv, const char *prefix);
+extern int cmd_report(int argc, const char **argv, const char *prefix);
extern int cmd_stat(int argc, const char **argv, const char *prefix);
extern int cmd_top(int argc, const char **argv, const char *prefix);
extern int cmd_version(int argc, const char **argv, const char *prefix);
diff --git a/Documentation/perf_counter/command-list.txt b/Documentation/perf_counter/command-list.txt
index d15210a..4390292 100644
--- a/Documentation/perf_counter/command-list.txt
+++ b/Documentation/perf_counter/command-list.txt
@@ -1,6 +1,7 @@
# List of known perf commands.
# command name category [deprecated] [common]
perf-record mainporcelain common
+perf-report mainporcelain common
perf-stat mainporcelain common
perf-top mainporcelain common

diff --git a/Documentation/perf_counter/perf.c b/Documentation/perf_counter/perf.c
index 1d6d7aa..e8a8584 100644
--- a/Documentation/perf_counter/perf.c
+++ b/Documentation/perf_counter/perf.c
@@ -250,6 +250,7 @@ static void handle_internal_command(int argc, const char **argv)
static struct cmd_struct commands[] = {
{ "help", cmd_help, 0 },
{ "record", cmd_record, 0 },
+ { "report", cmd_report, 0 },
{ "stat", cmd_stat, 0 },
{ "top", cmd_top, 0 },
{ "version", cmd_version, 0 },

tip-bot for Arnaldo Carvalho de Melo

May 26, 2009, 8:20:18 AM
Commit-ID: 35a50c8a20eea22c141e05c5667ac21c48b8b65d
Gitweb: http://git.kernel.org/tip/35a50c8a20eea22c141e05c5667ac21c48b8b65d
Author: Arnaldo Carvalho de Melo <ac...@redhat.com>
AuthorDate: Mon, 18 May 2009 16:24:49 -0300
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Tue, 26 May 2009 13:52:55 +0200

perf_counter: Use rb_trees in perf report

Signed-off-by: Arnaldo Carvalho de Melo <ac...@redhat.com>
Acked-by: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/Makefile | 3 +
Documentation/perf_counter/builtin-report.c | 60 +++--
Documentation/perf_counter/util/rbtree.c | 383 +++++++++++++++++++++++++++
Documentation/perf_counter/util/rbtree.h | 171 ++++++++++++
4 files changed, 601 insertions(+), 16 deletions(-)

diff --git a/Documentation/perf_counter/Makefile b/Documentation/perf_counter/Makefile
index 6bffa86..412dea1 100644
--- a/Documentation/perf_counter/Makefile
+++ b/Documentation/perf_counter/Makefile
@@ -287,6 +287,8 @@ LIB_FILE=libperf.a
LIB_H += ../../include/linux/perf_counter.h
LIB_H += perf.h
+LIB_H += util/list.h
+LIB_H += util/rbtree.h
LIB_H += util/levenshtein.h
LIB_H += util/parse-options.h
LIB_H += util/parse-events.h
@@ -306,6 +308,7 @@ LIB_OBJS += util/levenshtein.o
LIB_OBJS += util/parse-options.o
LIB_OBJS += util/parse-events.o
LIB_OBJS += util/path.o

+LIB_OBJS += util/rbtree.o
LIB_OBJS += util/run-command.o
LIB_OBJS += util/quote.o
LIB_OBJS += util/strbuf.o
diff --git a/Documentation/perf_counter/builtin-report.c b/Documentation/perf_counter/builtin-report.c
index ad2f327..f63057f 100644
--- a/Documentation/perf_counter/builtin-report.c
+++ b/Documentation/perf_counter/builtin-report.c
@@ -32,7 +32,8 @@
#include <linux/types.h>

#include "../../include/linux/perf_counter.h"
-#include "list.h"
+#include "util/list.h"
+#include "util/rbtree.h"
#define SHOW_KERNEL 1
#define SHOW_USER 2

@@ -106,10 +107,10 @@ static void section__delete(struct section *self)
}

struct symbol {
- struct list_head node;
- uint64_t start;
- uint64_t end;
- char name[0];
+ struct rb_node rb_node;
+ uint64_t start;
+ uint64_t end;
+ char name[0];
};
static struct symbol *symbol__new(uint64_t start, uint64_t len, const char *name)

@@ -139,7 +140,7 @@ static size_t symbol__fprintf(struct symbol *self, FILE *fp)
struct dso {
struct list_head node;
struct list_head sections;
- struct list_head syms;
+ struct rb_root syms;
char name[0];
};

@@ -150,7 +151,7 @@ static struct dso *dso__new(const char *name)
if (self != NULL) {
strcpy(self->name, name);
INIT_LIST_HEAD(&self->sections);
- INIT_LIST_HEAD(&self->syms);
+ self->syms = RB_ROOT;
}

return self;
@@ -166,10 +167,14 @@ static void dso__delete_sections(struct dso *self)

static void dso__delete_symbols(struct dso *self)
{
- struct symbol *pos, *n;
+ struct symbol *pos;
+ struct rb_node *next = rb_first(&self->syms);

- list_for_each_entry_safe(pos, n, &self->syms, node)
+ while (next) {
+ pos = rb_entry(next, struct symbol, rb_node);
+ next = rb_next(&pos->rb_node);
symbol__delete(pos);
+ }
}

static void dso__delete(struct dso *self)
@@ -181,7 +186,21 @@ static void dso__delete(struct dso *self)
static void dso__insert_symbol(struct dso *self, struct symbol *sym)
{
- list_add_tail(&sym->node, &self->syms);
+ struct rb_node **p = &self->syms.rb_node;
+ struct rb_node *parent = NULL;
+ const uint64_t ip = sym->start;
+ struct symbol *s;
+
+ while (*p != NULL) {
+ parent = *p;
+ s = rb_entry(parent, struct symbol, rb_node);
+ if (ip < s->start)
+ p = &(*p)->rb_left;
+ else
+ p = &(*p)->rb_right;
+ }
+ rb_link_node(&sym->rb_node, parent, p);
+ rb_insert_color(&sym->rb_node, &self->syms);
}

static struct symbol *dso__find_symbol(struct dso *self, uint64_t ip)

@@ -189,11 +208,18 @@ static struct symbol *dso__find_symbol(struct dso *self, uint64_t ip)
if (self == NULL)
return NULL;

- struct symbol *pos;
+ struct rb_node *n = self->syms.rb_node;

- list_for_each_entry(pos, &self->syms, node)
- if (ip >= pos->start && ip <= pos->end)
- return pos;
+ while (n) {
+ struct symbol *s = rb_entry(n, struct symbol, rb_node);
+
+ if (ip < s->start)
+ n = n->rb_left;
+ else if (ip > s->end)
+ n = n->rb_right;
+ else
+ return s;
+ }

return NULL;
}
@@ -319,11 +345,13 @@ out_close:
static size_t dso__fprintf(struct dso *self, FILE *fp)
{
- struct symbol *pos;
size_t ret = fprintf(fp, "dso: %s\n", self->name);

- list_for_each_entry(pos, &self->syms, node)
+ struct rb_node *nd;
+ for (nd = rb_first(&self->syms); nd; nd = rb_next(nd)) {
+ struct symbol *pos = rb_entry(nd, struct symbol, rb_node);
ret += symbol__fprintf(pos, fp);
+ }

return ret;
}
diff --git a/Documentation/perf_counter/util/rbtree.c b/Documentation/perf_counter/util/rbtree.c
new file mode 100644
index 0000000..b15ba9c
--- /dev/null
+++ b/Documentation/perf_counter/util/rbtree.c
@@ -0,0 +1,383 @@
+/*
+ Red Black Trees
+ (C) 1999 Andrea Arcangeli <and...@suse.de>
+ (C) 2002 David Woodhouse <dw...@infradead.org>
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+
+ linux/lib/rbtree.c
+*/
+
+#include "rbtree.h"
+
+static void __rb_rotate_left(struct rb_node *node, struct rb_root *root)
+{
+ struct rb_node *right = node->rb_right;
+ struct rb_node *parent = rb_parent(node);
+
+ if ((node->rb_right = right->rb_left))
+ rb_set_parent(right->rb_left, node);
+ right->rb_left = node;
+
+ rb_set_parent(right, parent);
+
+ if (parent)
+ {
+ if (node == parent->rb_left)
+ parent->rb_left = right;
+ else
+ parent->rb_right = right;
+ }
+ else
+ root->rb_node = right;
+ rb_set_parent(node, right);
+}
+
+static void __rb_rotate_right(struct rb_node *node, struct rb_root *root)
+{
+ struct rb_node *left = node->rb_left;
+ struct rb_node *parent = rb_parent(node);
+
+ if ((node->rb_left = left->rb_right))
+ rb_set_parent(left->rb_right, node);
+ left->rb_right = node;
+
+ rb_set_parent(left, parent);
+
+ if (parent)
+ {
+ if (node == parent->rb_right)
+ parent->rb_right = left;
+ else
+ parent->rb_left = left;
+ }
+ else
+ root->rb_node = left;
+ rb_set_parent(node, left);
+}
+
+void rb_insert_color(struct rb_node *node, struct rb_root *root)
+{
+ struct rb_node *parent, *gparent;
+
+ while ((parent = rb_parent(node)) && rb_is_red(parent))
+ {
+ gparent = rb_parent(parent);
+
+ if (parent == gparent->rb_left)
+ {
+ {
+ register struct rb_node *uncle = gparent->rb_right;
+ if (uncle && rb_is_red(uncle))
+ {
+ rb_set_black(uncle);
+ rb_set_black(parent);
+ rb_set_red(gparent);
+ node = gparent;
+ continue;
+ }
+ }
+
+ if (parent->rb_right == node)
+ {
+ register struct rb_node *tmp;
+ __rb_rotate_left(parent, root);
+ tmp = parent;
+ parent = node;
+ node = tmp;
+ }
+
+ rb_set_black(parent);
+ rb_set_red(gparent);
+ __rb_rotate_right(gparent, root);
+ } else {
+ {
+ register struct rb_node *uncle = gparent->rb_left;
+ if (uncle && rb_is_red(uncle))
+ {
+ rb_set_black(uncle);
+ rb_set_black(parent);
+ rb_set_red(gparent);
+ node = gparent;
+ continue;
+ }
+ }
+
+ if (parent->rb_left == node)
+ {
+ register struct rb_node *tmp;
+ __rb_rotate_right(parent, root);
+ tmp = parent;
+ parent = node;
+ node = tmp;
+ }
+
+ rb_set_black(parent);
+ rb_set_red(gparent);
+ __rb_rotate_left(gparent, root);
+ }
+ }
+
+ rb_set_black(root->rb_node);
+}
+
+static void __rb_erase_color(struct rb_node *node, struct rb_node *parent,
+ struct rb_root *root)
+{
+ struct rb_node *other;
+
+ while ((!node || rb_is_black(node)) && node != root->rb_node)
+ {
+ if (parent->rb_left == node)
+ {
+ other = parent->rb_right;
+ if (rb_is_red(other))
+ {
+ rb_set_black(other);
+ rb_set_red(parent);
+ __rb_rotate_left(parent, root);
+ other = parent->rb_right;
+ }
+ if ((!other->rb_left || rb_is_black(other->rb_left)) &&
+ (!other->rb_right || rb_is_black(other->rb_right)))
+ {
+ rb_set_red(other);
+ node = parent;
+ parent = rb_parent(node);
+ }
+ else
+ {
+ if (!other->rb_right || rb_is_black(other->rb_right))
+ {
+ rb_set_black(other->rb_left);
+ rb_set_red(other);
+ __rb_rotate_right(other, root);
+ other = parent->rb_right;
+ }
+ rb_set_color(other, rb_color(parent));
+ rb_set_black(parent);
+ rb_set_black(other->rb_right);
+ __rb_rotate_left(parent, root);
+ node = root->rb_node;
+ break;
+ }
+ }
+ else
+ {
+ other = parent->rb_left;
+ if (rb_is_red(other))
+ {
+ rb_set_black(other);
+ rb_set_red(parent);
+ __rb_rotate_right(parent, root);
+ other = parent->rb_left;
+ }
+ if ((!other->rb_left || rb_is_black(other->rb_left)) &&
+ (!other->rb_right || rb_is_black(other->rb_right)))
+ {
+ rb_set_red(other);
+ node = parent;
+ parent = rb_parent(node);
+ }
+ else
+ {
+ if (!other->rb_left || rb_is_black(other->rb_left))
+ {
+ rb_set_black(other->rb_right);
+ rb_set_red(other);
+ __rb_rotate_left(other, root);
+ other = parent->rb_left;
+ }
+ rb_set_color(other, rb_color(parent));
+ rb_set_black(parent);
+ rb_set_black(other->rb_left);
+ __rb_rotate_right(parent, root);
+ node = root->rb_node;
+ break;
+ }
+ }
+ }
+ if (node)
+ rb_set_black(node);
+}
+
+void rb_erase(struct rb_node *node, struct rb_root *root)
+{
+ struct rb_node *child, *parent;
+ int color;
+
+ if (!node->rb_left)
+ child = node->rb_right;
+ else if (!node->rb_right)
+ child = node->rb_left;
+ else
+ {
+ struct rb_node *old = node, *left;
+
+ node = node->rb_right;
+ while ((left = node->rb_left) != NULL)
+ node = left;
+ child = node->rb_right;
+ parent = rb_parent(node);
+ color = rb_color(node);
+
+ if (child)
+ rb_set_parent(child, parent);
+ if (parent == old) {
+ parent->rb_right = child;
+ parent = node;
+ } else
+ parent->rb_left = child;
+
+ node->rb_parent_color = old->rb_parent_color;
+ node->rb_right = old->rb_right;
+ node->rb_left = old->rb_left;
+
+ if (rb_parent(old))
+ {
+ if (rb_parent(old)->rb_left == old)
+ rb_parent(old)->rb_left = node;
+ else
+ rb_parent(old)->rb_right = node;
+ } else
+ root->rb_node = node;
+
+ rb_set_parent(old->rb_left, node);
+ if (old->rb_right)
+ rb_set_parent(old->rb_right, node);
+ goto color;
+ }
+
+ parent = rb_parent(node);
+ color = rb_color(node);
+
+ if (child)
+ rb_set_parent(child, parent);
+ if (parent)
+ {
+ if (parent->rb_left == node)
+ parent->rb_left = child;
+ else
+ parent->rb_right = child;
+ }
+ else
+ root->rb_node = child;
+
+ color:
+ if (color == RB_BLACK)
+ __rb_erase_color(child, parent, root);
+}
+
+/*
+ * This function returns the first node (in sort order) of the tree.
+ */
+struct rb_node *rb_first(const struct rb_root *root)
+{
+ struct rb_node *n;
+
+ n = root->rb_node;
+ if (!n)
+ return NULL;
+ while (n->rb_left)
+ n = n->rb_left;
+ return n;
+}
+
+struct rb_node *rb_last(const struct rb_root *root)
+{
+ struct rb_node *n;
+
+ n = root->rb_node;
+ if (!n)
+ return NULL;
+ while (n->rb_right)
+ n = n->rb_right;
+ return n;
+}
+
+struct rb_node *rb_next(const struct rb_node *node)
+{
+ struct rb_node *parent;
+
+ if (rb_parent(node) == node)
+ return NULL;
+
+ /* If we have a right-hand child, go down and then left as far
+ as we can. */
+ if (node->rb_right) {
+ node = node->rb_right;
+ while (node->rb_left)
+ node=node->rb_left;
+ return (struct rb_node *)node;
+ }
+
+ /* No right-hand children. Everything down and left is
+ smaller than us, so any 'next' node must be in the general
+ direction of our parent. Go up the tree; any time the
+ ancestor is a right-hand child of its parent, keep going
+ up. First time it's a left-hand child of its parent, said
+ parent is our 'next' node. */
+ while ((parent = rb_parent(node)) && node == parent->rb_right)
+ node = parent;
+
+ return parent;
+}
+
+struct rb_node *rb_prev(const struct rb_node *node)
+{
+ struct rb_node *parent;
+
+ if (rb_parent(node) == node)
+ return NULL;
+
+ /* If we have a left-hand child, go down and then right as far
+ as we can. */
+ if (node->rb_left) {
+ node = node->rb_left;
+ while (node->rb_right)
+ node=node->rb_right;
+ return (struct rb_node *)node;
+ }
+
+ /* No left-hand children. Go up till we find an ancestor which
+ is a right-hand child of its parent */
+ while ((parent = rb_parent(node)) && node == parent->rb_left)
+ node = parent;
+
+ return parent;
+}
+
+void rb_replace_node(struct rb_node *victim, struct rb_node *new,
+ struct rb_root *root)
+{
+ struct rb_node *parent = rb_parent(victim);
+
+ /* Set the surrounding nodes to point to the replacement */
+ if (parent) {
+ if (victim == parent->rb_left)
+ parent->rb_left = new;
+ else
+ parent->rb_right = new;
+ } else {
+ root->rb_node = new;
+ }
+ if (victim->rb_left)
+ rb_set_parent(victim->rb_left, new);
+ if (victim->rb_right)
+ rb_set_parent(victim->rb_right, new);
+
+ /* Copy the pointers/colour from the victim to the replacement */
+ *new = *victim;
+}
diff --git a/Documentation/perf_counter/util/rbtree.h b/Documentation/perf_counter/util/rbtree.h
new file mode 100644
index 0000000..6bdc488
--- /dev/null
+++ b/Documentation/perf_counter/util/rbtree.h
@@ -0,0 +1,171 @@
+/*
+ Red Black Trees
+ (C) 1999 Andrea Arcangeli <and...@suse.de>
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+
+ linux/include/linux/rbtree.h
+
+ To use rbtrees you'll have to implement your own insert and search cores.
+ This will avoid us to use callbacks and to drop drammatically performances.
+ I know it's not the cleaner way, but in C (not in C++) to get
+ performances and genericity...
+
+ Some example of insert and search follows here. The search is a plain
+ normal search over an ordered tree. The insert instead must be implemented
+ int two steps: as first thing the code must insert the element in
+ order as a red leaf in the tree, then the support library function
+ rb_insert_color() must be called. Such function will do the
+ not trivial work to rebalance the rbtree if necessary.
+
+-----------------------------------------------------------------------
+static inline struct page * rb_search_page_cache(struct inode * inode,
+ unsigned long offset)
+{
+ struct rb_node * n = inode->i_rb_page_cache.rb_node;
+ struct page * page;
+
+ while (n)
+ {
+ page = rb_entry(n, struct page, rb_page_cache);
+
+ if (offset < page->offset)
+ n = n->rb_left;
+ else if (offset > page->offset)
+ n = n->rb_right;
+ else
+ return page;
+ }
+ return NULL;
+}
+
+static inline struct page * __rb_insert_page_cache(struct inode * inode,
+ unsigned long offset,
+ struct rb_node * node)
+{
+ struct rb_node ** p = &inode->i_rb_page_cache.rb_node;
+ struct rb_node * parent = NULL;
+ struct page * page;
+
+ while (*p)
+ {
+ parent = *p;
+ page = rb_entry(parent, struct page, rb_page_cache);
+
+ if (offset < page->offset)
+ p = &(*p)->rb_left;
+ else if (offset > page->offset)
+ p = &(*p)->rb_right;
+ else
+ return page;
+ }
+
+ rb_link_node(node, parent, p);
+
+ return NULL;
+}
+
+static inline struct page * rb_insert_page_cache(struct inode * inode,
+ unsigned long offset,
+ struct rb_node * node)
+{
+ struct page * ret;
+ if ((ret = __rb_insert_page_cache(inode, offset, node)))
+ goto out;
+ rb_insert_color(node, &inode->i_rb_page_cache);
+ out:
+ return ret;
+}
+-----------------------------------------------------------------------
+*/
+
+#ifndef _LINUX_RBTREE_H
+#define _LINUX_RBTREE_H
+
+#include <stddef.h>
+
+/**
+ * container_of - cast a member of a structure out to the containing structure
+ * @ptr: the pointer to the member.
+ * @type: the type of the container struct this is embedded in.
+ * @member: the name of the member within the struct.
+ *
+ */
+#define container_of(ptr, type, member) ({ \
+ const typeof( ((type *)0)->member ) *__mptr = (ptr); \
+ (type *)( (char *)__mptr - offsetof(type,member) );})
+
+struct rb_node
+{
+ unsigned long rb_parent_color;
+#define RB_RED 0
+#define RB_BLACK 1
+ struct rb_node *rb_right;
+ struct rb_node *rb_left;
+} __attribute__((aligned(sizeof(long))));
+ /* The alignment might seem pointless, but allegedly CRIS needs it */
+
+struct rb_root
+{
+ struct rb_node *rb_node;
+};
+
+
+#define rb_parent(r) ((struct rb_node *)((r)->rb_parent_color & ~3))
+#define rb_color(r) ((r)->rb_parent_color & 1)
+#define rb_is_red(r) (!rb_color(r))
+#define rb_is_black(r) rb_color(r)
+#define rb_set_red(r) do { (r)->rb_parent_color &= ~1; } while (0)
+#define rb_set_black(r) do { (r)->rb_parent_color |= 1; } while (0)
+
+static inline void rb_set_parent(struct rb_node *rb, struct rb_node *p)
+{
+ rb->rb_parent_color = (rb->rb_parent_color & 3) | (unsigned long)p;
+}
+static inline void rb_set_color(struct rb_node *rb, int color)
+{
+ rb->rb_parent_color = (rb->rb_parent_color & ~1) | color;
+}
+
+#define RB_ROOT (struct rb_root) { NULL, }
+#define rb_entry(ptr, type, member) container_of(ptr, type, member)
+
+#define RB_EMPTY_ROOT(root) ((root)->rb_node == NULL)
+#define RB_EMPTY_NODE(node) (rb_parent(node) == node)
+#define RB_CLEAR_NODE(node) (rb_set_parent(node, node))
+
+extern void rb_insert_color(struct rb_node *, struct rb_root *);
+extern void rb_erase(struct rb_node *, struct rb_root *);
+
+/* Find logical next and previous nodes in a tree */
+extern struct rb_node *rb_next(const struct rb_node *);
+extern struct rb_node *rb_prev(const struct rb_node *);
+extern struct rb_node *rb_first(const struct rb_root *);
+extern struct rb_node *rb_last(const struct rb_root *);
+
+/* Fast replacement of a single node without remove/rebalance/add/rebalance */
+extern void rb_replace_node(struct rb_node *victim, struct rb_node *new,
+ struct rb_root *root);
+
+static inline void rb_link_node(struct rb_node * node, struct rb_node * parent,
+ struct rb_node ** rb_link)
+{
+ node->rb_parent_color = (unsigned long )parent;
+ node->rb_left = node->rb_right = NULL;
+
+ *rb_link = node;
+}
+
+#endif /* _LINUX_RBTREE_H */

tip-bot for Ingo Molnar

May 26, 2009, 8:20:20 AM
Commit-ID: fd4242bb35b70557eee8d0c79f82dacc3f3b89e0
Gitweb: http://git.kernel.org/tip/fd4242bb35b70557eee8d0c79f82dacc3f3b89e0
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Wed, 20 May 2009 12:45:34 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Tue, 26 May 2009 13:52:53 +0200

perf_counter tools: remove the standalone perf-report utility

With a built-in 'perf report' command now available, remove the
standalone implementation for good.

Acked-by: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/perf-report.cc | 515 -----------------------------
1 files changed, 0 insertions(+), 515 deletions(-)

diff --git a/Documentation/perf_counter/perf-report.cc b/Documentation/perf_counter/perf-report.cc
deleted file mode 100644
index 8855107..0000000
--- a/Documentation/perf_counter/perf-report.cc
+++ /dev/null
@@ -1,515 +0,0 @@
-#define _GNU_SOURCE
-#include <sys/types.h>
-#include <sys/stat.h>
-#include <sys/time.h>
-#include <unistd.h>
-#include <stdint.h>
-#include <stdlib.h>
-#include <string.h>
-#include <limits.h>
-#include <fcntl.h>
-#include <stdio.h>
-#include <errno.h>
-#include <ctype.h>
-#include <time.h>
-#include <getopt.h>
-#include <assert.h>
-
-#include <sys/ioctl.h>
-#include <sys/poll.h>
-#include <sys/prctl.h>
-#include <sys/wait.h>
-#include <sys/mman.h>
-#include <sys/types.h>
-#include <sys/stat.h>
-
-#include <linux/unistd.h>
-#include <linux/types.h>
-
-#include "../../include/linux/perf_counter.h"
-
-#include <set>
-#include <map>
-#include <string>
-
-
-#define SHOW_KERNEL 1
-#define SHOW_USER 2
-#define SHOW_HV 4
-
-static char const *input_name = "output.perf";
-static int input;
-static int show_mask = SHOW_KERNEL | SHOW_USER | SHOW_HV;
-
-static unsigned long page_size;
-static unsigned long mmap_window = 32;
-
-struct ip_event {
- struct perf_event_header header;
- __u64 ip;
- __u32 pid, tid;
-};
-struct mmap_event {
- struct perf_event_header header;
- __u32 pid, tid;
- __u64 start;
- __u64 len;
- __u64 pgoff;
- char filename[PATH_MAX];
-};
-struct comm_event {
- struct perf_event_header header;
- __u32 pid,tid;
- char comm[16];
-};
-
-typedef union event_union {
- struct perf_event_header header;
- struct ip_event ip;
- struct mmap_event mmap;
- struct comm_event comm;
-} event_t;
-
-struct section {
- uint64_t start;
- uint64_t end;
-
- uint64_t offset;
-
- std::string name;
-
- section() { };
-
- section(uint64_t stab) : end(stab) { };
-
- section(uint64_t start, uint64_t size, uint64_t offset, std::string name) :
- start(start), end(start + size), offset(offset), name(name)
- { };
-
- bool operator < (const struct section &s) const {
- return end < s.end;
- };
-};
-
-typedef std::set<struct section> sections_t;
-
-struct symbol {
- uint64_t start;
- uint64_t end;
-
- std::string name;
-
- symbol() { };
-
- symbol(uint64_t ip) : start(ip) { }
-
- symbol(uint64_t start, uint64_t len, std::string name) :
- start(start), end(start + len), name(name)
- { };
-
- bool operator < (const struct symbol &s) const {
- return start < s.start;
- };
-};
-
-typedef std::set<struct symbol> symbols_t;
-
-struct dso {
- sections_t sections;
- symbols_t syms;
-};
-
-static std::map<std::string, struct dso> dsos;
-
-static void load_dso_sections(std::string dso_name)
-{
- struct dso &dso = dsos[dso_name];
-
- std::string cmd = "readelf -DSW " + dso_name;
-
- FILE *file = popen(cmd.c_str(), "r");
- if (!file) {
- perror("failed to open pipe");
- exit(-1);
- }
-
- char *line = NULL;
- size_t n = 0;
-
- while (!feof(file)) {
- uint64_t addr, off, size;
- char name[32];
-
- if (getline(&line, &n, file) < 0)
- break;
- if (!line)
- break;
-
- if (sscanf(line, " [%*2d] %16s %*14s %Lx %Lx %Lx",
- name, &addr, &off, &size) == 4) {
-
- dso.sections.insert(section(addr, size, addr - off, name));
- }
-#if 0
- /*
- * for reading readelf symbols (-s), however these don't seem
- * to include nearly everything, so use nm for that.
- */
- if (sscanf(line, " %*4d %*3d: %Lx %5Lu %*7s %*6s %*7s %3d %s",
- &start, &size, &section, sym) == 4) {
-
- start -= dso.section_offsets[section];
-
- dso.syms.insert(symbol(start, size, std::string(sym)));
- }
-#endif
- }
- pclose(file);
-}
-
-static void load_dso_symbols(std::string dso_name, std::string args)
-{
- struct dso &dso = dsos[dso_name];
-
- std::string cmd = "nm -nSC " + args + " " + dso_name;
-
- FILE *file = popen(cmd.c_str(), "r");
- if (!file) {
- perror("failed to open pipe");
- exit(-1);
- }
-
- char *line = NULL;
- size_t n = 0;
-
- while (!feof(file)) {
- uint64_t start, size;
- char c;
- char sym[1024];
-
- if (getline(&line, &n, file) < 0)
- break;
- if (!line)
- break;
-
-
- if (sscanf(line, "%Lx %Lx %c %s", &start, &size, &c, sym) == 4) {
- sections_t::const_iterator si =
- dso.sections.upper_bound(section(start));
- if (si == dso.sections.end()) {
- printf("symbol in unknown section: %s\n", sym);
- continue;
- }
-
- start -= si->offset;
-
- dso.syms.insert(symbol(start, size, sym));
- }
- }
- pclose(file);
-}
-
-static void load_dso(std::string dso_name)
-{
- load_dso_sections(dso_name);
- load_dso_symbols(dso_name, "-D"); /* dynamic symbols */
- load_dso_symbols(dso_name, ""); /* regular ones */
-}
-
-void load_kallsyms(void)
-{
- struct dso &dso = dsos["[kernel]"];
-
- FILE *file = fopen("/proc/kallsyms", "r");
- if (!file) {
- perror("failed to open kallsyms");
- exit(-1);
- }
-
- char *line;
- size_t n;
-
- while (!feof(file)) {
- uint64_t start;
- char c;
- char sym[1024000];
-
- if (getline(&line, &n, file) < 0)
- break;
- if (!line)
- break;
-
- if (sscanf(line, "%Lx %c %s", &start, &c, sym) == 3)
- dso.syms.insert(symbol(start, 0x1000000, std::string(sym)));
- }
- fclose(file);
-}
-
-struct map {
- uint64_t start;
- uint64_t end;
- uint64_t pgoff;
-
- std::string dso;
-
- map() { };
-
- map(uint64_t ip) : end(ip) { }
-
- map(mmap_event *mmap) {
- start = mmap->start;
- end = mmap->start + mmap->len;
- pgoff = mmap->pgoff;
-
- dso = std::string(mmap->filename);
-
- if (dsos.find(dso) == dsos.end())
- load_dso(dso);
- };
-
- bool operator < (const struct map &m) const {
- return end < m.end;
- };
-};
-
-typedef std::set<struct map> maps_t;
-
-static std::map<int, maps_t> maps;
-
-static std::map<int, std::string> comms;
-
-static std::map<std::string, int> hist;
-static std::multimap<int, std::string> rev_hist;
-
-static std::string resolve_comm(int pid)
-{
- std::string comm;
-
- std::map<int, std::string>::const_iterator ci = comms.find(pid);
- if (ci != comms.end()) {
- comm = ci->second;
- } else {
- char pid_str[30];
-
- sprintf(pid_str, ":%d", pid);
- comm = pid_str;
- }
-
- return comm;
-}
-
-static std::string resolve_kernel_symbol(uint64_t ip)
-{
- std::string sym = "<unknown>";
-
- symbols_t &s = dsos["[kernel]"].syms;
- symbols_t::const_iterator si = s.upper_bound(symbol(ip));
-
- if (si == s.begin())
- return sym;
- si--;
-
- if (si->start <= ip && ip < si->end)
- sym = si->name;
-
- return sym;
-}
-
-int main(int argc, char *argv[])
-{
- unsigned long offset = 0;
- unsigned long head = 0;
- struct stat stat;
- char *buf;
- event_t *event;
- int ret;
- unsigned long total = 0;
-
- page_size = getpagesize();
-
- process_options(argc, argv);
-
- input = open(input_name, O_RDONLY);
- if (input < 0) {
- perror("failed to open file");
- exit(-1);
- }
-
- ret = fstat(input, &stat);
- if (ret < 0) {
- perror("failed to stat file");
- exit(-1);
- }
-
- if (!stat.st_size) {
- fprintf(stderr, "zero-sized file, nothing to do!\n");
- exit(0);
- }
-
- load_kallsyms();
-
-remap:
- buf = (char *)mmap(NULL, page_size * mmap_window, PROT_READ,
- MAP_SHARED, input, offset);
- if (buf == MAP_FAILED) {
- perror("failed to mmap file");
- exit(-1);
- }
-
-more:
- event = (event_t *)(buf + head);
-
- if (head + event->header.size >= page_size * mmap_window) {
- unsigned long shift = page_size * (head / page_size);
- int ret;
-
- ret = munmap(buf, page_size * mmap_window);
- assert(ret == 0);
-
- offset += shift;
- head -= shift;
- goto remap;
- }
-
-
- if (!event->header.size) {
- fprintf(stderr, "zero-sized event at file offset %ld\n", offset + head);
- fprintf(stderr, "skipping %ld bytes of events.\n", stat.st_size - offset - head);
- goto done;
- }
-
- head += event->header.size;
-
- if (event->header.misc & PERF_EVENT_MISC_OVERFLOW) {
- std::string comm, sym, level;
- int show = 0;
- char output[1024];
-
- if (event->header.misc & PERF_EVENT_MISC_KERNEL) {
- show |= SHOW_KERNEL;
- level = " [k] ";
- sym = resolve_kernel_symbol(event->ip.ip);
- } else if (event->header.misc & PERF_EVENT_MISC_USER) {
- show |= SHOW_USER;
- level = " [.] ";
- sym = resolve_user_symbol(event->ip.pid, event->ip.ip);
- } else {
- show |= SHOW_HV;
- level = " [H] ";
- }
-
- if (show & show_mask) {
- comm = resolve_comm(event->ip.pid);
- snprintf(output, sizeof(output), "%16s %s %s",
- comm.c_str(), level.c_str(), sym.c_str());
- hist[output]++;
- }
-
- total++;
-
- } else switch (event->header.type) {
- case PERF_EVENT_MMAP:
- maps[event->mmap.pid].insert(map(&event->mmap));
- break;
-
- case PERF_EVENT_COMM:
- comms[event->comm.pid] = std::string(event->comm.comm);
- break;
- }
-
- if (offset + head < stat.st_size)
- goto more;
-
-done:
-
- close(input);
-
- std::map<std::string, int>::iterator hi = hist.begin();
-
- while (hi != hist.end()) {
- rev_hist.insert(std::pair<int, std::string>(hi->second, hi->first));
- hist.erase(hi++);
- }
-
- std::multimap<int, std::string>::const_iterator ri = rev_hist.begin();
-
- while (ri != rev_hist.end()) {
- printf(" %5.2f %s\n", (100.0 * ri->first)/total, ri->second.c_str());
- ri++;


- }
-
- return 0;
-}
-

tip-bot for Mike Galbraith

May 26, 2009, 9:30:20 AM
Commit-ID: db20c0031288ff524d82b1f240f35f85d4a052eb
Gitweb: http://git.kernel.org/tip/db20c0031288ff524d82b1f240f35f85d4a052eb
Author: Mike Galbraith <efa...@gmx.de>
AuthorDate: Tue, 26 May 2009 15:25:34 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Tue, 26 May 2009 15:25:34 +0200

perf top: fix typo in -d option

Clean up a copy/paste error from the options parsing conversion.

[ Impact: reactivate -d option ]

Signed-off-by: Mike Galbraith <efa...@gmx.de>
Cc: Peter Zijlstra <a.p.zi...@chello.nl>


Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>

Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: John Kacur <jka...@redhat.com>


LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/builtin-top.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Documentation/perf_counter/builtin-top.c b/Documentation/perf_counter/builtin-top.c
index cacaa3c..6b1c66f 100644
--- a/Documentation/perf_counter/builtin-top.c
+++ b/Documentation/perf_counter/builtin-top.c
@@ -727,7 +727,7 @@ static const struct option options[] = {


"number of mmap data pages"),

OPT_INTEGER('r', "realtime", &realtime_prio,

"collect data with this RT SCHED_FIFO priority"),

- OPT_INTEGER('d', "delay", &realtime_prio,
+ OPT_INTEGER('d', "delay", &delay_secs,


"number of seconds to delay between refreshes"),

OPT_BOOLEAN('D', "dump-symtab", &dump_symtab,

"dump the symbol table used for profiling"),

tip-bot for Mike Galbraith

May 26, 2009, 9:30:18 AM
Commit-ID: f91183fe3780d44849110a1653dfe8af7bc67aa4
Gitweb: http://git.kernel.org/tip/f91183fe3780d44849110a1653dfe8af7bc67aa4

Author: Mike Galbraith <efa...@gmx.de>
AuthorDate: Tue, 26 May 2009 15:25:34 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Tue, 26 May 2009 15:25:34 +0200

perf top: Remove leftover NMI/IRQ bits

Commit 79202b removed IRQ/NMI mode selection, so remove it from
perf top as well.

[ Impact: cleanup ]

Signed-off-by: Mike Galbraith <efa...@gmx.de>
Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/builtin-top.c | 10 ++++------
1 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/Documentation/perf_counter/builtin-top.c b/Documentation/perf_counter/builtin-top.c
index 87b925c..cacaa3c 100644
--- a/Documentation/perf_counter/builtin-top.c
+++ b/Documentation/perf_counter/builtin-top.c
@@ -8,7 +8,7 @@
Sample output:



------------------------------------------------------------------------------
- KernelTop: 2669 irqs/sec [NMI, cache-misses/cache-refs], (all, cpu: 2)

+ KernelTop: 2669 irqs/sec [cache-misses/cache-refs], (all, cpu: 2)
------------------------------------------------------------------------------

weight RIP kernel function
@@ -92,7 +92,6 @@ static __u64 count_filter = 100;


static int target_pid = -1;
static int profile_cpu = -1;
static int nr_cpus = 0;

-static int nmi = 1;

static unsigned int realtime_prio = 0;

static int group = 0;
static unsigned int page_size;

@@ -198,10 +197,9 @@ static void print_sym_table(void)

printf(
"------------------------------------------------------------------------------\n");
- printf( " KernelTop:%8.0f irqs/sec kernel:%4.1f%% [%s, ",
+ printf( " KernelTop:%8.0f irqs/sec kernel:%4.1f%% [",
events_per_sec,
- 100.0 - (100.0*((events_per_sec-kevents_per_sec)/events_per_sec)),
- nmi ? "NMI" : "IRQ");
+ 100.0 - (100.0*((events_per_sec-kevents_per_sec)/events_per_sec)));

if (nr_counters == 1)
printf("%d ", event_count[0]);
@@ -637,7 +635,7 @@ static int __cmd_top(void)
hw_event.config = event_id[counter];
hw_event.irq_period = event_count[counter];
hw_event.record_type = PERF_RECORD_IP | PERF_RECORD_TID;
- hw_event.nmi = nmi;
+ hw_event.nmi = 1;
hw_event.mmap = use_mmap;


hw_event.munmap = use_munmap;
hw_event.freq = freq;

tip-bot for Arnaldo Carvalho de Melo

May 26, 2009, 10:30:10 AM
Commit-ID: 59d81029b6804c3d5895d07cad77d7dfddc6b5b2
Gitweb: http://git.kernel.org/tip/59d81029b6804c3d5895d07cad77d7dfddc6b5b2
Author: Arnaldo Carvalho de Melo <ac...@redhat.com>
AuthorDate: Tue, 26 May 2009 11:14:27 -0300
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Tue, 26 May 2009 16:19:05 +0200

perf report: Fix kernel symbol resolution

kallsyms has just the symbol start, so we need to read two lines
to get the length.

[ Impact: fix incorrect kernel symbol display in perf report ]

Signed-off-by: Arnaldo Carvalho de Melo <ac...@redhat.com>


Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Mike Galbraith <efa...@gmx.de>

Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>

Cc: Thomas Gleixner <tg...@linutronix.de>


Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/builtin-report.c | 30 ++++++++++++++++++++------
1 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/Documentation/perf_counter/builtin-report.c b/Documentation/perf_counter/builtin-report.c
index 697f960..b19b893 100644
--- a/Documentation/perf_counter/builtin-report.c
+++ b/Documentation/perf_counter/builtin-report.c
@@ -360,9 +360,17 @@ static int load_kallsyms(void)
char *line = NULL;
size_t n;

+ if (getline(&line, &n, file) < 0 || !line)
+ goto out_delete_dso;
+
+ unsigned long long previous_start;
+ char c, previous_symbf[4096];
+ if (sscanf(line, "%llx %c %s", &previous_start, &c, previous_symbf) != 3)
+ goto out_delete_line;
+
while (!feof(file)) {
unsigned long long start;
- char c, symbf[4096];
+ char symbf[4096];



if (getline(&line, &n, file) < 0)

break;
@@ -371,12 +379,18 @@ static int load_kallsyms(void)
goto out_delete_dso;

if (sscanf(line, "%llx %c %s", &start, &c, symbf) == 3) {
- struct symbol *sym = symbol__new(start, 0x1000000, symbf);
+ if (start > previous_start) {
+ struct symbol *sym = symbol__new(previous_start,
+ start - previous_start,
+ previous_symbf);

- if (sym == NULL)
- goto out_delete_dso;


+ if (sym == NULL)
+ goto out_delete_dso;

- dso__insert_symbol(kernel_dso, sym);
+ dso__insert_symbol(kernel_dso, sym);
+ previous_start = start;
+ strcpy(previous_symbf, symbf);
+ }
}
}

@@ -385,6 +399,8 @@ static int load_kallsyms(void)
fclose(file);
return 0;

+out_delete_line:
+ free(line);
out_delete_dso:
dso__delete(kernel_dso);
return -1;

tip-bot for Peter Zijlstra

May 26, 2009, 10:30:12 AM
Commit-ID: f17e04afaff84b5cfd317da29ac4d764908ff833
Gitweb: http://git.kernel.org/tip/f17e04afaff84b5cfd317da29ac4d764908ff833
Author: Peter Zijlstra <a.p.zi...@chello.nl>
AuthorDate: Tue, 26 May 2009 15:30:22 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Tue, 26 May 2009 16:18:43 +0200

perf report: Fix ELF symbol parsing

[ Impact: fix DSO symbol output in perf report ]

Signed-off-by: Peter Zijlstra <a.p.zi...@chello.nl>


Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/Makefile | 2 +-
Documentation/perf_counter/builtin-report.c | 72 ++++++++-------------------
2 files changed, 22 insertions(+), 52 deletions(-)

diff --git a/Documentation/perf_counter/Makefile b/Documentation/perf_counter/Makefile
index 412dea1..10c13a6 100644
--- a/Documentation/perf_counter/Makefile
+++ b/Documentation/perf_counter/Makefile
@@ -159,7 +159,7 @@ uname_V := $(shell sh -c 'uname -v 2>/dev/null || echo not')



# CFLAGS and LDFLAGS are for the users to override from the command line.

-CFLAGS = -g -O2 -Wall
+CFLAGS = -ggdb3 -Wall


LDFLAGS = -lpthread -lrt -lelf
ALL_CFLAGS = $(CFLAGS)
ALL_LDFLAGS = $(LDFLAGS)

diff --git a/Documentation/perf_counter/builtin-report.c b/Documentation/perf_counter/builtin-report.c
index 9e59d60..697f960 100644
--- a/Documentation/perf_counter/builtin-report.c
+++ b/Documentation/perf_counter/builtin-report.c
@@ -55,34 +55,6 @@ typedef union event_union {
struct comm_event comm;
} event_t;

-struct section {
- struct list_head node;


- uint64_t start;
- uint64_t end;

- uint64_t offset;
- char name[0];
-};
-
-struct section *section__new(uint64_t start, uint64_t size,
- uint64_t offset, char *name)
-{
- struct section *self = malloc(sizeof(*self) + strlen(name) + 1);
-
- if (self != NULL) {
- self->start = start;
- self->end = start + size;
- self->offset = offset;
- strcpy(self->name, name);
- }
-
- return self;
-}
-
-static void section__delete(struct section *self)
-{
- free(self);
-}
-
struct symbol {
struct rb_node rb_node;
uint64_t start;
@@ -116,7 +88,6 @@ static size_t symbol__fprintf(struct symbol *self, FILE *fp)



struct dso {
struct list_head node;

- struct list_head sections;


struct rb_root syms;
char name[0];
};

@@ -127,21 +98,12 @@ static struct dso *dso__new(const char *name)



if (self != NULL) {
strcpy(self->name, name);

- INIT_LIST_HEAD(&self->sections);


self->syms = RB_ROOT;
}

return self;
}

-static void dso__delete_sections(struct dso *self)
-{
- struct section *pos, *n;
-
- list_for_each_entry_safe(pos, n, &self->sections, node)
- section__delete(pos);
-}
-


static void dso__delete_symbols(struct dso *self)
{

struct symbol *pos;
@@ -156,7 +118,6 @@ static void dso__delete_symbols(struct dso *self)



static void dso__delete(struct dso *self)

{
- dso__delete_sections(self);
dso__delete_symbols(self);
free(self);
}
@@ -282,9 +243,6 @@ static int dso__load(struct dso *self)
if (sec == NULL)
goto out_elf_end;

- if (gelf_getshdr(sec, &shdr) == NULL)
- goto out_elf_end;
-


Elf_Data *syms = elf_getdata(sec, NULL);

if (syms == NULL)
goto out_elf_end;
@@ -302,11 +260,21 @@ static int dso__load(struct dso *self)
GElf_Sym sym;
uint32_t index;
elf_symtab__for_each_symbol(syms, nr_syms, index, sym) {
+ struct symbol *f;
+
if (!elf_sym__is_function(&sym))
continue;
- struct symbol *f = symbol__new(sym.st_value, sym.st_size,
- elf_sym__name(&sym, symstrs));
- if (f == NULL)
+
+ sec = elf_getscn(elf, sym.st_shndx);
+ if (!sec)
+ goto out_elf_end;
+
+ gelf_getshdr(sec, &shdr);
+ sym.st_value -= shdr.sh_addr - shdr.sh_offset;
+
+ f = symbol__new(sym.st_value, sym.st_size,
+ elf_sym__name(&sym, symstrs));
+ if (!f)
goto out_elf_end;

dso__insert_symbol(self, f);
@@ -498,7 +466,7 @@ static size_t symhist__fprintf(struct symhist *self, FILE *fp)
ret += fprintf(fp, "%s", self->sym ? self->sym->name: "<unknown>");
else


ret += fprintf(fp, "%s: %s",

- self->dso ? self->dso->name : "<unknown",


+ self->dso ? self->dso->name : "<unknown>",

self->sym ? self->sym->name : "<unknown>");

return ret + fprintf(fp, ": %u\n", self->count);
}

@@ -714,6 +682,7 @@ more:
int show = 0;
struct dso *dso = NULL;


struct thread *thread = threads__findnew(event->ip.pid);

+ uint64_t ip = event->ip.ip;

if (thread == NULL) {


fprintf(stderr, "problem processing %d event, bailing out\n",

@@ -728,19 +697,20 @@ more:


} else if (event->header.misc & PERF_EVENT_MISC_USER) {

show = SHOW_USER;
level = '.';
- struct map *map = thread__find_map(thread, event->ip.ip);
- if (map != NULL)
+ struct map *map = thread__find_map(thread, ip);
+ if (map != NULL) {
dso = map->dso;
+ ip -= map->start + map->pgoff;
+ }
} else {
show = SHOW_HV;
level = 'H';
}

if (show & show_mask) {
- struct symbol *sym = dso__find_symbol(dso, event->ip.ip);
+ struct symbol *sym = dso__find_symbol(dso, ip);

- if (thread__symbol_incnew(thread, sym, event->ip.ip,
- dso, level)) {
+ if (thread__symbol_incnew(thread, sym, ip, dso, level)) {


fprintf(stderr, "problem incrementing symbol count, bailing out\n");
goto done;
}

Arnaldo Carvalho de Melo

May 26, 2009, 11:30:24 AM
Please fix the previous fix with this:

commit 64d254523be740b224533e0e4982cda7f25c0348


Author: Arnaldo Carvalho de Melo <ac...@redhat.com>

Date: Tue May 26 12:08:10 2009 -0300

perf: Don't assume /proc/kallsyms is ordered

Since we _are_ ordering it by the symbol start, just traverse the
freshly built rbtree setting the prev->end members to curr->start - 1.



Signed-off-by: Arnaldo Carvalho de Melo <ac...@redhat.com>

diff --git a/Documentation/perf_counter/builtin-report.c b/Documentation/perf_counter/builtin-report.c
index b19b893..5a385e8 100644
--- a/Documentation/perf_counter/builtin-report.c
+++ b/Documentation/perf_counter/builtin-report.c
@@ -360,17 +360,9 @@ static int load_kallsyms(void)


char *line = NULL;
size_t n;

- if (getline(&line, &n, file) < 0 || !line)
- goto out_delete_dso;
-
- unsigned long long previous_start;
- char c, previous_symbf[4096];
- if (sscanf(line, "%llx %c %s", &previous_start, &c, previous_symbf) != 3)
- goto out_delete_line;
-


while (!feof(file)) {
unsigned long long start;

- char symbf[4096];
+ char c, symbf[4096];



if (getline(&line, &n, file) < 0)
break;

@@ -379,21 +371,35 @@ static int load_kallsyms(void)


goto out_delete_dso;

if (sscanf(line, "%llx %c %s", &start, &c, symbf) == 3) {

- if (start > previous_start) {
- struct symbol *sym = symbol__new(previous_start,
- start - previous_start,
- previous_symbf);
+ /*
+ * Well fix up the end later, when we have all sorted.
+ */
+ struct symbol *sym = symbol__new(start, 0xdead, symbf);



- if (sym == NULL)
- goto out_delete_dso;
+ if (sym == NULL)
+ goto out_delete_dso;

- dso__insert_symbol(kernel_dso, sym);

- previous_start = start;
- strcpy(previous_symbf, symbf);
- }
+ dso__insert_symbol(kernel_dso, sym);
}
}

+ /*
+ * Now that we have all sorted out, just set the ->end of all
+ * symbols
+ */
+ struct rb_node *nd, *prevnd = rb_first(&kernel_dso->syms);
+
+ if (prevnd == NULL)
+ goto out_delete_line;
+
+ for (nd = rb_next(prevnd); nd; nd = rb_next(nd)) {
+ struct symbol *prev = rb_entry(prevnd, struct symbol, rb_node),
+ *curr = rb_entry(nd, struct symbol, rb_node);
+
+ prev->end = curr->start - 1;
+ prevnd = nd;
+ }
+
dsos__add(kernel_dso);
free(line);
fclose(file);

tip-bot for Arnaldo Carvalho de Melo

May 26, 2009, 11:50:13 AM
Commit-ID: abd54f68629fa73ed4fa040d433196211a9bbed2
Gitweb: http://git.kernel.org/tip/abd54f68629fa73ed4fa040d433196211a9bbed2

Author: Arnaldo Carvalho de Melo <ac...@redhat.com>
AuthorDate: Tue, 26 May 2009 12:21:34 -0300
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Tue, 26 May 2009 17:36:13 +0200

perf: Don't assume /proc/kallsyms is ordered

Since we _are_ ordering it by the symbol start, just traverse the
freshly built rbtree setting the prev->end members to curr->start - 1.

Signed-off-by: Arnaldo Carvalho de Melo <ac...@redhat.com>

Cc: Peter Zijlstra <a.p.zi...@chello.nl>


Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>

Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <2009052615...@ghostprotocols.net>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/builtin-report.c | 48 +++++++++++++++------------
1 files changed, 27 insertions(+), 21 deletions(-)

diff --git a/Documentation/perf_counter/builtin-report.c b/Documentation/perf_counter/builtin-report.c
index b19b893..e178190 100644

Peter Zijlstra

May 26, 2009, 2:00:13 PM
On Tue, 2009-05-26 at 12:21 -0300, Arnaldo Carvalho de Melo wrote:
> Please fix the previous fix with this:

You don't need a second walk through the RB-tree like that; simply
change the lookup function:

The below finds the first entry that has ->start > ip; we then walk
backwards until we find an entry that has start <= ip < end
(there should never be more than one).

This way you can deal with holes (like userspace has), and deal with
entries without size (like kallsyms) by setting size to a random large
value.

---

Index: linux-2.6/Documentation/perf_counter/builtin-report.c
===================================================================
--- linux-2.6.orig/Documentation/perf_counter/builtin-report.c
+++ linux-2.6/Documentation/perf_counter/builtin-report.c
@@ -147,16 +147,25 @@ static struct symbol *dso__find_symbol(s
return NULL;



struct rb_node *n = self->syms.rb_node;

+ struct rb_node *last = NULL;
+ struct symbol *s;

while (n) {
- struct symbol *s = rb_entry(n, struct symbol, rb_node);
+ last = n;
+ s = rb_entry(n, struct symbol, rb_node);

if (ip < s->start)
n = n->rb_left;
- else if (ip > s->end)
- n = n->rb_right;
else
+ n = n->rb_right;
+ }
+
+ while (last) {
+ s = rb_entry(last, struct symbol, rb_node);
+ if (s->start <= ip && ip < s->end)
return s;
+
+ last = rb_prev(last);
}

return NULL;

tip-bot for Ingo Molnar

May 26, 2009, 2:00:14 PM
Commit-ID: 97b07b699b11d4bd1218a841e5dfed16bd53de06
Gitweb: http://git.kernel.org/tip/97b07b699b11d4bd1218a841e5dfed16bd53de06
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Tue, 26 May 2009 18:48:58 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Tue, 26 May 2009 18:48:58 +0200

perf report: add --dump-raw-trace option

To help the inspection of various data files, implement an ASCII dump
method that just dumps the records as they are read in - then we exit.

[ Impact: new feature ]

Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>

Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/builtin-report.c | 39 ++++++++++++++++++++++++++-
1 files changed, 38 insertions(+), 1 deletions(-)

diff --git a/Documentation/perf_counter/builtin-report.c b/Documentation/perf_counter/builtin-report.c
index e178190..8ea8aaa 100644
--- a/Documentation/perf_counter/builtin-report.c
+++ b/Documentation/perf_counter/builtin-report.c
@@ -20,6 +20,8 @@ static char const *input_name = "output.perf";
static int input;


static int show_mask = SHOW_KERNEL | SHOW_USER | SHOW_HV;

+static int dump_trace = 0;
+
static unsigned long page_size;


static unsigned long mmap_window = 32;

@@ -643,7 +645,7 @@ static int __cmd_report(void)
char *buf;
event_t *event;


int ret, rc = EXIT_FAILURE;

- unsigned long total = 0;

+ unsigned long total = 0, total_mmap = 0, total_comm = 0;



input = open(input_name, O_RDONLY);
if (input < 0) {

@@ -706,6 +708,13 @@ more:


struct thread *thread = threads__findnew(event->ip.pid);

uint64_t ip = event->ip.ip;

+ if (dump_trace) {
+ fprintf(stderr, "PERF_EVENT (IP, %d): %d: %p\n",
+ event->header.misc,
+ event->ip.pid,
+ (void *)event->ip.ip);
+ }
+


if (thread == NULL) {
fprintf(stderr, "problem processing %d event, bailing out\n",

event->header.type);
@@ -743,23 +752,40 @@ more:
struct thread *thread = threads__findnew(event->mmap.pid);


struct map *map = map__new(&event->mmap);

+ if (dump_trace) {
+ fprintf(stderr, "PERF_EVENT_MMAP: [%p(%p) @ %p]: %s\n",
+ (void *)event->mmap.start,
+ (void *)event->mmap.len,
+ (void *)event->mmap.pgoff,
+ event->mmap.filename);
+ }
if (thread == NULL || map == NULL) {
fprintf(stderr, "problem processing PERF_EVENT_MMAP, bailing out\n");
goto done;
}
thread__insert_map(thread, map);
+ total_mmap++;
break;
}
case PERF_EVENT_COMM: {
struct thread *thread = threads__findnew(event->comm.pid);

+ if (dump_trace) {
+ fprintf(stderr, "PERF_EVENT_COMM: %s:%d\n",
+ event->comm.comm, event->comm.pid);


+ }
if (thread == NULL ||

thread__set_comm(thread, event->comm.comm)) {
fprintf(stderr, "problem processing PERF_EVENT_COMM, bailing out\n");
goto done;
}
+ total_comm++;
break;
}
+ default: {
+ fprintf(stderr, "skipping unknown header type: %d\n",
+ event->header.type);
+ }


}

if (offset + head < stat.st_size)

@@ -768,6 +794,15 @@ more:
rc = EXIT_SUCCESS;
done:
close(input);
+
+ if (dump_trace) {
+ fprintf(stderr, " IP events: %10ld\n", total);
+ fprintf(stderr, " mmap events: %10ld\n", total_mmap);
+ fprintf(stderr, " comm events: %10ld\n", total_comm);
+


+ return 0;
+ }
+

//dsos__fprintf(stdout);
threads__fprintf(stdout);
#if 0
@@ -796,6 +831,8 @@ static const char * const report_usage[] = {


static const struct option options[] = {

OPT_STRING('i', "input", &input_name, "file",

"input file name"),
+ OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
+ "dump raw trace in ASCII"),
OPT_END()

tip-bot for Ingo Molnar

May 26, 2009, 2:20:10 PM
Commit-ID: 3e70611460fe74ad32534fa9791774f6bbdd4159
Gitweb: http://git.kernel.org/tip/3e70611460fe74ad32534fa9791774f6bbdd4159
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Tue, 26 May 2009 18:53:17 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Tue, 26 May 2009 18:53:17 +0200

perf report: add counter for unknown events

Add a counter for unknown event records.

[ Impact: improve debugging ]

Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/builtin-report.c | 10 ++++++----
1 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/Documentation/perf_counter/builtin-report.c b/Documentation/perf_counter/builtin-report.c
index 8ea8aaa..4b5ccc5 100644
--- a/Documentation/perf_counter/builtin-report.c
+++ b/Documentation/perf_counter/builtin-report.c
@@ -645,7 +645,7 @@ static int __cmd_report(void)


char *buf;
event_t *event;
int ret, rc = EXIT_FAILURE;

- unsigned long total = 0, total_mmap = 0, total_comm = 0;
+ unsigned long total = 0, total_mmap = 0, total_comm = 0, total_unknown;



input = open(input_name, O_RDONLY);
if (input < 0) {

@@ -785,6 +785,7 @@ more:
default: {


fprintf(stderr, "skipping unknown header type: %d\n",

event->header.type);
+ total_unknown++;
}
}

@@ -796,9 +797,10 @@ done:
close(input);

if (dump_trace) {
- fprintf(stderr, " IP events: %10ld\n", total);
- fprintf(stderr, " mmap events: %10ld\n", total_mmap);
- fprintf(stderr, " comm events: %10ld\n", total_comm);


+ fprintf(stderr, " IP events: %10ld\n", total);
+ fprintf(stderr, " mmap events: %10ld\n", total_mmap);
+ fprintf(stderr, " comm events: %10ld\n", total_comm);

+ fprintf(stderr, " unknown events: %10ld\n", total_unknown);

return 0;

tip-bot for Ingo Molnar

May 26, 2009, 2:20:12 PM
Commit-ID: f95f33e6d0e6ee55f8ed62f70e4febe23e4830dc
Gitweb: http://git.kernel.org/tip/f95f33e6d0e6ee55f8ed62f70e4febe23e4830dc
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Tue, 26 May 2009 19:03:36 +0200
Committer: Ingo Molnar <mi...@elte.hu>

CommitDate: Tue, 26 May 2009 19:03:36 +0200

perf report: add more debugging

Add the offset into the file we are analyzing.

In case of problems it's easier to see where the parser lost track.

Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/builtin-report.c | 12 ++++++++----
1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/Documentation/perf_counter/builtin-report.c b/Documentation/perf_counter/builtin-report.c
index 4b5ccc5..1b16f81 100644
--- a/Documentation/perf_counter/builtin-report.c
+++ b/Documentation/perf_counter/builtin-report.c
@@ -709,7 +709,8 @@ more:


uint64_t ip = event->ip.ip;

if (dump_trace) {
- fprintf(stderr, "PERF_EVENT (IP, %d): %d: %p\n",
+ fprintf(stderr, "%p: PERF_EVENT (IP, %d): %d: %p\n",
+ (void *)(offset + head),
event->header.misc,
event->ip.pid,
(void *)event->ip.ip);
@@ -753,7 +754,8 @@ more:


struct map *map = map__new(&event->mmap);

if (dump_trace) {
- fprintf(stderr, "PERF_EVENT_MMAP: [%p(%p) @ %p]: %s\n",
+ fprintf(stderr, "%p: PERF_EVENT_MMAP: [%p(%p) @ %p]: %s\n",
+ (void *)(offset + head),
(void *)event->mmap.start,
(void *)event->mmap.len,
(void *)event->mmap.pgoff,
@@ -771,7 +773,8 @@ more:
struct thread *thread = threads__findnew(event->comm.pid);

if (dump_trace) {
- fprintf(stderr, "PERF_EVENT_COMM: %s:%d\n",
+ fprintf(stderr, "%p: PERF_EVENT_COMM: %s:%d\n",
+ (void *)(offset + head),
event->comm.comm, event->comm.pid);
}
if (thread == NULL ||
@@ -783,7 +786,8 @@ more:
break;
}
default: {
- fprintf(stderr, "skipping unknown header type: %d\n",
+ fprintf(stderr, "%p: skipping unknown header type: %d\n",
+ (void *)(offset + head),
event->header.type);
total_unknown++;

tip-bot for Ingo Molnar

May 26, 2009, 2:30:11 PM
Commit-ID: 78bfcc8556838bad563b781c49e5f20ec5831611
Gitweb: http://git.kernel.org/tip/78bfcc8556838bad563b781c49e5f20ec5831611

Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Tue, 26 May 2009 19:03:36 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Tue, 26 May 2009 19:15:59 +0200

perf report: add more debugging

Add the offset into the file we are analyzing, and the size of the record.

tip-bot for Peter Zijlstra

May 26, 2009, 3:30:13 PM
Commit-ID: 6142f9ec108a4ddbf0d5904c3daa5fdcaa618792
Gitweb: http://git.kernel.org/tip/6142f9ec108a4ddbf0d5904c3daa5fdcaa618792
Author: Peter Zijlstra <a.p.zi...@chello.nl>
AuthorDate: Tue, 26 May 2009 20:51:47 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Tue, 26 May 2009 20:17:46 +0200

perf report: More robust error handling

Don't let funny events confuse us; stick to what we know and
try to find sensible data again.

If we find an unknown event, check we're still u64 aligned, and
increment by one u64. This ensures we're bound to happen upon a
valid event soon.

Signed-off-by: Peter Zijlstra <a.p.zi...@chello.nl>


Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/builtin-report.c | 27 ++++++++++++++++++++-------
1 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/Documentation/perf_counter/builtin-report.c b/Documentation/perf_counter/builtin-report.c
index 2d4e4cc..a58be7f 100644
--- a/Documentation/perf_counter/builtin-report.c
+++ b/Documentation/perf_counter/builtin-report.c
@@ -645,6 +645,7 @@ static int __cmd_report(void)


char *buf;
event_t *event;
int ret, rc = EXIT_FAILURE;

+ uint32_t size;
unsigned long total = 0, total_mmap = 0, total_comm = 0, total_unknown = 0;

input = open(input_name, O_RDONLY);
@@ -680,6 +681,10 @@ remap:
more:


event = (event_t *)(buf + head);

+ size = event->header.size;
+ if (!size)
+ size = 8;
+


if (head + event->header.size >= page_size * mmap_window) {

unsigned long shift = page_size * (head / page_size);

int ret;
@@ -692,12 +697,9 @@ more:
goto remap;


}

-
- if (!event->header.size) {
- fprintf(stderr, "zero-sized event at file offset %ld\n", offset + head);
- fprintf(stderr, "skipping %ld bytes of events.\n", stat.st_size - offset - head);
- goto done;
- }

+ size = event->header.size;
+ if (!size)
+ goto broken_event;



if (event->header.misc & PERF_EVENT_MISC_OVERFLOW) {

char level;
@@ -787,15 +789,26 @@ more:
break;
}
default: {
+broken_event:
fprintf(stderr, "%p [%p]: skipping unknown header type: %d\n",
(void *)(offset + head),
(void *)(long)(event->header.size),
event->header.type);
total_unknown++;
+
+ /*
+ * assume we lost track of the stream, check alignment, and
+ * increment a single u64 in the hope to catch on again 'soon'.
+ */
+
+ if (unlikely(head & 7))
+ head &= ~7ULL;
+
+ size = 8;


}
}

- head += event->header.size;

+ head += size;



if (offset + head < stat.st_size)

goto more;

tip-bot for Ingo Molnar

May 27, 2009, 3:50:16 AM
Commit-ID: 23ac9cbed82b00ca3520bb81dbe9ea3b7a936a1b
Gitweb: http://git.kernel.org/tip/23ac9cbed82b00ca3520bb81dbe9ea3b7a936a1b
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Wed, 27 May 2009 09:33:18 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Wed, 27 May 2009 09:33:18 +0200

perf_counter tools: Rename output.perf to perf.data

output.perf is only output from perf-record's point of view; it's input to
perf-report. So change it to a more direction-neutral name.

Cc: Peter Zijlstra <a.p.zi...@chello.nl>


Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
.../perf_counter/Documentation/perf-record.txt | 4 ++--
.../perf_counter/Documentation/perf-report.txt | 4 ++--
Documentation/perf_counter/builtin-record.c | 2 +-
Documentation/perf_counter/builtin-report.c | 2 +-
4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/Documentation/perf_counter/Documentation/perf-record.txt b/Documentation/perf_counter/Documentation/perf-record.txt
index d07700e..353db1b 100644
--- a/Documentation/perf_counter/Documentation/perf-record.txt
+++ b/Documentation/perf_counter/Documentation/perf-record.txt
@@ -3,7 +3,7 @@ perf-record(1)

NAME
----
-perf-record - Run a command and record its profile into output.perf
+perf-record - Run a command and record its profile into perf.data

SYNOPSIS
--------
@@ -13,7 +13,7 @@ SYNOPSIS
DESCRIPTION
-----------
This command runs a command and gathers a performance counter profile
-from it, into output.perf - without displaying anything.
+from it, into perf.data - without displaying anything.

This file can then be inspected later on, using 'perf report'.

diff --git a/Documentation/perf_counter/Documentation/perf-report.txt b/Documentation/perf_counter/Documentation/perf-report.txt
index 64696a2..49efe16 100644
--- a/Documentation/perf_counter/Documentation/perf-report.txt
+++ b/Documentation/perf_counter/Documentation/perf-report.txt
@@ -3,7 +3,7 @@ perf-report(1)

NAME
----
-perf-report - Read output.perf (created by perf record) and display the profile
+perf-report - Read perf.data (created by perf record) and display the profile

SYNOPSIS
--------
@@ -19,7 +19,7 @@ OPTIONS
-------
-i::
--input=::
- Input file name. (default: output.perf)
+ Input file name. (default: perf.data)

Configuration
-------------
diff --git a/Documentation/perf_counter/builtin-record.c b/Documentation/perf_counter/builtin-record.c
index 68abfdf..431077a 100644
--- a/Documentation/perf_counter/builtin-record.c
+++ b/Documentation/perf_counter/builtin-record.c
@@ -19,7 +19,7 @@ static int nr_cpus = 0;


static unsigned int page_size;
static unsigned int mmap_pages = 16;
static int output;

-static const char *output_name = "output.perf";
+static const char *output_name = "perf.data";


static int group = 0;

static unsigned int realtime_prio = 0;
static int system_wide = 0;
diff --git a/Documentation/perf_counter/builtin-report.c b/Documentation/perf_counter/builtin-report.c
index 7f1255d..e2712cd 100644
--- a/Documentation/perf_counter/builtin-report.c
+++ b/Documentation/perf_counter/builtin-report.c
@@ -18,7 +18,7 @@


#define SHOW_USER 2
#define SHOW_HV 4

-static char const *input_name = "output.perf";
+static char const *input_name = "perf.data";


static int input;
static int show_mask = SHOW_KERNEL | SHOW_USER | SHOW_HV;

tip-bot for Ingo Molnar

May 27, 2009, 5:10:08 AM
Commit-ID: a930d2c0d0a685ab955472b08baad041cc5edb4a
Gitweb: http://git.kernel.org/tip/a930d2c0d0a685ab955472b08baad041cc5edb4a
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Wed, 27 May 2009 09:50:13 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Wed, 27 May 2009 09:59:00 +0200

perf_counter tools: Add built-in pager support

Add Git's pager.c (and sigchain) code. A command only
has to call setup_pager() to get paged interactive
output.

Non-interactive (redirected, command-piped, etc.) uses
are not affected.

Update perf-report to make use of this.

[ Impact: new feature ]

Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/Makefile | 4 +
Documentation/perf_counter/builtin-report.c | 3 +
Documentation/perf_counter/util/environment.c | 8 ++
Documentation/perf_counter/util/pager.c | 99 +++++++++++++++++++++++++
Documentation/perf_counter/util/sigchain.c | 52 +++++++++++++
Documentation/perf_counter/util/sigchain.h | 11 +++
6 files changed, 177 insertions(+), 0 deletions(-)

diff --git a/Documentation/perf_counter/Makefile b/Documentation/perf_counter/Makefile
index efb0589..51b13f9 100644
--- a/Documentation/perf_counter/Makefile
+++ b/Documentation/perf_counter/Makefile
@@ -297,11 +297,13 @@ LIB_H += util/util.h
LIB_H += util/help.h
LIB_H += util/strbuf.h
LIB_H += util/run-command.h
+LIB_H += util/sigchain.h

LIB_OBJS += util/abspath.o
LIB_OBJS += util/alias.o
LIB_OBJS += util/config.o
LIB_OBJS += util/ctype.o
+LIB_OBJS += util/environment.o


LIB_OBJS += util/exec_cmd.o
LIB_OBJS += util/help.o
LIB_OBJS += util/levenshtein.o

@@ -314,6 +316,8 @@ LIB_OBJS += util/quote.o
LIB_OBJS += util/strbuf.o
LIB_OBJS += util/usage.o
LIB_OBJS += util/wrapper.o
+LIB_OBJS += util/sigchain.o
+LIB_OBJS += util/pager.o



BUILTIN_OBJS += builtin-help.o
BUILTIN_OBJS += builtin-record.o

diff --git a/Documentation/perf_counter/builtin-report.c b/Documentation/perf_counter/builtin-report.c
index e2712cd..9aef7c5 100644
--- a/Documentation/perf_counter/builtin-report.c
+++ b/Documentation/perf_counter/builtin-report.c
@@ -7,6 +7,7 @@
#include <ctype.h>

#include "util/list.h"
+#include "util/cache.h"
#include "util/rbtree.h"

#include "perf.h"
@@ -992,5 +993,7 @@ int cmd_report(int argc, const char **argv, const char *prefix)



parse_options(argc, argv, options, report_usage, 0);

+ setup_pager();
+
return __cmd_report();
}
diff --git a/Documentation/perf_counter/util/environment.c b/Documentation/perf_counter/util/environment.c
new file mode 100644
index 0000000..9b1c819
--- /dev/null
+++ b/Documentation/perf_counter/util/environment.c
@@ -0,0 +1,8 @@
+/*
+ * We put all the perf config variables in this same object
+ * file, so that programs can link against the config parser
+ * without having to link against all the rest of perf.
+ */
+#include "cache.h"
+
+const char *pager_program;
diff --git a/Documentation/perf_counter/util/pager.c b/Documentation/perf_counter/util/pager.c
new file mode 100644
index 0000000..a28bcca
--- /dev/null
+++ b/Documentation/perf_counter/util/pager.c
@@ -0,0 +1,99 @@
+#include "cache.h"
+#include "run-command.h"
+#include "sigchain.h"
+
+/*
+ * This is split up from the rest of git so that we can do
+ * something different on Windows.
+ */
+
+static int spawned_pager;
+
+#ifndef __MINGW32__
+static void pager_preexec(void)
+{
+ /*
+ * Work around bug in "less" by not starting it until we
+ * have real input
+ */
+ fd_set in;
+
+ FD_ZERO(&in);
+ FD_SET(0, &in);
+ select(1, &in, NULL, &in, NULL);
+
+ setenv("LESS", "FRSX", 0);
+}
+#endif
+
+static const char *pager_argv[] = { "sh", "-c", NULL, NULL };
+static struct child_process pager_process;
+
+static void wait_for_pager(void)
+{
+ fflush(stdout);
+ fflush(stderr);
+ /* signal EOF to pager */
+ close(1);
+ close(2);
+ finish_command(&pager_process);
+}
+
+static void wait_for_pager_signal(int signo)
+{
+ wait_for_pager();
+ sigchain_pop(signo);
+ raise(signo);
+}
+
+void setup_pager(void)
+{
+ const char *pager = getenv("PERF_PAGER");
+
+ if (!isatty(1))
+ return;
+ if (!pager) {
+ if (!pager_program)
+ perf_config(perf_default_config, NULL);
+ pager = pager_program;
+ }
+ if (!pager)
+ pager = getenv("PAGER");
+ if (!pager)
+ pager = "less";
+ else if (!*pager || !strcmp(pager, "cat"))
+ return;
+
+ spawned_pager = 1; /* means we are emitting to terminal */
+
+ /* spawn the pager */
+ pager_argv[2] = pager;
+ pager_process.argv = pager_argv;
+ pager_process.in = -1;
+#ifndef __MINGW32__
+ pager_process.preexec_cb = pager_preexec;
+#endif
+ if (start_command(&pager_process))
+ return;
+
+ /* original process continues, but writes to the pipe */
+ dup2(pager_process.in, 1);
+ if (isatty(2))
+ dup2(pager_process.in, 2);
+ close(pager_process.in);
+
+ /* this makes sure that the parent terminates after the pager */
+ sigchain_push_common(wait_for_pager_signal);
+ atexit(wait_for_pager);
+}
+
+int pager_in_use(void)
+{
+ const char *env;
+
+ if (spawned_pager)
+ return 1;
+
+ env = getenv("PERF_PAGER_IN_USE");
+ return env ? perf_config_bool("PERF_PAGER_IN_USE", env) : 0;
+}
diff --git a/Documentation/perf_counter/util/sigchain.c b/Documentation/perf_counter/util/sigchain.c
new file mode 100644
index 0000000..1118b99
--- /dev/null
+++ b/Documentation/perf_counter/util/sigchain.c
@@ -0,0 +1,52 @@
+#include "sigchain.h"
+#include "cache.h"
+
+#define SIGCHAIN_MAX_SIGNALS 32
+
+struct sigchain_signal {
+ sigchain_fun *old;
+ int n;
+ int alloc;
+};
+static struct sigchain_signal signals[SIGCHAIN_MAX_SIGNALS];
+
+static void check_signum(int sig)
+{
+ if (sig < 1 || sig >= SIGCHAIN_MAX_SIGNALS)
+ die("BUG: signal out of range: %d", sig);
+}
+
+int sigchain_push(int sig, sigchain_fun f)
+{
+ struct sigchain_signal *s = signals + sig;
+ check_signum(sig);
+
+ ALLOC_GROW(s->old, s->n + 1, s->alloc);
+ s->old[s->n] = signal(sig, f);
+ if (s->old[s->n] == SIG_ERR)
+ return -1;
+ s->n++;


+ return 0;
+}
+

+int sigchain_pop(int sig)
+{
+ struct sigchain_signal *s = signals + sig;
+ check_signum(sig);
+ if (s->n < 1)
+ return 0;
+
+ if (signal(sig, s->old[s->n - 1]) == SIG_ERR)
+ return -1;
+ s->n--;


+ return 0;
+}
+

+void sigchain_push_common(sigchain_fun f)
+{
+ sigchain_push(SIGINT, f);
+ sigchain_push(SIGHUP, f);
+ sigchain_push(SIGTERM, f);
+ sigchain_push(SIGQUIT, f);
+ sigchain_push(SIGPIPE, f);
+}
diff --git a/Documentation/perf_counter/util/sigchain.h b/Documentation/perf_counter/util/sigchain.h
new file mode 100644
index 0000000..618083b
--- /dev/null
+++ b/Documentation/perf_counter/util/sigchain.h
@@ -0,0 +1,11 @@
+#ifndef SIGCHAIN_H
+#define SIGCHAIN_H
+
+typedef void (*sigchain_fun)(int);
+
+int sigchain_push(int sig, sigchain_fun f);
+int sigchain_pop(int sig);
+
+void sigchain_push_common(sigchain_fun f);
+
+#endif /* SIGCHAIN_H */

tip-bot for Mike Galbraith

May 27, 2009, 6:40:09 AM
Commit-ID: ef65b2a0b3a2f82850144df6e6a7796f6d66da6b
Gitweb: http://git.kernel.org/tip/ef65b2a0b3a2f82850144df6e6a7796f6d66da6b
Author: Mike Galbraith <efa...@gmx.de>
AuthorDate: Wed, 27 May 2009 10:10:51 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Wed, 27 May 2009 12:31:03 +0200

perf record: Fix the profiling of existing pid or whole box

Perf record bails if no command argument is provided, so you can't use
naked -a or -p to profile a running task or the whole box.

Allow foreground profiling of an existing pid or the entire system.

[ Impact: fix command option handling bug ]

Signed-off-by: Mike Galbraith <efa...@gmx.de>
Cc: Peter Zijlstra <a.p.zi...@chello.nl>


Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>

Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/builtin-record.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/perf_counter/builtin-record.c b/Documentation/perf_counter/builtin-record.c
index 431077a..4a06866 100644
--- a/Documentation/perf_counter/builtin-record.c
+++ b/Documentation/perf_counter/builtin-record.c
@@ -354,7 +354,7 @@ static int __cmd_record(int argc, const char **argv)
signal(SIGCHLD, sig_handler);
signal(SIGINT, sig_handler);

- if (target_pid == -1) {
+ if (target_pid == -1 && argc) {
pid = fork();
if (pid < 0)


perror("failed to fork");

@@ -430,7 +430,7 @@ int cmd_record(int argc, const char **argv, const char *prefix)
create_events_help(events_help_msg);

argc = parse_options(argc, argv, options, record_usage, 0);
- if (!argc)
+ if (!argc && target_pid == -1 && !system_wide)
usage_with_options(record_usage, options);

if (!nr_counters) {

tip-bot for Ingo Molnar

May 27, 2009, 7:30:22 AM
Commit-ID: d716fba49c7445ec87c3f045c59624fac03ee3f2
Gitweb: http://git.kernel.org/tip/d716fba49c7445ec87c3f045c59624fac03ee3f2
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Wed, 27 May 2009 13:19:59 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Wed, 27 May 2009 13:19:59 +0200

perf report: Remove <ctype.h> include

Pekka reported build failure in builtin-report.c:

CC builtin-report.o
In file included from builtin-report.c:7:
/usr/include/ctype.h:102: error: expected expression before token

And observed:

| Removing #include <ctype.h> from builtin-report.c makes the problem
| go away. I am running Ubuntu 9.04 that has gcc 4.3.3 and libc 2.9.

Reported-by: Pekka J Enberg <pen...@cs.helsinki.fi>
Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Mike Galbraith <efa...@gmx.de>


Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>

Cc: Thomas Gleixner <tg...@linutronix.de>


Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/builtin-report.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/Documentation/perf_counter/builtin-report.c b/Documentation/perf_counter/builtin-report.c
index 9aef7c5..6265bed 100644
--- a/Documentation/perf_counter/builtin-report.c
+++ b/Documentation/perf_counter/builtin-report.c
@@ -4,7 +4,6 @@


#include <libelf.h>
#include <gelf.h>
#include <elf.h>

-#include <ctype.h>

#include "util/list.h"
#include "util/cache.h"

tip-bot for Peter Zijlstra

May 27, 2009, 9:10:13 AM
Commit-ID: b7a16eac5e679fb5f531b9eeff7db7952303e77d
Gitweb: http://git.kernel.org/tip/b7a16eac5e679fb5f531b9eeff7db7952303e77d
Author: Peter Zijlstra <a.p.zi...@chello.nl>
AuthorDate: Wed, 27 May 2009 13:35:35 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Wed, 27 May 2009 14:54:29 +0200

perf_counter: tools: /usr/lib/debug%s.debug support

Some distros seem to store debuginfo in weird places.

Signed-off-by: Peter Zijlstra <a.p.zi...@chello.nl>


Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/builtin-report.c | 94 +++++++++++++++++++++-----
1 files changed, 76 insertions(+), 18 deletions(-)

diff --git a/Documentation/perf_counter/builtin-report.c b/Documentation/perf_counter/builtin-report.c
index 6265bed..a9ff49a 100644
--- a/Documentation/perf_counter/builtin-report.c
+++ b/Documentation/perf_counter/builtin-report.c
@@ -190,7 +190,8 @@ static inline int elf_sym__is_function(const GElf_Sym *sym)


{
return elf_sym__type(sym) == STT_FUNC &&
sym->st_name != 0 &&

- sym->st_shndx != SHN_UNDEF;
+ sym->st_shndx != SHN_UNDEF &&
+ sym->st_size != 0;


}

static inline const char *elf_sym__name(const GElf_Sym *sym,

@@ -222,11 +223,11 @@ static Elf_Scn *elf_section_by_name(Elf *elf, GElf_Ehdr *ep,
return sec;
}

-static int dso__load(struct dso *self)
+static int dso__load_sym(struct dso *self, int fd, char *name)
{
Elf_Data *symstrs;
uint32_t nr_syms;
- int fd, err = -1;
+ int err = -1;
uint32_t index;
GElf_Ehdr ehdr;
GElf_Shdr shdr;
@@ -234,16 +235,12 @@ static int dso__load(struct dso *self)
GElf_Sym sym;
Elf_Scn *sec;
Elf *elf;
-
-
- fd = open(self->name, O_RDONLY);
- if (fd == -1)
- return -1;
+ int nr = 0;

elf = elf_begin(fd, ELF_C_READ_MMAP, NULL);
if (elf == NULL) {


fprintf(stderr, "%s: cannot read %s ELF file.\n",

- __func__, self->name);
+ __func__, name);
goto out_close;
}

@@ -292,16 +289,63 @@ static int dso__load(struct dso *self)
goto out_elf_end;

dso__insert_symbol(self, f);
+
+ nr++;
}

- err = 0;
+ err = nr;
out_elf_end:
elf_end(elf);
out_close:
- close(fd);
return err;


}

+static int dso__load(struct dso *self)
+{

+ int size = strlen(self->name) + sizeof("/usr/lib/debug%s.debug");
+ char *name = malloc(size);
+ int variant = 0;
+ int ret = -1;
+ int fd;
+
+ if (!name)
+ return -1;
+
+more:
+ do {
+ switch (variant) {
+ case 0: /* Fedora */
+ snprintf(name, size, "/usr/lib/debug%s.debug", self->name);
+ break;
+ case 1: /* Ubuntu */
+ snprintf(name, size, "/usr/lib/debug%s", self->name);
+ break;
+ case 2: /* Sane people */
+ snprintf(name, size, "%s", self->name);
+ break;
+
+ default:
+ goto out;
+ }
+ variant++;
+
+ fd = open(name, O_RDONLY);
+ } while (fd < 0);
+
+ ret = dso__load_sym(self, fd, name);
+ close(fd);
+
+ /*
+ * Some people seem to have debuginfo files _WITHOUT_ debug info!?!?
+ */
+ if (!ret)
+ goto more;
+
+out:
+ free(name);


+ return ret;
+}
+

static size_t dso__fprintf(struct dso *self, FILE *fp)
{

size_t ret = fprintf(fp, "dso: %s\n", self->name);

@@ -336,11 +380,23 @@ static struct dso *dsos__find(const char *name)


static struct dso *dsos__findnew(const char *name)

{


struct dso *dso = dsos__find(name);

+ int nr;

if (dso == NULL) {
dso = dso__new(name);
- if (dso != NULL && dso__load(dso) < 0)
+ if (!dso)
+ goto out_delete_dso;
+
+ nr = dso__load(dso);
+ if (nr < 0) {
+ fprintf(stderr, "Failed to open: %s\n", name);
goto out_delete_dso;
+ }
+ if (!nr) {
+ fprintf(stderr,
+ "Failed to find debug symbols for: %s, maybe install a debug package?\n",
+ name);
+ }

dsos__add(dso);
}
@@ -547,9 +603,9 @@ symhist__fprintf(struct symhist *self, uint64_t total_samples, FILE *fp)
size_t ret;

if (total_samples)
- ret = fprintf(fp, "%5.2f", (self->count * 100.0) / total_samples);
+ ret = fprintf(fp, "%5.2f%% ", (self->count * 100.0) / total_samples);
else
- ret = fprintf(fp, "%12d", self->count);
+ ret = fprintf(fp, "%12d ", self->count);

ret += fprintf(fp, "%14s [%c] ",
thread__name(self->thread, bf, sizeof(bf)),
@@ -922,10 +978,12 @@ more:
}
default: {
broken_event:
- fprintf(stderr, "%p [%p]: skipping unknown header type: %d\n",
- (void *)(offset + head),
- (void *)(long)(event->header.size),
- event->header.type);
+ if (dump_trace)
+ fprintf(stderr, "%p [%p]: skipping unknown header type: %d\n",


+ (void *)(offset + head),

+ (void *)(long)(event->header.size),
+ event->header.type);
+
total_unknown++;

/*

tip-bot for Ingo Molnar

May 27, 2009, 5:30:10 PM
Commit-ID: 55717314c4e3a5180a54228a2f97e50f3496de4c
Gitweb: http://git.kernel.org/tip/55717314c4e3a5180a54228a2f97e50f3496de4c
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Wed, 27 May 2009 22:13:17 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Wed, 27 May 2009 22:19:58 +0200

perf_counter: tools: report: Robustify in case of weird events

This error condition:

aldebaran:~/linux/linux/Documentation/perf_counter> perf report
dso__load_sym: cannot get elf header.
failed to open: /etc/ld.so.cache


problem processing PERF_EVENT_MMAP, bailing out

caused the profile to be very short - as the error was at the beginning
of the file and we bailed out completely.

Be more permissive and consider the event broken instead.

Cc: Peter Zijlstra <a.p.zi...@chello.nl>


Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/builtin-report.c | 17 ++++++++---------
1 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/Documentation/perf_counter/builtin-report.c b/Documentation/perf_counter/builtin-report.c
index 6df95c2..5993c12 100644
--- a/Documentation/perf_counter/builtin-report.c
+++ b/Documentation/perf_counter/builtin-report.c
@@ -1117,9 +1117,9 @@ more:
}

if (thread == NULL) {
- fprintf(stderr, "problem processing %d event, bailing out\n",
+ fprintf(stderr, "problem processing %d event, skipping it.\n",
event->header.type);
- goto done;
+ goto broken_event;
}

if (event->header.misc & PERF_EVENT_MISC_KERNEL) {
@@ -1149,8 +1149,8 @@ more:

if (hist_entry__add(thread, map, dso, sym, ip, level)) {
fprintf(stderr,
- "problem incrementing symbol count, bailing out\n");
- goto done;
+ "problem incrementing symbol count, skipping event\n");
+ goto broken_event;
}
}
total++;
@@ -1169,8 +1169,8 @@ more:
event->mmap.filename);


}
if (thread == NULL || map == NULL) {

- fprintf(stderr, "problem processing PERF_EVENT_MMAP, bailing out\n");
- goto done;
+ fprintf(stderr, "problem processing PERF_EVENT_MMAP, skipping event.\n");
+ goto broken_event;
}
thread__insert_map(thread, map);
total_mmap++;
@@ -1187,8 +1187,8 @@ more:


}
if (thread == NULL ||
thread__set_comm(thread, event->comm.comm)) {

- fprintf(stderr, "problem processing PERF_EVENT_COMM, bailing out\n");
- goto done;
+ fprintf(stderr, "problem processing PERF_EVENT_COMM, skipping event.\n");
+ goto broken_event;
}
total_comm++;
break;
@@ -1221,7 +1221,6 @@ broken_event:
goto more;

rc = EXIT_SUCCESS;
-done:
close(input);

if (dump_trace) {

tip-bot for Ingo Molnar

May 28, 2009, 5:50:08 AM
Commit-ID: d3e78ee3d015dac1794433abb6403b6fc8e70e10
Gitweb: http://git.kernel.org/tip/d3e78ee3d015dac1794433abb6403b6fc8e70e10
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Thu, 28 May 2009 11:41:50 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Thu, 28 May 2009 11:42:16 +0200

perf_counter: Fix perf_counter_init_task() on !CONFIG_PERF_COUNTERS

Pointed out by compiler warnings:

tip/include/linux/perf_counter.h:644: warning: no return statement in function returning non-void

Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
include/linux/perf_counter.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/perf_counter.h b/include/linux/perf_counter.h
index 2b16ed3..a65ddc5 100644
--- a/include/linux/perf_counter.h
+++ b/include/linux/perf_counter.h
@@ -641,7 +641,7 @@ perf_counter_task_sched_out(struct task_struct *task,
struct task_struct *next, int cpu) { }
static inline void
perf_counter_task_tick(struct task_struct *task, int cpu) { }
-static inline int perf_counter_init_task(struct task_struct *child) { }
+static inline int perf_counter_init_task(struct task_struct *child) { return 0; }
static inline void perf_counter_exit_task(struct task_struct *child) { }
static inline void perf_counter_do_pending(void) { }
static inline void perf_counter_print_debug(void) { }

tip-bot for Ingo Molnar

May 28, 2009, 6:10:18 AM
Commit-ID: 63299f057fbce47da895e8865cba7e9c3eb01a20
Gitweb: http://git.kernel.org/tip/63299f057fbce47da895e8865cba7e9c3eb01a20
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Thu, 28 May 2009 10:52:00 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Thu, 28 May 2009 10:53:40 +0200

perf_counter tools: report: Add help text for --sort

Acked-by: Peter Zijlstra <a.p.zi...@chello.nl>


Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/builtin-report.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/Documentation/perf_counter/builtin-report.c b/Documentation/perf_counter/builtin-report.c
index 506cde4..9fdf822 100644
--- a/Documentation/perf_counter/builtin-report.c
+++ b/Documentation/perf_counter/builtin-report.c
@@ -1240,7 +1240,8 @@ static const struct option options[] = {


OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,

"dump raw trace in ASCII"),

OPT_STRING('k', "vmlinux", &vmlinux, "file", "vmlinux pathname"),
- OPT_STRING('s', "sort", &sort_order, "foo", "bar"),
+ OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
+ "sort by key(s): pid, comm, dso, symbol. Default: pid,symbol"),
OPT_END()
};

tip-bot for Peter Zijlstra

May 28, 2009, 6:10:18 AM
Commit-ID: ca8cdeef9ca2ff89ee8a21d6f6ff3dfb60286041
Gitweb: http://git.kernel.org/tip/ca8cdeef9ca2ff89ee8a21d6f6ff3dfb60286041
Author: Peter Zijlstra <a.p.zi...@chello.nl>
AuthorDate: Thu, 28 May 2009 11:08:33 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Thu, 28 May 2009 11:47:02 +0200

perf_counter tools: report: Implement header output for --sort variants

Implement this style of header:

#
# Overhead Command File: Symbol
# ........ ....... ............
#

for the various --sort variants as well.

Signed-off-by: Peter Zijlstra <a.p.zi...@chello.nl>


Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>

Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/builtin-report.c | 70 +++++++++++----------------
1 files changed, 28 insertions(+), 42 deletions(-)

diff --git a/Documentation/perf_counter/builtin-report.c b/Documentation/perf_counter/builtin-report.c
index 5993c12..506cde4 100644
--- a/Documentation/perf_counter/builtin-report.c
+++ b/Documentation/perf_counter/builtin-report.c
@@ -596,8 +596,6 @@ out_delete:

struct thread;

-static const char *thread__name(struct thread *self, char *bf, size_t size);
-
struct thread {
struct rb_node rb_node;
struct list_head maps;
@@ -605,15 +603,6 @@ struct thread {
char *comm;
};

-static const char *thread__name(struct thread *self, char *bf, size_t size)
-{
- if (self->comm)
- return self->comm;
-
- snprintf(bf, sizeof(bf), ":%u", self->pid);
- return bf;
-}
-
static struct thread *thread__new(pid_t pid)
{


struct thread *self = malloc(sizeof(*self));

@@ -707,8 +696,9 @@ struct hist_entry {
struct sort_entry {
struct list_head list;

+ char *header;
+
int64_t (*cmp)(struct hist_entry *, struct hist_entry *);
- size_t (*print_header)(FILE *fp);
size_t (*print)(FILE *fp, struct hist_entry *);
};

@@ -721,13 +711,11 @@ sort__thread_cmp(struct hist_entry *left, struct hist_entry *right)
static size_t
sort__thread_print(FILE *fp, struct hist_entry *self)
{
- char bf[32];
-
- return fprintf(fp, " %16s",
- thread__name(self->thread, bf, sizeof(bf)));
+ return fprintf(fp, " %16s:%5d", self->thread->comm ?: "", self->thread->pid);
}

static struct sort_entry sort_thread = {
+ .header = " Command: Pid ",
.cmp = sort__thread_cmp,
.print = sort__thread_print,
};
@@ -757,6 +745,7 @@ sort__comm_print(FILE *fp, struct hist_entry *self)
}

static struct sort_entry sort_comm = {
+ .header = " Command",
.cmp = sort__comm_cmp,
.print = sort__comm_print,
};
@@ -786,6 +775,7 @@ sort__dso_print(FILE *fp, struct hist_entry *self)
}

static struct sort_entry sort_dso = {
+ .header = " Shared Object",
.cmp = sort__dso_cmp,
.print = sort__dso_print,
};
@@ -804,43 +794,25 @@ sort__sym_cmp(struct hist_entry *left, struct hist_entry *right)
return (int64_t)(ip_r - ip_l);
}

-static size_t sort__sym_print_header(FILE *fp)
-{
- size_t ret = 0;
-
- ret += fprintf(fp, "#\n");
- ret += fprintf(fp, "# Overhead Command File: Symbol\n");
- ret += fprintf(fp, "# ........ ....... ............\n");
- ret += fprintf(fp, "#\n");
-
- return ret;
-}
-
static size_t
sort__sym_print(FILE *fp, struct hist_entry *self)
{
size_t ret = 0;

- ret += fprintf(fp, " [%c] ", self->level);
-
if (verbose)
ret += fprintf(fp, " %#018llx", (unsigned long long)self->ip);

- if (self->level != '.')
- ret += fprintf(fp, " kernel: %s",
- self->sym ? self->sym->name : "<unknown>");
- else
- ret += fprintf(fp, " %s: %s",


- self->dso ? self->dso->name : "<unknown>",

- self->sym ? self->sym->name : "<unknown>");
+ ret += fprintf(fp, " %s: %s",


+ self->dso ? self->dso->name : "<unknown>",

+ self->sym ? self->sym->name : "<unknown>");

return ret;
}

static struct sort_entry sort_sym = {
- .cmp = sort__sym_cmp,
- .print_header = sort__sym_print_header,
- .print = sort__sym_print,
+ .header = "Shared Object: Symbol",
+ .cmp = sort__sym_cmp,
+ .print = sort__sym_print,
};

struct sort_dimension {
@@ -1021,10 +993,24 @@ static size_t output__fprintf(FILE *fp, uint64_t total_samples)
struct rb_node *nd;
size_t ret = 0;

+ fprintf(fp, "#\n");
+
+ fprintf(fp, "# Overhead");
+ list_for_each_entry(se, &hist_entry__sort_list, list)
+ fprintf(fp, " %s", se->header);
+ fprintf(fp, "\n");
+
+ fprintf(fp, "# ........");
list_for_each_entry(se, &hist_entry__sort_list, list) {
- if (se->print_header)
- ret += se->print_header(fp);
+ int i;
+
+ fprintf(fp, " ");
+ for (i = 0; i < strlen(se->header); i++)
+ fprintf(fp, ".");
}
+ fprintf(fp, "\n");
+
+ fprintf(fp, "#\n");

for (nd = rb_first(&output_hists); nd; nd = rb_next(nd)) {
pos = rb_entry(nd, struct hist_entry, rb_node);

Paul Mackerras

May 28, 2009, 7:20:11 AM
tip-bot for Ingo Molnar writes:

> perf stat: handle Ctrl-C
>
> Before this change, if a long-running perf stat workload was Ctrl-C-ed,
> the utility exited without displaying statistics.
>
> After the change, the Ctrl-C gets propagated into the workload (and
> causes its early exit there), but perf stat itself will still continue
> to run and will display counter results.
>
> This is useful to run open-ended workloads, let them run for
> a while, then Ctrl-C them to get the stats.

Unfortunately it means that if you do e.g.

$ while true; do perf stat something; done

it's impossible to kill the loop with ctrl-C. To fix this we need to
make perf stat kill itself with the signal after printing the results,
so bash sees the died-due-to-signal exit status and stops the loop.

Paul.

Peter Zijlstra

May 28, 2009, 8:20:17 AM
On Thu, 2009-05-28 at 21:09 +1000, Paul Mackerras wrote:
> tip-bot for Ingo Molnar writes:
>
> > perf stat: handle Ctrl-C
> >
> > Before this change, if a long-running perf stat workload was Ctrl-C-ed,
> > the utility exited without displaying statistics.
> >
> > After the change, the Ctrl-C gets propagated into the workload (and
> > causes its early exit there), but perf stat itself will still continue
> > to run and will display counter results.
> >
> > This is useful to run open-ended workloads, let them run for
> > a while, then Ctrl-C them to get the stats.
>
> Unfortunately it means that if you do e.g.
>
> $ while true; do perf stat something; done
>
> it's impossible to kill the loop with ctrl-C. To fix this we need to
> make perf stat kill itself with the signal after printing the results,
> so bash sees the died-due-to-signal exit status and stops the loop.

Yep, just ran into the same..

^Z kill $! worked though, but that's not ideal.

tip-bot for Mike Galbraith

May 28, 2009, 6:10:09 PM
Commit-ID: 9e09675366695405412b709e91709c1ce2925c90
Gitweb: http://git.kernel.org/tip/9e09675366695405412b709e91709c1ce2925c90
Author: Mike Galbraith <efa...@gmx.de>
AuthorDate: Thu, 28 May 2009 16:25:34 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Fri, 29 May 2009 00:02:33 +0200

perf_counter tools: Document '--' option parsing terminator

Signed-off-by: Mike Galbraith <efa...@gmx.de>
Cc: Peter Zijlstra <a.p.zi...@chello.nl>

LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
.../perf_counter/Documentation/perf-record.txt | 1 +
.../perf_counter/Documentation/perf-stat.txt | 1 +
Documentation/perf_counter/builtin-record.c | 3 ++-
Documentation/perf_counter/builtin-stat.c | 1 +
4 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/Documentation/perf_counter/Documentation/perf-record.txt b/Documentation/perf_counter/Documentation/perf-record.txt
index 353db1b..a93d2ec 100644
--- a/Documentation/perf_counter/Documentation/perf-record.txt
+++ b/Documentation/perf_counter/Documentation/perf-record.txt
@@ -9,6 +9,7 @@ SYNOPSIS
--------
[verse]
'perf record' [-e <EVENT> | --event=EVENT] [-l] [-a] <command>
+'perf record' [-e <EVENT> | --event=EVENT] [-l] [-a] -- <command> [<options>]

DESCRIPTION
-----------
diff --git a/Documentation/perf_counter/Documentation/perf-stat.txt b/Documentation/perf_counter/Documentation/perf-stat.txt
index 7fcab27..828c59f 100644
--- a/Documentation/perf_counter/Documentation/perf-stat.txt
+++ b/Documentation/perf_counter/Documentation/perf-stat.txt
@@ -9,6 +9,7 @@ SYNOPSIS
--------
[verse]
'perf stat' [-e <EVENT> | --event=EVENT] [-l] [-a] <command>
+'perf stat' [-e <EVENT> | --event=EVENT] [-l] [-a] -- <command> [<options>]

DESCRIPTION
-----------
diff --git a/Documentation/perf_counter/builtin-record.c b/Documentation/perf_counter/builtin-record.c
index 4a06866..23d1224 100644
--- a/Documentation/perf_counter/builtin-record.c
+++ b/Documentation/perf_counter/builtin-record.c
@@ -397,7 +397,8 @@ static int __cmd_record(int argc, const char **argv)
}

static const char * const record_usage[] = {
- "perf record [<options>] <command>",
+ "perf record [<options>] [<command>]",
+ "perf record [<options>] -- <command> [<options>]",
NULL
};

diff --git a/Documentation/perf_counter/builtin-stat.c b/Documentation/perf_counter/builtin-stat.c
index ce661e2..ac14086 100644
--- a/Documentation/perf_counter/builtin-stat.c
+++ b/Documentation/perf_counter/builtin-stat.c
@@ -212,6 +212,7 @@ static void skip_signal(int signo)

static const char * const stat_usage[] = {
"perf stat [<options>] <command>",
+ "perf stat [<options>] -- <command> [<options>]",
NULL
};

tip-bot for Mike Galbraith

May 29, 2009, 3:10:15 AM
Commit-ID: a3ec8d70f1a55acccc4874fe9b4dadbbb9454a0f
Gitweb: http://git.kernel.org/tip/a3ec8d70f1a55acccc4874fe9b4dadbbb9454a0f
Author: Mike Galbraith <efa...@gmx.de>
AuthorDate: Fri, 29 May 2009 06:46:46 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Fri, 29 May 2009 09:03:56 +0200

perf_counter tools: Fix top symbol table dump typo

Signed-off-by: Mike Galbraith <efa...@gmx.de>
Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>

LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/builtin-top.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Documentation/perf_counter/builtin-top.c b/Documentation/perf_counter/builtin-top.c
index 52ba9f4..0d100f5 100644
--- a/Documentation/perf_counter/builtin-top.c
+++ b/Documentation/perf_counter/builtin-top.c
@@ -371,7 +371,7 @@ static int parse_symbols(void)
max_ip = sym->start;

if (dump_symtab)
- dso__fprintf(kernel_dso, stdout);
+ dso__fprintf(kernel_dso, stderr);

return 0;

tip-bot for Mike Galbraith

May 29, 2009, 3:10:16 AM
Commit-ID: da417a7537cbf4beb28a08a49adf915f2358040c
Gitweb: http://git.kernel.org/tip/da417a7537cbf4beb28a08a49adf915f2358040c
Author: Mike Galbraith <efa...@gmx.de>
AuthorDate: Fri, 29 May 2009 08:23:16 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Fri, 29 May 2009 09:03:57 +0200

perf_counter tools: Fix top symbol table max_ip typo

Signed-off-by: Mike Galbraith <efa...@gmx.de>
Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/builtin-top.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Documentation/perf_counter/builtin-top.c b/Documentation/perf_counter/builtin-top.c
index 0d100f5..ebe8bec 100644
--- a/Documentation/perf_counter/builtin-top.c
+++ b/Documentation/perf_counter/builtin-top.c
@@ -368,7 +368,7 @@ static int parse_symbols(void)

node = rb_last(&kernel_dso->syms);
sym = rb_entry(node, struct symbol, rb_node);
- max_ip = sym->start;
+ max_ip = sym->end;

if (dump_symtab)
dso__fprintf(kernel_dso, stderr);

tip-bot for Ingo Molnar

May 29, 2009, 5:10:15 AM
Commit-ID: c04f5e5d7b523f90ee3cdd70a68c4002aaecd3fa
Gitweb: http://git.kernel.org/tip/c04f5e5d7b523f90ee3cdd70a68c4002aaecd3fa
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Fri, 29 May 2009 09:10:54 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Fri, 29 May 2009 09:11:49 +0200

perf_counter tools: Clean up builtin-stat.c's do_perfstat()

[ Impact: cleanup ]

Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/builtin-stat.c | 116 +++++++++++++++++------------
1 files changed, 67 insertions(+), 49 deletions(-)

diff --git a/Documentation/perf_counter/builtin-stat.c b/Documentation/perf_counter/builtin-stat.c
index ac14086..6a29361 100644
--- a/Documentation/perf_counter/builtin-stat.c
+++ b/Documentation/perf_counter/builtin-stat.c
@@ -109,11 +109,75 @@ static void create_perfstat_counter(int counter)
}
}

+/*
+ * Does the counter have nsecs as a unit?
+ */
+static inline int nsec_counter(int counter)
+{
+ if (event_id[counter] == EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CPU_CLOCK))
+ return 1;
+ if (event_id[counter] == EID(PERF_TYPE_SOFTWARE, PERF_COUNT_TASK_CLOCK))
+ return 1;
+
+ return 0;
+}
+
+/*
+ * Print out the results of a single counter:
+ */
+static void print_counter(int counter)
+{
+ __u64 count[3], single_count[3];
+ ssize_t res;
+ int cpu, nv;
+ int scaled;
+
+ count[0] = count[1] = count[2] = 0;
+ nv = scale ? 3 : 1;
+ for (cpu = 0; cpu < nr_cpus; cpu ++) {
+ res = read(fd[cpu][counter], single_count, nv * sizeof(__u64));
+ assert(res == nv * sizeof(__u64));
+
+ count[0] += single_count[0];
+ if (scale) {
+ count[1] += single_count[1];
+ count[2] += single_count[2];
+ }
+ }
+
+ scaled = 0;
+ if (scale) {
+ if (count[2] == 0) {
+ fprintf(stderr, " %14s %-20s\n",
+ "<not counted>", event_name(counter));
+ return;
+ }
+ if (count[2] < count[1]) {
+ scaled = 1;
+ count[0] = (unsigned long long)
+ ((double)count[0] * count[1] / count[2] + 0.5);
+ }
+ }
+
+ if (nsec_counter(counter)) {
+ double msecs = (double)count[0] / 1000000;
+
+ fprintf(stderr, " %14.6f %-20s (msecs)",
+ msecs, event_name(counter));
+ } else {
+ fprintf(stderr, " %14Ld %-20s (events)",
+ count[0], event_name(counter));
+ }
+ if (scaled)
+ fprintf(stderr, " (scaled from %.2f%%)",
+ (double) count[2] / count[1] * 100);
+ fprintf(stderr, "\n");
+}
+
static int do_perfstat(int argc, const char **argv)
{
unsigned long long t0, t1;
int counter;
- ssize_t res;
int status;
int pid;

@@ -149,55 +213,10 @@ static int do_perfstat(int argc, const char **argv)
argv[0]);
fprintf(stderr, "\n");

- for (counter = 0; counter < nr_counters; counter++) {
- int cpu, nv;
- __u64 count[3], single_count[3];
- int scaled;
-
- count[0] = count[1] = count[2] = 0;
- nv = scale ? 3 : 1;
- for (cpu = 0; cpu < nr_cpus; cpu ++) {
- res = read(fd[cpu][counter],
- single_count, nv * sizeof(__u64));
- assert(res == nv * sizeof(__u64));
-
- count[0] += single_count[0];
- if (scale) {
- count[1] += single_count[1];
- count[2] += single_count[2];
- }
- }
-
- scaled = 0;
- if (scale) {
- if (count[2] == 0) {
- fprintf(stderr, " %14s %-20s\n",
- "<not counted>", event_name(counter));
- continue;
- }
- if (count[2] < count[1]) {
- scaled = 1;
- count[0] = (unsigned long long)
- ((double)count[0] * count[1] / count[2] + 0.5);
- }
- }
-
- if (event_id[counter] == EID(PERF_TYPE_SOFTWARE, PERF_COUNT_CPU_CLOCK) ||
- event_id[counter] == EID(PERF_TYPE_SOFTWARE, PERF_COUNT_TASK_CLOCK)) {
+ for (counter = 0; counter < nr_counters; counter++)
+ print_counter(counter);

- double msecs = (double)count[0] / 1000000;

- fprintf(stderr, " %14.6f %-20s (msecs)",
- msecs, event_name(counter));
- } else {
- fprintf(stderr, " %14Ld %-20s (events)",
- count[0], event_name(counter));
- }
- if (scaled)
- fprintf(stderr, " (scaled from %.2f%%)",
- (double) count[2] / count[1] * 100);
- fprintf(stderr, "\n");
- }
fprintf(stderr, "\n");
fprintf(stderr, " Wall-clock time elapsed: %12.6f msecs\n",
(double)(t1-t0)/1e6);
@@ -212,7 +231,6 @@ static void skip_signal(int signo)
 
static const char * const stat_usage[] = {
"perf stat [<options>] <command>",
- "perf stat [<options>] -- <command> [<options>]",
NULL
};

Ingo Molnar

May 29, 2009, 5:10:17 AM

* Peter Zijlstra <a.p.zi...@chello.nl> wrote:

> On Thu, 2009-05-28 at 21:09 +1000, Paul Mackerras wrote:
> > tip-bot for Ingo Molnar writes:
> >
> > > perf stat: handle Ctrl-C
> > >
> > > Before this change, if a long-running perf stat workload was Ctrl-C-ed,
> > > the utility exited without displaying statistics.
> > >
> > > After the change, the Ctrl-C gets propagated into the workload (and
> > > causes its early exit there), but perf stat itself will still continue
> > > to run and will display counter results.
> > >
> > > This is useful to run open-ended workloads, let them run for
> > > a while, then Ctrl-C them to get the stats.
> >
> > Unfortunately it means that if you do e.g.
> >
> > $ while true; do perf stat something; done
> >
> > it's impossible to kill the loop with ctrl-C. To fix this we need to
> > make perf stat kill itself with the signal after printing the results,
> > so bash sees the died-due-to-signal exit status and stops the loop.
>
> Yep, just ran into the same..
>
> ^Z kill $! worked though, but that's not ideal.

would be nice to have a fix for this - i suspect people will run
into this frequently.

Ingo

tip-bot for Ingo Molnar

May 29, 2009, 5:10:20 AM
Commit-ID: be1ac0d81d0e3ab655f8c8ade31fb860ef6aa186
Gitweb: http://git.kernel.org/tip/be1ac0d81d0e3ab655f8c8ade31fb860ef6aa186

Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Fri, 29 May 2009 09:10:54 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Fri, 29 May 2009 09:46:45 +0200

perf_counter tools: Also display time-normalized stat results

Add new column that normalizes counter results by
'nanoseconds spent running' unit.

Before:

Performance counter stats for '/home/mingo/hackbench':

10469.403605 task clock ticks (msecs)
75502 context switches (events)
9501 CPU migrations (events)
36158 pagefaults (events)
31975676185 CPU cycles (events)
26257738659 instructions (events)
108740581 cache references (events)
54606088 cache misses (events)

Wall-clock time elapsed: 810.514504 msecs

After:

Performance counter stats for '/home/mingo/hackbench':

10469.403605 task clock ticks (msecs)
75502 context switches # 0.007 M/sec
9501 CPU migrations # 0.001 M/sec
36158 pagefaults # 0.003 M/sec
31975676185 CPU cycles # 3054.202 M/sec
26257738659 instructions # 2508.045 M/sec
108740581 cache references # 10.387 M/sec
54606088 cache misses # 5.216 M/sec

Wall-clock time elapsed: 810.514504 msecs

The advantage of that column is that it is characteristic of the
execution workflow, regardless of runtime. Hence 'hackbench 10'
will look similar to 'hackbench 15' - while the absolute counter
values are very different.

Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/builtin-stat.c | 12 +++++++++++-
1 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/Documentation/perf_counter/builtin-stat.c b/Documentation/perf_counter/builtin-stat.c
index 0c92eb7..ef7e0e1 100644
--- a/Documentation/perf_counter/builtin-stat.c
+++ b/Documentation/perf_counter/builtin-stat.c
@@ -74,6 +74,8 @@ static const unsigned int default_count[] = {
static __u64 event_res[MAX_COUNTERS][3];
static __u64 event_scaled[MAX_COUNTERS];

+static __u64 runtime_nsecs;
+
static void create_perfstat_counter(int counter)
{
struct perf_counter_hw_event hw_event;
@@ -165,6 +167,11 @@ static void read_counter(int counter)
 ((double)count[0] * count[1] / count[2] + 0.5);
 }
 }
+
+ /*
+ * Save the full runtime - to allow normalization during printout:
+ */
+ if (event_id[counter] == EID(PERF_TYPE_SOFTWARE, PERF_COUNT_TASK_CLOCK))
+ runtime_nsecs = count[0];
 }

/*
@@ -190,8 +197,11 @@ static void print_counter(int counter)
 fprintf(stderr, " %14.6f %-20s (msecs)",
 msecs, event_name(counter));
 } else {
- fprintf(stderr, " %14Ld %-20s (events)",
+ fprintf(stderr, " %14Ld %-20s",
 count[0], event_name(counter));
+ if (runtime_nsecs)
+ fprintf(stderr, " # %12.3f M/sec",
+ (double)count[0]/runtime_nsecs*1000.0);
 }
 if (scaled)
 fprintf(stderr, " (scaled from %.2f%%)",

tip-bot for Ingo Molnar

May 29, 2009, 5:10:21 AM
Commit-ID: 2996f5ddb7ba8889caeeac65edafe48845106eaa
Gitweb: http://git.kernel.org/tip/2996f5ddb7ba8889caeeac65edafe48845106eaa

Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Fri, 29 May 2009 09:10:54 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Fri, 29 May 2009 09:21:49 +0200

perf_counter tools: Split display into reading and printing

We introduce the extra pass to allow the print-out to possibly
rely on already read counters.

[ Impact: cleanup ]

Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/perf_counter/builtin-stat.c | 40 ++++++++++++++++++++++++----
1 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/Documentation/perf_counter/builtin-stat.c b/Documentation/perf_counter/builtin-stat.c
index 6a29361..0c92eb7 100644
--- a/Documentation/perf_counter/builtin-stat.c
+++ b/Documentation/perf_counter/builtin-stat.c
@@ -71,6 +71,9 @@ static const unsigned int default_count[] = {
10000,
};

+static __u64 event_res[MAX_COUNTERS][3];
+static __u64 event_scaled[MAX_COUNTERS];
+
 static void create_perfstat_counter(int counter)
 {
 struct perf_counter_hw_event hw_event;

@@ -123,16 +126,19 @@ static inline int nsec_counter(int counter)
}

/*
- * Print out the results of a single counter:
+ * Read out the results of a single counter:
*/
-static void print_counter(int counter)
+static void read_counter(int counter)
{
- __u64 count[3], single_count[3];
+ __u64 *count, single_count[3];
 ssize_t res;
 int cpu, nv;
 int scaled;
 
+ count = event_res[counter];
+
 count[0] = count[1] = count[2] = 0;
+
 nv = scale ? 3 : 1;
 for (cpu = 0; cpu < nr_cpus; cpu ++) {
 res = read(fd[cpu][counter], single_count, nv * sizeof(__u64));

@@ -148,16 +154,35 @@ static void print_counter(int counter)
 scaled = 0;
 if (scale) {
 if (count[2] == 0) {
- fprintf(stderr, " %14s %-20s\n",
- "<not counted>", event_name(counter));
+ event_scaled[counter] = -1;
+ count[0] = 0;
 return;
 }
+
 if (count[2] < count[1]) {
- scaled = 1;
+ event_scaled[counter] = 1;
 count[0] = (unsigned long long)
 ((double)count[0] * count[1] / count[2] + 0.5);
 }
 }
+}

+
+/*
+ * Print out the results of a single counter:
+ */
+static void print_counter(int counter)
+{
+ __u64 *count;
+ int scaled;
+
+ count = event_res[counter];
+ scaled = event_scaled[counter];
+
+ if (scaled == -1) {
+ fprintf(stderr, " %14s %-20s\n",
+ "<not counted>", event_name(counter));
+ return;
+ }
 
 if (nsec_counter(counter)) {
 double msecs = (double)count[0] / 1000000;
 
@@ -214,6 +239,9 @@ static int do_perfstat(int argc, const char **argv)
 fprintf(stderr, "\n");
 
 for (counter = 0; counter < nr_counters; counter++)
+ read_counter(counter);
+
+ for (counter = 0; counter < nr_counters; counter++)
 print_counter(counter);

tip-bot for Ingo Molnar

May 29, 2009, 8:30:16 AM
Commit-ID: 012b84dae17126d8b5d159173091eb3db5a2bc43
Gitweb: http://git.kernel.org/tip/012b84dae17126d8b5d159173091eb3db5a2bc43
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Sun, 17 May 2009 11:08:41 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Fri, 29 May 2009 14:28:37 +0200

perf_counter: Robustify counter-free logic

This fixes a nasty crash and highlights a bug that we were
freeing failed-fork() counters incorrectly.

(the fix for that will come separately)

[ Impact: fix crashes/lockups with inherited counters ]

Acked-by: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
kernel/perf_counter.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
index eb34604..616c524 100644
--- a/kernel/perf_counter.c
+++ b/kernel/perf_counter.c
@@ -1004,6 +1004,10 @@ static void __perf_counter_task_sched_out(struct perf_counter_context *ctx)

if (!cpuctx->task_ctx)
return;
+
+ if (WARN_ON_ONCE(ctx != cpuctx->task_ctx))
+ return;
+
__perf_counter_sched_out(ctx, cpuctx);
cpuctx->task_ctx = NULL;

tip-bot for Ingo Molnar

May 29, 2009, 8:30:20 AM
Commit-ID: 3f4dee227348daac32f36daad9a91059efd0723e
Gitweb: http://git.kernel.org/tip/3f4dee227348daac32f36daad9a91059efd0723e
Author: Ingo Molnar <mi...@elte.hu>
AuthorDate: Fri, 29 May 2009 11:25:09 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Fri, 29 May 2009 14:28:36 +0200

perf_counter: Fix cpuctx->task_ctx races

Peter noticed that we are sometimes reading cpuctx->task_ctx with
interrupts enabled.

Noticed-by: Peter Zijlstra <a.p.zi...@chello.nl>
Acked-by: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
kernel/perf_counter.c | 28 ++++++++++++++++++++--------
1 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
index db843f8..eb34604 100644
--- a/kernel/perf_counter.c
+++ b/kernel/perf_counter.c
@@ -234,15 +234,18 @@ static void __perf_counter_remove_from_context(void *info)
struct perf_counter_context *ctx = counter->ctx;
unsigned long flags;

+ local_irq_save(flags);
/*
* If this is a task context, we need to check whether it is
* the current task context of this cpu. If not it has been
* scheduled out before the smp call arrived.
*/
- if (ctx->task && cpuctx->task_ctx != ctx)
+ if (ctx->task && cpuctx->task_ctx != ctx) {
+ local_irq_restore(flags);
return;
+ }

- spin_lock_irqsave(&ctx->lock, flags);
+ spin_lock(&ctx->lock);
/*
* Protect the list operation against NMI by disabling the
* counters on a global level.
@@ -382,14 +385,17 @@ static void __perf_counter_disable(void *info)
struct perf_counter_context *ctx = counter->ctx;
unsigned long flags;

+ local_irq_save(flags);
/*
* If this is a per-task counter, need to check whether this
* counter's task is the current task on this cpu.
*/
- if (ctx->task && cpuctx->task_ctx != ctx)
+ if (ctx->task && cpuctx->task_ctx != ctx) {
+ local_irq_restore(flags);
return;
+ }

- spin_lock_irqsave(&ctx->lock, flags);
+ spin_lock(&ctx->lock);

/*
* If the counter is on, turn it off.
@@ -615,6 +621,7 @@ static void __perf_install_in_context(void *info)
unsigned long flags;
int err;

+ local_irq_save(flags);
/*
* If this is a task context, we need to check whether it is
* the current task context of this cpu. If not it has been
@@ -623,12 +630,14 @@ static void __perf_install_in_context(void *info)
* on this cpu because it had no counters.
*/
if (ctx->task && cpuctx->task_ctx != ctx) {
- if (cpuctx->task_ctx || ctx->task != current)
+ if (cpuctx->task_ctx || ctx->task != current) {
+ local_irq_restore(flags);
return;
+ }
cpuctx->task_ctx = ctx;
}

- spin_lock_irqsave(&ctx->lock, flags);
+ spin_lock(&ctx->lock);
ctx->is_active = 1;
update_context_time(ctx);

@@ -745,17 +754,20 @@ static void __perf_counter_enable(void *info)
unsigned long flags;
int err;

+ local_irq_save(flags);
/*
* If this is a per-task counter, need to check whether this
* counter's task is the current task on this cpu.
*/
if (ctx->task && cpuctx->task_ctx != ctx) {
- if (cpuctx->task_ctx || ctx->task != current)
+ if (cpuctx->task_ctx || ctx->task != current) {
+ local_irq_restore(flags);
return;
+ }
cpuctx->task_ctx = ctx;
}

- spin_lock_irqsave(&ctx->lock, flags);
+ spin_lock(&ctx->lock);
ctx->is_active = 1;
update_context_time(ctx);

tip-bot for Peter Zijlstra

May 29, 2009, 1:20:08 PM
Commit-ID: bbbee90829304d156c12b171c0ac7e6e1aba8b90
Gitweb: http://git.kernel.org/tip/bbbee90829304d156c12b171c0ac7e6e1aba8b90
Author: Peter Zijlstra <a.p.zi...@chello.nl>
AuthorDate: Fri, 29 May 2009 14:25:58 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Fri, 29 May 2009 16:21:52 +0200

perf_counter: Amend cleanup in fork() fail

When fork() fails we cannot use perf_counter_exit_task(), since it
assumes it is operating on current. Write a new helper that cleans up
unused/clean contexts.

Signed-off-by: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
include/linux/perf_counter.h | 2 +
kernel/fork.c | 2 +-
kernel/perf_counter.c | 43 +++++++++++++++++++++++++++++++++++++++--
3 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/include/linux/perf_counter.h b/include/linux/perf_counter.h
index 717bf3b..519a41b 100644
--- a/include/linux/perf_counter.h
+++ b/include/linux/perf_counter.h
@@ -579,6 +579,7 @@ extern void perf_counter_task_sched_out(struct task_struct *task,
extern void perf_counter_task_tick(struct task_struct *task, int cpu);
extern int perf_counter_init_task(struct task_struct *child);
extern void perf_counter_exit_task(struct task_struct *child);
+extern void perf_counter_free_task(struct task_struct *task);
extern void perf_counter_do_pending(void);
extern void perf_counter_print_debug(void);
extern void __perf_disable(void);
@@ -644,6 +645,7 @@ static inline void
 perf_counter_task_tick(struct task_struct *task, int cpu) { }
 static inline int perf_counter_init_task(struct task_struct *child) { return 0; }
 static inline void perf_counter_exit_task(struct task_struct *child) { }
+static inline void perf_counter_free_task(struct task_struct *task) { }
 static inline void perf_counter_do_pending(void) { }
 static inline void perf_counter_print_debug(void) { }
 static inline void perf_disable(void) { }
diff --git a/kernel/fork.c b/kernel/fork.c
index c07c333..23bf757 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1298,7 +1298,7 @@ bad_fork_cleanup_semundo:
bad_fork_cleanup_audit:
audit_free(p);
bad_fork_cleanup_policy:
- perf_counter_exit_task(p);
+ perf_counter_free_task(p);
#ifdef CONFIG_NUMA
mpol_put(p->mempolicy);
bad_fork_cleanup_cgroup:
diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
index 0c000d3..79c3f26 100644
--- a/kernel/perf_counter.c
+++ b/kernel/perf_counter.c
@@ -3538,8 +3538,7 @@ static void sync_child_counter(struct perf_counter *child_counter,
}

static void
-__perf_counter_exit_task(struct task_struct *child,
- struct perf_counter *child_counter,
+__perf_counter_exit_task(struct perf_counter *child_counter,
struct perf_counter_context *child_ctx)
{
struct perf_counter *parent_counter;
@@ -3605,7 +3604,7 @@ void perf_counter_exit_task(struct task_struct *child)
again:
list_for_each_entry_safe(child_counter, tmp, &child_ctx->counter_list,
list_entry)
- __perf_counter_exit_task(child, child_counter, child_ctx);
+ __perf_counter_exit_task(child_counter, child_ctx);

/*
* If the last counter was a group counter, it will have appended all
@@ -3621,6 +3620,44 @@ again:
}

/*
+ * free an unexposed, unused context as created by inheritance by
+ * init_task below, used by fork() in case of fail.
+ */
+void perf_counter_free_task(struct task_struct *task)
+{
+ struct perf_counter_context *ctx = task->perf_counter_ctxp;
+ struct perf_counter *counter, *tmp;
+
+ if (!ctx)
+ return;
+
+ mutex_lock(&ctx->mutex);
+again:
+ list_for_each_entry_safe(counter, tmp, &ctx->counter_list, list_entry) {
+ struct perf_counter *parent = counter->parent;
+
+ if (WARN_ON_ONCE(!parent))
+ continue;
+
+ mutex_lock(&parent->child_mutex);
+ list_del_init(&counter->child_list);
+ mutex_unlock(&parent->child_mutex);
+
+ fput(parent->filp);
+
+ list_del_counter(counter, ctx);
+ free_counter(counter);
+ }
+
+ if (!list_empty(&ctx->counter_list))
+ goto again;
+
+ mutex_unlock(&ctx->mutex);
+
+ put_ctx(ctx);
+}
+
+/*
* Initialize the perf_counter context in task_struct
*/
int perf_counter_init_task(struct task_struct *child)

tip-bot for Peter Zijlstra

May 29, 2009, 1:20:06 PM
Commit-ID: 665c2142a94202881a3c11cbaee6506cb10ada2d
Gitweb: http://git.kernel.org/tip/665c2142a94202881a3c11cbaee6506cb10ada2d
Author: Peter Zijlstra <a.p.zi...@chello.nl>
AuthorDate: Fri, 29 May 2009 14:51:57 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Fri, 29 May 2009 16:21:51 +0200

perf_counter: Clean up task_ctx vs interrupts

Remove the local_irq_save() etc.. in routines that are smp function
calls, or have IRQs disabled by other means.

Then change the COMM, MMAP, and swcounter context iteration to
current->perf_counter_ctxp and RCU, since it really doesn't matter
which context they iterate, they're all folded.

Signed-off-by: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: John Kacur <jka...@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
kernel/perf_counter.c | 82 ++++++++++++++++++++++++++++++-------------------
1 files changed, 50 insertions(+), 32 deletions(-)

diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
index 58d6d19..0c000d3 100644
--- a/kernel/perf_counter.c
+++ b/kernel/perf_counter.c
@@ -232,18 +232,14 @@ static void __perf_counter_remove_from_context(void *info)
 struct perf_cpu_context *cpuctx = &__get_cpu_var(perf_cpu_context);
 struct perf_counter *counter = info;
 struct perf_counter_context *ctx = counter->ctx;
- unsigned long flags;
 
- local_irq_save(flags);
 /*
 * If this is a task context, we need to check whether it is
 * the current task context of this cpu. If not it has been
 * scheduled out before the smp call arrived.
 */
- if (ctx->task && cpuctx->task_ctx != ctx) {
- local_irq_restore(flags);
+ if (ctx->task && cpuctx->task_ctx != ctx)
 return;
- }
 
 spin_lock(&ctx->lock);
/*
@@ -267,7 +263,7 @@ static void __perf_counter_remove_from_context(void *info)
}

perf_enable();
- spin_unlock_irqrestore(&ctx->lock, flags);
+ spin_unlock(&ctx->lock);
}


@@ -383,17 +379,13 @@ static void __perf_counter_disable(void *info)
 struct perf_counter *counter = info;
 struct perf_cpu_context *cpuctx = &__get_cpu_var(perf_cpu_context);
 struct perf_counter_context *ctx = counter->ctx;
- unsigned long flags;
 
- local_irq_save(flags);
 /*
 * If this is a per-task counter, need to check whether this
 * counter's task is the current task on this cpu.
 */
- if (ctx->task && cpuctx->task_ctx != ctx) {
- local_irq_restore(flags);
+ if (ctx->task && cpuctx->task_ctx != ctx)
 return;
- }
 
 spin_lock(&ctx->lock);

@@ -411,7 +403,7 @@ static void __perf_counter_disable(void *info)
counter->state = PERF_COUNTER_STATE_OFF;
}

- spin_unlock_irqrestore(&ctx->lock, flags);
+ spin_unlock(&ctx->lock);
}

/*
@@ -618,10 +610,8 @@ static void __perf_install_in_context(void *info)
 struct perf_counter_context *ctx = counter->ctx;
 struct perf_counter *leader = counter->group_leader;
 int cpu = smp_processor_id();
- unsigned long flags;
 int err;
 
- local_irq_save(flags);
 /*
 * If this is a task context, we need to check whether it is
 * the current task context of this cpu. If not it has been
@@ -630,10 +620,8 @@ static void __perf_install_in_context(void *info)
 * on this cpu because it had no counters.
 */
 if (ctx->task && cpuctx->task_ctx != ctx) {
- if (cpuctx->task_ctx || ctx->task != current) {
- local_irq_restore(flags);
+ if (cpuctx->task_ctx || ctx->task != current)
 return;
- }
 cpuctx->task_ctx = ctx;
 }

@@ -687,7 +675,7 @@ static void __perf_install_in_context(void *info)
unlock:
perf_enable();

- spin_unlock_irqrestore(&ctx->lock, flags);
+ spin_unlock(&ctx->lock);
}

/*
@@ -751,19 +739,15 @@ static void __perf_counter_enable(void *info)
 struct perf_cpu_context *cpuctx = &__get_cpu_var(perf_cpu_context);
 struct perf_counter_context *ctx = counter->ctx;
 struct perf_counter *leader = counter->group_leader;
- unsigned long flags;
 int err;
 
- local_irq_save(flags);
 /*
 * If this is a per-task counter, need to check whether this
 * counter's task is the current task on this cpu.
 */
 if (ctx->task && cpuctx->task_ctx != ctx) {
- if (cpuctx->task_ctx || ctx->task != current) {
- local_irq_restore(flags);
+ if (cpuctx->task_ctx || ctx->task != current)
 return;
- }
 cpuctx->task_ctx = ctx;
 }

@@ -811,7 +795,7 @@ static void __perf_counter_enable(void *info)
}

unlock:
- spin_unlock_irqrestore(&ctx->lock, flags);
+ spin_unlock(&ctx->lock);
}

/*
@@ -981,6 +965,10 @@ void perf_counter_task_sched_out(struct task_struct *task,
spin_lock(&ctx->lock);
spin_lock_nested(&next_ctx->lock, SINGLE_DEPTH_NESTING);
if (context_equiv(ctx, next_ctx)) {
+ /*
+ * XXX do we need a memory barrier of sorts
+ * wrt to rcu_dereference() of perf_counter_ctxp
+ */
task->perf_counter_ctxp = next_ctx;
next->perf_counter_ctxp = ctx;
ctx->task = next;
@@ -998,6 +986,9 @@ void perf_counter_task_sched_out(struct task_struct *task,
}
}

+/*
+ * Called with IRQs disabled
+ */
 static void __perf_counter_task_sched_out(struct perf_counter_context *ctx)
 {
struct perf_cpu_context *cpuctx = &__get_cpu_var(perf_cpu_context);
@@ -1012,6 +1003,9 @@ static void __perf_counter_task_sched_out(struct perf_counter_context *ctx)
cpuctx->task_ctx = NULL;
}

+/*
+ * Called with IRQs disabled
+ */
static void perf_counter_cpu_sched_out(struct perf_cpu_context *cpuctx)
{
__perf_counter_sched_out(&cpuctx->ctx, cpuctx);
@@ -2431,6 +2425,7 @@ static void perf_counter_comm_ctx(struct perf_counter_context *ctx,
static void perf_counter_comm_event(struct perf_comm_event *comm_event)
{
struct perf_cpu_context *cpuctx;
+ struct perf_counter_context *ctx;
unsigned int size;
char *comm = comm_event->task->comm;

@@ -2443,9 +2438,17 @@ static void perf_counter_comm_event(struct perf_comm_event *comm_event)

cpuctx = &get_cpu_var(perf_cpu_context);
perf_counter_comm_ctx(&cpuctx->ctx, comm_event);
- if (cpuctx->task_ctx)
- perf_counter_comm_ctx(cpuctx->task_ctx, comm_event);
put_cpu_var(perf_cpu_context);
+
+ rcu_read_lock();
+ /*
+ * doesn't really matter which of the child contexts the
+ * events ends up in.
+ */
+ ctx = rcu_dereference(current->perf_counter_ctxp);
+ if (ctx)
+ perf_counter_comm_ctx(ctx, comm_event);
+ rcu_read_unlock();
}

void perf_counter_comm(struct task_struct *task)
@@ -2536,6 +2539,7 @@ static void perf_counter_mmap_ctx(struct perf_counter_context *ctx,
static void perf_counter_mmap_event(struct perf_mmap_event *mmap_event)
{
struct perf_cpu_context *cpuctx;
+ struct perf_counter_context *ctx;
struct file *file = mmap_event->file;
unsigned int size;
char tmp[16];
@@ -2568,10 +2572,18 @@ got_name:

cpuctx = &get_cpu_var(perf_cpu_context);
perf_counter_mmap_ctx(&cpuctx->ctx, mmap_event);
- if (cpuctx->task_ctx)
- perf_counter_mmap_ctx(cpuctx->task_ctx, mmap_event);
put_cpu_var(perf_cpu_context);

+ rcu_read_lock();
+ /*
+ * doesn't really matter which of the child contexts the
+ * events ends up in.
+ */
+ ctx = rcu_dereference(current->perf_counter_ctxp);
+ if (ctx)
+ perf_counter_mmap_ctx(ctx, mmap_event);
+ rcu_read_unlock();
+
kfree(buf);
}

@@ -2882,6 +2894,7 @@ static void __perf_swcounter_event(enum perf_event_types type, u32 event,
{
struct perf_cpu_context *cpuctx = &get_cpu_var(perf_cpu_context);
int *recursion = perf_swcounter_recursion_context(cpuctx);
+ struct perf_counter_context *ctx;

if (*recursion)
goto out;
@@ -2891,10 +2904,15 @@ static void __perf_swcounter_event(enum perf_event_types type, u32 event,

perf_swcounter_ctx_event(&cpuctx->ctx, type, event,
nr, nmi, regs, addr);
- if (cpuctx->task_ctx) {
- perf_swcounter_ctx_event(cpuctx->task_ctx, type, event,
- nr, nmi, regs, addr);
- }
+ rcu_read_lock();
+ /*
+ * doesn't really matter which of the child contexts the
+ * events ends up in.
+ */
+ ctx = rcu_dereference(current->perf_counter_ctxp);
+ if (ctx)
+ perf_swcounter_ctx_event(ctx, type, event, nr, nmi, regs, addr);
+ rcu_read_unlock();

barrier();
(*recursion)--;

tip-bot for GeunSik Lim
May 29, 2009, 1:20:12 PM
Commit-ID: 294ae4011530d008c59c4fb9847738e39228821e
Gitweb: http://git.kernel.org/tip/294ae4011530d008c59c4fb9847738e39228821e
Author: GeunSik Lim <lee...@gmail.com>
AuthorDate: Thu, 28 May 2009 10:36:11 +0900
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Fri, 29 May 2009 16:21:08 +0200

ftrace: fix typo about map of kernel priority in ftrace.txt file.

Fix the chart that maps kernel priorities to user-land priorities.

* About sched_setscheduler(2)
Processes scheduled under SCHED_FIFO or SCHED_RR
can have a (user-space) static priority in the range 1 to 99.
(reference: http://www.kernel.org/doc/man-pages/online/pages/
man2/sched_setscheduler.2.html)

* From: Steven Rostedt
0 to 98 - maps to RT tasks 99 to 1 (SCHED_RR or SCHED_FIFO)

99 - maps to internal kernel threads that want to be lower than RT tasks
but higher than SCHED_OTHER tasks. Although I'm not sure if any
kernel thread actually uses this. I'm not even sure how this can be
set, because the internal sched_setscheduler function does not allow
for it.

100 to 139 - maps nice levels -20 to 19. These are not set via
sched_setscheduler, but are set via the nice system call.

140 - reserved for idle tasks.

Signed-off-by: GeunSik Lim <geuns...@samsung.com>
Acked-by: Steven Rostedt <ros...@goodmis.org>
Signed-off-by: Peter Zijlstra <a.p.zi...@chello.nl>


LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/trace/ftrace.txt | 15 ++++++++++++---
1 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index fd9a3e6..e362f50 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -518,9 +518,18 @@ priority with zero (0) being the highest priority and the nice
values starting at 100 (nice -20). Below is a quick chart to map
the kernel priority to user land priorities.

- Kernel priority: 0 to 99 ==> user RT priority 99 to 0
- Kernel priority: 100 to 139 ==> user nice -20 to 19
- Kernel priority: 140 ==> idle task priority
+ Kernel Space User Space
+ ===============================================================
+ 0(high) to 98(low) user RT priority 99(high) to 1(low)
+ with SCHED_RR or SCHED_FIFO
+ ---------------------------------------------------------------
+ 99 sched_priority is not used in scheduling
+ decisions(it must be specified as 0)
+ ---------------------------------------------------------------
+ 100(high) to 139(low) user nice -20(high) to 19(low)
+ ---------------------------------------------------------------
+ 140 idle task priority
+ ---------------------------------------------------------------

The task states are:

tip-bot for Peter Zijlstra
May 29, 2009, 1:20:12 PM
Commit-ID: efb3d17240d80e27508d238809168120fe4b93a4
Gitweb: http://git.kernel.org/tip/efb3d17240d80e27508d238809168120fe4b93a4
Author: Peter Zijlstra <a.p.zi...@chello.nl>
AuthorDate: Fri, 29 May 2009 14:25:58 +0200
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Fri, 29 May 2009 16:21:51 +0200

perf_counter: Fix COMM and MMAP events for cpu wide counters

Commit a63eaf34ae6 ("perf_counter: Dynamically allocate tasks'
perf_counter_context struct") broke COMM and MMAP notification for
cpu wide counters by dropping out early if there was no task context,
thereby also not iterating the cpu context.

Signed-off-by: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Mike Galbraith <efa...@gmx.de>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Corey Ashford <cjas...@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Arnaldo Carvalho de Melo <ac...@redhat.com>
Cc: John Kacur <jka...@redhat.com>

LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
kernel/perf_counter.c | 12 ++++--------
1 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/kernel/perf_counter.c b/kernel/perf_counter.c
index 616c524..58d6d19 100644
--- a/kernel/perf_counter.c
+++ b/kernel/perf_counter.c
@@ -2443,9 +2443,9 @@ static void perf_counter_comm_event(struct perf_comm_event *comm_event)

cpuctx = &get_cpu_var(perf_cpu_context);
perf_counter_comm_ctx(&cpuctx->ctx, comm_event);
+ if (cpuctx->task_ctx)
+ perf_counter_comm_ctx(cpuctx->task_ctx, comm_event);
put_cpu_var(perf_cpu_context);
-
- perf_counter_comm_ctx(current->perf_counter_ctxp, comm_event);
}

void perf_counter_comm(struct task_struct *task)
@@ -2454,8 +2454,6 @@ void perf_counter_comm(struct task_struct *task)

if (!atomic_read(&nr_comm_tracking))
return;
- if (!current->perf_counter_ctxp)
- return;

comm_event = (struct perf_comm_event){
.task = task,
@@ -2570,10 +2568,10 @@ got_name:

cpuctx = &get_cpu_var(perf_cpu_context);
perf_counter_mmap_ctx(&cpuctx->ctx, mmap_event);

+ if (cpuctx->task_ctx)
+ perf_counter_mmap_ctx(cpuctx->task_ctx, mmap_event);
put_cpu_var(perf_cpu_context);

- perf_counter_mmap_ctx(current->perf_counter_ctxp, mmap_event);
-
kfree(buf);
}

@@ -2584,8 +2582,6 @@ void perf_counter_mmap(unsigned long addr, unsigned long len,

if (!atomic_read(&nr_mmap_tracking))
return;
- if (!current->perf_counter_ctxp)
- return;

mmap_event = (struct perf_mmap_event){
.file = file,

tip-bot for GeunSik Lim
May 29, 2009, 1:20:13 PM
Commit-ID: f04d82b7e0c63d0251f9952a537a4bc4d73aa1a9
Gitweb: http://git.kernel.org/tip/f04d82b7e0c63d0251f9952a537a4bc4d73aa1a9
Author: GeunSik Lim <lee...@gmail.com>
AuthorDate: Thu, 28 May 2009 10:36:14 +0900
Committer: Ingo Molnar <mi...@elte.hu>
CommitDate: Fri, 29 May 2009 16:21:09 +0200

sched: fix typo in sched-rt-group.txt file

Fix a typo in the static priority range.

Kernel Space                      User Space
===============================================================
0(high) to 98(low)                user RT priority 99(high) to 1(low)
                                  with SCHED_RR or SCHED_FIFO
---------------------------------------------------------------
99                                sched_priority is not used in
                                  scheduling decisions (it must be
                                  specified as 0)
---------------------------------------------------------------
100(high) to 139(low)             user nice -20(high) to 19(low)
---------------------------------------------------------------
140                               idle task priority
---------------------------------------------------------------
* ref) http://www.kernel.org/doc/man-pages/online/pages/man2/sched_setscheduler.2.html

Signed-off-by: GeunSik Lim <geuns...@samsung.com>
CC: Steven Rostedt <ros...@goodmis.org>
Signed-off-by: Peter Zijlstra <a.p.zi...@chello.nl>


LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mi...@elte.hu>


---
Documentation/scheduler/sched-rt-group.txt | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Documentation/scheduler/sched-rt-group.txt b/Documentation/scheduler/sched-rt-group.txt
index eb74b01..1df7f9c 100644
--- a/Documentation/scheduler/sched-rt-group.txt
+++ b/Documentation/scheduler/sched-rt-group.txt
@@ -187,7 +187,7 @@ get their allocated time.

Implementing SCHED_EDF might take a while to complete. Priority Inheritance is
the biggest challenge as the current linux PI infrastructure is geared towards
-the limited static priority levels 0-139. With deadline scheduling you need to
+the limited static priority levels 0-99. With deadline scheduling you need to
do deadline inheritance (since priority is inversely proportional to the
deadline delta (deadline - now).

GeunSik Lim
May 29, 2009, 8:20:08 PM
> On Sat, May 30, 2009 at 2:16 AM, tip-bot for Peter Zijlstra <a.p.zi...@chello.nl> wrote:
> Commit-ID:  efb3d17240d80e27508d238809168120fe4b93a4
> Gitweb:     http://git.kernel.org/tip/efb3d17240d80e27508d238809168120fe4b93a4

I have a question about the "tip-bot" and "-tip" names.
"tip" looks like an abbreviation — can anyone explain what it stands for?
Sorry for the trivial question, but I have always wondered about it.


--
Regards,
GeunSik Lim ( SAMSUNG ELECTRONICS)
Blog : http://blog.naver.com/invain/
e-Mail: geuns...@samsung.com
lee...@gmail.com

Yinghai Lu
May 29, 2009, 8:30:11 PM
GeunSik Lim wrote:
>> On Sat, May 30, 2009 at 2:16 AM, tip-bot for Peter Zijlstra <a.p.zi...@chello.nl> wrote:
>> Commit-ID: efb3d17240d80e27508d238809168120fe4b93a4
>> Gitweb: http://git.kernel.org/tip/efb3d17240d80e27508d238809168120fe4b93a4
>
> I have one question about "tip-bot" and "-tip" word.
> "tip" word is abbreviation. Can anyone explain me meaning of the "tip" word?
> Sorry for trivial question.
> But I always wondered about this abbreviation in private.
>
>
T: Thomas
I: Ingo
P: hpa

YH
