ChangeLog V3:
1) Add a --guestmount=/dir/to/all/guestos parameter. The admin mounts each guest
os root directory under /dir/to/all/guestos with sshfs, in a subdirectory named
after that guest's process id. For example, I start 2 guest os; one has pid
8888 and the other has pid 9999.
#mkdir -p ~/guestmount/8888 ~/guestmount/9999; cd ~/guestmount
#sshfs -o allow_other,direct_io -p 5551 localhost:/ 8888/
#sshfs -o allow_other,direct_io -p 5552 localhost:/ 9999/
#perf kvm --host --guest --guestmount=~/guestmount top
The old --guestkallsyms and --guestmodules parameters are still supported and
provide the default guest os symbol source.
2) Add guest os buildid support.
3) Add sub command 'perf kvm buildid-list'.
4) Delete sub command 'perf kvm stat'. Our current implementation doesn't pass
the guest/host selection to the kernel, so the kernel always collects both host
and guest statistics and regular 'perf stat' is sufficient.
5) Fix a couple of perf bugs.
6) We still don't support commands with parameter 'any', because KVM currently
identifies a specific guest os instance only by process id. Users can use
parameter -p to collect statistics for a specific guest os instance.
ChangeLog V2:
1) Based on Avi's suggestion, I moved callback functions
to generic code area. So the kernel part of the patch is
clearer.
2) Add 'perf kvm stat'.
From: Zhang, Yanmin <yanmin...@linux.intel.com>
Based on the discussion in the KVM community, I worked out this patch so that
perf can collect guest os statistics from the host side. The patch was implemented
with kind help from Ingo, Peter and others. Yang Sheng pointed out a critical bug,
and he and others provided good suggestions. I really appreciate their help.
The patch adds a new sub command, kvm, to perf:
perf kvm top
perf kvm record
perf kvm report
perf kvm diff
perf kvm buildid-list
The new perf can profile the guest os kernel but not guest os user space. It can,
however, summarize guest os user space utilization per guest os.
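On the kernel side, a sample is attributed to the guest by asking the hypervisor
through a small callback table (struct perf_guest_info_callbacks in the patch below).
The following is a minimal user-space sketch of that pattern, not the kernel code
itself. The names guest_cbs, register_guest_cbs(), sample_ip() and the fake_*
helpers are hypothetical and exist only for illustration; in the patch the state
comes from KVM's per-CPU current_vcpu.

```c
#include <stddef.h>

/* Same three hooks as struct perf_guest_info_callbacks in the patch. */
struct guest_info_callbacks {
	int (*is_in_guest)(void);
	int (*is_user_mode)(void);
	unsigned long (*get_guest_ip)(void);
};

/* NULL until a hypervisor registers its callbacks (hypothetical name). */
static struct guest_info_callbacks *guest_cbs;

static void register_guest_cbs(struct guest_info_callbacks *cbs)
{
	guest_cbs = cbs;
}

/*
 * Pick the instruction pointer for a sample: the guest's ip if the
 * CPU was running guest code, otherwise the host ip that was passed in.
 */
static unsigned long sample_ip(unsigned long host_ip)
{
	if (guest_cbs && guest_cbs->is_in_guest())
		return guest_cbs->get_guest_ip();
	return host_ip;
}

/* Hypothetical stand-ins for the hypervisor's view of the current vcpu. */
static int fake_in_guest;
static unsigned long fake_guest_ip;

static int fake_is_in_guest(void) { return fake_in_guest; }
static int fake_is_user_mode(void) { return 0; }
static unsigned long fake_get_guest_ip(void) { return fake_guest_ip; }

static struct guest_info_callbacks fake_cbs = {
	.is_in_guest	= fake_is_in_guest,
	.is_user_mode	= fake_is_user_mode,
	.get_guest_ip	= fake_get_guest_ip,
};
```

With no callbacks registered, or when is_in_guest() returns 0, sample_ip() simply
returns the host ip, so host-only profiling behaves as before.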
Below are some examples.
1) perf kvm top
[root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
--guestmodules=/home/ymzhang/guest/modules top
--------------------------------------------------------------------------------------------------------------------------
PerfTop: 16010 irqs/sec kernel:59.1% us: 1.5% guest kernel:31.9% guest us: 7.5% exact: 0.0% [1000Hz cycles], (all, 16 CPUs)
--------------------------------------------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ _________________________ _______________________
38770.00 20.4% __ticket_spin_lock [guest.kernel.kallsyms]
22560.00 11.9% ftrace_likely_update [kernel.kallsyms]
9208.00 4.8% __lock_acquire [kernel.kallsyms]
5473.00 2.9% trace_hardirqs_off_caller [kernel.kallsyms]
5222.00 2.7% copy_user_generic_string [guest.kernel.kallsyms]
4450.00 2.3% validate_chain [kernel.kallsyms]
4262.00 2.2% trace_hardirqs_on_caller [kernel.kallsyms]
4239.00 2.2% do_raw_spin_lock [kernel.kallsyms]
3548.00 1.9% do_raw_spin_unlock [kernel.kallsyms]
2487.00 1.3% lock_release [kernel.kallsyms]
2165.00 1.1% __local_bh_disable [kernel.kallsyms]
1905.00 1.0% check_chain_key [kernel.kallsyms]
1737.00 0.9% lock_acquire [kernel.kallsyms]
1604.00 0.8% tcp_recvmsg [kernel.kallsyms]
1524.00 0.8% mark_lock [kernel.kallsyms]
1464.00 0.8% schedule [kernel.kallsyms]
1423.00 0.7% __d_lookup [guest.kernel.kallsyms]
To show only host data, omit the --guest parameter.
The headline includes the guest os kernel and user space percentages.
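The headline percentages come from classifying each sample by the cpumode bits of
its misc field. Below is a minimal sketch of that bookkeeping. The constants are
copied from the include/linux/perf_event.h hunk of this patch; struct mode_counts,
count_sample() and percent() are hypothetical names used only for illustration
and are not part of perf.

```c
#include <stddef.h>
#include <stdint.h>

/* Values from the include/linux/perf_event.h hunk of this patch. */
#define PERF_RECORD_MISC_CPUMODE_MASK	(7 << 0)
#define PERF_RECORD_MISC_KERNEL		(1 << 0)
#define PERF_RECORD_MISC_USER		(2 << 0)
#define PERF_RECORD_MISC_GUEST_KERNEL	(4 << 0)
#define PERF_RECORD_MISC_GUEST_USER	(5 << 0)

/* Hypothetical tally of samples per cpu mode. */
struct mode_counts {
	unsigned long total;
	unsigned long kernel;
	unsigned long user;
	unsigned long guest_kernel;
	unsigned long guest_user;
};

/* Classify one sample by the low three bits of its misc field. */
static void count_sample(struct mode_counts *c, uint16_t misc)
{
	c->total++;
	switch (misc & PERF_RECORD_MISC_CPUMODE_MASK) {
	case PERF_RECORD_MISC_KERNEL:
		c->kernel++;
		break;
	case PERF_RECORD_MISC_USER:
		c->user++;
		break;
	case PERF_RECORD_MISC_GUEST_KERNEL:
		c->guest_kernel++;
		break;
	case PERF_RECORD_MISC_GUEST_USER:
		c->guest_user++;
		break;
	default:
		/* unknown or hypervisor mode: counted in total only */
		break;
	}
}

/* Share of 'part' in 'total' as a percentage; 0 when there are no samples. */
static double percent(unsigned long part, unsigned long total)
{
	return total ? 100.0 * (double)part / (double)total : 0.0;
}
```

The mask has to cover three bits because the two new guest modes use values 4
and 5, which is why the patch widens PERF_RECORD_MISC_CPUMODE_MASK from 3 to 7.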
2) perf kvm record
[root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
--guestmodules=/home/ymzhang/guest/modules record -f -a sleep 60
[ perf record: Woken up 15 times to write data ]
[ perf record: Captured and wrote 29.385 MB perf.data.kvm (~1283837 samples) ]
3) perf kvm report
3.1) [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
--guestmodules=/home/ymzhang/guest/modules report --sort pid --showcpuutilization>norm.host.guest.report.pid
# Samples: 424719292247
#
# Overhead     sys      us  guest sys  guest us  Command: Pid
# ........                                       .....................
#
    50.57%   1.02%   0.00%     39.97%     9.58%  qemu-system-x86: 3587
    49.32%   1.35%   0.01%     35.20%    12.76%  qemu-system-x86: 3347
     0.07%   0.07%   0.00%      0.00%     0.00%  perf: 5217
Some performance engineers need perf to show sys/us/guest_sys/guest_us per KVM guest
instance, which is actually just a multi-threaded process. The --showcpuutilization
sub parameter above does that.
3.2) [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
--guestmodules=/home/ymzhang/guest/modules report >norm.host.guest.report
# Samples: 2466991384118
#
# Overhead Command Shared Object Symbol
# ........ ............... ........................................................................ ......
#
29.11% qemu-system-x86 [guest.kernel.kallsyms] [g] __ticket_spin_lock
5.88% tbench_srv [kernel.kallsyms] [k] ftrace_likely_update
5.76% tbench [kernel.kallsyms] [k] ftrace_likely_update
3.88% qemu-system-x86 34c3255482 [u] 0x000034c3255482
1.83% tbench [kernel.kallsyms] [k] __lock_acquire
1.81% tbench_srv [kernel.kallsyms] [k] __lock_acquire
1.38% tbench_srv [kernel.kallsyms] [k] trace_hardirqs_off_caller
1.37% tbench [kernel.kallsyms] [k] trace_hardirqs_off_caller
1.13% qemu-system-x86 [guest.kernel.kallsyms] [g] copy_user_generic_string
1.04% tbench_srv [kernel.kallsyms] [k] validate_chain
1.00% tbench [kernel.kallsyms] [k] trace_hardirqs_on_caller
1.00% tbench_srv [kernel.kallsyms] [k] trace_hardirqs_on_caller
0.95% tbench [kernel.kallsyms] [k] do_raw_spin_lock
[u] means the sample is in guest os user space; [g] means it is in the guest os kernel. The other
fields are self-explanatory. If a module such as [ext4] is shown, it is a guest kernel module, because
native host kernel modules appear with a path starting with something like /lib/modules/XXX.
4) --guestmount example. I started 2 guest os and ran dbench in the 1st and tbench in the 2nd.
[root@lkp-ne01 norm]#perf kvm --host --guest --guestmount=/home/ymzhang/guestmount/ top
---------------------------------------------------------------------------------------------------------------------------------------
PerfTop: 15972 irqs/sec kernel: 8.3% us: 0.5% guest kernel:73.9% guest us:17.3% exact: 0.0% [1000Hz cycles], (all, 16 CPUs)
---------------------------------------------------------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ _________________________ __________________________________________________
32960.00 17.4% __ticket_spin_lock [guest.kernel.kallsyms]
5464.00 2.9% copy_user_generic_string [guest.kernel.kallsyms]
4069.00 2.1% copy_user_generic_string [guest.kernel.kallsyms]
3238.00 1.7% ftrace_likely_update /lib/modules/2.6.34-rc4-tip-yangkvm+/build/vmlinux
2997.00 1.6% __lock_acquire /lib/modules/2.6.34-rc4-tip-yangkvm+/build/vmlinux
2797.00 1.5% tcp_sendmsg [guest.kernel.kallsyms]
2703.00 1.4% schedule [guest.kernel.kallsyms]
2384.00 1.3% __switch_to [guest.kernel.kallsyms]
2125.00 1.1% tcp_ack [guest.kernel.kallsyms]
2045.00 1.1% tcp_recvmsg [guest.kernel.kallsyms]
1862.00 1.0% tcp_transmit_skb [guest.kernel.kallsyms]
1734.00 0.9% __ticket_spin_lock [guest.kernel.kallsyms]
1388.00 0.7% lock_release /lib/modules/2.6.34-rc4-tip-yangkvm+/build/vmlinux
1367.00 0.7% update_curr [guest.kernel.kallsyms]
1339.00 0.7% fget_light [guest.kernel.kallsyms]
1332.00 0.7% put_page [guest.kernel.kallsyms]
1324.00 0.7% ip_queue_xmit [guest.kernel.kallsyms]
1296.00 0.7% __d_lookup [guest.kernel.kallsyms]
1296.00 0.7% tcp_rcv_established [guest.kernel.kallsyms]
1230.00 0.6% tcp_v4_rcv [guest.kernel.kallsyms]
1092.00 0.6% dev_queue_xmit [guest.kernel.kallsyms]
1073.00 0.6% kmem_cache_alloc [guest.kernel.kallsyms]
1066.00 0.6% ip_rcv [guest.kernel.kallsyms]
1049.00 0.6% __inet_lookup_established [guest.kernel.kallsyms]
1048.00 0.6% tcp_write_xmit [guest.kernel.kallsyms]
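With --guestmount, each guest's symbols are read from files under its mounted
root, for example <root>/proc/kallsyms as in event__synthesize_guest_os() in the
patch. The sketch below shows one way such a path could be built. It assumes,
based on the sshfs example in the changelog, that the per-guest root is
<guestmount>/<pid>; guest_kallsyms_path() is a hypothetical helper and not a
function from the patch.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: write "<guestmount>/<pid>/proc/kallsyms" into buf.
 * Returns 0 on success, -1 on bad arguments or if the path does not fit.
 */
static int guest_kallsyms_path(char *buf, size_t len,
			       const char *guestmount, pid_t pid)
{
	int n;

	if (!buf || !guestmount || len == 0)
		return -1;

	/* snprintf never writes past len and reports the length it needed */
	n = snprintf(buf, len, "%s/%d/proc/kallsyms", guestmount, (int)pid);
	if (n < 0 || (size_t)n >= len)
		return -1;

	return 0;
}
```

Using snprintf() with an explicit length check avoids overrunning the buffer when
the mount directory name is long.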
Below is the patch against the tip/master tree of 13th April.
Signed-off-by: Zhang Yanmin <yanmin...@linux.intel.com>
---
diff -Nraup linux-2.6_tip0413/arch/x86/include/asm/perf_event.h linux-2.6_tip0413_perfkvm/arch/x86/include/asm/perf_event.h
--- linux-2.6_tip0413/arch/x86/include/asm/perf_event.h 2010-04-14 11:11:03.992966568 +0800
+++ linux-2.6_tip0413_perfkvm/arch/x86/include/asm/perf_event.h 2010-04-14 11:13:17.261881591 +0800
@@ -135,17 +135,10 @@ extern void perf_events_lapic_init(void)
*/
#define PERF_EFLAGS_EXACT (1UL << 3)
-#define perf_misc_flags(regs) \
-({ int misc = 0; \
- if (user_mode(regs)) \
- misc |= PERF_RECORD_MISC_USER; \
- else \
- misc |= PERF_RECORD_MISC_KERNEL; \
- if (regs->flags & PERF_EFLAGS_EXACT) \
- misc |= PERF_RECORD_MISC_EXACT; \
- misc; })
-
-#define perf_instruction_pointer(regs) ((regs)->ip)
+struct pt_regs;
+extern unsigned long perf_instruction_pointer(struct pt_regs *regs);
+extern unsigned long perf_misc_flags(struct pt_regs *regs);
+#define perf_misc_flags(regs) perf_misc_flags(regs)
#else
static inline void init_hw_perf_events(void) { }
diff -Nraup linux-2.6_tip0413/arch/x86/kernel/cpu/perf_event.c linux-2.6_tip0413_perfkvm/arch/x86/kernel/cpu/perf_event.c
--- linux-2.6_tip0413/arch/x86/kernel/cpu/perf_event.c 2010-04-14 11:11:04.825028810 +0800
+++ linux-2.6_tip0413_perfkvm/arch/x86/kernel/cpu/perf_event.c 2010-04-14 17:02:12.198063684 +0800
@@ -1720,6 +1720,11 @@ struct perf_callchain_entry *perf_callch
{
struct perf_callchain_entry *entry;
+ if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+ /* TODO: We don't support guest os callchain now */
+ return NULL;
+ }
+
if (in_nmi())
entry = &__get_cpu_var(pmc_nmi_entry);
else
@@ -1743,3 +1748,30 @@ void perf_arch_fetch_caller_regs(struct
regs->cs = __KERNEL_CS;
local_save_flags(regs->flags);
}
+
+unsigned long perf_instruction_pointer(struct pt_regs *regs)
+{
+ unsigned long ip;
+ if (perf_guest_cbs && perf_guest_cbs->is_in_guest())
+ ip = perf_guest_cbs->get_guest_ip();
+ else
+ ip = instruction_pointer(regs);
+ return ip;
+}
+
+unsigned long perf_misc_flags(struct pt_regs *regs)
+{
+ int misc = 0;
+ if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+ misc |= perf_guest_cbs->is_user_mode() ?
+ PERF_RECORD_MISC_GUEST_USER :
+ PERF_RECORD_MISC_GUEST_KERNEL;
+ } else
+ misc |= user_mode(regs) ? PERF_RECORD_MISC_USER :
+ PERF_RECORD_MISC_KERNEL;
+ if (regs->flags & PERF_EFLAGS_EXACT)
+ misc |= PERF_RECORD_MISC_EXACT;
+
+ return misc;
+}
+
diff -Nraup linux-2.6_tip0413/arch/x86/kvm/x86.c linux-2.6_tip0413_perfkvm/arch/x86/kvm/x86.c
--- linux-2.6_tip0413/arch/x86/kvm/x86.c 2010-04-14 11:11:04.341042024 +0800
+++ linux-2.6_tip0413_perfkvm/arch/x86/kvm/x86.c 2010-04-14 11:32:45.841278890 +0800
@@ -3765,6 +3765,35 @@ static void kvm_timer_init(void)
}
}
+static DEFINE_PER_CPU(struct kvm_vcpu *, current_vcpu);
+
+static int kvm_is_in_guest(void)
+{
+ return percpu_read(current_vcpu) != NULL;
+}
+
+static int kvm_is_user_mode(void)
+{
+ int user_mode = 3;
+ if (percpu_read(current_vcpu))
+ user_mode = kvm_x86_ops->get_cpl(percpu_read(current_vcpu));
+ return user_mode != 0;
+}
+
+static unsigned long kvm_get_guest_ip(void)
+{
+ unsigned long ip = 0;
+ if (percpu_read(current_vcpu))
+ ip = kvm_rip_read(percpu_read(current_vcpu));
+ return ip;
+}
+
+static struct perf_guest_info_callbacks kvm_guest_cbs = {
+ .is_in_guest = kvm_is_in_guest,
+ .is_user_mode = kvm_is_user_mode,
+ .get_guest_ip = kvm_get_guest_ip,
+};
+
int kvm_arch_init(void *opaque)
{
int r;
@@ -3801,6 +3830,8 @@ int kvm_arch_init(void *opaque)
kvm_timer_init();
+ perf_register_guest_info_callbacks(&kvm_guest_cbs);
+
return 0;
out:
@@ -3809,6 +3840,8 @@ out:
void kvm_arch_exit(void)
{
+ perf_unregister_guest_info_callbacks(&kvm_guest_cbs);
+
if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
cpufreq_unregister_notifier(&kvmclock_cpufreq_notifier_block,
CPUFREQ_TRANSITION_NOTIFIER);
@@ -4339,7 +4372,10 @@ static int vcpu_enter_guest(struct kvm_v
}
trace_kvm_entry(vcpu->vcpu_id);
+
+ percpu_write(current_vcpu, vcpu);
kvm_x86_ops->run(vcpu);
+ percpu_write(current_vcpu, NULL);
/*
* If the guest has used debug registers, at least dr7
diff -Nraup linux-2.6_tip0413/include/linux/perf_event.h linux-2.6_tip0413_perfkvm/include/linux/perf_event.h
--- linux-2.6_tip0413/include/linux/perf_event.h 2010-04-14 11:11:16.922212684 +0800
+++ linux-2.6_tip0413_perfkvm/include/linux/perf_event.h 2010-04-14 11:34:33.478072738 +0800
@@ -288,11 +288,13 @@ struct perf_event_mmap_page {
__u64 data_tail; /* user-space written tail */
};
-#define PERF_RECORD_MISC_CPUMODE_MASK (3 << 0)
+#define PERF_RECORD_MISC_CPUMODE_MASK (7 << 0)
#define PERF_RECORD_MISC_CPUMODE_UNKNOWN (0 << 0)
#define PERF_RECORD_MISC_KERNEL (1 << 0)
#define PERF_RECORD_MISC_USER (2 << 0)
#define PERF_RECORD_MISC_HYPERVISOR (3 << 0)
+#define PERF_RECORD_MISC_GUEST_KERNEL (4 << 0)
+#define PERF_RECORD_MISC_GUEST_USER (5 << 0)
#define PERF_RECORD_MISC_EXACT (1 << 14)
/*
@@ -446,6 +448,12 @@ enum perf_callchain_context {
# include <asm/perf_event.h>
#endif
+struct perf_guest_info_callbacks {
+ int (*is_in_guest) (void);
+ int (*is_user_mode) (void);
+ unsigned long (*get_guest_ip) (void);
+};
+
#ifdef CONFIG_HAVE_HW_BREAKPOINT
#include <asm/hw_breakpoint.h>
#endif
@@ -920,6 +928,12 @@ static inline void perf_event_mmap(struc
__perf_event_mmap(vma);
}
+extern struct perf_guest_info_callbacks *perf_guest_cbs;
+extern int perf_register_guest_info_callbacks(
+ struct perf_guest_info_callbacks *);
+extern int perf_unregister_guest_info_callbacks(
+ struct perf_guest_info_callbacks *);
+
extern void perf_event_comm(struct task_struct *tsk);
extern void perf_event_fork(struct task_struct *tsk);
@@ -989,6 +1003,11 @@ perf_sw_event(u32 event_id, u64 nr, int
static inline void
perf_bp_event(struct perf_event *event, void *data) { }
+static inline int perf_register_guest_info_callbacks
+(struct perf_guest_info_callbacks *) {return 0; }
+static inline int perf_unregister_guest_info_callbacks
+(struct perf_guest_info_callbacks *) {return 0; }
+
static inline void perf_event_mmap(struct vm_area_struct *vma) { }
static inline void perf_event_comm(struct task_struct *tsk) { }
static inline void perf_event_fork(struct task_struct *tsk) { }
diff -Nraup linux-2.6_tip0413/kernel/perf_event.c linux-2.6_tip0413_perfkvm/kernel/perf_event.c
--- linux-2.6_tip0413/kernel/perf_event.c 2010-04-14 11:12:04.090770764 +0800
+++ linux-2.6_tip0413_perfkvm/kernel/perf_event.c 2010-04-14 11:13:17.265859229 +0800
@@ -2797,6 +2797,27 @@ void perf_arch_fetch_caller_regs(struct
/*
+ * We assume there is only KVM supporting the callbacks.
+ * Later on, we might change it to a list if there is
+ * another virtualization implementation supporting the callbacks.
+ */
+struct perf_guest_info_callbacks *perf_guest_cbs;
+
+int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
+{
+ perf_guest_cbs = cbs;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(perf_register_guest_info_callbacks);
+
+int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
+{
+ perf_guest_cbs = NULL;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(perf_unregister_guest_info_callbacks);
+
+/*
* Output
*/
static bool perf_output_space(struct perf_mmap_data *data, unsigned long tail,
@@ -3748,7 +3769,7 @@ void __perf_event_mmap(struct vm_area_st
.event_id = {
.header = {
.type = PERF_RECORD_MMAP,
- .misc = 0,
+ .misc = PERF_RECORD_MISC_USER,
/* .size */
},
/* .pid */
diff -Nraup linux-2.6_tip0413/tools/perf/builtin-annotate.c linux-2.6_tip0413_perfkvm/tools/perf/builtin-annotate.c
--- linux-2.6_tip0413/tools/perf/builtin-annotate.c 2010-04-14 11:11:58.474229259 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/builtin-annotate.c 2010-04-14 11:13:17.269859901 +0800
@@ -571,7 +571,7 @@ static int __cmd_annotate(void)
perf_session__fprintf(session, stdout);
if (verbose > 2)
- dsos__fprintf(stdout);
+ dsos__fprintf(&session->kerninfo_root, stdout);
perf_session__collapse_resort(&session->hists);
perf_session__output_resort(&session->hists, session->event_total[0]);
diff -Nraup linux-2.6_tip0413/tools/perf/builtin-buildid-list.c linux-2.6_tip0413_perfkvm/tools/perf/builtin-buildid-list.c
--- linux-2.6_tip0413/tools/perf/builtin-buildid-list.c 2010-04-14 11:11:58.462227060 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/builtin-buildid-list.c 2010-04-14 11:13:17.269859901 +0800
@@ -46,7 +46,7 @@ static int __cmd_buildid_list(void)
if (with_hits)
perf_session__process_events(session, &build_id__mark_dso_hit_ops);
- dsos__fprintf_buildid(stdout, with_hits);
+ dsos__fprintf_buildid(&session->kerninfo_root, stdout, with_hits);
perf_session__delete(session);
return err;
diff -Nraup linux-2.6_tip0413/tools/perf/builtin-diff.c linux-2.6_tip0413_perfkvm/tools/perf/builtin-diff.c
--- linux-2.6_tip0413/tools/perf/builtin-diff.c 2010-04-14 11:11:58.426247688 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/builtin-diff.c 2010-04-14 11:35:43.245364332 +0800
@@ -33,7 +33,7 @@ static int perf_session__add_hist_entry(
return -ENOMEM;
if (hit)
- he->count += count;
+ __perf_session__add_count(he, al, count);
return 0;
}
@@ -225,6 +225,10 @@ int cmd_diff(int argc, const char **argv
input_new = argv[1];
} else
input_new = argv[0];
+ } else if (symbol_conf.default_guest_vmlinux_name ||
+ symbol_conf.default_guest_kallsyms) {
+ input_old = "perf.data.host";
+ input_new = "perf.data.guest";
}
symbol_conf.exclude_other = false;
diff -Nraup linux-2.6_tip0413/tools/perf/builtin.h linux-2.6_tip0413_perfkvm/tools/perf/builtin.h
--- linux-2.6_tip0413/tools/perf/builtin.h 2010-04-14 11:11:58.234222967 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/builtin.h 2010-04-14 11:13:17.313858518 +0800
@@ -32,5 +32,6 @@ extern int cmd_version(int argc, const c
extern int cmd_probe(int argc, const char **argv, const char *prefix);
extern int cmd_kmem(int argc, const char **argv, const char *prefix);
extern int cmd_lock(int argc, const char **argv, const char *prefix);
+extern int cmd_kvm(int argc, const char **argv, const char *prefix);
#endif
diff -Nraup linux-2.6_tip0413/tools/perf/builtin-kmem.c linux-2.6_tip0413_perfkvm/tools/perf/builtin-kmem.c
--- linux-2.6_tip0413/tools/perf/builtin-kmem.c 2010-04-14 11:11:58.806260439 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/builtin-kmem.c 2010-04-14 11:39:10.199395473 +0800
@@ -351,6 +351,7 @@ static void __print_result(struct rb_roo
int n_lines, int is_caller)
{
struct rb_node *next;
+ struct kernel_info *kerninfo;
printf("%.102s\n", graph_dotted_line);
printf(" %-34s |", is_caller ? "Callsite": "Alloc Ptr");
@@ -359,6 +360,11 @@ static void __print_result(struct rb_roo
next = rb_first(root);
+ kerninfo = kerninfo__findhost(&session->kerninfo_root);
+ if (!kerninfo) {
+ pr_err("__print_result: couldn't find kernel information\n");
+ return;
+ }
while (next && n_lines--) {
struct alloc_stat *data = rb_entry(next, struct alloc_stat,
node);
@@ -370,7 +376,7 @@ static void __print_result(struct rb_roo
if (is_caller) {
addr = data->call_site;
if (!raw_ip)
- sym = map_groups__find_function(&session->kmaps,
+ sym = map_groups__find_function(&kerninfo->kmaps,
addr, &map, NULL);
} else
addr = data->ptr;
diff -Nraup linux-2.6_tip0413/tools/perf/builtin-kvm.c linux-2.6_tip0413_perfkvm/tools/perf/builtin-kvm.c
--- linux-2.6_tip0413/tools/perf/builtin-kvm.c 1970-01-01 08:00:00.000000000 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/builtin-kvm.c 2010-04-14 11:40:06.551652083 +0800
@@ -0,0 +1,145 @@
+#include "builtin.h"
+#include "perf.h"
+
+#include "util/util.h"
+#include "util/cache.h"
+#include "util/symbol.h"
+#include "util/thread.h"
+#include "util/header.h"
+#include "util/session.h"
+
+#include "util/parse-options.h"
+#include "util/trace-event.h"
+
+#include "util/debug.h"
+
+#include <sys/prctl.h>
+
+#include <semaphore.h>
+#include <pthread.h>
+#include <math.h>
+
+static char *file_name = NULL;
+static char name_buffer[256];
+
+int perf_host = 1;
+int perf_guest = 0;
+
+static const char * const kvm_usage[] = {
+ "perf kvm [<options>] {top|record|report|diff|buildid-list}",
+ NULL
+};
+
+static const struct option kvm_options[] = {
+ OPT_STRING('i', "input", &file_name, "file",
+ "Input file name"),
+ OPT_STRING('o', "output", &file_name, "file",
+ "Output file name"),
+ OPT_BOOLEAN(0, "guest", &perf_guest,
+ "Collect guest os data"),
+ OPT_BOOLEAN(0, "host", &perf_host,
+ "Collect host os data"),
+ OPT_STRING(0, "guestmount", &symbol_conf.guestmount, "directory",
+ "guest mount directory under which every guest os instance has a subdir"),
+ OPT_STRING(0, "guestvmlinux", &symbol_conf.default_guest_vmlinux_name, "file",
+ "file saving guest os vmlinux"),
+ OPT_STRING(0, "guestkallsyms", &symbol_conf.default_guest_kallsyms, "file",
+ "file saving guest os /proc/kallsyms"),
+ OPT_STRING(0, "guestmodules", &symbol_conf.default_guest_modules, "file",
+ "file saving guest os /proc/modules"),
+ OPT_END()
+};
+
+static int __cmd_record(int argc, const char **argv)
+{
+ int rec_argc, i = 0, j;
+ const char **rec_argv;
+
+ rec_argc = argc + 2;
+ rec_argv = calloc(rec_argc + 1, sizeof(char *));
+ rec_argv[i++] = strdup("record");
+ rec_argv[i++] = strdup("-o");
+ rec_argv[i++] = strdup(file_name);
+ for (j = 1; j < argc; j++, i++)
+ rec_argv[i] = argv[j];
+
+ BUG_ON(i != rec_argc);
+
+ return cmd_record(i, rec_argv, NULL);
+}
+
+static int __cmd_report(int argc, const char **argv)
+{
+ int rec_argc, i = 0, j;
+ const char **rec_argv;
+
+ rec_argc = argc + 2;
+ rec_argv = calloc(rec_argc + 1, sizeof(char *));
+ rec_argv[i++] = strdup("report");
+ rec_argv[i++] = strdup("-i");
+ rec_argv[i++] = strdup(file_name);
+ for (j = 1; j < argc; j++, i++)
+ rec_argv[i] = argv[j];
+
+ BUG_ON(i != rec_argc);
+
+ return cmd_report(i, rec_argv, NULL);
+}
+
+static int __cmd_buildid_list(int argc, const char **argv)
+{
+ int rec_argc, i = 0, j;
+ const char **rec_argv;
+
+ rec_argc = argc + 2;
+ rec_argv = calloc(rec_argc + 1, sizeof(char *));
+ rec_argv[i++] = strdup("buildid-list");
+ rec_argv[i++] = strdup("-i");
+ rec_argv[i++] = strdup(file_name);
+ for (j = 1; j < argc; j++, i++)
+ rec_argv[i] = argv[j];
+
+ BUG_ON(i != rec_argc);
+
+ return cmd_buildid_list(i, rec_argv, NULL);
+}
+
+int cmd_kvm(int argc, const char **argv, const char *prefix __used)
+{
+ perf_host = perf_guest = 0;
+
+ argc = parse_options(argc, argv, kvm_options, kvm_usage,
+ PARSE_OPT_STOP_AT_NON_OPTION);
+ if (!argc)
+ usage_with_options(kvm_usage, kvm_options);
+
+ if (!perf_host)
+ perf_guest = 1;
+
+ if (!file_name) {
+ if (perf_host && !perf_guest)
+ sprintf(name_buffer, "perf.data.host");
+ else if (!perf_host && perf_guest)
+ sprintf(name_buffer, "perf.data.guest");
+ else
+ sprintf(name_buffer, "perf.data.kvm");
+ file_name = name_buffer;
+ }
+
+ if (!strncmp(argv[0], "rec", 3)) {
+ return __cmd_record(argc, argv);
+ } else if (!strncmp(argv[0], "rep", 3)) {
+ return __cmd_report(argc, argv);
+ } else if (!strncmp(argv[0], "diff", 4)) {
+ return cmd_diff(argc, argv, NULL);
+ } else if (!strncmp(argv[0], "top", 3)) {
+ return cmd_top(argc, argv, NULL);
+ } else if (!strncmp(argv[0], "buildid-list", 12)) {
+ return __cmd_buildid_list(argc, argv);
+ } else {
+ usage_with_options(kvm_usage, kvm_options);
+ }
+
+ return 0;
+}
+
diff -Nraup linux-2.6_tip0413/tools/perf/builtin-record.c linux-2.6_tip0413_perfkvm/tools/perf/builtin-record.c
--- linux-2.6_tip0413/tools/perf/builtin-record.c 2010-04-14 11:11:58.806260439 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/builtin-record.c 2010-04-14 14:11:09.625252460 +0800
@@ -426,6 +426,52 @@ static void atexit_header(void)
perf_header__write(&session->header, output, true);
}
+static void event__synthesize_guest_os(struct kernel_info *kerninfo,
+ void *data __attribute__((unused)))
+{
+ int err;
+ char *guest_kallsyms;
+ char path[PATH_MAX];
+
+ if (is_host_kernel(kerninfo))
+ return;
+
+ /*
+ *As for guest kernel when processing subcommand record&report,
+ *we arrange module mmap prior to guest kernel mmap and trigger
+ *a preload dso because default guest module symbols are loaded
+ *from guest kallsyms instead of /lib/modules/XXX/XXX. This
+ *method is used to avoid symbol missing when the first addr is
+ *in module instead of in guest kernel.
+ */
+ err = event__synthesize_modules(process_synthesized_event,
+ session,
+ kerninfo);
+ if (err < 0)
+ pr_err("Couldn't record guest kernel [%d]'s reference"
+ " relocation symbol.\n", kerninfo->pid);
+
+ if (is_default_guest(kerninfo))
+ guest_kallsyms = (char *) symbol_conf.default_guest_kallsyms;
+ else {
+ sprintf(path, "%s/proc/kallsyms", kerninfo->root_dir);
+ guest_kallsyms = path;
+ }
+
+ /*
+ * We fall back to _stext for guest kernel because guest kernel's
+ * /proc/kallsyms sometimes has no _text.
+ */
+ err = event__synthesize_kernel_mmap(process_synthesized_event,
+ session, kerninfo, "_text");
+ if (err < 0)
+ err = event__synthesize_kernel_mmap(process_synthesized_event,
+ session, kerninfo, "_stext");
+ if (err < 0)
+ pr_err("Couldn't record guest kernel [%d]'s reference"
+ " relocation symbol.\n", kerninfo->pid);
+}
+
static int __cmd_record(int argc, const char **argv)
{
int i, counter;
@@ -437,6 +483,7 @@ static int __cmd_record(int argc, const
int child_ready_pipe[2], go_pipe[2];
const bool forks = argc > 0;
char buf;
+ struct kernel_info *kerninfo;
page_size = sysconf(_SC_PAGE_SIZE);
@@ -572,21 +619,31 @@ static int __cmd_record(int argc, const
post_processing_offset = lseek(output, 0, SEEK_CUR);
+ kerninfo = kerninfo__findhost(&session->kerninfo_root);
+ if (!kerninfo) {
+ pr_err("Couldn't find native kernel information.\n");
+ return -1;
+ }
+
err = event__synthesize_kernel_mmap(process_synthesized_event,
- session, "_text");
+ session, kerninfo, "_text");
if (err < 0)
err = event__synthesize_kernel_mmap(process_synthesized_event,
- session, "_stext");
+ session, kerninfo, "_stext");
if (err < 0) {
pr_err("Couldn't record kernel reference relocation symbol.\n");
return err;
}
- err = event__synthesize_modules(process_synthesized_event, session);
+ err = event__synthesize_modules(process_synthesized_event,
+ session, kerninfo);
if (err < 0) {
pr_err("Couldn't record kernel reference relocation symbol.\n");
return err;
}
+ if (perf_guest)
+ kerninfo__process_allkernels(&session->kerninfo_root,
+ event__synthesize_guest_os, session);
if (!system_wide && profile_cpu == -1)
event__synthesize_thread(target_tid, process_synthesized_event,
diff -Nraup linux-2.6_tip0413/tools/perf/builtin-report.c linux-2.6_tip0413_perfkvm/tools/perf/builtin-report.c
--- linux-2.6_tip0413/tools/perf/builtin-report.c 2010-04-14 11:11:58.462227060 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/builtin-report.c 2010-04-14 11:13:17.313858518 +0800
@@ -108,7 +108,7 @@ static int perf_session__add_hist_entry(
return -ENOMEM;
if (hit)
- he->count += data->period;
+ __perf_session__add_count(he, al, data->period);
if (symbol_conf.use_callchain) {
if (!hit)
@@ -300,7 +300,7 @@ static int __cmd_report(void)
perf_session__fprintf(session, stdout);
if (verbose > 2)
- dsos__fprintf(stdout);
+ dsos__fprintf(&session->kerninfo_root, stdout);
next = rb_first(&session->stats_by_id);
while (next) {
@@ -437,6 +437,8 @@ static const struct option options[] = {
"sort by key(s): pid, comm, dso, symbol, parent"),
OPT_BOOLEAN('P', "full-paths", &symbol_conf.full_paths,
"Don't shorten the pathnames taking into account the cwd"),
+ OPT_BOOLEAN(0, "showcpuutilization", &symbol_conf.show_cpu_utilization,
+ "Show sample percentage for different cpu modes"),
OPT_STRING('p', "parent", &parent_pattern, "regex",
"regex filter to identify parent, see: '--sort parent'"),
OPT_BOOLEAN('x', "exclude-other", &symbol_conf.exclude_other,
diff -Nraup linux-2.6_tip0413/tools/perf/builtin-top.c linux-2.6_tip0413_perfkvm/tools/perf/builtin-top.c
--- linux-2.6_tip0413/tools/perf/builtin-top.c 2010-04-14 11:11:58.458238567 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/builtin-top.c 2010-04-14 14:28:14.576215651 +0800
@@ -420,8 +420,9 @@ static double sym_weight(const struct sy
}
static long samples;
-static long userspace_samples;
+static long kernel_samples, us_samples;
static long exact_samples;
+static long guest_us_samples, guest_kernel_samples;
static const char CONSOLE_CLEAR[] = "\033[H\033[2J";
static void __list_insert_active_sym(struct sym_entry *syme)
@@ -461,7 +462,10 @@ static void print_sym_table(void)
int printed = 0, j;
int counter, snap = !display_weighted ? sym_counter : 0;
float samples_per_sec = samples/delay_secs;
- float ksamples_per_sec = (samples-userspace_samples)/delay_secs;
+ float ksamples_per_sec = kernel_samples/delay_secs;
+ float us_samples_per_sec = (us_samples)/delay_secs;
+ float guest_kernel_samples_per_sec = (guest_kernel_samples)/delay_secs;
+ float guest_us_samples_per_sec = (guest_us_samples)/delay_secs;
float esamples_percent = (100.0*exact_samples)/samples;
float sum_ksamples = 0.0;
struct sym_entry *syme, *n;
@@ -470,7 +474,8 @@ static void print_sym_table(void)
int sym_width = 0, dso_width = 0, dso_short_width = 0;
const int win_width = winsize.ws_col - 1;
- samples = userspace_samples = exact_samples = 0;
+ samples = us_samples = kernel_samples = exact_samples = 0;
+ guest_kernel_samples = guest_us_samples = 0;
/* Sort the active symbols */
pthread_mutex_lock(&active_symbols_lock);
@@ -501,10 +506,21 @@ static void print_sym_table(void)
puts(CONSOLE_CLEAR);
printf("%-*.*s\n", win_width, win_width, graph_dotted_line);
- printf( " PerfTop:%8.0f irqs/sec kernel:%4.1f%% exact: %4.1f%% [",
- samples_per_sec,
- 100.0 - (100.0*((samples_per_sec-ksamples_per_sec)/samples_per_sec)),
- esamples_percent);
+ if (!perf_guest) {
+ printf( " PerfTop:%8.0f irqs/sec kernel:%4.1f%% exact: %4.1f%% [",
+ samples_per_sec,
+ 100.0 - (100.0*((samples_per_sec-ksamples_per_sec)/samples_per_sec)),
+ esamples_percent);
+ } else {
+ printf( " PerfTop:%8.0f irqs/sec kernel:%4.1f%% us:%4.1f%%"
+ " guest kernel:%4.1f%% guest us:%4.1f%% exact: %4.1f%% [",
+ samples_per_sec,
+ 100.0 - (100.0*((samples_per_sec-ksamples_per_sec)/samples_per_sec)),
+ 100.0 - (100.0*((samples_per_sec-us_samples_per_sec)/samples_per_sec)),
+ 100.0 - (100.0*((samples_per_sec-guest_kernel_samples_per_sec)/samples_per_sec)),
+ 100.0 - (100.0*((samples_per_sec-guest_us_samples_per_sec)/samples_per_sec)),
+ esamples_percent);
+ }
if (nr_counters == 1 || !display_weighted) {
printf("%Ld", (u64)attrs[0].sample_period);
@@ -597,7 +613,6 @@ static void print_sym_table(void)
syme = rb_entry(nd, struct sym_entry, rb_node);
sym = sym_entry__symbol(syme);
-
if (++printed > print_entries || (int)syme->snap_count < count_filter)
continue;
@@ -761,7 +776,7 @@ static int key_mapped(int c)
return 0;
}
-static void handle_keypress(int c)
+static void handle_keypress(struct perf_session *session, int c)
{
if (!key_mapped(c)) {
struct pollfd stdin_poll = { .fd = 0, .events = POLLIN };
@@ -830,7 +845,7 @@ static void handle_keypress(int c)
case 'Q':
printf("exiting.\n");
if (dump_symtab)
- dsos__fprintf(stderr);
+ dsos__fprintf(&session->kerninfo_root, stderr);
exit(0);
case 's':
prompt_symbol(&sym_filter_entry, "Enter details symbol");
@@ -866,6 +881,7 @@ static void *display_thread(void *arg __
struct pollfd stdin_poll = { .fd = 0, .events = POLLIN };
struct termios tc, save;
int delay_msecs, c;
+ struct perf_session *session = (struct perf_session *) arg;
tcgetattr(0, &save);
tc = save;
@@ -886,7 +902,7 @@ repeat:
c = getc(stdin);
tcsetattr(0, TCSAFLUSH, &save);
- handle_keypress(c);
+ handle_keypress(session, c);
goto repeat;
return NULL;
@@ -957,24 +973,46 @@ static void event__process_sample(const
u64 ip = self->ip.ip;
struct sym_entry *syme;
struct addr_location al;
+ struct kernel_info *kerninfo;
u8 origin = self->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
++samples;
switch (origin) {
case PERF_RECORD_MISC_USER:
- ++userspace_samples;
+ ++us_samples;
if (hide_user_symbols)
return;
+ kerninfo = kerninfo__findhost(&session->kerninfo_root);
break;
case PERF_RECORD_MISC_KERNEL:
+ ++kernel_samples;
if (hide_kernel_symbols)
return;
+ kerninfo = kerninfo__findhost(&session->kerninfo_root);
break;
+ case PERF_RECORD_MISC_GUEST_KERNEL:
+ ++guest_kernel_samples;
+ kerninfo = kerninfo__find(&session->kerninfo_root,
+ self->ip.pid);
+ break;
+ case PERF_RECORD_MISC_GUEST_USER:
+ ++guest_us_samples;
+ /*
+ * TODO: we don't process guest user from host side
+ * except simple counting
+ */
+ return;
default:
return;
}
+ if (!kerninfo && perf_guest) {
+ pr_err("Can't find guest [%d]'s kernel information\n",
+ self->ip.pid);
+ return;
+ }
+
if (self->header.misc & PERF_RECORD_MISC_EXACT)
exact_samples++;
@@ -994,7 +1032,7 @@ static void event__process_sample(const
* --hide-kernel-symbols, even if the user specifies an
* invalid --vmlinux ;-)
*/
- if (al.map == session->vmlinux_maps[MAP__FUNCTION] &&
+ if (al.map == kerninfo->vmlinux_maps[MAP__FUNCTION] &&
RB_EMPTY_ROOT(&al.map->dso->symbols[MAP__FUNCTION])) {
pr_err("The %s file can't be used\n",
symbol_conf.vmlinux_name);
@@ -1261,7 +1299,7 @@ static int __cmd_top(void)
perf_session__mmap_read(session);
- if (pthread_create(&thread, NULL, display_thread, NULL)) {
+ if (pthread_create(&thread, NULL, display_thread, session)) {
printf("Could not create display thread.\n");
exit(-1);
}
diff -Nraup linux-2.6_tip0413/tools/perf/Makefile linux-2.6_tip0413_perfkvm/tools/perf/Makefile
--- linux-2.6_tip0413/tools/perf/Makefile 2010-04-14 11:11:58.802281816 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/Makefile 2010-04-14 11:13:17.313858518 +0800
@@ -472,6 +472,7 @@ BUILTIN_OBJS += $(OUTPUT)builtin-trace.o
BUILTIN_OBJS += $(OUTPUT)builtin-probe.o
BUILTIN_OBJS += $(OUTPUT)builtin-kmem.o
BUILTIN_OBJS += $(OUTPUT)builtin-lock.o
+BUILTIN_OBJS += $(OUTPUT)builtin-kvm.o
PERFLIBS = $(LIB_FILE)
diff -Nraup linux-2.6_tip0413/tools/perf/perf.c linux-2.6_tip0413_perfkvm/tools/perf/perf.c
--- linux-2.6_tip0413/tools/perf/perf.c 2010-04-14 11:11:58.478250552 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/perf.c 2010-04-14 11:13:17.313858518 +0800
@@ -307,6 +307,7 @@ static void handle_internal_command(int
{ "probe", cmd_probe, 0 },
{ "kmem", cmd_kmem, 0 },
{ "lock", cmd_lock, 0 },
+ { "kvm", cmd_kvm, 0 },
};
unsigned int i;
static const char ext[] = STRIP_EXTENSION;
diff -Nraup linux-2.6_tip0413/tools/perf/perf.h linux-2.6_tip0413_perfkvm/tools/perf/perf.h
--- linux-2.6_tip0413/tools/perf/perf.h 2010-04-14 11:11:58.810277694 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/perf.h 2010-04-14 11:13:17.313858518 +0800
@@ -131,4 +131,6 @@ struct ip_callchain {
u64 ips[0];
};
+extern int perf_host, perf_guest;
+
#endif
diff -Nraup linux-2.6_tip0413/tools/perf/util/build-id.c linux-2.6_tip0413_perfkvm/tools/perf/util/build-id.c
--- linux-2.6_tip0413/tools/perf/util/build-id.c 2010-04-14 11:11:58.654213263 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/util/build-id.c 2010-04-14 11:13:17.317861518 +0800
@@ -24,7 +24,7 @@ static int build_id__mark_dso_hit(event_
}
thread__find_addr_map(thread, session, cpumode, MAP__FUNCTION,
- event->ip.ip, &al);
+ event->ip.pid, event->ip.ip, &al);
if (al.map != NULL)
al.map->dso->hit = 1;
diff -Nraup linux-2.6_tip0413/tools/perf/util/event.c linux-2.6_tip0413_perfkvm/tools/perf/util/event.c
--- linux-2.6_tip0413/tools/perf/util/event.c 2010-04-14 11:11:58.662259868 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/util/event.c 2010-04-14 15:33:50.903104472 +0800
@@ -112,7 +112,11 @@ static int event__synthesize_mmap_events
event_t ev = {
.header = {
.type = PERF_RECORD_MMAP,
- .misc = 0, /* Just like the kernel, see kernel/perf_event.c __perf_event_mmap */
+ /*
+ * Just like the kernel, see kernel/perf_event.c
+ * __perf_event_mmap
+ */
+ .misc = PERF_RECORD_MISC_USER,
},
};
int n;
@@ -167,11 +171,23 @@ static int event__synthesize_mmap_events
}
int event__synthesize_modules(event__handler_t process,
- struct perf_session *session)
+ struct perf_session *session,
+ struct kernel_info *kerninfo)
{
struct rb_node *nd;
+ struct map_groups *kmaps = &kerninfo->kmaps;
+ u16 misc;
- for (nd = rb_first(&session->kmaps.maps[MAP__FUNCTION]);
+ /*
+ * kernel uses PERF_RECORD_MISC_USER for user space maps,
+ * see kernel/perf_event.c __perf_event_mmap
+ */
+ if (is_host_kernel(kerninfo))
+ misc = PERF_RECORD_MISC_KERNEL;
+ else
+ misc = PERF_RECORD_MISC_GUEST_KERNEL;
+
+ for (nd = rb_first(&kmaps->maps[MAP__FUNCTION]);
nd; nd = rb_next(nd)) {
event_t ev;
size_t size;
@@ -182,12 +198,13 @@ int event__synthesize_modules(event__han
size = ALIGN(pos->dso->long_name_len + 1, sizeof(u64));
memset(&ev, 0, sizeof(ev));
- ev.mmap.header.misc = 1; /* kernel uses 0 for user space maps, see kernel/perf_event.c __perf_event_mmap */
+ ev.mmap.header.misc = misc;
ev.mmap.header.type = PERF_RECORD_MMAP;
ev.mmap.header.size = (sizeof(ev.mmap) -
(sizeof(ev.mmap.filename) - size));
ev.mmap.start = pos->start;
ev.mmap.len = pos->end - pos->start;
+ ev.mmap.pid = kerninfo->pid;
memcpy(ev.mmap.filename, pos->dso->long_name,
pos->dso->long_name_len + 1);
@@ -250,13 +267,17 @@ static int find_symbol_cb(void *arg, con
int event__synthesize_kernel_mmap(event__handler_t process,
struct perf_session *session,
+ struct kernel_info *kerninfo,
const char *symbol_name)
{
size_t size;
+ const char *filename, *mmap_name;
+ char path[PATH_MAX];
+ struct map *map;
+
event_t ev = {
.header = {
.type = PERF_RECORD_MMAP,
- .misc = 1, /* kernel uses 0 for user space maps, see kernel/perf_event.c __perf_event_mmap */
},
};
/*
@@ -266,16 +287,38 @@ int event__synthesize_kernel_mmap(event_
*/
struct process_symbol_args args = { .name = symbol_name, };
- if (kallsyms__parse("/proc/kallsyms", &args, find_symbol_cb) <= 0)
+ if (is_host_kernel(kerninfo)) {
+ /*
+ * kernel uses PERF_RECORD_MISC_USER for user space maps,
+ * see kernel/perf_event.c __perf_event_mmap
+ */
+ ev.header.misc = PERF_RECORD_MISC_KERNEL;
+ mmap_name = "kernel.kallsyms";
+ filename = "/proc/kallsyms";
+ } else {
+ ev.header.misc = PERF_RECORD_MISC_GUEST_KERNEL;
+ mmap_name = "guest.kernel.kallsyms";
+ if (is_default_guest(kerninfo))
+ filename = (char *) symbol_conf.default_guest_kallsyms;
+ else {
+ sprintf(path, "%s/proc/kallsyms", kerninfo->root_dir);
+ filename = path;
+ }
+ }
+
+ if (kallsyms__parse(filename, &args, find_symbol_cb) <= 0)
return -ENOENT;
+ map = kerninfo->vmlinux_maps[MAP__FUNCTION];
size = snprintf(ev.mmap.filename, sizeof(ev.mmap.filename),
- "[kernel.kallsyms.%s]", symbol_name) + 1;
+ "[%s.%s]", mmap_name, symbol_name) + 1;
size = ALIGN(size, sizeof(u64));
- ev.mmap.header.size = (sizeof(ev.mmap) - (sizeof(ev.mmap.filename) - size));
+ ev.mmap.header.size = (sizeof(ev.mmap) -
+ (sizeof(ev.mmap.filename) - size));
ev.mmap.pgoff = args.start;
- ev.mmap.start = session->vmlinux_maps[MAP__FUNCTION]->start;
- ev.mmap.len = session->vmlinux_maps[MAP__FUNCTION]->end - ev.mmap.start ;
+ ev.mmap.start = map->start;
+ ev.mmap.len = map->end - ev.mmap.start;
+ ev.mmap.pid = kerninfo->pid;
return process(&ev, session);
}
@@ -329,82 +372,134 @@ int event__process_lost(event_t *self, s
return 0;
}
-int event__process_mmap(event_t *self, struct perf_session *session)
+static void event_set_kernel_mmap_len(struct map **maps, event_t *self)
{
- struct thread *thread;
- struct map *map;
-
- dump_printf(" %d/%d: [%#Lx(%#Lx) @ %#Lx]: %s\n",
- self->mmap.pid, self->mmap.tid, self->mmap.start,
- self->mmap.len, self->mmap.pgoff, self->mmap.filename);
+ maps[MAP__FUNCTION]->start = self->mmap.start;
+ maps[MAP__FUNCTION]->end = self->mmap.start + self->mmap.len;
+ /*
+ * Be a bit paranoid here, some perf.data files came with
+ * a zero-sized synthesized MMAP event for the kernel.
+ */
+ if (maps[MAP__FUNCTION]->end == 0)
+ maps[MAP__FUNCTION]->end = ~0UL;
+}
- if (self->mmap.pid == 0) {
- static const char kmmap_prefix[] = "[kernel.kallsyms.";
+static int event__process_kernel_mmap(event_t *self,
+ struct perf_session *session)
+{
+ struct map *map;
+ const char *kmmap_prefix, *short_name;
+ struct kernel_info *kerninfo;
+ enum dso_kernel_type kernel_type;
+
+ kerninfo = kerninfo__findnew(&session->kerninfo_root, self->mmap.pid);
+ if (!kerninfo) {
+ pr_err("Can't find id %d's kerninfo\n", self->mmap.pid);
+ goto out_problem;
+ }
- if (self->mmap.filename[0] == '/') {
- char short_module_name[1024];
- char *name = strrchr(self->mmap.filename, '/'), *dot;
-
- if (name == NULL)
- goto out_problem;
-
- ++name; /* skip / */
- dot = strrchr(name, '.');
- if (dot == NULL)
- goto out_problem;
-
- snprintf(short_module_name, sizeof(short_module_name),
- "[%.*s]", (int)(dot - name), name);
- strxfrchar(short_module_name, '-', '_');
-
- map = perf_session__new_module_map(session,
- self->mmap.start,
- self->mmap.filename);
- if (map == NULL)
- goto out_problem;
-
- name = strdup(short_module_name);
- if (name == NULL)
- goto out_problem;
-
- map->dso->short_name = name;
- map->end = map->start + self->mmap.len;
- } else if (memcmp(self->mmap.filename, kmmap_prefix,
+ if (is_host_kernel(kerninfo)) {
+ kmmap_prefix = "[kernel.kallsyms.";
+ short_name = "[kernel.kallsyms]";
+ kernel_type = DSO_TYPE_KERNEL;
+ } else {
+ kmmap_prefix = "[guest.kernel.kallsyms.";
+ short_name = "[guest.kernel.kallsyms]";
+ kernel_type = DSO_TYPE_GUEST_KERNEL;
+ }
+
+ if (self->mmap.filename[0] == '/') {
+
+ char short_module_name[1024];
+ char *name = strrchr(self->mmap.filename, '/'), *dot;
+
+ if (name == NULL)
+ goto out_problem;
+
+ ++name; /* skip / */
+ dot = strrchr(name, '.');
+ if (dot == NULL)
+ goto out_problem;
+
+ snprintf(short_module_name, sizeof(short_module_name),
+ "[%.*s]", (int)(dot - name), name);
+ strxfrchar(short_module_name, '-', '_');
+
+ map = map_groups__new_module(&kerninfo->kmaps,
+ self->mmap.start,
+ self->mmap.filename,
+ kerninfo);
+ if (map == NULL)
+ goto out_problem;
+
+ name = strdup(short_module_name);
+ if (name == NULL)
+ goto out_problem;
+
+ map->dso->short_name = name;
+ map->end = map->start + self->mmap.len;
+ } else if (memcmp(self->mmap.filename, kmmap_prefix,
- sizeof(kmmap_prefix) - 1) == 0) {
- const char *symbol_name = (self->mmap.filename +
- sizeof(kmmap_prefix) - 1);
+ strlen(kmmap_prefix)) == 0) {
+ const char *symbol_name = (self->mmap.filename +
+ strlen(kmmap_prefix));
+ /*
+ * Should be there already, from the build-id table in
+ * the header.
+ */
+ struct dso *kernel = __dsos__findnew(&kerninfo->dsos__kernel,
+ short_name);
+ if (kernel == NULL)
+ goto out_problem;
+
+ kernel->kernel = kernel_type;
+ if (__map_groups__create_kernel_maps(&kerninfo->kmaps,
+ kerninfo->vmlinux_maps, kernel) < 0)
+ goto out_problem;
+
+ event_set_kernel_mmap_len(kerninfo->vmlinux_maps, self);
+ perf_session__set_kallsyms_ref_reloc_sym(kerninfo->vmlinux_maps,
+ symbol_name,
+ self->mmap.pgoff);
+ if (is_default_guest(kerninfo)) {
/*
- * Should be there already, from the build-id table in
- * the header.
+ * preload dso of guest kernel and modules
*/
- struct dso *kernel = __dsos__findnew(&dsos__kernel,
- "[kernel.kallsyms]");
- if (kernel == NULL)
- goto out_problem;
-
- kernel->kernel = 1;
- if (__perf_session__create_kernel_maps(session, kernel) < 0)
- goto out_problem;
+ dso__load(kernel,
+ kerninfo->vmlinux_maps[MAP__FUNCTION],
+ NULL);
+ }
+ }
+ return 0;
+out_problem:
+ return -1;
+}
- session->vmlinux_maps[MAP__FUNCTION]->start = self->mmap.start;
- session->vmlinux_maps[MAP__FUNCTION]->end = self->mmap.start + self->mmap.len;
- /*
- * Be a bit paranoid here, some perf.data file came with
- * a zero sized synthesized MMAP event for the kernel.
- */
- if (session->vmlinux_maps[MAP__FUNCTION]->end == 0)
- session->vmlinux_maps[MAP__FUNCTION]->end = ~0UL;
+int event__process_mmap(event_t *self, struct perf_session *session)
+{
+ struct kernel_info *kerninfo;
+ struct thread *thread;
+ struct map *map;
+ u8 cpumode = self->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
+ int ret = 0;
- perf_session__set_kallsyms_ref_reloc_sym(session, symbol_name,
- self->mmap.pgoff);
- }
+ dump_printf(" %d/%d: [%#Lx(%#Lx) @ %#Lx]: %s\n",
+ self->mmap.pid, self->mmap.tid, self->mmap.start,
+ self->mmap.len, self->mmap.pgoff, self->mmap.filename);
+
+ if (cpumode == PERF_RECORD_MISC_GUEST_KERNEL ||
+ cpumode == PERF_RECORD_MISC_KERNEL) {
+ ret = event__process_kernel_mmap(self, session);
+ if (ret < 0)
+ goto out_problem;
return 0;
}
thread = perf_session__findnew(session, self->mmap.pid);
- map = map__new(self->mmap.start, self->mmap.len, self->mmap.pgoff,
- self->mmap.pid, self->mmap.filename, MAP__FUNCTION,
- session->cwd, session->cwdlen);
+ kerninfo = kerninfo__findhost(&session->kerninfo_root);
+ map = map__new(&kerninfo->dsos__user, self->mmap.start,
+ self->mmap.len, self->mmap.pgoff,
+ self->mmap.pid, self->mmap.filename,
+ MAP__FUNCTION, session->cwd, session->cwdlen);
if (thread == NULL || map == NULL)
goto out_problem;
@@ -444,22 +539,52 @@ int event__process_task(event_t *self, s
void thread__find_addr_map(struct thread *self,
struct perf_session *session, u8 cpumode,
- enum map_type type, u64 addr,
+ enum map_type type, pid_t pid, u64 addr,
struct addr_location *al)
{
struct map_groups *mg = &self->mg;
+ struct kernel_info *kerninfo = NULL;
al->thread = self;
al->addr = addr;
+ al->cpumode = cpumode;
+ al->filtered = false;
- if (cpumode == PERF_RECORD_MISC_KERNEL) {
+ if (cpumode == PERF_RECORD_MISC_KERNEL && perf_host) {
al->level = 'k';
- mg = &session->kmaps;
- } else if (cpumode == PERF_RECORD_MISC_USER)
+ kerninfo = kerninfo__findhost(&session->kerninfo_root);
+ mg = &kerninfo->kmaps;
+ } else if (cpumode == PERF_RECORD_MISC_USER && perf_host) {
al->level = '.';
- else {
- al->level = 'H';
+ kerninfo = kerninfo__findhost(&session->kerninfo_root);
+ } else if (cpumode == PERF_RECORD_MISC_GUEST_KERNEL && perf_guest) {
+ al->level = 'g';
+ kerninfo = kerninfo__find(&session->kerninfo_root, pid);
+ if (!kerninfo) {
+ al->map = NULL;
+ return;
+ }
+ mg = &kerninfo->kmaps;
+ } else {
+ /*
+ * 'u' means guest os user space.
+ * TODO: We don't support guest user space. Might support it later.
+ */
+ if (cpumode == PERF_RECORD_MISC_GUEST_USER && perf_guest)
+ al->level = 'u';
+ else
+ al->level = 'H';
al->map = NULL;
+
+ if ((cpumode == PERF_RECORD_MISC_GUEST_USER ||
+ cpumode == PERF_RECORD_MISC_GUEST_KERNEL) &&
+ !perf_guest)
+ al->filtered = true;
+ if ((cpumode == PERF_RECORD_MISC_USER ||
+ cpumode == PERF_RECORD_MISC_KERNEL) &&
+ !perf_host)
+ al->filtered = true;
+
return;
}
try_again:
@@ -474,8 +599,11 @@ try_again:
* "[vdso]" dso, but for now lets use the old trick of looking
* in the whole kernel symbol list.
*/
- if ((long long)al->addr < 0 && mg != &session->kmaps) {
- mg = &session->kmaps;
+ if ((long long)al->addr < 0 &&
+ cpumode == PERF_RECORD_MISC_KERNEL &&
+ kerninfo &&
+ mg != &kerninfo->kmaps) {
+ mg = &kerninfo->kmaps;
goto try_again;
}
} else
@@ -484,11 +612,11 @@ try_again:
void thread__find_addr_location(struct thread *self,
struct perf_session *session, u8 cpumode,
- enum map_type type, u64 addr,
+ enum map_type type, pid_t pid, u64 addr,
struct addr_location *al,
symbol_filter_t filter)
{
- thread__find_addr_map(self, session, cpumode, type, addr, al);
+ thread__find_addr_map(self, session, cpumode, type, pid, addr, al);
if (al->map != NULL)
al->sym = map__find_symbol(al->map, al->addr, filter);
else
@@ -524,7 +652,7 @@ int event__preprocess_sample(const event
dump_printf(" ... thread: %s:%d\n", thread->comm, thread->pid);
thread__find_addr_map(thread, session, cpumode, MAP__FUNCTION,
- self->ip.ip, al);
+ self->ip.pid, self->ip.ip, al);
dump_printf(" ...... dso: %s\n",
al->map ? al->map->dso->long_name :
al->level == 'H' ? "[hypervisor]" : "<not found>");
@@ -554,7 +682,6 @@ int event__preprocess_sample(const event
!strlist__has_entry(symbol_conf.sym_list, al->sym->name))
goto out_filtered;
- al->filtered = false;
return 0;
out_filtered:
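The host/guest handling added to thread__find_addr_map() above can be read as a small decision table. The following is a minimal standalone sketch of that table, not the perf code itself: the enum, the struct and classify_sample() are hypothetical names used only for illustration, and the "resolvable" flag is a simplification of "a map lookup is attempted" (the real code additionally gives up when no kernel_info exists for the guest pid).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the PERF_RECORD_MISC_* cpumode values. */
enum sample_mode {
	MODE_UNKNOWN,
	MODE_KERNEL,
	MODE_USER,
	MODE_HYPERVISOR,
	MODE_GUEST_KERNEL,
	MODE_GUEST_USER,
};

struct sample_class {
	char level;      /* 'k', '.', 'g', 'u' or 'H' */
	bool resolvable; /* true when a map lookup would be attempted */
	bool filtered;   /* true when the sample is dropped by --host/--guest */
};

/* host and guest mirror the perf_host and perf_guest switches. */
static struct sample_class classify_sample(enum sample_mode mode,
					   bool host, bool guest)
{
	struct sample_class c = { 'H', false, false };

	if (mode == MODE_KERNEL && host) {
		c.level = 'k';
		c.resolvable = true;
	} else if (mode == MODE_USER && host) {
		c.level = '.';
		c.resolvable = true;
	} else if (mode == MODE_GUEST_KERNEL && guest) {
		c.level = 'g';
		c.resolvable = true;
	} else {
		/* Guest user space is only labelled, never resolved. */
		if (mode == MODE_GUEST_USER && guest)
			c.level = 'u';
		/* Samples from a side that was not requested are filtered. */
		if ((mode == MODE_GUEST_USER || mode == MODE_GUEST_KERNEL) &&
		    !guest)
			c.filtered = true;
		if ((mode == MODE_USER || mode == MODE_KERNEL) && !host)
			c.filtered = true;
	}
	return c;
}
```

With host profiling only, classify_sample(MODE_GUEST_KERNEL, true, false) yields level 'H' with filtered set, while classify_sample(MODE_GUEST_USER, true, true) yields level 'u', unfiltered and unresolved.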
diff -Nraup linux-2.6_tip0413/tools/perf/util/event.h linux-2.6_tip0413_perfkvm/tools/perf/util/event.h
--- linux-2.6_tip0413/tools/perf/util/event.h 2010-04-14 11:11:58.638239002 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/util/event.h 2010-04-14 14:12:02.533688079 +0800
@@ -79,6 +79,7 @@ struct sample_data {
struct build_id_event {
struct perf_event_header header;
+ pid_t pid;
u8 build_id[ALIGN(BUILD_ID_SIZE, sizeof(u64))];
char filename[];
};
@@ -119,10 +120,13 @@ int event__synthesize_thread(pid_t pid,
void event__synthesize_threads(event__handler_t process,
struct perf_session *session);
int event__synthesize_kernel_mmap(event__handler_t process,
- struct perf_session *session,
- const char *symbol_name);
+ struct perf_session *session,
+ struct kernel_info *kerninfo,
+ const char *symbol_name);
+
int event__synthesize_modules(event__handler_t process,
- struct perf_session *session);
+ struct perf_session *session,
+ struct kernel_info *kerninfo);
int event__process_comm(event_t *self, struct perf_session *session);
int event__process_lost(event_t *self, struct perf_session *session);
diff -Nraup linux-2.6_tip0413/tools/perf/util/header.c linux-2.6_tip0413_perfkvm/tools/perf/util/header.c
--- linux-2.6_tip0413/tools/perf/util/header.c 2010-04-14 11:11:58.594236160 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/util/header.c 2010-04-14 11:13:17.317861518 +0800
@@ -197,7 +197,8 @@ static int write_padded(int fd, const vo
continue; \
else
-static int __dsos__write_buildid_table(struct list_head *head, u16 misc, int fd)
+static int __dsos__write_buildid_table(struct list_head *head, pid_t pid,
+ u16 misc, int fd)
{
struct dso *pos;
@@ -212,6 +213,7 @@ static int __dsos__write_buildid_table(s
len = ALIGN(len, NAME_ALIGN);
memset(&b, 0, sizeof(b));
memcpy(&b.build_id, pos->build_id, sizeof(pos->build_id));
+ b.pid = pid;
b.header.misc = misc;
b.header.size = sizeof(b) + len;
err = do_write(fd, &b, sizeof(b));
@@ -226,13 +228,33 @@ static int __dsos__write_buildid_table(s
return 0;
}
-static int dsos__write_buildid_table(int fd)
+static int dsos__write_buildid_table(struct perf_header *header, int fd)
{
- int err = __dsos__write_buildid_table(&dsos__kernel,
- PERF_RECORD_MISC_KERNEL, fd);
- if (err == 0)
- err = __dsos__write_buildid_table(&dsos__user,
- PERF_RECORD_MISC_USER, fd);
+ struct perf_session *session = container_of(header,
+ struct perf_session, header);
+ struct rb_node *nd;
+ int err = 0;
+ u16 kmisc, umisc;
+
+ for (nd = rb_first(&session->kerninfo_root); nd; nd = rb_next(nd)) {
+ struct kernel_info *pos = rb_entry(nd, struct kernel_info,
+ rb_node);
+ if (is_host_kernel(pos)) {
+ kmisc = PERF_RECORD_MISC_KERNEL;
+ umisc = PERF_RECORD_MISC_USER;
+ } else {
+ kmisc = PERF_RECORD_MISC_GUEST_KERNEL;
+ umisc = PERF_RECORD_MISC_GUEST_USER;
+ }
+
+ err = __dsos__write_buildid_table(&pos->dsos__kernel, pos->pid,
+ kmisc, fd);
+ if (err == 0)
+ err = __dsos__write_buildid_table(&pos->dsos__user,
+ pos->pid, umisc, fd);
+ if (err)
+ break;
+ }
return err;
}
@@ -349,9 +371,12 @@ static int __dsos__cache_build_ids(struc
return err;
}
-static int dsos__cache_build_ids(void)
+static int dsos__cache_build_ids(struct perf_header *self)
{
- int err_kernel, err_user;
+ struct perf_session *session = container_of(self,
+ struct perf_session, header);
+ struct rb_node *nd;
+ int ret = 0;
char debugdir[PATH_MAX];
snprintf(debugdir, sizeof(debugdir), "%s/%s", getenv("HOME"),
@@ -360,9 +385,30 @@ static int dsos__cache_build_ids(void)
if (mkdir(debugdir, 0755) != 0 && errno != EEXIST)
return -1;
- err_kernel = __dsos__cache_build_ids(&dsos__kernel, debugdir);
- err_user = __dsos__cache_build_ids(&dsos__user, debugdir);
- return err_kernel || err_user ? -1 : 0;
+ for (nd = rb_first(&session->kerninfo_root); nd; nd = rb_next(nd)) {
+ struct kernel_info *pos = rb_entry(nd, struct kernel_info,
+ rb_node);
+ ret |= __dsos__cache_build_ids(&pos->dsos__kernel, debugdir);
+ ret |= __dsos__cache_build_ids(&pos->dsos__user, debugdir);
+ }
+ return ret ? -1 : 0;
+}
+
+static bool dsos__read_build_ids(struct perf_header *self, bool with_hits)
+{
+ bool ret = false;
+ struct perf_session *session = container_of(self,
+ struct perf_session, header);
+ struct rb_node *nd;
+
+ for (nd = rb_first(&session->kerninfo_root); nd; nd = rb_next(nd)) {
+ struct kernel_info *pos = rb_entry(nd, struct kernel_info,
+ rb_node);
+ ret |= __dsos__read_build_ids(&pos->dsos__kernel, with_hits);
+ ret |= __dsos__read_build_ids(&pos->dsos__user, with_hits);
+ }
+
+ return ret;
}
static int perf_header__adds_write(struct perf_header *self, int fd)
@@ -373,7 +419,7 @@ static int perf_header__adds_write(struc
u64 sec_start;
int idx = 0, err;
- if (dsos__read_build_ids(true))
+ if (dsos__read_build_ids(self, true))
perf_header__set_feat(self, HEADER_BUILD_ID);
nr_sections = bitmap_weight(self->adds_features, HEADER_FEAT_BITS);
@@ -408,14 +454,14 @@ static int perf_header__adds_write(struc
/* Write build-ids */
buildid_sec->offset = lseek(fd, 0, SEEK_CUR);
- err = dsos__write_buildid_table(fd);
+ err = dsos__write_buildid_table(self, fd);
if (err < 0) {
pr_debug("failed to write buildid table\n");
goto out_free;
}
buildid_sec->size = lseek(fd, 0, SEEK_CUR) -
buildid_sec->offset;
- dsos__cache_build_ids();
+ dsos__cache_build_ids(self);
}
lseek(fd, sec_start, SEEK_SET);
@@ -636,6 +682,72 @@ int perf_file_header__read(struct perf_f
return 0;
}
+static int perf_header__read_build_ids(struct perf_header *self,
+ int input, u64 offset, u64 size)
+{
+ struct perf_session *session = container_of(self,
+ struct perf_session, header);
+ struct build_id_event bev;
+ char filename[PATH_MAX];
+ u64 limit = offset + size;
+ int err = -1;
+ struct list_head *head;
+ struct kernel_info *kerninfo;
+ u16 misc;
+
+ while (offset < limit) {
+ struct dso *dso;
+ ssize_t len;
+ enum dso_kernel_type dso_type;
+
+ if (read(input, &bev, sizeof(bev)) != sizeof(bev))
+ goto out;
+
+ kerninfo = kerninfo__findnew(&session->kerninfo_root, bev.pid);
+ if (!kerninfo)
+ goto out;
+
+ if (self->needs_swap)
+ perf_event_header__bswap(&bev.header);
+
+ len = bev.header.size - sizeof(bev);
+ if (read(input, filename, len) != len)
+ goto out;
+
+ misc = bev.header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
+
+ switch (misc) {
+ case PERF_RECORD_MISC_KERNEL:
+ dso_type = DSO_TYPE_KERNEL;
+ head = &kerninfo->dsos__kernel;
+ break;
+ case PERF_RECORD_MISC_GUEST_KERNEL:
+ dso_type = DSO_TYPE_GUEST_KERNEL;
+ head = &kerninfo->dsos__kernel;
+ break;
+ case PERF_RECORD_MISC_USER:
+ case PERF_RECORD_MISC_GUEST_USER:
+ dso_type = DSO_TYPE_USER;
+ head = &kerninfo->dsos__user;
+ break;
+ default:
+ goto out;
+ }
+
+ dso = __dsos__findnew(head, filename);
+ if (dso != NULL) {
+ dso__set_build_id(dso, &bev.build_id);
+ if (filename[0] == '[')
+ dso->kernel = dso_type;
+ }
+
+ offset += bev.header.size;
+ }
+ err = 0;
+out:
+ return err;
+}
+
static int perf_file_section__process(struct perf_file_section *self,
struct perf_header *ph,
int feat, int fd)
diff -Nraup linux-2.6_tip0413/tools/perf/util/hist.c linux-2.6_tip0413_perfkvm/tools/perf/util/hist.c
--- linux-2.6_tip0413/tools/perf/util/hist.c 2010-04-14 11:11:58.766255670 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/util/hist.c 2010-04-14 16:02:22.299845756 +0800
@@ -8,6 +8,30 @@ struct callchain_param callchain_param =
.min_percent = 0.5
};
+void __perf_session__add_count(struct hist_entry *he,
+ struct addr_location *al,
+ u64 count)
+{
+ he->count += count;
+
+ switch (al->cpumode) {
+ case PERF_RECORD_MISC_KERNEL:
+ he->count_sys += count;
+ break;
+ case PERF_RECORD_MISC_USER:
+ he->count_us += count;
+ break;
+ case PERF_RECORD_MISC_GUEST_KERNEL:
+ he->count_guest_sys += count;
+ break;
+ case PERF_RECORD_MISC_GUEST_USER:
+ he->count_guest_us += count;
+ break;
+ default:
+ break;
+ }
+}
+
/*
* histogram, sorted on item, collects counts
*/
@@ -464,7 +488,7 @@ int hist_entry__snprintf(struct hist_ent
u64 session_total)
{
struct sort_entry *se;
- u64 count, total;
+ u64 count, total, count_sys, count_us, count_guest_sys, count_guest_us;
const char *sep = symbol_conf.field_sep;
int ret;
@@ -474,9 +498,17 @@ int hist_entry__snprintf(struct hist_ent
if (pair_session) {
count = self->pair ? self->pair->count : 0;
total = pair_session->events_stats.total;
+ count_sys = self->pair ? self->pair->count_sys : 0;
+ count_us = self->pair ? self->pair->count_us : 0;
+ count_guest_sys = self->pair ? self->pair->count_guest_sys : 0;
+ count_guest_us = self->pair ? self->pair->count_guest_us : 0;
} else {
count = self->count;
total = session_total;
+ count_sys = self->count_sys;
+ count_us = self->count_us;
+ count_guest_sys = self->count_guest_sys;
+ count_guest_us = self->count_guest_us;
}
if (total) {
@@ -487,6 +519,22 @@ int hist_entry__snprintf(struct hist_ent
else
ret = snprintf(s, size, sep ? "%.2f" : " %6.2f%%",
(count * 100.0) / total);
+ if (symbol_conf.show_cpu_utilization) {
+ ret += percent_color_snprintf(s + ret, size - ret,
+ sep ? "%.2f" : " %6.2f%%",
+ (count_sys * 100.0) / total);
+ ret += percent_color_snprintf(s + ret, size - ret,
+ sep ? "%.2f" : " %6.2f%%",
+ (count_us * 100.0) / total);
+ if (perf_guest) {
+ ret += percent_color_snprintf(s + ret, size - ret,
+ sep ? "%.2f" : " %6.2f%%",
+ (count_guest_sys * 100.0) / total);
+ ret += percent_color_snprintf(s + ret, size - ret,
+ sep ? "%.2f" : " %6.2f%%",
+ (count_guest_us * 100.0) / total);
+ }
+ }
} else
ret = snprintf(s, size, sep ? "%lld" : "%12lld ", count);
@@ -597,6 +645,24 @@ size_t perf_session__fprintf_hists(struc
fputs(" Samples ", fp);
}
+ if (symbol_conf.show_cpu_utilization) {
+ if (sep) {
+ ret += fprintf(fp, "%csys", *sep);
+ ret += fprintf(fp, "%cus", *sep);
+ if (perf_guest) {
+ ret += fprintf(fp, "%cguest sys", *sep);
+ ret += fprintf(fp, "%cguest us", *sep);
+ }
+ } else {
+ ret += fprintf(fp, " sys ");
+ ret += fprintf(fp, " us ");
+ if (perf_guest) {
+ ret += fprintf(fp, " guest sys ");
+ ret += fprintf(fp, " guest us ");
+ }
+ }
+ }
+
if (pair) {
if (sep)
ret += fprintf(fp, "%cDelta", *sep);
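The per-mode accounting in __perf_session__add_count() and the extra percentage columns printed by hist_entry__snprintf() boil down to a few lines of arithmetic. Below is a minimal standalone sketch under assumed names (entry_counts, entry_counts_add and share_percent are hypothetical and not part of perf). Unlike the patch, which only prints percentages when the total is non-zero, the sketch returns 0 for an empty session.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the PERF_RECORD_MISC_* cpumode values. */
enum sample_mode {
	MODE_KERNEL,
	MODE_USER,
	MODE_GUEST_KERNEL,
	MODE_GUEST_USER,
	MODE_OTHER,
};

/* Mirrors count, count_sys, count_us, count_guest_sys, count_guest_us. */
struct entry_counts {
	uint64_t total;
	uint64_t sys;
	uint64_t us;
	uint64_t guest_sys;
	uint64_t guest_us;
};

/* Add n samples to the total and to the bucket matching the cpumode. */
static void entry_counts_add(struct entry_counts *c, enum sample_mode mode,
			     uint64_t n)
{
	c->total += n;

	switch (mode) {
	case MODE_KERNEL:
		c->sys += n;
		break;
	case MODE_USER:
		c->us += n;
		break;
	case MODE_GUEST_KERNEL:
		c->guest_sys += n;
		break;
	case MODE_GUEST_USER:
		c->guest_us += n;
		break;
	default:
		break; /* counted in the total only */
	}
}

/* Share of the session total, as printed in the sys/us/guest columns. */
static double share_percent(uint64_t count, uint64_t session_total)
{
	return session_total ? (count * 100.0) / session_total : 0.0;
}
```

For an entry that saw 3 host kernel samples and 1 guest kernel sample in a session of 4 samples, the sys column would read 75.00% and the guest sys column 25.00%.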
diff -Nraup linux-2.6_tip0413/tools/perf/util/hist.h linux-2.6_tip0413_perfkvm/tools/perf/util/hist.h
--- linux-2.6_tip0413/tools/perf/util/hist.h 2010-04-14 11:11:58.674215806 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/util/hist.h 2010-04-14 11:13:17.317861518 +0800
@@ -12,6 +12,9 @@ struct addr_location;
struct symbol;
struct rb_root;
+void __perf_session__add_count(struct hist_entry *he,
+ struct addr_location *al,
+ u64 count);
struct hist_entry *__perf_session__add_hist_entry(struct rb_root *hists,
struct addr_location *al,
struct symbol *parent,
diff -Nraup linux-2.6_tip0413/tools/perf/util/map.c linux-2.6_tip0413_perfkvm/tools/perf/util/map.c
--- linux-2.6_tip0413/tools/perf/util/map.c 2010-04-14 11:11:58.642241284 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/util/map.c 2010-04-14 16:08:55.377366557 +0800
@@ -4,6 +4,7 @@
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
+#include <unistd.h>
#include "map.h"
const char *map_type__name[MAP__NR_TYPES] = {
@@ -37,9 +38,11 @@ void map__init(struct map *self, enum ma
self->map_ip = map__map_ip;
self->unmap_ip = map__unmap_ip;
RB_CLEAR_NODE(&self->rb_node);
+ self->groups = NULL;
}
-struct map *map__new(u64 start, u64 len, u64 pgoff, u32 pid, char *filename,
+struct map *map__new(struct list_head *dsos__list, u64 start, u64 len,
+ u64 pgoff, u32 pid, char *filename,
enum map_type type, char *cwd, int cwdlen)
{
struct map *self = malloc(sizeof(*self));
@@ -66,7 +69,7 @@ struct map *map__new(u64 start, u64 len,
filename = newfilename;
}
- dso = dsos__findnew(filename);
+ dso = __dsos__findnew(dsos__list, filename);
if (dso == NULL)
goto out_delete;
@@ -242,6 +245,7 @@ void map_groups__init(struct map_groups
self->maps[i] = RB_ROOT;
INIT_LIST_HEAD(&self->removed_maps[i]);
}
+ self->this_kerninfo = NULL;
}
void map_groups__flush(struct map_groups *self)
@@ -508,3 +512,123 @@ struct map *maps__find(struct rb_root *m
return NULL;
}
+
+struct kernel_info *add_new_kernel_info(struct rb_root *kerninfo_root,
+ pid_t pid, const char *root_dir)
+{
+ struct rb_node **p = &kerninfo_root->rb_node;
+ struct rb_node *parent = NULL;
+ struct kernel_info *kerninfo, *pos;
+
+ kerninfo = malloc(sizeof(struct kernel_info));
+ if (!kerninfo)
+ return NULL;
+
+ kerninfo->pid = pid;
+ map_groups__init(&kerninfo->kmaps);
+ kerninfo->root_dir = strdup(root_dir);
+ RB_CLEAR_NODE(&kerninfo->rb_node);
+ INIT_LIST_HEAD(&kerninfo->dsos__user);
+ INIT_LIST_HEAD(&kerninfo->dsos__kernel);
+ kerninfo->kmaps.this_kerninfo = kerninfo;
+
+ while (*p != NULL) {
+ parent = *p;
+ pos = rb_entry(parent, struct kernel_info, rb_node);
+ if (pid < pos->pid)
+ p = &(*p)->rb_left;
+ else
+ p = &(*p)->rb_right;
+ }
+
+ rb_link_node(&kerninfo->rb_node, parent, p);
+ rb_insert_color(&kerninfo->rb_node, kerninfo_root);
+
+ return kerninfo;
+}
+
+struct kernel_info *kerninfo__find(struct rb_root *kerninfo_root, pid_t pid)
+{
+ struct rb_node **p = &kerninfo_root->rb_node;
+ struct rb_node *parent = NULL;
+ struct kernel_info *kerninfo;
+ struct kernel_info *default_kerninfo = NULL;
+
+ while (*p != NULL) {
+ parent = *p;
+ kerninfo = rb_entry(parent, struct kernel_info, rb_node);
+ if (pid < kerninfo->pid)
+ p = &(*p)->rb_left;
+ else if (pid > kerninfo->pid)
+ p = &(*p)->rb_right;
+ else
+ return kerninfo;
+ if (!kerninfo->pid)
+ default_kerninfo = kerninfo;
+ }
+
+ return default_kerninfo;
+}
+
+struct kernel_info *kerninfo__findhost(struct rb_root *kerninfo_root)
+{
+ struct rb_node **p = &kerninfo_root->rb_node;
+ struct rb_node *parent = NULL;
+ struct kernel_info *kerninfo;
+ pid_t pid = HOST_KERNEL_ID;
+
+ while (*p != NULL) {
+ parent = *p;
+ kerninfo = rb_entry(parent, struct kernel_info, rb_node);
+ if (pid < kerninfo->pid)
+ p = &(*p)->rb_left;
+ else if (pid > kerninfo->pid)
+ p = &(*p)->rb_right;
+ else
+ return kerninfo;
+ }
+
+ return NULL;
+}
+
+struct kernel_info *kerninfo__findnew(struct rb_root *kerninfo_root, pid_t pid)
+{
+ char path[PATH_MAX];
+ const char *root_dir;
+ int ret;
+ struct kernel_info *kerninfo = kerninfo__find(kerninfo_root, pid);
+
+ if (!kerninfo || kerninfo->pid != pid) {
+ if (pid == HOST_KERNEL_ID || pid == DEFAULT_GUEST_KERNEL_ID)
+ root_dir = "";
+ else {
+ if (!symbol_conf.guestmount)
+ goto out;
+ sprintf(path, "%s/%d", symbol_conf.guestmount, pid);
+ ret = access(path, R_OK);
+ if (ret) {
+ pr_err("Can't access file %s\n", path);
+ goto out;
+ }
+ root_dir = path;
+ }
+ kerninfo = add_new_kernel_info(kerninfo_root, pid, root_dir);
+ }
+
+out:
+ return kerninfo;
+}
+
+void kerninfo__process_allkernels(struct rb_root *kerninfo_root,
+ process_kernel_info process,
+ void *data)
+{
+ struct rb_node *nd;
+
+ for (nd = rb_first(kerninfo_root); nd; nd = rb_next(nd)) {
+ struct kernel_info *pos = rb_entry(nd, struct kernel_info,
+ rb_node);
+ process(pos, data);
+ }
+}
+
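kerninfo__find() above returns an exact pid match when one exists and otherwise falls back to the default guest entry (pid 0), provided that entry was visited on the way down the tree. The sketch below reproduces that lookup rule on a plain unbalanced binary search tree instead of the kernel rb-tree. The names kinfo, kinfo_insert and kinfo_find are hypothetical and the structure is reduced to the key only.

```c
#include <stddef.h>

#define HOST_ID          (-1) /* same role as HOST_KERNEL_ID */
#define DEFAULT_GUEST_ID 0    /* same role as DEFAULT_GUEST_KERNEL_ID */

/* Reduced stand-in for struct kernel_info: only the pid key is kept. */
struct kinfo {
	int pid;
	struct kinfo *left;
	struct kinfo *right;
};

/* Insert a node keyed by pid; duplicates go to the right, as in the patch. */
static void kinfo_insert(struct kinfo **root, struct kinfo *node)
{
	struct kinfo **p = root;

	node->left = NULL;
	node->right = NULL;
	while (*p != NULL)
		p = node->pid < (*p)->pid ? &(*p)->left : &(*p)->right;
	*p = node;
}

/*
 * Return the node for pid, or the default guest node if it was seen on
 * the search path, or NULL when neither exists.
 */
static struct kinfo *kinfo_find(struct kinfo *root, int pid)
{
	struct kinfo *fallback = NULL;
	struct kinfo *n = root;

	while (n != NULL) {
		if (pid == n->pid)
			return n;
		if (n->pid == DEFAULT_GUEST_ID)
			fallback = n;
		n = pid < n->pid ? n->left : n->right;
	}
	return fallback;
}
```

With the host entry, the default guest entry and a guest with pid 8888 inserted in that order, a lookup for pid 9999 lands on the default guest entry, which is how samples from a guest without its own --guestmount directory end up resolved against --guestkallsyms and --guestmodules.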
diff -Nraup linux-2.6_tip0413/tools/perf/util/map.h linux-2.6_tip0413_perfkvm/tools/perf/util/map.h
--- linux-2.6_tip0413/tools/perf/util/map.h 2010-04-14 11:11:58.686216105 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/util/map.h 2010-04-14 16:12:24.683245583 +0800
@@ -19,6 +19,7 @@ extern const char *map_type__name[MAP__N
struct dso;
struct ref_reloc_sym;
struct map_groups;
+struct kernel_info;
struct map {
union {
@@ -36,6 +37,7 @@ struct map {
u64 (*unmap_ip)(struct map *, u64);
struct dso *dso;
+ struct map_groups *groups;
};
struct kmap {
@@ -43,6 +45,26 @@ struct kmap {
struct map_groups *kmaps;
};
+struct map_groups {
+ struct rb_root maps[MAP__NR_TYPES];
+ struct list_head removed_maps[MAP__NR_TYPES];
+ struct kernel_info *this_kerninfo;
+};
+
+/* Native host kernel uses -1 as pid index in kernel_info */
+#define HOST_KERNEL_ID (-1)
+#define DEFAULT_GUEST_KERNEL_ID (0)
+
+struct kernel_info {
+ struct rb_node rb_node;
+ pid_t pid;
+ char *root_dir;
+ struct list_head dsos__user;
+ struct list_head dsos__kernel;
+ struct map_groups kmaps;
+ struct map *vmlinux_maps[MAP__NR_TYPES];
+};
+
static inline struct kmap *map__kmap(struct map *self)
{
return (struct kmap *)(self + 1);
@@ -74,7 +96,8 @@ typedef int (*symbol_filter_t)(struct ma
void map__init(struct map *self, enum map_type type,
u64 start, u64 end, u64 pgoff, struct dso *dso);
-struct map *map__new(u64 start, u64 len, u64 pgoff, u32 pid, char *filename,
+struct map *map__new(struct list_head *dsos__list, u64 start, u64 len,
+ u64 pgoff, u32 pid, char *filename,
enum map_type type, char *cwd, int cwdlen);
void map__delete(struct map *self);
struct map *map__clone(struct map *self);
@@ -91,11 +114,6 @@ void map__fixup_end(struct map *self);
void map__reloc_vmlinux(struct map *self);
-struct map_groups {
- struct rb_root maps[MAP__NR_TYPES];
- struct list_head removed_maps[MAP__NR_TYPES];
-};
-
size_t __map_groups__fprintf_maps(struct map_groups *self,
enum map_type type, int verbose, FILE *fp);
void maps__insert(struct rb_root *maps, struct map *map);
@@ -106,9 +124,39 @@ int map_groups__clone(struct map_groups
size_t map_groups__fprintf(struct map_groups *self, int verbose, FILE *fp);
size_t map_groups__fprintf_maps(struct map_groups *self, int verbose, FILE *fp);
+struct kernel_info * add_new_kernel_info(struct rb_root *kerninfo_root,
+ pid_t pid, const char * root_dir);
+struct kernel_info *kerninfo__find(struct rb_root *kerninfo_root, pid_t pid);
+struct kernel_info *kerninfo__findnew(struct rb_root *kerninfo_root, pid_t pid);
+struct kernel_info *kerninfo__findhost(struct rb_root *kerninfo_root);
+
+/*
+ * Default guest kernel is defined by parameter --guestkallsyms
+ * and --guestmodules
+ */
+static inline int is_default_guest(struct kernel_info * kerninfo)
+{
+ if (!kerninfo)
+ return 0;
+ return kerninfo->pid == DEFAULT_GUEST_KERNEL_ID;
+}
+
+static inline int is_host_kernel(struct kernel_info * kerninfo)
+{
+ if (!kerninfo)
+ return 0;
+ return kerninfo->pid == HOST_KERNEL_ID;
+}
+
+typedef void (*process_kernel_info)(struct kernel_info *kerninfo, void *data);
+void kerninfo__process_allkernels(struct rb_root *kerninfo_root,
+ process_kernel_info process,
+ void * data);
+
static inline void map_groups__insert(struct map_groups *self, struct map *map)
{
- maps__insert(&self->maps[map->type], map);
+ maps__insert(&self->maps[map->type], map);
+ map->groups = self;
}
static inline struct map *map_groups__find(struct map_groups *self,
@@ -148,13 +196,11 @@ int map_groups__fixup_overlappings(struc
struct map *map_groups__find_by_name(struct map_groups *self,
enum map_type type, const char *name);
-int __map_groups__create_kernel_maps(struct map_groups *self,
- struct map *vmlinux_maps[MAP__NR_TYPES],
- struct dso *kernel);
-int map_groups__create_kernel_maps(struct map_groups *self,
- struct map *vmlinux_maps[MAP__NR_TYPES]);
-struct map *map_groups__new_module(struct map_groups *self, u64 start,
- const char *filename);
+struct map *map_groups__new_module(struct map_groups *self,
+ u64 start,
+ const char *filename,
+ struct kernel_info *kerninfo);
+
void map_groups__flush(struct map_groups *self);
#endif /* __PERF_MAP_H */
diff -Nraup linux-2.6_tip0413/tools/perf/util/probe-event.c linux-2.6_tip0413_perfkvm/tools/perf/util/probe-event.c
--- linux-2.6_tip0413/tools/perf/util/probe-event.c 2010-04-14 11:11:58.614279111 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/util/probe-event.c 2010-04-14 11:13:17.321860837 +0800
@@ -78,6 +78,8 @@ static struct map *kmaps[MAP__NR_TYPES];
/* Initialize symbol maps and path of vmlinux */
static void init_vmlinux(void)
{
+ struct dso *kernel;
+
symbol_conf.sort_by_name = true;
if (symbol_conf.vmlinux_name == NULL)
symbol_conf.try_vmlinux_path = true;
@@ -86,8 +88,12 @@ static void init_vmlinux(void)
if (symbol__init() < 0)
die("Failed to init symbol map.");
+ kernel = dso__new_kernel(symbol_conf.vmlinux_name);
+ if (kernel == NULL)
+ die("Failed to create kernel dso.");
+
map_groups__init(&kmap_groups);
- if (map_groups__create_kernel_maps(&kmap_groups, kmaps) < 0)
+ if (__map_groups__create_kernel_maps(&kmap_groups, kmaps, kernel) < 0)
die("Failed to create kernel maps.");
}
diff -Nraup linux-2.6_tip0413/tools/perf/util/session.c linux-2.6_tip0413_perfkvm/tools/perf/util/session.c
--- linux-2.6_tip0413/tools/perf/util/session.c 2010-04-14 11:11:58.794254600 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/util/session.c 2010-04-14 16:15:56.564948860 +0800
@@ -52,6 +52,17 @@ out_close:
return -1;
}
+int perf_session__create_kernel_maps(struct perf_session *self)
+{
+ int ret;
+ struct rb_root *root = &self->kerninfo_root;
+
+ ret = map_groups__create_kernel_maps(root, HOST_KERNEL_ID);
+ if (ret >= 0)
+ ret = map_groups__create_guest_kernel_maps(root);
+ return ret;
+}
+
struct perf_session *perf_session__new(const char *filename, int mode, bool force)
{
size_t len = filename ? strlen(filename) + 1 : 0;
@@ -71,7 +82,7 @@ struct perf_session *perf_session__new(c
self->cwd = NULL;
self->cwdlen = 0;
self->unknown_events = 0;
- map_groups__init(&self->kmaps);
+ self->kerninfo_root = RB_ROOT;
if (mode == O_RDONLY) {
if (perf_session__open(self, force) < 0)
@@ -142,8 +153,9 @@ struct map_symbol *perf_session__resolve
continue;
}
+ al.filtered = false;
thread__find_addr_location(thread, self, cpumode,
- MAP__FUNCTION, ip, &al, NULL);
+ MAP__FUNCTION, thread->pid, ip, &al, NULL);
if (al.sym != NULL) {
if (sort__has_parent && !*parent &&
symbol__match_parent_regex(al.sym))
@@ -324,46 +336,6 @@ void perf_event_header__bswap(struct per
self->size = bswap_16(self->size);
}
-int perf_header__read_build_ids(struct perf_header *self,
- int input, u64 offset, u64 size)
-{
- struct build_id_event bev;
- char filename[PATH_MAX];
- u64 limit = offset + size;
- int err = -1;
-
- while (offset < limit) {
- struct dso *dso;
- ssize_t len;
- struct list_head *head = &dsos__user;
-
- if (read(input, &bev, sizeof(bev)) != sizeof(bev))
- goto out;
-
- if (self->needs_swap)
- perf_event_header__bswap(&bev.header);
-
- len = bev.header.size - sizeof(bev);
- if (read(input, filename, len) != len)
- goto out;
-
- if (bev.header.misc & PERF_RECORD_MISC_KERNEL)
- head = &dsos__kernel;
-
- dso = __dsos__findnew(head, filename);
- if (dso != NULL) {
- dso__set_build_id(dso, &bev.build_id);
- if (head == &dsos__kernel && filename[0] == '[')
- dso->kernel = 1;
- }
-
- offset += bev.header.size;
- }
- err = 0;
-out:
- return err;
-}
-
static struct thread *perf_session__register_idle_thread(struct perf_session *self)
{
struct thread *thread = perf_session__findnew(self, 0);
@@ -516,26 +488,33 @@ bool perf_session__has_traces(struct per
return true;
}
-int perf_session__set_kallsyms_ref_reloc_sym(struct perf_session *self,
+int perf_session__set_kallsyms_ref_reloc_sym(struct map ** maps,
const char *symbol_name,
u64 addr)
{
char *bracket;
enum map_type i;
+ struct ref_reloc_sym *ref;
+
+ ref = zalloc(sizeof(struct ref_reloc_sym));
+ if (ref == NULL)
+ return -ENOMEM;
- self->ref_reloc_sym.name = strdup(symbol_name);
- if (self->ref_reloc_sym.name == NULL)
+ ref->name = strdup(symbol_name);
+ if (ref->name == NULL) {
+ free(ref);
return -ENOMEM;
+ }
- bracket = strchr(self->ref_reloc_sym.name, ']');
+ bracket = strchr(ref->name, ']');
if (bracket)
*bracket = '\0';
- self->ref_reloc_sym.addr = addr;
+ ref->addr = addr;
for (i = 0; i < MAP__NR_TYPES; ++i) {
- struct kmap *kmap = map__kmap(self->vmlinux_maps[i]);
- kmap->ref_reloc_sym = &self->ref_reloc_sym;
+ struct kmap *kmap = map__kmap(maps[i]);
+ kmap->ref_reloc_sym = ref;
}
return 0;
diff -Nraup linux-2.6_tip0413/tools/perf/util/session.h linux-2.6_tip0413_perfkvm/tools/perf/util/session.h
--- linux-2.6_tip0413/tools/perf/util/session.h 2010-04-14 11:11:58.606252925 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/util/session.h 2010-04-14 11:13:17.321860837 +0800
@@ -15,17 +15,15 @@ struct perf_session {
struct perf_header header;
unsigned long size;
unsigned long mmap_window;
- struct map_groups kmaps;
struct rb_root threads;
struct thread *last_match;
- struct map *vmlinux_maps[MAP__NR_TYPES];
+ struct rb_root kerninfo_root;
struct events_stats events_stats;
struct rb_root stats_by_id;
unsigned long event_total[PERF_RECORD_MAX];
unsigned long unknown_events;
struct rb_root hists;
u64 sample_type;
- struct ref_reloc_sym ref_reloc_sym;
int fd;
int cwdlen;
char *cwd;
@@ -64,33 +62,13 @@ struct map_symbol *perf_session__resolve
bool perf_session__has_traces(struct perf_session *self, const char *msg);
-int perf_header__read_build_ids(struct perf_header *self, int input,
- u64 offset, u64 file_size);
-
-int perf_session__set_kallsyms_ref_reloc_sym(struct perf_session *self,
+int perf_session__set_kallsyms_ref_reloc_sym(struct map ** maps,
const char *symbol_name,
u64 addr);
void mem_bswap_64(void *src, int byte_size);
-static inline int __perf_session__create_kernel_maps(struct perf_session *self,
- struct dso *kernel)
-{
- return __map_groups__create_kernel_maps(&self->kmaps,
- self->vmlinux_maps, kernel);
-}
-
-static inline int perf_session__create_kernel_maps(struct perf_session *self)
-{
- return map_groups__create_kernel_maps(&self->kmaps, self->vmlinux_maps);
-}
-
-static inline struct map *
- perf_session__new_module_map(struct perf_session *self,
- u64 start, const char *filename)
-{
- return map_groups__new_module(&self->kmaps, start, filename);
-}
+int perf_session__create_kernel_maps(struct perf_session *self);
#ifdef NO_NEWT_SUPPORT
static inline int perf_session__browse_hists(struct rb_root *hists __used,
diff -Nraup linux-2.6_tip0413/tools/perf/util/sort.h linux-2.6_tip0413_perfkvm/tools/perf/util/sort.h
--- linux-2.6_tip0413/tools/perf/util/sort.h 2010-04-14 11:11:58.610258472 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/util/sort.h 2010-04-14 11:13:17.321860837 +0800
@@ -44,6 +44,11 @@ extern enum sort_type sort__first_dimens
struct hist_entry {
struct rb_node rb_node;
u64 count;
+ u64 count_sys;
+ u64 count_us;
+ u64 count_guest_sys;
+ u64 count_guest_us;
+
/*
* XXX WARNING!
* thread _has_ to come after ms, see
diff -Nraup linux-2.6_tip0413/tools/perf/util/symbol.c linux-2.6_tip0413_perfkvm/tools/perf/util/symbol.c
--- linux-2.6_tip0413/tools/perf/util/symbol.c 2010-04-14 11:11:58.614279111 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/util/symbol.c 2010-04-14 16:51:51.803796961 +0800
@@ -28,6 +28,8 @@ static void dsos__add(struct list_head *
static struct map *map__new2(u64 start, struct dso *dso, enum map_type type);
static int dso__load_kernel_sym(struct dso *self, struct map *map,
symbol_filter_t filter);
+static int dso__load_guest_kernel_sym(struct dso *self, struct map *map,
+ symbol_filter_t filter);
static int vmlinux_path__nr_entries;
static char **vmlinux_path;
@@ -186,6 +188,7 @@ struct dso *dso__new(const char *name)
self->loaded = 0;
self->sorted_by_name = 0;
self->has_build_id = 0;
+ self->kernel = DSO_TYPE_USER;
}
return self;
@@ -402,12 +405,9 @@ int kallsyms__parse(const char *filename
char *symbol_name;
line_len = getline(&line, &n, file);
- if (line_len < 0)
+ if (line_len < 0 || !line)
break;
- if (!line)
- goto out_failure;
-
line[--line_len] = '\0'; /* \n */
len = hex2u64(line, &start);
@@ -459,6 +459,7 @@ static int map__process_kallsym_symbol(v
* map__split_kallsyms, when we have split the maps per module
*/
symbols__insert(root, sym);
+
return 0;
}
@@ -489,6 +490,7 @@ static int dso__split_kallsyms(struct ds
struct rb_root *root = &self->symbols[map->type];
struct rb_node *next = rb_first(root);
int kernel_range = 0;
+ const char *root_dir;
while (next) {
char *module;
@@ -504,15 +506,32 @@ static int dso__split_kallsyms(struct ds
*module++ = '\0';
if (strcmp(curr_map->dso->short_name, module)) {
+ if (curr_map != map &&
+ self->kernel == DSO_TYPE_GUEST_KERNEL &&
+ is_default_guest(kmaps->this_kerninfo)) {
+ /*
+ * We assume all symbols of a module are contiguous in
+ * kallsyms, so curr_map points to a module and all its
+ * symbols are in its kmap. Mark it as loaded.
+ */
+ dso__set_loaded(curr_map->dso, curr_map->type);
+ }
+
curr_map = map_groups__find_by_name(kmaps, map->type, module);
if (curr_map == NULL) {
- pr_debug("/proc/{kallsyms,modules} "
+ if (kmaps->this_kerninfo)
+ root_dir = kmaps->this_kerninfo->root_dir;
+ else
+ root_dir = "";
+ pr_debug("%s/proc/{kallsyms,modules} "
"inconsistency while looking "
- "for \"%s\" module!\n", module);
+ "for \"%s\" module!\n",
+ root_dir, module);
return -1;
}
- if (curr_map->dso->loaded)
+ if (curr_map->dso->loaded &&
+ !is_default_guest(kmaps->this_kerninfo))
goto discard_symbol;
}
/*
@@ -525,13 +544,21 @@ static int dso__split_kallsyms(struct ds
char dso_name[PATH_MAX];
struct dso *dso;
- snprintf(dso_name, sizeof(dso_name), "[kernel].%d",
- kernel_range++);
+ if (self->kernel == DSO_TYPE_GUEST_KERNEL)
+ snprintf(dso_name, sizeof(dso_name),
+ "[guest.kernel].%d",
+ kernel_range++);
+ else
+ snprintf(dso_name, sizeof(dso_name),
+ "[kernel].%d",
+ kernel_range++);
dso = dso__new(dso_name);
if (dso == NULL)
return -1;
+ dso->kernel = self->kernel;
+
curr_map = map__new2(pos->start, dso, map->type);
if (curr_map == NULL) {
dso__delete(dso);
@@ -555,6 +582,12 @@ discard_symbol: rb_erase(&pos->rb_node,
}
}
+ if (curr_map != map &&
+ self->kernel == DSO_TYPE_GUEST_KERNEL &&
+ is_default_guest(kmaps->this_kerninfo)) {
+ dso__set_loaded(curr_map->dso, curr_map->type);
+ }
+
return count;
}
@@ -565,7 +598,10 @@ int dso__load_kallsyms(struct dso *self,
return -1;
symbols__fixup_end(&self->symbols[map->type]);
- self->origin = DSO__ORIG_KERNEL;
+ if (self->kernel == DSO_TYPE_GUEST_KERNEL)
+ self->origin = DSO__ORIG_GUEST_KERNEL;
+ else
+ self->origin = DSO__ORIG_KERNEL;
return dso__split_kallsyms(self, map, filter);
}
@@ -952,7 +988,7 @@ static int dso__load_sym(struct dso *sel
nr_syms = shdr.sh_size / shdr.sh_entsize;
memset(&sym, 0, sizeof(sym));
- if (!self->kernel) {
+ if (self->kernel == DSO_TYPE_USER) {
self->adjust_symbols = (ehdr.e_type == ET_EXEC ||
elf_section_by_name(elf, &ehdr, &shdr,
".gnu.prelink_undo",
@@ -984,7 +1020,7 @@ static int dso__load_sym(struct dso *sel
section_name = elf_sec__name(&shdr, secstrs);
- if (self->kernel || kmodule) {
+ if (self->kernel != DSO_TYPE_USER || kmodule) {
char dso_name[PATH_MAX];
if (strcmp(section_name,
@@ -1011,6 +1047,7 @@ static int dso__load_sym(struct dso *sel
curr_dso = dso__new(dso_name);
if (curr_dso == NULL)
goto out_elf_end;
+ curr_dso->kernel = self->kernel;
curr_map = map__new2(start, curr_dso,
map->type);
if (curr_map == NULL) {
@@ -1021,7 +1058,7 @@ static int dso__load_sym(struct dso *sel
curr_map->unmap_ip = identity__map_ip;
curr_dso->origin = self->origin;
map_groups__insert(kmap->kmaps, curr_map);
- dsos__add(&dsos__kernel, curr_dso);
+ dsos__add(&self->node, curr_dso);
dso__set_loaded(curr_dso, map->type);
} else
curr_dso = curr_map->dso;
@@ -1083,7 +1120,7 @@ static bool dso__build_id_equal(const st
return memcmp(self->build_id, build_id, sizeof(self->build_id)) == 0;
}
-static bool __dsos__read_build_ids(struct list_head *head, bool with_hits)
+bool __dsos__read_build_ids(struct list_head *head, bool with_hits)
{
bool have_build_id = false;
struct dso *pos;
@@ -1101,13 +1138,6 @@ static bool __dsos__read_build_ids(struc
return have_build_id;
}
-bool dsos__read_build_ids(bool with_hits)
-{
- bool kbuildids = __dsos__read_build_ids(&dsos__kernel, with_hits),
- ubuildids = __dsos__read_build_ids(&dsos__user, with_hits);
- return kbuildids || ubuildids;
-}
-
/*
* Align offset to 4 bytes as needed for note name and descriptor data.
*/
@@ -1242,6 +1272,8 @@ char dso__symtab_origin(const struct dso
[DSO__ORIG_BUILDID] = 'b',
[DSO__ORIG_DSO] = 'd',
[DSO__ORIG_KMODULE] = 'K',
+ [DSO__ORIG_GUEST_KERNEL] = 'g',
+ [DSO__ORIG_GUEST_KMODULE] = 'G',
};
if (self == NULL || self->origin == DSO__ORIG_NOT_FOUND)
@@ -1257,11 +1289,20 @@ int dso__load(struct dso *self, struct m
char build_id_hex[BUILD_ID_SIZE * 2 + 1];
int ret = -1;
int fd;
+ struct kernel_info *kerninfo;
+ const char *root_dir;
dso__set_loaded(self, map->type);
- if (self->kernel)
+ if (self->kernel == DSO_TYPE_KERNEL)
return dso__load_kernel_sym(self, map, filter);
+ else if (self->kernel == DSO_TYPE_GUEST_KERNEL)
+ return dso__load_guest_kernel_sym(self, map, filter);
+
+ if (map->groups && map->groups->this_kerninfo)
+ kerninfo = map->groups->this_kerninfo;
+ else
+ kerninfo = NULL;
name = malloc(size);
if (!name)
@@ -1315,6 +1356,13 @@ more:
case DSO__ORIG_DSO:
snprintf(name, size, "%s", self->long_name);
break;
+ case DSO__ORIG_GUEST_KMODULE:
+ if (map->groups && map->groups->this_kerninfo)
+ root_dir = map->groups->this_kerninfo->root_dir;
+ else
+ root_dir = "";
+ snprintf(name, size, "%s%s", root_dir, self->long_name);
+ break;
default:
goto out;
@@ -1368,7 +1416,8 @@ struct map *map_groups__find_by_name(str
return NULL;
}
-static int dso__kernel_module_get_build_id(struct dso *self)
+static int dso__kernel_module_get_build_id(struct dso *self,
+ const char * root_dir)
{
char filename[PATH_MAX];
/*
@@ -1378,8 +1427,8 @@ static int dso__kernel_module_get_build_
const char *name = self->short_name + 1;
snprintf(filename, sizeof(filename),
- "/sys/module/%.*s/notes/.note.gnu.build-id",
- (int)strlen(name - 1), name);
+ "%s/sys/module/%.*s/notes/.note.gnu.build-id",
+ root_dir, (int)strlen(name) - 1, name);
if (sysfs__read_build_id(filename, self->build_id,
sizeof(self->build_id)) == 0)
@@ -1388,7 +1437,8 @@ static int dso__kernel_module_get_build_
return 0;
}
-static int map_groups__set_modules_path_dir(struct map_groups *self, char *dir_name)
+static int map_groups__set_modules_path_dir(struct map_groups *self,
+ const char *dir_name)
{
struct dirent *dent;
DIR *dir = opendir(dir_name);
@@ -1400,8 +1450,14 @@ static int map_groups__set_modules_path_
while ((dent = readdir(dir)) != NULL) {
char path[PATH_MAX];
+ struct stat st;
+
+ /* sshfs might return a bad dent->d_type, so we have to stat() */
+ snprintf(path, sizeof(path), "%s/%s", dir_name, dent->d_name);
+ if (stat(path, &st))
+ continue;
- if (dent->d_type == DT_DIR) {
+ if (S_ISDIR(st.st_mode)) {
if (!strcmp(dent->d_name, ".") ||
!strcmp(dent->d_name, ".."))
continue;
@@ -1433,7 +1489,7 @@ static int map_groups__set_modules_path_
if (long_name == NULL)
goto failure;
dso__set_long_name(map->dso, long_name);
- dso__kernel_module_get_build_id(map->dso);
+ dso__kernel_module_get_build_id(map->dso, "");
}
}
@@ -1443,16 +1499,46 @@ failure:
return -1;
}
-static int map_groups__set_modules_path(struct map_groups *self)
+static char * get_kernel_version(const char * root_dir)
{
- struct utsname uts;
+ char version[PATH_MAX];
+ FILE *file;
+ char *name, *tmp;
+ const char *prefix = "Linux version ";
+
+ snprintf(version, sizeof(version), "%s/proc/version", root_dir);
+ file = fopen(version, "r");
+ if (!file)
+ return NULL;
+
+ version[0] = '\0';
+ tmp = fgets(version, sizeof(version), file);
+ fclose(file);
+
+ name = strstr(version, prefix);
+ if (!name)
+ return NULL;
+ name += strlen(prefix);
+ tmp = strchr(name, ' ');
+ if (tmp)
+ *tmp = '\0';
+
+ return strdup(name);
+}
+
+static int map_groups__set_modules_path(struct map_groups *self,
+ const char * root_dir)
+{
+ char *version;
char modules_path[PATH_MAX];
- if (uname(&uts) < 0)
+ version = get_kernel_version(root_dir);
+ if (!version)
return -1;
- snprintf(modules_path, sizeof(modules_path), "/lib/modules/%s/kernel",
- uts.release);
+ snprintf(modules_path, sizeof(modules_path), "%s/lib/modules/%s/kernel",
+ root_dir, version);
+ free(version);
return map_groups__set_modules_path_dir(self, modules_path);
}
@@ -1477,11 +1563,13 @@ static struct map *map__new2(u64 start,
}
struct map *map_groups__new_module(struct map_groups *self, u64 start,
- const char *filename)
+ const char *filename,
+ struct kernel_info *kerninfo)
{
struct map *map;
- struct dso *dso = __dsos__findnew(&dsos__kernel, filename);
+ struct dso *dso;
+ dso = __dsos__findnew(&kerninfo->dsos__kernel, filename);
if (dso == NULL)
return NULL;
@@ -1489,21 +1577,37 @@ struct map *map_groups__new_module(struc
if (map == NULL)
return NULL;
- dso->origin = DSO__ORIG_KMODULE;
+ if (is_host_kernel(kerninfo))
+ dso->origin = DSO__ORIG_KMODULE;
+ else
+ dso->origin = DSO__ORIG_GUEST_KMODULE;
map_groups__insert(self, map);
return map;
}
-static int map_groups__create_modules(struct map_groups *self)
+static int map_groups__create_modules(struct kernel_info *kerninfo)
{
char *line = NULL;
size_t n;
- FILE *file = fopen("/proc/modules", "r");
+ FILE *file;
struct map *map;
+ const char * root_dir;
+ const char *modules;
+ char path[PATH_MAX];
+
+ if (is_default_guest(kerninfo))
+ modules = symbol_conf.default_guest_modules;
+ else {
+ snprintf(path, sizeof(path), "%s/proc/modules", kerninfo->root_dir);
+ modules = path;
+ }
+ file = fopen(modules, "r");
if (file == NULL)
return -1;
+ root_dir = kerninfo->root_dir;
+
while (!feof(file)) {
char name[PATH_MAX];
u64 start;
@@ -1532,16 +1636,17 @@ static int map_groups__create_modules(st
*sep = '\0';
snprintf(name, sizeof(name), "[%s]", line);
- map = map_groups__new_module(self, start, name);
+ map = map_groups__new_module(&kerninfo->kmaps,
+ start, name, kerninfo);
if (map == NULL)
goto out_delete_line;
- dso__kernel_module_get_build_id(map->dso);
+ dso__kernel_module_get_build_id(map->dso, root_dir);
}
free(line);
fclose(file);
- return map_groups__set_modules_path(self);
+ return map_groups__set_modules_path(&kerninfo->kmaps, root_dir);
out_delete_line:
free(line);
@@ -1708,8 +1813,54 @@ out_fixup:
return err;
}
-LIST_HEAD(dsos__user);
-LIST_HEAD(dsos__kernel);
+static int dso__load_guest_kernel_sym(struct dso *self, struct map *map,
+ symbol_filter_t filter)
+{
+ int err;
+ const char *kallsyms_filename = NULL;
+ struct kernel_info *kerninfo;
+ char path[PATH_MAX];
+
+ if (!map->groups) {
+ pr_debug("Guest kernel map has no pointer to its map groups\n");
+ return -1;
+ }
+ kerninfo = map->groups->this_kerninfo;
+
+ if (is_default_guest(kerninfo)) {
+ /*
+ * if the user specified a vmlinux filename, use it and only
+ * it, reporting errors to the user if it cannot be used.
+ * Otherwise use the guest kallsyms file given on the command line.
+ */
+ if (symbol_conf.default_guest_vmlinux_name != NULL) {
+ err = dso__load_vmlinux(self, map,
+ symbol_conf.default_guest_vmlinux_name, filter);
+ goto out_try_fixup;
+ }
+
+ kallsyms_filename = symbol_conf.default_guest_kallsyms;
+ if (!kallsyms_filename)
+ return -1;
+ } else {
+ snprintf(path, sizeof(path), "%s/proc/kallsyms", kerninfo->root_dir);
+ kallsyms_filename = path;
+ }
+
+ err = dso__load_kallsyms(self, kallsyms_filename, map, filter);
+ if (err > 0)
+ pr_debug("Using %s for symbols\n", kallsyms_filename);
+
+out_try_fixup:
+ if (err > 0) {
+ if (kallsyms_filename != NULL)
+ dso__set_long_name(self, strdup("[guest.kernel.kallsyms]"));
+ map__fixup_start(map);
+ map__fixup_end(map);
+ }
+
+ return err;
+}
static void dsos__add(struct list_head *head, struct dso *dso)
{
@@ -1752,10 +1903,16 @@ static void __dsos__fprintf(struct list_
}
}
-void dsos__fprintf(FILE *fp)
+void dsos__fprintf(struct rb_root *kerninfo_root, FILE *fp)
{
- __dsos__fprintf(&dsos__kernel, fp);
- __dsos__fprintf(&dsos__user, fp);
+ struct rb_node *nd;
+
+ for (nd = rb_first(kerninfo_root); nd; nd = rb_next(nd)) {
+ struct kernel_info *pos = rb_entry(nd, struct kernel_info,
+ rb_node);
+ __dsos__fprintf(&pos->dsos__kernel, fp);
+ __dsos__fprintf(&pos->dsos__user, fp);
+ }
}
static size_t __dsos__fprintf_buildid(struct list_head *head, FILE *fp,
@@ -1773,10 +1930,21 @@ static size_t __dsos__fprintf_buildid(st
return ret;
}
-size_t dsos__fprintf_buildid(FILE *fp, bool with_hits)
+size_t dsos__fprintf_buildid(struct rb_root *kerninfo_root,
+ FILE *fp, bool with_hits)
{
- return (__dsos__fprintf_buildid(&dsos__kernel, fp, with_hits) +
- __dsos__fprintf_buildid(&dsos__user, fp, with_hits));
+ struct rb_node *nd;
+ size_t ret = 0;
+
+ for (nd = rb_first(kerninfo_root); nd; nd = rb_next(nd)) {
+ struct kernel_info *pos = rb_entry(nd, struct kernel_info,
+ rb_node);
+ ret += __dsos__fprintf_buildid(&pos->dsos__kernel,
+ fp, with_hits);
+ ret += __dsos__fprintf_buildid(&pos->dsos__user,
+ fp, with_hits);
+ }
+ return ret;
}
struct dso *dso__new_kernel(const char *name)
@@ -1785,28 +1953,55 @@ struct dso *dso__new_kernel(const char *
if (self != NULL) {
dso__set_short_name(self, "[kernel]");
- self->kernel = 1;
+ self->kernel = DSO_TYPE_KERNEL;
+ }
+
+ return self;
+}
+
+struct dso *dso__new_guest_kernel(const char *name)
+{
+ struct dso *self = dso__new(name ?: "[guest.kernel.kallsyms]");
+
+ if (self != NULL) {
+ dso__set_short_name(self, "[guest.kernel]");
+ self->kernel = DSO_TYPE_GUEST_KERNEL;
}
return self;
}
-void dso__read_running_kernel_build_id(struct dso *self)
+void dso__read_running_kernel_build_id(struct dso *self,
+ struct kernel_info *kerninfo)
{
- if (sysfs__read_build_id("/sys/kernel/notes", self->build_id,
+ char path[PATH_MAX];
+
+ if (is_default_guest(kerninfo))
+ return;
+ snprintf(path, sizeof(path), "%s/sys/kernel/notes", kerninfo->root_dir);
+ if (sysfs__read_build_id(path, self->build_id,
sizeof(self->build_id)) == 0)
self->has_build_id = true;
}
-static struct dso *dsos__create_kernel(const char *vmlinux)
+static struct dso *dsos__create_kernel(struct kernel_info *kerninfo)
{
- struct dso *kernel = dso__new_kernel(vmlinux);
+ const char * vmlinux_name = NULL;
+ struct dso *kernel;
- if (kernel != NULL) {
- dso__read_running_kernel_build_id(kernel);
- dsos__add(&dsos__kernel, kernel);
+ if (is_host_kernel(kerninfo)) {
+ vmlinux_name = symbol_conf.vmlinux_name;
+ kernel = dso__new_kernel(vmlinux_name);
+ } else {
+ if (is_default_guest(kerninfo))
+ vmlinux_name = symbol_conf.default_guest_vmlinux_name;
+ kernel = dso__new_guest_kernel(vmlinux_name);
}
+ if (kernel != NULL) {
+ dso__read_running_kernel_build_id(kernel, kerninfo);
+ dsos__add(&kerninfo->dsos__kernel, kernel);
+ }
return kernel;
}
@@ -1950,23 +2145,29 @@ out_free_comm_list:
return -1;
}
-int map_groups__create_kernel_maps(struct map_groups *self,
- struct map *vmlinux_maps[MAP__NR_TYPES])
+int map_groups__create_kernel_maps(struct rb_root *kerninfo_root, pid_t pid)
{
- struct dso *kernel = dsos__create_kernel(symbol_conf.vmlinux_name);
+ struct kernel_info *kerninfo;
+ struct dso *kernel;
+ kerninfo = kerninfo__findnew(kerninfo_root, pid);
+ if (kerninfo == NULL)
+ return -1;
+ kernel = dsos__create_kernel(kerninfo);
if (kernel == NULL)
return -1;
- if (__map_groups__create_kernel_maps(self, vmlinux_maps, kernel) < 0)
+ if (__map_groups__create_kernel_maps(&kerninfo->kmaps,
+ kerninfo->vmlinux_maps, kernel) < 0)
return -1;
- if (symbol_conf.use_modules && map_groups__create_modules(self) < 0)
+ if (symbol_conf.use_modules &&
+ map_groups__create_modules(kerninfo) < 0)
pr_debug("Problems creating module maps, continuing anyway...\n");
/*
* Now that we have all the maps created, just set the ->end of them:
*/
- map_groups__fixup_end(self);
+ map_groups__fixup_end(&kerninfo->kmaps);
return 0;
}
@@ -2012,3 +2213,47 @@ char *strxfrchar(char *s, char from, cha
return s;
}
+
+int map_groups__create_guest_kernel_maps(struct rb_root *kerninfo_root)
+{
+ int ret = 0;
+ struct dirent **namelist = NULL;
+ int i, items = 0;
+ char path[PATH_MAX];
+ pid_t pid;
+
+ if (symbol_conf.default_guest_vmlinux_name ||
+ symbol_conf.default_guest_modules ||
+ symbol_conf.default_guest_kallsyms) {
+ map_groups__create_kernel_maps(kerninfo_root,
+ DEFAULT_GUEST_KERNEL_ID);
+ }
+
+ if (symbol_conf.guestmount) {
+ items = scandir(symbol_conf.guestmount, &namelist, NULL, NULL);
+ if (items <= 0)
+ return -ENOENT;
+ for (i = 0; i < items; i++) {
+ if (!isdigit(namelist[i]->d_name[0])) {
+ /* Filter out . and .. */
+ continue;
+ }
+ pid = atoi(namelist[i]->d_name);
+ snprintf(path, sizeof(path), "%s/%s/proc/kallsyms",
+ symbol_conf.guestmount,
+ namelist[i]->d_name);
+ ret = access(path, R_OK);
+ if (ret) {
+ pr_debug("Can't access file %s\n", path);
+ goto failure;
+ }
+ map_groups__create_kernel_maps(kerninfo_root,
+ pid);
+ }
+failure:
+ free(namelist);
+ }
+
+ return ret;
+}
+
diff -Nraup linux-2.6_tip0413/tools/perf/util/symbol.h linux-2.6_tip0413_perfkvm/tools/perf/util/symbol.h
--- linux-2.6_tip0413/tools/perf/util/symbol.h 2010-04-14 11:11:58.766255670 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/util/symbol.h 2010-04-14 11:13:17.321860837 +0800
@@ -69,10 +69,15 @@ struct symbol_conf {
show_nr_samples,
use_callchain,
exclude_other,
- full_paths;
+ full_paths,
+ show_cpu_utilization;
const char *vmlinux_name,
*field_sep;
- char *dso_list_str,
+ const char *default_guest_vmlinux_name,
+ *default_guest_kallsyms,
+ *default_guest_modules;
+ const char *guestmount;
+ char *dso_list_str,
*comm_list_str,
*sym_list_str,
*col_width_list_str;
@@ -106,6 +111,13 @@ struct addr_location {
u64 addr;
char level;
bool filtered;
+ unsigned int cpumode;
+};
+
+enum dso_kernel_type {
+ DSO_TYPE_USER = 0,
+ DSO_TYPE_KERNEL,
+ DSO_TYPE_GUEST_KERNEL
};
struct dso {
@@ -115,7 +127,7 @@ struct dso {
u8 adjust_symbols:1;
u8 slen_calculated:1;
u8 has_build_id:1;
- u8 kernel:1;
+ enum dso_kernel_type kernel;
u8 hit:1;
u8 annotate_warned:1;
unsigned char origin;
@@ -131,6 +143,7 @@ struct dso {
struct dso *dso__new(const char *name);
struct dso *dso__new_kernel(const char *name);
+struct dso *dso__new_guest_kernel(const char *name);
void dso__delete(struct dso *self);
bool dso__loaded(const struct dso *self, enum map_type type);
@@ -143,34 +156,30 @@ static inline void dso__set_loaded(struc
void dso__sort_by_name(struct dso *self, enum map_type type);
-extern struct list_head dsos__user, dsos__kernel;
-
struct dso *__dsos__findnew(struct list_head *head, const char *name);
-static inline struct dso *dsos__findnew(const char *name)
-{
- return __dsos__findnew(&dsos__user, name);
-}
-
int dso__load(struct dso *self, struct map *map, symbol_filter_t filter);
int dso__load_vmlinux_path(struct dso *self, struct map *map,
symbol_filter_t filter);
int dso__load_kallsyms(struct dso *self, const char *filename, struct map *map,
symbol_filter_t filter);
-void dsos__fprintf(FILE *fp);
-size_t dsos__fprintf_buildid(FILE *fp, bool with_hits);
+void dsos__fprintf(struct rb_root *kerninfo_root, FILE *fp);
+size_t dsos__fprintf_buildid(struct rb_root *kerninfo_root,
+ FILE *fp, bool with_hits);
size_t dso__fprintf_buildid(struct dso *self, FILE *fp);
size_t dso__fprintf(struct dso *self, enum map_type type, FILE *fp);
enum dso_origin {
DSO__ORIG_KERNEL = 0,
+ DSO__ORIG_GUEST_KERNEL,
DSO__ORIG_JAVA_JIT,
DSO__ORIG_BUILD_ID_CACHE,
DSO__ORIG_FEDORA,
DSO__ORIG_UBUNTU,
DSO__ORIG_BUILDID,
DSO__ORIG_DSO,
+ DSO__ORIG_GUEST_KMODULE,
DSO__ORIG_KMODULE,
DSO__ORIG_NOT_FOUND,
};
@@ -178,19 +187,26 @@ enum dso_origin {
char dso__symtab_origin(const struct dso *self);
void dso__set_long_name(struct dso *self, char *name);
void dso__set_build_id(struct dso *self, void *build_id);
-void dso__read_running_kernel_build_id(struct dso *self);
+void dso__read_running_kernel_build_id(struct dso *self,
+ struct kernel_info *kerninfo);
struct symbol *dso__find_symbol(struct dso *self, enum map_type type, u64 addr);
struct symbol *dso__find_symbol_by_name(struct dso *self, enum map_type type,
const char *name);
int filename__read_build_id(const char *filename, void *bf, size_t size);
int sysfs__read_build_id(const char *filename, void *bf, size_t size);
-bool dsos__read_build_ids(bool with_hits);
+bool __dsos__read_build_ids(struct list_head *head, bool with_hits);
int build_id__sprintf(const u8 *self, int len, char *bf);
int kallsyms__parse(const char *filename, void *arg,
int (*process_symbol)(void *arg, const char *name,
char type, u64 start));
+int __map_groups__create_kernel_maps(struct map_groups *self,
+ struct map *vmlinux_maps[MAP__NR_TYPES],
+ struct dso *kernel);
+int map_groups__create_kernel_maps(struct rb_root *kerninfo_root, pid_t pid);
+int map_groups__create_guest_kernel_maps(struct rb_root *kerninfo_root);
+
int symbol__init(void);
bool symbol_type__is_a(char symbol_type, enum map_type map_type);
diff -Nraup linux-2.6_tip0413/tools/perf/util/thread.h linux-2.6_tip0413_perfkvm/tools/perf/util/thread.h
--- linux-2.6_tip0413/tools/perf/util/thread.h 2010-04-14 11:11:58.594236160 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/util/thread.h 2010-04-14 11:13:17.321860837 +0800
@@ -33,12 +33,12 @@ static inline struct map *thread__find_m
void thread__find_addr_map(struct thread *self,
struct perf_session *session, u8 cpumode,
- enum map_type type, u64 addr,
+ enum map_type type, pid_t pid, u64 addr,
struct addr_location *al);
void thread__find_addr_location(struct thread *self,
struct perf_session *session, u8 cpumode,
- enum map_type type, u64 addr,
+ enum map_type type, pid_t pid, u64 addr,
struct addr_location *al,
symbol_filter_t filter);
#endif /* __PERF_THREAD_H */
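
The --guestmount handling in the symbol.c changes above can be modelled in isolation. What follows is a minimal user-space sketch, not code from the patch: the helper names guest_pid_from_dirname(), guest_path() and parse_kernel_release() are hypothetical. It assumes each guest root is mounted under a directory named after the guest's pid, as in the ChangeLog example. Unlike the patch, which checks only the first character and calls atoi(), this sketch rejects names with trailing non-digits and rejects 0, on the assumption that 0 stays reserved for DEFAULT_GUEST_KERNEL_ID.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: turn a guestmount directory entry such as "8888"
 * into a guest pid. Returns -1 for ".", ".." and anything that is not a
 * plain positive decimal number that fits in an int.
 */
static long guest_pid_from_dirname(const char *name)
{
	char *end;
	long pid;

	if (!isdigit((unsigned char)name[0]))
		return -1;
	errno = 0;
	pid = strtol(name, &end, 10);
	if (errno != 0 || *end != '\0' || pid <= 0 || pid > INT_MAX)
		return -1;
	return pid;
}

/*
 * Hypothetical helper: build "<guestmount>/<pid>/<suffix>" with a bounds
 * check. Returns 0 on success, -1 if the result does not fit in buf.
 */
static int guest_path(char *buf, size_t size, const char *guestmount,
		      long pid, const char *suffix)
{
	int n = snprintf(buf, size, "%s/%ld/%s", guestmount, pid, suffix);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/*
 * Hypothetical helper: extract the release token that follows
 * "Linux version " in a line of /proc/version text. Returns a malloc()ed
 * string the caller must free(), or NULL if the prefix is missing or
 * the allocation fails.
 */
static char *parse_kernel_release(const char *line)
{
	static const char prefix[] = "Linux version ";
	const char *start = strstr(line, prefix);
	size_t len;
	char *release;

	if (start == NULL)
		return NULL;
	start += sizeof(prefix) - 1;

	/* The release ends at the next space or newline, or at end of string. */
	len = strcspn(start, " \n");

	release = malloc(len + 1);
	if (release == NULL)
		return NULL;
	memcpy(release, start, len);
	release[len] = '\0';
	return release;
}
```

With these, a caller could map the entry "8888" to pid 8888, build the guest's proc/kallsyms path from it, and feed the first line of the guest's proc/version to parse_kernel_release() to find the matching lib/modules directory.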
--
Thanks for persisting despite the flames.
Can you please separate arch/x86/kvm part of the patch? That will make
for easier reviewing, and will need to go through separate trees.
Sheng, did you make any progress with the NMI injection issue?
> +
> diff -Nraup linux-2.6_tip0413/arch/x86/kvm/x86.c linux-2.6_tip0413_perfkvm/arch/x86/kvm/x86.c
> --- linux-2.6_tip0413/arch/x86/kvm/x86.c 2010-04-14 11:11:04.341042024 +0800
> +++ linux-2.6_tip0413_perfkvm/arch/x86/kvm/x86.c 2010-04-14 11:32:45.841278890 +0800
> @@ -3765,6 +3765,35 @@ static void kvm_timer_init(void)
> }
> }
>
> +static DEFINE_PER_CPU(struct kvm_vcpu *, current_vcpu);
> +
> +static int kvm_is_in_guest(void)
> +{
> + return percpu_read(current_vcpu) != NULL;
>
An even more accurate way to determine this is to check whether the
interrupt frame points back at the 'int $2' instruction. However we
plan to switch to a self-IPI method to inject the NMI, and I'm not sure
whether APIC NMIs are accepted on an instruction boundary or whether
there's some latency involved.
> +static unsigned long kvm_get_guest_ip(void)
> +{
> + unsigned long ip = 0;
> + if (percpu_read(current_vcpu))
> + ip = kvm_rip_read(percpu_read(current_vcpu));
> + return ip;
> +}
>
This may be racy. kvm_rip_read() accesses a cache in memory; if we're
in the process of updating the cache, then we may read a stale value.
See below.
>
> trace_kvm_entry(vcpu->vcpu_id);
> +
> + percpu_write(current_vcpu, vcpu);
> kvm_x86_ops->run(vcpu);
> + percpu_write(current_vcpu, NULL);
>
If you move this around the 'int $2' instructions you will close the
race, as a stray NMI won't catch us updating the rip cache. But that
depends on whether self-IPI is accepted on the next instruction or not.
--
error compiling committee.c: too many arguments to function
Yes, though some other work has interrupted me lately...
The very first version had an issue because, according to the SDM, SELF_IPI mode
can't be used to send an NMI. That's the reason why x2apic doesn't have a way to
do this.
But later I found another issue: it fails to inspect inside the guest. I think
it's because an NMI is an asynchronous event. Though it should be triggered very
quickly, you can't guarantee that the handler will be triggered before the
state (current_vcpu) is cleared with the current code.
Maybe just extending the "guest state" region would be fine, if the latency is
stable enough (though I think it may be platform dependent). I am working on this
now.
--
regards
Yang, Sheng
Yes, I see that now. Looks like others have the same questions...
> But later I found another issue: it fails to inspect inside the guest. I think
> it's because an NMI is an asynchronous event. Though it should be triggered very
> quickly, you can't guarantee that the handler will be triggered before the
> state (current_vcpu) is cleared with the current code.
>
> Maybe just extending the "guest state" region would be fine, if the latency is
> stable enough (though I think it may be platform dependent). I am working on this
> now.
>
I wouldn't like to depend on model specific behaviour.
One option is to read all the information synchronously and store it in
a per-cpu area with atomic instructions, then queue the NMI. Another
option is to have another callback which tells us that the NMI is done,
and have a busy loop wait until the NMI is delivered.
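To make the first option concrete, here is a minimal userspace sketch, not kernel code. It assumes a single CPU, so one static slot stands in for the per-cpu area, and it uses C11 atomics in place of the kernel's primitives. The names guest_snapshot, snapshot_publish(), snapshot_clear() and snapshot_read() are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the guest state a profiler would want. */
struct guest_snapshot {
	uint64_t ip;		/* guest instruction pointer at exit */
	bool user_mode;		/* true if the guest ran at CPL != 0 */
};

/* One slot standing in for a per-cpu variable (single CPU modelled). */
static struct guest_snapshot slot;
static atomic_bool slot_valid;

/* Exit path: fill the slot first, then publish it with a release store. */
void snapshot_publish(uint64_t ip, bool user_mode)
{
	slot.ip = ip;
	slot.user_mode = user_mode;
	atomic_store_explicit(&slot_valid, true, memory_order_release);
}

/* Exit path, after the NMI has been handled: retire the snapshot. */
void snapshot_clear(void)
{
	atomic_store_explicit(&slot_valid, false, memory_order_release);
}

/* Handler side: copy out the snapshot; false means a host-side sample. */
bool snapshot_read(struct guest_snapshot *out)
{
	assert(out != NULL);
	if (!atomic_load_explicit(&slot_valid, memory_order_acquire))
		return false;
	*out = slot;
	return true;
}
```

The point of the ordering is that the handler never looks at the live register cache: it only sees values that were completely written before the valid flag was set.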
--
error compiling committee.c: too many arguments to function
--
But I am still curious if we extend the region, how much it would help. Would
get a result soon...
--
regards
Yang, Sheng
The patch we're replying to adds callbacks (to read rip, etc.), so it's
no big deal. For the queue solution, a queue of size one would probably
be sufficient even if not guaranteed by the spec. I don't see how the
cpu can do another guest entry without delivering the NMI.
> But I am still curious if we extend the region, how much it would help. Would
> get a result soon...
>
Yes, interesting to see what the latency is. If it's reasonably short
(and I expect it will be so), we can do the busy wait solution.
If we have an NMI counter somewhere, we can simply wait until it changes.
--
error compiling committee.c: too many arguments to function
Okay, but kvm doesn't want to know about it. How about a new arch
function, invoke_nmi_sync(), that will trigger the NMI and wait for it?
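A rough userspace model of that idea, combining it with the NMI counter mentioned earlier, might look like the following. Only the name invoke_nmi_sync() comes from this thread; its signature, the nmi_count counter and the two stand-in helpers are assumptions made for illustration. The NMI is "delivered" by a plain function call here, so the loop runs zero times.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical counter bumped by the NMI handler each time it runs. */
static atomic_ulong nmi_count;

/* Stand-in for the real NMI handler. */
void counter_nmi_handler(void)
{
	atomic_fetch_add_explicit(&nmi_count, 1, memory_order_release);
}

/* Stand-in for whatever raises the NMI; here it is delivered at once. */
void counter_trigger_nmi(void)
{
	counter_nmi_handler();
}

/*
 * Trigger an NMI and spin until the handler has run.
 * Returns the number of spin iterations, to see how long the wait was.
 */
unsigned long invoke_nmi_sync(void (*trigger)(void))
{
	unsigned long before;
	unsigned long spins = 0;

	assert(trigger != NULL);
	before = atomic_load_explicit(&nmi_count, memory_order_acquire);
	trigger();
	while (atomic_load_explicit(&nmi_count, memory_order_acquire) == before)
		spins++;	/* a kernel version would call cpu_relax() here */
	return spins;
}
```

Returning the spin count is only there to measure the delivery latency being discussed; a real implementation would not need it.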
--
error compiling committee.c: too many arguments to function
> On 04/14/2010 12:05 PM, Zhang, Yanmin wrote:
> >Here is the new patch of V3 against tip/master of April 13th
> >if anyone wants to try it.
> >
>
> Thanks for persisting despite the flames.
>
> Can you please separate arch/x86/kvm part of the patch? That will make for
> easier reviewing, and will need to go through separate trees.
Once it gets into a state that it can be applied could you please create a
separate, -git based branch for it, so that i can pull it for testing and
integration with the tools/perf/ bits?
Assuming there are no serious conflicts with pending KVM work.
(or i can do that too)
Thanks,
Ingo
Sure.
> Assuming there are no serious conflicts with pending KVM work.
>
There will be a conflict with the NMI fix (which has to go in first,
we'll want to backport it), I'll put it on the same branch.
--
error compiling committee.c: too many arguments to function
--
>
> Sheng, did you make any progress with the NMI injection issue?
>
> > +
> > diff -Nraup linux-2.6_tip0413/arch/x86/kvm/x86.c linux-2.6_tip0413_perfkvm/arch/x86/kvm/x86.c
> > --- linux-2.6_tip0413/arch/x86/kvm/x86.c 2010-04-14 11:11:04.341042024 +0800
> > +++ linux-2.6_tip0413_perfkvm/arch/x86/kvm/x86.c 2010-04-14 11:32:45.841278890 +0800
> > @@ -3765,6 +3765,35 @@ static void kvm_timer_init(void)
> > }
> > }
> >
> > +static DEFINE_PER_CPU(struct kvm_vcpu *, current_vcpu);
> > +
> > +static int kvm_is_in_guest(void)
> > +{
> > + return percpu_read(current_vcpu) != NULL;
> >
>
> An even more accurate way to determine this is to check whether the
> interrupt frame points back at the 'int $2' instruction. However we
> plan to switch to a self-IPI method to inject the NMI, and I'm not sure
> whether APIC NMIs are accepted on an instruction boundary or whether
> there's some latency involved.
Yes. But the frame pointer checking seems a little complicated.
>
> > +static unsigned long kvm_get_guest_ip(void)
> > +{
> > + unsigned long ip = 0;
> > + if (percpu_read(current_vcpu))
> > + ip = kvm_rip_read(percpu_read(current_vcpu));
> > + return ip;
> > +}
> >
>
> This may be racy. kvm_rip_read() accesses a cache in memory; if we're
> in the process of updating the cache, then we may read a stale value.
> See below.
Right. The racy window seems too big.
>
> >
> > trace_kvm_entry(vcpu->vcpu_id);
> > +
> > + percpu_write(current_vcpu, vcpu);
> > kvm_x86_ops->run(vcpu);
> > + percpu_write(current_vcpu, NULL);
> >
>
> If you move this around the 'int $2' instructions you will close the
> race, as a stray NMI won't catch us updating the rip cache. But that
> depends on whether self-IPI is accepted on the next instruction or not.
Right. The kernel part depends on the self-IPI implementation.
I will move the above percpu_write(current_vcpu, vcpu) (or a new wrapper function)
to just around 'int $2'.
Sheng will find a solution for the self-IPI delivery. Let's treat my patch
and self-IPI as 2 separate issues, as we don't know when the self-IPI delivery
will be resolved.
Thanks,
Yanmin
An even bigger disadvantage is that it won't work with Sheng's patch,
self-NMIs are not synchronous.
>>> trace_kvm_entry(vcpu->vcpu_id);
>>> +
>>> + percpu_write(current_vcpu, vcpu);
>>> kvm_x86_ops->run(vcpu);
>>> + percpu_write(current_vcpu, NULL);
>>>
>>>
>> If you move this around the 'int $2' instructions you will close the
>> race, as a stray NMI won't catch us updating the rip cache. But that
>> depends on whether self-IPI is accepted on the next instruction or not.
>>
> Right. The kernel part has dependency on the self-IPI implementation.
> I will move above percpu_write(current_vcpu, vcpu) (or a new wrapper function)
> just around 'int $2'.
>
>
Or create a new function to inject the interrupt in x86.c. That will
reduce duplication between svm.c and vmx.c.
> Sheng would find a solution on the self-IPI delivery. Let's separate my patch
> and self-IPI as 2 issues as we don't know when the self-IPI delivery would be
> resolved.
>
Sure.
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
So, we'd need something like the following:
if (exit == NMI)
__get_cpu_var(nmi_vcpu) = vcpu;
stgi();
if (exit == NMI) {
while (!nmi_handled())
cpu_relax();
__get_cpu_var(nmi_vcpu) = NULL;
}
and no code sharing between vmx and svm.
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
--
> I checked svm.c and it seems svm.c doesn't trigger a NMI to host if the NMI
> happens in guest os. In addition, svm_complete_interrupts is called after
> interrupt is enabled.
Yes. The NMI is held pending by the hardware until the STGI instruction
is executed.
And for nested svm the svm_complete_interrupts function needs to be
executed after the nested exit handling. Therefore it is done late on
svm.
Joerg
Hmm, looks a bit complicated to me. The NMI should happen shortly after
the stgi instruction. Interrupts are still disabled so we stay on this
cpu. Can't we just set and erase the cpu_var at vcpu_load/vcpu_put time?
Joerg
That means an NMI that happens outside guest code (for example, in the
mmu, or during the exit itself) would be counted as if in guest code.
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
--
Hmm, true. The same is true for an NMI that happens between VMSAVE and
STGI but that window is smaller. Anyway, I think we don't need the
busy-wait loop. The NMI should be executed at a well defined point and
we set the cpu_var back to NULL after that point.
Joerg
The point is not well defined. Considering there are already at least
two implementations of svm, I don't want to rely on implementation details.
We could tune the position of the loop so that zero iterations are
executed on the implementations we know about.
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
--
After more investigation, I realized that I had misinterpreted the SDM.
Sorry.
There is *no* risk with the original method of calling "int $2".
According to the SDM 24.1:
> The following bullets detail when architectural state is and is not updated
in response to VM exits:
[...]
> - An NMI causes subsequent NMIs to be blocked, but only after the VM exit
completes.
So the truth is that after an NMI directly causes a VM exit, subsequent NMIs are
blocked until the next "iret" is encountered. So executing "int $2" in
vmx_complete_interrupts() is safe, with no risk of causing a nested NMI. And it
unblocks the following NMIs as well, because of the "iret" it executes.
So there is no need to make a change to avoid a "potential nested NMI".
Sorry for the mistake and for the confusion it caused.
--
regards
Yang, Sheng
>
> We could tune the position of the loop so that zero iterations are
> executed on the implementations we know about.
>
--
ChangeLog V4:
1) Based on Ingo's comments, I added help information for kvm,
such as command-list.txt and perf-kvm.txt.
2) Added the guest process id at the tail of the kernel dso long name, so
the display shows a different label for each guest os.
3) Based on Avi's comments, closed the race window in which an NMI that
did not happen in the guest os could be counted as a guest NMI.
4) Fixed all the errors and warnings reported by scripts/checkpatch.pl.
5) Fixed a compilation error pointed out by Yang Sheng.
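As an illustration of item 2, the per-guest label seen in the 'perf kvm top' output (for example "[guest.kernel.kallsyms.3067]") can be produced along these lines. This is a standalone sketch, not the code from the patch; the helper name guest_kallsyms_label() is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Build a per-guest DSO label such as "[guest.kernel.kallsyms.3067]".
 * Returns the snprintf() result: the length the full label needs.
 */
int guest_kallsyms_label(char *buf, size_t size, pid_t pid)
{
	assert(buf != NULL && size > 0);
	return snprintf(buf, size, "[guest.kernel.kallsyms.%d]", (int)pid);
}
```

With two guests running as pids 3067 and 3187, the same symbol then appears under two distinct DSO names, which is what makes the per-guest breakdown readable.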
From: Zhang, Yanmin <yanmin...@linux.intel.com>
---------------------------------------------------------------------------------------------------------------------------------------
PerfTop: 16024 irqs/sec kernel: 2.6% us: 0.6% guest kernel:76.2% guest us:20.6% exact: 0.0% [1000Hz cycles], (all, 16 CPUs)
---------------------------------------------------------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ________________________ _______________________
3740.00 8.0% __ticket_spin_lock [guest.kernel.kallsyms]
2056.00 4.4% copy_user_generic_string [guest.kernel.kallsyms]
1412.00 3.0% resource_string [guest.kernel.kallsyms]
595.00 1.3% __switch_to [guest.kernel.kallsyms]
586.00 1.2% __d_lookup [guest.kernel.kallsyms]
574.00 1.2% tcp_sendmsg [guest.kernel.kallsyms]
565.00 1.2% kmem_cache_alloc [guest.kernel.kallsyms]
532.00 1.1% tcp_ack [guest.kernel.kallsyms]
494.00 1.1% __kmalloc [guest.kernel.kallsyms]
468.00 1.0% print_cfs_rq [guest.kernel.kallsyms]
437.00 0.9% link_path_walk [guest.kernel.kallsyms]
380.00 0.8% balance_runtime [guest.kernel.kallsyms]
379.00 0.8% kmem_cache_free [guest.kernel.kallsyms]
377.00 0.8% in_gate_area_no_task [guest.kernel.kallsyms]
374.00 0.8% get_page_from_freelist [guest.kernel.kallsyms]
372.00 0.8% mark_files_ro [guest.kernel.kallsyms]
368.00 0.8% _atomic_dec_and_lock [guest.kernel.kallsyms]
356.00 0.8% crc16 [crc16]
353.00 0.8% put_page [guest.kernel.kallsyms]
PerfTop: 16014 irqs/sec kernel: 1.8% us: 0.0% guest kernel:75.5% guest us:22.7% exact: 0.0% [1000Hz cycles], (all, 16 CPUs)
---------------------------------------------------------------------------------------------------------------------------------------
samples pcnt function DSO
_______ _____ ________________________ ________________________________________________________________
16583.00 9.3% __ticket_spin_lock [guest.kernel.kallsyms.3067]
7178.00 4.0% copy_user_generic_string [guest.kernel.kallsyms.3067]
4637.00 2.6% copy_user_generic_string [guest.kernel.kallsyms.3187]
2495.00 1.4% schedule [guest.kernel.kallsyms.3187]
2322.00 1.3% tcp_sendmsg [guest.kernel.kallsyms.3187]
2255.00 1.3% __d_lookup [guest.kernel.kallsyms.3067]
1892.00 1.1% __switch_to [guest.kernel.kallsyms.3187]
1884.00 1.1% kmem_cache_alloc [guest.kernel.kallsyms.3067]
1809.00 1.0% tcp_ack [guest.kernel.kallsyms.3187]
1733.00 1.0% _atomic_dec_and_lock [guest.kernel.kallsyms.3067]
1707.00 1.0% tcp_transmit_skb [guest.kernel.kallsyms.3187]
1612.00 0.9% tcp_recvmsg [guest.kernel.kallsyms.3187]
1546.00 0.9% __kmalloc [guest.kernel.kallsyms.3067]
1538.00 0.9% __ticket_spin_lock [guest.kernel.kallsyms.3187]
1467.00 0.8% link_path_walk [guest.kernel.kallsyms.3067]
1403.00 0.8% path_get [guest.kernel.kallsyms.3067]
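For reference, the four percentages in the PerfTop header lines above (kernel, us, guest kernel, guest us) are each category's share of all samples. The builtin-top.c hunk in this patch computes each one with the same expression; the sketch below copies that arithmetic into a standalone helper. The name sample_share() is hypothetical.

```c
#include <assert.h>

/*
 * Percentage of samples in one category. The expression used in the patch,
 * 100.0 - 100.0 * ((total - part) / total), reduces to 100.0 * part / total.
 */
double sample_share(double part, double total)
{
	assert(total > 0.0);
	return 100.0 - 100.0 * ((total - part) / total);
}
```

For instance, 762 guest kernel samples out of 1000 total give a share of about 76.2%, matching the form of the first header line.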
Signed-off-by: Zhang Yanmin <yanmin...@linux.intel.com>
Joerg,
Would you like to add support for svm? I don't know the exact point at which
the NMI is triggered to the host with svm.
See the code below for vmx:
+ kvm_before_handle_nmi(&vmx->vcpu);
asm("int $2");
+ kvm_after_handle_nmi(&vmx->vcpu);
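To show what the bracket buys, here is a small userspace model of it. It assumes a single CPU, so a plain pointer stands in for the per-cpu current_vcpu, and the NMI is "delivered" by calling the handler directly instead of asm("int $2"). The model_* and sample_* names are hypothetical; only current_vcpu and the before/after idea come from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for struct kvm_vcpu: only the fields the model needs. */
struct model_vcpu {
	unsigned long rip;	/* guest instruction pointer */
	int cpl;		/* guest privilege level, 0 = kernel */
};

/* Single slot standing in for the per-cpu current_vcpu variable. */
static struct model_vcpu *current_vcpu;

int model_is_in_guest(void)
{
	return current_vcpu != NULL;
}

unsigned long model_get_guest_ip(void)
{
	return current_vcpu ? current_vcpu->rip : 0;
}

/* Like the patch, report user mode when no vcpu is set. */
int model_is_user_mode(void)
{
	return current_vcpu ? current_vcpu->cpl != 0 : 1;
}

/* Sample recorded by the model NMI handler. */
struct model_sample {
	int in_guest;
	unsigned long ip;
};
static struct model_sample last_sample;

void sample_nmi_handler(unsigned long host_ip)
{
	last_sample.in_guest = model_is_in_guest();
	last_sample.ip = last_sample.in_guest ? model_get_guest_ip() : host_ip;
}

/* Mirrors the vmx hunk: mark the vcpu, deliver the NMI, unmark. */
void model_forward_guest_nmi(struct model_vcpu *vcpu, unsigned long host_ip)
{
	assert(vcpu != NULL);
	current_vcpu = vcpu;		/* kvm_before_handle_nmi() */
	sample_nmi_handler(host_ip);	/* asm("int $2") in the patch */
	current_vcpu = NULL;		/* kvm_after_handle_nmi() */
}
```

An NMI that arrives anywhere outside the bracket sees current_vcpu == NULL and is attributed to the host, which is the behaviour asked for in the review.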
Signed-off-by: Zhang Yanmin <yanmin...@linux.intel.com>
---
diff -Nraup --exclude=tools linux-2.6_tip0413/arch/x86/include/asm/perf_event.h linux-2.6_tip0413_perfkvm/arch/x86/include/asm/perf_event.h
--- linux-2.6_tip0413/arch/x86/include/asm/perf_event.h 2010-04-14 11:11:03.992966568 +0800
+++ linux-2.6_tip0413_perfkvm/arch/x86/include/asm/perf_event.h 2010-04-14 11:13:17.261881591 +0800
@@ -135,17 +135,10 @@ extern void perf_events_lapic_init(void)
*/
#define PERF_EFLAGS_EXACT (1UL << 3)
-#define perf_misc_flags(regs) \
-({ int misc = 0; \
- if (user_mode(regs)) \
- misc |= PERF_RECORD_MISC_USER; \
- else \
- misc |= PERF_RECORD_MISC_KERNEL; \
- if (regs->flags & PERF_EFLAGS_EXACT) \
- misc |= PERF_RECORD_MISC_EXACT; \
- misc; })
-
-#define perf_instruction_pointer(regs) ((regs)->ip)
+struct pt_regs;
+extern unsigned long perf_instruction_pointer(struct pt_regs *regs);
+extern unsigned long perf_misc_flags(struct pt_regs *regs);
+#define perf_misc_flags(regs) perf_misc_flags(regs)
#else
static inline void init_hw_perf_events(void) { }
diff -Nraup --exclude=tools linux-2.6_tip0413/arch/x86/kernel/cpu/perf_event.c linux-2.6_tip0413_perfkvm/arch/x86/kernel/cpu/perf_event.c
--- linux-2.6_tip0413/arch/x86/kernel/cpu/perf_event.c 2010-04-14 11:11:04.825028810 +0800
+++ linux-2.6_tip0413_perfkvm/arch/x86/kernel/cpu/perf_event.c 2010-04-14 17:02:12.198063684 +0800
@@ -1720,6 +1720,11 @@ struct perf_callchain_entry *perf_callch
{
struct perf_callchain_entry *entry;
+ if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+ /* TODO: We don't support guest os callchain now */
+ return NULL;
+ }
+
if (in_nmi())
entry = &__get_cpu_var(pmc_nmi_entry);
else
@@ -1743,3 +1748,30 @@ void perf_arch_fetch_caller_regs(struct
regs->cs = __KERNEL_CS;
local_save_flags(regs->flags);
}
+
+unsigned long perf_instruction_pointer(struct pt_regs *regs)
+{
+ unsigned long ip;
+ if (perf_guest_cbs && perf_guest_cbs->is_in_guest())
+ ip = perf_guest_cbs->get_guest_ip();
+ else
+ ip = instruction_pointer(regs);
+ return ip;
+}
+
+unsigned long perf_misc_flags(struct pt_regs *regs)
+{
+ int misc = 0;
+ if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+ misc |= perf_guest_cbs->is_user_mode() ?
+ PERF_RECORD_MISC_GUEST_USER :
+ PERF_RECORD_MISC_GUEST_KERNEL;
+ } else
+ misc |= user_mode(regs) ? PERF_RECORD_MISC_USER :
+ PERF_RECORD_MISC_KERNEL;
+ if (regs->flags & PERF_EFLAGS_EXACT)
+ misc |= PERF_RECORD_MISC_EXACT;
+
+ return misc;
+}
+
diff -Nraup --exclude=tools linux-2.6_tip0413/arch/x86/kvm/vmx.c linux-2.6_tip0413_perfkvm/arch/x86/kvm/vmx.c
--- linux-2.6_tip0413/arch/x86/kvm/vmx.c 2010-04-14 11:11:04.353024541 +0800
+++ linux-2.6_tip0413_perfkvm/arch/x86/kvm/vmx.c 2010-04-15 10:28:39.516891050 +0800
@@ -3654,8 +3654,11 @@ static void vmx_complete_interrupts(stru
/* We need to handle NMIs before interrupts are enabled */
if ((exit_intr_info & INTR_INFO_INTR_TYPE_MASK) == INTR_TYPE_NMI_INTR &&
- (exit_intr_info & INTR_INFO_VALID_MASK))
+ (exit_intr_info & INTR_INFO_VALID_MASK)) {
+ kvm_before_handle_nmi(&vmx->vcpu);
asm("int $2");
+ kvm_after_handle_nmi(&vmx->vcpu);
+ }
idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
diff -Nraup --exclude=tools linux-2.6_tip0413/arch/x86/kvm/x86.c linux-2.6_tip0413_perfkvm/arch/x86/kvm/x86.c
--- linux-2.6_tip0413/arch/x86/kvm/x86.c 2010-04-14 11:11:04.341042024 +0800
+++ linux-2.6_tip0413_perfkvm/arch/x86/kvm/x86.c 2010-04-15 17:16:41.340064784 +0800
@@ -40,6 +40,7 @@
#include <linux/user-return-notifier.h>
#include <linux/srcu.h>
#include <linux/slab.h>
+#include <linux/perf_event.h>
#include <trace/events/kvm.h>
#undef TRACE_INCLUDE_FILE
#define CREATE_TRACE_POINTS
@@ -3765,6 +3766,47 @@ static void kvm_timer_init(void)
}
}
+static DEFINE_PER_CPU(struct kvm_vcpu *, current_vcpu);
+
+static int kvm_is_in_guest(void)
+{
+ return percpu_read(current_vcpu) != NULL;
+}
+
+static int kvm_is_user_mode(void)
+{
+ int user_mode = 3;
+ if (percpu_read(current_vcpu))
+ user_mode = kvm_x86_ops->get_cpl(percpu_read(current_vcpu));
+ return user_mode != 0;
+}
+
+static unsigned long kvm_get_guest_ip(void)
+{
+ unsigned long ip = 0;
+ if (percpu_read(current_vcpu))
+ ip = kvm_rip_read(percpu_read(current_vcpu));
+ return ip;
+}
+
+static struct perf_guest_info_callbacks kvm_guest_cbs = {
+ .is_in_guest = kvm_is_in_guest,
+ .is_user_mode = kvm_is_user_mode,
+ .get_guest_ip = kvm_get_guest_ip,
+};
+
+void kvm_before_handle_nmi(struct kvm_vcpu *vcpu)
+{
+ percpu_write(current_vcpu, vcpu);
+}
+EXPORT_SYMBOL_GPL(kvm_before_handle_nmi);
+
+void kvm_after_handle_nmi(struct kvm_vcpu *vcpu)
+{
+ percpu_write(current_vcpu, NULL);
+}
+EXPORT_SYMBOL_GPL(kvm_after_handle_nmi);
+
int kvm_arch_init(void *opaque)
{
int r;
@@ -3801,6 +3843,8 @@ int kvm_arch_init(void *opaque)
kvm_timer_init();
+ perf_register_guest_info_callbacks(&kvm_guest_cbs);
+
return 0;
out:
@@ -3809,6 +3853,8 @@ out:
void kvm_arch_exit(void)
{
+ perf_unregister_guest_info_callbacks(&kvm_guest_cbs);
+
if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
cpufreq_unregister_notifier(&kvmclock_cpufreq_notifier_block,
CPUFREQ_TRANSITION_NOTIFIER);
diff -Nraup --exclude=tools linux-2.6_tip0413/arch/x86/kvm/x86.h linux-2.6_tip0413_perfkvm/arch/x86/kvm/x86.h
--- linux-2.6_tip0413/arch/x86/kvm/x86.h 2010-04-14 11:11:04.328996790 +0800
+++ linux-2.6_tip0413_perfkvm/arch/x86/kvm/x86.h 2010-04-15 10:27:57.116972433 +0800
@@ -65,4 +65,7 @@ static inline int is_paging(struct kvm_v
return kvm_read_cr0_bits(vcpu, X86_CR0_PG);
}
+void kvm_before_handle_nmi(struct kvm_vcpu *vcpu);
+void kvm_after_handle_nmi(struct kvm_vcpu *vcpu);
+
#endif
diff -Nraup --exclude=tools linux-2.6_tip0413/include/linux/perf_event.h linux-2.6_tip0413_perfkvm/include/linux/perf_event.h
diff -Nraup --exclude=tools linux-2.6_tip0413/kernel/perf_event.c linux-2.6_tip0413_perfkvm/kernel/perf_event.c
Signed-off-by: Zhang Yanmin <yanmin...@linux.intel.com>
---
diff -Nraup linux-2.6_tip0413/tools/perf/builtin-annotate.c linux-2.6_tip0413_perfkvm/tools/perf/builtin-annotate.c
+++ linux-2.6_tip0413_perfkvm/tools/perf/builtin-kmem.c 2010-04-15 17:53:49.998951264 +0800
@@ -351,6 +351,7 @@ static void __print_result(struct rb_roo
int n_lines, int is_caller)
{
struct rb_node *next;
+ struct kernel_info *kerninfo;
printf("%.102s\n", graph_dotted_line);
printf(" %-34s |", is_caller ? "Callsite": "Alloc Ptr");
@@ -359,10 +360,16 @@ static void __print_result(struct rb_roo
next = rb_first(root);
+ kerninfo = kerninfo__findhost(&session->kerninfo_root);
+ if (!kerninfo) {
+ pr_err("__print_result: couldn't find kernel information\n");
+ return;
+ }
while (next && n_lines--) {
struct alloc_stat *data = rb_entry(next, struct alloc_stat,
node);
struct symbol *sym = NULL;
+ struct map_groups *kmaps = &kerninfo->kmaps;
struct map *map;
char buf[BUFSIZ];
u64 addr;
@@ -370,8 +377,8 @@ static void __print_result(struct rb_roo
if (is_caller) {
addr = data->call_site;
if (!raw_ip)
- sym = map_groups__find_function(&session->kmaps,
- addr, &map, NULL);
+ sym = map_groups__find_function(kmaps, addr,
+ &map, NULL);
} else
addr = data->ptr;
diff -Nraup linux-2.6_tip0413/tools/perf/builtin-kvm.c linux-2.6_tip0413_perfkvm/tools/perf/builtin-kvm.c
--- linux-2.6_tip0413/tools/perf/builtin-kvm.c 1970-01-01 08:00:00.000000000 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/builtin-kvm.c 2010-04-16 11:28:37.364630979 +0800
@@ -0,0 +1,145 @@
+#include "builtin.h"
+#include "perf.h"
+
+#include "util/util.h"
+#include "util/cache.h"
+#include "util/symbol.h"
+#include "util/thread.h"
+#include "util/header.h"
+#include "util/session.h"
+
+#include "util/parse-options.h"
+#include "util/trace-event.h"
+
+#include "util/debug.h"
+
+#include <sys/prctl.h>
+
+#include <semaphore.h>
+#include <pthread.h>
+#include <math.h>
+
+static char *file_name;
+static char name_buffer[256];
+
+int perf_host = 1;
+int perf_guest;
+
+static const char * const kvm_usage[] = {
+ "perf kvm [<options>] {top|record|report|diff|buildid-list}",
+ NULL
+};
+
+static const struct option kvm_options[] = {
+ OPT_STRING('i', "input", &file_name, "file",
+ "Input file name"),
+ OPT_STRING('o', "output", &file_name, "file",
+ "Output file name"),
+ OPT_BOOLEAN(0, "guest", &perf_guest,
+ "Collect guest os data"),
+ OPT_BOOLEAN(0, "host", &perf_host,
+ "Collect host os data"),
+ OPT_STRING(0, "guestmount", &symbol_conf.guestmount, "directory",
+ "guest mount directory under which every guest os"
+ " instance has a subdir"),
+ OPT_STRING(0, "guestvmlinux", &symbol_conf.default_guest_vmlinux_name,
+ "file", "file saving guest os vmlinux"),
+ OPT_STRING(0, "guestkallsyms", &symbol_conf.default_guest_kallsyms,
+ "file", "file saving guest os /proc/kallsyms"),
+ OPT_STRING(0, "guestmodules", &symbol_conf.default_guest_modules,
+ "file", "file saving guest os /proc/modules"),
+++ linux-2.6_tip0413_perfkvm/tools/perf/builtin-top.c 2010-04-16 15:56:59.695361174 +0800
@@ -501,10 +506,30 @@ static void print_sym_table(void)
puts(CONSOLE_CLEAR);
printf("%-*.*s\n", win_width, win_width, graph_dotted_line);
- printf( " PerfTop:%8.0f irqs/sec kernel:%4.1f%% exact: %4.1f%% [",
- samples_per_sec,
- 100.0 - (100.0*((samples_per_sec-ksamples_per_sec)/samples_per_sec)),
- esamples_percent);
+ if (!perf_guest) {
+ printf(" PerfTop:%8.0f irqs/sec kernel:%4.1f%%"
+ " exact: %4.1f%% [",
+ samples_per_sec,
+ 100.0 - (100.0 * ((samples_per_sec - ksamples_per_sec) /
+ samples_per_sec)),
+ esamples_percent);
+ } else {
+ printf(" PerfTop:%8.0f irqs/sec kernel:%4.1f%% us:%4.1f%%"
+ " guest kernel:%4.1f%% guest us:%4.1f%%"
+ " exact: %4.1f%% [",
+ samples_per_sec,
+ 100.0 - (100.0 * ((samples_per_sec-ksamples_per_sec) /
+ samples_per_sec)),
+ 100.0 - (100.0 * ((samples_per_sec-us_samples_per_sec) /
+ samples_per_sec)),
+ 100.0 - (100.0 * ((samples_per_sec -
+ guest_kernel_samples_per_sec) /
+ samples_per_sec)),
+ 100.0 - (100.0 * ((samples_per_sec -
+ guest_us_samples_per_sec) /
+ samples_per_sec)),
+ esamples_percent);
+ }
if (nr_counters == 1 || !display_weighted) {
printf("%Ld", (u64)attrs[0].sample_period);
@@ -597,7 +622,6 @@ static void print_sym_table(void)
syme = rb_entry(nd, struct sym_entry, rb_node);
sym = sym_entry__symbol(syme);
-
if (++printed > print_entries || (int)syme->snap_count < count_filter)
continue;
@@ -761,7 +785,7 @@ static int key_mapped(int c)
return 0;
}
-static void handle_keypress(int c)
+static void handle_keypress(struct perf_session *session, int c)
{
if (!key_mapped(c)) {
struct pollfd stdin_poll = { .fd = 0, .events = POLLIN };
@@ -830,7 +854,7 @@ static void handle_keypress(int c)
case 'Q':
printf("exiting.\n");
if (dump_symtab)
- dsos__fprintf(stderr);
+ dsos__fprintf(&session->kerninfo_root, stderr);
exit(0);
case 's':
prompt_symbol(&sym_filter_entry, "Enter details symbol");
@@ -866,6 +890,7 @@ static void *display_thread(void *arg __
struct pollfd stdin_poll = { .fd = 0, .events = POLLIN };
struct termios tc, save;
int delay_msecs, c;
+ struct perf_session *session = (struct perf_session *) arg;
tcgetattr(0, &save);
tc = save;
@@ -886,7 +911,7 @@ repeat:
c = getc(stdin);
tcsetattr(0, TCSAFLUSH, &save);
- handle_keypress(c);
+ handle_keypress(session, c);
goto repeat;
return NULL;
@@ -957,24 +982,46 @@ static void event__process_sample(const
+ * except simple counting.
+ */
+ return;
default:
return;
}
+ if (!kerninfo && perf_guest) {
+ pr_err("Can't find guest [%d]'s kernel information\n",
+ self->ip.pid);
+ return;
+ }
+
if (self->header.misc & PERF_RECORD_MISC_EXACT)
exact_samples++;
@@ -994,7 +1041,7 @@ static void event__process_sample(const
* --hide-kernel-symbols, even if the user specifies an
* invalid --vmlinux ;-)
*/
- if (al.map == session->vmlinux_maps[MAP__FUNCTION] &&
+ if (al.map == kerninfo->vmlinux_maps[MAP__FUNCTION] &&
RB_EMPTY_ROOT(&al.map->dso->symbols[MAP__FUNCTION])) {
pr_err("The %s file can't be used\n",
symbol_conf.vmlinux_name);
@@ -1261,7 +1308,7 @@ static int __cmd_top(void)
perf_session__mmap_read(session);
- if (pthread_create(&thread, NULL, display_thread, NULL)) {
+ if (pthread_create(&thread, NULL, display_thread, session)) {
printf("Could not create display thread.\n");
exit(-1);
}
diff -Nraup linux-2.6_tip0413/tools/perf/command-list.txt linux-2.6_tip0413_perfkvm/tools/perf/command-list.txt
--- linux-2.6_tip0413/tools/perf/command-list.txt 2010-04-14 11:11:58.414224251 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/command-list.txt 2010-04-14 18:09:51.984032138 +0800
@@ -19,3 +19,4 @@ perf-trace mainporcelain common
perf-probe mainporcelain common
perf-kmem mainporcelain common
perf-lock mainporcelain common
+perf-kvm mainporcelain common
diff -Nraup linux-2.6_tip0413/tools/perf/Documentation/perf-kvm.txt linux-2.6_tip0413_perfkvm/tools/perf/Documentation/perf-kvm.txt
--- linux-2.6_tip0413/tools/perf/Documentation/perf-kvm.txt 1970-01-01 08:00:00.000000000 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/Documentation/perf-kvm.txt 2010-04-16 11:30:26.451107100 +0800
@@ -0,0 +1,67 @@
+perf-kvm(1)
+==============
+
+NAME
+----
+perf-kvm - Tool to trace/measure kvm guest os
+
+SYNOPSIS
+--------
+[verse]
+'perf kvm' [--host] [--guest] [--guestmount=<path>
+ [--guestkallsyms=<path> --guestmodules=<path> | --guestvmlinux=<path>]]
+ {top|record|report|diff|buildid-list}
+'perf kvm' [--host] [--guest] [--guestkallsyms=<path> --guestmodules=<path>
+ | --guestvmlinux=<path>] {top|record|report|diff|buildid-list}
+
+DESCRIPTION
+-----------
+There are a couple of variants of perf kvm:
+
+ 'perf kvm [options] top <command>' to generate and display
+ a performance counter profile of guest os in realtime
+ of an arbitrary workload.
+
+ 'perf kvm record <command>' to record the performance counter profile
+ of an arbitrary workload and save it into a perf data file. If both
+ --host and --guest are input, the perf data file name is perf.data.kvm.
+ If there is no --host but --guest, the file name is perf.data.guest.
+ If there is no --guest but --host, the file name is perf.data.host.
+
+ 'perf kvm report' to display the performance counter profile information
+ recorded via perf kvm record.
+
+ 'perf kvm diff' to display the performance difference between two perf.data
+ files captured via perf record.
+
+ 'perf kvm buildid-list' to display the buildids found in a perf data file,
+ so that other tools can be used to fetch packages with matching symbol tables
+ for use by perf report.
+
+OPTIONS
+-------
+--host=::
+ Collect host side performance profile.
+--guest=::
+ Collect guest side performance profile.
+--guestmount=<path>::
+ Guest os root file system mount directory. Users mount guest os
+ root directories under <path> by a specific filesystem access method,
+ typically sshfs. For example, start 2 guest os instances. One's pid is 8888
+ and the other's is 9999.
+ #mkdir ~/guestmount; cd ~/guestmount
+ #sshfs -o allow_other,direct_io -p 5551 localhost:/ 8888/
+ #sshfs -o allow_other,direct_io -p 5552 localhost:/ 9999/
+ #perf kvm --host --guest --guestmount=~/guestmount top
+--guestkallsyms=<path>::
+ Guest os /proc/kallsyms file copy. 'perf kvm' reads it to get guest
+ kernel symbols. Users copy it out from guest os.
+--guestmodules=<path>::
+ Guest os /proc/modules file copy. 'perf kvm' reads it to get guest
+ kernel module information. Users copy it out from guest os.
+--guestvmlinux=<path>::
+ Guest os kernel vmlinux.
+
+SEE ALSO
+--------
+linkperf:perf-top[1] perf-record[1] perf-report[1] perf-diff[1] perf-buildid-list[1]
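The file naming rule that perf-kvm.txt gives for 'perf kvm record' can be summarised as a small lookup. This is a sketch of the documented rule only, not the logic in builtin-kvm.c; the helper name kvm_default_data_file() is hypothetical, and returning NULL when neither flag is set is an assumption, since the documentation does not cover that case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Default perf data file name chosen from the --host/--guest flags. */
const char *kvm_default_data_file(int host, int guest)
{
	if (host && guest)
		return "perf.data.kvm";
	if (guest)
		return "perf.data.guest";
	if (host)
		return "perf.data.host";
	return NULL;	/* neither flag set: not covered by the documentation */
}
```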
diff -Nraup linux-2.6_tip0413/tools/perf/Makefile linux-2.6_tip0413_perfkvm/tools/perf/Makefile
--- linux-2.6_tip0413/tools/perf/Makefile 2010-04-14 11:11:58.802281816 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/Makefile 2010-04-16 14:47:01.649542605 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/util/event.c 2010-04-16 15:50:08.477896519 +0800
@@ -112,7 +112,11 @@ static int event__synthesize_mmap_events
event_t ev = {
.header = {
.type = PERF_RECORD_MMAP,
- .misc = 0, /* Just like the kernel, see kernel/perf_event.c __perf_event_mmap */
+ /*
+ * Just like the kernel, see __perf_event_mmap
+ * in kernel/perf_event.c
+ */
+ .misc = PERF_RECORD_MISC_USER,
},
};
int n;
@@ -167,11 +171,23 @@ static int event__synthesize_mmap_events
}
int event__synthesize_modules(event__handler_t process,
- struct perf_session *session)
+ struct perf_session *session,
+ struct kernel_info *kerninfo)
{
struct rb_node *nd;
+ struct map_groups *kmaps = &kerninfo->kmaps;
+ u16 misc;
- for (nd = rb_first(&session->kmaps.maps[MAP__FUNCTION]);
+ /*
+ * kernel uses 0 for user space maps, see kernel/perf_event.c
+ * __perf_event_mmap
+ */
+ if (is_host_kernel(kerninfo))
+ misc = PERF_RECORD_MISC_KERNEL;
+ else
+ misc = PERF_RECORD_MISC_GUEST_KERNEL;
+
+ for (nd = rb_first(&kmaps->maps[MAP__FUNCTION]);
nd; nd = rb_next(nd)) {
event_t ev;
size_t size;
@@ -182,12 +198,13 @@ int event__synthesize_modules(event__han
size = ALIGN(pos->dso->long_name_len + 1, sizeof(u64));
memset(&ev, 0, sizeof(ev));
- ev.mmap.header.misc = 1; /* kernel uses 0 for user space maps, see kernel/perf_event.c __perf_event_mmap */
+ ev.mmap.header.misc = misc;
ev.mmap.header.type = PERF_RECORD_MMAP;
ev.mmap.header.size = (sizeof(ev.mmap) -
(sizeof(ev.mmap.filename) - size));
ev.mmap.start = pos->start;
ev.mmap.len = pos->end - pos->start;
+ ev.mmap.pid = kerninfo->pid;
memcpy(ev.mmap.filename, pos->dso->long_name,
pos->dso->long_name_len + 1);
@@ -250,13 +267,18 @@ static int find_symbol_cb(void *arg, con
int event__synthesize_kernel_mmap(event__handler_t process,
struct perf_session *session,
+ struct kernel_info *kerninfo,
const char *symbol_name)
{
size_t size;
+ const char *filename, *mmap_name;
+ char path[PATH_MAX];
+ char name_buff[PATH_MAX];
+ struct map *map;
+
event_t ev = {
.header = {
.type = PERF_RECORD_MMAP,
- .misc = 1, /* kernel uses 0 for user space maps, see kernel/perf_event.c __perf_event_mmap */
},
};
/*
@@ -266,16 +288,37 @@ int event__synthesize_kernel_mmap(event_
*/
struct process_symbol_args args = { .name = symbol_name, };
- if (kallsyms__parse("/proc/kallsyms", &args, find_symbol_cb) <= 0)
+ mmap_name = kern_mmap_name(kerninfo, name_buff);
+ if (is_host_kernel(kerninfo)) {
+ /*
+ * kernel uses PERF_RECORD_MISC_USER for user space maps,
+ * see kernel/perf_event.c __perf_event_mmap
+ */
+ ev.header.misc = PERF_RECORD_MISC_KERNEL;
+ filename = "/proc/kallsyms";
+ } else {
+ ev.header.misc = PERF_RECORD_MISC_GUEST_KERNEL;
+ if (is_default_guest(kerninfo))
+ filename = (char *) symbol_conf.default_guest_kallsyms;
+ else {
+ sprintf(path, "%s/proc/kallsyms", kerninfo->root_dir);
+ filename = path;
+ }
+ }
+
+ if (kallsyms__parse(filename, &args, find_symbol_cb) <= 0)
return -ENOENT;
+ map = kerninfo->vmlinux_maps[MAP__FUNCTION];
size = snprintf(ev.mmap.filename, sizeof(ev.mmap.filename),
- "[kernel.kallsyms.%s]", symbol_name) + 1;
+ "[%s]%s", mmap_name, symbol_name) + 1;
size = ALIGN(size, sizeof(u64));
- ev.mmap.header.size = (sizeof(ev.mmap) - (sizeof(ev.mmap.filename) - size));
+ ev.mmap.header.size = (sizeof(ev.mmap) -
+ (sizeof(ev.mmap.filename) - size));
ev.mmap.pgoff = args.start;
- ev.mmap.start = session->vmlinux_maps[MAP__FUNCTION]->start;
- ev.mmap.len = session->vmlinux_maps[MAP__FUNCTION]->end - ev.mmap.start ;
+ ev.mmap.start = map->start;
+ ev.mmap.len = map->end - ev.mmap.start;
+ ev.mmap.pid = kerninfo->pid;
return process(&ev, session);
}
@@ -329,82 +372,130 @@ int event__process_lost(event_t *self, s
return 0;
}
-int event__process_mmap(event_t *self, struct perf_session *session)
+static void event_set_kernel_mmap_len(struct map **maps, event_t *self)
+{
+ maps[MAP__FUNCTION]->start = self->mmap.start;
+ maps[MAP__FUNCTION]->end = self->mmap.start + self->mmap.len;
+ /*
+ * Be a bit paranoid here, some perf.data file came with
+ * a zero sized synthesized MMAP event for the kernel.
+ */
+ if (maps[MAP__FUNCTION]->end == 0)
+ maps[MAP__FUNCTION]->end = ~0UL;
+}
+
+static int event__process_kernel_mmap(event_t *self,
+ struct perf_session *session)
{
- struct thread *thread;
struct map *map;
+ char kmmap_prefix[PATH_MAX];
+ struct kernel_info *kerninfo;
+ enum dso_kernel_type kernel_type;
+
+ kerninfo = kerninfo__findnew(&session->kerninfo_root, self->mmap.pid);
+ if (!kerninfo) {
+ pr_err("Can't find id %d's kerninfo\n", self->mmap.pid);
+ goto out_problem;
+ }
- dump_printf(" %d/%d: [%#Lx(%#Lx) @ %#Lx]: %s\n",
- self->mmap.pid, self->mmap.tid, self->mmap.start,
- self->mmap.len, self->mmap.pgoff, self->mmap.filename);
+ kern_mmap_name_parenthese(kerninfo, kmmap_prefix);
+ if (is_host_kernel(kerninfo))
+ kernel_type = DSO_TYPE_KERNEL;
+ else
+ kernel_type = DSO_TYPE_GUEST_KERNEL;
- if (self->mmap.pid == 0) {
- static const char kmmap_prefix[] = "[kernel.kallsyms.";
+ if (self->mmap.filename[0] == '/') {
- sizeof(kmmap_prefix) - 1) == 0) {
- const char *symbol_name = (self->mmap.filename +
- sizeof(kmmap_prefix) - 1);
- /*
- * Should be there already, from the build-id table in
- * the header.
- */
- struct dso *kernel = __dsos__findnew(&dsos__kernel,
- "[kernel.kallsyms]");
- if (kernel == NULL)
- goto out_problem;
-
- kernel->kernel = 1;
- if (__perf_session__create_kernel_maps(session, kernel) < 0)
- goto out_problem;
+ char short_module_name[1024];
+ char *name = strrchr(self->mmap.filename, '/'), *dot;
- session->vmlinux_maps[MAP__FUNCTION]->start = self->mmap.start;
- session->vmlinux_maps[MAP__FUNCTION]->end = self->mmap.start + self->mmap.len;
+ strlen(kmmap_prefix)) == 0) {
+ const char *symbol_name = (self->mmap.filename +
+ strlen(kmmap_prefix));
+ /*
+ * Should be there already, from the build-id table in
+ * the header.
+ */
+ struct dso *kernel = __dsos__findnew(&kerninfo->dsos__kernel,
+ kmmap_prefix);
+ if (kernel == NULL)
+ goto out_problem;
+
+ kernel->kernel = kernel_type;
+ if (__map_groups__create_kernel_maps(&kerninfo->kmaps,
+ kerninfo->vmlinux_maps, kernel) < 0)
+ goto out_problem;
+
+ event_set_kernel_mmap_len(kerninfo->vmlinux_maps, self);
+ perf_session__set_kallsyms_ref_reloc_sym(kerninfo->vmlinux_maps,
+ symbol_name,
+ self->mmap.pgoff);
+ if (is_default_guest(kerninfo)) {
/*
- * Be a bit paranoid here, some perf.data file came with
- * a zero sized synthesized MMAP event for the kernel.
+ * preload dso of guest kernel and modules
*/
- if (session->vmlinux_maps[MAP__FUNCTION]->end == 0)
- session->vmlinux_maps[MAP__FUNCTION]->end = ~0UL;
-
- perf_session__set_kallsyms_ref_reloc_sym(session, symbol_name,
- self->mmap.pgoff);
+ dso__load(kernel,
+ kerninfo->vmlinux_maps[MAP__FUNCTION],
+ NULL);
}
+ }
+ return 0;
+out_problem:
+ return -1;
+}
+
+int event__process_mmap(event_t *self, struct perf_session *session)
+{
+ struct kernel_info *kerninfo;
+ struct thread *thread;
+ struct map *map;
+ u8 cpumode = self->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
+ int ret = 0;
+
+ dump_printf(" %d/%d: [%#Lx(%#Lx) @ %#Lx]: %s\n",
+ self->mmap.pid, self->mmap.tid, self->mmap.start,
+ self->mmap.len, self->mmap.pgoff, self->mmap.filename);
+
+ if (cpumode == PERF_RECORD_MISC_GUEST_KERNEL ||
+ cpumode == PERF_RECORD_MISC_KERNEL) {
+ ret = event__process_kernel_mmap(self, session);
+ if (ret < 0)
+ goto out_problem;
return 0;
}
thread = perf_session__findnew(session, self->mmap.pid);
- map = map__new(self->mmap.start, self->mmap.len, self->mmap.pgoff,
- self->mmap.pid, self->mmap.filename, MAP__FUNCTION,
- session->cwd, session->cwdlen);
+ kerninfo = kerninfo__findhost(&session->kerninfo_root);
+ map = map__new(&kerninfo->dsos__user, self->mmap.start,
+ self->mmap.len, self->mmap.pgoff,
+ self->mmap.pid, self->mmap.filename,
+ MAP__FUNCTION, session->cwd, session->cwdlen);
if (thread == NULL || map == NULL)
goto out_problem;
@@ -444,22 +535,52 @@ int event__process_task(event_t *self, s
@@ -474,8 +595,11 @@ try_again:
* "[vdso]" dso, but for now lets use the old trick of looking
* in the whole kernel symbol list.
*/
- if ((long long)al->addr < 0 && mg != &session->kmaps) {
- mg = &session->kmaps;
+ if ((long long)al->addr < 0 &&
+ cpumode == PERF_RECORD_MISC_KERNEL &&
+ kerninfo &&
+ mg != &kerninfo->kmaps) {
+ mg = &kerninfo->kmaps;
goto try_again;
}
} else
@@ -484,11 +608,11 @@ try_again:
void thread__find_addr_location(struct thread *self,
struct perf_session *session, u8 cpumode,
- enum map_type type, u64 addr,
+ enum map_type type, pid_t pid, u64 addr,
struct addr_location *al,
symbol_filter_t filter)
{
- thread__find_addr_map(self, session, cpumode, type, addr, al);
+ thread__find_addr_map(self, session, cpumode, type, pid, addr, al);
if (al->map != NULL)
al->sym = map__find_symbol(al->map, al->addr, filter);
else
@@ -524,7 +648,7 @@ int event__preprocess_sample(const event
dump_printf(" ... thread: %s:%d\n", thread->comm, thread->pid);
thread__find_addr_map(thread, session, cpumode, MAP__FUNCTION,
- self->ip.ip, al);
+ self->ip.pid, self->ip.ip, al);
dump_printf(" ...... dso: %s\n",
al->map ? al->map->dso->long_name :
al->level == 'H' ? "[hypervisor]" : "<not found>");
@@ -554,7 +678,6 @@ int event__preprocess_sample(const event
+++ linux-2.6_tip0413_perfkvm/tools/perf/util/header.c 2010-04-15 18:07:29.010855524 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/util/hist.c 2010-04-15 18:08:28.379944370 +0800
@@ -487,6 +519,26 @@ int hist_entry__snprintf(struct hist_ent
else
ret = snprintf(s, size, sep ? "%.2f" : " %6.2f%%",
(count * 100.0) / total);
+ if (symbol_conf.show_cpu_utilization) {
+ ret += percent_color_snprintf(s + ret, size - ret,
+ sep ? "%.2f" : " %6.2f%%",
+ (count_sys * 100.0) / total);
+ ret += percent_color_snprintf(s + ret, size - ret,
+ sep ? "%.2f" : " %6.2f%%",
+ (count_us * 100.0) / total);
+ if (perf_guest) {
+ ret += percent_color_snprintf(s + ret,
+ size - ret,
+ sep ? "%.2f" : " %6.2f%%",
+ (count_guest_sys * 100.0) /
+ total);
+ ret += percent_color_snprintf(s + ret,
+ size - ret,
+ sep ? "%.2f" : " %6.2f%%",
+ (count_guest_us * 100.0) /
+ total);
+ }
+ }
} else
ret = snprintf(s, size, sep ? "%lld" : "%12lld ", count);
@@ -597,6 +649,24 @@ size_t perf_session__fprintf_hists(struc
+++ linux-2.6_tip0413_perfkvm/tools/perf/util/map.c 2010-04-16 12:29:12.013545441 +0800
@@ -508,3 +512,144 @@ struct map *maps__find(struct rb_root *m
return NULL;
}
+
+struct kernel_info *add_new_kernel_info(struct rb_root *kerninfo_root,
+ pid_t pid, const char *root_dir)
+{
+ struct rb_node **p = &kerninfo_root->rb_node;
+ struct rb_node *parent = NULL;
+ struct kernel_info *kerninfo, *pos;
+
+ kerninfo = malloc(sizeof(struct kernel_info));
+ if (!kerninfo)
+ return NULL;
+
+ return NULL;
+}
+
+char *kern_mmap_name(struct kernel_info *kerninfo, char *buff)
+{
+ if (is_host_kernel(kerninfo))
+ sprintf(buff, "%s", "kernel.kallsyms");
+ else if (is_default_guest(kerninfo))
+ sprintf(buff, "%s", "guest.kernel.kallsyms");
+ else
+ sprintf(buff, "%s.%d", "guest.kernel.kallsyms", kerninfo->pid);
+
+ return buff;
+}
+
+char *kern_mmap_name_parenthese(struct kernel_info *kerninfo, char *buff)
+{
+ char name[PATH_MAX];
+
+ kern_mmap_name(kerninfo, name);
+ sprintf(buff, "[%s]", name);
+ return buff;
+}
+
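A note for readers following the naming scheme in kern_mmap_name() above. The stand-alone sketch below reproduces it outside of perf; it is an illustration, not the patch's code. The two id constants and every sketch_* name are assumptions, because the quoted hunks do not show how is_host_kernel() and is_default_guest() are defined. snprintf() is used here instead of sprintf() so that the caller's buffer size is respected.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical ids: the quoted hunks do not show how the patch tells
 * the host kernel and the default guest apart.
 */
#define SKETCH_HOST_ID		(-1)
#define SKETCH_DEFAULT_GUEST_ID	0

/* Write the unbracketed mmap name for the kernel identified by pid. */
static char *sketch_kern_mmap_name(pid_t pid, char *buf, size_t len)
{
	if (pid == SKETCH_HOST_ID)
		snprintf(buf, len, "%s", "kernel.kallsyms");
	else if (pid == SKETCH_DEFAULT_GUEST_ID)
		snprintf(buf, len, "%s", "guest.kernel.kallsyms");
	else
		snprintf(buf, len, "%s.%d", "guest.kernel.kallsyms", (int)pid);
	return buf;
}

/* Same name wrapped in brackets; returns NULL if buf is too small. */
static char *sketch_kern_mmap_name_bracketed(pid_t pid, char *buf, size_t len)
{
	char name[64];

	sketch_kern_mmap_name(pid, name, sizeof(name));
	if (strlen(name) + 3 > len)	/* '[' + name + ']' + NUL */
		return NULL;
	snprintf(buf, len, "[%s]", name);
	return buf;
}
```

With these assumed ids, a guest whose qemu pid is 8888 would get the bracketed name "[guest.kernel.kallsyms.8888]", which keeps the maps of several guests apart in one perf.data file.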
diff -Nraup linux-2.6_tip0413/tools/perf/util/map.h linux-2.6_tip0413_perfkvm/tools/perf/util/map.h
--- linux-2.6_tip0413/tools/perf/util/map.h 2010-04-14 11:11:58.686216105 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/util/map.h 2010-04-16 12:39:20.253178208 +0800
@@ -106,9 +124,41 @@ int map_groups__clone(struct map_groups
size_t map_groups__fprintf(struct map_groups *self, int verbose, FILE *fp);
size_t map_groups__fprintf_maps(struct map_groups *self, int verbose, FILE *fp);
+struct kernel_info *add_new_kernel_info(struct rb_root *kerninfo_root,
+ pid_t pid, const char *root_dir);
+struct kernel_info *kerninfo__find(struct rb_root *kerninfo_root, pid_t pid);
+struct kernel_info *kerninfo__findnew(struct rb_root *kerninfo_root, pid_t pid);
+struct kernel_info *kerninfo__findhost(struct rb_root *kerninfo_root);
+char *kern_mmap_name(struct kernel_info *kerninfo, char *buff);
+char *kern_mmap_name_parenthese(struct kernel_info *kerninfo, char *buff);
@@ -148,13 +198,11 @@ int map_groups__fixup_overlappings(struc
+++ linux-2.6_tip0413_perfkvm/tools/perf/util/session.c 2010-04-15 18:15:17.650831879 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/util/session.h 2010-04-15 18:15:31.480436185 +0800
+++ linux-2.6_tip0413_perfkvm/tools/perf/util/symbol.c 2010-04-16 15:58:25.355509333 +0800
@@ -483,6 +484,7 @@ static int dso__split_kallsyms(struct ds
symbol_filter_t filter)
{
struct map_groups *kmaps = map__kmap(map)->kmaps;
+ struct kernel_info *kerninfo = kmaps->this_kerninfo;
struct map *curr_map = map;
struct symbol *pos;
int count = 0;
@@ -504,15 +506,33 @@ static int dso__split_kallsyms(struct ds
*module++ = '\0';
if (strcmp(curr_map->dso->short_name, module)) {
- curr_map = map_groups__find_by_name(kmaps, map->type, module);
+ if (curr_map != map &&
+ self->kernel == DSO_TYPE_GUEST_KERNEL &&
+ is_default_guest(kerninfo)) {
+ /*
+ * We assume all symbols of a module are
+ * continuous in kallsyms, so curr_map
+ * points to a module and all its
+ * symbols are in its kmap. Mark it as
+ * loaded.
+ */
+ dso__set_loaded(curr_map->dso,
+ curr_map->type);
+ }
+
+ curr_map = map_groups__find_by_name(kmaps,
+ map->type, module);
if (curr_map == NULL) {
- pr_debug("/proc/{kallsyms,modules} "
+ pr_err("%s/proc/{kallsyms,modules} "
"inconsistency while looking "
- "for \"%s\" module!\n", module);
- return -1;
+ "for \"%s\" module!\n",
+ kerninfo->root_dir, module);
+ curr_map = map;
+ goto discard_symbol;
}
- if (curr_map->dso->loaded)
+ if (curr_map->dso->loaded &&
+ !is_default_guest(kmaps->this_kerninfo))
goto discard_symbol;
}
/*
@@ -525,13 +545,21 @@ static int dso__split_kallsyms(struct ds
char dso_name[PATH_MAX];
struct dso *dso;
- snprintf(dso_name, sizeof(dso_name), "[kernel].%d",
- kernel_range++);
+ if (self->kernel == DSO_TYPE_GUEST_KERNEL)
+ snprintf(dso_name, sizeof(dso_name),
+ "[guest.kernel].%d",
+ kernel_range++);
+ else
+ snprintf(dso_name, sizeof(dso_name),
+ "[kernel].%d",
+ kernel_range++);
dso = dso__new(dso_name);
if (dso == NULL)
return -1;
+ dso->kernel = self->kernel;
+
curr_map = map__new2(pos->start, dso, map->type);
if (curr_map == NULL) {
dso__delete(dso);
@@ -555,6 +583,12 @@ discard_symbol: rb_erase(&pos->rb_node,
}
}
+ if (curr_map != map &&
+ self->kernel == DSO_TYPE_GUEST_KERNEL &&
+ is_default_guest(kmaps->this_kerninfo)) {
+ dso__set_loaded(curr_map->dso, curr_map->type);
+ }
+
return count;
}
@@ -565,7 +599,10 @@ int dso__load_kallsyms(struct dso *self,
return -1;
symbols__fixup_end(&self->symbols[map->type]);
- self->origin = DSO__ORIG_KERNEL;
+ if (self->kernel == DSO_TYPE_GUEST_KERNEL)
+ self->origin = DSO__ORIG_GUEST_KERNEL;
+ else
+ self->origin = DSO__ORIG_KERNEL;
return dso__split_kallsyms(self, map, filter);
}
@@ -952,7 +989,7 @@ static int dso__load_sym(struct dso *sel
nr_syms = shdr.sh_size / shdr.sh_entsize;
memset(&sym, 0, sizeof(sym));
- if (!self->kernel) {
+ if (self->kernel == DSO_TYPE_USER) {
self->adjust_symbols = (ehdr.e_type == ET_EXEC ||
elf_section_by_name(elf, &ehdr, &shdr,
".gnu.prelink_undo",
@@ -984,7 +1021,7 @@ static int dso__load_sym(struct dso *sel
section_name = elf_sec__name(&shdr, secstrs);
- if (self->kernel || kmodule) {
+ if (self->kernel != DSO_TYPE_USER || kmodule) {
char dso_name[PATH_MAX];
if (strcmp(section_name,
@@ -1011,6 +1048,7 @@ static int dso__load_sym(struct dso *sel
curr_dso = dso__new(dso_name);
if (curr_dso == NULL)
goto out_elf_end;
+ curr_dso->kernel = self->kernel;
curr_map = map__new2(start, curr_dso,
map->type);
if (curr_map == NULL) {
@@ -1021,7 +1059,7 @@ static int dso__load_sym(struct dso *sel
curr_map->unmap_ip = identity__map_ip;
curr_dso->origin = self->origin;
map_groups__insert(kmap->kmaps, curr_map);
- dsos__add(&dsos__kernel, curr_dso);
+ dsos__add(&self->node, curr_dso);
dso__set_loaded(curr_dso, map->type);
} else
curr_dso = curr_map->dso;
@@ -1083,7 +1121,7 @@ static bool dso__build_id_equal(const st
return memcmp(self->build_id, build_id, sizeof(self->build_id)) == 0;
}
-static bool __dsos__read_build_ids(struct list_head *head, bool with_hits)
+bool __dsos__read_build_ids(struct list_head *head, bool with_hits)
{
bool have_build_id = false;
struct dso *pos;
@@ -1101,13 +1139,6 @@ static bool __dsos__read_build_ids(struc
return have_build_id;
}
-bool dsos__read_build_ids(bool with_hits)
-{
- bool kbuildids = __dsos__read_build_ids(&dsos__kernel, with_hits),
- ubuildids = __dsos__read_build_ids(&dsos__user, with_hits);
- return kbuildids || ubuildids;
-}
-
/*
* Align offset to 4 bytes as needed for note name and descriptor data.
*/
@@ -1242,6 +1273,8 @@ char dso__symtab_origin(const struct dso
[DSO__ORIG_BUILDID] = 'b',
[DSO__ORIG_DSO] = 'd',
[DSO__ORIG_KMODULE] = 'K',
+ [DSO__ORIG_GUEST_KERNEL] = 'g',
+ [DSO__ORIG_GUEST_KMODULE] = 'G',
};
if (self == NULL || self->origin == DSO__ORIG_NOT_FOUND)
@@ -1257,11 +1290,20 @@ int dso__load(struct dso *self, struct m
char build_id_hex[BUILD_ID_SIZE * 2 + 1];
int ret = -1;
int fd;
+ struct kernel_info *kerninfo;
+ const char *root_dir;
dso__set_loaded(self, map->type);
- if (self->kernel)
+ if (self->kernel == DSO_TYPE_KERNEL)
return dso__load_kernel_sym(self, map, filter);
+ else if (self->kernel == DSO_TYPE_GUEST_KERNEL)
+ return dso__load_guest_kernel_sym(self, map, filter);
+
+ if (map->groups && map->groups->this_kerninfo)
+ kerninfo = map->groups->this_kerninfo;
+ else
+ kerninfo = NULL;
name = malloc(size);
if (!name)
@@ -1315,6 +1357,13 @@ more:
case DSO__ORIG_DSO:
snprintf(name, size, "%s", self->long_name);
break;
+ case DSO__ORIG_GUEST_KMODULE:
+ if (map->groups && map->groups->this_kerninfo)
+ root_dir = map->groups->this_kerninfo->root_dir;
+ else
+ root_dir = "";
+ snprintf(name, size, "%s%s", root_dir, self->long_name);
+ break;
default:
goto out;
@@ -1368,7 +1417,8 @@ struct map *map_groups__find_by_name(str
return NULL;
}
-static int dso__kernel_module_get_build_id(struct dso *self)
+static int dso__kernel_module_get_build_id(struct dso *self,
+ const char *root_dir)
{
char filename[PATH_MAX];
/*
@@ -1378,8 +1428,8 @@ static int dso__kernel_module_get_build_
const char *name = self->short_name + 1;
snprintf(filename, sizeof(filename),
- "/sys/module/%.*s/notes/.note.gnu.build-id",
- (int)strlen(name - 1), name);
+ "%s/sys/module/%.*s/notes/.note.gnu.build-id",
+ root_dir, (int)strlen(name) - 1, name);
if (sysfs__read_build_id(filename, self->build_id,
sizeof(self->build_id)) == 0)
@@ -1388,7 +1438,8 @@ static int dso__kernel_module_get_build_
return 0;
}
-static int map_groups__set_modules_path_dir(struct map_groups *self, char *dir_name)
+static int map_groups__set_modules_path_dir(struct map_groups *self,
+ const char *dir_name)
{
struct dirent *dent;
DIR *dir = opendir(dir_name);
@@ -1400,8 +1451,14 @@ static int map_groups__set_modules_path_
while ((dent = readdir(dir)) != NULL) {
char path[PATH_MAX];
+ struct stat st;
- if (dent->d_type == DT_DIR) {
+ /* sshfs might return a bad dent->d_type, so we have to stat */
+ sprintf(path, "%s/%s", dir_name, dent->d_name);
+ if (stat(path, &st))
+ continue;
+
+ if (S_ISDIR(st.st_mode)) {
if (!strcmp(dent->d_name, ".") ||
!strcmp(dent->d_name, ".."))
continue;
@@ -1433,7 +1490,7 @@ static int map_groups__set_modules_path_
if (long_name == NULL)
goto failure;
dso__set_long_name(map->dso, long_name);
- dso__kernel_module_get_build_id(map->dso);
+ dso__kernel_module_get_build_id(map->dso, "");
}
}
@@ -1443,16 +1500,46 @@ failure:
return -1;
}
-static int map_groups__set_modules_path(struct map_groups *self)
+static char *get_kernel_version(const char *root_dir)
{
- struct utsname uts;
+ char version[PATH_MAX];
+ FILE *file;
+ char *name, *tmp;
+ const char *prefix = "Linux version ";
+
+ sprintf(version, "%s/proc/version", root_dir);
+ file = fopen(version, "r");
+ if (!file)
+ return NULL;
+
@@ -1477,11 +1564,13 @@ static struct map *map__new2(u64 start,
}
struct map *map_groups__new_module(struct map_groups *self, u64 start,
- const char *filename)
+ const char *filename,
+ struct kernel_info *kerninfo)
{
struct map *map;
- struct dso *dso = __dsos__findnew(&dsos__kernel, filename);
+ struct dso *dso;
+ dso = __dsos__findnew(&kerninfo->dsos__kernel, filename);
if (dso == NULL)
return NULL;
@@ -1489,21 +1578,37 @@ struct map *map_groups__new_module(struc
@@ -1532,16 +1637,17 @@ static int map_groups__create_modules(st
*sep = '\0';
snprintf(name, sizeof(name), "[%s]", line);
- map = map_groups__new_module(self, start, name);
+ map = map_groups__new_module(&kerninfo->kmaps,
+ start, name, kerninfo);
if (map == NULL)
goto out_delete_line;
- dso__kernel_module_get_build_id(map->dso);
+ dso__kernel_module_get_build_id(map->dso, root_dir);
}
free(line);
fclose(file);
- return map_groups__set_modules_path(self);
+ return map_groups__set_modules_path(&kerninfo->kmaps, root_dir);
out_delete_line:
free(line);
@@ -1708,8 +1814,57 @@ out_fixup:
+ kern_mmap_name_parenthese(kerninfo, path);
+ dso__set_long_name(self,
+ strdup(path));
+ }
+ map__fixup_start(map);
+ map__fixup_end(map);
+ }
+
+ return err;
+}
static void dsos__add(struct list_head *head, struct dso *dso)
{
@@ -1752,10 +1907,16 @@ static void __dsos__fprintf(struct list_
}
}
-void dsos__fprintf(FILE *fp)
+void dsos__fprintf(struct rb_root *kerninfo_root, FILE *fp)
{
- __dsos__fprintf(&dsos__kernel, fp);
- __dsos__fprintf(&dsos__user, fp);
+ struct rb_node *nd;
+
+ for (nd = rb_first(kerninfo_root); nd; nd = rb_next(nd)) {
+ struct kernel_info *pos = rb_entry(nd, struct kernel_info,
+ rb_node);
+ __dsos__fprintf(&pos->dsos__kernel, fp);
+ __dsos__fprintf(&pos->dsos__user, fp);
+ }
}
static size_t __dsos__fprintf_buildid(struct list_head *head, FILE *fp,
@@ -1773,10 +1934,21 @@ static size_t __dsos__fprintf_buildid(st
return ret;
}
-size_t dsos__fprintf_buildid(FILE *fp, bool with_hits)
+size_t dsos__fprintf_buildid(struct rb_root *kerninfo_root,
+ FILE *fp, bool with_hits)
{
- return (__dsos__fprintf_buildid(&dsos__kernel, fp, with_hits) +
- __dsos__fprintf_buildid(&dsos__user, fp, with_hits));
+ struct rb_node *nd;
+ size_t ret = 0;
+
+ for (nd = rb_first(kerninfo_root); nd; nd = rb_next(nd)) {
+ struct kernel_info *pos = rb_entry(nd, struct kernel_info,
+ rb_node);
+ ret += __dsos__fprintf_buildid(&pos->dsos__kernel,
+ fp, with_hits);
+ ret += __dsos__fprintf_buildid(&pos->dsos__user,
+ fp, with_hits);
+ }
+ return ret;
}
struct dso *dso__new_kernel(const char *name)
@@ -1785,28 +1957,59 @@ struct dso *dso__new_kernel(const char *
if (self != NULL) {
dso__set_short_name(self, "[kernel]");
- self->kernel = 1;
+ self->kernel = DSO_TYPE_KERNEL;
}
return self;
}
-void dso__read_running_kernel_build_id(struct dso *self)
+static struct dso *dso__new_guest_kernel(struct kernel_info *kerninfo,
+ const char *name)
{
- if (sysfs__read_build_id("/sys/kernel/notes", self->build_id,
+ char buff[PATH_MAX];
+ struct dso *self;
+
+ kern_mmap_name_parenthese(kerninfo, buff);
+ self = dso__new(name ?: buff);
+ if (self != NULL) {
+ dso__set_short_name(self, "[guest.kernel]");
+ self->kernel = DSO_TYPE_GUEST_KERNEL;
+ }
+
+ return self;
+}
+
+void dso__read_running_kernel_build_id(struct dso *self,
+ struct kernel_info *kerninfo)
+{
+ char path[PATH_MAX];
+
+ if (is_default_guest(kerninfo))
+ return;
+ sprintf(path, "%s/sys/kernel/notes", kerninfo->root_dir);
+ if (sysfs__read_build_id(path, self->build_id,
sizeof(self->build_id)) == 0)
self->has_build_id = true;
}
-static struct dso *dsos__create_kernel(const char *vmlinux)
+static struct dso *dsos__create_kernel(struct kernel_info *kerninfo)
{
- struct dso *kernel = dso__new_kernel(vmlinux);
+ const char *vmlinux_name = NULL;
+ struct dso *kernel;
- if (kernel != NULL) {
- dso__read_running_kernel_build_id(kernel);
- dsos__add(&dsos__kernel, kernel);
+ if (is_host_kernel(kerninfo)) {
+ vmlinux_name = symbol_conf.vmlinux_name;
+ kernel = dso__new_kernel(vmlinux_name);
+ } else {
+ if (is_default_guest(kerninfo))
+ vmlinux_name = symbol_conf.default_guest_vmlinux_name;
+ kernel = dso__new_guest_kernel(kerninfo, vmlinux_name);
}
+ if (kernel != NULL) {
+ dso__read_running_kernel_build_id(kernel, kerninfo);
+ dsos__add(&kerninfo->dsos__kernel, kernel);
+ }
return kernel;
}
@@ -1950,23 +2153,29 @@ out_free_comm_list:
@@ -2012,3 +2221,47 @@ char *strxfrchar(char *s, char from, cha
+++ linux-2.6_tip0413_perfkvm/tools/perf/util/symbol.h 2010-04-16 12:44:29.345307369 +0800
@@ -143,34 +155,30 @@ static inline void dso__set_loaded(struc
@@ -178,19 +186,26 @@ enum dso_origin {
#endif /* __PERF_THREAD_H */
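One detail in the symbol.c part above is easy to miss: the precision passed to "%.*s" in dso__kernel_module_get_build_id(). The removed line used strlen(name - 1), which measures the string starting at the '[' and therefore leaves the trailing ']' in the path; the new line uses strlen(name) - 1, which drops it. The sketch below shows the corrected behaviour in isolation. The function name and the bracket check are my own additions for illustration, and the module names in the usage are made up.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build the sysfs build-id note path for a module whose short name has
 * the form "[name]". root_dir is "" for the host or a guest mount point.
 * Returns the snprintf() result, or -1 if short_name is not bracketed.
 */
static int sketch_module_build_id_path(char *out, size_t len,
				       const char *root_dir,
				       const char *short_name)
{
	size_t n = strlen(short_name);
	const char *name;

	if (n < 3 || short_name[0] != '[' || short_name[n - 1] != ']')
		return -1;

	name = short_name + 1;		/* skip the leading '[' */
	/* precision strlen(name) - 1 cuts off the trailing ']' */
	return snprintf(out, len,
			"%s/sys/module/%.*s/notes/.note.gnu.build-id",
			root_dir, (int)strlen(name) - 1, name);
}
```

For a short name of "[ext3]" and an empty root_dir this yields /sys/module/ext3/notes/.note.gnu.build-id; with a guest mount point as root_dir the same path is looked up under that directory.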
On Fri, Apr 16, 2010 at 03:34:35PM +0800, Zhang, Yanmin wrote:
> Below is the kernel patch to enable perf to collect guest os statistics.
>
> Joerg,
>
> Would you like to add support on svm? I don't know the exact point to trigger
> NMI to host with svm.
Yes I will do that, thanks for all the work you have already done :-) Do
we have a branch for that work somewhere? Probably in the -tip tree?
Joerg
Let's look at the surrounding text...
>
> The following bullets detail when architectural state is and is not
> updated in response
> to VM exits:
> • If an event causes a VM exit directly, it does not update
> architectural state as it
> would have if it had it not caused the VM exit:
> — A debug exception does not update DR6, DR7.GD, or IA32_DEBUGCTL.LBR.
> (Information about the nature of the debug exception is saved
> in the exit
> qualification field.)
> — A page fault does not update CR2. (The linear address causing
> the page fault
> is saved in the exit-qualification field.)
> — An NMI causes subsequent NMIs to be blocked, but only after the
> VM exit
> completes.
> — An external interrupt does not acknowledge the interrupt
> controller and the
> interrupt remains pending, unless the “acknowledge interrupt
> on exit”
> VM-exit control is 1. In such a case, the interrupt controller
> is acknowledged
> and the interrupt is no longer pending.
Everywhere it says state is _not_ updated, so I think what is meant is
that NMIs are blocked, but only _until_ the VM exit completes.
I think you were right the first time around. Can you check with your
architecture team?
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Can you please split it further?
Patch 1 introduces perf_register_guest_info_callbacks() and related.
Ingo can merge this into a branch in tip.git.
Patch 2 is just the kvm bits, I'll apply that after merging the branch
with patch 1.
Patch 3 adds the tools/perf changes.
This way perf development can continue on tip.git, and kvm development
can continue on kvm.git, without the code bases diverging and requiring
a merge later.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
We can try doing this (currently we don't, but this is simple enough
that we could). I'd still like 1-2 in two patches.
> On 04/16/2010 10:34 AM, Zhang, Yanmin wrote:
> >Below is the kernel patch to enable perf to collect guest os statistics.
> >
> >Joerg,
> >
> >Would you like to add support on svm? I don't know the exact point to trigger
> >NMI to host with svm.
> >
> >See below code with vmx:
> >
> >+ kvm_before_handle_nmi(&vmx->vcpu);
> > asm("int $2");
> >+ kvm_after_handle_nmi(&vmx->vcpu);
> >
> >Signed-off-by: Zhang Yanmin<yanmin...@linux.intel.com>
>
> Can you please split it further?
>
> Patch 1 introduces perf_register_guest_info_callbacks() and related. Ingo
> can merge this into a branch in tip.git. Patch 2 is just the kvm bits, I'll
> apply that after merging the branch with patch 1. Patch 3 adds the
> tools/perf changes.
>
> This way perf development can continue on tip.git, and kvm development can
> continue on kvm.git, without the code bases diverging and requiring a merge
> later.
I'd like to pull the KVM bits from you into perf - so that there's a testable
form of the changes. We can do that via a branch that has 1-2 changes, plus
minimal conflicts down the line, right?
Ingo
Thanks.
> [...] I'd still like 1-2 in two patches.
Sure.
Ingo
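For anyone who wants to see the shape of the interface before reading the split patches: the user-space sketch below mimics the callback registration and the two consumers (sample ip and misc flags). It is only a model under assumptions. The struct layout and the PERF_RECORD_MISC_* values follow the kernel patch in this thread, but the sketch_* and fake_* names are hypothetical, and the fake callbacks stand in for whatever a hypervisor would provide.

```c
#include <stddef.h>

#define PERF_RECORD_MISC_KERNEL		(1 << 0)
#define PERF_RECORD_MISC_USER		(2 << 0)
#define PERF_RECORD_MISC_GUEST_KERNEL	(4 << 0)
#define PERF_RECORD_MISC_GUEST_USER	(5 << 0)

struct sketch_guest_cbs {
	int (*is_in_guest)(void);
	int (*is_user_mode)(void);
	unsigned long (*get_guest_ip)(void);
};

/* Single slot: only one provider is assumed, as in the patch. */
static struct sketch_guest_cbs *sketch_cbs;

static int sketch_register(struct sketch_guest_cbs *cbs)
{
	sketch_cbs = cbs;
	return 0;
}

static int sketch_unregister(struct sketch_guest_cbs *cbs)
{
	(void)cbs;
	sketch_cbs = NULL;
	return 0;
}

/* Guest ip while a guest is running, host ip otherwise. */
static unsigned long sketch_sample_ip(unsigned long host_ip)
{
	if (sketch_cbs && sketch_cbs->is_in_guest())
		return sketch_cbs->get_guest_ip();
	return host_ip;
}

/* cpumode part of the misc field for one sample. */
static int sketch_sample_misc(int host_user_mode)
{
	if (sketch_cbs && sketch_cbs->is_in_guest())
		return sketch_cbs->is_user_mode() ?
			PERF_RECORD_MISC_GUEST_USER :
			PERF_RECORD_MISC_GUEST_KERNEL;
	return host_user_mode ? PERF_RECORD_MISC_USER :
				PERF_RECORD_MISC_KERNEL;
}

/* Hypothetical hypervisor side: one fake vcpu state. */
static int fake_in_guest;
static int fake_guest_user;
static unsigned long fake_guest_ip;

static int fake_is_in_guest(void) { return fake_in_guest; }
static int fake_is_user_mode(void) { return fake_guest_user; }
static unsigned long fake_get_guest_ip(void) { return fake_guest_ip; }

static struct sketch_guest_cbs fake_cbs = {
	.is_in_guest	= fake_is_in_guest,
	.is_user_mode	= fake_is_user_mode,
	.get_guest_ip	= fake_get_guest_ip,
};
```

The point of the indirection is that the perf core never needs to know about KVM internals; it only asks three questions per sample and falls back to the host registers when no provider is registered.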
Signed-off-by: Zhang Yanmin <yanmin...@linux.intel.com>
---
diff -Nraup --exclude-from=exclude.diff linux-2.6_tip0417/arch/x86/include/asm/perf_event.h linux-2.6_tip0417_perfkvm/arch/x86/include/asm/perf_event.h
--- linux-2.6_tip0417/arch/x86/include/asm/perf_event.h 2010-04-19 09:51:47.557797121 +0800
+++ linux-2.6_tip0417_perfkvm/arch/x86/include/asm/perf_event.h 2010-04-19 09:53:59.689452915 +0800
@@ -135,17 +135,10 @@ extern void perf_events_lapic_init(void)
*/
#define PERF_EFLAGS_EXACT (1UL << 3)
-#define perf_misc_flags(regs) \
-({ int misc = 0; \
- if (user_mode(regs)) \
- misc |= PERF_RECORD_MISC_USER; \
- else \
- misc |= PERF_RECORD_MISC_KERNEL; \
- if (regs->flags & PERF_EFLAGS_EXACT) \
- misc |= PERF_RECORD_MISC_EXACT; \
- misc; })
-
-#define perf_instruction_pointer(regs) ((regs)->ip)
+struct pt_regs;
+extern unsigned long perf_instruction_pointer(struct pt_regs *regs);
+extern unsigned long perf_misc_flags(struct pt_regs *regs);
+#define perf_misc_flags(regs) perf_misc_flags(regs)
#else
static inline void init_hw_perf_events(void) { }
diff -Nraup --exclude-from=exclude.diff linux-2.6_tip0417/arch/x86/kernel/cpu/perf_event.c linux-2.6_tip0417_perfkvm/arch/x86/kernel/cpu/perf_event.c
--- linux-2.6_tip0417/arch/x86/kernel/cpu/perf_event.c 2010-04-19 09:51:48.347655964 +0800
+++ linux-2.6_tip0417_perfkvm/arch/x86/kernel/cpu/perf_event.c 2010-04-19 09:53:59.689452915 +0800
@@ -1720,6 +1720,11 @@ struct perf_callchain_entry *perf_callch
{
struct perf_callchain_entry *entry;
+ if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+ /* TODO: We don't support guest os callchain now */
+ return NULL;
+ }
+
if (in_nmi())
entry = &__get_cpu_var(pmc_nmi_entry);
else
@@ -1743,3 +1748,30 @@ void perf_arch_fetch_caller_regs(struct
regs->cs = __KERNEL_CS;
local_save_flags(regs->flags);
}
+
+unsigned long perf_instruction_pointer(struct pt_regs *regs)
+{
+ unsigned long ip;
+ if (perf_guest_cbs && perf_guest_cbs->is_in_guest())
+ ip = perf_guest_cbs->get_guest_ip();
+ else
+ ip = instruction_pointer(regs);
+ return ip;
+}
+
+unsigned long perf_misc_flags(struct pt_regs *regs)
+{
+ int misc = 0;
+ if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+ misc |= perf_guest_cbs->is_user_mode() ?
+ PERF_RECORD_MISC_GUEST_USER :
+ PERF_RECORD_MISC_GUEST_KERNEL;
+ } else
+ misc |= user_mode(regs) ? PERF_RECORD_MISC_USER :
+ PERF_RECORD_MISC_KERNEL;
+ if (regs->flags & PERF_EFLAGS_EXACT)
+ misc |= PERF_RECORD_MISC_EXACT;
+
+ return misc;
+}
+
diff -Nraup --exclude-from=exclude.diff linux-2.6_tip0417/include/linux/perf_event.h linux-2.6_tip0417_perfkvm/include/linux/perf_event.h
--- linux-2.6_tip0417/include/linux/perf_event.h 2010-04-19 09:51:59.544791000 +0800
+++ linux-2.6_tip0417_perfkvm/include/linux/perf_event.h 2010-04-19 09:53:59.691378953 +0800
@@ -288,11 +288,13 @@ struct perf_event_mmap_page {
__u64 data_tail; /* user-space written tail */
};
-#define PERF_RECORD_MISC_CPUMODE_MASK (3 << 0)
+#define PERF_RECORD_MISC_CPUMODE_MASK (7 << 0)
#define PERF_RECORD_MISC_CPUMODE_UNKNOWN (0 << 0)
#define PERF_RECORD_MISC_KERNEL (1 << 0)
#define PERF_RECORD_MISC_USER (2 << 0)
#define PERF_RECORD_MISC_HYPERVISOR (3 << 0)
+#define PERF_RECORD_MISC_GUEST_KERNEL (4 << 0)
+#define PERF_RECORD_MISC_GUEST_USER (5 << 0)
#define PERF_RECORD_MISC_EXACT (1 << 14)
/*
@@ -446,6 +448,12 @@ enum perf_callchain_context {
# include <asm/perf_event.h>
#endif
+struct perf_guest_info_callbacks {
+ int (*is_in_guest) (void);
+ int (*is_user_mode) (void);
+ unsigned long (*get_guest_ip) (void);
+};
+
#ifdef CONFIG_HAVE_HW_BREAKPOINT
#include <asm/hw_breakpoint.h>
#endif
@@ -932,6 +940,12 @@ static inline void perf_event_mmap(struc
__perf_event_mmap(vma);
}
+extern struct perf_guest_info_callbacks *perf_guest_cbs;
+extern int perf_register_guest_info_callbacks(
+ struct perf_guest_info_callbacks *);
+extern int perf_unregister_guest_info_callbacks(
+ struct perf_guest_info_callbacks *);
+
extern void perf_event_comm(struct task_struct *tsk);
extern void perf_event_fork(struct task_struct *tsk);
@@ -1001,6 +1015,11 @@ perf_sw_event(u32 event_id, u64 nr, int
static inline void
perf_bp_event(struct perf_event *event, void *data) { }
+static inline int perf_register_guest_info_callbacks
+(struct perf_guest_info_callbacks *cbs) { return 0; }
+static inline int perf_unregister_guest_info_callbacks
+(struct perf_guest_info_callbacks *cbs) { return 0; }
+
static inline void perf_event_mmap(struct vm_area_struct *vma) { }
static inline void perf_event_comm(struct task_struct *tsk) { }
static inline void perf_event_fork(struct task_struct *tsk) { }
diff -Nraup --exclude-from=exclude.diff linux-2.6_tip0417/kernel/perf_event.c linux-2.6_tip0417_perfkvm/kernel/perf_event.c
--- linux-2.6_tip0417/kernel/perf_event.c 2010-04-19 09:52:40.907135718 +0800
+++ linux-2.6_tip0417_perfkvm/kernel/perf_event.c 2010-04-19 09:53:59.693377237 +0800
@@ -2798,6 +2798,27 @@ void perf_arch_fetch_caller_regs(struct
/*
+ * We assume there is only KVM supporting the callbacks.
+ * Later on, we might change it to a list if there is
+ * another virtualization implementation supporting the callbacks.
+ */
+struct perf_guest_info_callbacks *perf_guest_cbs;
+
+int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
+{
+ perf_guest_cbs = cbs;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(perf_register_guest_info_callbacks);
+
+int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
+{
+ perf_guest_cbs = NULL;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(perf_unregister_guest_info_callbacks);
+
+/*
* Output
*/
static bool perf_output_space(struct perf_mmap_data *data, unsigned long tail,
@@ -3749,7 +3770,7 @@ void __perf_event_mmap(struct vm_area_st
.event_id = {
.header = {
.type = PERF_RECORD_MMAP,
- .misc = 0,
+ .misc = PERF_RECORD_MISC_USER,
/* .size */
},
/* .pid */
Signed-off-by: Zhang Yanmin <yanmin...@linux.intel.com>
---
diff -Nraup linux-2.6_tip0417/arch/x86/kvm/vmx.c linux-2.6_tip0417_perfkvm/arch/x86/kvm/vmx.c
--- linux-2.6_tip0417/arch/x86/kvm/vmx.c 2010-04-19 09:51:47.908673911 +0800
+++ linux-2.6_tip0417_perfkvm/arch/x86/kvm/vmx.c 2010-04-19 09:53:59.690399987 +0800
@@ -3654,8 +3654,11 @@ static void vmx_complete_interrupts(stru
/* We need to handle NMIs before interrupts are enabled */
if ((exit_intr_info & INTR_INFO_INTR_TYPE_MASK) == INTR_TYPE_NMI_INTR &&
- (exit_intr_info & INTR_INFO_VALID_MASK))
+ (exit_intr_info & INTR_INFO_VALID_MASK)) {
+ kvm_before_handle_nmi(&vmx->vcpu);
asm("int $2");
+ kvm_after_handle_nmi(&vmx->vcpu);
+ }
idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
diff -Nraup linux-2.6_tip0417/arch/x86/kvm/x86.c linux-2.6_tip0417_perfkvm/arch/x86/kvm/x86.c
--- linux-2.6_tip0417/arch/x86/kvm/x86.c 2010-04-19 09:51:47.892676413 +0800
+++ linux-2.6_tip0417_perfkvm/arch/x86/kvm/x86.c 2010-04-19 09:53:59.691378953 +0800
+{
+ unsigned long ip = 0;
+ if (percpu_read(current_vcpu))
+ ip = kvm_rip_read(percpu_read(current_vcpu));
+ return ip;
+}
+
diff -Nraup linux-2.6_tip0417/arch/x86/kvm/x86.h linux-2.6_tip0417_perfkvm/arch/x86/kvm/x86.h
--- linux-2.6_tip0417/arch/x86/kvm/x86.h 2010-04-19 09:51:47.884709050 +0800
+++ linux-2.6_tip0417_perfkvm/arch/x86/kvm/x86.h 2010-04-19 09:53:59.691378953 +0800
@@ -65,4 +65,7 @@ static inline int is_paging(struct kvm_v
return kvm_read_cr0_bits(vcpu, X86_CR0_PG);
}
+void kvm_before_handle_nmi(struct kvm_vcpu *vcpu);
+void kvm_after_handle_nmi(struct kvm_vcpu *vcpu);
+
#endif
ChangeLog V5:
1) Split the kernel patch into 2 parts. The first introduces
struct perf_guest_info_callbacks and the related register/unregister
functions. The second is the kvm implementation of the callbacks.
2) Port to the tip/master tree of April 17th.
3) Fix a bug which caused the module parsing of the default guest kernel
to fail.
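To illustrate item 1, here is a minimal user-space sketch of how the
callback table and the register/unregister pair fit together. It is not
the kernel code: classify_sample(), enum sample_origin and the fake_*
helpers are hypothetical names made up for this example, and the fake
state simply stands in for a hypervisor such as KVM.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as the callback table introduced by the kernel patch. */
struct perf_guest_info_callbacks {
	int (*is_in_guest)(void);
	int (*is_user_mode)(void);
	unsigned long (*get_guest_ip)(void);
};

/* One global slot: the patch assumes KVM is the only provider. */
static struct perf_guest_info_callbacks *perf_guest_cbs;

static int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
{
	perf_guest_cbs = cbs;
	return 0;
}

static int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
{
	(void)cbs;		/* unused: there is only one slot to clear */
	perf_guest_cbs = NULL;
	return 0;
}

/* Hypothetical: how a sampling path might label a sample. */
enum sample_origin { ORIGIN_HOST, ORIGIN_GUEST_KERNEL, ORIGIN_GUEST_USER };

static enum sample_origin classify_sample(void)
{
	if (perf_guest_cbs && perf_guest_cbs->is_in_guest())
		return perf_guest_cbs->is_user_mode() ?
			ORIGIN_GUEST_USER : ORIGIN_GUEST_KERNEL;
	return ORIGIN_HOST;
}

/* Hypothetical fake hypervisor state standing in for KVM. */
static int fake_in_guest, fake_user_mode;
static int fake_is_in_guest(void) { return fake_in_guest; }
static int fake_is_user_mode(void) { return fake_user_mode; }
static unsigned long fake_get_guest_ip(void) { return 0x1000UL; }

static struct perf_guest_info_callbacks fake_cbs = {
	.is_in_guest	= fake_is_in_guest,
	.is_user_mode	= fake_is_user_mode,
	.get_guest_ip	= fake_get_guest_ip,
};

/* Walk through register -> sample -> unregister. */
static void run_demo(void)
{
	assert(classify_sample() == ORIGIN_HOST);	/* nothing registered */
	perf_register_guest_info_callbacks(&fake_cbs);
	fake_in_guest = 1;
	assert(classify_sample() == ORIGIN_GUEST_KERNEL);
	fake_user_mode = 1;
	assert(classify_sample() == ORIGIN_GUEST_USER);
	perf_unregister_guest_info_callbacks(&fake_cbs);
	assert(classify_sample() == ORIGIN_HOST);	/* slot cleared again */
}
```

Calling run_demo() from a test harness exercises the whole sequence. The
single global pointer matches the "only KVM" assumption stated in the
patch comment; a list would be needed if a second provider appeared.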
ChangeLog V4:
1) Based on Ingo's comments, I added help information for kvm,
such as command-list.txt and perf-kvm.txt.
2) Added the guest process id at the tail of the kernel dso long name, so
the display shows a different label for each guest os.
3) Based on Avi's comments, closed the racy window in which an NMI
could be triggered while not in guest os.
4) Fixed all the errors and warnings reported by scripts/checkpatch.pl.
5) Fixed a compilation error pointed out by Yang Sheng.
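To illustrate item 3, here is a rough user-space sketch of the idea
behind kvm_before_handle_nmi()/kvm_after_handle_nmi(): a marker is set
only around the NMI that is re-injected on a guest exit, so the perf
callbacks report "in guest" only inside that window. This is a
simplification under assumptions: struct vcpu, fake_nmi_handler() and
handle_guest_exit_nmi() are hypothetical names, the marker is a plain
global instead of a per-CPU variable, and deriving is_in_guest() from the
marker is my assumption about how the callback is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvm_vcpu; only the guest IP matters. */
struct vcpu {
	unsigned long rip;
};

/* Per-CPU in the kernel patch; a plain global in this sketch. */
static struct vcpu *current_vcpu;

static void before_handle_nmi(struct vcpu *v) { current_vcpu = v; }
static void after_handle_nmi(struct vcpu *v) { (void)v; current_vcpu = NULL; }

/* Callbacks as perf would see them. */
static int is_in_guest(void)
{
	return current_vcpu != NULL;
}

static unsigned long get_guest_ip(void)
{
	unsigned long ip = 0;

	if (current_vcpu)
		ip = current_vcpu->rip;
	return ip;
}

/* What the (fake) NMI handler observed on its last run. */
static int seen_guest;
static unsigned long seen_ip;

static void fake_nmi_handler(void)
{
	seen_guest = is_in_guest();
	seen_ip = get_guest_ip();
}

/* Guest-exit path: the marker is valid only while the NMI is handled. */
static void handle_guest_exit_nmi(struct vcpu *v)
{
	before_handle_nmi(v);
	fake_nmi_handler();	/* stands in for asm("int $2") */
	after_handle_nmi(v);
}
```

An NMI that arrives outside the bracket sees a NULL marker and is
accounted to the host; only the NMI replayed inside the bracket is
attributed to the guest and its instruction pointer.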
From: Zhang, Yanmin <yanmin...@linux.intel.com>
Signed-off-by: Zhang Yanmin <yanmin...@linux.intel.com>
---
diff -Nraup linux-2.6_tip0417/tools/perf/builtin-annotate.c linux-2.6_tip0417_perfkvm/tools/perf/builtin-annotate.c
--- linux-2.6_tip0417/tools/perf/builtin-annotate.c 2010-04-19 09:52:40.282230518 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/builtin-annotate.c 2010-04-19 09:54:07.970013157 +0800
@@ -571,7 +571,7 @@ static int __cmd_annotate(void)
perf_session__fprintf(session, stdout);
if (verbose > 2)
- dsos__fprintf(stdout);
+ dsos__fprintf(&session->kerninfo_root, stdout);
perf_session__collapse_resort(&session->hists);
perf_session__output_resort(&session->hists, session->event_total[0]);
diff -Nraup linux-2.6_tip0417/tools/perf/builtin-buildid-list.c linux-2.6_tip0417_perfkvm/tools/perf/builtin-buildid-list.c
--- linux-2.6_tip0417/tools/perf/builtin-buildid-list.c 2010-04-19 09:52:40.282230518 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/builtin-buildid-list.c 2010-04-19 09:54:07.970013157 +0800
@@ -46,7 +46,7 @@ static int __cmd_buildid_list(void)
if (with_hits)
perf_session__process_events(session, &build_id__mark_dso_hit_ops);
- dsos__fprintf_buildid(stdout, with_hits);
+ dsos__fprintf_buildid(&session->kerninfo_root, stdout, with_hits);
perf_session__delete(session);
return err;
diff -Nraup linux-2.6_tip0417/tools/perf/builtin-diff.c linux-2.6_tip0417_perfkvm/tools/perf/builtin-diff.c
--- linux-2.6_tip0417/tools/perf/builtin-diff.c 2010-04-19 09:52:40.256204096 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/builtin-diff.c 2010-04-19 09:54:07.970013157 +0800
@@ -33,7 +33,7 @@ static int perf_session__add_hist_entry(
return -ENOMEM;
if (hit)
- he->count += count;
+ __perf_session__add_count(he, al, count);
return 0;
}
@@ -225,6 +225,10 @@ int cmd_diff(int argc, const char **argv
input_new = argv[1];
} else
input_new = argv[0];
+ } else if (symbol_conf.default_guest_vmlinux_name ||
+ symbol_conf.default_guest_kallsyms) {
+ input_old = "perf.data.host";
+ input_new = "perf.data.guest";
}
symbol_conf.exclude_other = false;
diff -Nraup linux-2.6_tip0417/tools/perf/builtin.h linux-2.6_tip0417_perfkvm/tools/perf/builtin.h
--- linux-2.6_tip0417/tools/perf/builtin.h 2010-04-19 09:52:40.101230042 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/builtin.h 2010-04-19 09:54:07.971013152 +0800
@@ -32,5 +32,6 @@ extern int cmd_version(int argc, const c
extern int cmd_probe(int argc, const char **argv, const char *prefix);
extern int cmd_kmem(int argc, const char **argv, const char *prefix);
extern int cmd_lock(int argc, const char **argv, const char *prefix);
+extern int cmd_kvm(int argc, const char **argv, const char *prefix);
#endif
diff -Nraup linux-2.6_tip0417/tools/perf/builtin-kmem.c linux-2.6_tip0417_perfkvm/tools/perf/builtin-kmem.c
--- linux-2.6_tip0417/tools/perf/builtin-kmem.c 2010-04-19 09:52:40.543213874 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/builtin-kmem.c 2010-04-19 09:54:07.971013152 +0800
diff -Nraup linux-2.6_tip0417/tools/perf/builtin-kvm.c linux-2.6_tip0417_perfkvm/tools/perf/builtin-kvm.c
--- linux-2.6_tip0417/tools/perf/builtin-kvm.c 1970-01-01 08:00:00.000000000 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/builtin-kvm.c 2010-04-19 09:54:07.971013152 +0800
diff -Nraup linux-2.6_tip0417/tools/perf/builtin-record.c linux-2.6_tip0417_perfkvm/tools/perf/builtin-record.c
--- linux-2.6_tip0417/tools/perf/builtin-record.c 2010-04-19 09:52:40.544188668 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/builtin-record.c 2010-04-19 09:54:07.971013152 +0800
@@ -456,6 +456,52 @@ static void atexit_header(void)
@@ -467,6 +513,7 @@ static int __cmd_record(int argc, const
int child_ready_pipe[2], go_pipe[2];
const bool forks = argc > 0;
char buf;
+ struct kernel_info *kerninfo;
page_size = sysconf(_SC_PAGE_SIZE);
@@ -635,21 +682,31 @@ static int __cmd_record(int argc, const
advance_output(err);
diff -Nraup linux-2.6_tip0417/tools/perf/builtin-report.c linux-2.6_tip0417_perfkvm/tools/perf/builtin-report.c
--- linux-2.6_tip0417/tools/perf/builtin-report.c 2010-04-19 09:52:40.282230518 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/builtin-report.c 2010-04-19 09:54:07.972013094 +0800
@@ -108,7 +108,7 @@ static int perf_session__add_hist_entry(
return -ENOMEM;
if (hit)
- he->count += data->period;
+ __perf_session__add_count(he, al, data->period);
if (symbol_conf.use_callchain) {
if (!hit)
@@ -313,7 +313,7 @@ static int __cmd_report(void)
perf_session__fprintf(session, stdout);
if (verbose > 2)
- dsos__fprintf(stdout);
+ dsos__fprintf(&session->kerninfo_root, stdout);
next = rb_first(&session->stats_by_id);
while (next) {
@@ -450,6 +450,8 @@ static const struct option options[] = {
"sort by key(s): pid, comm, dso, symbol, parent"),
OPT_BOOLEAN('P', "full-paths", &symbol_conf.full_paths,
"Don't shorten the pathnames taking into account the cwd"),
+ OPT_BOOLEAN(0, "showcpuutilization", &symbol_conf.show_cpu_utilization,
+ "Show sample percentage for different cpu modes"),
OPT_STRING('p', "parent", &parent_pattern, "regex",
"regex filter to identify parent, see: '--sort parent'"),
OPT_BOOLEAN('x', "exclude-other", &symbol_conf.exclude_other,
diff -Nraup linux-2.6_tip0417/tools/perf/builtin-top.c linux-2.6_tip0417_perfkvm/tools/perf/builtin-top.c
--- linux-2.6_tip0417/tools/perf/builtin-top.c 2010-04-19 09:52:40.282230518 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/builtin-top.c 2010-04-19 09:54:07.972013094 +0800
diff -Nraup linux-2.6_tip0417/tools/perf/command-list.txt linux-2.6_tip0417_perfkvm/tools/perf/command-list.txt
--- linux-2.6_tip0417/tools/perf/command-list.txt 2010-04-19 09:52:40.256204096 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/command-list.txt 2010-04-19 09:54:07.972013094 +0800
@@ -19,3 +19,4 @@ perf-trace mainporcelain common
perf-probe mainporcelain common
perf-kmem mainporcelain common
perf-lock mainporcelain common
+perf-kvm mainporcelain common
diff -Nraup linux-2.6_tip0417/tools/perf/Documentation/perf-kvm.txt linux-2.6_tip0417_perfkvm/tools/perf/Documentation/perf-kvm.txt
--- linux-2.6_tip0417/tools/perf/Documentation/perf-kvm.txt 1970-01-01 08:00:00.000000000 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/Documentation/perf-kvm.txt 2010-04-19 09:54:07.973043957 +0800
diff -Nraup linux-2.6_tip0417/tools/perf/Makefile linux-2.6_tip0417_perfkvm/tools/perf/Makefile
--- linux-2.6_tip0417/tools/perf/Makefile 2010-04-19 09:52:40.536190479 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/Makefile 2010-04-19 11:47:23.377999848 +0800
@@ -472,6 +472,7 @@ BUILTIN_OBJS += $(OUTPUT)builtin-trace.o
BUILTIN_OBJS += $(OUTPUT)builtin-probe.o
BUILTIN_OBJS += $(OUTPUT)builtin-kmem.o
BUILTIN_OBJS += $(OUTPUT)builtin-lock.o
+BUILTIN_OBJS += $(OUTPUT)builtin-kvm.o
PERFLIBS = $(LIB_FILE)
diff -Nraup linux-2.6_tip0417/tools/perf/perf.c linux-2.6_tip0417_perfkvm/tools/perf/perf.c
--- linux-2.6_tip0417/tools/perf/perf.c 2010-04-19 09:52:40.286240448 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/perf.c 2010-04-19 09:54:07.973043957 +0800
@@ -307,6 +307,7 @@ static void handle_internal_command(int
{ "probe", cmd_probe, 0 },
{ "kmem", cmd_kmem, 0 },
{ "lock", cmd_lock, 0 },
+ { "kvm", cmd_kvm, 0 },
};
unsigned int i;
static const char ext[] = STRIP_EXTENSION;
diff -Nraup linux-2.6_tip0417/tools/perf/perf.h linux-2.6_tip0417_perfkvm/tools/perf/perf.h
--- linux-2.6_tip0417/tools/perf/perf.h 2010-04-19 09:52:40.553208044 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/perf.h 2010-04-19 09:54:07.973043957 +0800
@@ -131,4 +131,6 @@ struct ip_callchain {
u64 ips[0];
};
+extern int perf_host, perf_guest;
+
#endif
diff -Nraup linux-2.6_tip0417/tools/perf/util/build-id.c linux-2.6_tip0417_perfkvm/tools/perf/util/build-id.c
--- linux-2.6_tip0417/tools/perf/util/build-id.c 2010-04-19 09:52:40.339191461 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/util/build-id.c 2010-04-19 09:54:07.973043957 +0800
@@ -24,7 +24,7 @@ static int build_id__mark_dso_hit(event_
}
thread__find_addr_map(thread, session, cpumode, MAP__FUNCTION,
- event->ip.ip, &al);
+ event->ip.pid, event->ip.ip, &al);
if (al.map != NULL)
al.map->dso->hit = 1;
diff -Nraup linux-2.6_tip0417/tools/perf/util/event.c linux-2.6_tip0417_perfkvm/tools/perf/util/event.c
--- linux-2.6_tip0417/tools/perf/util/event.c 2010-04-19 09:52:40.341224763 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/util/event.c 2010-04-19 14:08:08.723999849 +0800
@@ -112,7 +112,11 @@ static int event__synthesize_mmap_events
event_t ev = {
.header = {
.type = PERF_RECORD_MMAP,
- .misc = 0, /* Just like the kernel, see kernel/perf_event.c __perf_event_mmap */
+ /*
+ * Just like the kernel, see __perf_event_mmap
+ * in kernel/perf_event.c
+ */
+ .misc = PERF_RECORD_MISC_USER,
},
};
int n;
@@ -167,11 +171,23 @@ static int event__synthesize_mmap_events
}
int event__synthesize_modules(event__handler_t process,
- struct perf_session *session)
+ struct perf_session *session,
+ struct kernel_info *kerninfo)
{
struct rb_node *nd;
+ struct map_groups *kmaps = &kerninfo->kmaps;
+ u16 misc;
- for (nd = rb_first(&session->kmaps.maps[MAP__FUNCTION]);
+ /*
+ * kernel uses PERF_RECORD_MISC_USER for user space maps,
+ * see kernel/perf_event.c __perf_event_mmap
+ */
+ if (is_host_kernel(kerninfo))
+ misc = PERF_RECORD_MISC_KERNEL;
+ else
event_t ev = {
.header = {
.type = PERF_RECORD_MMAP,
+ "%s%s", mmap_name, symbol_name) + 1;
size = ALIGN(size, sizeof(u64));
- ev.mmap.header.size = (sizeof(ev.mmap) - (sizeof(ev.mmap.filename) - size));
+ ev.mmap.header.size = (sizeof(ev.mmap) -
+ (sizeof(ev.mmap.filename) - size));
ev.mmap.pgoff = args.start;
- ev.mmap.start = session->vmlinux_maps[MAP__FUNCTION]->start;
- ev.mmap.len = session->vmlinux_maps[MAP__FUNCTION]->end - ev.mmap.start ;
+ ev.mmap.start = map->start;
+ ev.mmap.len = map->end - ev.mmap.start;
+ ev.mmap.pid = kerninfo->pid;
return process(&ev, session);
}
@@ -329,22 +372,50 @@ int event__process_lost(event_t *self, s
return 0;
}
-int event__process_mmap(event_t *self, struct perf_session *session)
+static void event_set_kernel_mmap_len(struct map **maps, event_t *self)
+{
+ maps[MAP__FUNCTION]->start = self->mmap.start;
+ maps[MAP__FUNCTION]->end = self->mmap.start + self->mmap.len;
+ /*
+ * Be a bit paranoid here, some perf.data file came with
+ * a zero sized synthesized MMAP event for the kernel.
+ */
+ if (maps[MAP__FUNCTION]->end == 0)
+ maps[MAP__FUNCTION]->end = ~0UL;
+}
+
+static int event__process_kernel_mmap(event_t *self,
+ struct perf_session *session)
{
- struct thread *thread;
struct map *map;
+ char kmmap_prefix[PATH_MAX];
+ struct kernel_info *kerninfo;
+ enum dso_kernel_type kernel_type;
+ bool is_kernel_mmap;
+
+ kerninfo = kerninfo__findnew(&session->kerninfo_root, self->mmap.pid);
+ if (!kerninfo) {
+ pr_err("Can't find id %d's kerninfo\n", self->mmap.pid);
+ goto out_problem;
+ }
- dump_printf(" %d/%d: [%#Lx(%#Lx) @ %#Lx]: %s\n",
- self->mmap.pid, self->mmap.tid, self->mmap.start,
- self->mmap.len, self->mmap.pgoff, self->mmap.filename);
+ kern_mmap_name(kerninfo, kmmap_prefix);
+ if (is_host_kernel(kerninfo))
+ kernel_type = DSO_TYPE_KERNEL;
+ else
+ kernel_type = DSO_TYPE_GUEST_KERNEL;
- if (self->mmap.pid == 0) {
- static const char kmmap_prefix[] = "[kernel.kallsyms.";
+ is_kernel_mmap = memcmp(self->mmap.filename,
+ kmmap_prefix,
+ strlen(kmmap_prefix)) == 0;
+ if (self->mmap.filename[0] == '/' ||
+ (!is_kernel_mmap && self->mmap.filename[0] == '[')) {
- if (self->mmap.filename[0] == '/') {
- char short_module_name[1024];
- char *name = strrchr(self->mmap.filename, '/'), *dot;
+ char short_module_name[1024];
+ char *name, *dot;
+ if (self->mmap.filename[0] == '/') {
+ name = strrchr(self->mmap.filename, '/');
if (name == NULL)
goto out_problem;
@@ -352,59 +423,86 @@ int event__process_mmap(event_t *self, s
dot = strrchr(name, '.');
if (dot == NULL)
goto out_problem;
-
snprintf(short_module_name, sizeof(short_module_name),
- "[%.*s]", (int)(dot - name), name);
+ "[%.*s]", (int)(dot - name), name);
strxfrchar(short_module_name, '-', '_');
+ } else
+ strcpy(short_module_name, self->mmap.filename);
- map = perf_session__new_module_map(session,
- self->mmap.start,
- self->mmap.filename);
- if (map == NULL)
- goto out_problem;
-
- name = strdup(short_module_name);
- if (name == NULL)
- goto out_problem;
-
- map->dso->short_name = name;
- map->end = map->start + self->mmap.len;
- } else if (memcmp(self->mmap.filename, kmmap_prefix,
- sizeof(kmmap_prefix) - 1) == 0) {
- const char *symbol_name = (self->mmap.filename +
- sizeof(kmmap_prefix) - 1);
+ map = map_groups__new_module(&kerninfo->kmaps,
+ self->mmap.start,
+ self->mmap.filename,
+ kerninfo);
+ if (map == NULL)
+ goto out_problem;
+
+ name = strdup(short_module_name);
+ if (name == NULL)
+ goto out_problem;
+
+ map->dso->short_name = name;
+ map->end = map->start + self->mmap.len;
+ } else if (is_kernel_mmap) {
+ const char *symbol_name = (self->mmap.filename +
+ strlen(kmmap_prefix));
+ /*
+ * Should be there already, from the build-id table in
+ * the header.
+ */
+ struct dso *kernel = __dsos__findnew(&kerninfo->dsos__kernel,
+ kmmap_prefix);
+ if (kernel == NULL)
+ goto out_problem;
+
+ kernel->kernel = kernel_type;
+ if (__map_groups__create_kernel_maps(&kerninfo->kmaps,
+ kerninfo->vmlinux_maps, kernel) < 0)
+ goto out_problem;
+
+ event_set_kernel_mmap_len(kerninfo->vmlinux_maps, self);
+ perf_session__set_kallsyms_ref_reloc_sym(kerninfo->vmlinux_maps,
+ symbol_name,
+ self->mmap.pgoff);
+ if (is_default_guest(kerninfo)) {
/*
- * Should be there already, from the build-id table in
- * the header.
+ * preload dso of guest kernel and modules
*/
- struct dso *kernel = __dsos__findnew(&dsos__kernel,
- "[kernel.kallsyms]");
- if (kernel == NULL)
- goto out_problem;
+ dso__load(kernel,
+ kerninfo->vmlinux_maps[MAP__FUNCTION],
+ NULL);
+ }
+ }
+ return 0;
+out_problem:
+ return -1;
+}
- kernel->kernel = 1;
- if (__perf_session__create_kernel_maps(session, kernel) < 0)
- goto out_problem;
+int event__process_mmap(event_t *self, struct perf_session *session)
+{
+ struct kernel_info *kerninfo;
+ struct thread *thread;
+ struct map *map;
+ u8 cpumode = self->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
+ int ret = 0;
- session->vmlinux_maps[MAP__FUNCTION]->start = self->mmap.start;
- session->vmlinux_maps[MAP__FUNCTION]->end = self->mmap.start + self->mmap.len;
- /*
- * Be a bit paranoid here, some perf.data file came with
- * a zero sized synthesized MMAP event for the kernel.
- */
- if (session->vmlinux_maps[MAP__FUNCTION]->end == 0)
- session->vmlinux_maps[MAP__FUNCTION]->end = ~0UL;
+ dump_printf(" %d/%d: [%#Lx(%#Lx) @ %#Lx]: %s\n",
+ self->mmap.pid, self->mmap.tid, self->mmap.start,
+ self->mmap.len, self->mmap.pgoff, self->mmap.filename);
- perf_session__set_kallsyms_ref_reloc_sym(session, symbol_name,
- self->mmap.pgoff);
- }
+ if (cpumode == PERF_RECORD_MISC_GUEST_KERNEL ||
+ cpumode == PERF_RECORD_MISC_KERNEL) {
+ ret = event__process_kernel_mmap(self, session);
+ if (ret < 0)
+ goto out_problem;
return 0;
}
thread = perf_session__findnew(session, self->mmap.pid);
- map = map__new(self->mmap.start, self->mmap.len, self->mmap.pgoff,
- self->mmap.pid, self->mmap.filename, MAP__FUNCTION,
- session->cwd, session->cwdlen);
+ kerninfo = kerninfo__findhost(&session->kerninfo_root);
+ map = map__new(&kerninfo->dsos__user, self->mmap.start,
+ self->mmap.len, self->mmap.pgoff,
+ self->mmap.pid, self->mmap.filename,
+ MAP__FUNCTION, session->cwd, session->cwdlen);
if (thread == NULL || map == NULL)
goto out_problem;
@@ -444,22 +542,52 @@ int event__process_task(event_t *self, s
@@ -474,8 +602,11 @@ try_again:
* "[vdso]" dso, but for now lets use the old trick of looking
* in the whole kernel symbol list.
*/
- if ((long long)al->addr < 0 && mg != &session->kmaps) {
- mg = &session->kmaps;
+ if ((long long)al->addr < 0 &&
+ cpumode == PERF_RECORD_MISC_KERNEL &&
+ kerninfo &&
+ mg != &kerninfo->kmaps) {
+ mg = &kerninfo->kmaps;
goto try_again;
}
} else
@@ -484,11 +615,11 @@ try_again:
void thread__find_addr_location(struct thread *self,
struct perf_session *session, u8 cpumode,
- enum map_type type, u64 addr,
+ enum map_type type, pid_t pid, u64 addr,
struct addr_location *al,
symbol_filter_t filter)
{
- thread__find_addr_map(self, session, cpumode, type, addr, al);
+ thread__find_addr_map(self, session, cpumode, type, pid, addr, al);
if (al->map != NULL)
al->sym = map__find_symbol(al->map, al->addr, filter);
else
@@ -524,7 +655,7 @@ int event__preprocess_sample(const event
dump_printf(" ... thread: %s:%d\n", thread->comm, thread->pid);
thread__find_addr_map(thread, session, cpumode, MAP__FUNCTION,
- self->ip.ip, al);
+ self->ip.pid, self->ip.ip, al);
dump_printf(" ...... dso: %s\n",
al->map ? al->map->dso->long_name :
al->level == 'H' ? "[hypervisor]" : "<not found>");
@@ -554,7 +685,6 @@ int event__preprocess_sample(const event
!strlist__has_entry(symbol_conf.sym_list, al->sym->name))
goto out_filtered;
- al->filtered = false;
return 0;
out_filtered:
diff -Nraup linux-2.6_tip0417/tools/perf/util/event.h linux-2.6_tip0417_perfkvm/tools/perf/util/event.h
--- linux-2.6_tip0417/tools/perf/util/event.h 2010-04-19 09:52:40.321193673 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/util/event.h 2010-04-19 09:54:07.974012967 +0800
@@ -79,6 +79,7 @@ struct sample_data {
struct build_id_event {
struct perf_event_header header;
+ pid_t pid;
u8 build_id[ALIGN(BUILD_ID_SIZE, sizeof(u64))];
char filename[];
};
@@ -154,10 +155,13 @@ int event__synthesize_thread(pid_t pid,
void event__synthesize_threads(event__handler_t process,
struct perf_session *session);
int event__synthesize_kernel_mmap(event__handler_t process,
- struct perf_session *session,
- const char *symbol_name);
+ struct perf_session *session,
+ struct kernel_info *kerninfo,
+ const char *symbol_name);
+
int event__synthesize_modules(event__handler_t process,
- struct perf_session *session);
+ struct perf_session *session,
+ struct kernel_info *kerninfo);
int event__process_comm(event_t *self, struct perf_session *session);
int event__process_lost(event_t *self, struct perf_session *session);
diff -Nraup linux-2.6_tip0417/tools/perf/util/header.c linux-2.6_tip0417_perfkvm/tools/perf/util/header.c
--- linux-2.6_tip0417/tools/perf/util/header.c 2010-04-19 09:52:40.294227060 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/util/header.c 2010-04-19 11:01:33.882483101 +0800
@@ -190,7 +190,8 @@ static int write_padded(int fd, const vo
continue; \
else
-static int __dsos__write_buildid_table(struct list_head *head, u16 misc, int fd)
+static int __dsos__write_buildid_table(struct list_head *head, pid_t pid,
+ u16 misc, int fd)
{
struct dso *pos;
@@ -205,6 +206,7 @@ static int __dsos__write_buildid_table(s
len = ALIGN(len, NAME_ALIGN);
memset(&b, 0, sizeof(b));
memcpy(&b.build_id, pos->build_id, sizeof(pos->build_id));
+ b.pid = pid;
b.header.misc = misc;
b.header.size = sizeof(b) + len;
err = do_write(fd, &b, sizeof(b));
@@ -219,13 +221,33 @@ static int __dsos__write_buildid_table(s
@@ -342,9 +364,12 @@ static int __dsos__cache_build_ids(struc
return err;
}
-static int dsos__cache_build_ids(void)
+static int dsos__cache_build_ids(struct perf_header *self)
{
- int err_kernel, err_user;
+ struct perf_session *session = container_of(self,
+ struct perf_session, header);
+ struct rb_node *nd;
+ int ret = 0;
char debugdir[PATH_MAX];
snprintf(debugdir, sizeof(debugdir), "%s/%s", getenv("HOME"),
@@ -353,9 +378,30 @@ static int dsos__cache_build_ids(void)
@@ -366,7 +412,7 @@ static int perf_header__adds_write(struc
u64 sec_start;
int idx = 0, err;
- if (dsos__read_build_ids(true))
+ if (dsos__read_build_ids(self, true))
perf_header__set_feat(self, HEADER_BUILD_ID);
nr_sections = bitmap_weight(self->adds_features, HEADER_FEAT_BITS);
@@ -401,14 +447,14 @@ static int perf_header__adds_write(struc
/* Write build-ids */
buildid_sec->offset = lseek(fd, 0, SEEK_CUR);
- err = dsos__write_buildid_table(fd);
+ err = dsos__write_buildid_table(self, fd);
if (err < 0) {
pr_debug("failed to write buildid table\n");
goto out_free;
}
buildid_sec->size = lseek(fd, 0, SEEK_CUR) -
buildid_sec->offset;
- dsos__cache_build_ids();
+ dsos__cache_build_ids(self);
}
lseek(fd, sec_start, SEEK_SET);
@@ -633,6 +679,85 @@ int perf_file_header__read(struct perf_f
return 0;
}
+static int __event_process_build_id(struct build_id_event *bev,
+ char *filename,
+ struct perf_session *session)
+{
+ int err = -1;
+ struct list_head *head;
+ struct kernel_info *kerninfo;
+ u16 misc;
+ struct dso *dso;
+ enum dso_kernel_type dso_type;
+
+ kerninfo = kerninfo__findnew(&session->kerninfo_root, bev->pid);
+ if (!kerninfo)
+ goto out;
+
+ misc = bev->header.misc & PERF_RECORD_MISC_CPUMODE_MASK;
+
+ switch (misc) {
+ case PERF_RECORD_MISC_KERNEL:
+ dso_type = DSO_TYPE_KERNEL;
+ head = &kerninfo->dsos__kernel;
+ break;
+ case PERF_RECORD_MISC_GUEST_KERNEL:
+ dso_type = DSO_TYPE_GUEST_KERNEL;
+ head = &kerninfo->dsos__kernel;
+ break;
+ case PERF_RECORD_MISC_USER:
+ case PERF_RECORD_MISC_GUEST_USER:
+ dso_type = DSO_TYPE_USER;
+ head = &kerninfo->dsos__user;
+ break;
+ default:
+ goto out;
+ }
+
+ dso = __dsos__findnew(head, filename);
+ if (dso != NULL) {
+ dso__set_build_id(dso, &bev->build_id);
+ if (filename[0] == '[')
+ dso->kernel = dso_type;
+ }
+
+ err = 0;
+out:
+ return err;
+}
+
+static int perf_header__read_build_ids(struct perf_header *self,
+ int input, u64 offset, u64 size)
+{
+ struct perf_session *session = container_of(self,
+ struct perf_session, header);
+ struct build_id_event bev;
+ char filename[PATH_MAX];
+ u64 limit = offset + size;
+ int err = -1;
+
+ while (offset < limit) {
+ ssize_t len;
+
+ if (read(input, &bev, sizeof(bev)) != sizeof(bev))
+ goto out;
+
+ if (self->needs_swap)
+ perf_event_header__bswap(&bev.header);
+
+ len = bev.header.size - sizeof(bev);
+ if (read(input, filename, len) != len)
+ goto out;
+
+ __event_process_build_id(&bev, filename, session);
+
+ offset += bev.header.size;
+ }
+ err = 0;
+out:
+ return err;
+}
+
static int perf_file_section__process(struct perf_file_section *self,
struct perf_header *ph,
int feat, int fd)
@@ -989,6 +1114,7 @@ int event__process_tracing_data(event_t
int event__synthesize_build_id(struct dso *pos, u16 misc,
event__handler_t process,
+ struct kernel_info *kerninfo,
struct perf_session *session)
{
event_t ev;
@@ -1005,6 +1131,7 @@ int event__synthesize_build_id(struct ds
memcpy(&ev.build_id.build_id, pos->build_id, sizeof(pos->build_id));
ev.build_id.header.type = PERF_RECORD_HEADER_BUILD_ID;
ev.build_id.header.misc = misc;
+ ev.build_id.pid = kerninfo->pid;
ev.build_id.header.size = sizeof(ev.build_id) + len;
memcpy(&ev.build_id.filename, pos->long_name, pos->long_name_len);
@@ -1015,6 +1142,7 @@ int event__synthesize_build_id(struct ds
static int __event_synthesize_build_ids(struct list_head *head, u16 misc,
event__handler_t process,
+ struct kernel_info *kerninfo,
struct perf_session *session)
{
struct dso *pos;
@@ -1024,7 +1152,8 @@ static int __event_synthesize_build_ids(
if (!pos->hit)
continue;
- err = event__synthesize_build_id(pos, misc, process, session);
+ err = event__synthesize_build_id(pos, misc, process,
+ kerninfo, session);
if (err < 0)
return err;
}
@@ -1035,44 +1164,48 @@ static int __event_synthesize_build_ids(
int event__synthesize_build_ids(event__handler_t process,
struct perf_session *session)
{
- int err;
+ int err = 0;
+ u16 kmisc, umisc;
+ struct kernel_info *pos;
+ struct rb_node *nd;
- if (!dsos__read_build_ids(true))
+ if (!dsos__read_build_ids(&session->header, true))
return 0;
- err = __event_synthesize_build_ids(&dsos__kernel,
- PERF_RECORD_MISC_KERNEL,
- process, session);
- if (err == 0)
- err = __event_synthesize_build_ids(&dsos__user,
- PERF_RECORD_MISC_USER,
- process, session);
+ for (nd = rb_first(&session->kerninfo_root); nd; nd = rb_next(nd)) {
+ pos = rb_entry(nd, struct kernel_info, rb_node);
+ if (is_host_kernel(pos)) {
+ kmisc = PERF_RECORD_MISC_KERNEL;
+ umisc = PERF_RECORD_MISC_USER;
+ } else {
+ kmisc = PERF_RECORD_MISC_GUEST_KERNEL;
+ umisc = PERF_RECORD_MISC_GUEST_USER;
+ }
+
+ err = __event_synthesize_build_ids(&pos->dsos__kernel,
+ kmisc, process, pos, session);
+ if (err == 0)
+ err = __event_synthesize_build_ids(&pos->dsos__user,
+ umisc, process, pos, session);
+ if (err)
+ break;
+ }
if (err < 0) {
pr_debug("failed to synthesize build ids\n");
return err;
}
- dsos__cache_build_ids();
+ dsos__cache_build_ids(&session->header);
return 0;
}
int event__process_build_id(event_t *self,
- struct perf_session *session __unused)
+ struct perf_session *session)
{
- struct list_head *head = &dsos__user;
- struct dso *dso;
-
- if (self->build_id.header.misc & PERF_RECORD_MISC_KERNEL)
- head = &dsos__kernel;
-
- dso = __dsos__findnew(head, self->build_id.filename);
- if (dso != NULL) {
- dso__set_build_id(dso, &self->build_id.build_id);
- if (head == &dsos__kernel && self->build_id.filename[0] == '[')
- dso->kernel = 1;
- }
-
+ __event_process_build_id(&self->build_id,
+ self->build_id.filename,
+ session);
return 0;
}
diff -Nraup linux-2.6_tip0417/tools/perf/util/header.h linux-2.6_tip0417_perfkvm/tools/perf/util/header.h
--- linux-2.6_tip0417/tools/perf/util/header.h 2010-04-19 09:52:40.497193513 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/util/header.h 2010-04-19 10:27:04.070333535 +0800
@@ -120,6 +120,7 @@ int event__process_tracing_data(event_t
int event__synthesize_build_id(struct dso *pos, u16 misc,
event__handler_t process,
+ struct kernel_info *kerninfo,
struct perf_session *session);
int event__synthesize_build_ids(event__handler_t process,
struct perf_session *session);
diff -Nraup linux-2.6_tip0417/tools/perf/util/hist.c linux-2.6_tip0417_perfkvm/tools/perf/util/hist.c
--- linux-2.6_tip0417/tools/perf/util/hist.c 2010-04-19 09:52:40.498255781 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/util/hist.c 2010-04-19 09:54:07.974012967 +0800
diff -Nraup linux-2.6_tip0417/tools/perf/util/hist.h linux-2.6_tip0417_perfkvm/tools/perf/util/hist.h
--- linux-2.6_tip0417/tools/perf/util/hist.h 2010-04-19 09:52:40.484204361 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/util/hist.h 2010-04-19 09:54:07.975012737 +0800
@@ -12,6 +12,9 @@ struct addr_location;
struct symbol;
struct rb_root;
+void __perf_session__add_count(struct hist_entry *he,
+ struct addr_location *al,
+ u64 count);
struct hist_entry *__perf_session__add_hist_entry(struct rb_root *hists,
struct addr_location *al,
struct symbol *parent,
diff -Nraup linux-2.6_tip0417/tools/perf/util/map.c linux-2.6_tip0417_perfkvm/tools/perf/util/map.c
--- linux-2.6_tip0417/tools/perf/util/map.c 2010-04-19 09:52:40.327249455 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/util/map.c 2010-04-19 14:07:37.164999850 +0800
@@ -508,3 +512,135 @@ struct map *maps__find(struct rb_root *m
return NULL;
}
+
+struct kernel_info *add_new_kernel_info(struct rb_root *kerninfo_root,
+ pid_t pid, const char *root_dir)
+{
+ struct rb_node **p = &kerninfo_root->rb_node;
+ struct rb_node *parent = NULL;
+ struct kernel_info *kerninfo, *pos;
+
+ kerninfo = malloc(sizeof(struct kernel_info));
+ if (!kerninfo)
+ return NULL;
+
+ return NULL;
+}
+
diff -Nraup linux-2.6_tip0417/tools/perf/util/map.h linux-2.6_tip0417_perfkvm/tools/perf/util/map.h
--- linux-2.6_tip0417/tools/perf/util/map.h 2010-04-19 09:52:40.495232926 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/util/map.h 2010-04-19 14:07:46.220999851 +0800
@@ -106,9 +124,40 @@ int map_groups__clone(struct map_groups
size_t map_groups__fprintf(struct map_groups *self, int verbose, FILE *fp);
size_t map_groups__fprintf_maps(struct map_groups *self, int verbose, FILE *fp);
+struct kernel_info *add_new_kernel_info(struct rb_root *kerninfo_root,
+ pid_t pid, const char *root_dir);
+struct kernel_info *kerninfo__find(struct rb_root *kerninfo_root, pid_t pid);
+struct kernel_info *kerninfo__findnew(struct rb_root *kerninfo_root, pid_t pid);
+struct kernel_info *kerninfo__findhost(struct rb_root *kerninfo_root);
+char *kern_mmap_name(struct kernel_info *kerninfo, char *buff);
@@ -148,13 +197,11 @@ int map_groups__fixup_overlappings(struc
struct map *map_groups__find_by_name(struct map_groups *self,
enum map_type type, const char *name);
-int __map_groups__create_kernel_maps(struct map_groups *self,
- struct map *vmlinux_maps[MAP__NR_TYPES],
- struct dso *kernel);
-int map_groups__create_kernel_maps(struct map_groups *self,
- struct map *vmlinux_maps[MAP__NR_TYPES]);
-struct map *map_groups__new_module(struct map_groups *self, u64 start,
- const char *filename);
+struct map *map_groups__new_module(struct map_groups *self,
+ u64 start,
+ const char *filename,
+ struct kernel_info *kerninfo);
+
void map_groups__flush(struct map_groups *self);
#endif /* __PERF_MAP_H */
diff -Nraup linux-2.6_tip0417/tools/perf/util/probe-event.c linux-2.6_tip0417_perfkvm/tools/perf/util/probe-event.c
--- linux-2.6_tip0417/tools/perf/util/probe-event.c 2010-04-19 09:52:40.303196350 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/util/probe-event.c 2010-04-19 09:56:57.867528798 +0800
@@ -78,6 +78,7 @@ static struct map *kmaps[MAP__NR_TYPES];
/* Initialize symbol maps and path of vmlinux */
static int init_vmlinux(void)
{
+ struct dso *kernel;
int ret;
symbol_conf.sort_by_name = true;
@@ -91,8 +92,12 @@ static int init_vmlinux(void)
goto out;
}
+ kernel = dso__new_kernel(symbol_conf.vmlinux_name);
+ if (kernel == NULL)
+ die("Failed to create kernel dso.");
+
map_groups__init(&kmap_groups);
- ret = map_groups__create_kernel_maps(&kmap_groups, kmaps);
+ ret = __map_groups__create_kernel_maps(&kmap_groups, kmaps, kernel);
if (ret < 0)
pr_debug("Failed to create kernel maps.\n");
diff -Nraup linux-2.6_tip0417/tools/perf/util/session.c linux-2.6_tip0417_perfkvm/tools/perf/util/session.c
--- linux-2.6_tip0417/tools/perf/util/session.c 2010-04-19 09:52:40.522161194 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/util/session.c 2010-04-19 09:54:07.976012541 +0800
@@ -67,6 +67,17 @@ void perf_session__update_sample_type(st
self->sample_type = perf_header__sample_type(&self->header);
}
+int perf_session__create_kernel_maps(struct perf_session *self)
+{
+ int ret;
+ struct rb_root *root = &self->kerninfo_root;
+
+ ret = map_groups__create_kernel_maps(root, HOST_KERNEL_ID);
+ if (ret >= 0)
+ ret = map_groups__create_guest_kernel_maps(root);
+ return ret;
+}
+
struct perf_session *perf_session__new(const char *filename, int mode, bool force)
{
size_t len = filename ? strlen(filename) + 1 : 0;
@@ -86,7 +97,7 @@ struct perf_session *perf_session__new(c
self->cwd = NULL;
self->cwdlen = 0;
self->unknown_events = 0;
- map_groups__init(&self->kmaps);
+ self->kerninfo_root = RB_ROOT;
if (mode == O_RDONLY) {
if (perf_session__open(self, force) < 0)
@@ -157,8 +168,9 @@ struct map_symbol *perf_session__resolve
continue;
}
+ al.filtered = false;
thread__find_addr_location(thread, self, cpumode,
- MAP__FUNCTION, ip, &al, NULL);
+ MAP__FUNCTION, thread->pid, ip, &al, NULL);
if (al.sym != NULL) {
if (sort__has_parent && !*parent &&
symbol__match_parent_regex(al.sym))
@@ -399,46 +411,6 @@ void perf_event_header__bswap(struct per
@@ -690,26 +662,33 @@ bool perf_session__has_traces(struct per
diff -Nraup linux-2.6_tip0417/tools/perf/util/session.h linux-2.6_tip0417_perfkvm/tools/perf/util/session.h
--- linux-2.6_tip0417/tools/perf/util/session.h 2010-04-19 09:52:40.296197738 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/util/session.h 2010-04-19 09:54:07.976012541 +0800
@@ -15,17 +15,15 @@ struct perf_session {
struct perf_header header;
unsigned long size;
unsigned long mmap_window;
- struct map_groups kmaps;
struct rb_root threads;
struct thread *last_match;
- struct map *vmlinux_maps[MAP__NR_TYPES];
+ struct rb_root kerninfo_root;
struct events_stats events_stats;
struct rb_root stats_by_id;
unsigned long event_total[PERF_RECORD_MAX];
unsigned long unknown_events;
struct rb_root hists;
u64 sample_type;
- struct ref_reloc_sym ref_reloc_sym;
int fd;
bool fd_pipe;
int cwdlen;
@@ -69,33 +67,13 @@ struct map_symbol *perf_session__resolve
int do_read(int fd, void *buf, size_t size);
void perf_session__update_sample_type(struct perf_session *self);
diff -Nraup linux-2.6_tip0417/tools/perf/util/sort.h linux-2.6_tip0417_perfkvm/tools/perf/util/sort.h
--- linux-2.6_tip0417/tools/perf/util/sort.h 2010-04-19 09:52:40.300228890 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/util/sort.h 2010-04-19 09:54:07.976012541 +0800
@@ -44,6 +44,11 @@ extern enum sort_type sort__first_dimens
struct hist_entry {
struct rb_node rb_node;
u64 count;
+ u64 count_sys;
+ u64 count_us;
+ u64 count_guest_sys;
+ u64 count_guest_us;
+
/*
* XXX WARNING!
* thread _has_ to come after ms, see
diff -Nraup linux-2.6_tip0417/tools/perf/util/symbol.c linux-2.6_tip0417_perfkvm/tools/perf/util/symbol.c
--- linux-2.6_tip0417/tools/perf/util/symbol.c 2010-04-19 09:52:40.301197165 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/util/symbol.c 2010-04-19 14:08:50.332999850 +0800
+ return NULL;
+
+ kern_mmap_name(kerninfo, path);
+ kern_mmap_name(kerninfo, buff);
diff -Nraup linux-2.6_tip0417/tools/perf/util/symbol.h linux-2.6_tip0417_perfkvm/tools/perf/util/symbol.h
--- linux-2.6_tip0417/tools/perf/util/symbol.h 2010-04-19 09:52:40.498255781 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/util/symbol.h 2010-04-19 09:54:07.977012528 +0800
diff -Nraup linux-2.6_tip0417/tools/perf/util/thread.h linux-2.6_tip0417_perfkvm/tools/perf/util/thread.h
--- linux-2.6_tip0417/tools/perf/util/thread.h 2010-04-19 09:52:40.294227060 +0800
+++ linux-2.6_tip0417_perfkvm/tools/perf/util/thread.h 2010-04-19 09:54:07.977012528 +0800
@@ -33,12 +33,12 @@ static inline struct map *thread__find_m
void thread__find_addr_map(struct thread *self,
struct perf_session *session, u8 cpumode,
- enum map_type type, u64 addr,
+ enum map_type type, pid_t pid, u64 addr,
struct addr_location *al);
void thread__find_addr_location(struct thread *self,
struct perf_session *session, u8 cpumode,
- enum map_type type, u64 addr,
+ enum map_type type, pid_t pid, u64 addr,
struct addr_location *al,
symbol_filter_t filter);
#endif /* __PERF_THREAD_H */
> Here is the new patch of V5 against tip/master of April 17th if anyone wants
> to try it.
Ok, this looks pretty good from the perf angle - so once Avi likes patches #1
and #2 and creates a pullable branch we can apply #3 as well to tip:perf/core
and put it on the potential-2.6.35-merge road.
Thanks,
Ingo
Re-reading again (esp. the part about treatment of indirect NMI
vmexits), I think this was wrong, and that the code is correct. I am
now thoroughly confused.
--
error compiling committee.c: too many arguments to function
--
This doesn't apply against upstream. What branch was this generated
against?
--
> What branch was this generated
> against?
>
It's against the latest tip/master. I checked out 19b26586090 because the latest
tip/master has some updates to perf.
Yes, sorry for being unclear.
>> What branch was this generated
>> against?
>>
>>
> It's against the latest tip/master. I checked out to 19b26586090 as the latest
> tip/master has some updates on perf.
>
I don't want to merge tip/master... does tip/perf/core contain the
needed updates?
--
Note, given that there won't be changes to NMI handling in kvm, we can
go the simpler route of merging all the patches in tip/perf/core.
Thanks. I applied all three patches to the 'perf' branch in kvm.git and
merged it to master.
Ingo, please pull
git://git.kernel.org/pub/scm/virt/kvm/kvm.git perf
into tip's perf/core to receive those three patches.
--
> + unsigned long ip;
> + if (perf_guest_cbs && perf_guest_cbs->is_in_guest())
missing newline.
> + int misc = 0;
> + if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
ditto.
> + PERF_RECORD_MISC_GUEST_KERNEL;
> + } else
> + misc |= user_mode(regs) ? PERF_RECORD_MISC_USER :
> + PERF_RECORD_MISC_KERNEL;
- unbalanced curly braces
- missing curly brace for multi-line statement.
- unnecessary line-break due to col80 warning from checkpatch
> +extern struct perf_guest_info_callbacks *perf_guest_cbs;
> +extern int perf_register_guest_info_callbacks(
> + struct perf_guest_info_callbacks *);
> +extern int perf_unregister_guest_info_callbacks(
> + struct perf_guest_info_callbacks *);
- unnecessary line-break due to col80 warning from checkpatch
> +static inline int perf_register_guest_info_callbacks
> +(struct perf_guest_info_callbacks *) {return 0; }
> +static inline int perf_unregister_guest_info_callbacks
> +(struct perf_guest_info_callbacks *) {return 0; }
- invalid C: function parameter needs name even if unused
- missing space after opening curly brace
Please provide delta fixes.
Ingo
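The "invalid C" point can be reproduced outside the kernel. What follows is a minimal sketch, not the kernel code: struct demo_cbs and the demo_* function names are hypothetical stand-ins for struct perf_guest_info_callbacks and its register/unregister stubs. A prototype may leave a parameter unnamed, but a function definition has to name it, even when the body never uses it.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct perf_guest_info_callbacks. */
struct demo_cbs {
	int (*is_in_guest)(void);
};

/* A prototype may omit the parameter name; this is valid C. */
int demo_register_cbs(struct demo_cbs *);

/*
 * A definition must name the parameter, even if it is unused.
 * Writing "(struct demo_cbs *) { return 0; }" here is what produces
 * "error: parameter name omitted" with the compilers used in this thread.
 */
static inline int demo_register_cbs_stub(struct demo_cbs *cbs)
{
	(void)cbs;	/* deliberately unused in the stub */
	return 0;
}
```

The stub keeps the same signature as the real function would have, so callers compile unchanged whether or not the feature is configured in.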
Here is the fix on top of the prior three V5 patches.
From: Zhang, Yanmin <yanmin...@linux.intel.com>
Fix some programming style issues on top of the perf kvm
enhancement V5.
Signed-off-by: Zhang Yanmin <yanmin...@linux.intel.com>
---
diff -Nraup linux-2.6_tip0417_perfkvm/arch/x86/kernel/cpu/perf_event.c linux-2.6_tip0417_perfkvmstyle/arch/x86/kernel/cpu/perf_event.c
--- linux-2.6_tip0417_perfkvm/arch/x86/kernel/cpu/perf_event.c 2010-04-19 09:53:59.689452915 +0800
+++ linux-2.6_tip0417_perfkvmstyle/arch/x86/kernel/cpu/perf_event.c 2010-04-20 10:48:18.500999849 +0800
@@ -1752,23 +1752,29 @@ void perf_arch_fetch_caller_regs(struct
unsigned long perf_instruction_pointer(struct pt_regs *regs)
{
unsigned long ip;
+
if (perf_guest_cbs && perf_guest_cbs->is_in_guest())
ip = perf_guest_cbs->get_guest_ip();
else
ip = instruction_pointer(regs);
+
return ip;
}
unsigned long perf_misc_flags(struct pt_regs *regs)
{
int misc = 0;
+
if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
- misc |= perf_guest_cbs->is_user_mode() ?
- PERF_RECORD_MISC_GUEST_USER :
- PERF_RECORD_MISC_GUEST_KERNEL;
- } else
- misc |= user_mode(regs) ? PERF_RECORD_MISC_USER :
- PERF_RECORD_MISC_KERNEL;
+ if (perf_guest_cbs->is_user_mode())
+ misc |= PERF_RECORD_MISC_GUEST_USER;
+ else
+ misc |= PERF_RECORD_MISC_GUEST_KERNEL;
+ } else if (user_mode(regs))
+ misc |= PERF_RECORD_MISC_USER;
+ else
+ misc |= PERF_RECORD_MISC_KERNEL;
+
if (regs->flags & PERF_EFLAGS_EXACT)
misc |= PERF_RECORD_MISC_EXACT;
diff -Nraup linux-2.6_tip0417_perfkvm/arch/x86/kvm/x86.c linux-2.6_tip0417_perfkvmstyle/arch/x86/kvm/x86.c
--- linux-2.6_tip0417_perfkvm/arch/x86/kvm/x86.c 2010-04-19 09:53:59.691378953 +0800
+++ linux-2.6_tip0417_perfkvmstyle/arch/x86/kvm/x86.c 2010-04-20 10:11:40.507545564 +0800
@@ -3776,16 +3776,20 @@ static int kvm_is_in_guest(void)
static int kvm_is_user_mode(void)
{
int user_mode = 3;
+
if (percpu_read(current_vcpu))
user_mode = kvm_x86_ops->get_cpl(percpu_read(current_vcpu));
+
return user_mode != 0;
}
static unsigned long kvm_get_guest_ip(void)
{
unsigned long ip = 0;
+
if (percpu_read(current_vcpu))
ip = kvm_rip_read(percpu_read(current_vcpu));
+
return ip;
}
diff -Nraup linux-2.6_tip0417_perfkvm/include/linux/perf_event.h linux-2.6_tip0417_perfkvmstyle/include/linux/perf_event.h
--- linux-2.6_tip0417_perfkvm/include/linux/perf_event.h 2010-04-19 09:53:59.691378953 +0800
+++ linux-2.6_tip0417_perfkvmstyle/include/linux/perf_event.h 2010-04-20 10:08:03.531551890 +0800
@@ -941,10 +941,8 @@ static inline void perf_event_mmap(struc
}
extern struct perf_guest_info_callbacks *perf_guest_cbs;
-extern int perf_register_guest_info_callbacks(
- struct perf_guest_info_callbacks *);
-extern int perf_unregister_guest_info_callbacks(
- struct perf_guest_info_callbacks *);
+extern int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks);
+extern int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks);
extern void perf_event_comm(struct task_struct *tsk);
extern void perf_event_fork(struct task_struct *tsk);
@@ -1016,9 +1014,9 @@ static inline void
perf_bp_event(struct perf_event *event, void *data) { }
static inline int perf_register_guest_info_callbacks
-(struct perf_guest_info_callbacks *) {return 0; }
+(struct perf_guest_info_callbacks *callbacks) { return 0; }
static inline int perf_unregister_guest_info_callbacks
-(struct perf_guest_info_callbacks *) {return 0; }
+(struct perf_guest_info_callbacks *callbacks) { return 0; }
static inline void perf_event_mmap(struct vm_area_struct *vma) { }
static inline void perf_event_comm(struct task_struct *tsk) { }
My understanding now is that "If an event causes a VM exit directly, it does not
update architectural state as it would have if it had it not caused the VM
exit:" means the following. In the NMI case, delivering an NMI would invoke the
NMI handler and change the "architectural state" to NMI-blocked. In VMX non-root
mode, the way the NMI handler is invoked changes (determined by some VMCS
fields), but the effect on the "architectural state" does not. So the NMI
blocking state remains the same.
--
regards
Yang, Sheng
>
> * Zhang, Yanmin <yanmin...@linux.intel.com> wrote:
>
> > unsigned long perf_misc_flags(struct pt_regs *regs)
> > {
> > int misc = 0;
> > +
> > if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
> > + if (perf_guest_cbs->is_user_mode())
> > + misc |= PERF_RECORD_MISC_GUEST_USER;
> > + else
> > + misc |= PERF_RECORD_MISC_GUEST_KERNEL;
> > + } else if (user_mode(regs))
> > + misc |= PERF_RECORD_MISC_USER;
> > + else
> > + misc |= PERF_RECORD_MISC_KERNEL;
> > +
>
> We try to use balanced curly braces. I.e.:
>
> if (x) {
> boo();
> } else {
> if (y)
> foo();
> else
> bar();
> }
>
> And avoid unbalanced ones:
>
> if (x) {
> boo();
> } else
> if (y)
> foo();
> else
> bar();
Note, I fixed this in the patch and applied it to perf/core (the invalid-C
problem was causing build failures).
Thanks,
Ingo
> unsigned long perf_misc_flags(struct pt_regs *regs)
> {
> int misc = 0;
> +
> if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
> + if (perf_guest_cbs->is_user_mode())
> + misc |= PERF_RECORD_MISC_GUEST_USER;
> + else
> + misc |= PERF_RECORD_MISC_GUEST_KERNEL;
> + } else if (user_mode(regs))
> + misc |= PERF_RECORD_MISC_USER;
> + else
> + misc |= PERF_RECORD_MISC_KERNEL;
> +
We try to use balanced curly braces. I.e.:
if (x) {
boo();
} else {
if (y)
foo();
else
bar();
}
And avoid unbalanced ones:
if (x) {
boo();
} else
if (y)
foo();
else
bar();
Ingo
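For illustration, here is a user-space model of the flag selection written with balanced braces. It is a sketch under stated assumptions: the DEMO_MISC_* values are illustrative and not copied from the kernel headers, struct demo_guest_cbs only mirrors the two callbacks used here, and the host_user argument stands in for user_mode(regs).

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values only; not copied from the kernel headers. */
enum {
	DEMO_MISC_KERNEL       = 1,
	DEMO_MISC_USER         = 2,
	DEMO_MISC_GUEST_KERNEL = 4,
	DEMO_MISC_GUEST_USER   = 5,
};

/* User-space model of the guest callbacks consulted by the patch. */
struct demo_guest_cbs {
	int (*is_in_guest)(void);
	int (*is_user_mode)(void);
};

/* Trivial callbacks used to exercise each branch. */
static int demo_yes(void) { return 1; }
static int demo_no(void)  { return 0; }

/* Both arms of the outer if/else carry braces, so the nesting is explicit. */
static int demo_misc_flags(const struct demo_guest_cbs *cbs, bool host_user)
{
	int misc = 0;

	if (cbs && cbs->is_in_guest()) {
		if (cbs->is_user_mode())
			misc |= DEMO_MISC_GUEST_USER;
		else
			misc |= DEMO_MISC_GUEST_KERNEL;
	} else {
		if (host_user)
			misc |= DEMO_MISC_USER;
		else
			misc |= DEMO_MISC_KERNEL;
	}

	return misc;
}
```

With braces on both arms, the guest and host cases line up visually, and a later statement added to the host arm cannot silently fall outside the else.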
Yanmin
Not at all, it's really confusingly worded.
> To my understanding now, "If an event causes a VM exit directly, it does not
> update architectural state as it would have if it had it not caused the VM
> exit:", means: in NMI case, NMI would involve the NMI handler, and change the
> "architectural state" to NMI block. In VMX non-root mode, the behavior of
> calling NMI handler changed(determine by some VMCS fields), but not the
> affection to the "architectural state". So the NMI block state would remain
> the same.
>
Agree. It's confusing because the internal "nmi pending" flag is not
set, while the "nmi blocking" flag is set.
(on svm both are set, but the NMI is not taken until the vmexit
completes and the host unmasks NMIs).
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
--
perf & kvm: Clean up some of the guest profiling callback API details
Fix a build bug and some programming style issues:
- use valid C
- fix up various style details
Signed-off-by: Zhang Yanmin <yanmin...@linux.intel.com>
Cc: Avi Kivity <a...@redhat.com>
Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Sheng Yang <sh...@linux.intel.com>
Cc: Marcelo Tosatti <mtos...@redhat.com>
Cc: Joerg Roedel <jo...@8bytes.org>
Cc: Jes Sorensen <Jes.So...@redhat.com>
Cc: Gleb Natapov <gl...@redhat.com>
Cc: Zachary Amsden <zam...@redhat.com>
Cc: zhiten...@intel.com
Cc: tim.c...@intel.com
Cc: Arnaldo Carvalho de Melo <ac...@infradead.org>
LKML-Reference: <1271729638.2...@ymzhang.sh.intel.com>
Signed-off-by: Ingo Molnar <mi...@elte.hu>
---
arch/x86/kernel/cpu/perf_event.c | 20 ++++++++++++++------
arch/x86/kvm/x86.c | 4 ++++
include/linux/perf_event.h | 10 ++++------
3 files changed, 22 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 2ea78ab..7de7061 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1752,23 +1752,31 @@ void perf_arch_fetch_caller_regs(struct pt_regs *regs, unsigned long ip, int ski
unsigned long perf_instruction_pointer(struct pt_regs *regs)
{
unsigned long ip;
+
if (perf_guest_cbs && perf_guest_cbs->is_in_guest())
ip = perf_guest_cbs->get_guest_ip();
else
ip = instruction_pointer(regs);
+
return ip;
}
unsigned long perf_misc_flags(struct pt_regs *regs)
{
int misc = 0;
+
if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
- misc |= perf_guest_cbs->is_user_mode() ?
- PERF_RECORD_MISC_GUEST_USER :
- PERF_RECORD_MISC_GUEST_KERNEL;
- } else
- misc |= user_mode(regs) ? PERF_RECORD_MISC_USER :
- PERF_RECORD_MISC_KERNEL;
+ if (perf_guest_cbs->is_user_mode())
+ misc |= PERF_RECORD_MISC_GUEST_USER;
+ else
+ misc |= PERF_RECORD_MISC_GUEST_KERNEL;
+ } else {
+ if (user_mode(regs))
+ misc |= PERF_RECORD_MISC_USER;
+ else
+ misc |= PERF_RECORD_MISC_KERNEL;
+ }
+
if (regs->flags & PERF_EFLAGS_EXACT)
misc |= PERF_RECORD_MISC_EXACT;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c3a33b2..21b9b6a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3776,16 +3776,20 @@ static int kvm_is_in_guest(void)
static int kvm_is_user_mode(void)
{
int user_mode = 3;
+
if (percpu_read(current_vcpu))
user_mode = kvm_x86_ops->get_cpl(percpu_read(current_vcpu));
+
return user_mode != 0;
}
static unsigned long kvm_get_guest_ip(void)
{
unsigned long ip = 0;
+
if (percpu_read(current_vcpu))
ip = kvm_rip_read(percpu_read(current_vcpu));
+
return ip;
}
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 24de5f1..ace31fb 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -941,10 +941,8 @@ static inline void perf_event_mmap(struct vm_area_struct *vma)
> -----Original Message-----
> From: kvm-...@vger.kernel.org
> [mailto:kvm-...@vger.kernel.org] On Behalf Of Zhang, Yanmin
> Sent: Monday, April 19, 2010 1:33 PM
> To: Avi Kivity
> Cc: Ingo Molnar; Peter Zijlstra; Avi Kivity; Sheng Yang;
> linux-...@vger.kernel.org; k...@vger.kernel.org; Marcelo
> Tosatti; Joerg Roedel; Jes Sorensen; Gleb Natapov; Zachary
> Amsden; zhiten...@intel.com; tim.c...@intel.com;
> Arnaldo Carvalho de Melo
> Subject: [PATCH V5 1/3] perf & kvm: Enhance perf to collect
> KVM guest os statistics from host side
>
> Below patch introduces perf_guest_info_callbacks and related
> register/unregister
> functions. Add more PERF_RECORD_MISC_XXX bits meaning guest
> kernel and guest user
> space.
>
> Signed-off-by: Zhang Yanmin <yanmin...@linux.intel.com>
>
> ---
> diff -Nraup --exclude-from=exclude.diff
> linux-2.6_tip0417/include/linux/perf_event.h
> linux-2.6_tip0417_perfkvm/include/linux/perf_event.h
> --- linux-2.6_tip0417/include/linux/perf_event.h
> 2010-04-19 09:51:59.544791000 +0800
> +++ linux-2.6_tip0417_perfkvm/include/linux/perf_event.h
> 2010-04-19 09:53:59.691378953 +0800
> @@ -932,6 +940,12 @@ static inline void perf_event_mmap(struc
> __perf_event_mmap(vma);
> }
>
> +extern struct perf_guest_info_callbacks *perf_guest_cbs;
> +extern int perf_register_guest_info_callbacks(
> + struct perf_guest_info_callbacks *);
> +extern int perf_unregister_guest_info_callbacks(
> + struct perf_guest_info_callbacks *);
> +
> extern void perf_event_comm(struct task_struct *tsk);
> extern void perf_event_fork(struct task_struct *tsk);
>
> @@ -1001,6 +1015,11 @@ perf_sw_event(u32 event_id, u64 nr, int
> static inline void
> perf_bp_event(struct perf_event *event, void *data)
> { }
>
> +static inline int perf_register_guest_info_callbacks
> +(struct perf_guest_info_callbacks *) {return 0; }
> +static inline int perf_unregister_guest_info_callbacks
> +(struct perf_guest_info_callbacks *) {return 0; }
> +
> static inline void perf_event_mmap(struct vm_area_struct
> *vma) { }
> static inline void perf_event_comm(struct task_struct *tsk)
> { }
> static inline void perf_event_fork(struct task_struct *tsk)
> { }
Hi,
I hit this error when building the kernel. Is anything wrong?
CC init/main.o
In file included from include/linux/ftrace_event.h:8,
from include/trace/syscall.h:6,
from include/linux/syscalls.h:75,
from init/main.c:16:
include/linux/perf_event.h: In function 'perf_register_guest_info_callbacks':
include/linux/perf_event.h:1019: error: parameter name omitted
include/linux/perf_event.h: In function 'perf_unregister_guest_info_callbacks':
include/linux/perf_event.h:1021: error: parameter name omitted
make[1]: *** [init/main.o] Error 1
make: *** [init] Error 2
I merged tip/perf/core, which may fix this. You can find it in the kvm.git next branch.
--
Yanmin
Peace Be With You,
Uwaysi Bin Kareem.
--- Kconfig.hzorig 2010-04-27 13:33:10.302162524 +0200
+++ Kconfig.hz 2010-04-27 20:39:54.736959816 +0200
@@ -45,6 +45,18 @@
1000 Hz is the preferred choice for desktop systems and other
systems requiring fast interactive responses to events.
+ config HZ_3956
+ bool "3956 HZ"
+ help
+ 3956 Hz is nearly the highest timer interrupt rate supported in the kernel.
+ Graphics workstations, and OpenGL applications may benefit from this,
+ since it gives the lowest framerate-jitter. The exact value 3956 is
+ psychovisually-optimized, meaning that it aims for a level of jitter,
+ percieved to be natural, and therefore non-nosiy. It is tuned for a
+ profile of "where the human senses register the most information".
+
+
+
endchoice
config HZ
@@ -53,6 +65,7 @@
default 250 if HZ_250
default 300 if HZ_300
default 1000 if HZ_1000
+ default 3956 if HZ_3956
config SCHED_HRTICK
def_bool HIGH_RES_TIMERS && (!SMP || USE_GENERIC_SMP_HELPERS)
> This is based on the research I did with optimizing my machine for
> graphics.
> I also wrote the following article:
> http://www.paradoxuncreated.com/articles/Millennium/Millennium.html
> It is a bit outdated now, but I will update it with current information.
> The value might iterate.
Hi,
What CPU architectures or platforms did you test this on?
Were any other kernel changes needed?
> Peace Be With You,
> Uwaysi Bin Kareem.
>
>
> --- Kconfig.hzorig 2010-04-27 13:33:10.302162524 +0200
> +++ Kconfig.hz 2010-04-27 20:39:54.736959816 +0200
> @@ -45,6 +45,18 @@
> 1000 Hz is the preferred choice for desktop systems and other
> systems requiring fast interactive responses to events.
>
> + config HZ_3956
> + bool "3956 HZ"
> + help
> + 3956 Hz is nearly the highest timer interrupt rate supported in the kernel.
> + Graphics workstations, and OpenGL applications may benefit from this,
drop first comma.
> + since it gives the lowest framerate-jitter. The exact value 3956 is
> + psychovisually-optimized, meaning that it aims for a level of jitter,
> + percieved to be natural, and therefore non-nosiy. It is tuned for a
perceived non-noisy.
> + profile of "where the human senses register the most information".
> +
> +
> +
> endchoice
>
> config HZ
> @@ -53,6 +65,7 @@
> default 250 if HZ_250
> default 300 if HZ_300
> default 1000 if HZ_1000
> + default 3956 if HZ_3956
>
> config SCHED_HRTICK
> def_bool HIGH_RES_TIMERS && (!SMP || USE_GENERIC_SMP_HELPERS)
>
> --
---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
> http://www.paradoxuncreated.com/articles/Millennium/Millennium.html
> + config HZ_3956
> + bool "3956 HZ"
> + help
> + 3956 Hz is nearly the highest timer interrupt rate supported in the kernel.
> + Graphics workstations, and OpenGL applications may benefit from this,
> + since it gives the lowest framerate-jitter. The exact value 3956 is
> + psychovisually-optimized, meaning that it aims for a level of jitter,
Even after reading your link, it's unclear why 3956 and not 4000. All your link
said was "A granularity below 0.5 milliseconds, seems to suit the human
senses." - anything over 2000 meets that requirement. Also, if your screen
refresh is sitting at 72 Hz, or a bit under 14ms per refresh, any jitter under
that won't really matter much - it doesn't matter if your next frame is
ready 5ms early or 5.5ms early, you *still* have to wait for the next vertical
blanking interval or suffer tearing.
There's also the case of programs where HZ=300 would *make* the time budget,
but the added 3,656 timer interrupts per second and associated overhead would
cause a missed screen refresh.
I think you need more technical justification of why 3956 is better than 1000.
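The arithmetic behind this objection is easy to check. The helpers below are a small sketch with hypothetical names (tick_period_ms, extra_ticks_per_sec, print_hz_table); they only compute the tick period for a given HZ and the extra interrupts per second, and measure nothing about real jitter or overhead.

```c
#include <stdio.h>

/* Length of one timer tick in milliseconds for a given HZ value. */
static double tick_period_ms(unsigned int hz)
{
	return 1000.0 / hz;
}

/* Extra timer interrupts per second when moving from old_hz to new_hz. */
static unsigned int extra_ticks_per_sec(unsigned int old_hz, unsigned int new_hz)
{
	return new_hz - old_hz;
}

/* Print the tick period for several HZ values next to a 72 Hz refresh period. */
static void print_hz_table(void)
{
	static const unsigned int hz_values[] = { 300, 1000, 2000, 3956, 4000 };
	size_t i;

	printf("refresh period at 72 Hz: %.2f ms\n", tick_period_ms(72));
	for (i = 0; i < sizeof(hz_values) / sizeof(hz_values[0]); i++)
		printf("HZ=%u: tick every %.3f ms\n",
		       hz_values[i], tick_period_ms(hz_values[i]));
}
```

At HZ=2000 the tick period is exactly 0.5 ms, so any higher value is below the 0.5 ms granularity quoted from the article, and going from HZ=300 to HZ=3956 adds 3956 - 300 = 3656 timer interrupts per second.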