[PATCH 0/9] oprofile, perf, x86: introduce new functions to reserve perfctrs

Robert Richter

unread,

Mar 4, 2010, 11:50:02 AM3/4/10

to

This patch set improves the perfctr reservation code. New functions
are available to reserve a counter by its index only. It is no longer
necessary to allocate both msrs of a counter which also improves the
code and makes it easier.

For oprofile a handler is implemented that returns an error now if a
counter is already reserved by a different subsystem such as perf or
watchdog. Before, oprofile silently ignored that counter. Finally the
new reservation functions can be used to allocate special parts of the
pmu such as IBS, which is necessary to use IBS with perf too.

The patches are available in the oprofile tree:

git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git core

If there are no objections, I suggest to merge it into the
tip/perf/core too, maybe after pending patches went in. If there are
already conflicts, I will do the merge for this.

See enclose changelog.

-Robert

The following changes since commit bb1165d6882f423f90fc7007a88c6c993b7c2ac4:
Robert Richter (1):
perf, x86: rename macro in ARCH_PERFMON_EVENTSEL_ENABLE

are available in the git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git core

Robert Richter (9):
perf, x86: reduce number of CONFIG_X86_LOCAL_APIC macros
oprofile, perf, x86: do not allocate evntsel counter msr
oprofile, perf, x86: introduce new functions to reserve perfctrs by index
tsc, x86: use new perfctr reservation functions in tsc code
perf, x86: use new perfctr reservation functions in perf code
oprofile/x86: rework error handler in nmi_setup()
oprofile/x86: return -EBUSY if counters are already reserved
oprofile/x86: group IBS code
oprofile/x86: implement perfctr reservation for IBS

arch/x86/include/asm/nmi.h | 2 +
arch/x86/include/asm/perf_event.h | 3 +
arch/x86/kernel/cpu/perf_event.c | 36 ++---
arch/x86/kernel/cpu/perfctr-watchdog.c | 35 +++--
arch/x86/kernel/tsc.c | 8 +-
arch/x86/oprofile/nmi_int.c | 38 +++--
arch/x86/oprofile/op_model_amd.c | 277 ++++++++++++++++++--------------
arch/x86/oprofile/op_model_p4.c | 15 ++-
arch/x86/oprofile/op_model_ppro.c | 23 ++--
arch/x86/oprofile/op_x86_model.h | 2 +-
10 files changed, 248 insertions(+), 191 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Robert Richter

unread,

Mar 4, 2010, 11:50:01 AM3/4/10

to

For sharing IBS with oprofile and perf a resource allocation is
needed. This patch implements IBS allocation by extending the perfctr
reservation code.

Signed-off-by: Robert Richter <robert....@amd.com>
---
arch/x86/include/asm/perf_event.h | 3 +
arch/x86/oprofile/op_model_amd.c | 104 +++++++++++++++++++++++++------------
2 files changed, 74 insertions(+), 33 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 80e6936..2a95e01 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -117,6 +117,9 @@ union cpuid10_edx {
*/
#define X86_PMC_IDX_FIXED_BTS (X86_PMC_IDX_FIXED + 16)

+#define X86_PMC_IDX_SPECIAL_IBS_FETCH (X86_PMC_IDX_FIXED + 17)
+#define X86_PMC_IDX_SPECIAL_IBS_OP (X86_PMC_IDX_FIXED + 18)
+
/* IbsFetchCtl bits/masks */
#define IBS_FETCH_RAND_EN (1ULL<<57)
#define IBS_FETCH_VAL (1ULL<<49)
diff --git a/arch/x86/oprofile/op_model_amd.c b/arch/x86/oprofile/op_model_amd.c
index 9c0d978..9bd040d 100644
--- a/arch/x86/oprofile/op_model_amd.c
+++ b/arch/x86/oprofile/op_model_amd.c
@@ -61,6 +61,7 @@ struct op_ibs_config {
};

static struct op_ibs_config ibs_config;
+static u64 ibs_fetch_ctl;
static u64 ibs_op_ctl;

/*
@@ -105,6 +106,63 @@ static u32 get_ibs_caps(void)
return ibs_caps;
}

+static void shutdown_ibs(void)
+{
+ if (ibs_fetch_ctl) {
+ ibs_fetch_ctl = 0;
+ release_perfctr(X86_PMC_IDX_SPECIAL_IBS_FETCH);
+ }
+
+ if (ibs_op_ctl) {
+ ibs_op_ctl = 0;
+ release_perfctr(X86_PMC_IDX_SPECIAL_IBS_OP);
+ }
+}
+
+static int setup_ibs(void)
+{
+ ibs_fetch_ctl = 0;
+ if (ibs_caps && ibs_config.fetch_enabled) {
+ if (!reserve_perfctr(X86_PMC_IDX_SPECIAL_IBS_FETCH))
+ goto fail;
+ ibs_fetch_ctl =
+ ((ibs_config.max_cnt_fetch >> 4) & IBS_FETCH_MAX_CNT)
+ | (ibs_config.rand_en ? IBS_FETCH_RAND_EN : 0)
+ | IBS_FETCH_ENABLE;
+ }
+
+ ibs_op_ctl = 0;
+ if (ibs_caps && ibs_config.op_enabled) {
+ if (!reserve_perfctr(X86_PMC_IDX_SPECIAL_IBS_OP))
+ goto fail;
+ ibs_op_ctl = ibs_config.max_cnt_op >> 4;
+ if (!(ibs_caps & IBS_CAPS_RDWROPCNT)) {
+ /*
+ * IbsOpCurCnt not supported. See
+ * op_amd_randomize_ibs_op() for details.
+ */
+ ibs_op_ctl = clamp(ibs_op_ctl, 0x0081ULL, 0xFF80ULL);
+ } else {
+ /*
+ * The start value is randomized with a
+ * positive offset, we need to compensate it
+ * with the half of the randomized range. Also
+ * avoid underflows.
+ */
+ ibs_op_ctl = min(ibs_op_ctl + IBS_RANDOM_MAXCNT_OFFSET,
+ IBS_OP_MAX_CNT);
+ }
+ if (ibs_caps & IBS_CAPS_OPCNT && ibs_config.dispatched_ops)
+ ibs_op_ctl |= IBS_OP_CNT_CTL;
+ ibs_op_ctl |= IBS_OP_ENABLE;
+ }
+
+ return 0;
+fail:
+ shutdown_ibs();
+ return -EBUSY;
+}
+
/*
* 16-bit Linear Feedback Shift Register (LFSR)
*
@@ -170,7 +228,7 @@ op_amd_handle_ibs(struct pt_regs * const regs,
if (!ibs_caps)
return;

- if (ibs_config.fetch_enabled) {
+ if (ibs_fetch_ctl) {
rdmsrl(MSR_AMD64_IBSFETCHCTL, ctl);
if (ctl & IBS_FETCH_VAL) {
rdmsrl(MSR_AMD64_IBSFETCHLINAD, val);
@@ -183,13 +241,11 @@ op_amd_handle_ibs(struct pt_regs * const regs,
oprofile_write_commit(&entry);

/* reenable the IRQ */
- ctl &= ~(IBS_FETCH_VAL | IBS_FETCH_CNT);
- ctl |= IBS_FETCH_ENABLE;
- wrmsrl(MSR_AMD64_IBSFETCHCTL, ctl);
+ wrmsrl(MSR_AMD64_IBSFETCHCTL, ibs_fetch_ctl);
}
}

- if (ibs_config.op_enabled) {
+ if (ibs_op_ctl) {
rdmsrl(MSR_AMD64_IBSOPCTL, ctl);
if (ctl & IBS_OP_VAL) {
rdmsrl(MSR_AMD64_IBSOPRIP, val);
@@ -222,34 +278,10 @@ static inline void op_amd_start_ibs(void)
if (!ibs_caps)
return;

- if (ibs_config.fetch_enabled) {
- val = (ibs_config.max_cnt_fetch >> 4) & IBS_FETCH_MAX_CNT;
- val |= ibs_config.rand_en ? IBS_FETCH_RAND_EN : 0;
- val |= IBS_FETCH_ENABLE;
- wrmsrl(MSR_AMD64_IBSFETCHCTL, val);
- }
+ if (ibs_fetch_ctl)
+ wrmsrl(MSR_AMD64_IBSFETCHCTL, ibs_fetch_ctl);

- if (ibs_config.op_enabled) {
- ibs_op_ctl = ibs_config.max_cnt_op >> 4;
- if (!(ibs_caps & IBS_CAPS_RDWROPCNT)) {
- /*
- * IbsOpCurCnt not supported. See
- * op_amd_randomize_ibs_op() for details.
- */
- ibs_op_ctl = clamp(ibs_op_ctl, 0x0081ULL, 0xFF80ULL);
- } else {
- /*
- * The start value is randomized with a
- * positive offset, we need to compensate it
- * with the half of the randomized range. Also
- * avoid underflows.
- */
- ibs_op_ctl = min(ibs_op_ctl + IBS_RANDOM_MAXCNT_OFFSET,
- IBS_OP_MAX_CNT);
- }
- if (ibs_caps & IBS_CAPS_OPCNT && ibs_config.dispatched_ops)
- ibs_op_ctl |= IBS_OP_CNT_CTL;
- ibs_op_ctl |= IBS_OP_ENABLE;
+ if (ibs_op_ctl) {
val = op_amd_randomize_ibs_op(ibs_op_ctl);
wrmsrl(MSR_AMD64_IBSOPCTL, val);
}
@@ -297,7 +329,11 @@ static void op_amd_shutdown(struct op_msrs const * const msrs);

static int op_amd_fill_in_addresses(struct op_msrs * const msrs)
{
- int i;
+ int err, i;
+
+ err = setup_ibs();
+ if (err)
+ return err;

for (i = 0; i < NUM_COUNTERS; i++) {
if (reserve_perfctr(i)) {
@@ -438,6 +474,8 @@ static void op_amd_shutdown(struct op_msrs const * const msrs)
if (msrs->counters[i].addr)
release_perfctr(i);
}
+
+ shutdown_ibs();
}

static u8 ibs_eilvt_off;
--
1.7.0

Robert Richter

unread,

Mar 4, 2010, 11:50:01 AM3/4/10

to

This codes improves the error handler in nmi_setup(). Most parts of
the code are moved to allocate_msrs(). In case of an error
allocate_msrs() also frees already allocated memory. nmi_setup()
becomes easier and better extendable.

Signed-off-by: Robert Richter <robert....@amd.com>
---

arch/x86/oprofile/nmi_int.c | 33 +++++++++++++++++++--------------
1 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/arch/x86/oprofile/nmi_int.c b/arch/x86/oprofile/nmi_int.c
index 2c505ee..c0c21f2 100644
--- a/arch/x86/oprofile/nmi_int.c
+++ b/arch/x86/oprofile/nmi_int.c
@@ -295,6 +295,7 @@ static void free_msrs(void)
kfree(per_cpu(cpu_msrs, i).controls);
per_cpu(cpu_msrs, i).controls = NULL;
}
+ nmi_shutdown_mux();
}

static int allocate_msrs(void)
@@ -307,14 +308,21 @@ static int allocate_msrs(void)
per_cpu(cpu_msrs, i).counters = kzalloc(counters_size,
GFP_KERNEL);
if (!per_cpu(cpu_msrs, i).counters)
- return 0;
+ goto fail;
per_cpu(cpu_msrs, i).controls = kzalloc(controls_size,
GFP_KERNEL);
if (!per_cpu(cpu_msrs, i).controls)
- return 0;
+ goto fail;
}

+ if (!nmi_setup_mux())
+ goto fail;
+
return 1;
+
+fail:
+ free_msrs();
+ return 0;
}

static void nmi_cpu_setup(void *dummy)
@@ -342,17 +350,7 @@ static int nmi_setup(void)
int cpu;

if (!allocate_msrs())
- err = -ENOMEM;
- else if (!nmi_setup_mux())
- err = -ENOMEM;
- else
- err = register_die_notifier(&profile_exceptions_nb);
-
- if (err) {
- free_msrs();
- nmi_shutdown_mux();
- return err;
- }
+ return -ENOMEM;

/* We need to serialize save and setup for HT because the subset
* of msrs are distinct for save and setup operations
@@ -374,9 +372,17 @@ static int nmi_setup(void)

mux_clone(cpu);
}
+
+ err = register_die_notifier(&profile_exceptions_nb);
+ if (err)
+ goto fail;
+
on_each_cpu(nmi_cpu_setup, NULL, 1);
nmi_enabled = 1;
return 0;
+fail:
+ free_msrs();
+ return err;
}

static void nmi_cpu_restore_registers(struct op_msrs *msrs)
@@ -421,7 +427,6 @@ static void nmi_shutdown(void)
nmi_enabled = 0;
on_each_cpu(nmi_cpu_shutdown, NULL, 1);
unregister_die_notifier(&profile_exceptions_nb);
- nmi_shutdown_mux();
msrs = &get_cpu_var(cpu_msrs);
model->shutdown(msrs);
free_msrs();
--
1.7.0

Robert Richter

unread,

Mar 4, 2010, 12:00:04 PM3/4/10

to

In case a counter is already reserved by the watchdog or perf_event
subsystem, oprofile ignored this counters silently. This case is
handled now and oprofile_setup() now reports an error.

Signed-off-by: Robert Richter <robert....@amd.com>
---

arch/x86/oprofile/nmi_int.c | 5 ++++-
arch/x86/oprofile/op_model_amd.c | 10 +++++++++-
arch/x86/oprofile/op_model_p4.c | 15 ++++++++++++++-
arch/x86/oprofile/op_model_ppro.c | 10 +++++++++-
arch/x86/oprofile/op_x86_model.h | 2 +-
5 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/arch/x86/oprofile/nmi_int.c b/arch/x86/oprofile/nmi_int.c
index c0c21f2..9f001d9 100644
--- a/arch/x86/oprofile/nmi_int.c
+++ b/arch/x86/oprofile/nmi_int.c
@@ -357,7 +357,10 @@ static int nmi_setup(void)
*/

/* Assume saved/restored counters are the same on all CPUs */
- model->fill_in_addresses(&per_cpu(cpu_msrs, 0));
+ err = model->fill_in_addresses(&per_cpu(cpu_msrs, 0));

+ if (err)
+ goto fail;
+

for_each_possible_cpu(cpu) {
if (!cpu)
continue;
diff --git a/arch/x86/oprofile/op_model_amd.c b/arch/x86/oprofile/op_model_amd.c
index 905995d..128d84c 100644
--- a/arch/x86/oprofile/op_model_amd.c
+++ b/arch/x86/oprofile/op_model_amd.c
@@ -129,7 +129,9 @@ static void op_mux_switch_ctrl(struct op_x86_model_spec const *model,

/* functions for op_amd_spec */

-static void op_amd_fill_in_addresses(struct op_msrs * const msrs)
+static void op_amd_shutdown(struct op_msrs const * const msrs);
+
+static int op_amd_fill_in_addresses(struct op_msrs * const msrs)
{
int i;

@@ -137,8 +139,14 @@ static void op_amd_fill_in_addresses(struct op_msrs * const msrs)
if (reserve_perfctr(i)) {
msrs->counters[i].addr = MSR_K7_PERFCTR0 + i;
msrs->controls[i].addr = MSR_K7_EVNTSEL0 + i;
+ } else if (counter_config[i].enabled) {
+ op_x86_warn_reserved(i);
+ op_amd_shutdown(msrs);
+ return -EBUSY;
}
}
+
+ return 0;
}

static void op_amd_setup_ctrs(struct op_x86_model_spec const *model,
diff --git a/arch/x86/oprofile/op_model_p4.c b/arch/x86/oprofile/op_model_p4.c
index e6a160a..599fcae 100644
--- a/arch/x86/oprofile/op_model_p4.c
+++ b/arch/x86/oprofile/op_model_p4.c
@@ -385,8 +385,9 @@ static unsigned int get_stagger(void)

static unsigned long reset_value[NUM_COUNTERS_NON_HT];

+static void p4_shutdown(struct op_msrs const * const msrs);

-static void p4_fill_in_addresses(struct op_msrs * const msrs)
+static int p4_fill_in_addresses(struct op_msrs * const msrs)
{
unsigned int i;
unsigned int addr, cccraddr, stag;
@@ -468,6 +469,18 @@ static void p4_fill_in_addresses(struct op_msrs * const msrs)
msrs->controls[i++].addr = MSR_P4_CRU_ESCR5;
}
}
+
+ for (i = 0; i < num_counters; ++i) {
+ if (!counter_config[i].enabled)
+ continue;
+ if (msrs->controls[i].addr)
+ continue;
+ op_x86_warn_reserved(i);
+ p4_shutdown(msrs);

+ return -EBUSY;
+ }
+

+ return 0;
}

diff --git a/arch/x86/oprofile/op_model_ppro.c b/arch/x86/oprofile/op_model_ppro.c
index a838ee4..3cf4e2f 100644
--- a/arch/x86/oprofile/op_model_ppro.c
+++ b/arch/x86/oprofile/op_model_ppro.c
@@ -30,7 +30,9 @@ static int counter_width = 32;

static u64 *reset_value;

-static void ppro_fill_in_addresses(struct op_msrs * const msrs)
+static void ppro_shutdown(struct op_msrs const * const msrs);
+
+static int ppro_fill_in_addresses(struct op_msrs * const msrs)
{
int i;

@@ -38,8 +40,14 @@ static void ppro_fill_in_addresses(struct op_msrs * const msrs)
if (reserve_perfctr(i)) {
msrs->counters[i].addr = MSR_P6_PERFCTR0 + i;
msrs->controls[i].addr = MSR_P6_EVNTSEL0 + i;
+ } else if (counter_config[i].enabled) {
+ op_x86_warn_reserved(i);
+ ppro_shutdown(msrs);
+ return -EBUSY;
}
}
+
+ return 0;
}

diff --git a/arch/x86/oprofile/op_x86_model.h b/arch/x86/oprofile/op_x86_model.h
index ff82a75..5514013 100644
--- a/arch/x86/oprofile/op_x86_model.h
+++ b/arch/x86/oprofile/op_x86_model.h
@@ -41,7 +41,7 @@ struct op_x86_model_spec {
u16 event_mask;
int (*init)(struct oprofile_operations *ops);
void (*exit)(void);
- void (*fill_in_addresses)(struct op_msrs * const msrs);
+ int (*fill_in_addresses)(struct op_msrs * const msrs);
void (*setup_ctrs)(struct op_x86_model_spec const *model,

struct op_msrs const * const msrs);

int (*check_ctrs)(struct pt_regs * const regs,
--
1.7.0

Robert Richter

unread,

Mar 4, 2010, 12:00:03 PM3/4/10

to

The function reserve_pmc_hardware() and release_pmc_hardware() were
hard to read. This patch improves readablity of the code by removing
most of the CONFIG_X86_LOCAL_APIC macros.

Signed-off-by: Robert Richter <robert....@amd.com>
---

arch/x86/kernel/cpu/perf_event.c | 15 +++++++++------
1 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 6531b4b..96b3a98 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -234,9 +234,10 @@ again:
static atomic_t active_events;
static DEFINE_MUTEX(pmc_reserve_mutex);

+#ifdef CONFIG_X86_LOCAL_APIC
+
static bool reserve_pmc_hardware(void)
{
-#ifdef CONFIG_X86_LOCAL_APIC
int i;

if (nmi_watchdog == NMI_LOCAL_APIC)
@@ -251,11 +252,9 @@ static bool reserve_pmc_hardware(void)
if (!reserve_evntsel_nmi(x86_pmu.eventsel + i))
goto eventsel_fail;
}
-#endif

return true;

-#ifdef CONFIG_X86_LOCAL_APIC
eventsel_fail:
for (i--; i >= 0; i--)
release_evntsel_nmi(x86_pmu.eventsel + i);
@@ -270,12 +269,10 @@ perfctr_fail:
enable_lapic_nmi_watchdog();

return false;
-#endif
}

static void release_pmc_hardware(void)
{
-#ifdef CONFIG_X86_LOCAL_APIC
int i;

for (i = 0; i < x86_pmu.num_events; i++) {
@@ -285,9 +282,15 @@ static void release_pmc_hardware(void)

if (nmi_watchdog == NMI_LOCAL_APIC)
enable_lapic_nmi_watchdog();
-#endif
}

+#else
+
+static bool reserve_pmc_hardware(void) { return true; }
+static void release_pmc_hardware(void) {}
+
+#endif
+
static inline bool bts_available(void)
{
return x86_pmu.enable_bts != NULL;
--
1.7.0

Robert Richter

unread,

Mar 4, 2010, 12:00:02 PM3/4/10

to

Current perfctr reservation code allocates single pmu msrs. The msr
addresses may differ depending on the model and offset calculation is
necessary. This can be easier implemented by reserving a counter by
its index only.

Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: Andi Kleen <an...@firstfloor.org>

Signed-off-by: Robert Richter <robert....@amd.com>
---

arch/x86/include/asm/nmi.h | 2 ++
arch/x86/kernel/cpu/perfctr-watchdog.c | 29 ++++++++++++++++++++---------
arch/x86/oprofile/op_model_amd.c | 4 ++--
arch/x86/oprofile/op_model_ppro.c | 4 ++--
4 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h
index 93da9c3..f5db47d 100644
--- a/arch/x86/include/asm/nmi.h
+++ b/arch/x86/include/asm/nmi.h
@@ -19,6 +19,8 @@ extern void die_nmi(char *str, struct pt_regs *regs, int do_panic);
extern int check_nmi_watchdog(void);
extern int nmi_watchdog_enabled;
extern int avail_to_resrv_perfctr_nmi_bit(unsigned int);
+extern int reserve_perfctr(unsigned int);
+extern void release_perfctr(unsigned int);
extern int reserve_perfctr_nmi(unsigned int);
extern void release_perfctr_nmi(unsigned int);
extern int reserve_evntsel_nmi(unsigned int);
diff --git a/arch/x86/kernel/cpu/perfctr-watchdog.c b/arch/x86/kernel/cpu/perfctr-watchdog.c
index 5c129ee..0868b8b 100644
--- a/arch/x86/kernel/cpu/perfctr-watchdog.c
+++ b/arch/x86/kernel/cpu/perfctr-watchdog.c
@@ -117,11 +117,8 @@ int avail_to_resrv_perfctr_nmi_bit(unsigned int counter)
}
EXPORT_SYMBOL(avail_to_resrv_perfctr_nmi_bit);

-int reserve_perfctr_nmi(unsigned int msr)
+int reserve_perfctr(unsigned int counter)
{
- unsigned int counter;
-
- counter = nmi_perfctr_msr_to_bit(msr);
/* register not managed by the allocator? */
if (counter > NMI_MAX_COUNTER_BITS)
return 1;
@@ -130,19 +127,33 @@ int reserve_perfctr_nmi(unsigned int msr)
return 1;
return 0;
}
-EXPORT_SYMBOL(reserve_perfctr_nmi);
+EXPORT_SYMBOL(reserve_perfctr);

-void release_perfctr_nmi(unsigned int msr)
+void release_perfctr(unsigned int counter)
{
- unsigned int counter;
-
- counter = nmi_perfctr_msr_to_bit(msr);
/* register not managed by the allocator? */
if (counter > NMI_MAX_COUNTER_BITS)
return;

clear_bit(counter, perfctr_nmi_owner);
}
+EXPORT_SYMBOL(release_perfctr);
+
+int reserve_perfctr_nmi(unsigned int msr)
+{
+ unsigned int counter;
+ counter = nmi_perfctr_msr_to_bit(msr);
+ return reserve_perfctr(counter);
+}
+EXPORT_SYMBOL(reserve_perfctr_nmi);
+
+void release_perfctr_nmi(unsigned int msr)
+{
+ unsigned int counter;
+
+ counter = nmi_perfctr_msr_to_bit(msr);
+ release_perfctr(counter);
+}
EXPORT_SYMBOL(release_perfctr_nmi);

int reserve_evntsel_nmi(unsigned int msr)
diff --git a/arch/x86/oprofile/op_model_amd.c b/arch/x86/oprofile/op_model_amd.c
index 79e79a6..905995d 100644
--- a/arch/x86/oprofile/op_model_amd.c
+++ b/arch/x86/oprofile/op_model_amd.c
@@ -134,7 +134,7 @@ static void op_amd_fill_in_addresses(struct op_msrs * const msrs)
int i;

for (i = 0; i < NUM_COUNTERS; i++) {
- if (reserve_perfctr_nmi(MSR_K7_PERFCTR0 + i)) {
+ if (reserve_perfctr(i)) {

msrs->counters[i].addr = MSR_K7_PERFCTR0 + i;
msrs->controls[i].addr = MSR_K7_EVNTSEL0 + i;
}

@@ -428,7 +428,7 @@ static void op_amd_shutdown(struct op_msrs const * const msrs)

for (i = 0; i < NUM_COUNTERS; ++i) {

if (msrs->counters[i].addr)

- release_perfctr_nmi(MSR_K7_PERFCTR0 + i);
+ release_perfctr(i);
}
}

diff --git a/arch/x86/oprofile/op_model_ppro.c b/arch/x86/oprofile/op_model_ppro.c
index d6abdf6..a838ee4 100644
--- a/arch/x86/oprofile/op_model_ppro.c
+++ b/arch/x86/oprofile/op_model_ppro.c
@@ -35,7 +35,7 @@ static void ppro_fill_in_addresses(struct op_msrs * const msrs)
int i;

for (i = 0; i < num_counters; i++) {
- if (!reserve_perfctr_nmi(MSR_P6_PERFCTR0 + i)) {
+ if (reserve_perfctr(i)) {

msrs->counters[i].addr = MSR_P6_PERFCTR0 + i;
msrs->controls[i].addr = MSR_P6_EVNTSEL0 + i;
}

@@ -192,7 +192,7 @@ static void ppro_shutdown(struct op_msrs const * const msrs)

for (i = 0; i < num_counters; ++i) {

if (msrs->counters[i].addr)

- release_perfctr_nmi(MSR_P6_PERFCTR0 + i);
+ release_perfctr(i);
}
if (reset_value) {
kfree(reset_value);
--
1.7.0

Robert Richter

unread,

Mar 4, 2010, 12:00:03 PM3/4/10

to

Use reserve_perfctr()/release_perfctr() in perf code.

Signed-off-by: Robert Richter <robert....@amd.com>
---

arch/x86/kernel/cpu/perf_event.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 62a1a5e..16ae7b4 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -244,7 +244,7 @@ static bool reserve_pmc_hardware(void)
disable_lapic_nmi_watchdog();

for (i = 0; i < x86_pmu.num_events; i++) {
- if (!reserve_perfctr_nmi(x86_pmu.perfctr + i))
+ if (!reserve_perfctr(i))
goto perfctr_fail;
}

@@ -252,7 +252,7 @@ static bool reserve_pmc_hardware(void)

perfctr_fail:

for (i--; i >= 0; i--)

- release_perfctr_nmi(x86_pmu.perfctr + i);
+ release_perfctr(i);

if (nmi_watchdog == NMI_LOCAL_APIC)
enable_lapic_nmi_watchdog();
@@ -265,7 +265,7 @@ static void release_pmc_hardware(void)
int i;

for (i = 0; i < x86_pmu.num_events; i++)
- release_perfctr_nmi(x86_pmu.perfctr + i);
+ release_perfctr(i);

if (nmi_watchdog == NMI_LOCAL_APIC)
enable_lapic_nmi_watchdog();
--
1.7.0

Robert Richter

unread,

Mar 4, 2010, 12:00:02 PM3/4/10

to

Use reserve_perfctr()/release_perfctr() in tsc code.

Cc: Thomas Gleixner <tg...@linutronix.de>

Signed-off-by: Robert Richter <robert....@amd.com>
---

arch/x86/kernel/tsc.c | 6 ++----
1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index ff98ef2..2cdacae 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -871,7 +871,7 @@ static unsigned long __init calibrate_cpu(void)
unsigned long flags;

for (i = 0; i < 4; i++)
- if (avail_to_resrv_perfctr_nmi_bit(i))
+ if (reserve_perfctr(i))
break;
no_ctr_free = (i == 4);
if (no_ctr_free) {
@@ -881,8 +881,6 @@ static unsigned long __init calibrate_cpu(void)
rdmsrl(MSR_K7_EVNTSEL3, evntsel3);
wrmsrl(MSR_K7_EVNTSEL3, 0);
rdmsrl(MSR_K7_PERFCTR3, pmc3);
- } else {
- reserve_perfctr_nmi(MSR_K7_PERFCTR0 + i);
}
local_irq_save(flags);
/* start measuring cycles, incrementing from 0 */
@@ -900,7 +898,7 @@ static unsigned long __init calibrate_cpu(void)
wrmsrl(MSR_K7_PERFCTR3, pmc3);
wrmsrl(MSR_K7_EVNTSEL3, evntsel3);
} else {

- release_perfctr_nmi(MSR_K7_PERFCTR0 + i);
+ release_perfctr(i);
}

return pmc_now * tsc_khz / (tsc_now - tsc_start);
--
1.7.0

Robert Richter

unread,

Mar 4, 2010, 12:00:03 PM3/4/10

to

Moving code in preparation of the next patch. This groups all IBS code
together.

Signed-off-by: Robert Richter <robert....@amd.com>
---

arch/x86/oprofile/op_model_amd.c | 202 +++++++++++++++++++-------------------
1 files changed, 101 insertions(+), 101 deletions(-)

diff --git a/arch/x86/oprofile/op_model_amd.c b/arch/x86/oprofile/op_model_amd.c
index 128d84c..9c0d978 100644
--- a/arch/x86/oprofile/op_model_amd.c
+++ b/arch/x86/oprofile/op_model_amd.c
@@ -105,107 +105,6 @@ static u32 get_ibs_caps(void)
return ibs_caps;
}

-#ifdef CONFIG_OPROFILE_EVENT_MULTIPLEX
-
-static void op_mux_switch_ctrl(struct op_x86_model_spec const *model,
- struct op_msrs const * const msrs)
-{
- u64 val;
- int i;
-
- /* enable active counters */
- for (i = 0; i < NUM_COUNTERS; ++i) {
- int virt = op_x86_phys_to_virt(i);
- if (!reset_value[virt])
- continue;
- rdmsrl(msrs->controls[i].addr, val);
- val &= model->reserved;
- val |= op_x86_get_ctrl(model, &counter_config[virt]);
- wrmsrl(msrs->controls[i].addr, val);
- }
-}
-
-#endif
-
-/* functions for op_amd_spec */
-
-static void op_amd_shutdown(struct op_msrs const * const msrs);
-
-static int op_amd_fill_in_addresses(struct op_msrs * const msrs)
-{
- int i;
-
- for (i = 0; i < NUM_COUNTERS; i++) {
- if (reserve_perfctr(i)) {
- msrs->counters[i].addr = MSR_K7_PERFCTR0 + i;
- msrs->controls[i].addr = MSR_K7_EVNTSEL0 + i;
- } else if (counter_config[i].enabled) {
- op_x86_warn_reserved(i);
- op_amd_shutdown(msrs);
- return -EBUSY;
- }
- }
-
- return 0;
-}
-
-static void op_amd_setup_ctrs(struct op_x86_model_spec const *model,
- struct op_msrs const * const msrs)
-{
- u64 val;
- int i;
-
- /* setup reset_value */
- for (i = 0; i < NUM_VIRT_COUNTERS; ++i) {
- if (counter_config[i].enabled
- && msrs->counters[op_x86_virt_to_phys(i)].addr)
- reset_value[i] = counter_config[i].count;
- else
- reset_value[i] = 0;
- }
-
- /* clear all counters */
- for (i = 0; i < NUM_CONTROLS; ++i) {
- if (unlikely(!msrs->controls[i].addr)) {
- if (counter_config[i].enabled && !smp_processor_id())
- /*
- * counter is reserved, this is on all
- * cpus, so report only for cpu #0
- */
- op_x86_warn_reserved(i);
- continue;
- }
- rdmsrl(msrs->controls[i].addr, val);
- if (val & ARCH_PERFMON_EVENTSEL_ENABLE)
- op_x86_warn_in_use(i);
- val &= model->reserved;
- wrmsrl(msrs->controls[i].addr, val);
- }
-
- /* avoid a false detection of ctr overflows in NMI handler */
- for (i = 0; i < NUM_COUNTERS; ++i) {
- if (unlikely(!msrs->counters[i].addr))
- continue;
- wrmsrl(msrs->counters[i].addr, -1LL);
- }
-
- /* enable active counters */
- for (i = 0; i < NUM_COUNTERS; ++i) {
- int virt = op_x86_phys_to_virt(i);
- if (!reset_value[virt])
- continue;
-
- /* setup counter registers */
- wrmsrl(msrs->counters[i].addr, -(u64)reset_value[virt]);
-
- /* setup control registers */
- rdmsrl(msrs->controls[i].addr, val);
- val &= model->reserved;
- val |= op_x86_get_ctrl(model, &counter_config[virt]);
- wrmsrl(msrs->controls[i].addr, val);
- }
-}
-

/*
* 16-bit Linear Feedback Shift Register (LFSR)
*

@@ -370,6 +269,107 @@ static void op_amd_stop_ibs(void)
wrmsrl(MSR_AMD64_IBSOPCTL, 0);
}

+#ifdef CONFIG_OPROFILE_EVENT_MULTIPLEX
+
+static void op_mux_switch_ctrl(struct op_x86_model_spec const *model,
+ struct op_msrs const * const msrs)
+{
+ u64 val;
+ int i;
+
+ /* enable active counters */
+ for (i = 0; i < NUM_COUNTERS; ++i) {
+ int virt = op_x86_phys_to_virt(i);
+ if (!reset_value[virt])
+ continue;
+ rdmsrl(msrs->controls[i].addr, val);
+ val &= model->reserved;
+ val |= op_x86_get_ctrl(model, &counter_config[virt]);
+ wrmsrl(msrs->controls[i].addr, val);
+ }
+}
+
+#endif
+
+/* functions for op_amd_spec */
+
+static void op_amd_shutdown(struct op_msrs const * const msrs);
+
+static int op_amd_fill_in_addresses(struct op_msrs * const msrs)
+{
+ int i;
+
+ for (i = 0; i < NUM_COUNTERS; i++) {
+ if (reserve_perfctr(i)) {
+ msrs->counters[i].addr = MSR_K7_PERFCTR0 + i;
+ msrs->controls[i].addr = MSR_K7_EVNTSEL0 + i;

+ } else if (counter_config[i].enabled) {
+ op_x86_warn_reserved(i);
+ op_amd_shutdown(msrs);
+ return -EBUSY;
+ }
+ }

+
+ return 0;
+}
+
+static void op_amd_setup_ctrs(struct op_x86_model_spec const *model,
+ struct op_msrs const * const msrs)
+{
+ u64 val;
+ int i;
+
+ /* setup reset_value */
+ for (i = 0; i < NUM_VIRT_COUNTERS; ++i) {
+ if (counter_config[i].enabled
+ && msrs->counters[op_x86_virt_to_phys(i)].addr)
+ reset_value[i] = counter_config[i].count;
+ else
+ reset_value[i] = 0;
+ }
+
+ /* clear all counters */
+ for (i = 0; i < NUM_CONTROLS; ++i) {
+ if (unlikely(!msrs->controls[i].addr)) {
+ if (counter_config[i].enabled && !smp_processor_id())
+ /*
+ * counter is reserved, this is on all
+ * cpus, so report only for cpu #0
+ */
+ op_x86_warn_reserved(i);
+ continue;
+ }
+ rdmsrl(msrs->controls[i].addr, val);
+ if (val & ARCH_PERFMON_EVENTSEL_ENABLE)
+ op_x86_warn_in_use(i);
+ val &= model->reserved;
+ wrmsrl(msrs->controls[i].addr, val);
+ }
+
+ /* avoid a false detection of ctr overflows in NMI handler */
+ for (i = 0; i < NUM_COUNTERS; ++i) {
+ if (unlikely(!msrs->counters[i].addr))
+ continue;
+ wrmsrl(msrs->counters[i].addr, -1LL);
+ }
+
+ /* enable active counters */
+ for (i = 0; i < NUM_COUNTERS; ++i) {
+ int virt = op_x86_phys_to_virt(i);
+ if (!reset_value[virt])
+ continue;
+
+ /* setup counter registers */
+ wrmsrl(msrs->counters[i].addr, -(u64)reset_value[virt]);
+
+ /* setup control registers */
+ rdmsrl(msrs->controls[i].addr, val);
+ val &= model->reserved;
+ val |= op_x86_get_ctrl(model, &counter_config[virt]);
+ wrmsrl(msrs->controls[i].addr, val);
+ }
+}
+
static int op_amd_check_ctrs(struct pt_regs * const regs,

struct op_msrs const * const msrs)

{
--
1.7.0

Robert Richter

unread,

Mar 4, 2010, 12:00:03 PM3/4/10

to

Two msrs must be reserverd per generic counter though both are
depending each other. Allocating the evntsel counter msr is not
necessary if the corresponding perfctr msr is already reserved. This
patch removes evntsel counter msr reservation.

This is implemented for AMD and P6 models only. for P4 still the
legacy code is used, so no changes there.

Signed-off-by: Robert Richter <robert....@amd.com>
---

arch/x86/kernel/cpu/perf_event.c | 15 +--------------
arch/x86/kernel/cpu/perfctr-watchdog.c | 6 ------
arch/x86/kernel/tsc.c | 2 --
arch/x86/oprofile/op_model_amd.c | 11 ++---------
arch/x86/oprofile/op_model_ppro.c | 11 ++---------
5 files changed, 5 insertions(+), 40 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 96b3a98..62a1a5e 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -248,19 +248,8 @@ static bool reserve_pmc_hardware(void)
goto perfctr_fail;
}

- for (i = 0; i < x86_pmu.num_events; i++) {
- if (!reserve_evntsel_nmi(x86_pmu.eventsel + i))
- goto eventsel_fail;
- }
-
return true;

-eventsel_fail:
- for (i--; i >= 0; i--)
- release_evntsel_nmi(x86_pmu.eventsel + i);
-
- i = x86_pmu.num_events;
-

perfctr_fail:
for (i--; i >= 0; i--)

release_perfctr_nmi(x86_pmu.perfctr + i);
@@ -275,10 +264,8 @@ static void release_pmc_hardware(void)
{
int i;

- for (i = 0; i < x86_pmu.num_events; i++) {
+ for (i = 0; i < x86_pmu.num_events; i++)
release_perfctr_nmi(x86_pmu.perfctr + i);
- release_evntsel_nmi(x86_pmu.eventsel + i);
- }

if (nmi_watchdog == NMI_LOCAL_APIC)
enable_lapic_nmi_watchdog();

diff --git a/arch/x86/kernel/cpu/perfctr-watchdog.c b/arch/x86/kernel/cpu/perfctr-watchdog.c
index fb329e9..5c129ee 100644
--- a/arch/x86/kernel/cpu/perfctr-watchdog.c
+++ b/arch/x86/kernel/cpu/perfctr-watchdog.c
@@ -313,17 +313,11 @@ static int single_msr_reserve(void)
{
if (!reserve_perfctr_nmi(wd_ops->perfctr))
return 0;
-
- if (!reserve_evntsel_nmi(wd_ops->evntsel)) {
- release_perfctr_nmi(wd_ops->perfctr);
- return 0;
- }
return 1;
}

static void single_msr_unreserve(void)
{
- release_evntsel_nmi(wd_ops->evntsel);
release_perfctr_nmi(wd_ops->perfctr);
}

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 597683a..ff98ef2 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -883,7 +883,6 @@ static unsigned long __init calibrate_cpu(void)
rdmsrl(MSR_K7_PERFCTR3, pmc3);
} else {
reserve_perfctr_nmi(MSR_K7_PERFCTR0 + i);
- reserve_evntsel_nmi(MSR_K7_EVNTSEL0 + i);

}
local_irq_save(flags);
/* start measuring cycles, incrementing from 0 */

@@ -902,7 +901,6 @@ static unsigned long __init calibrate_cpu(void)
wrmsrl(MSR_K7_EVNTSEL3, evntsel3);
} else {
release_perfctr_nmi(MSR_K7_PERFCTR0 + i);
- release_evntsel_nmi(MSR_K7_EVNTSEL0 + i);

}

return pmc_now * tsc_khz / (tsc_now - tsc_start);

diff --git a/arch/x86/oprofile/op_model_amd.c b/arch/x86/oprofile/op_model_amd.c
index 090cbbe..79e79a6 100644
--- a/arch/x86/oprofile/op_model_amd.c
+++ b/arch/x86/oprofile/op_model_amd.c
@@ -134,13 +134,10 @@ static void op_amd_fill_in_addresses(struct op_msrs * const msrs)
int i;

for (i = 0; i < NUM_COUNTERS; i++) {

- if (reserve_perfctr_nmi(MSR_K7_PERFCTR0 + i))

+ if (reserve_perfctr_nmi(MSR_K7_PERFCTR0 + i)) {

msrs->counters[i].addr = MSR_K7_PERFCTR0 + i;
- }

-
- for (i = 0; i < NUM_CONTROLS; i++) {
- if (reserve_evntsel_nmi(MSR_K7_EVNTSEL0 + i))

msrs->controls[i].addr = MSR_K7_EVNTSEL0 + i;
+ }
}
}

@@ -433,10 +430,6 @@ static void op_amd_shutdown(struct op_msrs const * const msrs)

if (msrs->counters[i].addr)

release_perfctr_nmi(MSR_K7_PERFCTR0 + i);

}
- for (i = 0; i < NUM_CONTROLS; ++i) {

- if (msrs->controls[i].addr)
- release_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
- }
}

static u8 ibs_eilvt_off;
diff --git a/arch/x86/oprofile/op_model_ppro.c b/arch/x86/oprofile/op_model_ppro.c
index 2bf90fa..d6abdf6 100644
--- a/arch/x86/oprofile/op_model_ppro.c
+++ b/arch/x86/oprofile/op_model_ppro.c
@@ -35,13 +35,10 @@ static void ppro_fill_in_addresses(struct op_msrs * const msrs)
int i;

for (i = 0; i < num_counters; i++) {
- if (reserve_perfctr_nmi(MSR_P6_PERFCTR0 + i))
+ if (!reserve_perfctr_nmi(MSR_P6_PERFCTR0 + i)) {
msrs->counters[i].addr = MSR_P6_PERFCTR0 + i;
- }
-
- for (i = 0; i < num_counters; i++) {
- if (reserve_evntsel_nmi(MSR_P6_EVNTSEL0 + i))
msrs->controls[i].addr = MSR_P6_EVNTSEL0 + i;
+ }
}
}

@@ -197,10 +194,6 @@ static void ppro_shutdown(struct op_msrs const * const msrs)

if (msrs->counters[i].addr)

release_perfctr_nmi(MSR_P6_PERFCTR0 + i);
}
- for (i = 0; i < num_counters; ++i) {
- if (msrs->controls[i].addr)
- release_evntsel_nmi(MSR_P6_EVNTSEL0 + i);
- }
if (reset_value) {
kfree(reset_value);
reset_value = NULL;
--
1.7.0

Peter Zijlstra

unread,

Mar 4, 2010, 1:00:02 PM3/4/10

to

On Thu, 2010-03-04 at 16:22 +0100, Robert Richter wrote:
> This patch set improves the perfctr reservation code. New functions
> are available to reserve a counter by its index only. It is no longer
> necessary to allocate both msrs of a counter which also improves the
> code and makes it easier.
>
> For oprofile a handler is implemented that returns an error now if a
> counter is already reserved by a different subsystem such as perf or
> watchdog. Before, oprofile silently ignored that counter. Finally the
> new reservation functions can be used to allocate special parts of the
> pmu such as IBS, which is necessary to use IBS with perf too.
>
> The patches are available in the oprofile tree:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git core
>
> If there are no objections, I suggest to merge it into the
> tip/perf/core too, maybe after pending patches went in. If there are
> already conflicts, I will do the merge for this.

Right, so cleaning up that reservation code is nice, but wouldn't it be
much nicer to simply do away with all that and make everything use the
(low level) perf code?

You expressed interest in both making the oprofile kernel space user the
perf apis as well as making oprofile user space use the perf abi, in
which case the kernel entirely side goes away.

Peter Zijlstra

unread,

Mar 11, 2010, 6:50:02 AM3/11/10

to

On Thu, 2010-03-04 at 18:59 +0100, Peter Zijlstra wrote:
> On Thu, 2010-03-04 at 16:22 +0100, Robert Richter wrote:
> > This patch set improves the perfctr reservation code. New functions
> > are available to reserve a counter by its index only. It is no longer
> > necessary to allocate both msrs of a counter which also improves the
> > code and makes it easier.
> >
> > For oprofile a handler is implemented that returns an error now if a
> > counter is already reserved by a different subsystem such as perf or
> > watchdog. Before, oprofile silently ignored that counter. Finally the
> > new reservation functions can be used to allocate special parts of the
> > pmu such as IBS, which is necessary to use IBS with perf too.
> >
> > The patches are available in the oprofile tree:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git core
> >
> > If there are no objections, I suggest to merge it into the
> > tip/perf/core too, maybe after pending patches went in. If there are
> > already conflicts, I will do the merge for this.
>
> Right, so cleaning up that reservation code is nice, but wouldn't it be
> much nicer to simply do away with all that and make everything use the
> (low level) perf code?

Alternatively, could we maybe further simplify this reservation into:

int reserve_pmu(void);
void release_pmu(void);

And not bother with anything finer grained.

Ingo Molnar

unread,

Mar 11, 2010, 7:50:02 AM3/11/10

to

* Peter Zijlstra <pet...@infradead.org> wrote:

> On Thu, 2010-03-04 at 18:59 +0100, Peter Zijlstra wrote:
> > On Thu, 2010-03-04 at 16:22 +0100, Robert Richter wrote:
> > > This patch set improves the perfctr reservation code. New functions
> > > are available to reserve a counter by its index only. It is no longer
> > > necessary to allocate both msrs of a counter which also improves the
> > > code and makes it easier.
> > >
> > > For oprofile a handler is implemented that returns an error now if a
> > > counter is already reserved by a different subsystem such as perf or
> > > watchdog. Before, oprofile silently ignored that counter. Finally the
> > > new reservation functions can be used to allocate special parts of the
> > > pmu such as IBS, which is necessary to use IBS with perf too.
> > >
> > > The patches are available in the oprofile tree:
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git core
> > >
> > > If there are no objections, I suggest to merge it into the
> > > tip/perf/core too, maybe after pending patches went in. If there are
> > > already conflicts, I will do the merge for this.
> >
> > Right, so cleaning up that reservation code is nice, but wouldn't it be
> > much nicer to simply do away with all that and make everything use the
> > (low level) perf code?
>
> Alternatively, could we maybe further simplify this reservation into:
>
> int reserve_pmu(void);
> void release_pmu(void);
>
> And not bother with anything finer grained.

Yeah, that looks quite a bit simpler.

It's all about making it easier to live with legacies anyway - all modern
facilities will use perf events to access the PMU.

Ingo

Robert Richter

unread,

Mar 11, 2010, 10:50:01 AM3/11/10

to

On 11.03.10 13:47:16, Ingo Molnar wrote:
> * Peter Zijlstra <pet...@infradead.org> wrote:
> > Alternatively, could we maybe further simplify this reservation into:
> >
> > int reserve_pmu(void);
> > void release_pmu(void);
> >
> > And not bother with anything finer grained.
>
> Yeah, that looks quite a bit simpler.

It does not solve the current problem that some parts of the pmu *are*
used simultaneously by different subsystems. But, even if only perf
would be used in the kernel you still can't be sure that all parts of
the pmu are available to be used, you simply don't have it under your
control. So why such limitations as an 'int reserve_pmu(int index)' is
almost the same but provides much more flexibility?

The question already arose if the watchdog would use perf permanently
and thus would block oprofile by making it unusable. The current
reservation code would provide a framework to solves this, sharing
perfctrs with watchdog, perf and oprofile. And, since the pmu becomes
more and more features other than perfctrs, why shouldn't it be
possible to run one feature with perf and the other with oprofile?

> It's all about making it easier to live with legacies anyway - all modern
> facilities will use perf events to access the PMU.

Scheduling events with perf is also 'some sort' of reservation, so
this code could be moved later into perf at all. In this case we also
will have to be able to reserve single counters or features by its
index.

For now, I don't think it is possible to change oprofile to use perf
in a big bang. This will disrupt oprofile users. I want to do the
switch to perf in a series of small changes and patch sets to make
sure, oprofile will not break. And this new reservation code is a step
towards perf.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert....@amd.com

Andi Kleen

unread,

Mar 20, 2010, 1:50:01 AM3/20/10

to

Robert Richter <robert....@amd.com> writes:

> Current perfctr reservation code allocates single pmu msrs. The msr
> addresses may differ depending on the model and offset calculation is
> necessary. This can be easier implemented by reserving a counter by
> its index only.

Sorry reviewing old patch. This doesn't work for the fixed counters on intel,
which don't have a index (or rather they have a separate number space)

I had a old patch to fix the reservation for them (and a matching
patch to perf to use it).

How to resolve this?

-Andi

---

---
arch/x86/kernel/cpu/perfctr-watchdog.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

Index: linux-2.6.32-ak/arch/x86/kernel/cpu/perfctr-watchdog.c
===================================================================
--- linux-2.6.32-ak.orig/arch/x86/kernel/cpu/perfctr-watchdog.c
+++ linux-2.6.32-ak/arch/x86/kernel/cpu/perfctr-watchdog.c
@@ -70,9 +70,13 @@ static inline unsigned int nmi_perfctr_m
case X86_VENDOR_AMD:
return msr - MSR_K7_PERFCTR0;
case X86_VENDOR_INTEL:
- if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON))
+ if (cpu_has(&boot_cpu_data, X86_FEATURE_ARCH_PERFMON)) {
+ if (msr >= MSR_CORE_PERF_FIXED_CTR0 &&
+ msr < MSR_CORE_PERF_FIXED_CTR0 + 8)
+ return NMI_MAX_COUNTER_BITS -
+ (msr - MSR_CORE_PERF_FIXED_CTR0);
return msr - MSR_ARCH_PERFMON_PERFCTR0;
-
+ }
switch (boot_cpu_data.x86) {
case 6:
return msr - MSR_P6_PERFCTR0;

--
a...@linux.intel.com -- Speaking for myself only.

Robert Richter

unread,

Mar 25, 2010, 12:00:03 PM3/25/10

to

Andi,

so far it does not seem this reservation patches will go upstream. So
we still do not have a solution of how to share the pmu with perf. The
current approach is a global pmu lock. I don't think this is a good
solution and we already see questions on the oprofile mailing list why
counters are not available to use. This will become much worse if perf
is using counters permanently in the kernel (e.g. the perf nmi
watchdog). This will make oprofile unusable.

On 20.03.10 06:45:44, Andi Kleen wrote:
> Robert Richter <robert....@amd.com> writes:
>
> > Current perfctr reservation code allocates single pmu msrs. The msr
> > addresses may differ depending on the model and offset calculation is
> > necessary. This can be easier implemented by reserving a counter by
> > its index only.
>
> Sorry reviewing old patch. This doesn't work for the fixed counters on intel,
> which don't have a index (or rather they have a separate number space)
>
> I had a old patch to fix the reservation for them (and a matching
> patch to perf to use it).
>
> How to resolve this?

Fixed counter reservation is not really used in the kernel except for
p4. Oprofile only reserves generic counters. Unless there is a general
solution how to reserve counters I don't see the need to extend the
reservation code for fixed counters since the only subsystem using it
would be the nmi watchdog.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert....@amd.com

--

Andi Kleen

unread,

Mar 25, 2010, 3:40:01 PM3/25/10

to

On Thu, Mar 25, 2010 at 04:52:08PM +0100, Robert Richter wrote:
> Andi,
>
> so far it does not seem this reservation patches will go upstream. So
> we still do not have a solution of how to share the pmu with perf. The

There's the current reservation
code which has some issues and is butt ugly, but does its job mostly.

The only real problem I have with it is that it doesn't support fixed counters.

That's not very hard to fix (patches posted), but of course requires
some basic cooperation from impartial maintainers. I think extending
it to other registers shouldn't also be that hard.

> current approach is a global pmu lock. I don't think this is a good
> solution and we already see questions on the oprofile mailing list why
> counters are not available to use. This will become much worse if perf
> is using counters permanently in the kernel (e.g. the perf nmi
> watchdog). This will make oprofile unusable.

NMI watchdog is not on by default luckily.

Anyways I don't really understand what the problem with just
allocating counters properly in perf_events like everyone else.
They need to do that anyways to cooperate with firmware or VMs using these
counters.

-Andi

--
a...@linux.intel.com -- Speaking for myself only.