[RFC] [PATCH 1/1] perf: add support for arch-dependent symbolic event names to "perf stat"

Corey Ashford

Mar 3, 2010, 9:40:02 PM

For your review, this patch adds support for arch-dependent symbolic event names
to the "perf stat" tool, and could be expanded to other "perf *" commands fairly
easily, I suspect.

To support arch-dependent event names without adding arch-dependent code to
perf, I added a callout mechanism. perf looks for the environment variable
PERF_ARCH_DEP_LIB and, if it is set, tries to open the shared object it names.
If that succeeds, perf looks up the symbol "parse_arch_dep_event". If that
symbol exists, parse_events() calls it before any of the other event parsing
functions in parse-events.c. It is passed the same arguments as the other
parse_*_event functions, namely the event string and a pointer to an event
attribute structure.
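Here is a minimal sketch of that lookup order, for illustration only. It is not the code in perf_symbolic_event.diff. `struct event_attr` stands in for `struct perf_event_attr`, and the result enum is modeled (by assumption) on the result codes in parse-events.c. The helper names `load_arch_dep_parser()` and `parse_event()` are hypothetical. Only PERF_ARCH_DEP_LIB and parse_arch_dep_event come from the description above.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for struct perf_event_attr: only the fields this sketch uses. */
struct event_attr {
	uint32_t type;
	uint64_t config;
};

/* Assumed result codes, modeled on parse-events.c. */
enum event_result { EVT_FAILED, EVT_HANDLED, EVT_HANDLED_ALL };

/* Same shape as the other parse_*_event() functions (assumption). */
typedef enum event_result (*arch_dep_parser_t)(const char **strp,
					       struct event_attr *attr);

/* Hypothetical helper: returns NULL when no usable library is configured. */
static arch_dep_parser_t load_arch_dep_parser(void)
{
	const char *path = getenv("PERF_ARCH_DEP_LIB");
	void *handle;

	if (path == NULL)
		return NULL;            /* variable not set: feature disabled */
	handle = dlopen(path, RTLD_NOW);
	if (handle == NULL)
		return NULL;            /* library missing or not loadable */
	/* POSIX allows converting dlsym()'s void * to a function pointer. */
	arch_dep_parser_t fn =
		(arch_dep_parser_t)dlsym(handle, "parse_arch_dep_event");
	if (fn == NULL)
		dlclose(handle);        /* library lacks the hook */
	return fn;
}

/* Hypothetical entry point: try the external hook before built-in parsers. */
static enum event_result parse_event(const char **strp, struct event_attr *attr)
{
	static int loaded;
	static arch_dep_parser_t arch_dep_parser;

	assert(strp != NULL && *strp != NULL && attr != NULL);
	if (!loaded) {                  /* resolve the hook only once */
		arch_dep_parser = load_arch_dep_parser();
		loaded = 1;
	}
	if (arch_dep_parser != NULL) {
		enum event_result ret = arch_dep_parser(strp, attr);
		if (ret != EVT_FAILED)
			return ret;
	}
	/* ... the built-in parse_*_event() functions would be tried here ... */
	return EVT_FAILED;
}
```

If PERF_ARCH_DEP_LIB is unset, nothing is loaded and parsing falls through to the built-in parsers. On glibc older than 2.34 the sketch must be linked with `-ldl`.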

As the code stood, "perf stat" would print the count results, but for raw
events (which is how arch-dependent events are supported in perf_events), it
would print only a raw code. This is not acceptable, especially when a
symbolic name was given on the command line. So I changed the code to save
the event name that was passed on the command line, rather than reverse
translating the event type and config fields of the attr structure back into
an event string. This way, the arch-dependent library needs no reverse
translation function; only an event string -> attr struct function is needed.

I could well be missing something, but I don't understand why reverse
translation is ever needed in perf, as long as the tool keeps track of the
original event strings.

Thanks for your consideration,

- Corey

Corey Ashford
Software Engineer
IBM Linux Technology Center, Linux Toolchain
Beaverton, OR
503-578-3507
cjas...@us.ibm.com

perf_symbolic_event.diff

Corey Ashford

Mar 4, 2010, 1:40:01 PM

A couple of follow-up comments on this patch:

This functionality was designed to provide a generalized interface to an
external event name -> attr struct library, such as libpfm4. libpfm4 has an
interface that very closely matches the parse_*_event() prototypes, so it's
quite easy to write a small wrapper function around libpfm4's function.

Ingo Molnar discussed adding some visibility to the arch-dependent event names
through some other interface, perhaps /sys/devices/pmus, but as far as I know
that discussion is a long way from producing something usable today. So you
could think of this external library approach as a stop-gap until something
better is developed. If and when that new event naming mechanism becomes
available, we can easily remove this external library support from perf.

--
Regards,

- Corey


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Corey Ashford

Mar 5, 2010, 12:50:02 PM

(I posted this yesterday, but I think LKML rejected it because there was more
quoted text than my text. So I am reposting with no quoted text).


Ingo Molnar

Mar 11, 2010, 7:50:02 AM

I'm quite much against stop-gap measures like this - they tend to become
tomorrow's impossible-to-remove quirk.

If you want extensible events you can already do it by providing an ftrace
tracepoint event via TRACE_EVENT. They are easy to add and ad-hoc, and are
supported throughout by perf.

That could be librarized further by providing an /eventfs or /proc/events
interface to enumerate them.

Or if you want to extend the perf events namespace ABI you can send patches
for that as well. (It's not a big issue if a particular event is currently
only supported on Power for example - as long as you make a good effort naming
and structuring it in a reasonably generic way.)

Thanks,

Ingo

Corey Ashford

Mar 11, 2010, 1:50:02 PM

On 3/11/2010 4:46 AM, Ingo Molnar wrote:
[snip]

> If you want extensible events you can already do it by providing an ftrace
> tracepoint event via TRACE_EVENT. They are easy to add and ad-hoc, and are
> supported throughout by perf.

Is TRACE_EVENT an appropriate way to add hardware-specific counter events? I
will look into this. Thanks for the pointer.

>
> That could be librarized further by providing an /eventfs or /proc/events
> interface to enumerate them.

We can enumerate events this way, but there are other aspects to events than
just their names (see below).

>
> Or if you want to extend the perf events namespace ABI you can send patches
> for that as well. (It's not a big issue if a particular event is currently
> only supported on Power for example - as long as you make a good effort naming
> and structuring it in a reasonably generic way.)

I'm not sure how that would work. The issue I am trying to solve here is that
Power arch chips have a large number of very hardware-specific events that are
not generalizable. Many of these events have not only names but also other
user-configurable bits that select or narrow the scope of which exact events
are recorded. This issue is dealt with nicely in libpfm4, as it has mechanisms
for parsing event names and attributes (aka modifiers or unit masks), and then
produces a usable config field for the perf_event_attr struct.

Should I take it from the above that you are completely against the idea of
using an external library for hardware-specific event and attribute naming?

--
Regards,

- Corey

Ingo Molnar

Mar 11, 2010, 2:20:02 PM

* Corey Ashford <cjas...@linux.vnet.ibm.com> wrote:

Could you give a few relevant examples of events in question, and the kind of
configurability/attributes they have on Power?

Thanks,

Ingo

Corey Ashford

Mar 11, 2010, 3:50:02 PM

On 3/11/2010 11:14 AM, Ingo Molnar wrote:
>
> * Corey Ashford<cjas...@linux.vnet.ibm.com> wrote:

[snip]

>> I'm not sure how that would work. The issue I am trying to solve
>> here is that Power arch chips have a large number of very
>> hardware-specific events that are not generalizable. Many of these
>> events not only have names, but other user-configurable bits as well
>> that select or narrow the scope of which exact events are recorded.
>> This issue is dealt with nicely in libpfm4, as it has mechanisms for
>> parsing event names and attributes (aka modifiers or unit masks),
>> and then produces a usable config field for the perf_event_attr
>> struct.
>>
>> Should I take it from the above that you are completely against the
>> idea of using an external library for hardware-specific event and
>> attribute naming?
>
> Could you give a few relevant examples of events in question, and the kind of
> configurability/attributes they have on Power?

Here are a few examples for the Power A2 processor. I've distorted the names
because the PMU architecture hasn't been publicly released yet.

PM_DE_PMC_9:hrd_mask=0xff:hrd=0x22:pma_mask=0x3fff:pma=0x1b2d:culling_mode=3
PM_EX_0x03:lane=2:vlane=1
PM_OWE_ENG_MAC_FULL:usu=3

Note that the attribute fields shown above are packed into the config field of
the perf_event_attr struct.

>
> Thanks,
>
> Ingo

Regards,

- Corey

Paul Mackerras

Mar 11, 2010, 9:50:02 PM

On Thu, Mar 11, 2010 at 01:46:08PM +0100, Ingo Molnar wrote:
>
> * Corey Ashford <cjas...@linux.vnet.ibm.com> wrote:
>
> > On 3/3/2010 6:30 PM, Corey Ashford wrote:
> > >For your review, this patch adds support for arch-dependent symbolic
> > >event names to the "perf stat" tool, and could be expanded to other
> > >"perf *" commands fairly easily, I suspect.

> I'm quite much against stop-gap measures like this - they tend to become
> tomorrow's impossible-to-remove quirk.
>
> If you want extensible events you can already do it by providing an ftrace
> tracepoint event via TRACE_EVENT. They are easy to add and ad-hoc, and are
> supported throughout by perf.

If I've understood correctly what Corey is doing, I think you're
missing the point. The idea, I thought, was to provide a way to be
able to use symbolic names for raw hardware events rather than just
numbers. I don't see how ftrace tracepoint events are relevant to
that.

Now as to whether an external .so is the best way to provide the
processor-specific mapping of names to raw events, I'm not sure.
If the kernel can provide that mapping via procfs, sysfs or eventfs,
that would be an alternative, but it does mean the kernel has those
tables in unswappable memory (and potentially the tables for all the
processors that the kernel supports), which seems unnecessary. Or
they can just be added to the perf source code.

Paul.

Corey Ashford

Mar 12, 2010, 2:00:02 AM

On 03/11/2010 06:41 PM, Paul Mackerras wrote:
> On Thu, Mar 11, 2010 at 01:46:08PM +0100, Ingo Molnar wrote:
>>
>> * Corey Ashford<cjas...@linux.vnet.ibm.com> wrote:
>>
>>> On 3/3/2010 6:30 PM, Corey Ashford wrote:
>>>> For your review, this patch adds support for arch-dependent symbolic
>>>> event names to the "perf stat" tool, and could be expanded to other
>>>> "perf *" commands fairly easily, I suspect.
>
>> I'm quite much against stop-gap measures like this - they tend to become
>> tomorrow's impossible-to-remove quirk.
>>
>> If you want extensible events you can already do it by providing an ftrace
>> tracepoint event via TRACE_EVENT. They are easy to add and ad-hoc, and are
>> supported throughout by perf.
>
> If I've understood correctly what Corey is doing, I think you're
> missing the point. The idea, I thought, was to provide a way to be
> able to use symbolic names for raw hardware events rather than just
> numbers.

Yes, that's what I meant.

> I don't see how ftrace tracepoint events are relevant to
> that.
>
> Now as to whether an external .so is the best way to provide the
> processor-specific mapping of names to raw events, I'm not sure.
> If the kernel can provide that mapping via procfs, sysfs or eventfs,
> that would be an alternative, but it does mean the kernel has those
> tables in unswappable memory (and potentially the tables for all the
> processors that the kernel supports), which seems unnecessary. Or
> they can just be added to the perf source code.

In addition to the names and attributes, we'd also need text-based
descriptions of the events and attributes.

I'm not opposed to the idea of placing them in sysfs (or other pseudo
fs), but it's also not clear to me how to represent the event data in a
clean, extensible, and space/performance efficient way. That said, I do
like the idea of being able to navigate events by looking through a
directory structure which is possibly organized by the physical topology
of the system and its PMUs.

- Corey

Corey Ashford

Mar 15, 2010, 7:40:01 PM

On 3/11/2010 12:46 PM, Corey Ashford wrote:
>
>
> On 3/11/2010 11:14 AM, Ingo Molnar wrote:
>>
>> * Corey Ashford<cjas...@linux.vnet.ibm.com> wrote:
> [snip]
>>> I'm not sure how that would work. The issue I am trying to solve
>>> here is that Power arch chips have a large number of very
>>> hardware-specific events that are not generalizable. Many of these
>>> events not only have names, but other user-configurable bits as well
>>> that select or narrow the scope of which exact events are recorded.
>>> This issue is dealt with nicely in libpfm4, as it has mechanisms for
>>> parsing event names and attributes (aka modifiers or unit masks),
>>> and then produces a usable config field for the perf_event_attr
>>> struct.
>>>
>>> Should I take it from the above that you are completely against the
>>> idea of using an external library for hardware-specific event and
>>> attribute naming?
>>
>> Could you give a few relevant examples of events in question, and the kind of
>> configurability/attributes they have on Power?
>
> Here are a few examples for the Power A2 processor. I've distorted the
> names because PMU architecture isn't publicly released yet.
>
> PM_DE_PMC_9:hrd_mask=0xff:hrd=0x22:pma_mask=0x3fff:pma=0x1b2d:culling_mode=3
> PM_EX_0x03:lane=2:vlane=1
> PM_OWE_ENG_MAC_FULL:usu=3

Just a follow-up note to this...

I learned that much of the high-level architecture of the new chip that IBM
is working on has recently been publicly released, so I have "undistorted" the
event names below:

PM_DC_PMC_9:lpid_mask=0xff:lpid=0x22:pid_mask=0x3fff:pid=0x1b2d:marking_mode=3
PM_REGX_0x03:lane=2:vlane=1
PM_XML_ENG_MAC_FULL:sus=3

DC = Decompression/Compression accelerator
PMC_9 = Performance monitoring event 9
REGX = Regular eXpression accelerator
XML = XML parsing accelerator
pid = process id to match
pid_mask = process id match mask
lpid = logical partition id
lpid_mask = logical partition id mask
sus = source unit select
lane, vlane = signal routing fields
marking_mode = used to determine which accelerator work units to mark for
performance monitoring

Ingo Molnar

Mar 16, 2010, 5:40:02 AM

* Paul Mackerras <pau...@samba.org> wrote:

> On Thu, Mar 11, 2010 at 01:46:08PM +0100, Ingo Molnar wrote:
> >
> > * Corey Ashford <cjas...@linux.vnet.ibm.com> wrote:
> >
> > > On 3/3/2010 6:30 PM, Corey Ashford wrote:
> > > >For your review, this patch adds support for arch-dependent symbolic
> > > >event names to the "perf stat" tool, and could be expanded to other
> > > >"perf *" commands fairly easily, I suspect.
>
> > I'm quite much against stop-gap measures like this - they tend to become
> > tomorrow's impossible-to-remove quirk.
> >
> > If you want extensible events you can already do it by providing an ftrace
> > tracepoint event via TRACE_EVENT. They are easy to add and ad-hoc, and are
> > supported throughout by perf.
>
> If I've understood correctly what Corey is doing, I think you're missing the
> point. The idea, I thought, was to provide a way to be able to use symbolic
> names for raw hardware events rather than just numbers. I don't see how
> ftrace tracepoint events are relevant to that.

Tracepoints are relevant because they are currently the best way we have of
assigning symbolic names to various kernel-internal events. For ad-hoc use
cases like this one:

http://dri.freedesktop.org/wiki/IntelPerformanceTuning

I'd much rather see that facility used (and, to the extent needed, extended)
to provide support for rare arch events that we don't want to enumerate in a
generic way.

Or, if the events are important enough to be hardcoded into the perf ABI
itself, they should be generalized in a meaningful way - even if you don't
expect them to show up on other CPUs.

Ingo

Ingo Molnar

Mar 16, 2010, 5:50:02 AM

Are these special-purpose instructions for compression/regex/xml-parsing
speedups?

I think it would be rather useful to merge the hw (and sw) perf events with
the ftrace/tracepoints symbolic events space. That would be a one-stop-shop
for both perf and other tools to figure out the events we offer, their
characteristics, format, relationship to other events, etc.

Ingo

Corey Ashford

Mar 16, 2010, 2:30:03 PM

>> I learned that much of the high-level architecture of the new
>> chip that IBM is working on has been publicly released recently, so
>> I have "undistorted" the event names below:
>>
>> PM_DC_PMC_9:lpid_mask=0xff:lpid=0x22:pid_mask=0x3fff:pid=0x1b2d:marking_mode=3
>> PM_REGX_0x03:lane=2:vlane=1
>> PM_XML_ENG_MAC_FULL:sus=3
>>
>>
>> DC = Decompression/Compression accelerator
>> PMC_9 = Performance monitoring event 9
>> REGX = Regular eXpression accelerator
>> XML = XML parsing accelerator
>> pid = process id to match
>> pid_mask = process id match mask
>> lpid = logical partition id
>> lpid_mask = logical partition id mask
>> sus = source unit select
>> lane, vlane = signal routing fields
>> marking_mode = used to determine which accelerator work units to
>> mark for performance monitoring
>
> Are these special-purpose instructions for compression/regex/xml-parsing
> speedups?

No, these events are for nest (aka uncore) accelerators for
compression/regex/xml-parsing. These accelerators operate independently of the
CPU threads and are given work units via request blocks which are then queued up
by the accelerator.

>
> I think it would be rather useful to merge the hw (and sw) perf events with
> the ftrace/tracepoints symbolic events space. That would be a one-stop-shop
> for both perf and other tools to figure out the events we offer, their
> characteristics, format, relationship to other events, etc.
>
> Ingo

OK, I will look into this. Thank you for your advice.

- Corey