On Linux perf development

Kao Quey-Liang

Mar 15, 2018, 2:46:03 AM

to RISC-V SW Dev

Hi all,

There was a patch towards perf development which enables basic perf functionalities.

I tried to exchange some ideas in the thread, but it seems inappropriate to have discussions in a closed pull request.

I hope this is the right place.

During the development process of perf on RISC-V, I noticed a few interesting things:

• S-mode has no right to write to counters. Solomatnikov mentioned that he was trying to provide an SBI for this purpose.
• There is no interrupt for counter overflow events. Assuming that this is important enough to be added to the spec in the future, or that most vendors will support this feature:
    • There is no interrupt indicator to let software "know" which counter caused the overflow.
    • There is no interrupt enable/disable mechanism for counters.
• There is no mode control mechanism for counters, so perf cannot explicitly count user-/kernel-space-only events.

Since the discussion of these limits is beyond the intention of this post, I will just state the perf-related, SW-related part here.

With the limitations above in mind, and as mentioned in the last post in the thread, I'm now finishing a perf patch for basic HW counter support, which will be ready by next Tuesday. The patch will contain:

• An extensible framework, so that the PMUs of each platform can be added easily.
• Documentation/riscv/pmu.txt, providing a guide to this process.
• Conventional perf design logic, so that the SBI can be put in without pain.

Any comments are welcome, especially from Solomatnikov.

Thanks,

Alan Kao, Andes Technology

Christopher Celio

Mar 15, 2018, 3:07:49 AM

to Kao Quey-Liang, RISC-V SW Dev

Great to see progress on this front!

> • There is no mode control mechanism for counters, so that perf cannot explicitly count user-/kernel-space-only events.

Does that have to be OS visible? In the hardware, I can trivially use 1 or 2 bits of the XLEN event ID selector to specify which mode I want the event to trigger on.

> There is no interrupt for a counter overflow events. Assume that this is important to be appended in the spec in the future, or most of the vendors will support this feature,

Is this fundamental to performance monitoring, or is this just what happened to exist in x86 land, so perf was implemented to utilize those features? I hate to take on other ISAs' legacy baggage if it's not fundamental.

-Chris

> --
> You received this message because you are subscribed to the Google Groups "RISC-V SW Dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to sw-dev+un...@groups.riscv.org.
> To post to this group, send email to sw-...@groups.riscv.org.
> Visit this group at https://groups.google.com/a/groups.riscv.org/group/sw-dev/.
> To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/sw-dev/0f7563fe-b412-4e02-90f5-85e0889d7426%40groups.riscv.org.

Alan Kao

Mar 15, 2018, 4:55:38 AM

to Christopher Celio, RISC-V SW Dev

Hi Chris,

Thanks for your feedback.

On Wed, Mar 14, 2018 at 08:07:49PM -0700, Christopher Celio wrote:
> Great to see progress on this front!
>
> > • There is no mode control mechanism for counters, so that perf cannot explicitly count user-/kernel-space-only events.
>
> Does that have to be OS visible? In the hardware, I can trivially use 1 or 2 bits of the XLEN event ID selector to specify which mode I want the event to trigger on.

I don't know much about how dtrace deals with these issues, but in the Linux landscape,
monitoring tools, especially perf, can take advantage of this feature to accurately
count/sample events in different processor modes.

For example, a classical usage of perf is like

> $ perf record -e cycles:u -a -- sleep 5

which means that perf will sample cycles in user space ONLY while "sleep 5" runs
(with -a, on all CPUs). The same goes for kernel space (cycles:k).

Note that in such a use case, the mode information should not only be visible to S-mode,
but also modifiable. I guess this somehow violates the RISC-V philosophy, so more
detail on this later.

>
> > There is no interrupt for a counter overflow events. Assume that this is important to be appended in the spec in the future, or most of the vendors will support this feature,
>
> Is this fundamental to performance monitoring, or is this just what happened to exist in x86 land, so perf was implemented to utilize those features?

Please allow me to quote this paragraph from the Perf Wiki
(https://perf.wiki.kernel.org/index.php/Tutorial#Event-based_sampling_overview)

> Perf_events is based on event-based sampling. The period is expressed as the number of occurrences of an event, not the number of timer ticks. A sample is recorded when the sampling counter overflows, i.e., wraps from 2^64 back to 0. No PMU implements 64-bit hardware counters, but perf_events emulates such counters in software.

So, the kernel just needs some asynchronous signal to tell it which counters
overflowed and when, and many other legacy ISAs did support this.
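To make the quoted mechanism concrete, here is a minimal sketch of event-based sampling. It is a toy model in Python, not kernel code: the class name, the method names, and the idea of counting samples in a plain integer are all assumptions made for illustration. The counter is preloaded with -period (mod 2^64), so the wrap to 0 arrives after exactly `period` events, and that wrap is the point where an overflow signal would be needed.

```python
MASK64 = (1 << 64) - 1  # width of the emulated 64-bit counter


class SamplingCounter:
    """Toy model (hypothetical names) of a counter that overflows once per period."""

    def __init__(self, period):
        self.period = period
        self.samples = 0
        self._arm()

    def _arm(self):
        # Preload with -period (mod 2^64) so the wrap to 0 happens
        # after exactly `period` further events.
        self.value = (-self.period) & MASK64

    def count(self, events=1):
        for _ in range(events):
            self.value = (self.value + 1) & MASK64
            if self.value == 0:       # wrapped: this is the "overflow"
                self.samples += 1     # a real handler would record the PC here
                self._arm()           # re-arm for the next period


counter = SamplingCounter(period=1000)
counter.count(3500)
print(counter.samples)
```

Without an asynchronous overflow signal, the `if self.value == 0` check has no hardware equivalent, which is why the sampling use case depends on it.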

> I hate to take on other ISAs' legacy baggage if it's not fundamental.

We understand that.

What I can provide in this discussion is the perspective of users,
who are Linux guys and tend to rely on perf, the most popular and
most general performance monitoring tool now, to tune their software and systems.
Fundamental or not, it is not my call.

We know that such hardware units may not conform to the design philosophy, but
it would be helpful if those registers/mechanisms became part of the standard spec.
Not only because some users do need them to be supported, but also because we will
implement them in our IPs.

I hope this explanation helps,
Alan Kao, Andes Technology

Vince Weaver

Mar 15, 2018, 3:31:29 PM

to Alan Kao, Christopher Celio, RISC-V SW Dev

I thought I'd chime in here as a Linux/perf/PAPI developer.

Most Linux performance analysis these days does assume that the CPU can
provide user/kernel distinction when specifying events, as well as an
overflow interrupt.

You can do analysis without these, but most architectures (not just x86)
provide these so a lot of the analysis tools assume they are available.
ARM was a bit late to the game but everything post Cortex-A15 has HW
interrupts and allows specifying user/kernel.

Having an overflow interrupt is useful for sampling.
It's also useful for automatic handling of counter overflow.

From what I can find in the RISC-V spec, you can't even write to the counter
registers? Or even disable/enable them? They're free-running all the
time? In that case it's useful to have an overflow interrupt, otherwise
you have to regularly poll the registers to make sure they haven't
overflowed (though I guess with 64-bit counters that might not happen that
often).
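As a rough illustration of the polling approach, the sketch below folds raw readings of a free-running counter into a wide software total. It is a Python model, not driver code: the function names, the 32-bit example width, and the 1 GHz event rate are assumptions chosen for the example. The modular subtraction gives the right delta as long as the counter wrapped at most once between two polls.

```python
def accumulate(total, prev_raw, new_raw, width):
    """Fold a new raw reading into a wide software total.

    Correct only if the counter wrapped at most once since prev_raw.
    """
    mask = (1 << width) - 1
    delta = (new_raw - prev_raw) & mask   # modular difference handles one wrap
    return total + delta, new_raw


def years_to_wrap(width, events_per_second):
    """Time for a free-running counter of `width` bits to wrap, in years."""
    seconds = (1 << width) / events_per_second
    return seconds / (365.25 * 24 * 3600)


# A 32-bit counter that wrapped between two polls: 0xFFFFFFF0 -> 0x00000010.
total, prev = 0, 0xFFFFFFF0
total, prev = accumulate(total, prev, 0x00000010, width=32)
print(total)                                  # 32 events elapsed

print(years_to_wrap(32, 1_000_000_000) < 1)   # a 32-bit counter wraps in seconds
print(years_to_wrap(64, 1_000_000_000) > 500) # a 64-bit counter takes centuries
```

This supports the parenthetical above: at one event per nanosecond a 64-bit counter takes centuries to wrap, so polling is mainly a concern for narrower counters.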

A tricky thing is how much skid happens once the interrupt triggers.

As a side note, it is good to see the spec allows for enabling userspace
access to the counters, that really helps with overhead in self-monitoring
of programs, such as with PAPI.

Vince

Alex Solomatnikov

Mar 17, 2018, 12:42:14 AM

to Kao Quey-Liang, RISC-V SW Dev

I didn't have much time to work on Linux perf because of other urgent things.

On Wed, Mar 14, 2018 at 7:46 PM, Kao Quey-Liang <none...@gmail.com> wrote:

> Hi all,
>
> There was a patch towards perf development which enables basic perf functionalities.
> I tried to exchange some ideas in the thread, but it seems inappropriate to have discussions in a closed pull request.

https://github.com/riscv/riscv-linux/pull/124 is in the queue for upstreaming: https://github.com/riscv/riscv-linux/commit/236fe55c9d854ff17793fdaa5638429f06fe862e

It enables only Linux SW perf events for pipe cleaning.

Palmer rebases this commit and other commits on top of the latest RC: https://github.com/riscv/riscv-linux/commits/riscv-all

> I hope this is the right place.
>
> During the development process of perf on RISC-V, I notice a few interesting things:
> S-mode has no right to write to counters. Solomatnikov proposed that he was trying to provide a SBI for this purpose.

SBI calls are necessary to prevent virtualization holes.

> There is no interrupt for a counter overflow events. Assume that this is important to be appended in the spec in the future, or most of the vendors will support this feature,
> There is no interrupt indicator to let software "know" which counter cause a counter overflow.
> There is no interrupt en/disable mechanism for counters.

Counter overflow can be handled by the timer interrupt, e.g. the CSRRC instruction can be used to atomically clear the MSB of the counter to prevent overflow.
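A minimal sketch of that scheme, modelled in Python rather than written as real trap-handler code. The class and method names are hypothetical, and an 8-bit width is used only to keep the numbers readable. On each timer tick the handler clears the MSB and, if it was set, credits 2^(width-1) events to a software-held total. This stays exact as long as fewer than 2^(width-1) events occur between ticks.

```python
class FoldedCounter:
    """Toy model (hypothetical names) of MSB folding on a timer tick."""

    def __init__(self, width=64):
        self.width = width
        self.msb = 1 << (width - 1)
        self.hw = 0        # models the hardware counter register
        self.sw_high = 0   # events already folded into software

    def advance(self, events):
        # Models the hardware counting between timer ticks.
        self.hw = (self.hw + events) & ((1 << self.width) - 1)

    def timer_tick(self):
        # Models a single CSRRC with the MSB as the mask: the old value is
        # returned and the bit is cleared in one atomic operation.
        old = self.hw
        self.hw &= ~self.msb
        if old & self.msb:
            self.sw_high += self.msb

    def read(self):
        return self.sw_high + self.hw


c = FoldedCounter(width=8)   # tiny width so the fold is easy to follow
for _ in range(3):
    c.advance(100)           # fewer than 2^7 = 128 events per tick
    c.timer_tick()
print(c.read())
```

Because the read-and-clear is a single atomic instruction, events that arrive while the handler runs still land in the low bits and are not lost. In practice this would have to run at a privilege level that is allowed to write the counter.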

SiFive's preference is to avoid adding interrupts unless there is a compelling technical reason. From this email thread it looks like profiling/sampling for a HW event (e.g. figuring out which lines of code/instructions cause the most frequent cache misses) is a good use case. I think synchronous exceptions (where possible) are more useful for such a use case than interrupts. Of course, this would be optional and implementation specific.

> There is no mode control mechanism for counters, so that perf cannot explicitly count user-/kernel-space-only events.

This is not an ISA issue, it can be addressed at the implementation level.

For example, in the SiFive U54 core (https://static.dev.sifive.com/U54-MC-RVCoreIP.pdf, section 3.11) there are HW perf counters that can be programmed to count different events, or several events at the same time, according to a bit mask.

These bit masks can be extended to specify user only or kernel only events.
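A sketch of what such an extension could look like. The bit positions below are purely hypothetical and are NOT the U54 layout or anything from the spec: the top two bits of an XLEN-wide selector are assumed to act as user/kernel filters, and the remaining bits as the event mask.

```python
XLEN = 64
COUNT_USER = 1 << (XLEN - 1)     # hypothetical: count while in U-mode
COUNT_KERNEL = 1 << (XLEN - 2)   # hypothetical: count while in S-mode
EVENT_MASK = COUNT_KERNEL - 1    # remaining low bits select events


def make_selector(event_bits, user=True, kernel=True):
    """Build a selector value from an event bit mask and mode filters."""
    if event_bits & ~EVENT_MASK:
        raise ValueError("event bits collide with the mode-filter bits")
    selector = event_bits
    if user:
        selector |= COUNT_USER
    if kernel:
        selector |= COUNT_KERNEL
    return selector


def should_count(selector, fired_events, mode):
    """Model of the hardware decision; mode is 'U' or 'S'."""
    mode_bit = COUNT_USER if mode == "U" else COUNT_KERNEL
    mode_ok = bool(selector & mode_bit)
    event_ok = bool(selector & EVENT_MASK & fired_events)
    return mode_ok and event_ok


# Roughly what a user-only request such as "cycles:u" would need.
sel = make_selector(0b101, user=True, kernel=False)
print(should_count(sel, 0b001, "U"))   # selected event in user mode
print(should_count(sel, 0b001, "S"))   # same event in kernel mode
```

With a layout like this the filtering stays an implementation detail of the event selector, but software still needs a defined way to set those bits for perf to use it.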

> Since the discussion of these limits is beyond the intention of this post, I will just state the perf-related, SW-related part here.
>
> With limitations above in mind and as mentioned in the last post in the thread, I'm now finishing a perf patch for basic HW counter support, which will be ready by next Tuesday. The patch will contain:
> A extensible framework, so that PMUs of each platform can be added into easily.
> Documentation/riscv/pmu.txt, providing the guide to this process.
> Conventional logic of perf design: the SBI can put in without pain.

I am looking forward to seeing your patch.

Thanks,

Alex

> Any comments are welcome, especially from Solomatnikov.
>
> Thanks,
> Alan Kao, Andes Technology
