Re: [PATCH 0/3 v2] new nmi_watchdog using perf events

Stephane Eranian

unread,

Feb 12, 2010, 11:20:01 AM2/12/10

to

Don,

How is this new NMI watchdog code going to work when you also have OProfile
enabled in your kernel?

Today, perf_event disables the NMI watchdog while there is at least one event.
By releasing the PMU registers, it also allows for Oprofile to work.

But now with this new NMI watchdog code, perf_event never releases the PMU.
Thus, I suspect Oprofile will not work anymore, unless the NMI watchdog is
explicitly disabled. Up until now OProfile could co-exist with the NMI watchdog.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Don Zickus

unread,

Feb 12, 2010, 12:10:03 PM2/12/10

to

On Fri, Feb 12, 2010 at 05:12:38PM +0100, Stephane Eranian wrote:
> Don,
>
> How is this new NMI watchdog code going to work when you also have OProfile
> enabled in your kernel?
>
> Today, perf_event disables the NMI watchdog while there is at least one event.
> By releasing the PMU registers, it also allows for Oprofile to work.
>
> But now with this new NMI watchdog code, perf_event never releases the PMU.
> Thus, I suspect Oprofile will not work anymore, unless the NMI watchdog is
> explicitly disabled. Up until now OProfile could co-exist with the NMI watchdog.

You are right. Originally when I read the code I thought perf_event just
grabbed all the PMUs in reserve_pmc_init(). But I see that only happens
when someone actually creates a PERF_TYPE_HARDWARE event, which the new
nmi watchdog does. Those PMUs only get released when the event is
destroyed which my new code only does when the cpu disappears.

So yeah, I have effectively blocked oprofile from working. I can change
my code such that when you disable the nmi_watchdog, you can release the
PMUs and let oprofile work.

But then I am curious, considering that perf and oprofile do the same
thing, how much longer do we let competing subsystems control the same
hardware? I thought the point of the perf_event subsystem was to have a
proper framework on top of the PMUs such that anyone who wants to use it
just registers themselves, which is what the new nmi_watchdog is doing.

I can add code that allows oprofile and the new nmi watchdog to coexist,
but things get a little ugly to maintain. Just wondering what the
gameplan is here?

Cheers,
Don

Stephane Eranian

unread,

Feb 12, 2010, 12:20:01 PM2/12/10

to

Don,

I believe OProfile should eventually be removed from the kernel. I suspect
much of the functionalities it needs are already provided by perf_events.
But that does not mean the OProfile user level tool must disappear. There is
a very large user community. I think it could and should be ported to use
perf_events instead. Given that the Oprofile users only interact through
opcontrol, opreport, opannotate and such, they never "see" the actual kernel
API. Thus by re-targeting the scripts, this should be mostly transparent to
end-users.

But for now, I believe the most practical solution is to release the perf_event
event when you disable the NMI watchdog. That would at least provide a
way to run OProfile.

Ingo Molnar

unread,

Feb 14, 2010, 3:40:02 AM2/14/10

to

* Stephane Eranian <era...@google.com> wrote:

> Don,
>
> On Fri, Feb 12, 2010 at 5:59 PM, Don Zickus <dzi...@redhat.com> wrote:
> > On Fri, Feb 12, 2010 at 05:12:38PM +0100, Stephane Eranian wrote:
> >> Don,
> >>
> >> How is this new NMI watchdog code going to work when you also have OProfile
> >> enabled in your kernel?
> >>
> >> Today, perf_event disables the NMI watchdog while there is at least one event.
> >> By releasing the PMU registers, it also allows for Oprofile to work.
> >>
> >> But now with this new NMI watchdog code, perf_event never releases the PMU.
> >> Thus, I suspect Oprofile will not work anymore, unless the NMI watchdog is
> >> explicitly disabled. Up until now OProfile could co-exist with the NMI watchdog.
> >

> > You are right. ??Originally when I read the code I thought perf_event just
> > grabbed all the PMUs in reserve_pmc_init(). ??But I see that only happens

> > when someone actually creates a PERF_TYPE_HARDWARE event, which the new

> > nmi watchdog does. ??Those PMUs only get released when the event is

> > destroyed which my new code only does when the cpu disappears.
> >

> > So yeah, I have effectively blocked oprofile from working. ??I can change

> > my code such that when you disable the nmi_watchdog, you can release the
> > PMUs and let oprofile work.
> >
> > But then I am curious, considering that perf and oprofile do the same
> > thing, how much longer do we let competing subsystems control the same

> > hardware? ??I thought the point of the perf_event subsystem was to have a

> > proper framework on top of the PMUs such that anyone who wants to use it
> > just registers themselves, which is what the new nmi_watchdog is doing.
> >
> > I can add code that allows oprofile and the new nmi watchdog to coexist,

> > but things get a little ugly to maintain. ??Just wondering what the

> > gameplan is here?
>
> I believe OProfile should eventually be removed from the kernel. I suspect
> much of the functionalities it needs are already provided by perf_events.
> But that does not mean the OProfile user level tool must disappear. There
> is a very large user community. I think it could and should be ported to
> use perf_events instead. Given that the Oprofile users only interact
> through opcontrol, opreport, opannotate and such, they never "see" the
> actual kernel API. Thus by re-targeting the scripts, this should be mostly
> transparent to end-users.

Agreed, i suggested this a few months ago. (In theory the oprofile user-space
tooling might even use something like libpapi or the perfmon library, instead
of directly binding to the perf syscall.)

Until that is done we can keep the parallel oprofile infrastructure i guess,
as a legacy option.

Then again, when it comes to profiling, i cannot see anyone not wanting to
use perf itself, it's so much easier to use (and more capable) than the
oprofile tooling in pretty much any area. Basically all instrumentation and
profiling development happens in that space these days as well.

Those few places where you cannot use perf yet (on weird or old CPUs) are
being filled in quickly as well - so it might be easier to just widen PMU
support to all relevant places instead of spending effort on oprofile
compatibility.

> But for now, I believe the most practical solution is to release the
> perf_event event when you disable the NMI watchdog. That would at least
> provide a way to run OProfile.

Agreed.

Ingo

Robert Richter

unread,

Feb 15, 2010, 3:10:03 PM2/15/10

to

There is the perfctr reservation framework what is used by all
subsystems. Perf reserves all counters if there is one event actively
running. This is ok as long you use perf from the userspace for
profiling. Nobody uses 2 different profilers at the same time. But if
the counters are also for implementing in-kernel features such as a
watchdog that is enabled all the time, perf must be modified to only
allocate those counters that are actually needed, and events may not
be scheduled on counters that are already reserved.

> > I can add code that allows oprofile and the new nmi watchdog to coexist,
> > but things get a little ugly to maintain. �Just wondering what the
> > gameplan is here?

There is no longer kernel feature implementation for oprofile. But it
will be still in the kernel for a while until we can completely switch
to perf. Perf is improving very fast, compared to the ongoing
development the implementation effort for coexistence is small. So I
think we all can spend some time to also improve the counter
reservation code.

> I believe OProfile should eventually be removed from the kernel. I suspect
> much of the functionalities it needs are already provided by perf_events.
> But that does not mean the OProfile user level tool must disappear. There is
> a very large user community. I think it could and should be ported to use
> perf_events instead. Given that the Oprofile users only interact through
> opcontrol, opreport, opannotate and such, they never "see" the actual kernel
> API. Thus by re-targeting the scripts, this should be mostly transparent to
> end-users.

I think, porting the oprofile userland to work on top of a performance
library (libpapi or libpfm) would be the cleanest solution. Alternativly
we could also port the kernel part to use the in-kernel perf api.

>
> But for now, I believe the most practical solution is to release the perf_event
> event when you disable the NMI watchdog. That would at least provide a
> way to run OProfile.

This solution is fine to me. The current implemenation also has some
limitations for oprofile if the watchdog is enabled.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert....@amd.com