Profiling on ARM

sundee...@gmail.com

Feb 9, 2009, 10:14:24 PM

to Beagle Board

Hi,

I want to profile my application running on the ARM core on
BeagleBoard. I am using Code Sourcery's (free) GNU toolchain to cross-compile
my app. I wonder whether the best way to profile would be GNU
gprof, which Code Sourcery provides in the toolchain. Are there
other (recommended) ways of profiling?
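For reference, a minimal sketch of the usual gprof workflow, assuming the CodeSourcery GNU/Linux cross-compiler uses the `arm-none-linux-gnueabi-` prefix (check your installation): compile and link with `-pg`, run the program on the board so that it writes `gmon.out` on normal exit, then run `arm-none-linux-gnueabi-gprof app gmon.out`. The workload functions below are hypothetical and only give gprof two functions of very different cost to attribute time to; call `run_workload()` from your own `main()`.

```c
#include <stdint.h>

/* One step of a simple linear congruential generator (busy work only). */
static uint32_t lcg_step(uint32_t x)
{
    return x * 1664525u + 1013904223u;
}

/* Deliberately expensive: 100x the loop count of cheap_work(). */
uint32_t expensive_work(uint32_t seed)
{
    uint32_t acc = seed;
    for (int i = 0; i < 100000; i++)
        acc = lcg_step(acc);
    return acc;
}

uint32_t cheap_work(uint32_t seed)
{
    uint32_t acc = seed;
    for (int i = 0; i < 1000; i++)
        acc = lcg_step(acc);
    return acc;
}

/* gprof's flat profile should charge most of the time to expensive_work(). */
uint32_t run_workload(int rounds)
{
    uint32_t acc = 1;
    for (int r = 0; r < rounds; r++) {
        acc ^= expensive_work(acc);
        acc ^= cheap_work(acc);
    }
    return acc;
}
```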

Also, currently I have my app compiled and tested on top of Ubuntu
running on the ARM. Since the intended target is an embedded platform,
would it be better to profile the app running on an embedded OS (like
Angstrom)?

Thanks & cheers,
Sunny

Koen Kooi

Feb 10, 2009, 8:27:59 AM

to beagl...@googlegroups.com

On 10 Feb 2009, at 04:14, sundee...@gmail.com wrote:

The angstrom demos have oprofile installed by default.

regards,

Koen

Philip Balister

Feb 10, 2009, 8:30:25 AM

to beagl...@googlegroups.com

I saw some emails about oprofile having counter issues on the armv7.
Does anyone have a good summary of the issue and how it impacts
oprofile on the beagle?

Philip

Siarhei Siamashka

Feb 10, 2009, 8:45:55 AM

to beagl...@googlegroups.com

On Tue, Feb 10, 2009 at 3:30 PM, Philip Balister
<philip....@gmail.com> wrote:
> I saw some emails about oprofile having counter issues on the armv7.
> Does anyone have a good summary of the issue and how it impacts
> oprofile on the beagle?

Without going into the details, the summary of the problem is the following.

Under certain conditions, the PMU of the Cortex-A8 core (at least in
the r1pX revisions used in the BeagleBoard) gets messed up: its
interrupts get disabled and oprofile stops collecting samples. If you
are profiling just a number-crunching application that does not make
many system calls, you are unlikely to encounter it. On the other
hand, for example, repeatedly calling the 'gettimeofday' function in a
tight loop triggers the problem almost instantly.
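The testcase described above is easy to write down. This hypothetical helper (name and signature are illustrative) simply calls `gettimeofday()` in a tight loop; on the ARM kernels discussed here each call is a real system call. Run it from a `main()` of your own while oprofile is collecting samples.

```c
#include <stddef.h>
#include <sys/time.h>

/* Call gettimeofday() `iterations` times in a tight loop.
 * Returns the number of successful calls, which also keeps the
 * compiler from optimizing the loop away. */
long gettimeofday_loop(long iterations)
{
    struct timeval tv;
    long ok = 0;
    for (long i = 0; i < iterations; i++) {
        if (gettimeofday(&tv, NULL) == 0)
            ok++;
    }
    return ok;
}
```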

Best regards,
Siarhei Siamashka

Koen Kooi

Feb 10, 2009, 8:52:49 AM

to beagl...@googlegroups.com

On 10 Feb 2009, at 14:45, Siarhei Siamashka wrote:

> for example, repeatedly calling the 'gettimeofday' function in a
> tight loop triggers the problem almost instantly.

Doesn't gettimeofday kinda kill performance anyway?

regards,

Koen

Siarhei Siamashka

Feb 10, 2009, 9:00:11 AM

to beagl...@googlegroups.com

This is just a testcase to reproduce the problem. It doesn't make me
feel much better to know that, when profiling real applications, it
takes a lot longer to break (dozens of seconds or even several minutes
instead of a fraction of a second).

Best regards,
Siarhei Siamashka

Måns Rullgård

Feb 10, 2009, 9:03:35 AM

to beagl...@googlegroups.com

Koen Kooi <ko...@beagleboard.org> writes:

> Doesn't gettimeofday kinda kill performance anyway?

Ever done an strace on firefox?

--
Måns Rullgård
ma...@mansr.com

Måns Rullgård

Feb 10, 2009, 9:05:59 AM

to beagl...@googlegroups.com

Siarhei Siamashka <siarhei....@gmail.com> writes:

I have used oprofile quite a bit when working on FFmpeg. Occasionally
a run will generate very few or no samples. Restarting it seems to
get it back on track. FFmpeg is, of course, mostly in the
"number crunching" domain, and doesn't make many system calls, and
certainly not gettimeofday.

--
Måns Rullgård
ma...@mansr.com

Robert Schuster

Feb 10, 2009, 11:01:54 AM

to beagl...@googlegroups.com

Hi,

Siarhei Siamashka wrote:

> Under certain conditions, the PMU of the Cortex-A8 core (at least in
> the r1pX revisions used in the BeagleBoard) gets messed up: its
> interrupts get disabled and oprofile stops collecting samples. If you
> are profiling just a number-crunching application that does not make
> many system calls, you are unlikely to encounter it. On the other
> hand, for example, repeatedly calling the 'gettimeofday' function in a
> tight loop triggers the problem almost instantly.

OMG, that makes using oprofile for a lot of applications quite useless
IMO. :(

a) Is there a way of detecting that the issue has occurred?

b) Is a workaround (e.g. kernel patch) possible to fix this issue reliably?

Regards
Robert

Siarhei Siamashka

Feb 17, 2009, 4:15:14 PM

to beagl...@googlegroups.com

On Tue, Feb 10, 2009 at 6:01 PM, Robert Schuster <theBo...@gmx.net> wrote:
> OMG, that makes using oprofile for a lot of applications quite useless
> IMO. :(
>
> a) Is there a way of detecting that the issue has occurred?

It is possible to check the PMU state periodically (the PMNC or CNTENS
registers, for example); if it unexpectedly changes to something else
(resets to zero), the PMU is broken.
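A minimal sketch of that periodic check, assuming the ARMv7 CP15 encodings for PMNC (`c9, c12, 0`) and CNTENS (`c9, c12, 1`). The snapshot struct and function names are hypothetical. The register reads are compiled only for ARM and need privileged access, so a real check would live in kernel code such as a timer callback; the comparison itself is plain C.

```c
#include <stdbool.h>
#include <stdint.h>

/* PMU configuration captured right after profiling was set up. */
struct pmu_snapshot {
    uint32_t pmnc;    /* control register */
    uint32_t cntens;  /* count enable set register */
};

#if defined(__arm__)
/* Privileged reads (kernel mode, or user mode with PMU user access enabled). */
static inline struct pmu_snapshot pmu_take_snapshot(void)
{
    struct pmu_snapshot s;
    __asm__ __volatile__("mrc p15, 0, %0, c9, c12, 0" : "=r"(s.pmnc));
    __asm__ __volatile__("mrc p15, 0, %0, c9, c12, 1" : "=r"(s.cntens));
    return s;
}
#endif

/* Treat the PMU as broken if either register no longer matches the
 * value saved when profiling was configured (typically it reads back
 * as zero after the bug hits). */
bool pmu_looks_broken(struct pmu_snapshot saved, struct pmu_snapshot now)
{
    return now.pmnc != saved.pmnc || now.cntens != saved.cntens;
}
```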

> b) Is a workaround (e.g. kernel patch) possible to fix this issue reliably?

Well, thanks a lot for asking. Really. I just wanted to reply that no
practical workaround is available but then realized that I had
overlooked something simple :)

A patch with a workaround is attached. It is probably missing proper
locking/synchronization which would need to be added, but at least
should work in practice and seems to have almost no impact on
profiling statistics (samples which are related to 'watchdog' timer
activity which monitors PMU state get filtered out and are not taken
into account).
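One way the filtering described above could work (a hypothetical sketch, not the code from the attached patch): drop every sample whose program counter lies inside the watchdog's own code range `[wd_start, wd_end)`. The function name and the address range are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy the sampled PCs from `pcs` to `out`, skipping those that fall
 * inside the watchdog's code range [wd_start, wd_end).
 * `out` must have room for `n` entries. Returns the number kept. */
size_t filter_watchdog_samples(const uintptr_t *pcs, size_t n,
                               uintptr_t wd_start, uintptr_t wd_end,
                               uintptr_t *out)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (pcs[i] >= wd_start && pcs[i] < wd_end)
            continue;  /* sample was taken inside the watchdog itself */
        out[kept++] = pcs[i];
    }
    return kept;
}
```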

Testing and feedback is very much welcome.

Best regards,
Siarhei Siamashka

0001-ARM-OMAP-Cortex-A8-r1-PMU-bug-workaround-for-oprof.patch

Laurent Desnogues

Feb 17, 2009, 4:46:28 PM

to Siarhei Siamashka, beagl...@googlegroups.com

On Tue, Feb 17, 2009 at 10:15 PM, Siarhei Siamashka
<siarhei....@gmail.com> wrote:
> A patch with a workaround is attached. It is probably missing proper
> locking/synchronization which would need to be added, but at least
> should work in practice and seems to have almost no impact on
> profiling statistics (samples which are related to 'watchdog' timer
> activity which monitors PMU state get filtered out and are not taken
> into account).
>
> Testing and feedback is very much welcome.

I only took a quick look at your patch, so I can't say much about it.

When I first heard of that bug, I had another idea: why not read the
active counters at regular intervals, accumulate the results, and
clear the counters? I don't know if that fits well with oprofile, but
it would prevent any counter from overflowing (and so would prevent
the bug from occurring), provided the timer interrupt happens often
enough (I guess one second is enough given the frequency of the
Cortex-A8). Does that make sense?
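A sketch of the accumulate-and-clear idea. The "hardware" counter here is a plain variable standing in for a PMU event counter (a hypothetical simplification; a driver would access it through CP15), and `seconds_until_wrap()` shows the arithmetic behind the one-second guess.

```c
#include <stdint.h>

/* Stand-in for one 32-bit PMU counter plus a 64-bit software total. */
struct accum_counter {
    uint32_t hw;     /* value the hardware counter has reached */
    uint64_t total;  /* events accumulated in software so far */
};

/* Called from a periodic timer: collect the counter, then clear it,
 * so the 32-bit hardware counter never runs long enough to overflow. */
void on_timer_tick(struct accum_counter *c)
{
    c->total += c->hw;
    c->hw = 0;
}

/* Time for a 32-bit cycle counter to wrap at `hz` cycles per second. */
double seconds_until_wrap(double hz)
{
    return 4294967296.0 / hz;  /* 2^32 / frequency */
}
```

At an assumed 600 MHz core clock, `seconds_until_wrap(600e6)` is about 7.2 seconds, so a one-second tick leaves a wide margin. On real hardware, events counted between the read and the clear would be lost unless the driver accounts for them.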

Laurent

Siarhei Siamashka

Feb 18, 2009, 12:33:14 PM

to Laurent Desnogues, beagl...@googlegroups.com

This works fine if all we need are cycle-accurate timestamps (for use
with some kind of instrumentation at the beginning/end of the
interesting parts of code). But the core of oprofile functionality is
statistical sampling, which means that we actually want interrupts to
be generated, and lots of them. As the performance counter is more
likely to overflow in code that uses a lot of CPU cycles (or whatever
other event is being monitored), more interrupts will be triggered in
that code and recorded as oprofile samples. Statistically, more
samples will be collected for the addresses close to the performance
bottlenecks. That's the basic idea, a kind of Monte Carlo method from
mathematics.
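A toy simulation can make the sampling argument concrete. It assumes an idealized loop that spends `cost_a` cycles in function A and then `cost_b` cycles in function B per iteration, and a counter that raises an "interrupt" every `period` cycles; each sample is charged to whichever function is running at that moment. Names and numbers are illustrative only.

```c
#include <stdint.h>

/* Simulate overflow-driven sampling. `period` must be > 0. */
void simulate_sampling(uint32_t cost_a, uint32_t cost_b, uint32_t iterations,
                       uint32_t period, uint64_t *samples_a, uint64_t *samples_b)
{
    uint32_t remaining = period;  /* cycles until the next counter overflow */
    *samples_a = 0;
    *samples_b = 0;
    for (uint32_t it = 0; it < iterations; it++) {
        for (uint32_t c = 0; c < cost_a + cost_b; c++) {
            if (--remaining == 0) {  /* overflow: take a sample */
                if (c < cost_a)
                    (*samples_a)++;
                else
                    (*samples_b)++;
                remaining = period;  /* reload the counter */
            }
        }
    }
}
```

With `cost_a = 90`, `cost_b = 10`, 1000 iterations and `period = 7`, 100000 / 7 = 14285 samples are taken, and about 90% of them land in A, matching its share of the cycles.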

A usable workaround for the PMU-based oprofile driver should not skew
the statistics and should provide relevant results. That's what I'm
trying to achieve with the workaround patch. If it fails at this task
and still adds noticeable unwanted 'noise' to the results, it just has
to be scrapped and a simple timer-based driver used instead
(fortunately it can provide a sufficient sample collection frequency).
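Timer-based sampling does not depend on the PMU at all. The sketch below is only a user-space analogue (not oprofile's kernel timer driver): POSIX `setitimer(ITIMER_PROF, ...)` delivers `SIGPROF` every `usec` microseconds of consumed CPU time. The handler here just counts ticks, whereas a real profiler would record the interrupted program counter; `sample_busy_loop` and its parameters are hypothetical.

```c
#define _XOPEN_SOURCE 700  /* expose sigaction() and setitimer() */
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t ticks;  /* samples taken so far */

static void on_sigprof(int sig)
{
    (void)sig;
    ticks++;  /* a real profiler would record the interrupted PC here */
}

/* Spin for `spins` iterations while sampling every `usec` microseconds
 * of CPU time (0 < usec < 1000000). Returns the sample count, or -1. */
long sample_busy_loop(long spins, long usec)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigprof;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    struct itimerval on = {
        .it_interval = { .tv_sec = 0, .tv_usec = usec },
        .it_value    = { .tv_sec = 0, .tv_usec = usec },
    };
    ticks = 0;
    if (setitimer(ITIMER_PROF, &on, NULL) != 0)
        return -1;

    volatile unsigned long sink = 0;
    for (long i = 0; i < spins; i++)
        sink += (unsigned long)i;  /* burn CPU time */

    struct itimerval off = { { 0, 0 }, { 0, 0 } };
    setitimer(ITIMER_PROF, &off, NULL);  /* stop sampling */
    return ticks;
}
```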

If anybody could try some test profiling with and without workaround
applied and compare results for different test cases and
configurations, that would be very nice.

Best regards,
Siarhei Siamashka

Kai

Mar 20, 2013, 6:31:10 AM

to beagl...@googlegroups.com, Laurent Desnogues

I know this is an old thread about profiling on ARM. I tried the patch on kernel 2.6.32, but oprofile still doesn't work on the BeagleBoard-xM.
I also checked PMNC using:

...
__asm__ __volatile__ ( "mrc P15, 0, %0, c9, c13, 2" : "=r" (count));

It is still zero. Has anyone solved this issue?
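For comparison with the snippet above: in the ARMv7 CP15 encoding, PMNC (PMCR) is read with `mrc p15, 0, %0, c9, c12, 0`, whereas `c9, c13, 2` is the event count register of whichever counter is currently selected. Reading either from user space also requires PMU user access to be enabled; otherwise the instruction faults. The sketch below decodes a raw PMNC value following the ARMv7 PMCR field layout; the struct and function names are hypothetical. On ARM-designed cores the implementer field reads 0x41 ('A'), and the Cortex-A8 has four event counters.

```c
#include <stdint.h>

/* Selected fields of PMNC (ARMv7 PMCR). */
struct pmnc_fields {
    unsigned enabled;      /* bit 0, E: counters enabled */
    unsigned divider;      /* bit 3, D: cycle counter counts every 64th cycle */
    unsigned num_counters; /* bits [15:11], N: number of event counters */
    unsigned idcode;       /* bits [23:16] */
    unsigned implementer;  /* bits [31:24] */
};

struct pmnc_fields decode_pmnc(uint32_t v)
{
    struct pmnc_fields f;
    f.enabled      = v & 1u;
    f.divider      = (v >> 3) & 1u;
    f.num_counters = (v >> 11) & 0x1fu;
    f.idcode       = (v >> 16) & 0xffu;
    f.implementer  = (v >> 24) & 0xffu;
    return f;
}
```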
