The angstrom demos have oprofile installed by default.
regards,
Koen
I saw some emails about oprofile having counter issues on the armv7.
Does anyone have a good summary of the issue and how it impacts
oprofile on the beagle?
Philip
Without going into the details, the summary of the problem is the following.
Under certain conditions, the PMU of the Cortex-A8 core (at least the
r1pX revisions which are used in the beagleboard) gets messed up: its
interrupts get disabled and oprofile stops collecting samples. If you
are profiling just some number crunching application which does not
use system calls much, you are unlikely to encounter it. On the other
hand, for example repeatedly calling 'gettimeofday' function in a
tight loop triggers the problem almost instantly.
Best regards,
Siarhei Siamashka
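
A minimal sketch of such a test case, for anyone who wants to try
reproducing the problem while oprofile is running. The function name and
the iteration count are illustrative and not taken from the thread, which
only describes "a tight loop". It also assumes that gettimeofday really
enters the kernel on the target; on systems where the call is served from
user space, the loop would not exercise the same path.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/*
 * Call gettimeofday() back to back `iterations` times and return how
 * many of the calls succeeded. Name and signature are hypothetical.
 */
unsigned long hammer_gettimeofday(unsigned long iterations)
{
    struct timeval tv;
    unsigned long ok = 0;

    assert(iterations > 0);
    for (unsigned long i = 0; i < iterations; i++) {
        /* Nothing else happens in the loop, so almost all time is
         * spent entering and leaving the call. */
        if (gettimeofday(&tv, NULL) == 0)
            ok++;
    }
    return ok;
}
```

Calling hammer_gettimeofday() from main() with a few million iterations
while a profiling session is active, then checking whether samples are
still being collected afterwards, would be one way to use it.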
> for example repeatedly calling 'gettimeofday' function in a
> tight loop triggers the problem almost instantly.
Doesn't gettimeofday kinda kill performance anyway?
regards,
Koen
This is just a testcase to reproduce the problem. I don't feel much
relieved knowing that it takes a lot longer (dozens of seconds or even
several minutes instead of a fraction of a second) to break when
profiling some real applications.
Best regards,
Siarhei Siamashka
Ever done an strace on firefox?
--
Måns Rullgård
ma...@mansr.com
I have used oprofile quite a bit when working on FFmpeg. Occasionally
a run will generate very few or no samples. Restarting it seems to
get it back on track again. FFmpeg is, of course, mostly in the
"number crunching" domain, and doesn't do many system calls, certainly
not gettimeofday.
--
Måns Rullgård
ma...@mansr.com
Siarhei Siamashka wrote:
> Under certain conditions, the PMU of the Cortex-A8 core (at least the
> r1pX revisions which are used in the beagleboard) gets messed up: its
> interrupts get disabled and oprofile stops collecting samples. If you
> are profiling just some number crunching application which does not
> use system calls much, you are unlikely to encounter it. On the other
> hand, for example repeatedly calling 'gettimeofday' function in a
> tight loop triggers the problem almost instantly.
OMG, that makes using oprofile for a lot of applications quite useless
IMO. :(
a) Is there a way of detecting that the issue has occurred?
b) Is a workaround (e.g. kernel patch) possible to fix this issue reliably?
Regards
Robert
It is possible to check the PMU state periodically (the PMNC or CNTENS
registers, for example); if it unexpectedly changes to something
else (resets to zero), then the PMU is broken.
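
A rough sketch of that check, written as plain C so that the logic can
be tried outside the kernel. It is not the code from the patch. The
function name is hypothetical, and the register values are passed in as
arguments; in a real driver they would be read from privileged code
through CP15 (to my understanding c9, c12, 0 for PMNC and c9, c12, 1 for
CNTENS on ARMv7). Only the E (enable) bit of PMNC is compared, on the
assumption that other PMNC bits do not necessarily read back as written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PMNC_ENABLE (1u << 0)   /* E bit: global counter enable */

/*
 * Return true if the PMU appears to have lost its configuration.
 *   programmed_cntens: counter enable bits set when profiling started
 *   pmnc_now, cntens_now: values just read back from the registers
 */
bool pmu_state_lost(uint32_t programmed_cntens,
                    uint32_t pmnc_now, uint32_t cntens_now)
{
    /* The check only makes sense if some counter was enabled. */
    assert(programmed_cntens != 0);

    /* Global enable bit cleared behind our back. */
    if (!(pmnc_now & PMNC_ENABLE))
        return true;

    /* Every counter we enabled must still read back as enabled. */
    return (cntens_now & programmed_cntens) != programmed_cntens;
}
```

A periodic timer could call this with fresh register readings and, when
it returns true, reprogram the PMU before profiling continues.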
> b) Is a workaround (e.g. kernel patch) possible to fix this issue reliably?
Well, thanks a lot for asking. Really. I just wanted to reply that no
practical workaround is available but then realized that I had
overlooked something simple :)
A patch with a workaround is attached. It is probably missing proper
locking/synchronization, which would need to be added, but it should at
least work in practice and seems to have almost no impact on the
profiling statistics (samples caused by the activity of the 'watchdog'
timer which monitors the PMU state get filtered out and are not taken
into account).
Testing and feedback are very much welcome.
Best regards,
Siarhei Siamashka
I only took a quick look at your patch, so I can't say much about it.
When I first heard of that bug, I had another idea: why not collect
active counters on a regular basis, accumulate the results and
clear the counters. I don't know if that fits well with oprofile, but that
would prevent any counter from overflowing (and so would prevent
the bug from occurring) provided the timer interrupt happens often
enough (I guess one second is enough given the frequency of
Cortex-A8). Does that make sense?
Laurent
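
The collect/accumulate/clear idea above can be sketched as follows,
together with the arithmetic behind the "one second is enough" guess.
Both function names are hypothetical. The sketch assumes 32-bit hardware
counters and, for the example figure, a 600 MHz core clock; neither
number is stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Fold one 32-bit hardware reading into a 64-bit running total.
 * The caller is assumed to clear the hardware counter to zero
 * right after reading it.
 */
uint64_t accumulate(uint64_t total, uint32_t hw_reading)
{
    return total + hw_reading;
}

/*
 * Seconds until a 32-bit counter wraps around when it counts
 * `hz` events per second.
 */
double seconds_to_wrap(double hz)
{
    assert(hz > 0.0);
    return 4294967296.0 / hz;   /* 2^32 events */
}
```

Under those assumptions seconds_to_wrap(600e6) comes out at a little over
7 seconds, so a cycle counter collected once per second would be cleared
well before it could overflow.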
This works fine if all that we need is cycle-precise timestamps
(for use with some kind of instrumentation at the beginning/end of the
interesting parts of code). But the core of oprofile's functionality is
statistical sampling, which means that we actually want interrupts to
be generated, and lots of them. As the performance counter is more
likely to overflow in code which uses a lot of CPU cycles (or
whatever other event is being monitored), more interrupts will be
triggered in that code and recorded as oprofile samples.
Statistically, more samples will be collected for the addresses
which are close to the performance bottlenecks. That's the basic idea,
a kind of Monte Carlo method from mathematics.
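
To illustrate the sampling argument, here is a toy simulation (not
oprofile code; the function name and the whole model are made up for
this example). A program is modelled as a sequence of regions, each
burning a given number of cycles, and a counter raises an "interrupt"
every `period` cycles. The sample is charged to whichever region is
running at that moment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Run regions 0..n-1 in order, where region i takes cycles[i] cycles,
 * and return how many counter overflows (samples) land in `region`.
 */
uint64_t samples_in_region(const uint64_t *cycles, size_t n,
                           uint64_t period, size_t region)
{
    uint64_t until_overflow = period;  /* cycles left before next sample */
    uint64_t hits = 0;

    assert(period > 0 && region < n);
    for (size_t i = 0; i < n; i++) {
        uint64_t left = cycles[i];

        while (left >= until_overflow) {
            left -= until_overflow;
            until_overflow = period;   /* counter re-armed after interrupt */
            if (i == region)
                hits++;
        }
        until_overflow -= left;        /* carry the remainder over */
    }
    return hits;
}
```

With two regions of 900000 and 100000 cycles and a period of 10000, the
first region collects 90 samples and the second 10, matching their share
of the cycles.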
A usable workaround for the PMU-based oprofile driver should not skew
the statistics and should still provide relevant results. That's what
I'm trying to achieve with the workaround patch. If it fails at this
task and still adds some noticeable unwanted 'noise' to the results, it
just has to be scrapped and a simple timer-based driver should be used
instead (fortunately, it can provide a sufficient sample collection
frequency).
If anybody could try some test profiling with and without the workaround
applied and compare the results for different test cases and
configurations, that would be very nice.
Best regards,
Siarhei Siamashka