Thanks a bunch....
----------------------------------------------------------------
Michael Caffrey PHONE:(505) 667-2422
MS: D440 FAX:(505)665-4197
Los Alamos National Laboratory EMAIL: m...@lanl.gov
Los Alamos, NM 87545 GROUP: NIS-3
We use a Vmetro VME bus analyzer. By simply writing to a memory location a
value of say 0x0 at the beginning of the function and a value of 0x1 at the
end then the Vmetro displays the precise time required for the function. The
only overhead is the actual VME writes which could be determined also with
the Vmetro and removed from the overall time of the function. Of course, this
is a real pain if you have either a lot of code or a need for very fine
granularity over the entire program. This method has proven very useful for
our needs.
Don
The easiest way to profile vxWorks tasks is with the spy profiler.
There are other profilers, namely ScopeProfile from RTI which is
part of Stethoscope.
Timex and timexN are routines which can provide actual times
of execution for routines based on when they begin and end.
Provided with Windview is typically functions for doing
timestamp so you can install your own timestamp points if you wish.
I.e., see sysTimestamp, sysTimestampEnable, sysTimestampPeriod, etc.
These would be included in the timer's driver source file.
Wind View can be used to provide profile information but currently
you have to gather the times for a period of time for all the task
you want to profile and add them up, etc. WindView provides
(when a timestamp timer is installed) up to nanosecond or microsecond
resulotion for the times provided. Typically one would click the
left mouse button at a couple of points (horizontally) on the graph
and this would bound the area and print the time difference.
The act of profiling typically would not affect the act of execution
if the spy profiler task was a lessor priority than you application;
otherwise it may.
compiling with -g does not necessarily slow down the speed of execution,
however it does not call the compiler's optimization either.
I hope this gives you a clue on how you might do what you need.
Jon Hillman
Field Applications Engineer
Wind River Systems
Orlando, FL
(407) 273-4646
jhil...@wrs.com
>The easiest way to profile vxWorks tasks is with the spy profiler.
I agree, using the spyLib routines that come with VxWorks are the easiest
mechanism to use to measure performance. However, they only give you
information about CPU utilization at the task level. If you want to
see what percentage of the CPU is being used by each task then they can
be ok.
One thing to note, though, is that the spy routines use the "auxiliary"
clock on your CPU. This is not a problem unless your CPU doesn't have
a second clock or you are already using it. Also, spy uses a "sampling"
approach to measuring performance; it sees what task is running each
time a periodic interrupt occurs. In some applications, this approach
to performance measurement can yield inaccurate results.
>There are other profilers, namely ScopeProfile from RTI which is
>part of Stethoscope.
The ScopeProfile routines in Stethoscope are quite powerful for doing
profiling. The Stethoscope User Manual has a good description of the
issues involved with profiling; where it works well and what the
limitations are. Like any "profile" based performance measurement,
ScopeProfile works by keeping track of what routines are running at
periodic intervals. It gives information on what percentage of execution
time is being used by the various global functions in VxWorks and your
application. Results can be skewed if you have a lot of local functions
or if your application has the same periodicity as the profiling clock
(which, as with spyLib, is the "auxiliary" clock by default). Using
the profiler is the best way to get an idea of percentage CPU utilization
at the function level.
>Timex and timexN are routines which can provide actual times
>of execution for routines based on when they begin and end.
These routines run a function and measure how many clock ticks elapse.
You need to run enough iterations such that the elapsed time is a
reasonable number of clock ticks. The resulting time never accounts
for the overhead of the interrupts that occur while the timing is being
done. These routines can be useful for testing simple functions or
sequential pieces of an application. They are not typically much use
for measuring performance of a complete application.
>Wind View can be used to provide profile information but currently
>you have to gather the times for a period of time for all the task
>you want to profile and add them up, etc. WindView provides
>(when a timestamp timer is installed) up to nanosecond or microsecond
>resulotion for the times provided. Typically one would click the
>left mouse button at a couple of points (horizontally) on the graph
>and this would bound the area and print the time difference.
Using WindView provides very different information than using the profiling
tools. WindView shows a time line of critical events within a processor.
Wind River has instrumented the VxWorks kernel to automatically provide
information on kernel events. You can easily see the overhead due to
interrupts and the effect of task switching. You can add event logging
calls to your code to measure the duration of specific pieces of code
that you are interested in. If your processor has a high resolution
clock that the timestamp routines can use, then you can get very accurate
measurements of how long things take.
I have been using these specific VxWorks tools for a few years but have
used event logging and profiling tools in general for many years. My
experience is that profiling tools are good for optimizing application
code. Tools like WindView are of much more general use for detecting
why an application isn't working. I typically find event logging tools to
be the primary debug tool during the stage of a project where you are
integrating all the pieces together and want to see multitasking
effects or find out why things stop working.
>The act of profiling typically would not affect the act of execution
>if the spy profiler task was a lessor priority than you application;
>otherwise it may.
Using the spy or profiling routines adds very little execution overhead
to your application since they only collect information periodically.
Using WindView can add a bit more overhead since events are typically
logged at a fairly high rate and there is overhead for offloading the
collected event data to a host for display. For a "normal" application
something in the ballpark of 5% CPU utilization for event logging is about
what you could expect. If you log lots of events, this figure can go up.
It turns out that my company is getting ready to release a product that
makes WindView operate even better in a number of ways. To avoid too much
commercial content, I will simply say our product (called TraceMaker)
works in VME based systems and:
- takes care of offloading event data so overhead of logging is lower
- makes WindView "post-mortem" mode easy to use
- provides a microsecond clock that is used for all event timestamping; this
allows accurate event timing within and between processors
Anyway, we will be demoing the product at the VxWorks User Group meeting
later this month; stop by and take a look if you want. Now back to the
program.
>compiling with -g does not necessarily slow down the speed of execution,
>however it does not call the compiler's optimization either.
Using the -g option with the GNU tool chain (unlike with many compilers)
has no effect on code optimization. The -g option for debugging causes
symbol table information to be included in your program. This makes the
object code bigger and makes it take a little longer to load but doesn't
make it run any differently. You need -g so that tools like scopeProfile
can tell what part of your code is running when they sample execution.
You don't need -g for spy or WindView use.
>I hope this gives you a clue on how you might do what you need.
I provided more detail here on the VxWorks performance assessment tools
because I view them as an important resource in getting projects done
well. Fred
| Fred J Roeber, Bolt Beranek and Newman Inc |
| 50 Enterprise Place Middletown, RI 02842-5202 |
| fro...@bbn.com 401-849-2543 (X48) |
>> Submitted-by m...@globesat.lanl.gov Wed Nov 1 07:46:13 1995
>> Submitted-by: m...@globesat.lanl.gov (Michael Caffrey)
>>
>> I saw a message a while back asking about profiling vxworks
>> tasks, but no reply. I would like to ask that question again, what is
>> the easiest way to profile vxworks? I have briefly looked at windview
>> and it looks like it may be possible to use, but it isn't clear how to
>> interpret the time figures it produces, are they clock ticks or ms ?
>> Also, it isn't clear whether the act of profiling actually effects the
>> speed of execution; windview has a server task and the code must be
>> compiled with a -g, both of which must slow a task. Is this the
>> uncertainty principle in action? does it give a worst case only?
>> Do any of you grizzled old vxperts have an opinion?
Hi,
I've appended a short (marketing) description of our ScopeProfile product.
This is included with StethoScope.
-- Stan
=============================================================================
= = =
= Stan Schneider = email: st...@rti.com =
= Real-Time Innovations, Inc. = Phone: (408) 720-8312 =
= 954 Aster, Sunnyvale, CA 94086 = Fax: (408) 720-8419 =
= = =
=============================================================================
===============================================================================
Product name: ScopeProfile
Overview: Real-time dynamic execution profiler.
Category: Software, Development tools
Highlights: Detailed procedure-by-procedure analysis of CPU usage.
Tree or flat structure model.
Quickly spot performance bottlenecks.
Minimally intrusive. Run your code at full execution speed.
No special compilation. Analyze already running code.
ScopeProfile is a dynamic execution profiler for VxWorks. It shows you exactly
where you're spending your CPU cycles.
Profiling task 0x3b20e0: net4
Current
sum % direct %
10.43 .12 _TcpWrite
10.31 0.01 . _write
10.27 .18 . . _iosWrite
10.06 .16 . . . _send
9.35 .64 . . . . _sosend
7.92 .27 . . . . . _tcp_usrreq
7.25 .88 . . . . . . _tcp_output
4.97 .49 . . . . . . . _ip_output
3.87 0.06 . . . . . . . . _eiOutput
3.80 .51 . . . . . . . . . _ether_output
2.32 .31 . . . . . . . . . . _eiTxStartup
1.12 1.12 . . . . . . . . . . . _bcopy
.69 .69 . . . . . . . _cksum
.63 .63 . . . . . _bcopy
The example shows a network server's TCP write activity. The profile clearly
shows the overhead incurred at each layer: the I/O system, the socket layer,
TCP, IP, and the ethernet driver. Both cumulative (including all subroutines)
and direct CPU usage of each routine are captured. You can easily see how much
time was required to copy data (bcopy) and calculate checksums (cksum).
The output can be plotted in real-time with StethoScope. ScopeProfile also
keeps a task-level execution record, and can function as a graphical "spy".
ScopeProfile is both powerful and easy to use. ScopeProfile will help you tune
your time-critical system for maximum performance.
Great summary! Thanks, Fred.
A couple very minor corrections:
* ScopeProfile does not require the "-g" flag, but it does require that you
load the local symbols (ld "syms" flag = 1)
* local functions shouldn't affect the profile
* ScopeProfile can use any clock source (in fact, we ship (free to all!)
drivers for the other two clocks available on VIC chips, e.g. the mv167
board. (send requests to "in...@rti.com")
We've had many requests to explain the diff between StethoScope, ScopeProfile,
and WindView. Here's our canned answer; it seems appropriate here:
>> What is the difference between StethoScope and WindView?
>>
There is a superficial similarity---both graphically represent information
gathered from an executing VxWorks target---but there's very little real
functional overlap.
There are three "types" of visualization/analysis tools for VxWorks: a
source-level debugger (vxgdb), a real-time data monitor (StethoScope), and a
operating system visualization tool (WindView).
A debugger can be thought of as similar to a voltmeter. It measures static
values. It's nearly useless for finding peak values, occasional "glitches", or
time-dependent quantities like durations of events or noise characteristics.
StethoScope is like a digital oscilloscope; it displays all these with ease.
WindView is like a logic analyzer. It doesn't show analog values (program
variables), but it excels at displaying events, such as task switches,
semaphore activity, interrupts, etc.
In particular, StethoScope is a window into your *application*. You can view
any variable or memory location in your system. For example, you can see how
much overshoot your controller has, measure how long it took a door to close,
analyze noise in a sensor, or see how long a data queue is getting.
StethoScope will graphically display your data, collect it for analysis, and
save it to disk. StethoScope helps you understand what your program is doing.
WindView, on the other hand, allows you to visualize the relationships and
timing between operating system events. For example, you can measure how long
it took for an interrupt to execute, how much later that interrupt was
serviced, etc. It's also useful for watching and understanding task switching
activity: which tasks are running, why tasks are preempted, how long the task
switch took, etc. Issues like semaphore priority inversion, lock-outs, and
interrupt latency are easily revealed.
So, StethoScope shows you what your VxWorks application is doing, and WindView
shows you how VxWorks is running your application. There's a big difference.
There's almost nothing WindView can show that could be viewed with StethoScope
and vice-versa.
On another note, StethoScope also includes a dynamic execution profiler known
as ScopeProfile. ScopeProfile displays which routines in your system are using
the CPU, as a percentage of total CPU time. It's very useful for tuning your
application code for maximum performance. WindView is also useful for this
purpose, but it shows entirely different information. The tools are quite
complementary.
For example, if your problem is that your application is thrashing (spending
most of its time context switching), ScopeProfile will simply report that some
large percentage of the CPU is being spent in the kernel. It can show you
which kernel routines are active, but that may not be of much help. WindView
will show you which tasks are switching, and display the events (e.g.
interrupts) that are causing the switches.
On the other hand, if the problem is that you have one task with inefficient
code, WindView will only tell you that one task is running most of the time.
ScopeProfile will break down the individual routines within the task that are
burning the CPU in great detail. It will give you a direct map of what that
task is doing, what routines are being called, what routines they call, and
point out exactly where the inefficiencies are. ScopeProfile provides a
detailed function-by-function analysis.
So, both WindView and ScopeProfile will help tune the performance of your
code. However, they provide different perspectives: WindView offers precise
measurements of event timing, ScopeProfile provides a detailed analysis of
where your CPU time is being used.