Itanium Performance tools

pin...@gmail.com

Apr 26, 2007, 5:01:57 AM

I'm trying to get greater speeds from our VMS application running on
Itanium. I've had moderate success going from using the lock manager
(sys$enqw) to using spinning bitlocks, but feel there is more to be
gained in other areas too. However, PCA does not have system service
analysis implemented on Itanium and when I use PCA to analyse
processor time in functions, 90% is spent in "SYSTEM$SPACE", which
doesn't tell me a great deal.

So, does anyone have any recommendations/suggestions on application
performance analysis tools that work on Itanium, please? I'm using VMS
v8.3 (ia64) with PCA v4.9.

John Reagan

Apr 26, 2007, 9:23:11 AM

Since you didn't mention what you've already done, I'll mention the
top three things to look for on I64 to improve performance, for you
and others.

1. Alignment Faults
2. Alignment Faults
3. Alignment Faults

Look at MONITOR ALIGN. If it is not showing 0 (or close to 0), then you
can have PCA collect alignment fault data for your image (assuming it is
YOUR image that is causing the faults). If it is not your image, we
might have to look for other faulting processes.

If MONITOR ALIGN says you are clean, then you can do some PC sampling
inside of SDA (start at SDA> PCS for quick help).

--
John Reagan
OpenVMS Pascal/Macro-32/COBOL Project Leader
Hewlett-Packard Company

kenneth...@verizon.net

Apr 26, 2007, 9:27:33 AM

How about using SDA extensions ???

$ MCR SDA *
SDA> PRF LOAD

Others may be of interest -- IO, SPL, LCK, etc. depending on what
exactly you are looking for

Jur van der Burg

Apr 26, 2007, 9:38:27 AM

Don't forget the FLT alignment fault SDA extension (since V8.3 (or was it 8.2?)).

Jur.

kenneth...@verizon.net

Apr 26, 2007, 9:52:11 AM

On Apr 26, 5:01 am, "pin...@gmail.com" <pin...@gmail.com> wrote:

Sorry, may be a duplicate...

How about using SDA extensions?

$ MCR SDA *
SDA> PRF LOAD

Other extensions may be of interest -- IO, SPL, LCK, LNM, etc.,
depending on what you are tracking or what your application is using.

Ian Miller

Apr 26, 2007, 9:59:49 AM

You could look at system service logging
http://h71000.www7.hp.com/doc/82FINAL/6549/6549pro_043.html#sys_svc_logging
but the best win is usually fixing any unaligned data accesses you
have.
Do MONITOR ALIGN to see the system-wide rate, and if it goes up when
running your application then use the alignment fault utility
to trace them.
http://h71000.www7.hp.com/doc/82FINAL/6549/6549pro_030.html#sda_flt
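
For readers unsure what an "unaligned data access" looks like in a record layout, here is a minimal sketch using Python's standard `struct` module. It is not a VMS tool and says nothing about any particular compiler; it only shows how removing padding moves a 32-bit field off its natural boundary. The helper names `field_offset` and `is_aligned` are made up for illustration, and the result for the padded case assumes a platform where a 32-bit integer has 4-byte alignment.

```python
import struct

def field_offset(layout, preceding, field):
    """Offset of `field` when it directly follows `preceding` in a record.

    layout is a struct prefix: "@" pads to native alignment, "=" packs with no padding.
    """
    return struct.calcsize(layout + preceding + field) - struct.calcsize(layout + field)

def is_aligned(offset, size):
    """A field is naturally aligned when its offset is a multiple of its size."""
    return offset % size == 0

# One byte ("B") followed by a 32-bit unsigned integer ("I").
for name, layout in (("padded", "@"), ("packed", "=")):
    offset = field_offset(layout, "B", "I")
    print(name, offset, is_aligned(offset, struct.calcsize(layout + "I")))
```

In the packed layout the integer starts at offset 1, so every load or store of that field is an unaligned access of the kind the tools above are meant to find.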

Mark Daniel

Apr 26, 2007, 11:21:24 AM

HP rx2600 (900MHz/1.5MB) OpenVMS I64 V8.3

                            CUR        AVE        MIN        MAX

Kernel Fault Rate          1.33    1554.95       1.33    5007.02
Exec Fault Rate            0.00       0.00       0.00       0.00
Super Fault Rate           0.00       0.00       0.00       0.00
User Fault Rate            0.00    1116.18       0.00    3642.80

Total Fault Rate           1.33    2671.14       1.33    8553.53

I just ran a user mode (only) application (WASD) in several different
activities (main image only - no other process(es) involved), exercised
from another system using Apache Bench. Kernel mode faults were
consistently higher than user mode. Super and exec modes consistently
zero (even though RMS calls featured in at least some of the activity).

What can be inferred from such a snapshot?

Why not MONITOR ALIGN on Alpha where such issues have been emphasized
from the beginning?

TIA.

--
I wish one and all long and happy lives, no matter what may become of
them afterwards. Use sunscreen! Dont smoke cigarettes. Cigars, however,
are good for you ... Firearms are also good for you. Gunpowder has zero
fat and zero cholesterol. That goes for dumdums, too.
[Kurt Vonnegut; God Bless You, Dr Kevorkian]

John Reagan

Apr 26, 2007, 11:29:52 AM

Mark Daniel wrote:

> HP rx2600 (900MHz/1.5MB) OpenVMS I64 V8.3
>
>                             CUR        AVE        MIN        MAX
>
> Kernel Fault Rate          1.33    1554.95       1.33    5007.02
> Exec Fault Rate            0.00       0.00       0.00       0.00
> Super Fault Rate           0.00       0.00       0.00       0.00
> User Fault Rate            0.00    1116.18       0.00    3642.80
>
> Total Fault Rate           1.33    2671.14       1.33    8553.53
>
> I just ran a user mode (only) application (WASD) in several different
> activities (main image only - no other process(es) involved), exercised
> from another system using Apache Bench. Kernel mode faults were
> consistently higher than user mode. Super and exec modes consistently
> zero (even though RMS calls featured in at least some of the activity).
>
> What can be inferred from such a snapshot?
>
> Why not MONITOR ALIGN on Alpha where such issues have been emphasized
> from the beginning?
>
> TIA.

1) Some application is generating alignment faults.

2) Some piece of kernel code is also generating alignment faults. Could
be OpenVMS itself (I would call that a bug that we need to fix once we
identify the guilty party); could be some piece of your application if
you have kernel code.

3) On Alpha, alignment faults are fixed up quickly by the PAL code.
Since the PAL code understands the page table entries and has direct
access to the machine below OpenVMS, it can quickly fix up the alignment
fault and make sure that no other CPU is in the middle of deleting the
address space at the same time. On I64 however, all that happens is
that the chip interrupts OpenVMS. The OS has to go take some
spinlocks, etc. to prevent other CPUs from playing with the memory in
question while the alignment fault is fixed up. On Alpha, an alignment
fault might cost 100 instructions give or take. On I64, it could be
10,000 to 15,000 instructions give or take (my SWAG, not really measured
with any level of confidence).
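
To put the quoted MONITOR ALIGN figures next to these per-fault costs, here is a rough back-of-the-envelope sketch in Python. It assumes one instruction per clock cycle and a single 900 MHz CPU; only the clock speed comes from the quoted post, and the 10,000 to 15,000 instruction range is the unmeasured guess given above. The function name `alignment_fault_overhead` is made up for illustration.

```python
def alignment_fault_overhead(faults_per_sec, instr_per_fault, clock_hz, instr_per_cycle=1.0):
    """Estimated fraction of one CPU spent fixing up alignment faults."""
    cycles_per_sec = faults_per_sec * instr_per_fault / instr_per_cycle
    return cycles_per_sec / clock_hz

CLOCK_HZ = 900e6  # rx2600 at 900 MHz, from the quoted post

# Average and peak total fault rates from the quoted MONITOR ALIGN display.
for label, rate in (("average", 2671.14), ("peak", 8553.53)):
    low = alignment_fault_overhead(rate, 10_000, CLOCK_HZ)
    high = alignment_fault_overhead(rate, 15_000, CLOCK_HZ)
    print(f"{label}: {low:.1%} to {high:.1%} of one CPU")
```

Under those assumptions the average rate works out to roughly 3 to 4.5 percent of one CPU and the peak to roughly 9.5 to 14 percent. At 100 instructions per fault the same rates would cost about 0.1 percent or less.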

So you can do things like:

1) Use the SDA FLT extension to figure out which process/image is
faulting along with PC values. Using those, plus the .MAP and .LIS
files, you can go back to the source.

2) PCA will collect fault data and plot it for you.

3) The debugger lets you say SET BREAK/UNALIGNED so it will stop at
alignment faults.

4) You can call things like SYS$PERM_REPORT_ALIGN_FAULT which will
generate messages to SYS$OUTPUT for alignment faults. It is
process-wide and will survive image rundown. You have to call
SYS$PERM_DIS_ALIGN_FAULT_REPORT (don't get me started on the confusing
naming scheme) to turn them off.

Bob Gezelter

Apr 26, 2007, 1:32:43 PM

I concur with John and the others: alignment faults are a BIG penalty
on IA64 and should be prime candidates for removal.

- Bob Gezelter, http://www.rlgsc.com

Mark Daniel

Apr 26, 2007, 2:31:28 PM

John Reagan wrote:
> Mark Daniel wrote:
>
>> HP rx2600 (900MHz/1.5MB) OpenVMS I64 V8.3
>>
>>                             CUR        AVE        MIN        MAX
>>
>> Kernel Fault Rate          1.33    1554.95       1.33    5007.02
>> Exec Fault Rate            0.00       0.00       0.00       0.00
>> Super Fault Rate           0.00       0.00       0.00       0.00
>> User Fault Rate            0.00    1116.18       0.00    3642.80
>>
>> Total Fault Rate           1.33    2671.14       1.33    8553.53
>>
>> I just ran a user mode (only) application (WASD) in several different
>> activities (main image only - no other process(es) involved),
>> exercised from another system using Apache Bench. Kernel mode faults
>> were consistently higher than user mode. Super and exec modes
>> consistently zero (even though RMS calls featured in at least some of
>> the activity).
>>
>> What can be inferred from such a snapshot?
>>
>> Why not MONITOR ALIGN on Alpha where such issues have been emphasized
>> from the beginning?
>>
>> TIA.
>
>
> 1) Some application is generating alignment faults.
>
> 2) Some piece of kernel code is also generating alignment faults. Could
> be OpenVMS itself (I would call that a bug that we need to fix once we
> identify the guilty party); could be some piece of your application if
> you have kernel code.

No elevated modes, all user.

> 3) On Alpha, alignment faults are fixed up quickly by the PAL code.
> Since the PAL code understands the page table entries and has direct
> access to the machine below OpenVMS, it can quickly fix up the alignment
> fault and make sure that no other CPU is in the middle of deleting the
> address space at the same time. On I64 however, all that happens is
> that the chip interrupts OpenVMS. The OS has to go take some
> spinlocks, etc. to prevent other CPUs from playing with the memory in
> question while the alignment fault is fixed up. On Alpha, an alignment
> fault might cost 100 instructions give or take. On I64, it could be
> 10,000 to 15,000 instructions give or take (my SWAG, not really measured
> with any level of confidence).

Orders of magnitude at any rate.

> So you can do things like:
>
> 1) Use the SDA FLT extension to figure out which process/image is
> faulting along with PC values. Using those, plus the .MAP and .LIS
> files, you can go back to the source.

Application mainly doing network and file I/O, along with some internal
processing. Approx two minutes duration.

SDA> FLT LOAD
SDA> FLT START TRACE
[do some processing]
SDA> FLT STOP TRACE
SDA> FLT SHOW TRACE /SUMM

Edited results ...

Exception PC Count Exception PC
00000000.00147CB0 20520 SDA$SHARE+00147CB0 SDA$SHARE
00000000.00147990 19980 SDA$SHARE+00147990
00000000.001A61D0 9810 SDA$SHARE+001A61D0
00000000.001AB2F0 2978 SDA$SHARE+001AB2F0
00000000.001AB2F1 2978 SDA$SHARE+001AB2F1
00000000.001AB2A0 2161 SDA$SHARE+001AB2A0
00000000.001AB2C0 2161 SDA$SHARE+001AB2C0
00000000.001AB420 2117 SDA$SHARE+001AB420
FFFFFFFF.8049E180 1147 EXE_STD$CARRIAGE_C+00AC0 IO_ROUTINES
FFFFFFFF.8049E170 1146 EXE_STD$CARRIAGE_C+00AB0
00000000.0009F580 1140 SDA$SHARE+9F580 SDA$SHARE
00000000.001D6FB1 984 SDA$SHARE+001D6FB1
00000000.001D8410 940 SDA$SHARE+001D8410
00000000.0009F530 760 SDA$SHARE+9F530
00000000.0009F550 760 SDA$SHARE+9F550
00000000.0009F7A0 760 SDA$SHARE+9F7A0
FFFFFFFF.8045A460 700 IOC_STD$SIMREQCOM_C+00910 IO_ROUTINES
FFFFFFFF.8045A480 700 IOC_STD$SIMREQCOM_C+00930
FFFFFFFF.804B2D30 600 IO_ROUTINES+94B30
00000000.00142510 590 SDA$SHARE+00142510 SDA$SHARE
FFFFFFFF.804B2ED0 500 IO_ROUTINES+94CD0 IO_ROUTINES
00000000.002EE950 497 SDA$SHARE+002EE950 SDA$SHARE
FFFFFFFF.804B3D60 400 IO_ROUTINES+95B60 IO_ROUTINES
00000000.001D7081 376 SDA$SHARE+001D7081 SDA$SHARE
00000000.001D7210 376 SDA$SHARE+001D7210
00000000.002EF020 365 SDA$SHARE+002EF020
00000000.002EF020 365 SDA$SHARE+002EF020
8< snip 32 similar entries 8<
FFFFFFFF.80141770 6 EXE$PRIMITIVE_FORK_C+00070 SYSTEM_PRIMITIVES_MIN
FFFFFFFF.80141771 6 EXE$PRIMITIVE_FORK_C+00071
FFFFFFFF.80142A40 6 EXE_STD$IOFORK_CPU_C+00B60
FFFFFFFF.80142A50 6 EXE_STD$IOFORK_CPU_C+00B70
FFFFFFFF.805B2D51 2 PROCESS_MANAGEMENT+A1E51 PROCESS_MANAGEMENT
FFFFFFFF.805B2DB1 2 PROCESS_MANAGEMENT+A1EB1
FFFFFFFF.805B4C90 2 PROCESS_MANAGEMENT+A3D90
FFFFFFFF.805B6B20 2 PROCESS_MANAGEMENT+A5C20
FFFFFFFF.805B19F0 1 PROCESS_MANAGEMENT+A0AF0
FFFFFFFF.805B1A70 1 PROCESS_MANAGEMENT+A0B70
8< snip 16 similar entries 8<

Can this be understood in general terms (without needing to be an
internals specialist)?
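
One way to get a general picture without internals knowledge is to total the counts per symbol and per address range. The Python sketch below does that for five rows copied from the summary above. It treats any PC whose upper half is FFFFFFFF as system space and everything else as process space, which is a working assumption for this sketch, and the function name `summarize` is made up. Symbol names for process-space PCs are only as reliable as the context SDA symbolized them in.

```python
from collections import Counter

# Five rows copied from the trace summary: PC, fault count, symbolized PC.
SAMPLE = """\
00000000.00147CB0 20520 SDA$SHARE+00147CB0
00000000.00147990 19980 SDA$SHARE+00147990
FFFFFFFF.8049E180 1147 EXE_STD$CARRIAGE_C+00AC0
FFFFFFFF.8049E170 1146 EXE_STD$CARRIAGE_C+00AB0
FFFFFFFF.8045A460 700 IOC_STD$SIMREQCOM_C+00910
"""

def summarize(text):
    """Return (faults per symbol, faults per address space) for trace summary rows."""
    by_symbol, by_space = Counter(), Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[1].isdigit():
            continue  # skip blank lines, headers, and wrapped lines
        pc, count, symbol = fields[0], int(fields[1]), fields[2]
        by_symbol[symbol.split("+")[0]] += count
        space = "system" if pc.upper().startswith("FFFFFFFF.") else "process"
        by_space[space] += count
    return by_symbol, by_space

by_symbol, by_space = summarize(SAMPLE)
for name, total in by_symbol.most_common():
    print(f"{total:8d}  {name}")
print(dict(by_space))
```

Sorting by total puts the heaviest faulting locations first, which is usually where the investigation should start.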

> 2) PCA will collect fault data and plot it for you.
>
> 3) The debugger lets you say SET BREAK/UNALIGNED so it will stop at
> alignment faults.
>
> 4) You can call things like SYS$PERM_REPORT_ALIGN_FAULT which will
> generate messages to SYS$OUTPUT for alignment faults. It is
> process-wide and will survive image rundown. You have to call
> SYS$PERM_DIS_ALIGN_FAULT_REPORT (don't get me started on the confusing
> naming scheme) to turn them off.

Thanks for the useful explanations.

--
So it goes.
[Kurt Vonnegut; Slaughterhouse-Five]

Craig A. Berry

Apr 27, 2007, 12:08:47 AM

In article <f0q963$eaa$1...@usenet01.boi.hp.com>,
John Reagan <john....@hp.com> wrote:

> pin...@gmail.com wrote:
> > I'm trying to get greater speeds from our VMS application running on
> > Itanium. I've had moderate success going from using the lock manager
> > (sys$enqw) to using spinning bitlocks, but feel there is more to be
> > gained in other areas too. However, PCA does not have system service
> > analysis implemented on Itanium and when I use PCA to analyse
> > processor time in functions, 90% is spent in "SYSTEM$SPACE", which
> > doesn't tell me a great deal.
> >
> > So, does anyone have any recommendations/suggestions on application
> > performance analysis tools that work on Itanium, please? I'm using VMS
> > v8.3 (ia64) with PCA v4.9.
> >
>
> Since you didn't mention what you've already done, I'll mention the
> top three things to look for on I64 to improve performance, for you
> and others.
>
> 1. Alignment Faults
> 2. Alignment Faults
> 3. Alignment Faults

And I think #4 is exception handling, no? If you do a lot of setjmp()
/ longjmp(), it can make a difference to compile C code with
/DEFINE=__FAST_SETJMP. However, I don't know how common it is to be
limited by that. I built Perl this way, thinking it might make a
difference because every Perl opcode is a setjmp() / longjmp()
sequence, but it only made a difference of a few seconds in an
hour-long test (in other words, in the noise). In a raw (but
artificial) test consisting of nothing but setjmp() calls, however, it
was about 10,000 times faster to use __FAST_SETJMP. It saved
something like 50,000 nanoseconds per call, but 50,000 nanoseconds
still isn't very long. This on td183.testdrive.hp.com.
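
The arithmetic behind "in the noise" can be sketched in a few lines of Python. The 50,000 nanosecond saving and the hour-long run come from the figures above; the 3 seconds is an assumed stand-in for "a few seconds", and `calls_needed` is a made-up helper name.

```python
NS_PER_SECOND = 1_000_000_000

def calls_needed(seconds_saved, ns_saved_per_call):
    """Number of calls needed for the per-call saving to add up to seconds_saved."""
    return seconds_saved * NS_PER_SECOND // ns_saved_per_call

# 3 s is an assumed stand-in for "a few seconds"; 50,000 ns is the quoted saving.
calls = calls_needed(3, 50_000)
# Fraction of an hour-long (3600 s) run that those 3 s represent.
share = 3 / 3600
print(calls, f"{share:.3%}")
```

With these inputs, a saving of a few seconds corresponds to tens of thousands of calls and well under a tenth of a percent of the run.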


Jur van der Burg

Apr 27, 2007, 2:47:12 AM

Beware that alignment faults showing up as kernel mode faults may very well be
caused by user mode code passing unaligned parameters to system services.

Jur.

Bob Koehler

Apr 27, 2007, 9:00:07 AM

In article <1331dhi...@corp.supernews.com>, Mark Daniel <mark....@vsm.com.au> writes:
>
> What can be inferred from such a snapshot?

If you have sufficient RAM, then I'd raise the system working set
a little to see if the kernel faults go away. On highly restricted
RAM systems I used to lower the system working set until those faults
just barely started to happen.

There is no such thing as a user-mode only program. You can't load
a program from the disk without using the kernel to catch the page
faults and do the disk I/O. (There's no separate program loader in
VMS; the OS just maps the pages and calls the entry point, and the process
will probably experience a hard page fault during the call instruction.)

Bob Koehler

Apr 27, 2007, 9:05:39 AM

In article <1331dhi...@corp.supernews.com>, Mark Daniel <mark....@vsm.com.au> writes:
>

> HP rx2600 (900MHz/1.5MB) OpenVMS I64 V8.3
>
>                             CUR        AVE        MIN        MAX
>
> Kernel Fault Rate          1.33    1554.95       1.33    5007.02
> Exec Fault Rate            0.00       0.00       0.00       0.00
> Super Fault Rate           0.00       0.00       0.00       0.00
> User Fault Rate            0.00    1116.18       0.00    3642.80
>
> Total Fault Rate           1.33    2671.14       1.33    8553.53
>
> I just ran a user mode (only) application (WASD) in several different
> activities (main image only - no other process(es) involved), exercised
> from another system using Apache Bench.

There's no such thing as a user mode only application. You can't
even load an application into RAM without using kernel services.
And I would suspect WASD does a good bit of I/O, which requires
the kernel at some point.

Hein RMS van den Heuvel

Apr 27, 2007, 10:50:07 AM

On Apr 27, 9:00 am, koeh...@eisner.nospam.encompasserve.org (Bob
Koehler) wrote:

> In article <1331dhi6tv1e...@corp.supernews.com>, Mark Daniel <mark.dan...@vsm.com.au> writes:
>
> > What can be inferred from such a snapshot?
>
> If you have sufficient RAM, then I'd raise the system working set
> a little to see if the kernel faults go away.

BZZZZ....

Mark cut the MONI ALIGN screen a little short.
The header would have read: "ALIGNMENT FAULT STATISTICS"
Those faults are NOT page faults, but alignment faults... and a lot!
Increasing memory will have zero effect on this as you now surely
realize.

Cheers,
Hein.

Bob Koehler

Apr 27, 2007, 1:48:58 PM

In article <1177685406.8...@t38g2000prd.googlegroups.com>, Hein RMS van den Heuvel <heinvand...@gmail.com> writes:
> On Apr 27, 9:00 am, koeh...@eisner.nospam.encompasserve.org (Bob
> Koehler) wrote:
>> In article <1331dhi6tv1e...@corp.supernews.com>, Mark Daniel <mark.dan...@vsm.com.au> writes:
>>
>> > What can be inferred from such a snapshot?
>>
>> If you have sufficient RAM, then I'd raise the system working set
>> a little to see if the kernel faults go away.
>
> BZZZZ....
>

Yeah, I realized that right after I posted and canceled the post,
but I suppose it got to some servers anyhow.

John Reagan

Apr 27, 2007, 2:14:44 PM

Mark Daniel wrote:

>
> Application mainly doing network and file I/O, along with some internal
> processing. Approx two minutes duration.
>
> SDA> FLT LOAD
> SDA> FLT START TRACE
> [do some processing]
> SDA> FLT STOP TRACE
> SDA> FLT SHOW TRACE /SUMM
>
> Edited results ...
>
> Exception PC Count Exception PC
> 00000000.00147CB0 20520 SDA$SHARE+00147CB0 SDA$SHARE
> 00000000.00147990 19980 SDA$SHARE+00147990
> 00000000.001A61D0 9810 SDA$SHARE+001A61D0
> 00000000.001AB2F0 2978 SDA$SHARE+001AB2F0
> 00000000.001AB2F1 2978 SDA$SHARE+001AB2F1

The exception PC symbolization comes out as SDA$SHARE+xxxxxxx since
you are in SDA.

Look at the SHOW TRACE without the /SUMMARY. Find one of the lines with
147CB0 and find the process index. Now do a SET PROC/INDEX with that
number and repeat the SHOW TRACE/SUMMARY. The symbolization will be
more informative.

To figure out the image that the process is running, you can do things like:

SHOW PROC/CHANNEL
SHOW PROC/IMAGE

Once you have the image name that the process is running, the
symbolization, plus the .MAP and the .LIS, should eventually point you
to the right place.
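
The last step, going from an offset to a routine, is just a lookup of the nearest routine start at or below the offset. Here is a minimal Python sketch of that lookup. The routine names and start offsets are entirely hypothetical, standing in for values one would copy by hand from a linker map, and `routine_for` is a made-up helper; nothing here reads a real .MAP file.

```python
import bisect

# Hypothetical (start_offset, routine) pairs, sorted by start offset,
# as one might copy them from a linker .MAP file.
ROUTINES = [
    (0x00000, "MAIN"),
    (0x01200, "PARSE_REQUEST"),
    (0x03A80, "BUILD_HEADER"),
    (0x07000, "SEND_RESPONSE"),
]
STARTS = [start for start, _ in ROUTINES]

def routine_for(offset):
    """Return (routine name, offset within routine) for an image-relative offset."""
    index = bisect.bisect_right(STARTS, offset) - 1
    if index < 0:
        raise ValueError("offset lies before the first routine")
    start, name = ROUTINES[index]
    return name, offset - start

name, delta = routine_for(0x03B10)
print(f"{name}+{delta:X}")
```

The offset within the routine is then what you would look for in the machine code section of the compiler listing.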

John Reagan

Apr 27, 2007, 2:24:01 PM

Jur van der Burg wrote:
> Beware that alignment faults showing up as kernel mode faults may very
> well be caused by user mode code passing unaligned parameters to system
> services.
>
> Jur.

Ah yes. Forgot about that.