Floating point performance of UNIX workstations.

Dr. David Kirkby

Jun 7, 2003, 12:07:10 AM
I have a number of oldish computers on which I check the portability of
software.
I've collected some performance data that might be of interest to
some.
Some of it, especially with regard to the dismal performance of a Dec
Alpha, is very odd.

Here's the data - notes about it follow afterwards.

The column marked: t_seq is the time in seconds to
run on one processor.

The column marked: t_par is the time in seconds to
run on multiple processors.


HARDWARE and EXECUTION TIMES:
-----------------------------------------

Hostname       Machine               CPU(s)      t_seq  t_par  speedup  efficiency
-----------------------------------------------------------------------------------
blackbird      Sun SPARCstation 20   75 MHz       1601  N/A    N/A      N/A
bluetit        Sun SPARCstation 20   2x125 MHz    1141  682    1.67     0.837
cheetah        PC, Solaris 8 (x86)   2x450 MHz     410  255    1.61     0.804
crow           Sun SPARCstation 20   75 MHz       3457  3522   0.98     0.982
dobermann[cc]  Dec Alpha PW 600a     600 MHz       950  N/A    N/A      N/A
dove           Sun SPARCstation 20   75 MHz       1647  N/A    N/A      N/A
robin          HP C3000              400 MHz       157  N/A    N/A      N/A
sparrow[cc]    Sun Ultra 80          4x450 MHz     272  83     3.28     0.819
sparrow[gcc]   Sun Ultra 80          4x450 MHz     261  86     3.03     0.759
tiger          Pentium II PC         350 MHz       333  N/A    N/A      N/A
woodpecker     Sun SPARCstation 20   2x125 MHz    1035  FAIL   FAIL     FAIL


CACHE + RAM SIZES - Cache data might not be 100% accurate.
Hostname    Machine               RAM      Cache     Notes
---------------------------------------------------------------------------------------------
blackbird   Sun SPARCstation 20    256 Mb  1.000 Mb
bluetit     Sun SPARCstation 20    320 Mb  0.250 Mb
cheetah     PC, Solaris 8 (x86)    768 Mb  0.544 Mb  16 kb L1 data + 16 kb L1 inst. + 512 kb L2
crow        Sun SPARCstation 20    320 Mb  1.000 Mb
dobermann   Dec Alpha PW 600a     1024 Mb  4.000 Mb
dove        Sun SPARCstation 20    288 Mb  1.000 Mb
robin       HP C3000              1536 Mb  1.500 Mb  1 Mb data, 0.5 Mb instruction
sparrow     Sun Ultra 80          4096 Mb  4.000 Mb
tiger       Pentium II PC          128 Mb  0.544 Mb  16 kb L1 data + 16 kb L1 inst. + 512 kb L2
woodpecker  Sun SPARCstation 20    224 Mb  0.250 Mb


Hostname      OS version     Kernel          gsl  Compiler      MPICH
----------------------------------------------------------------------
blackbird     NetBSD 1.6                     1.1  gcc-2.93.3    NI
bluetit       Solaris 2.5                    0.5  gcc-2.93.3    NI
cheetah       Solaris 8 x86                  ---  gcc-3.2.3     NI
crow          OpenBSD 3.2                    ---  gcc-2.93.3    NI
dobermann     Tru64 5.1B                     1.4  V6.5-011      NI
dove          Debian ???     2.2.20          1.2  gcc-2.96      NI
robin         HP-UX 11                       ---  gcc-3.2.2     NI
sparrow[gcc]  Solaris 9      112233-03       1.3  gcc-3.2.2     1.2.5
sparrow[cc]   Solaris 9      112233-03       1.3  cc-5.3        1.2.5
tiger         Redhat 7.2     2.4.10-5        0.8  gcc-2.96      1.2.5
woodpecker    Linux 6.2      2.2.14-5.0smp   1.0  egcs-2.91.66  NI

The makefile for the program

http://atlc.sourceforge.net/ (version 4.3.1 to be precise)

runs about 90 checks, the last of which is a longish floating
point intensive computation. I did not design this as a
benchmark, but given it's what I want the software to do, for
me at least, it's a useful benchmark. Others might like to look
at these results. In particular I'd like any comments on why a
Dec Alpha seems to fare so badly.

The machine with the slowest clock speed is a Sun SPARCstation
20 with a 75 MHz CPU; the one with the fastest is a Dec Alpha
Personal Workstation 600a, running at 600 MHz. There are a couple
of machines with multiple processors, including one with 4x450
MHz UltraSPARC II CPUs in a Sun Ultra 80.

One of the 75 MHz Suns (crow running OpenBSD) seems very
slow, but I think that must be the operating system, as
almost identical Sun hardware (blackbird and dove) is
twice as fast.

What I can't understand is quite why the Dec Alpha seems to
fare so very badly when I run a floating point intensive simulation
on these systems. Whilst not the poorest performer,
the Dec's performance is pretty poor considering its clock
speed. See the data above, which I collected.

A simulation taking 950 s on the Dec Alpha running at 600 MHz
is taking a mere 157 s on a 400 MHz HP C3000, or 260 s on a
single 450 MHz UltraSPARC II chip. A 75 MHz Sun SPARCstation 20
is taking less than twice the time of the Dec Alpha, yet is
running at one eighth of the clock speed. (Some of the data
above is for parallel code, which clearly runs quicker on the quad
processor Sun Ultra 80.)
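
As a rough cross-check of the comparison above, each run time can be
multiplied by the clock speed to give an approximate number of clock
cycles per run. This is only a sketch: it uses the t_seq figures from the
table, ignores every difference in architecture, cache and compiler, and
the names in it are made up for illustration.

```python
# Rough per-clock comparison using t_seq (seconds) and clock speed (MHz)
# from the table above. seconds * MHz = millions of clock cycles per run.
# The dictionary and function names are illustrative only.

RUNS = {
    "blackbird (SPARCstation 20, 75 MHz)": (1601, 75),
    "dobermann (Dec Alpha, 600 MHz)": (950, 600),
    "robin (HP C3000, 400 MHz)": (157, 400),
    "sparrow[gcc] (Ultra 80, one 450 MHz CPU)": (261, 450),
}


def megacycles(t_seq_s, clock_mhz):
    """Approximate clock cycles used by one run, in millions."""
    return t_seq_s * clock_mhz


def report(runs):
    """Return (name, megacycles) pairs, fewest cycles first."""
    rows = [(name, megacycles(t, mhz)) for name, (t, mhz) in runs.items()]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for name, mc in report(RUNS):
        print(f"{name:42s} {mc:8d} Mcycles")
```

On these figures the Alpha spends roughly nine times as many clock
cycles on the run as the HP C3000, which is what makes its result look
so odd.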

I don't expect a 600 MHz Dec Alpha to outperform a quad 450
MHz Sun, but looking at this data, one can't help but notice
how poorly the Dec Alpha performs. The code for the Dec Alpha was
compiled using Compaq's compiler, not gcc.

The memory used in the simulation is much greater than the
cache size of the largest processor (4 Mb on UltraSPARC II
chip). I suspect the cache hit rate is quite low, since memory
is not accessed in the best possible manner - something I
intend to correct later.

See the notes after the data to understand the data better. If
anyone wants to run this, download the code from

http://atlc.sourceforge.net/

Use version 4.3.1 (not the latest, as I intend changing
the algorithm later, so data won't be consistent unless
the same version of the software is used).


$ ./configure --with-threads
$ make check

Don't worry about the libraries that it asks for, since
these are not needed to run this test.


Notes:
1) Ignore the columns marked gsl and MPICH.

2) t_seq is the time in seconds to execute, using a
sequential algorithm which does *not* exploit multiple CPUs.

3) t_par is the time to execute, using an identical
algorithm to above, but written to exploit multiple CPUs,
using POSIX threads.

4) 'speedup' is defined as the ratio t_seq/t_par.

5) 'efficiency' is speedup/number_of_CPUs present. Ideally
this would be 1.0.

6) Tests on 'woodpecker', a Sun SPARC 20 running Redhat
Linux, failed when built for parallel operation, so
results are not shown, despite the fact this machine
had dual processors and the kernel was supposed to support
multiple CPUs. I think the thread support in Redhat 6.2
for SPARC is suspect.

7) Data for both gcc and cc are given on the Sun Ultra 80 (sparrow).

8) I suspect the code does not make good use of the
cache, and will in future be changed to make better
use of the cache. However, quite why the 600 MHz
Dec Alpha, running Compaq's compiler performs so badly
is a mystery.

9) This is essentially floating point intensive. IO
should have little effect, although all machines
have 10,000 rpm SCSI disks, of various vintages. There is
some integer arithmetic, but it is small compared to the
floating point.


10) Machines had differing memory from 128 Mb to
4 Gb, but none should be swapping.
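
The speedup and efficiency columns can be recomputed from t_seq and
t_par. The sketch below assumes speedup means t_seq divided by t_par,
which is what the tabulated values correspond to; the function and
variable names are illustrative and are not taken from atlc.

```python
# Recompute speedup and efficiency from the t_seq / t_par columns.
# Assumption: speedup = t_seq / t_par (this matches the table values).


def speedup(t_seq, t_par):
    """How many times faster the parallel run is than the sequential one."""
    return t_seq / t_par


def efficiency(t_seq, t_par, n_cpus):
    """Speedup per CPU; 1.0 would be ideal scaling."""
    return speedup(t_seq, t_par) / n_cpus


# (t_seq in s, t_par in s, number of CPUs), copied from the table.
MACHINES = {
    "bluetit": (1141, 682, 2),
    "cheetah": (410, 255, 2),
    "sparrow[cc]": (272, 83, 4),
    "sparrow[gcc]": (261, 86, 4),
}

if __name__ == "__main__":
    for host, (t_seq, t_par, n) in MACHINES.items():
        s = speedup(t_seq, t_par)
        e = efficiency(t_seq, t_par, n)
        print(f"{host:13s} speedup {s:.2f}  efficiency {e:.3f}")
```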


--
Dr. David Kirkby,
Senior Research Fellow,
Department of Medical Physics,
University College London,
11-20 Capper St, London, WC1E 6JA.
Tel: 020 7679 6408 Fax: 020 7679 6269
Internal telephone: ext 46408
e-mail da...@medphys.ucl.ac.uk

Dr. David Kirkby

Jun 7, 2003, 12:17:12 AM
I'm sending this again, as the table got wrapped, making it even
more difficult to read.

GreyCloud

Jun 7, 2003, 2:45:36 PM
"Dr. David Kirkby" wrote:
>
> I have a number of oldish computers on which I check portability of
> software.
> I've collected some performance data that might be of interest to
> some.
> Some of it, especially with regard to the dismal performance of a Dec
> Alpha, is very odd.
>

You might post this over in comp.os.vms and see what they
have to say.

Bradford J. Hamilton

Jun 7, 2003, 3:28:18 PM

If you look closely at the tables, you'll notice that he's running TRU64;
hence, his post to comp.unix.tru64. Posting to c.o.v. would be OT,
technically.

_________________________________________________________________
Bradford J. Hamilton "All opinions are my own"
bMradAha...@atMtAbi.cPoSm "Lose the MAPS"

Dr. David Kirkby

Jun 7, 2003, 9:40:55 PM
"Bradford J. Hamilton" wrote:

> >You might post this over in comp.os.vms and see what they
> >have to say.
>
> If you look closely at the tables, you'll notice that he's running TRU64;
> hence, his post to comp.unix.tru64. Posting to c.o.v. would be OT,
> technically.

I don't have OpenVMS, so can't comment on how well the 600 MHz Dec
Alpha would perform with OpenVMS. But the 400 MHz HP C3000 running
HP-UX is 6x faster, while running at only two thirds (66.6%) the clock
speed of the Dec Alpha. That seems to be one hell of a difference!

The HP is faster than the Sun Ultra 80 too, unless the code uses
multiple CPUs on the U80. I must say I can't help but agree with the
comments at
http://www.unixnerd.demon.co.uk/hp_unix.html
about the HP's speed, but the Dec just seems to be on a go-slow. I'm
wondering if mine has a fault.

I'd be interested if anyone else with a Dec Alpha Personal Workstation
600a or 600au would like to run the same test, by going to
http://atlc.sourceforge.net/
clicking on 'Downloads' and downloading atlc-4.3.1. Then run

$ ./configure --with-threads
$ make check

and let me know their results.

bob smith

Jun 7, 2003, 10:05:31 PM

Dr. David Kirkby wrote:
> "Bradford J. Hamilton" wrote:
>
>
>>>You might post this over in comp.os.vms and see what they
>>>have to say.
>>
>>If you look closely at the tables, you'll notice that he's running TRU64;
>>hence, his post to comp.unix.tru64. Posting to c.o.v. would be OT,
>>technically.
>
>
> I don't have OpenVMS, so can't comment on how well the 600 MHz Dec
> Alpha would perform with OpenVMS. But the 400 MHz HP C3000 running
> HP-UX is 6x faster, while running at only two thirds (66.6%) the clock
> speed of the Dec Alpha. That seems to be one hell of a difference!
>
> The HP is faster than the Sun Ultra 80 too, unless the code uses
> multiple CPUs on the U80. I must say I can't help agree with the
> comments at
> http://www.unixnerd.demon.co.uk/hp_unix.html
> about the HP's speed, but the Dec just seem to be on a go-slow. I'm
> wondering if mine has a fault.

I have setiathome running under netbsd on an alpha 600. I had it
running on an AMD Tbird @1.2Ghz, with NetBSD, same version of the OS,
the same memory size, 1GB, and the alpha seems to be a little better
than 3x the performance, <8 hours per result on the alpha, and more than
22 hours per result on the AMD. This is with both machines doing
nothing else, the seti task as highest priority.

Dr. David Kirkby

Jun 7, 2003, 10:50:24 PM
bob smith wrote:
>
> Dr. David Kirkby wrote:
> > "Bradford J. Hamilton" wrote:
> >
> >
> >>>You might post this over in comp.os.vms and see what they
> >>>have to say.
> >>
> >>If you look closely at the tables, you'll notice that he's running TRU64;
> >>hence, his post to comp.unix.tru64. Posting to c.o.v. would be OT,
> >>technically.
> >
> >
> > I don't have OpenVMS, so can't comment on how well the 600 MHz Dec
> > Alpha would perform with OpenVMS. But the 400 MHz HP C3000 running
> > HP-UX is 6x faster, while running at only two thirds (66.6%) the clock
> > speed of the Dec Alpha. That seems to be one hell of a difference!
> >
> > The HP is faster than the Sun Ultra 80 too, unless the code uses
> > multiple CPUs on the U80. I must say I can't help agree with the
> > comments at
> > http://www.unixnerd.demon.co.uk/hp_unix.html
> > about the HP's speed, but the Dec just seem to be on a go-slow. I'm
> > wondering if mine has a fault.
> I have setiathome running under netbsd on an alpha 600. I had it
> running on an AMD Tbird @1.2Ghz, with NetBSD, same version of the OS,
> the same memory size, 1GB, and the alpha seems to be a little better
> than 3x the performance, <8 hours per result on the alpha, and more than
> 22 hours per result on the AMD. This is with both machines doing
> nothing else, the seti task as highest priority.

Certainly from what I have read, the 600a would seem to be no
slow-coach. I'll try it with setiathome and see how it works on that.

If you have a spare half-hour, I'd like to know how well your machine
does on the same code I've been running it on. Unlike the setiathome,
where the time per work unit can be very variable, this code should
take the same time on each run.

Bradford J. Hamilton

Jun 7, 2003, 10:54:47 PM
In article <3EE29427...@ntlworld.com>, "Dr. David Kirkby" <drki...@ntlworld.com> writes:
>"Bradford J. Hamilton" wrote:
>
>> >You might post this over in comp.os.vms and see what they
>> >have to say.
>>
>> If you look closely at the tables, you'll notice that he's running TRU64;
>> hence, his post to comp.unix.tru64. Posting to c.o.v. would be OT,
>> technically.
>
>I don't have OpenVMS, so can't comment on how well the 600 MHz Dec
>Alpha would perform with OpenVMS. But the 400 MHz HP C3000 running
>HP-UX is 6x faster, while running at only two thirds (66.6%) the clock
>speed of the Dec Alpha. That seems to be one hell of a difference!

The other thing I noticed was the fact that you used gcc for a compiler on
almost all the other machines, except the 600a - would it be helpful for
you (or someone else) to compile with gcc on the 600a, and run the test
again?

If you wish to use Google, search the comp.os.vms archives for posts by David
Mathog, regarding his extensive testing of VMS C programs, vs. the same
programs run on TRU64. These posts were dated about two years or so ago, and
had some quite interesting results, as well as benchmark programs run by him.
I think one or two folks were able to find optimizations on the "native"
compiler which made the performance better on VMS, but not as good as on TRU64.

<snip>



_________________________________________________________________

Benjamin Gawert

Jun 8, 2003, 7:00:48 AM
bob smith wrote:

> I have setiathome running under netbsd on an alpha 600. I had it
> running on an AMD Tbird @1.2Ghz, with NetBSD, same version of the OS,
> the same memory size, 1GB, and the alpha seems to be a little better
> than 3x the performance, <8 hours per result on the alpha, and more
> than 22 hours per result on the AMD. This is with both machines doing
> nothing else, the seti task as highest priority.

Then something must be seriously wrong with your AMD. 22 hrs is about what I
need on an SGI Octane with R10k 250MHz; my slow HP Pavilion with Duron
1.3GHz takes around 5 hrs per WU (WindowsXP with Command Line Client)...

BTW: There's a nice list of different systems running SETI:

http://cox-internet.com/setispy/tlctop200.htm

No. 16 was from me btw ;-)

The WU used for the benchmark can be downloaded here:

http://www.teamlambchop.com/bench/benchfile.htm

If anyone runs this benchmark on a system not listed, please send the results
to the maintainers of this site:

http://www.teamlambchop.com/bench/submit.htm

Benjamin


GreyCloud

Jun 8, 2003, 9:04:45 PM
"Bradford J. Hamilton" wrote:
>
> In article <3EE232D0...@mist.com>, GreyCloud <cum...@mist.com> writes:
> >"Dr. David Kirkby" wrote:
> >>
> >> I have a number of oldish computers on which I check portability of
> >> software.
> >> I've collected some performance data that might be of interest to
> >> some.
> >> Some of it, especially with regard to the dismal performance of a Dec
> >> Alpha, is very odd.
> >>
> >
> >You might post this over in comp.os.vms and see what they
> >have to say.
>
> If you look closely at the tables, you'll notice that he's running TRU64;
> hence, his post to comp.unix.tru64. Posting to c.o.v. would be OT,
> technically.
>

Really doesn't matter over there... a lot of Alphas are sold
as OpenVMS or TRU64 and many questions are fielded covering
both.

Mark Hittinger

Jun 8, 2003, 9:36:14 PM
>> >> Some of it, especially with regard to the dismal performance of a Dec
>> >> Alpha, is very odd.

I wonder if the compiler was emitting some floating point instructions
that were emulated in software. On the old microvaxen there was a range
of floating point instructions of different precisions and data formats.
Some of the data formats (or precisions) became obsolete and were
emulated in the operating system.

Perhaps re-compiling the benchmark with compiler options to use different
floating formats (G/H) or double rather than single precision would improve
performance.

Also if using single precision there may be alignment problems which would
restrain the alpha.

Later

Mark Hittinger
bu...@pu.net

Dr. David Kirkby

Jun 9, 2003, 10:53:22 AM
Mark Hittinger wrote:
>
> >> >> Some of it, especially with regard to the dismal performance of a Dec
> >> >> Alpha, is very odd.
>
> I wonder if the compiler was emitting some floating point instructions
> that were emulated in software. On the old microvaxen there was a range
> of floating point instructions of different precisions and data formats.
> Some of the data formats (or precisions) became obsolete and were
> emulated in the operating system.

Perhaps. It's the compiler supplied by Compaq, designed to run on that
machine. I thought if anything that would be faster than using gcc,
but when Compaq acquired Dec, and later HP acquired Compaq, I doubt
there was too much thought put into the performance of modern
compilers on old hardware.


> Perhaps re-compiling the benchmark with compiler options to use different
> floating formats (G/H) or double rather than single precision would improve
> performance.

I've not at this point put effort into compiling with different
compilers. I bought the Dec just to check portability of code, so
installing gcc was not something I bothered doing, but I can do that
when I have a spare few hours (or will it be a spare few weeks if I
compile it myself??).


> Also if using single precision there may be alignment problems which would
> restrain the alpha.

It uses double throughout.

If you look at the data - a subset of which is formatted somewhat
more nicely at
http://www.medphys.ucl.ac.uk/~davek/
it appears a single processor 350 MHz Pentium II PC is a bit quicker
than a dual 450 MHz PC, unless both CPUs on the latter are used, in
which case the dual 450 is quicker. That result seems rather
counter-intuitive too. However, the Dec Alpha's performance is in a class of
its own.

I never intended writing this as a benchmark - I just added a bit of
code to do some timing, so that I could check the performance gains
when using multiple CPUs. However, these timing results show some odd
effects.
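
For anyone wanting to do the same sort of timing on their own code, the
idea can be sketched as below. This is not the timing code in atlc, just
a minimal illustration: `timed`, `compare` and `work` are hypothetical
names, and the "parallel" routine in the demonstration is simply a
stand-in for a real threaded version.

```python
import time


def timed(fn, *args, clock=time.perf_counter):
    """Run fn(*args) once and return (result, elapsed wall-clock seconds)."""
    start = clock()
    result = fn(*args)
    return result, clock() - start


def compare(sequential, parallel, *args):
    """Time both versions on the same input and report the ratio."""
    seq_result, t_seq = timed(sequential, *args)
    par_result, t_par = timed(parallel, *args)
    if seq_result != par_result:
        raise ValueError("sequential and parallel results differ")
    ratio = t_seq / t_par if t_par > 0 else float("inf")
    return {"t_seq": t_seq, "t_par": t_par, "speedup": ratio}


def work(n):
    """Stand-in workload: a small floating point loop."""
    total = 0.0
    for i in range(n):
        total += (i * 0.5) ** 0.5
    return total


if __name__ == "__main__":
    # Both arguments are the same function here, so the ratio should be
    # close to 1; a real test would pass a threaded version as 'parallel'.
    print(compare(work, work, 200000))
```

Wall-clock time is used rather than CPU time, since CPU time summed over
several threads would hide any gain from running them side by side.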

Chip Coldwell

Jun 9, 2003, 10:34:59 AM
In comp.sys.dec Dr. David Kirkby <drki...@ntlworld.com> wrote:

> What I can't understand is quite why the Dec Alpha seems to
> fair so very badly, when I run a floating point intensive simulation
> on these systems.

Which compiler were you using? Ages ago, I did benchmarks comparing
gcc generated code with the code generated by the Digital compilers.
The difference was huge; 3x or 4x speedup was not unusual.

Also, be careful with floating point on the Alpha. It supports two
floating point formats, IEEE 754 (standard) and VAX (for backward
compatibility) as well as software emulation:

-mno-soft-float -mfp-reg -mieee

might be a good set of compiler flags to try.

Chip

--
Charles M. "Chip" Coldwell
"Turn on, log in, tune out"

Lord Isildur

Jun 9, 2003, 2:09:01 PM


I just tested it on a 433au running netbsd 1.4.1, and got 863 sec, without
tweaking any compiler flags or anything. the system has 2 megs of cache, 640
megs core, and some random 10k rpm disks. gcc version is 2.91.60. one thing i
noticed is that the makefiles generated did not turn on ANY optimization,
not even -O! i'm recompiling with -O3 now. that just finished, in 385 sec.
At the same time, i decided to compile and test it on a dec3000/600 running
netbsd 1.5.2. that is, of course, an old-school alpha, 21064 at 175 MHz.
with no optimization, it ran in 2764 sec. With -O3, it ran in 1276 sec.
Looking at it again, i notice that the configure script doesn't seem to
be able to figure out what kind of machine it is, and apparently is defaulting
to a 'safe' configuration of sorts.. including no optimization.
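
To put numbers on the effect of -O3, the four timings quoted above can be
turned into ratios. This is just arithmetic on the reported figures; the
names used are illustrative.

```python
# Gain from -O3 on the two NetBSD Alphas, using the times quoted above.
# Each entry is (seconds without optimization, seconds with -O3).

RESULTS = {
    "433au, NetBSD 1.4.1": (863, 385),
    "DEC 3000/600, NetBSD 1.5.2": (2764, 1276),
}


def gain(unoptimized_s, optimized_s):
    """Ratio of unoptimized to optimized run time (above 1 means faster)."""
    return unoptimized_s / optimized_s


if __name__ == "__main__":
    for machine, (slow, fast) in RESULTS.items():
        print(f"{machine:28s} {gain(slow, fast):.2f}x faster with -O3")
```

Both machines come out a little over twice as fast with optimization
turned on.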

Isildur

Chris Morgan

Jun 9, 2003, 2:17:42 PM
"Dr. David Kirkby" <drki...@ntlworld.com> writes:

> I never intended writing this as a benchmark - I just added a bit of
> code to do some timing, so that I could check the performance gains
> when using multiple CPUs. However, these timing results show some odd
> effects.

Just two brief comments: I agree with other posters, you're almost
certainly handicapping the Alpha (inadvertently) by requiring IEEE-754
compliance. Secondly, you have to be a bit careful with ad-hoc
benchmarking as people get (understandably) quite upset when their
favourite systems are shown in an unflattering light. Using gcc is
regarded as a sin for benchmarking alpha and intel cpus since the
native compilers give much higher peak performance (normally!). If you
look through www.spec.org you'll see that results have to disclose all
material parameters such as compiler, compiler flags, operating
system, O/S revision level, cpu type, speed, cache sizes, system ram
etc.

Chris
--
Chris Morgan
"Post posting of policy changes by the boss will result in
real rule revisions that are irreversible"

- anonymous correspondent

Wolfgang Rupp

Jun 10, 2003, 5:13:03 AM
In comp.sys.dec Dr. David Kirkby <drki...@ntlworld.com> wrote:
> and let me know their results.

I just tested on a DS25, 2xEV68/1000Mhz, 2GB RAM, local and SAN disks.
I did not do any tweaks, the box is under general load (Oracle and
application development), and it seems _much_ faster than your results.
See output below. If I have time, I will try it on a PWS500au also.

Wolfgang Rupp

Number of CPUs active = 4396972786432
CPU_type = 60
FPU_type = unknown
Speed = 1000 MHz
Run times: T_sequential = 97 s; T_parallel = 72 s
Speedup=T_parallel/T_sequential = 1.35; Efficiency = Speedup/N_cpus = 0.000
PASS: benchmark.test
======================
All 84 tests passed
(6 tests were not run)
======================

--
--------------------------------------------------------------------
"spamtrap" is a valid address. For faster response, write to "rupp".
--------------------------------------------------------------------

Wolfgang Rupp

Jun 10, 2003, 10:19:02 AM
In comp.sys.dec Dr. David Kirkby <drki...@ntlworld.com> wrote:
> I'd be interested if anyone else with a Dec Alpha Personal Workstation
> 600a or 600au would like to run the same test, by going to

Sorry, the 500au is somehow broken. Maybe tomorrow.

On an ES45, 4xEV68/1000MHz, 8GB RAM, all disks in SAN I get:
T_seq = 105 s; T_par = 68 s, Speedup= 1.54; Efficiency = 0.000

This machine had about 300 users online at the time.
The tests ran as user, no tweaking done.

Wolfgang Rupp


Raymond Toy

Jun 10, 2003, 11:07:35 AM
>>>>> "Wolfgang" == Wolfgang Rupp <spam...@coredump.at> writes:

Wolfgang> In comp.sys.dec Dr. David Kirkby <drki...@ntlworld.com> wrote:
>> and let me know their results.

Wolfgang> I just tested on a DS25, 2xEV68/1000Mhz, 2GB RAM, local and SAN disks.
Wolfgang> I did not do any tweaks, the box is under general load (Oracle and
Wolfgang> application development), and it seems _much_ faster than your results.
Wolfgang> See output below. If I have time, I will try it on a PWS500au also.

Wolfgang> Wolfgang Rupp

Wolfgang> Number of CPUs active = 4396972786432

That's an amazing number of CPUs active. And it's still that slow?

:-)

Ray

Dr. David Kirkby

Jun 10, 2003, 11:30:03 AM
"Dr. David Kirkby" wrote:
>
> I have a number of oldish computers on which I check portability of
> software. I've collected some performance data that might be of interest to
> some. Some of it, especially with regard to the dismal performance of a Dec
> Alpha, is very odd.

I have now found the cause of the problem with the Alpha - it was the
lack of a suitable option to the Compaq compiler, to direct it to
produce code for the chip it was running on.

Although I have not spent much time tweaking compiler flags (and don't
really intend to, as it's not my intention to make it run super fast), the
'-arch host' option (going from memory, I've not got access to the system from
here) made a ***huge*** improvement (in excess of 3x). That, coupled
with the '-fast' option, finally sped it up by a factor of 3.5 relative to the
times originally obtained with the same compiler, but without those flags.

Although I've experience in the PC and Sun world of using appropriate
compiler flags, I've never before seen anywhere near a 3.5x reduction in
execution time.

One learns something every day. Today I learnt just how much effect compiler
flags can have on Alpha - I've never seen anywhere near that performance
change on Intel or SPARC, no matter how hard I've tried to optimise the
compiler flags.

Dr. David Kirkby PhD,


Senior Research Fellow,
Department of Medical Physics,
University College London,
11-20 Capper St, London, WC1E 6JA.
Tel: 020 7679 6408 Fax: 020 7679 6269
Internal telephone: ext 46408
e-mail da...@medphys.ucl.ac.uk

Web page: http://www.medphys.ucl.ac.uk/~davek

Lord Isildur

Jun 10, 2003, 11:27:16 AM

like i mentioned, i dont think the configure script knows about alpha, it
just said 'unknown' on my machine for quantity and kind of cpu.. and 1 mhz
for the speed.

isildur

Adam Price

Jun 10, 2003, 11:57:17 AM
Dr. David Kirkby wrote:

> dobermann[cc] Dec Alpha PW 600a 600 MHz 950 N/A N/A N/A

Swallows is a dec alpha PW 600a 433Mhz with 192 Mb ram and no extra cache.
This is the output...

Number of CPUs active = 5368723600
CPU_type = 56
FPU_type = unknown
Speed = 433 MHz
Run times: T_sequential = 329 s; T_parallel = 330 s
Speedup=T_parallel/T_sequential = 1.00; Efficiency = Speedup/N_cpus = 0.000


PASS: benchmark.test
======================
All 84 tests passed
(6 tests were not run)
======================

I think something may be wrong in the code as that number for 'cpus active' is
clearly wrong.
I also think your alpha may be broken since mine clearly runs a bit faster yet
has a slower processor and less memory.
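
The bogus CPU count also explains why the efficiency is printed as
0.000: the speedup is being divided by a number in the billions. A small
sketch of the arithmetic follows. It assumes efficiency is computed as
speedup / N_cpus, as the output line says; `os.cpu_count()` is only how
Python asks the operating system for the count and says nothing about
how atlc does it.

```python
import os


def efficiency(speedup, n_cpus):
    """Efficiency as printed by the test: speedup / number of CPUs."""
    return speedup / n_cpus


def online_cpus():
    """CPU count as reported to Python; fall back to 1 if it is unknown."""
    return os.cpu_count() or 1


if __name__ == "__main__":
    bogus = 5368723600  # "Number of CPUs active" from the output above
    # The quotient is about 2e-10, which formats as 0.000.
    print(f"with bogus count: {efficiency(1.00, bogus):.3f}")
    # With the single CPU actually present the same speedup gives 1.000.
    print(f"with one CPU:     {efficiency(1.00, 1):.3f}")
    print(f"this host reports {online_cpus()} CPU(s)")
```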

Recompiling with -O3 -tune host improves things to ...

Number of CPUs active = 5368723600
CPU_type = 56
FPU_type = unknown
Speed = 433 MHz
Run times: T_sequential = 316 s; T_parallel = 320 s
Speedup=T_parallel/T_sequential = 0.99; Efficiency = Speedup/N_cpus = 0.000
PASS: benchmark.test

Only marginally better but better none the less.
Hope this helps some
Adam


Bob Marcan

Jun 11, 2003, 2:57:06 AM


PW600a is a crippled machine (no sec cache) sold to run Windoze.

Regards, Bob

--
Bob Marcan mailto:bob.m...@snt.si
Aster^H^H...HermesPlus^H^H^H...S&T mailto:bob.m...@aster.si
Nade Ovcakove 1 tel: +386 (1) 5894-329
1000 Ljubljana, Slovenia http://www.snt.si

Dr. David Kirkby

Jun 11, 2003, 5:51:56 AM
Bob Marcan wrote:

> PW600a is a crippled machine (no sec cache) sold to run Windoze.
>
> Regards, Bob

I don't think all of that is true. It is correct to say it was sold to
run Windoze NT, but it does have a cache - or if it does not, psrinfo
is lying, as it claims the machine has 2 Mb of cache.

I don't believe there is any difference between the 600a and the 600au
(they share the same manuals) except the graphics and SCSI card, and
of course the operating system which was pre-loaded. The graphics card
in the 600a was supported only under Windoze, whereas the one
supplied in the 600au was supported under both Windoze and NT, so
one can swap operating systems.

As I've said elsewhere, the problem seems to have been that I had not
put the compiler flag to produce code for the specific platform I was
running on. I've never seen compiler flags make such a huge
improvement before - but this was my first play with an Alpha.

Rob B

Jun 12, 2003, 9:58:14 AM
Bob Marcan <bob.m...@aster.si> wrote in
news:%oAFa.809$78.3...@news.siol.net:

<snip>


>
>
> PW600a is a crippled machine (no sec cache) sold to run Windoze.
>

No it's not, You are most likely thinking of the PC164

600a is a Miata box, only has AlphaBIOS and no SCSI built-in, and a
graphics card that only worked under NT (PowerStorm?)

Rob

Joachim Ring

Jun 13, 2003, 8:22:37 AM
> Although I've experience in the PC and Sun world of using appropriate
> compiler flags, I've never before seen anywhere near a 3.5x reduction in
> execution time.
>
> One learns something every day. Today I learnt just how much compiler
> flags can have on Alpha - I've never seen anywhere near that performance
> change on Intel or SPARC, no matter how hard I've tried to optimise the
> compiler flags.

or on hpux - i've run your test on my c200 (pa8200 200mhz 1.5 mb L1 cache):

gcc 3.2.3 no optimizing: 2133s
same with -O3 -mpa-risc-2-0 -mschedule=8000 -mlinker-opt: 366s

aC++ 3.27.01 no optimizing: 663s
same with +O4: 398s

joachim

Pit Linnartz

Jun 14, 2003, 2:35:33 PM
"Dr. David Kirkby" <da...@medphys.ucl.ac.uk> wrote in message news:<3EE5F97B...@medphys.ucl.ac.uk>...
> "Dr. David Kirkby" wrote:
Hello,

http://www.fortran-2000.com/ArnaudRecipes/CompilerTricks.html depicts
nicely the variants of switches on the different platforms. Regarding
speed, have a look at the Optimization section.

Pit

Vladimir Rodionov

Jan 27, 2024, 2:11:36 PM
Redhat under VirtualBox, gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-98)

Run times: T_sequential = 18 s; T_parallel = 12 s

cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 14
model name : Intel(R) Core(TM) i5-7300U CPU @ 2.60GHz
stepping : 9
cpu MHz : 2712.546
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2
bogomips : 5085.59

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 14
model name : Intel(R) Core(TM) i5-7300U CPU @ 2.60GHz
stepping : 9
cpu MHz : 2712.546
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2
bogomips : 5334.63
