Yet another benchmark results..

Dragon Fly

Dec 6, 1993, 10:16:14 PM
Seeing so many benchmark tests contradicting one another
gotta be confusing for insightful observer. For me perpetrating
mostly scientific calculations they do not offer much to
swallow to say nothing about digesting..
So in deep despair have I decided to run the following
short and, I hope, comprehensive code on various boxes widely
spread in academic community.

- - - - - - - - Cut here - - - - - - - - - - - - - - - - - - - - - -
#include <stdio.h>
#include <math.h>
#include <time.h>
main()
{
    double x,y[1000000];
    int i;
    time_t t;

    time(&t);
    for (i=0;i<1000000;i++)
    {
        x=11.0+(33.5*i)*(33.5*i);
        y[i]=(sin(3.1*i)+cos(5.1*i))*sqrt(x+exp(3.14*log(x+i)));
    }
    printf("time=%d\n",time(0)-t);
}
- - - - - - - - - - - - Cut here - - - - - - - - - - - - - - - - - -

As everybody with eyes can see, the program calculates some stuff
in a loop storing it in memory (gotta be ~ 8Mb of RAM taken)
and gives on output the number of seconds spent. And here are the
results of calculation:

Computer                             Time spent

486DX2-66 EISA/VL 16Mb RAM
running Linux (Slackware 1.1.0).
gcc compiler.
Single user                          27 sec.

SUN Sparc-2 with >= 16 Mb RAM
running SunOS
Single user                          69 sec.

DEC VAX with ALPHA chip
running VMS
With quite a few users on            69 sec.

SUN-4
running SunOS
Single user                          73 sec.

DEC VAXstation 3100
running VMS
Single user                          405 sec.

So comments are welcome.

Cordially,
Serge

Xavier Llobet EPFL - CRPP 1015 Lausanne CH

Dec 7, 1993, 4:53:59 AM

In article <1993Dec7.0...@pacific.mps.ohio-state.edu>, viz...@mps.ohio-state.edu (Dragon Fly) writes:

[...]

: Computer Time spent


:
:486DX2-66 EISA/VL 16Mb RAM
:running Linux (Slackware 1.1.0).
:gcc compiler.
:Single user 27 sec.
:
:SUN Sparc-2 with >= 16 Mb RAM
:running SunOS
:Single user 69 sec.
:
:DEC VAX with ALPHA chip
:running VMS
:With quite a few users on 69 sec.

Single user (DEC 3000 Model 400) 9 sec

:SUN-4


:running SunOS
:Single user 73 sec.
:
:DEC VAXstation 3100
:running VMS
:Single user 405 sec.
:
:
:
:So comments are welcome.
:
:Cordially,
:Serge

-xavier

yuan%...@mp.cs.niu.edu

Dec 7, 1993, 6:50:27 AM
Dragon Fly (viz...@mps.ohio-state.edu) wrote:
: Seeing so many benchmark tests contradicting one another

: gotta be confusing for insightful observer. For me perpetrating
: mostly scientific calculations they do not offer much to
: swallow to say nothing about digesting..
: So in deep despair have I decided to run the following
: short and, I hope, comprehensive code on various boxes widely
: spread in academic community.

[ source code deleted .. ]

: Computer Time spent

: 486DX2-66 EISA/VL 16Mb RAM
: running Linux (Slackware 1.1.0).
: gcc compiler.
: Single user 27 sec.

486DX50 ISA 8Mb RAM, 256K cache
running Debian Linux 0.81BETA
4 users 59 sec.
single user 54 sec.

Brett L. Huber

Dec 7, 1993, 6:42:01 PM
> Computer Time spent
Sparc-10
SunOS 4.1.3A 30 sec.

However, why not use an established benchmark? There are more of
them out there than you can shake a stick at.

Brett Huber

--
... Our continuing mission: To seek out knowledge of C, to explore
strange UNIX commands, and to boldly code where no one has man page 4.


Vadim Maslov

Dec 7, 1993, 11:16:25 AM
In article <1993Dec7.0...@pacific.mps.ohio-state.edu> viz...@mps.ohio-state.edu (Dragon Fly) writes:
>
>As everybody with eyes can see, the program calculates some stuff
>in a loop storing it in memory (gotta be ~ 8Mb of RAM taken)
>and gives on output the number of seconds spent. And here are the
>results of calculation:
>
> Computer Time spent
>
>486DX2-66 EISA/VL 16Mb RAM
>running Linux (Slackware 1.1.0).
>gcc compiler.
>Single user 27 sec.
>
>SUN Sparc-2 with >= 16 Mb RAM
>running SunOS
>Single user 69 sec.
>
>DEC VAX with ALPHA chip
>running VMS
>With quite a few users on 69 sec.
>
>SUN-4
>running SunOS
>Single user 73 sec.
>
>DEC VAXstation 3100
>running VMS
>Single user 405 sec.
>


This is the result on a SPARCstation 10 with 3-4 users: 21.9 sec.


Something's wrong with your brain, Viznyuk,
not with machines. Your glorification of i486
is quite senseless and your claims are unsubstantiated.

As to the ALPHA it is faster than SPARC 10,
but still I think that even with multiple users
your result seems to be a mistake if not forgery.

The hint: use gcc on all the computers above.
Specify -O2. The compiler is what really matters.
There are always ways to slow down any program
with an appropriately selected compiler or compiler flags.

FYI, my compile was:

gcc -O2 viz.c -lm


Vadim Maslov.

P.S. I have not heard anything about VAXes with an ALPHA
chip. I heard that the VAX is the line that will be abandoned
for ALPHA. What they have is a translator which translates binary
VAX code to ALPHA code.

Newsgroups: comp.sys.ibm.pc.hardware,comp.os.linux.misc,comp.os.vms,relcom.talk,relcom.fido.su.general
Subject: Re: Yet another benchmark results..
References: <1993Dec7.0...@pacific.mps.ohio-state.edu>
Organization: U of Maryland, Dept. of Computer Science, Coll. Pk., MD 20742

Carl Boernecke

Dec 7, 1993, 6:01:27 PM
llo...@elpp1.epfl.ch (Xavier Llobet EPFL - CRPP 1015 Lausanne CH) writes:
>In article <1993Dec7.0...@pacific.mps.ohio-state.edu>, viz...@mps.ohio-state.edu (Dragon Fly) writes:
>:

[various system results removed... the highest being 405 seconds
from a DEC Vax, if I remember correctly]

>:
>:So comments are welcome.
>:
>:Cordially,
>:Serge

I don't like your benchmark! Waaaahhh! Took a total of 4109
seconds on my 386/33 (without 387) and 8 MB of RAM. Yes, the
machine was a bit 'loaded' (three dial-in users, myself on
two virtual terminals, one with X running a few color xterms),
and an ftp session or two over my SLIP connection. Still seems
like a long time, though.

Also, why did you use 'time(&t)' for the first line? Why not
't=time(NULL)'? Guess it doesn't really matter, but the program
didn't want to run without that mod on my other system
(SVR3.2.2). Weird.

-- Carl Boernecke (ca...@inex.com)
"Time flies like an arrow... fruit flies like a banana."

Anatoly....@kamaz.kazan.su

Dec 7, 1993, 3:52:30 PM
In
comp.sys.ibm.pc.hardware,comp.os.linux.misc,comp.os.vms,relcom.talk,relcom.fido.su.general
article <1993Dec7.0...@pacific.mps.ohio-state.edu> Dragon

Frank J. Wood

Dec 7, 1993, 5:56:14 PM

: Computer Time spent
:
:486DX2-66 EISA/VL 16Mb RAM
:running Linux (Slackware 1.1.0).
:gcc compiler.
:Single user 27 sec.
:
:SUN Sparc-2 with >= 16 Mb RAM
:running SunOS
:Single user 69 sec.
:
:DEC VAX with ALPHA chip
:running VMS
:With quite a few users on 69 sec.
Single user (DEC 3000 Model 400) 9 sec

:SUN-4
:running SunOS
:Single user 73 sec.
:
:DEC VAXstation 3100
:running VMS
:Single user 405 sec.
:

HP Apollo
running HP-UX 9.0 16 sec.
(4 users)

Andrew A. Lyovochkin

Dec 7, 1993, 8:23:59 PM

> Computer Time spent

>486DX2-66 EISA/VL 16Mb RAM
>running Linux (Slackware 1.1.0).
>gcc compiler.
>Single user 27 sec.

>SUN Sparc-2 with >= 16 Mb RAM
>running SunOS
>Single user 69 sec.

>So comments are welcome.

And which compilers were used on the other machines, besides the 486?
And which options? Otherwise gcc may optimize things heavily there - it is a very clever program.

>Cordially,
>Serge

Andrew Lyovochkin

Dennis J Robinson

Dec 8, 1993, 12:46:34 AM

Took 42 seconds on a RS/6000 with two users.

Took 101 seconds on a sparc ipx with 3 users and 3 xterms open on console.


Klaus-Georg Adams

Dec 8, 1993, 2:52:18 AM
|> Seeing so many benchmark tests contradicting one another
|> gotta be confusing for insightful observer. For me perpetrating
|> mostly scientific calculations they do not offer much to
|> swallow to say nothing about digesting..
|> So in deep despair have I decided to run the following
|> short and, I hope, comprehensive code on various boxes widely
|> spread in academic community.
|>

source deleted

|>
|> As everybody with eyes can see, the program calculates some stuff
|> in a loop storing it in memory (gotta be ~ 8Mb of RAM taken)
|> and gives on output the number of seconds spent. And here are the
|> results of calculation:
|>
|> Computer Time spent
|>
|> 486DX2-66 EISA/VL 16Mb RAM
|> running Linux (Slackware 1.1.0).
|> gcc compiler.
|> Single user 27 sec.

486DX-33 ISA 8Mb RAM
running Linux
Single user, but many Windows,
Swapping heavily 94 sec. real, 58 sec. CPU

IBM PowerServer 520, 32 Mb RAM
RS/6000 Chip
running AIX 3.2.3e
compilation in Background 30 sec. real, 16 sec. CPU

IBM PowerStation 320H, 32 Mb RAM
RS/6000 Chip
running AIX 3.2.3e
single user 12 sec. real, 12 sec. CPU

IBM PowerServer 560, >32 Mb RAM
RS/6000 Chip
running AIX 3.2.5
single user 7 sec. real, 7 sec. CPU

|> SUN Sparc-2 with >= 16 Mb RAM
|> running SunOS
|> Single user 69 sec.
|>
|> DEC VAX with ALPHA chip
|> running VMS
|> With quite a few users on 69 sec.
|>
|> SUN-4
|> running SunOS
|> Single user 73 sec.
|>
|> DEC VAXstation 3100
|> running VMS
|> Single user 405 sec.
|>
|>
|>
|> So comments are welcome.

The benchmark is unfair to machines with less than 10 Mb
RAM, for they have to start paging (see the result of my Linux box
with 8 Mb).

I started the benchmark with 'time bench' to be less dependent
on the relative load of the machines.

|>
|> Cordially,
|> Serge

Klaus-Georg Adams (ad...@achibm1.chemie.uni-karlsruhe.de)

Harvey Brydon (918)250-4312

Dec 8, 1993, 8:13:47 AM
>Seeing so many benchmark tests contradicting one another
>gotta be confusing for insightful observer. For me perpetrating
>mostly scientific calculations they do not offer much to
>swallow to say nothing about digesting..
>So in deep despair have I decided to run the following
>short and, I hope, comprehensive code on various boxes widely
>spread in academic community.

[...]

>DEC VAXstation 3100
>running VMS
>Single user 405 sec.

What kind of 3100 was it? There is quite a range of CPU speed in the 3100
line, depending mostly on model. You also didn't describe memory sizes, which
will make somewhat of a difference.
_______________________________________________________________
Harvey Brydon | Internet: bry...@dsn.SINet.slb.com
Schlumberger Dowell | P.O.T.S.: (918)250-4312
"...but this is the art of the machines - they serve that they may rule..."
- Samuel Butler, Erewhon (1872)

Dan Mattrazzo

Dec 8, 1993, 8:13:45 AM
>#include <stdio.h>
>#include <math.h>
>#include <time.h>
>main()
>{
>double x,y[1000000];
>int i;
>time_t t;
>
>time(&t);
>for (i=0;i<1000000;i++)
> {
> x=11.0+(33.5*i)*(33.5*i);
> y[i]=(sin(3.1*i)+cos(5.1*i))*sqrt(x+exp(3.14*log(x+i)));
> }
>printf("time=%d\n",time(0)-t);
>}
>- - - - - - - - - - - - Cut here - - - - - - - - - - - - - - - - - -
>
>As everybody with eyes can see, the program calculates some stuff
>in a loop storing it in memory (gotta be ~ 8Mb of RAM taken)
>and gives on output the number of seconds spent. And here are the
>results of calculation:

Another problem that I can see is that the code is small enough
to fit in cache, which will easily skew the results. That would
explain why the poor chap with the 386 might have taken a few
minutes to run, if he didn't have cache.

-------------------------------------------------------------------------------
Dan Mattrazzo
dcm...@ritvax.isc.rit.edu

Mastering that Parallel thing
Graduate Studies
Computer Science
Rochester Institute of Technology

Dave Sill

Dec 8, 1993, 10:44:11 AM
>Seeing so many benchmark tests contradicting one another
>gotta be confusing for insightful observer. For me perpetrating
>mostly scientific calculations they do not offer much to
>swallow to say nothing about digesting..
>So in deep despair have I decided to run the following
>short and, I hope, comprehensive code on various boxes widely
>spread in academic community.
>
>- - - - - - - - Cut here - - - - - - - - - - - - - - - - - - - - - -
>#include <stdio.h>
>#include <math.h>
>#include <time.h>
>main()
>{
>double x,y[1000000];
>int i;
>time_t t;
>
>time(&t);
>for (i=0;i<1000000;i++)
> {
> x=11.0+(33.5*i)*(33.5*i);
> y[i]=(sin(3.1*i)+cos(5.1*i))*sqrt(x+exp(3.14*log(x+i)));
> }
>printf("time=%d\n",time(0)-t);
>}
>- - - - - - - - - - - - Cut here - - - - - - - - - - - - - - - - - -

I had to make the declaration of y global to prevent a segmentation
violation on the DEC Alpha I ran it on.

>As everybody with eyes can see, the program calculates some stuff
>in a loop storing it in memory (gotta be ~ 8Mb of RAM taken)
>and gives on output the number of seconds spent. And here are the
>results of calculation:
>
> Computer Time spent
>
>486DX2-66 EISA/VL 16Mb RAM
>running Linux (Slackware 1.1.0).
>gcc compiler.
>Single user 27 sec.
>
>SUN Sparc-2 with >= 16 Mb RAM
>running SunOS
>Single user 69 sec.
>
>DEC VAX with ALPHA chip
>running VMS
>With quite a few users on 69 sec.
>
>SUN-4
>running SunOS
>Single user 73 sec.
>
>DEC VAXstation 3100
>running VMS
>Single user 405 sec.

DEC 3000 Model 500 6.7 s (avg. of 10 runs)
DEC OSF/1 1.3
Multi-user mode, one user logged in
cc -O3 -o bench bench.c -lm -non_shared

>So comments are welcome.

Probably too small to relate well with performance on real applications.

--
Dave Sill (d...@ornl.gov) Computers should work the way beginners
Martin Marietta Energy Systems expect them to, and one day they will.
Workstation Support -- Ted Nelson
URL http://gatekeeper.dec.com/archive/pub/DEC/DECinfo/html/dsill.html

Brett L. Huber

Dec 8, 1993, 11:38:03 AM
Carl Boernecke (ca...@inex.com) wrote:
> I don't like your benchmark! Waaaahhh! Took a total of 4109
> seconds on my 386/33 (without 387) and 8 MB of RAM. Yes, the
No 387? What do you want, a miracle?

Tim Llewellyn in Bristol. (0272) 303030 ext 3691.

Dec 8, 1993, 2:35:21 PM
>Seeing so many benchmark tests contradicting one another
>gotta be confusing for insightful observer. For me perpetrating
>mostly scientific calculations they do not offer much to
>swallow to say nothing about digesting..
>So in deep despair have I decided to run the following
>short and, I hope, comprehensive code on various boxes widely
>spread in academic community.
>
[source deleted]
>
>
[some benchmark "results" deleted]
>
>So comments are welcome.
>
In all the followups to this so far I have only seen one person quote
CPU used and elapsed time. You claim you are attempting to "perpetrate"
scientific calculations, but the data you and others are providing
appears frighteningly unscientific.

eg DEC VAX, quite a few users.

(1) Doing what? VMS does provide used and elapsed time information as
well as other useful information using eg $ show proc/accounting or
at the end of a batch job log. On a multi-tasking operating system
the ratio of CPU to elapsed time will depend on other activity on
the system (including paging if the program requires it as yours
clearly will do except on a severely over-configured and under-used machine).
Maybe the other systems you and others quote do not have a means of
providing this information? :-)

(2) What SORT of VAX. 11/730, MicroVax II, 6550 or what?
In case you are unaware there is a vast difference in performance between
the slowest and fastest VAXen, and lots of models in between.

I think you will find CERN did a pretty thorough set of benchmarks
sometime last year (maybe more recently too). You can probably pick up the
results via ftp or something.


>Cordially,
>Serge

No problem.

-----------------------------------------------------+---------------+
Tim Llewellyn - OpenVMS, Soukous and Cricket Addict | Read at your |
Physicist Programmer, Bristol Uni Particle Physics. | own risk. |
HEPNET/SPAN 19716::TJL Internet t...@siva.bris.ac.uk | Std disclaimer|
Pet Hates: Case Sensitivity! Unix. Tremolo systems. | implicit |
-----------------------------------------------------+---------------+

Dimitry A. Sazonov

Dec 8, 1993, 2:34:56 PM
Dragon Fly (viz...@pacific.mps.ohio-state.edu) wrote:

: - - - - - - from another correspondent - - - - - - - - -

: I had to make the declaration of y global to prevent a segmentation
: violation on the DEC Alpha I ran it on.

: DEC 3000 Model 500 6.7 s (avg. of 10 runs)


: DEC OSF/1 1.3
: Multi-user mode, one user logged in
: cc -O3 -o bench bench.c -lm -non_shared

: ************************************
similar,
DEC 4000/710 with 256MB of memory.
DEC OSF/1 1.3 12 users, load avg 1.0
axposf.pa.dec.com> cc -O3 viz.c -lm -non_shared
axposf.pa.dec.com> time ./a.out
time=6
5.54u 0.12s 0:06 91% 0+0k 0+0io 0pf+0w


: Serge

Ron Story

Dec 8, 1993, 1:33:44 PM
Whoops, responded to the wrong article.

On the following machines running HP-Vue multi-user with approximately 80 processes running on each:

HP/PA 720 HPUX 9.01 64 Meg RAM -> 10 sec
HP/PA 735 HPUX 9.01 64 Meg RAM -> 5 sec

Not bad for a desktop box...

Ron


--
Ron Story N7TLC (ron_...@mentorg.com)| These are my opinions so: |
Mentor Graphics Corp. (503)685-7000 | #include <std_disclaim.h> |
"...Yes, I believe that's sarcasm, the lowest form of wit." |

Skip Sauls

Dec 8, 1993, 12:06:05 PM

I had to do the same thing to get it to run under OS/2.

486DX2-66 VLB Clone, 16M, 256k cache 45 sec.
running OS/2 2.1
gcc -O2 -m486 yab.c -o yab.exe

>>So comments are welcome.
>
>Probably too small to relate well with performance on real applications.

Well that makes it a perfect benchmark, doesn't it? :-)

Skip Sauls
sk...@cy.cs.olemiss.edu

Dragon Fly

Dec 8, 1993, 12:17:32 PM
OK, I'm appending the new "benchmarks" below.
Some folks using DEC boxes had to modify slightly the
original code (which is below) to avoid "segmentation fault" error.

As of today the accumulated results are:

- - - - - - - - Original code - - - - - - - - - - - - - - - - - - - - - -


#include <stdio.h>
#include <math.h>
#include <time.h>
main()
{
    double x,y[1000000];
    int i;
    time_t t;

    time(&t);
    for (i=0;i<1000000;i++)
    {
        x=11.0+(33.5*i)*(33.5*i);
        y[i]=(sin(3.1*i)+cos(5.1*i))*sqrt(x+exp(3.14*log(x+i)));
    }
    printf("time=%d\n",time(0)-t);
}
- - - - - - - - - - - - Cut here - - - - - - - - - - - - - - - - - -

Computer Time spent

486DX2-66 EISA/VL 16Mb RAM
running Linux (Slackware 1.1.0).
gcc compiler.
Single user 27 sec.

SUN Sparc-2 with >= 16 Mb RAM
running SunOS
Single user 69 sec.

DEC VAX

running VMS
With quite a few users on 69 sec.

SUN-4
running SunOS
Single user 73 sec.

DEC VAXstation 3100
running VMS
Single user 405 sec.

IBM RS6000/model 530
running AIX 3.2.2
RAM: 50mb
single user 13 sec.

IBM RS6000/model 320
running AIX 3.2.2
RAM: 20mb
single user 16 sec.

IBM RS6000/model 550
running AIX 3.2.2
RAM: 90mb
single user 7 sec.

SUN Sparc-IPX 74 sec.

VAX 3100/80
running VMS
Other users, but not much going on 182 sec.

IBM RS6000 320
running AIX 3.2.5
Other users, but not much going on 18 sec.

IBM RS6000 530
running AIX 3.2.5
Other users, but not much going on 13 sec.

DEC VAX 6630
running VMS 79 sec.



486DX50 ISA 8Mb RAM, 256K cache
running Debian Linux 0.81BETA
4 users 59 sec.
single user 54 sec.

486DX-33
64Kb read cache
16 megs memory
Single user, only program running. 53 sec.

486DX-33 59 sec.
running Linux (pl13 kernel)
16MB RAM

DEC Alpha AXP 150Mhz
OSF1 1.2
Multiuser mode 7 sec

SGI 4D/35TG (MIPS R3000 based) 48Mb RAM 21 sec.
running Irix 4.0.5C
Single user

SGI Onyx/4 (4xR4400/150 MIPS CPUS) 10 sec.
128 Mb RAM
Single user

- - - from another correspondent - -
I had to modify the code:

double x,*y;
int i;
time_t t;

y = (double *) malloc (1000000 * sizeof(double));

The DEC compiler didn't like the large array.

DEC 5000/240 Ultrix 4.3 (load=0.11) 17 sec.

DEC 5000/200 Ultrix 4.2 (load=0.00) 26 sec.

SPARCstation 10/30 Solaris 2.2 (load=0.02) 47 sec.

SPARCstation 10/42 Solaris 2.2 (load=0.20) 52 sec.


- - - - - - - - - - - - - - - -

Single user (DEC 3000 Model 400) 9 sec.

HP Apollo
running HP-UX 9.0 16 sec.
(4 users)

486DX2-66 ISA/VL 32Mb RAM
running NextStep 3.2
gcc compiler.
Multiple User 32 sec.

Sparc-10
SunOS 4.1.3A 30 sec.

486DX-33 ISA 8Mb RAM
running Linux
Single user, but many Windows,
Swapping heavily 94 sec. real, 58 sec. CPU

IBM PowerServer 520, 32 Mb RAM
RS/6000 Chip
running AIX 3.2.3e
compilation in Background 30 sec. real, 16 sec. CPU

IBM PowerStation 320H, 32 Mb RAM
RS/6000 Chip
running AIX 3.2.3e
single user 12 sec. real, 12 sec. CPU

IBM PowerServer 560, >32 Mb RAM
RS/6000 Chip
running AIX 3.2.5
single user 7 sec. real, 7 sec. CPU

- - - - - - from another correspondent - - - - - - - - -

I had to make the declaration of y global to prevent a segmentation
violation on the DEC Alpha I ran it on.

DEC 3000 Model 500 6.7 s (avg. of 10 runs)


DEC OSF/1 1.3
Multi-user mode, one user logged in
cc -O3 -o bench bench.c -lm -non_shared

************************************

Serge

Fred Kleinsorge

Dec 8, 1993, 1:02:05 PM

Doesn't sound like much of a benchmark, but what the heck:


DECPc AXP 150 (6.6ns pass 2.1 EV4), 32mb RAM
OpenVMS AXP V2-FT3
Single User, DECnet, Motif 11 sec.
Single User, No DECnet, No Motif 10 sec.


DEC 3000-400 (6.6ns pass 2.1 EV4) 128mb RAM
OpenVMS AXP V1.5
Single User, DECnet, Motif 9 sec.


--
+--------------------------------------+
Fred Kleinsorge | All opinions expressed are mine, and |
klein...@star.enet.dec.com | may not reflect those of my employer |
+--------------------------------------+

M I Parsons

Dec 8, 1993, 12:33:56 PM
|> In article <1993Dec7.0...@pacific.mps.ohio-state.edu>, viz...@mps.ohio-state.edu (Dragon Fly) writes:
|> >Seeing so many benchmark tests contradicting one another
|> >gotta be confusing for insightful observer. For me perpetrating
|> >mostly scientific calculations they do not offer much to
|> >swallow to say nothing about digesting..
|> >So in deep despair have I decided to run the following
|> >short and, I hope, comprehensive code on various boxes widely
|> >spread in academic community.
|> >

I tried to refrain from posting on this but there have been so many
ill informed posts today I can't resist any longer ;-). I've been
away for a week so my apologies if all this has been said before.

The code that was posted and the many followups with "my box ran it
this fast with 'n' users on it" all show a complete lack of understanding
of the relationships between a computer program, the box it is running
on, and the operating system it is running under.

1. You can't use 'time()' to measure CPU usage. It measures real time.
If 10 users run that program at the same time then each run will take
at least 10 times as long as it would running alone.

2. The program allocates a lot of memory. I have no idea how this is
done under UNIX but under VMS this program could be made to run
very slowly just by making it page a lot.

Actually, now I look at it, it won't page much - you're lucky.
This is because it doesn't skip about in the array, it only addresses
it sequentially. When a page needs to be swapped out it will be
and it won't ever be read back in. Of course if you have a huge
8Mbyte working set and at least 8Mbytes of free memory then
you're laughing - you could probably make anything perform well
under those circumstances!

3. If you'd ever read up even a little bit on how most computation
is found to proceed you might realise that continuous arithmetic
is very rare - even with the very highly compute intensive jobs
which Monte Carlo our 3000 tonne detector, my code still spends
a heck of a lot of time making system calls etc.

Basically what I'm saying is go and get a book on modern operating
systems ("Silberschatz and Peterson - Operating System Concepts",
if I remember correctly, is a reasonably good one but a bit anti
VAX and UNIX worshipful) and then think about how to measure
performance. The most reliable way is to run your normal code
on a variety of machines - I trust you write machine independent
code? - and see which runs it fastest, and learn how to
measure CPU time used, _properly_.

Oh and a word of warning - make sure you've got optimisation
turned on - especially on the RISC boxes - they only go fast
when it is on - I leave it as an exercise to work out why
(Hint: read up on pipelining).

Just a few thoughts,

Mark Parsons Edinburgh HEP Group / Aleph Collaboration, LEP, CERN
--=ooOOoo=-- Par...@edinburgh.ac.uk

P.S. Followups to comp.benchmarks. This has no place in comp.os.vms
where most people have at least an idea of how a modern computer
running a modern (ie non-DOS) operating system works.

Heikki Suopanki

Dec 8, 1993, 4:16:30 PM

Silly benchmark....

SGI Indy R4000, 16M RAM real 12.40, user 10.66, sys 0.90
SGI Challenge M R4000, 128M RAM real 11.47, user 10.07, sys 0.39

-Heikki

--
***************************************19********************************
***3************************11******************************21***********
****************9******************************5*************************
***********23*************************************7**********************

Vadim Antonov

Dec 8, 1993, 5:01:14 PM
Please keep that thread in appropriate newsgroups.
No need to pollute everything with yet another useless "benchmark".

--vadim

Carl Boernecke

Dec 8, 1993, 5:52:30 PM
blh...@mtu.edu (Brett L. Huber) writes:
>Carl Boernecke (ca...@inex.com) wrote:
>> I don't like your benchmark! Waaaahhh! Took a total of 4109
>> seconds on my 386/33 (without 387) and 8 MB of RAM. Yes, the
>No 387? What do you want, a miracle?

Of course not, silly. The performance was simply what I expected...
I just knew that everyone would go try it on their fastest machines,
so I wanted to throw some reality into it. After all, not everyone
has a Pentium or DEC Alpha sitting on their motherboard generating
lots of heat while looking for something to do.

--

Carl Boernecke

Dec 8, 1993, 5:48:36 PM
dcm...@ritvax.isc.rit.edu (Dan Mattrazzo) writes:
>In article <1993Dec7.0...@pacific.mps.ohio-state.edu>, viz...@mps.ohio-state.edu (Dragon Fly) writes:
[code deleted...]

>>As everybody with eyes can see, the program calculates some stuff
>>in a loop storing it in memory (gotta be ~ 8Mb of RAM taken)
>>and gives on output the number of seconds spent. And here are the
>>results of calculation:

> Another problem that I can see is that the code is small enough
> to fit in cache, which will easily skew the results. That would
> explain why the poor chap with the 386 might have taken a few
> minutes to run, if he didn't have cache.

No, I have 64K worth of cache installed on my machine... I think
the main reasons it took so long were the limited memory (8 MB),
and usage level (all that X and other user activity). If there's
just no free memory available, then it takes a while to swap it
all.

I just did the benchmark on my 486DX2/66 (256k cache, 24 MB of RAM),
and with a light load finished somewhere around 40+ seconds. Things
like X going on, nntp, news batching/sending/receiving. All kinds
of merry activity that you might expect from a server.

Bill Broadley

Dec 9, 1993, 12:14:07 AM

> DEC 3000-400 (6.6ns pass 2.1 EV4) 128mb RAM 9 sec.

Hp-735 64 MB ram, pretty much idle, 2 users HPUX 9.01

I added a for 1-10 loop:
Viper> ./a.out
time=39

That's 3.9 seconds per loop.

--
Bill Broadley@{neurocog,schneider3,lrdc5}.lrdc.pitt.edu (in order of preference)
Linux is great. Bike to live, live to bike. PGP-ok

Ryan B Gran

Dec 8, 1993, 4:28:14 PM
>>>486DX2-66 EISA/VL 16Mb RAM
>>>running Linux (Slackware 1.1.0).
>>>gcc compiler.
>>>Single user 27 sec.
>>>
>>>SUN Sparc-2 with >= 16 Mb RAM
>>>running SunOS
>>>Single user 69 sec.

486DX2-66 EISA/VL 32 MB RAM, 256k Cache (Gateway 2000)
running SCO 3.2v4.2
cc compiler (single user) 45 sec.
cc compiler (multi user) 47 sec.
gcc compiler (single & multi user) 44 sec.

SGI - MIPS 4000-100 64 MB RAM
running IRIX Release 4.0.5H
cc compile (multi user) 11 sec.

The SGI machine belongs to another department, and as such I don't have
the authority to bring it down to single user mode for testing; nor do
I know all of the technical details about the machine (cache, etc).

Ryan Gran
Par...@world.std.com

Andres Kruse (NIKHEF)

Dec 8, 1993, 4:03:25 PM
Please, please!! Don't continue running this program on your machines
and post the results.
Look at the code first and decide if it makes any sense to do it.
There are several oddities:

- It is using the time(2) function... check your man pages to
see what that means...
- It is putting a lot of emphasis on trigonometric functions.
- You have to quote the CPU type, the cache size, the memory
size, the compiler options etc. All this has a big influence!

There are many benchmarking sources around, well established,
as Brett Huber (blh...@mtu.edu) correctly says.
If you are looking for some code to check your FPU, what about
the good ol' WHETSTONE? What about LINPACK, DHRYSTONE, etc.?

Seeing so many people blindly taking this source and wasting their
CPU cycles and bandwidth in the NET I think that it's good to
have SPEC around. SPEC *does* give a quite good estimate on how
the performance compares. If you want to run some code on your own
machine, check out 'ftp.nosc.mil', subdirectory 'pub/aburto'.
Here you find a lot of popular benchmarking sources, together
with a lot of results from various workstations. You also find
some nice comments by Al (Aburto) in the source files.

Cheers,

Andres

----------------------------------------------------------------------------
Andres Kruse | NIKHEF - National Institute for Nuclear Physics and
A.K...@nikhef.nl | High-Energy Physics, Amsterdam, The Netherlands


J. D. McDonald

Dec 9, 1993, 12:03:27 PM
In article <CHqIH...@dscomsa.desy.de> kr...@zow.desy.de (Andres Kruse (NIKHEF)) writes:


>Please, please!! Don't continue running this program on your machines
>and post the results.

Please DO do so. It's interesting.


> Look at the code first and decide if it makes any sense to do it.
>There are several oddities:

>- It is using the time(2) function... check your man pages to
> see what that means...


Yes, indeed. This means that it measures the ACTUAL time the program takes.
This is what ACTUALLY MATTERS to the user.


>- It is putting a lot of emphasis on trigonometric functions.

Sort of... sqrt and log are not trig functions. So what it tells you is
how fast programs that are dominated by transcendental functions
will ACTUALLY run on your system under real conditions.


>- You have to quote the CPU type, the cache size, the memory
> size, the compiler options etc. All this has a big influence!

Yes, of course.


>Seeing so many people blindly taking this source and wasting their
>CPU cycles and bandwidth in the NET I think that it's good to
>have SPEC around. SPEC *does* give a quite good estimate on how
>the performance compares.


Well, yes and no. SPEC is a good set of representative programs,
but it does not normally measure the ACTUAL time to run a program.


Both types of benchmarks are useful. But the ACTUAL clock times
are, in truth, more important. If a computer does well
on SPEC and poorly on this, it is telling you something ... mainly
that SPEC is not really truly going to tell you how long your
program will take to run, because it does not normally measure
actual clock time. Also, this benchmark will generate a BIG difference
in numbers on similar hardware depending on how that hardware is used.
Knowing the RANGE is a VERY important fact. Of course, the actual
code in SPEC could be changed to give the same information ... and THAT
would, I argue, be excellent, if for each machine a histogram
of such times were published.

Doug McDonald

Carl J Lydick

Dec 9, 1993, 4:35:48 PM
In article <mcdonald.9...@aries.scs.uiuc.edu>, mcdo...@aries.scs.uiuc.edu (J. D. McDonald) writes:
=> Look at the code first and decide if it makes any sense to do it.
=>There are several oddities:
=
=>- It is using the time(2) function... check your man pages to
=> see what that means...
=
=
=Yes, indeed. This means that it measures the ACTUAL time the program takes.
=This is what ACTUALLY MATTERS to the user.

It also renders the results absolutely meaningless for multi-user systems
unless whoever posts the results also posts a description of the load being
put on his system by other users.

=>- It is putting a lot of emphasis on trigonometric functions.
=
=Sort of... sqrt and log are not trig functions. So what it tells you is
=how fast programs that are dominated by transcendental functions
=will ACTUALLY run on your system under real conditions.

Something which is heavily dependent on the math library against which you link
your program. There ARE a number of different math libraries out there for
various platforms, you know. Again, this renders the results meaningless.

Why is it that folks who put their faith in benchmarks are generally clueless?
--------------------------------------------------------------------------------
Carl J Lydick | INTERnet: CA...@SOL1.GPS.CALTECH.EDU | NSI/HEPnet: SOL1::CARL

Disclaimer: Hey, I understand VAXen and VMS. That's what I get paid for. My
understanding of astronomy is purely at the amateur level (or below). So
unless what I'm saying is directly related to VAX/VMS, don't hold me or my
organization responsible for it. If it IS related to VAX/VMS, you can try to
hold me responsible for it, but my organization had nothing to do with it.

Warner Losh

Dec 9, 1993, 5:39:30 PM
In article <mcdonald.9...@aries.scs.uiuc.edu> mcdo...@aries.scs.uiuc.edu (J. D. McDonald) writes:
>Yes, indeed. This means that it measures the ACTUAL time the program takes.
>This is what ACTUALLY MATTERS to the user.

True, but misleading. You can't compare times between machines if you
use wall time and there is other activity on the system. It is like
comparing apples and oranges. They are two different things. Since
this program did NO I/O, CPU time is a much better representation of
how fast the machine runs.

If you use wall time on a system where other things are happening,
that introduces variables that are hidden dependencies which, in
statistical terms, make your results meaningless.

I got this great benchmark that makes my pdp-8 seem faster than a
Sparc-10. :-)

Finally, a quote:
"There are five kinds of lies: lies, damn lies, statistics,
benchmarks and release dates" -- unknown from Mark Twain.

Warner
--
Warner Losh i...@boulder.parcplace.COM ParcPlace Boulder
I've almost finished my brute force solution to subtlety.

Dragon Fly

Dec 9, 1993, 7:26:23 PM
Notwithstanding possible critique from alleged
computer specialists the insightful observer might note
that the "benchmark" code is pretty typical for scientific
calculations. Whatever other merits the system might have,
if it's dragging its feet on this test it means the system
from the point of view of consumer [insightful observer] is
a crap. As many insightful observers probably have already
noticed, the crap is being limited mainly to two mainstreams:
SUN Sparcs and DECs running VMS.

I excluded the benchmarks obtained on boxes with unknown
specifications.


As of today the accumulated results are:

- - - - - - - - Original code - - - - - - - - - - - - - - - - - - - - - -
#include <stdio.h>
#include <math.h>
#include <time.h>
main()
{
double x,y[1000000];
int i;
time_t t;

time(&t);
for (i=0;i<1000000;i++)
{
x=11.0+(33.5*i)*(33.5*i);
y[i]=(sin(3.1*i)+cos(5.1*i))*sqrt(x+exp(3.14*log(x+i)));
}
printf("time=%d\n",time(0)-t);
}
- - - - - - - - - - - - Cut here - - - - - - - - - - - - - - - - - -

Computer Time spent

486DX2-66 EISA/VL 16Mb RAM
running Linux (Slackware 1.1.0).
gcc -O3 -o bench bench.c -lm
Single user 27 sec.

486DX2-66
AMI Enterprise III VL/EISA m/b with 32MB ram
Linux 0.99pl14
gcc 2.4.5.
Standalone machine. 27 sec.

486DX2-66 ISA/VL 16Mb RAM 256K Cache
MS-DOS
MicroWay NDPC 4.30 -n2 -n3 -OLM -exp 25 sec

486DX50 ISA 8Mb RAM, 256K cache
running Debian Linux 0.81BETA
4 users 59 sec.
single user 54 sec.

486DX-33
64Kb read cache
16 megs memory
Single user, only program running. 53 sec.

486DX-33 59 sec.
running Linux (pl13 kernel)
16MB RAM

486DX-33 ISA 8Mb RAM
running Linux
Single user, but many Windows,
Swapping heavily 94 sec. real, 58 sec. CPU

486DX2-66 ISA/VL 32Mb RAM
running NextStep 3.2
gcc compiler.
Multiple User 32 sec.

486DX2-66 VLB Clone, 16M, 256k cache 45 sec.
running OS/2 2.1
gcc -O2 -m486 yab.c -o yab.exe

SUN Sparc-10
SunOS 4.1.3A 30 sec.

SUN Sparc-2 with >= 16 Mb RAM
running SunOS
Single user 69 sec.

SUN Sparc-IPX 74 sec.

SUN-4
running SunOS
Single user 73 sec.

VAX 3100/80
running VMS
Other users, but not much going on 182 sec.

DEC VAX 6630
running VMS 79 sec.

IBM RS6000/model 530
running AIX 3.2.2
RAM: 50mb
single user 13 sec.

IBM RS6000/model 320
running AIX 3.2.2
RAM: 20mb
single user 16 sec.

IBM RS6000/model 550
running AIX 3.2.2
RAM: 90mb
single user 7 sec.

IBM RS6000 320
running AIX 3.2.5
Other users, but not much going on 18 sec.

IBM RS6000 530
running AIX 3.2.5
Other users, but not much going on 13 sec.

IBM PowerServer 520, 32 Mb RAM
RS/6000 Chip
running AIX 3.2.3e
compilation in Background 30 sec. real, 16 sec. CPU

IBM PowerStation 320H, 32 Mb RAM
RS/6000 Chip
running AIX 3.2.3e
single user 12 sec. real, 12 sec. CPU

IBM PowerServer 560, >32 Mb RAM
RS/6000 Chip
running AIX 3.2.5
single user 7 sec. real, 7 sec. CPU

HP Apollo
running HP-UX 9.0 16 sec.

HP/PA 720 HPUX 9.01 64 Meg RAM 10 sec
HP/PA 735 HPUX 9.01 64 Meg RAM 5 sec

HP-735 64 MB RAM, pretty much idle,
2 users HPUX 9.01 3.9 sec

SGI 4D/35TG (MIPS R3000 based) 48Mb RAM 21 sec.
running Irix 4.0.5C
Single user

SGI Onyx/4 (4xR4400/150 MIPS CPUS) 10 sec.
128 Mb RAM
Single user

SGI Indigo, 32 Mb RAM
running IRIX 4.0.5.
multiuser but idle
cc -O2 bench.c -o bench -lm 10 sec.

DEC Alpha AXP 150Mhz
OSF1 1.2
Multiuser mode 7 sec

DEC 3000 Model 400
Single user 9 sec.

DECPc AXP 150 (6.6ns pass 2.1 EV4), 32mb RAM
OpenVMS AXP V2-FT3
Single User, DECnet, Motif 11 sec.
Single User, No DECnet, No Motif 10 sec.

DEC 3000-400 (6.6ns pass 2.1 EV4) 128mb RAM
OpenVMS AXP V1.5
Single User, DECnet, Motif 9 sec.

DEC 4000/710 with 256MB of memory.
DEC OSF/1 1.3 12 users, load avg 1.0
cc -O3 viz.c -lm -non_shared 6 sec.

- - - from another correspondent - -
I had to modify the code:

double x,*y;
int i;
time_t t;

y = (double *) malloc (1000000 * sizeof(double));

The DEC compiler didn't like the large array.

DEC 5000/240 Ultrix 4.3 (load=0.11) 17 sec.

DEC 5000/200 Ultrix 4.2 (load=0.00) 26 sec.

SPARCstation 10/30 Solaris 2.2 (load=0.02) 47 sec.

SPARCstation 10/42 Solaris 2.2 (load=0.20) 52 sec.


- - - - - - from another correspondent - - - - - - - - -

I had to make the declaration of y global to prevent a segmentation
violation on the DEC Alpha I ran it on.

DEC 3000 Model 500 6.7 s (avg. of 10 runs)
DEC OSF/1 1.3
Multi-user mode, one user logged in
cc -O3 -o bench bench.c -lm -non_shared

Serge

Todd Walk

Dec 10, 1993, 10:45:49 AM
viz...@mps.ohio-state.edu (Dragon Fly) writes:

> Notwithstanding possible critique from alleged
>computer specialists the insightful observer might note
>that the "benchmark" code is pretty typical for scientific
>calculations. Whatever other merits the system might have,
>if it's dragging its feet on this test it means the system
>from the point of view of consumer [insightful observer] is
>a crap. As many insightful observers probably have already
>noticed, the crap is being limited mainly to two mainstreams:
>SUN Sparcs and DECs running VMS.

Well, I'm not an "alleged computer specialist"; I'm a PhD
candidate at UTK, and I'm in agreement with the others who
say that your benchmark is "crap".

Inaccurate benchmarking is easy.
Accurate benchmarking is something that the Federal Government
spends millions of $$$ on in grants to university professors
who then work for YEARS refining test suites.

(At UTK here Jack Dongarra does a lot of work on benchmark programs,
esp. Linpack. He's one of those million $$$ professors.
Take a good look at Linpack and then compare it to your little
code blurb, then if you're still interested come back
with a new, more reasonable program.)


--
Todd Walk
wa...@mrcnext.cso.uiuc.edu

Timothy D. Shoppa x4256

Dec 10, 1993, 12:25:00 PM
In article <2ea5jd$8...@vixen.cso.uiuc.edu>, wa...@mrcnext.cso.uiuc.edu (Todd Walk) writes...

>viz...@mps.ohio-state.edu (Dragon Fly) writes:
>
>> Notwithstanding possible critique from alleged
>>computer specialists the insightful observer might note
>>that the "benchmark" code is pretty typical for scientific
>>calculations. Whatever other merits the system might have,
>>if it's dragging its feet on this test it means the system
>>from the point of view of consumer [insightful observer] is
>>a crap. As many insightful observers probably have already
>>noticed, the crap is being limited mainly to two mainstreams:
>>SUN Sparcs and DECs running VMS.
>
>Well, I'm not an "alleged computer specialist"; I'm a PhD
>candidate at UTK, and I'm in agreement with the others who
>say that your benchmark is "crap".

I wouldn't go this far. The "benchmark" tests exactly what it looks
like - the ability of the c math library to do sin, cos, exp, and log.

I would violently disagree with the originator of the benchmark that
this is "typical for scientific calculations", of course. No calculation
I know of does only transcendental functions. The vast majority of
the cpu time I use is spent doing matrix algebra. This benchmark
tells you nothing about the speed of doing this.
Perhaps a high school student would propose that sin, exp,
and log are "typical" of scientific computation, but that would only be
because these are buttons on his calculator that do "scientific" functions.

The allegation that the wall time is what matters only shows that the person
proposing the use of wall times has never even been involved with serious
scientific computation.

The observation that the "crap" is predominately "SUN Sparcs and DECs
running VMS" is the STUPIDEST thing I've ever heard. The originator
of the benchmark doesn't know the simplest thing about benchmarking if
he compares the wall time of poorly specified CPUs with an unspecified
number of users and jobs and an unspecified operating system and an unspecified
amount of memory. You can get any number you want by loading the system
down, changing the priority of the process, etc.

It's about as useful as comparing the 0-60 time of a
"Volkswagen" in unspecified traffic with the 0-60 time of a "Ford"
in unspecified traffic. You'll obviously get different numbers
depending on whether you have an old Volkswagen Microbus competing against a
Taurus SHO, or a Volkswagen Jetta going against a Model T.
And depending on the traffic, the situation could be again reversed.
The variation in CPU speed between different VAX 3100 models is actually
*much* greater than the performance difference between a Model T and
the Taurus SHO, as a matter of fact.


>
>Inaccurate benchmarking is easy.
>Accurate benchmarking is something that the Federal Government
>spends millions of $$$ on in grants to university professors
>who then work for YEARS refining test suites.
>
>(At UTK here Jack Dongarra does a lot of work on benchmark programs,
>esp. Linpack. He's one of those million $$$ professors.
>Take a good look at Linpack and then compare it to your little
>code blurb, then if you're still interested come back
>with a new, more reasonable program.)

Linpack is the standard used in my circle for benchmarks, too.
I don't think it says anything about math library functions, though.
(I could stand corrected on this point. The only Linpack benchmarks
I regularly see are the matrix ones - are there special function ones as
well? I kind of doubt it, considering what Linpack was written for.)

Some computer systems do have crippled math libraries. In particular,
our Alphas (OSF 1.2) have a FORTRAN exp function that runs 30 times
slower doing single precision exp than double precision exp. Our benchmarks
didn't show this, because they were all double precision. We were
extremely confused when a particular code, one that did a good number of
single precision exp calls, ran slower on our Alphas than on our old
VAXen. Modifying the calls to be all double precision sped things
back up to our expectations!

Tim (sho...@altair.krl.caltech.edu)

Dan Hildebrand

Dec 10, 1993, 10:29:00 AM
In article <1993Dec8.1...@ornl.gov>,

Dave Sill <d...@de5.ctd.ornl.gov> wrote:
>In article <1993Dec7.0...@pacific.mps.ohio-state.edu>, viz...@mps.ohio-state.edu (Dragon Fly) writes:
>>- - - - - - - - Cut here - - - - - - - - - - - - - - - - - - - - - -
>>#include <stdio.h>
>>#include <math.h>
>>#include <time.h>
>>main()
>>{
>>double x,y[1000000];
>>int i;
>>time_t t;
>>
>>time(&t);
>>for (i=0;i<1000000;i++)
>> {
>> x=11.0+(33.5*i)*(33.5*i);
>> y[i]=(sin(3.1*i)+cos(5.1*i))*sqrt(x+exp(3.14*log(x+i)));
>> }
>>printf("time=%d\n",time(0)-t);
>>}
>>- - - - - - - - - - - - Cut here - - - - - - - - - - - - - - - - - -
>I had to make the declaration of y global to prevent a segmentation
>violation on the DEC Alpha I ran it on.

I used a -N 9000K option for QNX 4.2 to allow sufficient stack space.

>> Computer Time spent
>>
>>486DX2-66 EISA/VL 16Mb RAM

>>Single user 27 sec.
>>
>>SUN Sparc-2 with >= 16 Mb RAM

>>Single user 69 sec.
>>
>>DEC VAX with ALPHA chip

>>With quite a few users on 69 sec.
>>
>>SUN-4

>>Single user 73 sec.
>>
>>DEC VAXstation 3100

>>Single user 405 sec.
>
>DEC 3000 Model 500 6.7 s (avg. of 10 runs)
>DEC OSF/1 1.3

>cc -O3 -o bench bench.c -lm -non_shared

60 MHz ALR Pentium Evolution 9 sec.
QNX 4.2 with Watcom C v9.5
cc -Otax -o bench bench.c -5 -Wc,-fp5 -N9000k

The -5 and -Wc,-fp5 enable the Pentium-specific optimizations in the
Watcom compiler.
--
Dan Hildebrand email: da...@qnx.com
QNX Software Systems, Ltd. QUICS: danh (613) 591-0934 (data)
(613) 591-0931 x204 (voice) mail: 175 Terence Matthews
(613) 591-3579 (fax) Kanata, Ontario, Canada K2M 1W8

Sam Howard

Dec 10, 1993, 7:49:51 PM
>Seeing so many benchmark tests contradicting one another
>gotta be confusing for insightful observer. For me perpetrating
>mostly scientific calculations they do not offer much to
>swallow to say nothing about digesting..
>So in deep despair have I decided to run the following
>short and, I hope, comprehensive code on various boxes widely
>spread in academic community.
>
>
>As everybody with eyes can see, the program calculates some stuff
>in a loop storing it in memory (gotta be ~ 8Mb of RAM taken)
>and gives on output the number of seconds spent. And here are the
>results of calculation:
>
> Computer Time spent
IBM 250 (the new PPC) 13 (no optimization)
AIX 3.2.5/X11R5 10 (optimization on)
64M ram
just me (which means that i'm 56M used before running this... :(

--
Samuel P. Howard, PHIGS/PEX Test, KGS
TelTech
...graphics is cool...

"Winter is the season in which people try to keep the house as warm as
it was in the summer, when they complained about the heat."

<these are my words, and no one else can have them!>

Brian

Dec 11, 1993, 12:19:46 AM
wa...@mrcnext.cso.uiuc.edu (Todd Walk) writes:

> Accurate benchmarking is something that the Federal Government
> spends millions of $$$ on in grants to university professors
> who then work for YEARS refining test suites.

Tests by the federal government for $$$. Boo. Last year I read of the
high school graduate SAT scores. A certain class of people scored
too low (women vs men, but please don't turn this into a social thread).
The governmental statement? The test must be changed because what
we already "know to be true" (class1=class2) wasn't supported by
the test.

I balked at the phrase "refining test suites." This sounds like tweaking
the test until the "right" answers come out!

---
Brian Mork Internet bm...@opus-ovh.spk.wa.us (BBS 509-244-9260)
. . . .. Amateur Radio (AX.25) ka9snf@wb7nnf.#spokn.wa.usa
... . . USMail 6006-B Eaker, Fairchild, WA 99011

Anatoly....@kamaz.kazan.su

Dec 11, 1993, 6:48:08 AM
In
comp.sys.ibm.pc.hardware,comp.os.vms,comp.os.linux.misc,comp.benchmarks,relcom.talk,relcom.fido.su.general
article <10DEC199...@almach.caltech.edu> Timothy D. Shoppa x
writes:

>
>The allegation that the wall time is what matters only shows that the person
>proposing the use of wall times has never even been involved with serious
>scientific computation.
>

Stop the flames, please!!! :-)
It's not a technical problem, it's a medical one.
Our modern Fantomas ("Dragon Fly") wanted to fool others into thinking that
his poor PC with pseudo-UNIX is the best solution. :-)
Maybe his poor mother dropped him from the 10th floor,
so he still wants to show others that he is SuperMan anyway. :)
What is the English for "Complex Nepolnotsennosti"?


--
Anatoly M. Lisovsky, KAMAZ Inc., General Economics Department, STAR division
------------ The Network is The Computer. Per Aspera ad Sun! ---------------

Todd Walk

Dec 11, 1993, 1:11:04 PM
bm...@opus-ovh.spk.wa.us (Brian) writes:

>wa...@mrcnext.cso.uiuc.edu (Todd Walk) writes:

>> Accurate benchmarking is something that the Federal Government
>> spends millions of $$$ on in grants to university professors
>> who then work for YEARS refining test suites.

>Tests by the federal government for $$$. Boo. Last year I read of the
>high school graduate SAT scores. A certain class of people scored
>too low (women vs men, but please don't turn this into a social thread).
>The governmental statement? The test must be changed because what
>we already "know to be true" (class1=class2) wasn't supported by
>the test.

>I balked at the phrase "refining test suites." This sounds like tweaking
>the test until the "right" answers come out!

"refining test suites" means making large statistical studies on
the instruction types used for different common problem types and
changing the test suite to conform to it. It has nothing to do
with changing things until you get the results you want and
everything to do with getting good relative performance differences
between computers for running REAL WORLD programs.

BTW, computer testing is MUCH more straightforward than human testing.

--
Todd Walk
wa...@mrcnext.cso.uiuc.edu

Mighty Firebreather

Dec 8, 1993, 2:09:18 PM
viz...@mps.ohio-state.edu (Dragon Fly) writes:

>Seeing so many benchmark tests contradicting one another
>gotta be confusing for insightful observer. For me perpetrating
>mostly scientific calculations they do not offer much to
>swallow to say nothing about digesting..
>So in deep despair have I decided to run the following
>short and, I hope, comprehensive code on various boxes widely
>spread in academic community.
>

>- - - - - - - - Cut here - - - - - - - - - - - - - - - - - - - - - -
>#include <stdio.h>
>#include <math.h>
>#include <time.h>
>main()
>{
>double x,y[1000000];
>int i;
>time_t t;
>
>time(&t);
>for (i=0;i<1000000;i++)
> {
> x=11.0+(33.5*i)*(33.5*i);
> y[i]=(sin(3.1*i)+cos(5.1*i))*sqrt(x+exp(3.14*log(x+i)));
> }
>printf("time=%d\n",time(0)-t);
>}
>- - - - - - - - - - - - Cut here - - - - - - - - - - - - - - - - - -
>

>As everybody with eyes can see, the program calculates some stuff
>in a loop storing it in memory (gotta be ~ 8Mb of RAM taken)
>and gives on output the number of seconds spent. And here are the
>results of calculation:
>
> Computer Time spent
>
>486DX2-66 EISA/VL 16Mb RAM
>running Linux (Slackware 1.1.0).
>gcc compiler.
>Single user 27 sec.
>
>SUN Sparc-2 with >= 16 Mb RAM
>running SunOS
>Single user 69 sec.
>
>DEC VAX with ALPHA chip
>running VMS
>With quite a few users on 69 sec.
>
>SUN-4
>running SunOS
>Single user 73 sec.
>
>DEC VAXstation 3100
>running VMS
>Single user 405 sec.
>
>So comments are welcome.
>

Well, this helps to illustrate why benchmarks are confusing. The
test program requires some 8mb of RAM for data storage. If this RAM is
available, you are measuring CPU speed (more or less, see further
discussion). If, as is likely on many VMS systems, the user does not have
an 8+MB working set, your measurement includes paging time as well as CPU
speed. If 8+Mb is available on the free list, the system will "soft
fault". If, as is likely on many VMS systems, there is not 8+mb on the
free list, the system will page to disk. Now what are you measuring?

You are invoking library functions sin(), cos(), sqrt(), exp(), and
log(). Do you suppose that differences between implementations of these
functions might have some effect on the results?

Your program was compiled by mostly unspecified c compilers which
introduce their own variations due to varying quality of the generated
code.

Benchmarks must be carefully designed to measure what they claim to
measure and not something else. Modifying the test program to, in effect,
throw away the results of the calculation by not storing them in a
1,000,000 element array, should have a substantial effect on the time
required.

Reports of benchmarks need to carefully specify the systems and
software used. There is, for example, no such thing as a "VAX with an
Alpha chip". DEC produces Alpha AXP systems in various models with
different CPU chips (differing in maximum clock rates if nothing else).
Alpha AXP is an *architecture*, not a particular chip design or integrated
circuit fabrication technology. In principle, the Alpha AXP architecture
could be implemented with mechanical relays instead of silicon. Such a
machine would fill a large building, have incredible power and A/C
requirements, be unreliable and incredibly slow, but it would still be an
Alpha AXP!

The VAXstation 3100 comes in models 30, 40, 38, 48, 76 and maybe a
model 90 as well. These models have different CPU speeds but you didn't
say which one your benchmark ran on. It was probably one of the faster
ones; I just ran a modified benchmark (eliminating the array storage) on a
3100 Model 48 with 16Mb of memory, VMS 5.4, VAX C 3.1 and it took 522
seconds.

A VAX 8200 is up to 27 minutes and counting!

So, while your benchmark suggests that the 486 is a more powerful
machine than the other iron you cited, it's hardly a scientific test.
FWIW, my own, highly unscientific, benchmark suggests that a 486DX33 should
be rated in the 10-15 VUPs range.

*************************************************************************
* Here, there be dragons! *
* dra...@nscvax.princeton.edu *
* *
* I'm job hunting. Any offers or leads will be appreciated. *
* Thanks! *
* Richard B. Gilbert *
*************************************************************************

Arne Vajhoej

Dec 9, 1993, 4:24:57 AM
> Seeing so many benchmark tests contradicting one another
> gotta be confusing for insightful observer. For me perpetrating
> mostly scientific calculations they do not offer much to
> swallow to say nothing about digesting..
> So in deep despair have I decided to run the following
> short and, I hope, comprehensive code on various boxes widely
> spread in academic community.

It is actually difficult to write good benchmarks, so you will need to
think very carefully about what you actually want to test and how to test
it, if you want meaningful results.

> Computer Time spent
>
> 486DX2-66 EISA/VL 16Mb RAM
> running Linux (Slackware 1.1.0).
> gcc compiler.
> Single user 27 sec.

> DEC VAX with ALPHA chip


> running VMS
> With quite a few users on 69 sec.

(first: the term "VAX with ALPHA chip" is absolutely inconsistent, since
a VAX is a chip and an ALPHA is a different chip)

Our DEC AXP 3000-400 gives 9 sec. every time with no load !

Possible explanations of the difference:

1) The time function measures real time, not CPU time. So if there are
about 7-8 computable processes on your AXP, then the results are
consistent.
2) Your working-set quotas are too small on your AXP, so that you are
comparing RAM-access-time on the 486DX2 with disk-access-time
on the AXP.

> DEC VAXstation 3100
> running VMS
> Single user 405 sec.

There is quite a difference between the various VAXstation 3100 models,
so this is not a sufficient name to give in benchmark results (a VAXstation
3100 model 30 is about 2.8 VUPS while a model 76 is about 7.6 VUPS).

Arne

Arne Vajhøj local DECNET: KO::ARNE
Computer Department PSI: PSI%238310013040::ARNE
Business School of Southern Denmark Internet: AR...@KO.HHS.DK


Rupa Schomaker

Dec 12, 1993, 10:54:56 AM
In article <2e51ht$4...@Tut.MsState.Edu> sk...@cy.cs.olemiss.edu (Skip Sauls) writes:
>In article <1993Dec8.1...@ornl.gov> d...@de5.ctd.ornl.gov (Dave Sill) writes:
>>In article <1993Dec7.0...@pacific.mps.ohio-state.edu>, viz...@mps.ohio-state.edu (Dragon Fly) writes:
>>I had to make the declaration of y global to prevent a segmentation
>>violation on the DEC Alpha I ran it on.
>
>I had to do the same thing to get it to run under OS/2.

Same here. You can increase the stack size also, but... (it is
easier to just declare it global)

386/387-40 ISA Clone, 8M, 64k cache 102 sec.
running OS/2 2.1
bcc -O2 yab.c

(the 387 is an ITT chip, not Intel)

>>>So comments are welcome.
>>
>>Probably too small to relate well with performance on real applications.
>
>Well that makes it a perfect benchmark, doesn't it? :-)

heh! Makes my 386 look fairly fast.


>Skip Sauls
>sk...@cy.cs.olemiss.edu


--
ru...@sugar.NeoSoft.COM | #include "disclaimer.h"
scho...@sptvx2.sinet.slb.com | OS/2 -- Your chance to run the world.

Thomas Haywood

Dec 12, 1993, 6:40:48 AM
I actually couldn't compile it.
I got error messages like this:
unknown symbol _sin in bench.o
unknown symbol _log in bench.o
Plus the same for the other three maths functions.

I'm using Slackware Linux 1.1.0.
and compiled like this
gcc -O6 -m486 -c bench.c
gcc -O6 -m486 -o test bench.o

My guess is that the maths functions are missing from my standard libraries.
Any other reasons why this would happen?

Any way to fix it?

Thanks.......
--
Welcome to my new mail box...........
Tommy Haywood: to...@zikzak.apana.org.au
2nd Year BCSE, Monash Uni/Clayton, Vic, Aus
Home: Wantirna Sth, Melbourne, Victoria, Australia

Zhenya Sorokin

Dec 12, 1993, 4:37:16 PM
Todd Walk (wa...@mrcnext.cso.uiuc.edu) wrote:
>viz...@mps.ohio-state.edu (Dragon Fly) writes:

>> Notwithstanding possible critique from alleged
>>computer specialists the insightful observer might note
>>that the "benchmark" code is pretty typical for scientific
>>calculations. Whatever other merits the system might have,
>>if it's dragging its feet on this test it means the system
>>from the point of view of consumer [insightful observer] is
>>a crap. As many insightful observers probably have already
>>noticed, the crap is being limited mainly to two mainstreams:
>>SUN Sparcs and DECs running VMS.

>Well, I'm not an "alleged computer specialist"; I'm a PhD
>candidate at UTK, and I'm in agreement with the others who
>say that your benchmark is "crap".

Anyway, it produced correct results.
We compared a FORTRAN program (mostly FFT) on a 486 and on an IBM R6000/350.
The speed difference was about 4 times.
It is exactly the same difference as predicted by this test (27 sec/7 sec).
I hope the professor's benchmarks could be so exact.

BTW LaTeX speed difference was only 2 times. Dvips - 4 times.

All this only proves that the best benchmark is the application itself.
And for _some_ groups of sci. calculations this test _is_ correct.
Naturally, the generalization of the author is ridiculous, but ... well,
the author is known in selected newsgroups for such generalizations.

>Inaccurate benchmarking is easy.
>Accurate benchmarking is something that the Federal Government
>spends millions of $$$ on in grants to university professors
>who then work for YEARS refining test suites.

>(At UTK here Jack Dongarra does a lot of work on benchmark programs,
>esp. Linpack. He's one of those million $$$ professors.
>Take a good look at Linpack and then compare it to your little
>code blurb, then if you're still interested come back
>with a new, more reasonable program.)


--

--------------
Zhenya Sorokin
Vienna, Austria

Brian

Dec 12, 1993, 4:07:47 PM
wa...@mrcnext.cso.uiuc.edu (Todd Walk) writes:

> BTW, computer testing is MUCH more straightforward than human testing.

Tally Ho! No debate here!

Carl Boernecke

Dec 12, 1993, 8:06:47 PM
to...@zikzak.apana.org.au (Thomas Haywood) writes:
>I actually couldn't compile it.
>I got error messages like this:
>unknown symbol _sin in bench.o
>unknown symbol _log in bench.o
>Plus the same for the other three maths functions.

Easy enough... you just need to include the math libraries.

>I'm using Slackware Linux 1.1.0.
>and compiled like this
>gcc -O6 -m486 -c bench.c
>gcc -O6 -m486 -o test bench.o

A line such as: 'gcc -O6 -m486 bench.c -o bench -lm' should do
the trick.

>My guess is that the maths functions are missing from my standard libraries.
>Any other reasons why this would happen?

Normally, things like math libraries, terminal/cursor libraries,
anything X-related, bsd-specific, etc, etc, etc, aren't included
so you need to add an appropriate '-l<libname>' to your compile
options.

Anatoly....@kamaz.kazan.su

Dec 10, 1993, 8:07:56 AM
In
comp.sys.ibm.pc.hardware,comp.os.vms,comp.os.linux.misc,comp.benchmarks,relcom.talk,relcom.fido.su.general
article <1993Dec10....@pacific.mps.ohio-state.edu> Dragon
Fly writes:

> Notwithstanding possible critique from alleged
>computer specialists the insightful observer might note
>that the "benchmark" code is pretty typical for scientific
>calculations. Whatever other merits the system might have,
>if it's dragging its feet on this test it means the system
>from the point of view of consumer [insightful observer] is
>a crap. As many insightful observers probably have already
>noticed, the crap is being limited mainly to two mainstreams:
>SUN Sparcs and DECs running VMS.

Our dear flying friend invented a "benchmark" which tests
multiuser systems as if they were calculators. :)
Dear Dragon,
take a Z80 WITHOUT ANY OS, neither multithreaded nor multiuser,
to get the best results.

BTW, it seems to me I know who the flying bird-animal is.
You must look in Moscow if you want to kill the Dragon. :)

[stupid results deleted]

Arne Vajhøj

Dec 10, 1993, 7:26:45 AM
> : Computer Time spent

>
> : 486DX2-66 EISA/VL 16Mb RAM
> : running Linux (Slackware 1.1.0).
> : gcc compiler.
> : Single user 27 sec.
>
> 486DX50 ISA 8Mb RAM, 256K cache
> running Debian Linux 0.81BETA
> 4 users 59 sec.
> single user 54 sec.

So you have proved that an application that uses 8 MB of memory for data
runs twice as fast on a machine with 16 MB of physical memory as on a
machine with only 8 MB of physical memory.

I am not surprised !

Anders Rolff

Dec 13, 1993, 6:49:55 AM
In article <2e85nk$g...@gap.cco.caltech.edu>, ca...@SOL1.GPS.CALTECH.EDU (Carl J Lydick) writes:
> Something which is heavily dependent on the math library against which you link
> your program. There ARE a number of different math libraries out there for
> various platforms, you know. Again, this renders the results meaningless.

This benchmark tests the system more so than the hardware. What's the point
in spending lots of money on fast hardware when compilers, operating system,
standard [math] libraries, etc make programs run slowly? Lab generated
benchmarks that don't reflect the actual speed with which programs will run
on a platform are particularly meaningless.

Anders.

Dragon Fly

Dec 13, 1993, 4:50:20 PM
Thanks to everybody who presented the
figures from their boxes.

As of today the accumulated results are:

- - - - - - - - Original code - - - - - - - - - - - - - - - - - - - - - -

#include <stdio.h>
#include <math.h>
#include <time.h>
main()
{
    double x,y[1000000];
    int i;
    time_t t;

    time(&t);
    for (i=0;i<1000000;i++)
    {
        x=11.0+(33.5*i)*(33.5*i);
        y[i]=(sin(3.1*i)+cos(5.1*i))*sqrt(x+exp(3.14*log(x+i)));
    }
    printf("time=%d\n",time(0)-t);
}
- - - - - - - - - - - - Cut here - - - - - - - - - - - - - - - - - -

Computer Time spent

486DX2-66 EISA/VL 16Mb RAM
running Linux (Slackware 1.1.0).
gcc -O3 -o bench bench.c -lm
Single user 27 sec.

486DX2-66
AMI Enterprise III VL/EISA m/b with 32MB ram
Linux 0.99pl14
gcc 2.4.5.
Standalone machine. 27 sec.

486DX2-66 ISA/VL 16Mb RAM 256K Cache
MS-DOS
MicroWay NDPC 4.30 -n2 -n3 -OLM -exp 25 sec

486DX2-66 ISA/VL 24Mb RAM
running Linux pl99.14, Xfree2.0, fvwm
gcc -O6 compiler.
Multiuser mode/Single User 28 sec.

486-DX50 16MB Opti-Eisa-Chipset
gcc -O2
running Linux 0.99p13r 36 sec.

486DX50 ISA 8Mb RAM, 256K cache
running Debian Linux 0.81BETA
4 users 59 sec.
single user 54 sec.

486DX-33
64Kb read cache
16 megs memory
Single user, only program running. 53 sec.

486DX-33 59 sec.
running Linux (pl13 kernel)
16MB RAM

486DX-33 ISA 8Mb RAM
running Linux
Single user, but many Windows,
Swapping heavily 94 sec. real, 58 sec. CPU

AMD386DX40 8Mb 64k cache 8149 sec.
OS/2 2.1
gcc -O2 -o bench.cc bench .exe
CPU load 100%, Active task count 9

386/387-40 ISA Clone, 8M, 64k cache 102 sec.
running OS/2 2.1
bcc -O2 yab.c

486DX2-66 ISA/VL 32Mb RAM
running NextStep 3.2
gcc compiler.
Multiple User 32 sec.

486DX2-66 VLB Clone, 16M, 256k cache 45 sec.
running OS/2 2.1
gcc -O2 -m486 yab.c -o yab.exe

486DX2-66 EISA/VL 32 MB RAM,
256k Cache (Gateway 2000)
running SCO 3.2v4.2
cc compiler (single user) 45 sec.
cc compiler (multi user) 47 sec.
gcc compiler (single & multi use) 44 sec.

60 MHz ALR Pentium Evolution 9 sec.
QNX 4.2 with Watcom C v9.5
cc -Otax -o bench bench.c -5 -Wc,-fp5 -N9000k

SUN Sparc-10
SunOS 4.1.3A 30 sec.

SUN Sparc-2 with >= 16 Mb RAM
running SunOS
Single user 69 sec.

SUN Sparc-IPX 74 sec.

SUN-4
running SunOS
Single user 73 sec.

IBM 250 (the new PPC) 13 (no optimization)
AIX 3.2.5/X11R5 10 (optimization on)
64M ram

HP Apollo
running HP-UX 9.0 16 sec.

HP/PA 720 HPUX 9.01 64 Meg RAM 10 sec
HP/PA 735 HPUX 9.01 64 Meg RAM 5 sec

Hp-735 64 MB ram, pretty much idle,
2 users HPUX 9.01 3.9 sec

HP 755, 2Mbyte cache, 766 Mbytes Ram 10 sec.
100 Mhz PA7100 processor
Running HP-UX, 34 users, 0.2 load average
HP C compiler: cc +O3 yab.c -o yab.out -lm

SGI 4D/35TG (MIPS R3000 based) 48Mb RAM 21 sec.
running Irix 4.0.5C
Single user

SGI Onyx/4 (4xR4400/150 MIPS CPUS) 10 sec.
128 Mb RAM
Single user

SGI Indigo, 32 Mb RAM
running IRIX 4.0.5.
multiuser but idle
cc -O2 bench.c -o bench -lm 10 sec.

SGI - MIPS 4000-100 64 MB RAM
running IRIX Release 4.0.5H
cc compile (multi user) 11 sec.

DEC VAXstation 3100 M76
16M RAM
Running VMS 5.5-2
DECWindows Motif
Single User with 8 process 262 sec.

- - - - - - from another correspondent - - - - - - - - -


I had to make the declaration of y global to prevent a segmentation
violation on the DEC Alpha I ran it on.

DEC 3000 Model 500 6.7 s (avg. of 10 runs)
DEC OSF/1 1.3
Multi-user mode, one user logged in
cc -O3 -o bench bench.c -lm -non_shared
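
The correspondent's fix can be sketched as follows. A likely explanation (an assumption; the report gives no details) is that an 8 MB automatic array exceeds the default stack limit on that system, whereas a file-scope array is placed in static storage. The names `N` and `fill_y` are illustrative, and the fill pattern is only a stand-in for the original expression.

```c
#include <stddef.h>

#define N 1000000

/* File-scope array: placed in static storage rather than on the stack,
 * so its 8,000,000 bytes do not count against the stack limit. */
static double y[N];

/* Stand-in for the benchmark loop: writes every element and returns
 * the number of elements written. */
size_t fill_y(void)
{
    size_t i;

    for (i = 0; i < N; i++)
        y[i] = (double)i * 0.5;
    return N;
}
```

The alternative, where the platform supports it, is to raise the stack limit before running the unmodified program.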

Serge

Matthew Dillon

Dec 13, 1993, 11:16:39 PM

Computer Time spent


486DX2-66
AMI Enterprise III VL/EISA m/b with 32MB ram
Linux 0.99pl14
gcc 2.4.5.
Standalone machine. 27 sec.

486DX2-66 ISA/VL 16Mb RAM 256K Cache
MS-DOS
MicroWay NDPC 4.30 -n2 -n3 -OLM -exp 25 sec

486DX50 ISA 8Mb RAM, 256K cache
running Debian Linux 0.81BETA
4 users 59 sec.
single user 54 sec.

486DX-33
64Kb read cache
16 megs memory
Single user, only program running. 53 sec.

486DX-33 ISA 8Mb RAM
running Linux
Single user, but many Windows,
Swapping heavily 94 sec. real, 58 sec. CPU

486DX2-66 VLB Clone, 16M, 256k cache 45 sec.

SGI 4D/35TG (MIPS R3000 based) 48Mb RAM 21 sec.
running Irix 4.0.5C
Single user

SGI Indigo, 32 Mb RAM
running IRIX 4.0.5.
multiuser but idle
cc -O2 bench.c -o bench -lm 10 sec.

DEC Alpha AXP 150Mhz
OSF1 1.2
Multiuser mode 7 sec

DEC 3000 Model 400
Single user 9 sec.

DECPc AXP 150 (6.6ns pass 2.1 EV4), 32mb RAM
OpenVMS AXP V2-FT3
Single User, DECnet, Motif 11 sec.
Single User, No DECnet, No Motif 10 sec.

DEC 3000-400 (6.6ns pass 2.1 EV4) 128mb RAM
OpenVMS AXP V1.5
Single User, DECnet, Motif 9 sec.

DEC 4000/710 with 256MB of memory.
DEC OSF/1 1.3 12 users, load avg 1.0
cc -O3 viz.c -lm -non_shared 6 sec.

HP Apollo
running HP-UX 9.0 16 sec.

HP/PA 720 HPUX 9.01 64 Meg RAM 10 sec
HP/PA 735 HPUX 9.01 64 Meg RAM 5 sec

Hp-735 64 MB ram, pretty much idle,
2 users HPUX 9.01 3.9 sec


My contribution:


HP-UX A.09.01 E 9000/755 4.0 sec
several users but unloaded.
average over 10 iterations


All I can say is, HP's are not slow. I'm not sure how the HP735
beat moriah out, though.

-Matt


Matthew Dillon dil...@apollo.west.oic.com
1005 Apollo Way
Incline Village, NV. 89451 ham: KC6LVW (no mail drop)
USA Sandel-Avery Engineering (702)831-8000
[always include a portion of the original email in any response!]


Matthew Dillon

Dec 13, 1993, 11:22:36 PM
: Another problem that I can see is that the code is small enough
: to fit in cache, which will easily skew the results. That would
: explain why the poor chap with the 386 might have taken a few
: minutes to run, if he didn't have cache.
:
:-------------------------------------------------------------------------------
: Dan Mattrazzo
: dcm...@ritvax.isc.rit.edu

That's a problem? What are caches for?

A lot of code these days fits into machine caches; machine caches these
days are a lot larger than they were just two years ago.

People, stop complaining about how unfair the test is. What do you expect
out of a little 10-line program?

-Matt

Matthew Dillon dil...@apollo.west.oic.com
1005 Apollo Way
Incline Village, NV. 89451 ham: KC6LVW (no mail drop)
USA Sandel-Avery Engineering (702)831-8000
[always include a portion of the original email in any response!]

#begin lite humor

Greg Bothe

Dec 13, 1993, 10:55:47 PM

> >I had to make the declaration of y global to prevent a segmentation
> >violation on the DEC Alpha I ran it on.

> 60 MHz ALR Pentium Evolution 9 sec.


> QNX 4.2 with Watcom C v9.5
> cc -Otax -o bench bench.c -5 -Wc,-fp5 -N9000k

> The -5 and -Wc,-fp5 enable the Pentium-specific optimizations in the
> Watcom compiler.

On A DEC Alpha AXP 4000/610, 128 MB RAM, 200 mips, I got 9 sec.

cc -O2 -o bench bench.c -lm

Arne Vajhøj

Dec 12, 1993, 8:15:34 AM
> OK, I'm appending the new "benchmarks" below.

Please note that:
- results from machines with less than 12-16 MB RAM are
almost useless (because the program needs 8 MB for data)
- results from machines with no floating point instruction
(emulation) are almost useless (because it is only FP
calculations you use)
- results from machines with a multitasking OS and other active
processes are doubtful (because you measure real time, not CPU time)

So ignore all results violating the above conditions. The rest can
only be used as a benchmark for some very special mathematical
programs, since most of the time is probably spent in COS/SIN/EXP/SQRT,
so what you are actually benchmarking is the math libraries
(which BTW IMHO is also an interesting topic !!!!).
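
The real-time versus CPU-time distinction in the last bullet can be sketched in standard C by taking both readings around the same workload. This is an illustration only, not the original program: `struct timing`, `time_workload` and `small_work` are hypothetical names, and the busy loop is a placeholder for the benchmark body.

```c
#include <time.h>

/* Both measurements for one run. */
struct timing {
    double wall_sec;   /* elapsed real time, 1-second resolution via time() */
    double cpu_sec;    /* processor time charged to this process via clock() */
};

/* Time an arbitrary workload both ways. */
struct timing time_workload(void (*work)(void))
{
    struct timing t;
    time_t w0 = time(NULL);
    clock_t c0 = clock();

    work();

    t.cpu_sec = (double)(clock() - c0) / CLOCKS_PER_SEC;
    t.wall_sec = difftime(time(NULL), w0);
    return t;
}

/* Placeholder workload (hypothetical): a short busy loop. The volatile
 * sink keeps the compiler from discarding the additions. */
static volatile double sink;

void small_work(void)
{
    long i;

    for (i = 0; i < 1000000; i++)
        sink = sink + (double)i;
}
```

On a loaded multitasking system the two fields can differ widely, which is the reason for treating real-time-only results with suspicion.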

Anatoly....@kamaz.kazan.su

Dec 14, 1993, 10:05:23 AM
In comp.sys.ibm.pc.hardware,comp.os.vms,comp.os.linux.misc,comp.benchmarks,relcom.talk,relcom.fido.su.general
article <1993Dec13.2...@pacific.mps.ohio-state.edu> Dragon Fly writes:

>Thanks to everybody who presented the
>figures from their boxes.
>As of today the accumulated results are:

(about how to use multiuser computers as a stupid calculator,
and, of course, his PC with pseudo-UNIX is the best in such a case :)

Maybe his poor mother dropped him from the 10th floor,
so he still wants to show others that he is SuperMan anyway. :)
What is the English for "Complex Nepolnotsennosti"?

Does it mean inferiority complex?


[stupid results deleted]

> Serge

g...@waikato.ac.nz

Dec 14, 1993, 6:37:56 PM

As the person who achieved these results, I now wish to add to them.

Same machine (New motherboard)

AMD486DX40 8Mb 256k Cache 51 sec.
OS/2 2.1
gcc -O2 -o bench.cc bench.exe
CPU load 43%, Active task count 13

Then for a bit of excitement:
AMD486DX40+10 (ie overclocked to 50, with a heatsink and fan)
AMD486DX50 8 Mb 256k Cache 41 sec (Linear aye)
OS/2 2.1
gcc -O2 -o bench.cc bench.exe
CPU load 32%, Active task count 13

Tonight I will recompile with 486 optimisations on and see if there is any
difference (Note this means I didn't have them before)

Arne Vajhøj

Dec 13, 1993, 5:38:45 AM
> >- It is using the time(2) function... check your man pages to
> > see what that means...
>
>
> Yes, indeed . This means that it measures the ACTUAL time the program takes.
> This is what ACTUALLY MATTERS to the user.

> Both types of benchmarks are useful. But the ACTUAL clock times
> are, in truth, more important. If a computer does well
> on SPEC and poorly on this, it is telling you something ... mainly
> that SPEC is not really truly going to tell you how long your
> program will take to run, because it does not normally measure
> actual clock time. Also, this benchmark will generate a BIG difference
> in numbers on similar hardware depending on how that hardware is used.
> Knowing the RANGE is a VERY important fact. OF course, the actual
> code in SPEC could be changed to give the same information .. and THAT
> would, I argue, be an excellent, if for each machine a histogram
> of such times was published.

No no no !

The facts are:
- users are interested in actual wall time
- the CPU usage time is the best measure for that

The first one is obvious (users are users and will always be that). The
second deserves a little explanation.

This list is INFO-VAX, implying that most readers use VMS boxes (hey, what
about a name change to INFO-VMS ?). On PCs and workstations (without
too many system processes/daemons hanging around), real time and
CPU time are quite correlated, and the difference between identical
hardware depends on the OS, the libraries, etc. But on real
multi-user systems the real time depends heavily on the load. And
most VMS boxes are real multi-user systems.

If you are told that a program ran in 25 sec. real time on e.g. a VAX4100,
then you have no idea at all how long it will take to
run the same program on your VAX4100, because the load may be different.
And you cannot measure the load by the number of users, because users
may be doing different things.

If you are told that a program ran in 25 CPU sec. on e.g. a VAX4100,
then you can estimate quite well how fast it will run on your
VAX4100, because you know (or can easily find out) what the CPU-time/real-time
ratio is on your machine !
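
The estimate described in the last paragraph is a single division. A small sketch, assuming the ratio is expressed as the fraction of the CPU a job typically receives on your own machine; the function name and the error convention are made up for illustration.

```c
/* Estimate elapsed (wall) seconds from a published CPU-seconds figure.
 * cpu_share is the CPU-time/real-time ratio observed on your own machine
 * (0 < cpu_share <= 1). Returns -1.0 for invalid input. */
double estimate_wall_seconds(double cpu_seconds, double cpu_share)
{
    if (cpu_seconds < 0.0 || cpu_share <= 0.0 || cpu_share > 1.0)
        return -1.0;
    return cpu_seconds / cpu_share;
}
```

For example, a job reported at 25 CPU seconds on a machine where your jobs usually get half the CPU gives `estimate_wall_seconds(25.0, 0.5)`, i.e. 50 seconds of real time.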

Thomas Haywood

Dec 14, 1993, 7:09:44 PM

486-DX2 50MHz 8Mb RAM 256Kb cache
gcc compiler
1 user (not single mode) -> 37s user 35.57s actual

In case this seems unbelievable (I couldn't believe it myself),
this is what I did and the output:

redknobs:/root# gcc -O6 -m486 -o bench bench.c -lm
redknobs:/root# time bench
time=37
35.57user 0.53system 0:37.46elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+0minor)pagefaults 0swaps
redknobs:/root#

It was swapping, cos I could hear it, nothing else was being done
while this was happening though.

I modified it to work under the Borland C compiler (it won't let you have more than 64k).
I changed the array to size 5000 and added a for i:1..200 loop around it.
It took 67 seconds.

Dragon Fly

Dec 14, 1993, 8:40:35 PM
Some new terrifying results from CRAY, IBM, HP, and AMD boxes
were added/corrected.

- - - - - - - - Original code - - - - - - - - - - - - - - - - - - - - - -
#include <stdio.h>
#include <math.h>
#include <time.h>
main()
{
    double x,y[1000000];
    int i;
    time_t t;

    time(&t);
    for (i=0;i<1000000;i++)
    {
        x=11.0+(33.5*i)*(33.5*i);
        y[i]=(sin(3.1*i)+cos(5.1*i))*sqrt(x+exp(3.14*log(x+i)));
    }
    printf("time=%d\n",time(0)-t);
}
- - - - - - - - - - - - Cut here - - - - - - - - - - - - - - - - - -

Computer Time spent

60 MHz ALR Pentium Evolution 9 sec.
QNX 4.2 with Watcom C v9.5
cc -Otax -o bench bench.c -5 -Wc,-fp5 -N9000k

486DX2-66 EISA/VL 16Mb RAM
running Linux (Slackware 1.1.0).
gcc -O3 -o bench bench.c -lm
Single user in multiuser mode
(9 processes altogether) 27 sec.

486DX2-66
AMI Enterprise III VL/EISA m/b with 32MB ram
Linux 0.99pl14
gcc 2.4.5.
Standalone machine. 27 sec.

486DX2-66 ISA/VL 16Mb RAM 256K Cache
MS-DOS
MicroWay NDPC 4.30 -n2 -n3 -OLM -exp 25 sec

486DX2-66 ISA/VL 24Mb RAM
running Linux pl99.14, Xfree2.0, fvwm
gcc -O6 compiler.
Multiuser mode/Single User 28 sec.

486DX2-66 ISA/VL 32Mb RAM
running NextStep 3.2
gcc compiler.
Multiple User 32 sec.

486DX2-66 VLB Clone, 16M, 256k cache 45 sec.
running OS/2 2.1
gcc -O2 -m486 yab.c -o yab.exe

486DX2-66 EISA/VL 32 MB RAM,
256k Cache (Gateway 2000)
running SCO 3.2v4.2
cc compiler (single user) 45 sec.
cc compiler (multi user) 47 sec.
gcc compiler (single & multi use) 44 sec.

486DX-50 16MB Opti-Eisa-Chipset
gcc -O2
running Linux 0.99p13r 36 sec.

486DX2-50MHz 8Mb RAM 256Kb cache
gcc compiler
1 user (not single mode) 36 sec.

486DX-50 ISA 8Mb RAM, 256K cache
running Debian Linux 0.81BETA
4 users 59 sec.
single user 54 sec.

486DX-33
64Kb read cache
16 megs memory
Single user, only program running. 53 sec.

486DX-33 59 sec.
running Linux (pl13 kernel)
16MB RAM

486DX-33 ISA 8Mb RAM
running Linux
Single user, but many Windows,
Swapping heavily 94 sec. real, 58 sec. CPU

AMD386DX40 8Mb 64k cache 8149 sec.
OS/2 2.1
gcc -O2 -o bench.cc bench .exe
CPU load 100%, Active task count 9

Same machine (New motherboard):

AMD486DX40 8Mb 256k Cache 51 sec.
OS/2 2.1
gcc -O2 -o bench.cc bench.exe
CPU load 43%, Active task count 13

AMD486DX40+10
(ie overclocked to 50,
with a heatsink and fan)
AMD486DX50 8 Mb 256k Cache 41 sec.
OS/2 2.1
gcc -O2 -o bench.cc bench.exe
CPU load 32%, Active task count 13

386/387-40 ISA Clone, 8M, 64k cache 102 sec.
running OS/2 2.1
bcc -O2 yab.c

SUN Sparc-10
SunOS 4.1.3A 30 sec.

SUN Sparc-2 with >= 16 Mb RAM
running SunOS
Single user 69 sec.

SUN Sparc-IPX 74 sec.

SUN-4
running SunOS
Single user 73 sec.

VAX 3100/80
running VMS
Other users, but not much going on 182 sec.

DEC VAX 6630
running VMS 79 sec.

IBM RS6000/model 355
running AIX 3.2.4
xlc -O3 -lm, using time command 7 sec.

IBM RS6000/model 375
running AIX 3.2.4
xlc -O3 -lm, using time command 5 sec.

Hp-735 64 MB ram, pretty much idle,
2 users HPUX 9.01 4 sec.

HP 755, 2Mbyte cache, 766 Mbytes Ram 4 sec.
100 Mhz PA7100 processor
Running HP-UX, 34 users, 0.2 load average
HP C compiler: cc +O3 yab.c -o yab.out -lm

HP-UX A.09.01 E 9000/755 4 sec.
several users but unloaded.
average over 10 iterations

SGI 4D/35TG (MIPS R3000 based) 48Mb RAM 21 sec.
running Irix 4.0.5C
Single user

SGI Onyx/4 (4xR4400/150 MIPS CPUS) 10 sec.
128 Mb RAM
Single user

SGI Indigo, 32 Mb RAM
running IRIX 4.0.5.
multiuser but idle
cc -O2 bench.c -o bench -lm 10 sec.

SGI - MIPS 4000-100 64 MB RAM
running IRIX Release 4.0.5H
cc compile (multi user) 11 sec.

DEC VAXstation 3100 M76
16M RAM
Running VMS 5.5-2
DECWindows Motif
Single User with 8 process 262 sec.

DEC Alpha AXP 150Mhz
OSF1 1.2
Multiuser mode 7 sec.

DEC Alpha AXP 4000/610,
128 MB RAM, 200 mips 9 sec.
cc -O2 -o bench bench.c -lm

DEC 3000 Model 400
Single user 9 sec.

DECPc AXP 150 (6.6ns pass 2.1 EV4), 32mb RAM
OpenVMS AXP V2-FT3
Single User, DECnet, Motif 11 sec.
Single User, No DECnet, No Motif 10 sec.

DEC 3000-400 (6.6ns pass 2.1 EV4) 128mb RAM
OpenVMS AXP V1.5
Single User, DECnet, Motif 9 sec.

DEC 4000/710 with 256MB of memory.
DEC OSF/1 1.3 12 users, load avg 1.0
cc -O3 viz.c -lm -non_shared 6 sec.

Cray Y-MP C90 0.34 sec.

furio ercolessi

Dec 14, 1993, 10:40:58 PM
In article <1993Dec15....@pacific.mps.ohio-state.edu>, viz...@mps.ohio-state.edu (Dragon Fly) writes:
|> - - - - - - - - Original code - - - - - - - - - - - - - - - - - - - - - -
|> #include <stdio.h>
|> #include <math.h>
|> #include <time.h>
|> main()
|> {
|>     double x,y[1000000];
|>     int i;
|>     time_t t;
|>
|>     time(&t);
|>     for (i=0;i<1000000;i++)
|>     {
|>         x=11.0+(33.5*i)*(33.5*i);
|>         y[i]=(sin(3.1*i)+cos(5.1*i))*sqrt(x+exp(3.14*log(x+i)));
|>     }
|>     printf("time=%d\n",time(0)-t);
|> }

I am a Fortran programmer and I am not very familiar with C, but
it seems to me that there is nothing to prevent an optimizer
from wiping away all the computations, after having recognized
that no use is made of the results. If I were designing this
benchmark, I would have _at least_ printed the value of a
certain y[i] at the end, with i defined (and computed!) as a
random number between 0 and 999999.

My experience with Fortran compilers is that many
of them happily omit to perform "unuseful" computations, and I
design all my benchmarks accordingly. Returning the results
as subroutine arguments to a caller (which may ignore them) is
usually a good enough trick to ensure that computations are
really carried out, with the current generation of compilers.

--
furio ercolessi .
<fu...@uiuc.edu>,<fu...@sissa.it> / \ (__)
materials research laboratory / U \ :::::::: (oo)
university of illinois o o / o f \ :::::::: \/-----\ o o o
at urbana-champaign | | /| I |\ :::::::: ||___| * | | |
_____________________________|_|___| |____::::::::_____|| \\______|_|_|___
/ / / / / / / / / / / / / / / /
corn campus connection cow
machine

Brian Tillman

Dec 14, 1993, 4:33:02 AM
In a previous article, viz...@mps.ohio-state.edu (Dragon Fly) wrote:
> Notwithstanding possible critique from alleged
>computer specialists the insightful observer might note
>that the "benchmark" code is pretty typical for scientific
>calculations. Whatever other merits the system might have,
>if it's dragging its feet on this test it means the system
>from the point of view of consumer [insightful observer] is
>a crap. As many insightful observers probably have already
>noticed, the crap is being limited mainly to two mainstreams:
>SUN Sparcs and DECs running VMS.
>
Your protests notwithstanding, conditions on any of your "fast" machines can
make your benchmark run more slowly than the slowest number you've received so
far.

[chomp]


>
> DEC 3000 Model 400
> Single user 9 sec.
>
> DECPc AXP 150 (6.6ns pass 2.1 EV4), 32mb RAM
> OpenVMS AXP V2-FT3
> Single User, DECnet, Motif 11 sec.
> Single User, No DECnet, No Motif 10 sec.
>
> DEC 3000-400 (6.6ns pass 2.1 EV4) 128mb RAM
> OpenVMS AXP V1.5
> Single User, DECnet, Motif 9 sec.
>

DEC boxes running VMS are crap? Seems to me that DEC boxes running VMS are
among your *best* performers, not your worst.

-----------------------------+--------------------------------
Brian Tillman | Internet: til...@swdev.si.com
Smiths Industries, Inc. | tillma...@si.com
4141 Eastern Ave., MS129 | Hey, I said this stuff myself.
Grand Rapids, MI 49518-8727 | My company has no part in it.
-----------------------------+--------------------------------

Jan Christiaan van Winkel

Dec 15, 1993, 4:03:56 AM
In <CHqJn...@world.std.com> Par...@world.std.com (Ryan B Gran) writes:

>>>>486DX2-66 EISA/VL 16Mb RAM
>>>>running Linux (Slackware 1.1.0).
>>>>gcc compiler.
>>>>Single user 27 sec.
>>>>
>>>>SUN Sparc-2 with >= 16 Mb RAM
>>>>running SunOS
>>>>Single user 69 sec.

>486DX2-66 EISA/VL 32 MB RAM, 256k Cache (Gateway 2000)
>running SCO 3.2v4.2
>cc compiler (single user) 45 sec.
>cc compiler (multi user) 47 sec.
>gcc compiler (single & multi use) 44 sec.

Giving me the impression that the standard C library of Linux is very good
and that the library of SCO iiiisss sssllooww

JC
--
___ __ ____________________________________________________________________
|/ \ Jan Christiaan van Winkel j...@sci.kun.nl
| Alternative e-mail addresses: j...@oreo.atcmp.nl and j...@atcmp.nl
__/ \__/ ____________________________________________________________________

Jerzy Michal Pawlak

Dec 16, 1993, 5:51:01 AM
Well spotted Furio! I have a new entry in this stupid contest:

MicroVAX II, 16 MB, VAX/VMS 5.5-1
VAX FORTRAN v. 5.8 (I know, I should upgrade...)
6 users, (av. CPU load 10%)
time =0.04 s (average of 10 runs)

Hahahahahaha.. I have the fastest machine in the world! All you have to do
is to recode a bit:

      DOUBLE PRECISION x,y(1000000)
      t = SECNDS(0.0)
      DO 1 i=1,1000000
        x=11.0+(33.5*i)*(33.5*i)
        y(i)=(sin(3.1*i)+cos(5.1*i))*sqrt(x+exp(3.14*log(x+i)))
    1 CONTINUE
      t = SECNDS(t)
      PRINT *,'Time=',t
      END
--
Michal (paw...@zeubac.desy.de)

Tim Llewellyn

Dec 16, 1993, 1:46:19 PM

In article <CI4JH...@dscomsa.desy.de>, paw...@zeubac.desy.de (Jerzy Michal

Yeah, this FORTRAN program runs approx. 20 times faster on our VAX 4100
with VMS 5.5-2 and DEC Fortran 6.0 than on our AXP 3000/400 (VMS 1.5)! A quick
investigation reveals that on the AXP the process does page through 8 megs
of virtual memory, but not on the VAX. I append the VAX Fortran listing.
As far as my limited (and rusty) macro allows me to understand, the optimized
Fortran does all the calculations in the processor registers and never writes
them out to virtual memory (and hence disk).

It seems the VAX Fortran compiler is "better" (at spotting such optimizations
in trivial programs) than the VAXC and AXP Fortran
compilers. Here are my results with bench.for, built "vanilla" (ie fortran bench,
link bench) for various nodes in our cluster.

VAX 4100, 128MB Ram, many users and full batch load.
Cluster boot node and disk server. Time= 0.4609375
Page faults 123

VAXStation 3100 M76, no users (just this job in batch).
32 mb ram, local paging disk Time= 1.804688
Page faults 152

MicroVax II, 10 MB ram. 2 users logged in not doing much Time= 18.40625
Page Faults 135

AXP 3000/400, 128 MB Ram, VMS 1.5, several Xsessions running
and 100% compute bound batch load. Time= 10.99219
Page Faults 1056

Note that on AXP systems pages are larger (8 kbyte) than on VAX (512 byte).
The AXP program wades through the large array in virtual memory, paging to
disk when necessary. The VAX version just calculates in its internal
registers! (Why doesn't the VAX program just do a null operation?)

Michal, I don't believe your figure for your MicroVax II though :-). Unless the
FORTRAN 5.8 compiler is even better at optimising this garbage than 6.0.

Finally, I and others have pointed out the many fallacies in the original
benchmark. I think this finally shows that its results are pretty meaningless
unless lots of other preconditions are also specified (ie the compiler MUST
store the results in virtual memory or equivalent, which is what the original
poster WANTS to measure, I think).

Also, that the VAX Fortran compiler is pretty hot.
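
In C, one way to state the "must store the results" precondition in the source itself is to declare the array volatile. This is a sketch only, with an invented function name and a shortened array; note that volatile also blocks optimizations a real program would benefit from, so it changes what is being measured.

```c
#define N 1000

/* volatile: every write to y[] counts as an observable side effect, so
 * the compiler is not allowed to drop the stores or keep the values in
 * registers only. */
static volatile double y[N];

/* Writes N values and returns how many stores were requested. */
int store_all(void)
{
    int i;

    for (i = 0; i < N; i++)
        y[i] = 11.0 + (33.5 * i) * (33.5 * i);   /* the x term of the original loop */
    return N;
}
```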

--
-----------------------------------------------------+---------------+
Tim Llewellyn - OpenVMS, Soukous and Cricket Addict | Read at your |
Physicist Programmer, Bristol Uni Particle Physics. | own risk. |
HEPNET/SPAN 19716::TJL Internet t...@siva.bris.ac.uk | Std disclaimer|
Pet Hates: Case Sensitivity! Unix. Tremolo systems. | implicit |
-----------------------------------------------------+---------------+


16-Dec-1993 17:56:44 DEC Fortran V6.0-1 Page 1
16-Dec-1993 17:45:48 DISK$USERS_2:[TJL]BENCH.FOR;3

00001 DOUBLE PRECISION x,y(1000000)
00002 t = SECNDS(0.0)
00003 DO 1 i=1,1000000
00004 x=11.0+(33.5*i)*(33.5*i)
00005 y(i)=(sin(3.1*i)+cos(5.1*i))*sqrt(x+exp(3.14*log(x+i)))
00006 1 CONTINUE
00007 t = SECNDS(t)
00008 PRINT *,'Time=',t
00009 END

BENCH$MAIN 16-Dec-1993 17:56:44 DEC Fortran V6.0-1 Page 2
01 16-Dec-1993 17:45:48 DISK$USERS_2:[TJL]BENCH.FOR;3

.TITLE BENCH$MAIN
.IDENT 01

0000 .PSECT $PDATA
0000 .LONG ^X00000000
0004 .XBYTE 54,69,6D,65,3D

0004 .PSECT $LOCAL
0004 .LONG ^X00000001
0008 .ADDR $PDATA
000C .LONG ^X00000001
0010 .ADDR T
0014 .LONG ^X010E0005
0018 .ADDR $PDATA+^X4

0000 .PSECT $CODE
; 00001
0000 BENCH$MAIN::
0000 .WORD ^M<IV,R2,R3,R11>
0002 MOVAL $LOCAL, R11
; 00002
0009 CALLG $LOCAL+^X4(R11), FOR$SECNDS
0011 MOVL R0, T(R11)
; 00003
0014 MOVL #1, I
0017 NOP
0018 L$1:
; 00004
0018 CVTLF I, R12
001B MULF2 #^X4306, R12
0022 MULF2 R12, R12
0025 ADDF2 #^X23, R12
0028 CVTFD R12, R2
; 00006
002B AOBLEQ #1000000, I, L$1
; 00007
0033 CALLG $LOCAL+^XC(R11), FOR$SECNDS
003B MOVL R0, T
; 00008
003E MNEGL #1, -(SP)
0041 CALLS #1, FOR$WRITE_SL
0048 PUSHAB $LOCAL+^X14(R11)
004B CALLS #1, FOR$IO_T_DS
0052 PUSHL T
0054 CALLS #1, FOR$IO_F_V
005B CALLS #0, FOR$IO_END
; 00009
0062 MOVL #1, R0
0065 RET
.END

BENCH$MAIN 16-Dec-1993 17:56:44 DEC Fortran V6.0-1 Page 3
01 16-Dec-1993 17:45:48 DISK$USERS_2:[TJL]BENCH.FOR;3

PROGRAM SECTIONS

Name Bytes Attributes

0 $CODE 102 PIC CON REL LCL SHR EXE RD NOWRT QUAD
1 $PDATA 9 PIC CON REL LCL SHR NOEXE RD NOWRT QUAD
2 $LOCAL 28 PIC CON REL LCL NOSHR NOEXE RD WRT QUAD

Total Space Allocated 139


ENTRY POINTS

Address Type Name

0-00000000 BENCH$MAIN


VARIABLES

Address Type Name Address Type Name Address Type Name

** I*4 I 2-00000000 R*4 T ** R*8 X


ARRAYS

Address Type Name Bytes Dimensions

** R*8 Y 8000000 (1000000)


LABELS

Address Label

** 1


FUNCTIONS AND SUBROUTINES REFERENCED

Type Name Type Name Type Name Type Name Type Name Type Name

R*4 COS R*8 EXP R*8 LOG R*4 SECNDS R*4 SIN R*8 SQRT


COMMAND QUALIFIERS

FOR BENCH/EXT/MAC/LIS

/ASSUME=(ACCURACY_SENSITIVE,NODUMMY_ALIASES)
/BLAS=(INLINE,MAPPED)
/CHECK=(NOALIGNMENT,NOASSERTIONS,NOBOUNDS,OVERFLOW,NOUNDERFLOW)
/DEBUG=(NOSYMBOLS,TRACEBACK)
/DESIGN=(NOCOMMENTS,NOPLACEHOLDERS)
/DIRECTIVES=(DEPENDENCE)
/MATH_LIBRARY=(ACCURATE,NOV5)
/PARALLEL=(NOAUTOMATIC,NOMANUAL)
/SHOW=(NODATA_DEPENDENCES,NODICTIONARY,NOINCLUDE,NOLOOPS,MAP,NOPREPROCESSOR,SINGLE)
/STANDARD=(NOMIA,NOSEMANTIC,NOSOURCE_FORM,NOSYNTAX)
/WARNINGS=(NOALIGNMENT,NOAlpha_AXP,NODECLARATIONS,GENERAL,NOINLINE,NOTRUNCATED_SOURCE,NOULTRIX,NOVAXELN)
/CONVERT=NATIVE /NOCROSS_REFERENCE /NOD_LINES /ERROR_LIMIT=30 /EXTEND_SOURCE
/F77 /NOG_FLOATING /I4 /MACHINE_CODE /OPTIMIZE=LEVEL=3
/NORECURSIVE /NOSYNCHRONOUS_EXCEPTIONS /TERMINAL=NOSTATISTICS /NOVECTOR /NOANALYSIS_DATA
/NODIAGNOSTICS
/LIST=DISK$USERS_2:[TJL]BENCH.LIS;1
/OBJECT=DISK$USERS_2:[TJL]BENCH.OBJ;6

BENCH$MAIN 16-Dec-1993 17:56:44 DEC Fortran V6.0-1 Page 4
01 16-Dec-1993 17:45:48 DISK$USERS_2:[TJL]BENCH.FOR;3

COMPILATION STATISTICS

Run Time: 0.14 seconds
Elapsed Time: 0.81 seconds
Page Faults: 404
Dynamic Memory: 512 pages



Tim Llewellyn in Bristol. (0272) 303030 ext 3691.

Dec 16, 1993, 2:20:23 PM
[snip]

I should have waited till I had tried VAX Fortran V5.8. Unfortunately I don't
use that system much and my disk quota had been reduced below usage, so I had to
clean up a bit. Here is the result for:

VAXStation 3100 M30 16MB Ram Motif session (7 processes) Time= 2.3437500E-02
Page Faults 99

Now I copy the V5_8 .EXE to our cluster, and it runs in Time= 2.3437500E-02
on the fully loaded 4100 !
I append the generated assembler. Note here that the compiler doesn't even bother
to execute the loop (the V6 compiler executes the loop but never moves the data
from processor register to memory).

>Finally, I and others have pointed out the many fallacies in the original
>benchmark, I think this finally shows that its results are pretty meaningless,
>unless lots of other preconditions (ie the compiler MUST store the results
>in virtual memory or equivalent (this is what the original poster
>WANTS to measure, I think)) are also specified.
>
>Also, that the VAX Fortran compiler is pretty hot.
>
>--

Sorry, couldn't resist :-)


>-----------------------------------------------------+---------------+
>Tim Llewellyn - OpenVMS, Soukous and Cricket Addict | Read at your |
>Physicist Programmer, Bristol Uni Particle Physics. | own risk. |
>HEPNET/SPAN 19716::TJL Internet t...@siva.bris.ac.uk | Std disclaimer|
>Pet Hates: Case Sensitivity! Unix. Tremolo systems. | implicit |
>-----------------------------------------------------+---------------+
>

00001 DOUBLE PRECISION x,y(1000000)
00002 t = SECNDS(0.0)
00003 DO 1 i=1,1000000
00004 x=11.0+(33.5*i)*(33.5*i)
00005 y(i)=(sin(3.1*i)+cos(5.1*i))*sqrt(x+exp(3.14*log(x+i)))
00006 1 CONTINUE
00007 t = SECNDS(t)
00008 PRINT *,'Time=',t
00009 END

BENCH$MAIN 16-Dec-1993 20:07:45 VAX FORTRAN V5.8-155 Page 2
01 16-Dec-1993 17:45:48 DISK$ONLINE:[TJL]BENCH.FOR;3

.TITLE BENCH$MAIN
.IDENT 01

0000 .PSECT $PDATA
0000 .LONG ^X00000000
0004 .XBYTE 54,69,6D,65,3D

007A1204 .PSECT $LOCAL
007A1204 .LONG ^X00000001
007A1208 .ADDR $PDATA
007A120C .LONG ^X00000001
007A1210 .ADDR T
007A1214 .LONG ^X010E0005
007A1218 .ADDR $PDATA+^X4

0000 .PSECT $CODE
; 00001
0000 BENCH$MAIN::
0000 .WORD ^M<IV,R11>
0002 MOVAL $LOCAL+^X7A1200, R11
; 00002
0009 CALLG $LOCAL+^X7A1204(R11), FOR$SECNDS
0011 MOVL R0, T(R11)
; 00003
0014 L$1:
; 00007
0014 CALLG $LOCAL+^X7A120C(R11), FOR$SECNDS
001C MOVL R0, R12
; 00008
001F MNEGL #1, -(SP)
0022 CALLS #1, FOR$WRITE_SL
0029 PUSHAB $LOCAL+^X7A1214(R11)
002C CALLS #1, FOR$IO_T_DS
0033 PUSHL R12
0035 CALLS #1, FOR$IO_F_V
003C CALLS #0, FOR$IO_END
; 00009
0043 MOVL #1, R0
0046 RET
.END

BENCH$MAIN 16-Dec-1993 20:07:45 VAX FORTRAN V5.8-155 Page 3
01 16-Dec-1993 17:45:48 DISK$ONLINE:[TJL]BENCH.FOR;3

PROGRAM SECTIONS

Name Bytes Attributes

0 $CODE 71 PIC CON REL LCL SHR EXE RD NOWRT QUAD
1 $PDATA 9 PIC CON REL LCL SHR NOEXE RD NOWRT QUAD
2 $LOCAL 8000028 PIC CON REL LCL NOSHR NOEXE RD WRT QUAD

Total Space Allocated 8000108


ENTRY POINTS

Address Type Name

0-00000000 BENCH$MAIN


VARIABLES

Address Type Name Address Type Name Address Type Name

** I*4 I 2-007A1200 R*4 T ** R*8 X


ARRAYS

Address Type Name Bytes Dimensions

2-00000000 R*8 Y 8000000 (1000000)


LABELS

Address Label

** 1


FUNCTIONS AND SUBROUTINES REFERENCED

Type Name Type Name Type Name Type Name Type Name Type Name

R*4 COS R*8 EXP R*8 LOG R*4 SECNDS R*4 SIN R*8 SQRT

BENCH$MAIN 16-Dec-1993 20:07:45 VAX FORTRAN V5.8-155 Page 4
01 16-Dec-1993 17:45:48 DISK$ONLINE:[TJL]BENCH.FOR;3

COMMAND QUALIFIERS

FORTR BENCH/MAC/LIS

/CHECK=(NOBOUNDS,OVERFLOW,NOUNDERFLOW)
/DEBUG=(NOSYMBOLS,TRACEBACK)
/DESIGN=(NOCOMMENTS,NOPLACEHOLDERS)
/SHOW=(NODICTIONARY,NOINCLUDE,MAP,NOPREPROCESSOR,SINGLE)
/STANDARD=(NOSEMANTIC,NOSOURCE_FORM,NOSYNTAX)
/WARNINGS=(NODECLARATIONS,GENERAL,NOULTRIX,NOVAXELN)
/CONTINUATIONS=19 /NOCROSS_REFERENCE /NOD_LINES /NOEXTEND_SOURCE
/F77 /NOG_FLOATING /I4 /MACHINE_CODE /OPTIMIZE /NOPARALLEL
/NOANALYSIS_DATA
/NODIAGNOSTICS
/LIST=DISK$ONLINE:[TJL]BENCH.LIS;1
/OBJECT=DISK$ONLINE:[TJL]BENCH.OBJ;3


COMPILATION STATISTICS

Run Time: 0.52 seconds
Elapsed Time: 1.45 seconds
Page Faults: 598
Dynamic Memory: 472 pages

Manuel Eduardo Correia

Dec 17, 1993, 5:25:20 AM

>> On A DEC Alpha AXP 4000/610, 128 MB RAM, 200 mips, I got 9 sec.
>>
>> cc -O2 -o bench bench.c -lm

What is really sad is that on a Sun SPARCcenter 2000 with 8
processors running Solaris 2.3 it takes 70 sec (20 sec worse than when
the program is run on Solaris 2.2). These guys must be joking !!!

I used gcc -O3 -o bench bench.c -lm

Manuel Correia

--
===============================================================================
Manuel Eduardo C. D. Correia (Phd. Student)
===============================================================================
Centro de Informatica da Universidade do Porto (CIUP),
Rua do Campo Alegre, 823, 4100 Porto, PORTUGAL
Tel: (351-02) 600 1672, Ext: 113, Fax: (351-02) 600 3654,
Internet: m...@ciup1.ncc.up.pt
===============================================================================

Vitali X 6290

Dec 17, 1993, 5:23:00 PM
In article <CI55H...@info.bris.ac.uk>, t...@bristol.ac.uk writes...

Hi there,
I modified the code to:
a) get rid of elapsed time - use CPU time instead
b) remove (partly) page swapping - use a smaller array
c) protect against a too-clever compiler - a dummy write statement
d) ensure real*8 operations - put d0 on the constants

parameter (nsize=200000)
DOUBLE PRECISION x,y(nsize)
tt=0
do k=1,5
t = cputime(d)
DO 1 i=1,nsize
x=11.0d0+(33.5d0*i)*(33.005d0*i)
y(i)=(sin(3.1d0*i)+
+ cos(5.1d0*i))*sqrt(x+exp(3.14d0*log(x+i)))
if(i.eq.nsize+1)write(*,*)y(i)
1 CONTINUE
tt= tt+cputime(d)-t
PRINT *,'Time=',tt
enddo
END
function cputime(d)
integer*2 jpi$_cputim,buflen
integer*4 ibufadd,rlen,sys$getjpi,ilong,ibuff
data buflen/4/,ilong/0/,jpi$_cputim/1031/
common /itemlist/ buflen,jpi$_cputim,ibufadd,rlen,ilong
equivalence (itmlst,buflen)
ibufadd = %loc(ibuff)
i = sys$getjpi(,,,itmlst,,,)
cputime = ibuff*0.01
if (i .ne. 1) cputime = -100.
return
end
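
For readers following the C version of the benchmark, modification a) can
be sketched roughly as below. This is only an illustration and not code
from this thread: `clock()` and `time()` are standard C library calls, but
`time_both` and `busy_work` are hypothetical names, and the harmonic sum
merely stands in for the benchmark loop. On Unix-like systems `clock()`
reports processor time used by the program, so it should be far less
sensitive to other users' load than the `time()` difference used in the
original program.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical workload standing in for the benchmark loop:
   returns the sum 1/1 + 1/2 + ... + 1/n. */
double busy_work(long n)
{
    double sum = 0.0;
    long i;

    for (i = 1; i <= n; i++)
        sum += 1.0 / (double)i;
    return sum;
}

/* Runs work(n) once and reports both CPU seconds (clock) and elapsed
   wall-clock seconds (time).  The workload's result is returned so the
   caller can print it, which keeps the work observable. */
double time_both(double (*work)(long), long n,
                 double *cpu_sec, double *wall_sec)
{
    time_t  w0, w1;
    clock_t c0, c1;
    double  result;

    assert(work != NULL && cpu_sec != NULL && wall_sec != NULL);

    w0 = time(NULL);
    c0 = clock();
    result = work(n);
    c1 = clock();
    w1 = time(NULL);

    *cpu_sec  = (double)(c1 - c0) / CLOCKS_PER_SEC;
    *wall_sec = difftime(w1, w0);   /* whole-second resolution only */
    return result;
}
```

A caller would declare two doubles, call
`time_both(busy_work, 1000000L, &cpu, &wall)`, and print both times
together with the returned value. On a loaded machine the two numbers can
differ widely, which is the effect modification a) is meant to remove.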

The results are:
vax 9000/410 32.3
dec 3000/400 7.8
dec 3000/500 6.9


Running this code on an HP 9000 (with a modified cputime)
I got:

750/50 8.3
735/50 5.0 (!)

From the above, the HP 9000 735/50 beats the VAX 9000/410 by a factor of
more than 6 (!), though having run many (scientific) applications on both
platforms I can say they are roughly equal. Anyway, the results show that
the cos, sin and exp operations on the VAX are slower than the HP ones.
But this has nothing to do with overall computer performance:
indeed, if you remove the y(i)= blablabla statement
from the code above, the results will be:
vax 9000/410 - 0.12 sec
hp 735/50 - 0.5 sec
so now we see the VAX is four times faster than the HP.

Below is a table of my comparison of computer performance, based on
a sort of Whetstone program.

Source : VXCRNA::DISK$L3:[SHOUTKO]SPEED.FOR

Relative computer performance in VAX 11/780 units for mainly real*4 operations
Name / model            Relative performance

VAX 11/780                1.0   (= 1200 Wh.U.)
VAX 11/785                3.2
uVAX 3400                 2.0
VaxStation 3100/GPX       3.3
VAX 8800                  4.7
VAX 6000                  3.0
VAX 9000/210             23.8
VAX 9000/410             28.7
VAX 7000                 31.9
DEC 3000/400            101.
DEC 3000/500            111.
C-1                       3.7
IBM 3090/E               44.5
HPUX 9000 710/720        15.0
HPUX 9000 750            20.0
HPUX 9000 735/50         38.6
Apollo DN 10K             7.2
SUN (PDSF) ???            5.3
PC Ast Bravo 486/50       4.8


DIMENSION TIMES(3)
INTEGER IMUCH
integer*4 temp
C
COMMON /ff/T,T1,T2,E1(4),J,K,L
COMMON /LUNS/ ICRD,ILPT,IKBD,ITTY
write(*,*)'numiter'
read(*,*)numiter
c numiter=10
ww=0.
do jjk=1,numiter
ITTY = 0
IKBD = 0
T=0.499975E00
T1=0.50025E00
T2=2.0E00

C
IMUCH =500
C
C ***** BEGINNING OF TIMED INTERVAL *****
DO 200 ILOOP = 1,3
I = ILOOP * IMUCH
c call timex(tt2)
TIMES(ILOOP) = cputime(dd)
C *******************************************
C
C ***** *****
C
ISAVE=I
N1=0
N2=12*I
N3=14*I
N4=345*I
N5=0
N6=210*I
N7=32*I
N8=899*I
N9=616*I
N10=0
N11=93*I
N12=0
X1=1.0E0
X2=-1.0E0
X3=-1.0E0
X4=-1.0E0
IF(N1)19,19,11
11 DO 18 I=1,N1,1
X1=(X1+X2+X3-X4)*T
X2=(X1+X2-X3+X4)*T
X4=(-X1+X2+X3+X4)*T
X3=(X1-X2+X3+X4)*T
18 CONTINUE
19 CONTINUE
CALL POUT(N1,N1,N1,X1,X2,X3,X4)
E1(1)=1.0E0
E1(2)=-1.0E0
E1(3)=-1.0E0
E1(4)=-1.0E0
IF(N2)29,29,21
21 DO 28 I=1,N2,1
E1(1)=(E1(1)+E1(2)+E1(3)-E1(4))*T
E1(2)=(E1(1)+E1(2)-E1(3)+E1(4))*T
E1(3)=(E1(1)-E1(2)+E1(3)+E1(4))*T
E1(4)=(-E1(1)+E1(2)+E1(3)+E1(4))*T
28 CONTINUE
29 CONTINUE
CALL POUT(N2,N3,N2,E1(1),E1(2),E1(3),E1(4))
IF(N3)39,39,31
31 DO 38 I=1,N3,1
38 CALL PA(E1)
39 CONTINUE
CALL POUT(N3,N2,N2,E1(1),E1(2),E1(3),E1(4))
J=1
IF(N4)49,49,41
41 DO 48 I=1,N4,1
IF(J-1)43,42,43
42 J=2
GOTO44
43 J=3
44 IF(J-2)46,46,45
45 J=0
GOTO47
46 J=1
47 IF(J-1)411,412,412
411 J=1
GOTO48
412 J=0
48 CONTINUE
49 CONTINUE
CALL POUT(N4,J,J,X1,X2,X3,X4)
J=1
K=2
L=3
IF(N6)69,69,61
61 DO 68 I=1,N6,1
J=J*(K-J)*(L-K)
K=L*K-(L-J)*K
L=(L-K)*(K+J)
E1(L-1)=J+K+L
E1(K-1)=J*K*L
68 CONTINUE
69 CONTINUE
CALL POUT(N6,J,K,E1(1),E1(2),E1(3),E1(4))
X=0.5E0
Y=0.5E0
IF(N7)79,79,71
71 DO 78 I=1,N7,1
X=T*ATAN(T2*SIN(X)*COS(X)/(COS(X+Y)+COS(X-Y)-1.0E0))
Y=T*ATAN(T2*SIN(Y)*COS(Y)/(COS(X+Y)+COS(X-Y)-1.0E0))
78 CONTINUE
79 CONTINUE
CALL POUT(N7,J,K,X,X,Y,Y)
X=1.0E0
Y=1.0E0
Z=1.0E0
IF(N8)89,89,81
81 DO 88 I=1,N8,1
88 CALL P3(X,Y,Z)
89 CONTINUE
CALL POUT(N8,J,K,X,Y,Z,Z)
J=1
K=2
L=3
E1(1)=1.0E0
E1(2)=2.0E0
E1(3)=3.0E0
IF(N9)99,99,91
91 DO 98 I=1,N9,1
98 CALL P0
99 CONTINUE
CALL POUT(N9,J,K,E1(1),E1(2),E1(3),E1(4))
J=2
K=3
IF(N10)109,109,101
101 DO 108 I=1,N10,1
J=J+K
K=J+K
J=J-K
K=K-J-J
108 CONTINUE
109 CONTINUE
CALL POUT(N10,J,K,X1,X2,X3,X4)
X=0.75E0
IF(N11)119,119,111
111 DO 118 I=1,N11,1
118 X=SQRT(EXP(LOG(X)/T1))
119 CONTINUE
CALL POUT(N11,J,K,X,X,X,X)
C
C ***** END OF TIMED INTERVAL *****
C200 TIMES(ILOOP)=SECNDS(TIMES(ILOOP))
c call timex(tt2)
200 TIMES(ILOOP)=cputime(dd)-TIMES(ILOOP)
C
C WHET. IPS = 1000/(TIME FOR 10 ITERATIONS OF PROGRAM LOOP)
WHETS = (10000.0 * FLOAT(IMUCH)/100.0)/(TIMES(3)-TIMES(2))
c WRITE (*,201) WHETS
201 FORMAT(' SPEED IS: ',1PE10.3,' THOUSAND WHETSTONE',
2 ' ONE PRECISION INSTRUCTIONS PER SECOND')
c WRITE (*,*) 'Elapsed=',INT((TIMES(3)-TIMES(1))*100),' whetd3h '
write(*,*)'whets',whets
ww=ww+whets
c call hf1(1,whets,1.)
enddo
ww=ww/numiter
write(*,*)'ww = ',ww
c call hprint(0)
c call hstore(0,20)
C
C
END
SUBROUTINE PA(E)
c DOUBLE PRECISION T,T1,T2,E
COMMON /ff/T,T1,T2
DIMENSION E(4)
J=0
1 E(1)=(E(1)+E(2)+E(3)-E(4))*T
E(2)=(E(1)+E(2)-E(3)+E(4))*T
E(3)=(E(1)-E(2)+E(3)+E(4))*T
E(4)=(-E(1)+E(2)+E(3)+E(4))/T2
J=J+1
IF(J-6)1,2,2
2 CONTINUE
RETURN
END
SUBROUTINE P0
c DOUBLE PRECISION T,T1,T2,E1
COMMON /ff/T,T1,T2,E1(4),J,K,L
E1(J)=E1(K)
E1(K)=E1(L)
E1(L)=E1(J)
RETURN
END
SUBROUTINE P3(X,Y,Z)
c DOUBLE PRECISION T,T1,T2,X1,Y1,X,Y,Z
COMMON /ff/T,T1,T2
X1=X
Y1=Y
X1=T*(X1+Y1)
Y1=T*(X1+Y1)
Z=(X1+Y1)/T2
RETURN
END
SUBROUTINE POUT(N,J,K,X1,X2,X3,X4)
C
C WRITE STATEMENT COMMENTED OUT TO IMPROVE REPEATABILITY OF TIMINGS
C
c DOUBLE PRECISION X1,X2,X3,X4
C WRITE(2,1)N,J,K,X1,X2,X3,X4
1 FORMAT(' ',3I7,4E12.4)
RETURN
END
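
The rating computed at the end of the main program above may be easier to
follow in isolation. Pass ILOOP runs every module with I = ILOOP * IMUCH,
so pass 3 does IMUCH more loop units than pass 2, and differencing the two
times cancels any fixed overhead. The sketch below restates that formula
in C with the numerator copied from the Fortran. The function names are
hypothetical, and reading "Wh.U." in the table as the units this program
prints is an assumption, not something the post states.

```c
#include <assert.h>

/* Thousands of Whetstone instructions per second, as computed by the
   Fortran statement
       WHETS = (10000.0 * FLOAT(IMUCH)/100.0)/(TIMES(3)-TIMES(2))
   imuch : loop-count step (IMUCH, 500 in the posted program)
   t2,t3 : CPU seconds taken by passes 2 and 3 */
double whets_rating(int imuch, double t2, double t3)
{
    assert(imuch > 0);
    assert(t3 > t2);            /* pass 3 does more work than pass 2 */
    return (10000.0 * (double)imuch / 100.0) / (t3 - t2);
}

/* ASSUMPTION: the table's "1.0 = 1200 Wh.U." means a VAX 11/780 scores
   1200 in the units returned by whets_rating(). */
double vax780_units(double whets)
{
    return whets / 1200.0;
}
```

For example, with IMUCH = 500 and passes 2 and 3 taking 1.0 and 2.0 CPU
seconds, the rating is 50000, which under the stated assumption would be
about 41.7 VAX 11/780 units.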

K. M. Sherif

Dec 17, 1993, 5:18:16 PM
> What is really sad is that on a Sun SPARCCenter 2000 with 8
>processors running Solaris 2.3 it takes 70 sec ( 20 sec worse than if
>the program is run on Solaris 2.2 ). These guys must be joking !!!
>
As many people have pointed out before, these timings are not
meaningful unless run in single-user mode (leaving aside
the fact that the benchmark may in fact be optimised down to just a printf).
If you want to compare the response for a user in real-life usage, then qualify
the system with the number of users who were using it and the type of programs
they were running at that time.
Also, the fact that it is an 8-processor machine is irrelevant to this
benchmark even if run in single-user mode, since you are not using a
parallelising compiler.

Sherif

Art Kendall

Dec 16, 1993, 8:40:56 AM
> Subject: Re: Yet another benchmark results..
> Comments: To: Info...@CRVAX.SRI.COM
> In <1993Dec7.0...@pacific.mps.ohio-state.edu> viz...@mps.ohio-state.edu
> (Dragon Fly) writes:
> >So comments are welcome.

"Benchmarking" can be done for different purposes.
I concur with most of the comments that have been made.
The "benchmarks" cast no light on the platforms per se.
However, IFF the multiuser environments were in a typical
state, the "benchmarks" shed some light on the
installations available to Dragon Fly.


There are other considerations in choosing an installation.

For example, how easy is the editor to use? How powerful?
How easy is it to do the skillion tasks getting ready
to do the actual run?

It is my experience, in twenty-some years of consulting
on statistical computing, that person-hours are much more
important in the costs of a project than cpu minutes.

The human part of the cost estimate is VERY difficult to
measure but represents over 90% of the cost.

In the mid-70's I benchmarked a set of installations available
to the Census Bureau.
The rank order of the systems was completely reversed when
looking at total cost vs (cpu cost, machine costs, cpu time).

Screaming IBM mainframes did the final execution of a job the
fastest, but required more person-time.
The KL-10 took more cpu minutes, but cost the least in people
time.
The KL-10 runs cost 2x what the IBM runs did in machine charges.
The IBM runs cost 3x what the KL-10 runs did in total cost.

Well-done benchmarks can be very useful.
But they are only one consideration in choosing how to do work.

Art Kendall ajk@nihcu
Sr Math Stat
US GAO
Wash., DC 20548
P.S. Does anyone have a bitnet address or phone # for Dr. Dongarra?


Georges Tomazi

Dec 13, 1993, 8:25:08 PM
In <1993Dec8.1...@pacific.mps.ohio-state.edu> viz...@pacific.mps.ohio-state.edu (Dragon Fly) writes:

Another one to add in your database:

Encore Multimax MM510 128Mb RAM (6 ns32000 processors)
running UMAX 2.4.1.P3 (Unix System V Release 3)
native C compiler with optimisation (Green Hills)
Same result with two or 100 users (thanks multiprocessors !)

time=270
real=4:31.90
user=4:26.50
sys=5.38

That's it !

Georges

--
Georges A. Tomazi / Internet: tom...@kralizec.zeta.org.au / And
Sydney * Australia / tom...@tctel.frmug.fr.net / God created
+61 2 264 6892 / tom...@smop-oz.frmug.fr.net / Unix...

Steve Thompson

Dec 18, 1993, 3:45:17 PM
In article <2ej4lk$7...@kralizec.zeta.org.au>, tom...@kralizec.zeta.org.au (Georges Tomazi) writes:
> Georges A. Tomazi / Internet: tom...@kralizec.zeta.org.au / And
> Sydney * Australia / tom...@tctel.frmug.fr.net / God created
> +61 2 264 6892 / tom...@smop-oz.frmug.fr.net / Unix...
^^^^^^^^^^^

Yeah, but he had a day off to rest, and never got around to finishing the
job. :-)

-steve

---------------------------------------------------------------------------
Steve Thompson, System Mangler Internet: thom...@cheme.cornell.edu
School of Chemical Engineering Bitnet: thompson@crnlchme
Olin Hall, Cornell University Phone: (607) 255 5573
Ithaca NY 14853 FAX: (607) 255 9166
---------------------------------------------------------------------------

Arne Vajhøj

Dec 17, 1993, 4:41:12 PM
> |> for (i=0;i<1000000;i++)
> |> {
> |> x=11.0+(33.5*i)*(33.5*i);
> |> y[i]=(sin(3.1*i)+cos(5.1*i))*sqrt(x+exp(3.14*log(x+i)));
> |> }
>
> I am a Fortran programmer and I am not very familiar with C, but
> it seems to me that there is nothing to prevent an optimizer
> from wiping away all the computations, after having recognized
> that no use is made of the results. If I were designing this
> benchmark, I would have _at least_ printed the value of a
> certain y[i] at the end, with i defined (and computed!) as a
> random number between 0 and 999999.
>
> My experience with Fortran compilers is that many
> of them happily omit to perform "unuseful" computations, and I
> design all my benchmarks accordingly. Returning the results
> as subroutine arguments to a caller (which may ignore them) is
> usually a good enough trick to ensure that computations are
> really carried out, with the current generation of compilers.

Yes - VAX FORTRAN is actually good at eliminating such "unuseful" code,
but most other compilers (VAX C, VAX PASCAL etc.) are not!

I once did some benchmarking of very simple numeric code and found
that VAX FORTRAN was much better than VAX PASCAL for that very
special kind of programming (code similar to BLAS level 1 routines).
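
The two guards suggested in the quoted text (hand the results back to a
caller, and print something derived from them) might look like this in C.
It is a hedged sketch, not code from the thread: `fill_and_sum` is a
hypothetical name, and only the polynomial part of the benchmark kernel is
used so that the example does not depend on the math library.

```c
#include <assert.h>
#include <stddef.h>

/* Fills y[0..n-1] with x = 11.0 + (33.5*i)*(33.5*i) and returns the sum
   of all elements.  Because the array is handed back to the caller and
   the sum is a return value, an optimizer cannot treat the loop as dead
   code as long as the caller actually uses one of them. */
double fill_and_sum(double *y, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(y != NULL);

    for (i = 0; i < n; i++) {
        double x = 11.0 + (33.5 * (double)i) * (33.5 * (double)i);
        y[i] = x;
        sum += x;
    }
    return sum;
}
```

The caller should then print the returned sum, or y[k] for an index k that
is only known at run time (for instance taken from the command line), after
the timed interval has ended, so that the printing itself is not measured.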

Graham Mainwaring

Dec 19, 1993, 5:22:19 PM
ca...@SOL1.GPS.CALTECH.EDU (Carl J Lydick) writes:

> Why is it that folks who put their faith in benchmarks are generally
> clueless?

Because benchmarks are generally meaningless. (Not just this one, all of
them.)

John Francis Lynn

Dec 20, 1993, 3:51:59 PM

"There are three kinds of lies: Lies, Damned Lies & Statistics"
..... Disraeli

Substituting "benchmarks" for "statistics" in the above works
just as well :-)

But seriously, haven't we compared apples and oranges long enough now
to return to something more mundane ?

John

--
| John F. Lynn <jl...@engin.umich.edu> | (X.500 service at umich.edu) |
| The University of Michigan, Ann Arbor | Aerospace Engineering & LASC |

Manuel Eduardo Correia

Dec 20, 1993, 10:18:00 AM
In article <2etb78...@abyss.West.Sun.COM> she...@salaam.West.Sun.COM (K. M. Sherif ) writes:

>> If you want to compare the response for a user in real life usage, then qualify
>> the system with the number of users who were using it and the type of programs
>> they were running at that time.

I was the only user at the time !!! The machine is being used
as a file server, but since it was Saturday almost no one was using
the network !!! The interesting thing is that the same code on a MIPS
machine with 12 users logged in (some of them with heavy processes)
executed in less than 15 sec... There must be something
wrong with the SPARCcenter, no doubt about that....



>> Also the fact that it is an 8 processor machine is irrelevant to this
>> benchmark even if run on single user mode, since you are not using a
>> parallelising compiler

I don't think so !!! The fact that there are other processors
to run the system and other users' processes should make the time it
takes to run the test more independent of the load on the machine, and
as a consequence faster under heavy load...

Jerry Huck

Dec 21, 1993, 3:14:19 PM
Dragon Fly (viz...@mps.ohio-state.edu) wrote:
: Notwithstanding possible critique from alleged
: computer specialists the insightful observer might note
: that the "benchmark" code is pretty typical for scientific
: calculations. Whatever other merits the system might have,
: if it's dragging its feet on this test it means the system
: from the point of view of consumer [insightful observer] is
: a crap. As many insightful observers probably have already
: noticed, the crap is being limited mainly to two mainstreams:
: SUN Sparcs and DECs running VMS.

: I excluded the benchmarks obtained on boxes with unknown
: specifications.
: As of today the accumulated results are:
:
: - - - - - - - - Original code - - - - - - - - - - - - - - - - - - - - - -


: #include <stdio.h>
: #include <math.h>
: #include <time.h>
: main()
: {
: double x,y[1000000];
: int i;
: time_t t;
:
: time(&t);

: for (i=0;i<1000000;i++)
: {
: x=11.0+(33.5*i)*(33.5*i);
: y[i]=(sin(3.1*i)+cos(5.1*i))*sqrt(x+exp(3.14*log(x+i)));
: }
: printf("time=%d\n",time(0)-t);
: }
Several people have commented on some of the dangers in this benchmark
and I would like to add a couple of specific comments:

1.) The computation of trig functions for large arguments is a
controversial area. An accurate computation, when you assume the
argument is exact, can be somewhat time consuming. Computing
sin(1000000*3.1) is not considered an important performance path,
and it would not be fair to compare naive implementations, which are
completely "inaccurate", with more sophisticated approaches. Check
out the IBM Journal of Research and Development (JRD) from a couple
of years back to see the nature of the problem in computing trig
functions.

It would be more representative to keep the arguments to these
functions within a couple of pi of 0. I know on SUN, HP, and IBM
machines you exercise different code when computing large argument
trig functions. Another danger is to only compute arguments very
close to "special" values. These are special cased and may
dramatically alter the run-time. For example, sin(x) = x for small x.

In your example, I'd recommend using the expression:

-5+.00001*i

This would result in the range -5..5 or ~-1.6pi..~+1.6pi. I'll leave it to
others to comment on "useful" ranges for log and exp.
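
As a quick check of that range, here is a small C sketch of the suggested
argument. The name `trig_arg` and the constant `N_ITER` are hypothetical,
and using the value as the argument of both sin and cos is my reading of
the suggestion rather than something stated outright.

```c
#include <assert.h>

#define N_ITER 1000000L   /* loop count of the original benchmark */

/* Argument proposed above: sweeps from -5 up to just under +5 as i
   runs over 0 .. N_ITER-1, i.e. roughly -1.6*pi .. +1.6*pi. */
double trig_arg(long i)
{
    assert(i >= 0 && i < N_ITER);
    return -5.0 + 0.00001 * (double)i;
}
```

Inside the loop one would then write something like sin(trig_arg(i)) in
place of sin(3.1*i), so that every call stays within a couple of pi of 0
and exercises the library's ordinary small-argument path.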

2.) Additionally, did you intend to test the compiler's position
on floating-point associativity in the statement?

x=11.0+(33.5*i)*(33.5*i);

Some compilers respect C parens, others allow optimizations that ignore them.
Assuming you want to see the execution of a fixed number of multiplies and
adds and not test the compiler's policies in another controversial area, then
I'd recommend removing code that would benefit from re-association.
Something like:
x = 11.0 + 33.5*i*i;

If you wanted to test common sub-expression elimination try something like:
x = (11.0 + 33.5*i)*33.5*i;
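
Why re-association is a policy question at all can be seen from a tiny C
sketch. It uses addition rather than the multiplications in the benchmark
because the effect is easier to show, and it assumes IEEE 754 doubles with
the default round-to-nearest mode. The function names are hypothetical.
The volatile temporaries force each intermediate to be rounded to double,
so the compiler cannot regroup the operations itself.

```c
#include <assert.h>

/* Evaluates (a + b) + c with each step rounded to double. */
double sum_left(double a, double b, double c)
{
    volatile double t = a + b;
    volatile double r = t + c;
    return r;
}

/* Evaluates a + (b + c) with each step rounded to double. */
double sum_right(double a, double b, double c)
{
    volatile double t = b + c;
    volatile double r = a + t;
    return r;
}
```

With a = 1e16, b = -1e16 and c = 1.0, sum_left gives 1.0 while sum_right
gives 0.0, because -1e16 + 1.0 rounds back to -1e16. A compiler that
regroups floating-point operations can therefore change results, which is
why some compilers refuse to do it and others make it optional.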

3.) Finally, do you want to test the cleverness of the compiler in
applying math identities in the sub-expression?:

exp(3.14*log(x+i))

If you only want to see calls to exp and log AND you don't want to reward
compilers that notice exp(x*log(y)) = pow(y,x) [I hope I did my math right],
then you should change that expression. Since the Whetstone program
had some statements like the above, some compilers are well versed in
this area. The POW function can be somewhat faster than the 2 calls to
exp and log. You also push on the compiler's policies on performance vs.
programmer faithfulness. Math library implementations often rely on
expressions like:
(x + y) - x
not being optimized.
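
One well-known use of that pattern is rounding to an integer by adding and
subtracting a large constant. The C sketch below is an illustration under
stated assumptions (IEEE 754 doubles, default round-to-nearest-even mode),
not code from any particular math library, and `round_via_add` is a
hypothetical name. If a compiler simplified (x + y) - x to y, the function
would silently stop rounding.

```c
#include <assert.h>

/* Rounds y to the nearest integer (ties to even) for 0 <= y < 2^51,
   using (x + y) - x with x = 2^52.  At that magnitude doubles are
   spaced exactly 1 apart, so the addition itself does the rounding. */
double round_via_add(double y)
{
    const double x = 4503599627370496.0;   /* 2^52 */
    volatile double t;                      /* forces rounding to double */

    assert(y >= 0.0 && y < 2251799813685248.0);   /* 2^51 */

    t = x + y;
    return t - x;
}
```

For instance round_via_add(2.7) yields 3.0 and round_via_add(2.5) yields
2.0, whereas the algebraically "equivalent" y would come back unchanged.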

Jerry
