How to calculate MIPS from SPEC benchmark?

Josephine Chang

Apr 29, 1994, 7:30:13 PM

Hi,

I found recently people use SPEC benchmarks to measure performance of
processors. Can anybody tell me how to transform SPEC to MIPS? Thanks.

Josephine Chang
ch...@pacific.usc.edu

John R. Mashey

Apr 29, 1994, 8:47:05 PM

In article <2ps5a5$3...@pacific.usc.edu>, ch...@pacific.usc.edu (Josephine Chang) writes:
|> Hi,
|>
|> I found recently people use SPEC benchmarks to measure performance of
|> processors. Can anybody tell me how to transform SPEC to MIPS? Thanks.

This is a meaningless question, unfortunately:
a) If you mean marketing-mips, there's no correlation.
b) If you mean VAX-mips ... well, SPECint is VAX-relative, so of course
it correlates.
c) If you mean internal peak mips, there's a lot of variance.

ADVICE: don't ever use the unadorned term "mips" without saying what you
mean: it's really useless. SPEC was done to have something at least
a little better than mips-ratings.

-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: ma...@sgi.com
DDD: 415-390-3090 FAX: 415-967-8496
USPS: Silicon Graphics 6L-005, 2011 N. Shoreline Blvd, Mountain View, CA 94039-7311

David1008

May 2, 1994, 1:22:02 PM

>In article <2ps5a5$3...@pacific.usc.edu>, ch...@pacific.usc.edu (Josephine Chang) writes:

>|> I found recently people use SPEC benchmarks to measure performance of
>|> processors. Can anybody tell me how to transform SPEC to MIPS? Thanks.

And in article <Cp1rI...@odin.corp.sgi.com>, ma...@mash.engr.sgi.com (John R. Mashey) replies:

>This is a meaningless question, unfortunately:
>a) If you mean marketing-mips, there's no correlation.
>b) If you mean VAX-mips ... well, SPECint is VAX-relative, so of course
>it correlates.
>c) If you mean internal peak mips, there's a lot of variance.

>ADVICE: don't ever use the unadorned term "mips" without saying what you
>mean: it's really useless. SPEC was done to have something at least
>a little better than mips-ratings.

Pardon my monologue, but I am tired of the old refrain that MIPs mean nothing
(admittedly this is not quite what John is saying but it creates the
opportunity...)

I think that Josephine (please correct me if I am wrong) is looking for a map
from SpecInts to MIPs (which stands for Millions of Instructions Per second and
ought to mean that without elaboration!) for the more common processors in
common configurations. This is an empirical set of numbers and quite real. In
addition, it is quite useful to implementors, who often are more interested in
the number of instructions of a particular machine configuration they can
expect per microsecond than in some general purpose weighted benchmark. The
benchmark is useful to compare chips, but the MIPs number is useful when you
have an existing application on an existing configuration, say a 25 MHz 960/CF
with so much cache, and so on, and you want to know the effect of using a
similar processor, say a PowerPC or a 960/HA, in the next generation of a
product.

Bryan O'Sullivan

May 2, 1994, 3:40:05 PM

davi...@aol.com (David1008) writes:

JRM> ADVICE: don't ever use the unadorned term "mips" without saying
JRM> what you mean: it's really useless. SPEC was done to have
JRM> something at least a little better than mips-ratings.

David> I think that Josephine (please correct me if I am wrong) is
David> looking for a map from SpecInts to MIPs (which stands for
David> Millions of Instructions Per second and ought to mean that
David> without elaboration!) for the more common processors in common
David> configurations. This is an empirical set of numbers and quite
David> real.

Forgive me if I am missing something in what you say, but you appear
to favour precisely the meaningless indication against which John
warns. Millions of *what* instructions, what mix of different types?
On modern RISCs, `most' instructions take about a cycle to execute.
But what about FP instructions, divides, branches, and so on, and the
frequencies with which they may occur? A single number offers no help
when you start factoring such things in.
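The dependence on instruction mix can be sketched numerically. The following is a minimal illustration only: the cycle counts, the 50 MHz clock, and both mixes are made-up assumptions for an imaginary CPU, not measurements of any real part. It computes a native instruction rate as clock rate divided by the mix-weighted average cycles per instruction.

```python
# Hypothetical cycle counts per instruction class for an imaginary CPU.
# These are illustrative assumptions, not data for any real processor.
CYCLES = {"alu": 1, "load_store": 2, "branch": 2, "fp_mul": 5, "fp_div": 20}


def native_mips(clock_mhz, mix, cycles=CYCLES):
    """Native MIPS = clock rate (MHz) / average cycles per instruction."""
    assert abs(sum(mix.values()) - 1.0) < 1e-9, "mix fractions must sum to 1"
    # Average CPI is the sum of each class's cycle count weighted by its share.
    cpi = sum(fraction * cycles[kind] for kind, fraction in mix.items())
    return clock_mhz / cpi


# Two hypothetical workloads running on the same 50 MHz part.
integer_mix = {"alu": 0.60, "load_store": 0.25, "branch": 0.15}
fp_mix = {"alu": 0.30, "load_store": 0.30, "branch": 0.10,
          "fp_mul": 0.25, "fp_div": 0.05}

print(f"integer-heavy mix: {native_mips(50, integer_mix):.1f} MIPS")
print(f"FP-heavy mix:      {native_mips(50, fp_mix):.1f} MIPS")
```

Under these assumed numbers the same chip at the same clock delivers less than half the instruction rate on the FP-heavy mix, so a single quoted figure says little unless the mix behind it is known.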

David> In addition, it is quite useful to implementors who often are
David> more interested in the number of instructions of a particular
David> machine configuration they can expect per microsecond than some
David> general purpose weighted benchmark.

I rather doubt this imputation of utility to be correct.

Unless you know on what code mix the quoted MIPS figures were given
and how closely your own code compares to that (and it would want to
be *very* close for any useful data to be pickable out), you may as
well be staring at chicken entrails.

David> The benchmark is useful to compare chips but the MIPs number is
David> useful when you have an existing application on an existing
David> configuration, say a 25 MHz 960/CF with so much cache, and so on,
David> and you want to know the effect of using a similar processor,
David> say a PowerPC or a 960/HA in the next generation of a product.

Certainly not. With SPECint and SPECfp, I have at least *some* vague
notion of whether my application which I know to be, say, FP-intensive
after the manner of spice, might move well to a given processor, be it
in the same family or not.

But with MIPS? Consider the following scenario (admittedly silly):
the Foo-1 and Foo-2 are members of the same processor family. Foo
Processor Inc gives me a MIPS rating for each processor, based on an
unrolled cache-friendly NOP loop (as they often were before people
more or less stopped paying attention to MIPS numbers). My Foo-1 code
uses lots of floating point, but the MIPS numbers for both processors
are about the same. So, based on the MIPS ratings and my cursory
reading of the manufacturer's promotional material, I don't switch to
the more expensive Foo-2 (`ach, they're both the same MIPS, right?'),
even though it has faster FP multiply and divide.

You can't count on many instruction timings to stay constant across
iterations of the *same* processor family, and if you factor in a
different ISA too (cf. your example of i960->PPC), you might as well
forget about extracting meaningful data from MIPS numbers.

<b
--
Bryan O'Sullivan Will herd cats for food. BOTW: `The Crow Road'
Computer Science Department Email: bosu...@maths.tcd.ie, bosu...@tcd.ie
University of Dublin Web: http://www.scrg.cs.tcd.ie/scrg/u/bos.html

John R. Mashey

May 3, 1994, 1:04:12 PM

You may be tired of the old refrain that MIPS mean nothing ... but that
doesn't stop it from being true, as illustrated by what you have
just asked for.

If MIPS is an empirical set of numbers and quite real, please state exactly
how they are measured, and what those numbers are for any interesting set
of CPUs, for example, in such a way that you can predict the relative
performance of a PowerPC or 960/HA compared to a 960CF.

(It is easy to compute peak MIPS, i.e., highest issue rate x clock rate;
unfortunately, this has little to do with performance, as can be seen
from machines that show similar performance on some application but
have wildly varying peak MIPS. Alternatively, it is trivial to
have 2 CPUs whose peak MIPS are identical ... but whose performance
on real codes varies a lot. Example: R4000PC (primary cache) and
R4000SC (secondary cache) have identical peak MIPS, but typical SC
configurations have close to twice the performance (as seen on
SPECint92, SPECfp92). Within an architecture+implementation, performance
does tend to correlate somewhat with clock rate ... but the memory system
counts, yet doesn't affect peak MIPS.)
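A rough sketch of the peak-MIPS point, under stated assumptions: the two configurations below are hypothetical (they are not R4000PC/R4000SC figures), and memory stalls are modelled crudely as a fixed number of extra cycles per instruction. Peak MIPS is taken as issue width times clock rate, as described above.

```python
def peak_mips(clock_mhz, issue_width):
    """Peak MIPS = highest issue rate (instructions per cycle) x clock rate (MHz)."""
    return clock_mhz * issue_width


def sustained_mips(clock_mhz, issue_width, stall_cycles_per_instr):
    """Instruction rate once memory stall cycles are added to the ideal CPI."""
    cpi = 1.0 / issue_width + stall_cycles_per_instr
    return clock_mhz / cpi


# Hypothetical: same core and clock, different memory systems.
small_cache = {"clock_mhz": 100, "issue_width": 1, "stall_cycles_per_instr": 1.0}
big_cache = {"clock_mhz": 100, "issue_width": 1, "stall_cycles_per_instr": 0.1}

for name, cfg in (("small cache", small_cache), ("big cache", big_cache)):
    peak = peak_mips(cfg["clock_mhz"], cfg["issue_width"])
    real = sustained_mips(**cfg)
    print(f"{name}: peak {peak:.0f} MIPS, sustained {real:.1f} MIPS")
```

Both assumed configurations report the same peak figure, while the sustained rate differs by nearly a factor of two purely because of the memory system term, which the peak calculation never sees.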

Anyway, *please* supply some more information in support of the contention that
there is a useful MIPS rating that tells you much about relative
performance across architectures. If this is true, it will be
very interesting and exciting, as finding the single number that predicts
such performance differences has been the Holy Grail for a long time.

Andrew Harrison SUNUK Consultancy

May 4, 1994, 12:57:15 PM

In order to calculate the MIPS rating of a system from the SPECint performance
you must multiply SPECint by 2ish. Actually, this is reasonably accurate for the
SPARC10 and SPARC20, give or take 20%.

Before you all flame me I am joking.

Since by MIPS many people mean Dhrystone MIPS, I am not so sure that relating
MIPS to SPECint is at all valid.

Dhrystone is very susceptible to good compilers/naughty compilers, and it can
also be made to fit into comparatively small cache sizes, which most of the
SPECint benchmarks and real applications cannot.

The table below indicates why MIPS and SPECint do not correlate.

                Cache       SPECint   MIPS   Ratio MIPS/SPECint
SPARCstation2   64K I+D     22        30     1.36
SPARCclassic    2K I 4K D   26.3      60     2.28

The SPARCclassic has a much faster memory bus than the SPARCstation2, but the
MIPS rating is skewed by good compilers, which fit the benchmark into the small
cache. SPECint is a much better measure of the real performance ratio of the
SPARCclassic to the SPARCstation2.
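The arithmetic behind the table can be reproduced as follows. This is only a sketch; the SPECint and MIPS figures are copied from the table in this post, and the function and variable names are illustrative.

```python
# (SPECint, quoted MIPS) as given in the table in this post.
systems = {
    "SPARCstation2": (22.0, 30.0),
    "SPARCclassic": (26.3, 60.0),
}


def mips_per_specint(specint, mips):
    """Ratio of the quoted MIPS rating to the SPECint score."""
    return mips / specint


for name, (specint, mips) in systems.items():
    print(f"{name:14s} MIPS/SPECint = {mips_per_specint(specint, mips):.2f}")

# Compare what each metric claims about the relative speed of the two machines.
ss2, classic = systems["SPARCstation2"], systems["SPARCclassic"]
specint_speedup = classic[0] / ss2[0]
mips_speedup = classic[1] / ss2[1]
print(f"SPECint says {specint_speedup:.2f}x, MIPS says {mips_speedup:.2f}x")
```

If MIPS and SPECint tracked each other, the two ratios would be roughly equal. Here the MIPS ratings claim a 2x difference between the machines while SPECint puts it at about 1.2x.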

Equating MIPS to SPEC is probably as valid as measuring your system's performance
by its clock rate or how fast it reboots after a panic.

Andrew Harrison

Ping-Shun Huang

May 7, 1994, 2:21:26 AM

In article <2q3crq$9...@search01.news.aol.com> davi...@aol.com (David1008) writes:

> In addition, it is quite useful to implementors who often are more
> interested in the number of instructions of a particular machine
> configuration they can expect per microsecond than some general
> purpose weighted benchmark. The benchmark is useful to compare
> chips but the MIPs number is useful when you have an existing
> application on an existing configuration, say a 25 MHz 960/CF with so
> much cache, and so on, and you want to know the effect of using a
> similar processor, say a PowerPC or a 960/HA in the next generation
> of a product. [....]

You seem to think that MIPS get measured on "real-world" systems
(counting in memory subsystems, disk subsystems, etc.) while SPECmarks
get measured on just the processor, when in fact exactly the opposite
has quite often been true. I remember lots of meaningless uses of MIPS
where people claimed "CPU A is 7.98334 times better than CPU B because
that's the ratio of their MIPS ratings", but such misuse of ratings
has decreased, in my opinion, since MIPS were replaced. That's not to
say that marketers don't bandy around SPECmark numbers; of course they
still do. But at least SPECmarks are a little bit closer
to most people's "reality" than MIPS, and there is a much stronger
tendency for SPECmarks to be attached to systems, not CPUs.

My use of computing resources may very well be atypical (and thus not
perfectly match the mix represented by SPECmarks), but I would venture
to guess that Machine A rated at 100 SPECmarks will feel noticeably
faster to me than Machine B rated at 50 SPECmarks. You seem to be
stating that I'd find MIPS more useful, but even if I knew that my use
of computing resources broke down exactly into 99% NOPs {grin} and 1%
ADD instructions, most MIPS figures didn't tell you what instruction
or instruction mix they were measuring anyway. Knowing that Machine A
is rated at 17000 MIPS and Machine B at 50000 MIPS tells me virtually
nothing, even with my knowledge about my usage.

--
Ping Huang (INTERNET: psh...@mit.edu), probably speaking for himself

Andre Yew

May 7, 1994, 1:56:46 PM

Another problem with MIPS, as pointed out by Nick Tredenick (not sure
of the spelling), one of the original 68000 designers and 360 people, is that
it depends very heavily on your compiler. If you look at the number of
instructions produced starting at the VAX until today, you'll realize that
trying to scale anything to VAX MIPS is completely meaningless since compilers
today can squeeze Dhrystones into as little as 1/3 the number of instructions
on a RISC chip compared to the VAX. On top of that, people get even more
inflated numbers by inlining Dhrystone completely (many compilers today can
do intermodule inlining and Dhrystone is too small to suffer from it),
preprofiling it, so comparisons are never done, and completely aligning
string copies (so on things like 68040's which have 16-byte memory moves,
they win big), among many other egregious compiler optimizations. BTW, for
those that don't know, Dhrystone is used as a linear measure of MIPS, usually
defining the VAX Dhrystone number as 1 MIPS, and scaling everyone else to that.
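The scaling described above can be sketched as follows. Two things here are assumptions rather than anything stated in this post: the reference score of 1757 Dhrystones per second is the commonly quoted VAX 11/780 figure, and the machine and instruction counts are invented purely to illustrate the compiler effect.

```python
# Commonly quoted VAX 11/780 Dhrystone score, conventionally defined as 1 MIPS.
VAX_11_780_DHRYSTONES_PER_SEC = 1757


def dhrystone_mips(dhrystones_per_sec):
    """Linear scaling of a Dhrystone score against the VAX reference."""
    return dhrystones_per_sec / VAX_11_780_DHRYSTONES_PER_SEC


def dhrystones_per_sec(native_instr_per_sec, instr_per_iteration):
    """Iterations per second given how many instructions one iteration costs."""
    return native_instr_per_sec / instr_per_iteration


# Hypothetical machine executing 40 million native instructions per second.
NATIVE_RATE = 40_000_000

# Hypothetical compilers: the second emits one third as many instructions
# per Dhrystone iteration as the first.
plain = dhrystone_mips(dhrystones_per_sec(NATIVE_RATE, 600))
aggressive = dhrystone_mips(dhrystones_per_sec(NATIVE_RATE, 200))

print(f"plain compiler:      {plain:.1f} Dhrystone MIPS")
print(f"aggressive compiler: {aggressive:.1f} Dhrystone MIPS")
```

The hardware is identical in both cases; with these assumed figures only the compiler changed, yet the reported rating triples.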

SPEC is better, but it gets tricky when an embedded chip company
insists on using it to benchmark your compiler on their board with a serial
port. God knows what they were on when they decided that.

--Andre

--
PGP public key available