SPEC2k results for G4

Anton Ertl

Feb 26, 2002, 4:47:34 AM

c't 5/2002, pages 182-183 publishes SPEC CPU2000 results for various
Apple G4s and a Pentium III/1000 (all with 512MB RAM in single-user
mode). Here they are:

          C compiler  SPECint_base  SPECfp_base
G4  800   gcc         242           147
G4  867   gcc         259           153
G4 1000   gcc         306           187
P3 1000   gcc         309
P3 1000   MSC         236

The Metrowerks compiler produced slightly worse results than gcc. The
gcc version used was gcc-2.95.[23] on all machines. The compiler used
for the Fortran benchmarks was Absoft Pro Fortran 7.0.

The article lists the individual results for SPECint_base; the G4 1GHz
is significantly faster than the P3 1GHz on 176.gcc (444 vs. 286) and
300.twolf (522 vs. 340), and significantly slower on 186.crafty (334
vs. 444).
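
For a quick sense of scale, the three per-benchmark differences quoted above can be expressed as ratios. This is only an illustrative Python sketch added here, not something from the article; the dictionary and function names are made up for the example, and the numbers are the ones quoted in this post.

```python
# SPECint_base subtest scores quoted in the post: (G4 1 GHz, P3 1 GHz).
scores = {
    "176.gcc": (444, 286),
    "300.twolf": (522, 340),
    "186.crafty": (334, 444),
}

def g4_over_p3(table):
    """Return the G4/P3 score ratio per benchmark (above 1 favors the G4)."""
    return {name: g4 / p3 for name, (g4, p3) in table.items()}

ratios = g4_over_p3(scores)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

On these figures the G4 is roughly one and a half times as fast on 176.gcc and 300.twolf, and runs at about three quarters of the P3's score on 186.crafty.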

- anton
--
M. Anton Ertl Some things have to be seen to be believed
an...@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html

Norbert Juffa

Feb 26, 2002, 1:00:22 PM

"Anton Ertl" <an...@mips.complang.tuwien.ac.at> wrote in message
news:a5fljm$tg$1...@news.tuwien.ac.at...

> c't 5/2002, pages 182-183 publishes SPEC CPU2000 results for various
> Apple G4s and a Pentium III/1000 (all with 512MB RAM in single-user
> mode). Here they are:
>
> C compiler SPECint_base SPECfp_base
> G4 800 gcc 242 147
> G4 867 gcc 259 153
> G4 1000 gcc 306 187
> P3 1000 gcc 309
> P3 1000 MSC 236
>
> The Metroworks compiler produced slightly worse results than gcc. The
> gcc version used was gcc-2.95.[23] on all machines. The compiler used
> for the Fortran benchmarks was Absoft Pro Fortran 7.0.

I read the German article pointed to by Julian and one thing that is pointed
out there that wasn't clear to me is that gcc 2.95.3 for x86 Linux is from
1999. So the comparison seems to be between a 2 year old x86 CPU
using a 3 year old compiler versus the latest and greatest from Apple? Are
there any better compilers available for the Apple machine than the gcc
used by c't? Maybe the IBM or Motorola compilers would be of help here?

Does anybody know whether Apple is planning to make an official SPEC
submission? There has got to be a reason SPEC2K was made compatible with
the Macintosh platform (I think this did not happen until version 1.2?).

-- Norbert


Nick Maclaren

Feb 26, 2002, 1:21:35 PM

In article <W2Qe8.18232$ZC3.1...@newsread2.prod.itd.earthlink.net>, "Norbert Juffa" <ju...@earthlink.net> writes:
|> "Anton Ertl" <an...@mips.complang.tuwien.ac.at> wrote in message
|> news:a5fljm$tg$1...@news.tuwien.ac.at...
|> > c't 5/2002, pages 182-183 publishes SPEC CPU2000 results for various
|> > Apple G4s and a Pentium III/1000 (all with 512MB RAM in single-user
|> > mode). Here they are:
|> >
|> > C compiler SPECint_base SPECfp_base
|> > G4 800 gcc 242 147
|> > G4 867 gcc 259 153
|> > G4 1000 gcc 306 187
|> > P3 1000 gcc 309
|> > P3 1000 MSC 236
|> >
|> > The Metroworks compiler produced slightly worse results than gcc. The
|> > gcc version used was gcc-2.95.[23] on all machines. The compiler used
|> > for the Fortran benchmarks was Absoft Pro Fortran 7.0.
|>
|> I read the German article pointed to by Julian and one thing that is pointed
|> out there that wasn't clear to me is that gcc 2.95.3 for x86 Linux is from
|> 1999. So the comparison seems to be between a 2 year old x86 CPU
|> using a 3 year old compiler versus the latest and greatest from Apple? Are
|> there any better compilers available for the Apple machine than the gcc
|> used by c't? Maybe the IBM or Motorola compilers would be of help here?

Not really. Intel's own Spec submission for the 1 GHz Pentium III
states the hardware availability as August 1st 2001, and it was and
is pretty comparable with a 2 GHz Pentium 4 on GUI codes (which are
very much Apple's target). The same generation of compiler was used
for both systems, and nobody should describe gcc 2.95 as the latest
and greatest on any platform. The comparison of the hardware speeds
is fair.

The fact that Intel's compiler delivers 402/451 versus the 309 of
gcc tells us more about the effort Intel puts into tuning its
compiler for SPEC than anything else. Yes, a comparison with an
equivalent compiler for the G4 would be interesting.

I read that as indicating that the G4 is comparable with Intel's
chips for GUI and similar work, and singularly useless for HPC :-)


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email: nm...@cam.ac.uk
Tel.: +44 1223 334761 Fax: +44 1223 334679

JD

Feb 26, 2002, 1:38:09 PM

"Nick Maclaren" <nm...@cus.cam.ac.uk> wrote in message news:a5gjnf$1mo$1...@pegasus.csx.cam.ac.uk...

>
> The fact that Intel's compiler delivers 402/451 versus the 309 of
> gcc tells us more about the effort Intel puts into tuning its
> compiler for SPEC than anything else. Yes, a comparison with an
> equivalent compiler for the G4 would be interesting.
>
From what I have seen (and I use it a lot), the Intel C compiler isn't
just good at SPEC, but is good in general. GCC has a long way to
go before it can extract the performance that the ICC compiler
can.

John

Peter Boyle

Feb 26, 2002, 1:56:31 PM

On 26 Feb 2002, Nick Maclaren wrote:


> I read that as indicating that the G4 is comparable with Intel's
> chips for GUI and similar work, and singularly useless for HPC :-)

The G3 had particularly evil issue rules for double precision fp.
Anyone know the G4 well enough to comment?
Peter

>
> Regards,
> Nick Maclaren,
> University of Cambridge Computing Service,
> New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
> Email: nm...@cam.ac.uk
> Tel.: +44 1223 334761 Fax: +44 1223 334679
>

Peter Boyle pbo...@physics.gla.ac.uk

Skipper Smith

Feb 26, 2002, 3:05:20 PM

Peter Boyle <pbo...@holyrood.ed.ac.uk> wrote:
>
>On 26 Feb 2002, Nick Maclaren wrote:
>> I read that as indicating that the G4 is comparable with Intel's
>> chips for GUI and similar work, and singularly useless for HPC :-)
>
>The G3 had particularly evil issue rules for double precision fp.
>Anyone know the G4 well enough to comment?

Wow, evil is an interesting value judgement to place on a design that
optimized for die size and integer performance over FP performance because
of its target market. It issued one double-FP every two clocks ONLY in
the case of a double-FP multiply because it had only a single-precision
multiplier. If you weren't doing a multiply, then it could do one every
clock (actually, not quite that high, worked out to 5 every 7 clocks due
to some other pressures- I published a paper for '97's Design Super Con to
elaborate on this issue which you can find on Moto's web site).

Yes, the G4 does much better because it expanded on most of the issues
that caused the FP unit pressures- dp multiplier and more renames. There
is still an issue related to the distance of the pipeline from the
dispatcher, though, IIRC.

--
Skipper Smith Helpful Knowledge Consulting
Worldwide Microprocessor Architecture Training
PowerPC, ColdFire, 68K, CPU32 Hardware and Software

/* Remove no-spam. from the reply address to send mail directly */

Anton Ertl

Feb 26, 2002, 3:31:33 PM

In article <W2Qe8.18232$ZC3.1...@newsread2.prod.itd.earthlink.net>,
"Norbert Juffa" <ju...@earthlink.net> writes:
>"Anton Ertl" <an...@mips.complang.tuwien.ac.at> wrote in message
>news:a5fljm$tg$1...@news.tuwien.ac.at...
>> c't 5/2002, pages 182-183 publishes SPEC CPU2000 results for various
>> Apple G4s and a Pentium III/1000 (all with 512MB RAM in single-user
>> mode). Here they are:
>>
>> C compiler SPECint_base SPECfp_base
>> G4 800 gcc 242 147
>> G4 867 gcc 259 153
>> G4 1000 gcc 306 187
>> P3 1000 gcc 309
>> P3 1000 MSC 236
>>
>> The Metroworks compiler produced slightly worse results than gcc. The
>> gcc version used was gcc-2.95.[23] on all machines. The compiler used
>> for the Fortran benchmarks was Absoft Pro Fortran 7.0.
>
>I read the German article pointed to by Julian and one thing that is pointed
>out there that wasn't clear to me is that gcc 2.95.3 for x86 Linux is from
>1999. So the comparison seems to be between a 2 year old x86 CPU
>using a 3 year old compiler versus the latest and greatest from Apple?

They used gcc-2.95.2 for the G4. Of course you are right wrt the
CPUs.

As for gcc-2.95 vs. gcc-3.0, you can find SPEC95 results at

http://www.complang.tuwien.ac.at/anton/spec95/athlon-800/CINT95.041.asc
http://www.complang.tuwien.ac.at/anton/spec95/athlon-800/gcc-3.0/CINT95.051.asc

26.8 for gcc-2.95.1 and 27.5 for gcc-3.0 on the same machine
(Athlon-800).

> Are
>there any better compilers available for the Apple machine than the gcc
>used by c't?

Apparently not.

> Maybe the IBM or Motorola compilers would be of help here?

According to the articles, Metrowerks is owned by Motorola. What
happened to MrC (Apple's compiler)?

I found it interesting that gcc outperformed both proprietary
compilers in this comparison.

Peter Boyle

Feb 26, 2002, 3:55:50 PM

On 26 Feb 2002, Skipper Smith wrote:

Hi,

> Wow, evil is an interesting value judgement to place on a design that
> optimized for die size and integer performance over FP performance because
> of its' target market. It issued one double-FP every two clocks ONLY in

;)
Perhaps using an embedded optimized part in a desktop is what is evil.

Anyway, I'll resist the temptation to religiously chant that single cycle
throughput is the only clean issue rate, and instead say that while it is at
least easy to write a scheduler for pure multiplies, the 5 of 7 is pretty
hard to schedule for, and then remember you have the multiplies, adds
(and fmadds) all in the same pipeline.

It's fairly hard to know what optimal performance is for a given mix, let
alone how to get it! Seriously, would designing a fully pipelined d.p.
fpu have added that much fractional chip area and power?

> Yes, the G4 does much better because it expanded on most of the issues
> that caused the FP unit pressures- dp multiplier and more renames. There

Glad to hear it!

> is still an issue related to the distance of the pipeline from the
> dispatcher, though, IIRC.

I'm curious - what is it?

Thanks,

Peter

Norbert Juffa

Feb 26, 2002, 5:16:00 PM

"JD" <dy...@jdyson.com> wrote in message news:<3tQe8.1123$K%4.27...@news1.iquest.net>...

I would second that observation. I started using Intel C for personal
use at version 2.4 and kept upgrading until version 4.5. From my
experience, the advantage the Intel C compiler has over MSVC6 is that
it has very powerful high-level optimizations missing in MSVC6. To
name just two: cross-file inlining, and feedback directed optimizations.
It also is able to take advantage of instruction set extensions introduced
after Pentium (MMX, conditional moves), which helps but usually not
massively.

In fact, when compiling with the Intel compiler and leaving out all
optimizations that MSVC6 does not do, the performance is very similar,
meaning MSVC and Intel C apply the remaining set of optimizations equally
well. Looking at the generated code, I found that MSVC6 is often more
"x86 aware" than the Intel 4.5 code generator (i.e. compiler back end).
This might have changed with new compilers (5.0 and later) which I do
not have at my disposal.

As best I understand, gcc is taking steps to implement some of the techniques
used in the Intel compiler such as feedback directed optimization.

For Intel C 4.5 vs MSVC6/SP4 I have found that the Intel compiler is
about 20%-25% faster across many codes. Rarely does it do worse than
MSVC6.

-- Norbert

Norbert Juffa

Feb 26, 2002, 5:39:47 PM

nm...@cus.cam.ac.uk (Nick Maclaren) wrote in message news:<a5gjnf$1mo$1...@pegasus.csx.cam.ac.uk>...

While using same frequency parts and same version compilers seems most
fair, it does seem to have little practical relevance (IMHO). All these
results show is that architectural performance of a 1 GHz MPC7455 is
comparable to a 1 GHz PIII (at least for integer work).

What I would like to know is how do the "best" systems available today
compare. I can now buy a 1.66 GHz Athlon or a 2.2 GHz P4, and more
modern compilers are available for the x86 platform. The SPEC results
for those x86 machines with the latest compilers I can get from the SPEC
site. But it wouldn't be fair to compare these results to the G4 results
above if there are compilers available to the end user that deliver much
higher performance than gcc 2.95.2. Thus my questions about the
availability of other compilers. Maybe no such compilers are available and gcc
is the best one can get? If anybody has insight into this please let us
know.


> I read that as indicating that the G4 is comparable with Intel's
> chips for GUI and similar work, and singularly useless for HPC :-)

Funny, my interpretation is different: The MPC7455 seems comparable "in
GUI work" to an Intel part that is outdated, but it is not even close to
an Athlon or P4 one might find in a comparably priced PC available today.

As for the floating-point performance, I wonder how much of that is the
compiler and how much is due to the FPU in the G4 or the memory subsystem?
SPECfp loves memory bandwidth and I think the Macintosh only uses PC133
SDRAM (versus DDR in Athlon systems and RDRAM in high-end P4 systems).
Absoft Fortran has OK performance on x86 as per www.polyhedron.co.uk, so
I have to assume that their PPC compiler isn't shabby either. But one can
never be sure.


-- Norbert

Tony Nelson

Feb 26, 2002, 6:11:38 PM

In article <W2Qe8.18232$ZC3.1...@newsread2.prod.itd.earthlink.net>,
"Norbert Juffa" <ju...@earthlink.net> wrote:

> ...Are there any better compilers available for the Apple machine
> than the gcc used by c't? Maybe the IBM or Motorola compilers would
> be of help here?

Well, there's MrC (and MrCpp) from Apple, used for the MacOS, but
unavailable for *nix.


In article <a5grb5$9lr$1...@news.tuwien.ac.at>,
an...@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

> ...What happened to MrC (Apple's compiler)?

It isn't compatible with the Nexties' world view, and has been Steved.
____________________________________________________________________
TonyN.:' tony...@shore.net
'

Nick Maclaren

Feb 26, 2002, 6:34:33 PM

In article <cf4fab37.02022...@posting.google.com>,

Norbert Juffa <ju...@acm.org> wrote:
>
>While using same frequency parts and same version compilers seems most
>fair, it does seem to have little practical relevance (IMHO). All these
>results show is that architectural performance of a 1 GHz MPC7455 is
>comparable to a 1 GHz PIII (at least for integer work).

I would agree with the latter, but not the former! The former
remark also applies to the choice of language - it isn't uncommon
for different languages to be much better for different systems.

>What I would like to know is how do the "best" systems available today
>compare. I can now buy a 1.66 GHz Athlon or a 2.2 GHz P4, and more
>modern compilers are available for the x86 platform. The SPEC results
>for those x86 machines with the latest compilers I can get from the SPEC
>site. But it wouldn't be fair to compare these results to the G4 results
>above if there are compilers available to the end user that deliver much
>higher performance than gcc 2.95.2. Thus my questions about the availa-
>bility of other compilers. Maybe no such compilers are available and gcc
>is the best one can get? If anybody has insight into this please let us
>know.

That is a reasonable question, but it is critical to say what your
criteria are for "best". The criteria for an HPC system are VERY
different from the criteria for a GUI handheld or even PC.

>> I read that as indicating that the G4 is comparable with Intel's
>> chips for GUI and similar work, and singularly useless for HPC :-)
>
>Funny, my interpretation is different: The MPC7455 seems comparable "in
>GUI work" to an Intel part that is outdated, but it is not even close to
>an Athlon or P4 one might find in a comparably priced PC available today.

Er, did you see the results of investigations by various electronic
journals etc.? The Pentium 4 is pretty dire at such use, and I did
really mean that it is no better than the Pentium III despite its
high clock speed and Spec results.

>As for the floating-point performance, I wonder how much of that is the
>compiler and how much is due to the FPU in the G4 or the memory subsystem?
>SPECfp loves memory bandwidth and I think the Macintosh only uses PC133
>SDRAM (versus DDR in Athlon systems and RDRAM in high-end P4 systems).
>Absoft Fortran has OK performance on x86 as per www.polyhedron.co.uk, so
>I have to assume that their PPC compiler isn't shabby either. But one can
>ever be sure.

I don't know, but it is a good question, and very relevant to the
viability of certain research projects.

David T. Wang

Feb 26, 2002, 6:59:42 PM

Nick Maclaren <nm...@cus.cam.ac.uk> wrote:

> In article <W2Qe8.18232$ZC3.1...@newsread2.prod.itd.earthlink.net>, "Norbert Juffa" <ju...@earthlink.net> writes:

> |> "Anton Ertl" <an...@mips.complang.tuwien.ac.at> wrote:

> |> > c't 5/2002, pages 182-183 publishes SPEC CPU2000 results for various
> |> > Apple G4s and a Pentium III/1000 (all with 512MB RAM in single-user
> |> > mode). Here they are:

> |> > C compiler SPECint_base SPECfp_base
> |> > G4 800 gcc 242 147
> |> > G4 867 gcc 259 153
> |> > G4 1000 gcc 306 187
> |> > P3 1000 gcc 309
> |> > P3 1000 MSC 236

> |> > The Metroworks compiler produced slightly worse results than gcc. The
> |> > gcc version used was gcc-2.95.[23] on all machines. The compiler used
> |> > for the Fortran benchmarks was Absoft Pro Fortran 7.0.

> |> out there that wasn't clear to me is that gcc 2.95.3 for x86 Linux is from
> |> 1999. So the comparison seems to be between a 2 year old x86 CPU
> |> using a 3 year old compiler versus the latest and greatest from Apple? Are
> |> there any better compilers available for the Apple machine than the gcc
> |> used by c't? Maybe the IBM or Motorola compilers would be of help here?

> Not really. Intel's own Spec submission for the 1 GHz Pentium III
> states the hardware availability as August 1st 2001, and it was and
> is pretty comparable with a 2 GHz Pentium 4 on GUI codes (which are
> very much Apple's target). The same generation of compiler was used
> for both systems, and nobody should describe gcc 2.95 as the latest
> and greatest on any platform. The comparison of the hardware speeds
> is fair.

> The fact that Intel's compiler delivers 402/451 versus the 309 of
> gcc tells us more about the effort Intel puts into tuning its
> compiler for SPEC than anything else. Yes, a comparison with an
> equivalent compiler for the G4 would be interesting.

gcc != gcc. :)

I ran some tests last year, using gcc 2.95.3, and it was apparently
patched with pgcc. That brought most scores to within 10 to 15% of
IRC 4.5

http://groups.google.com/groups?selm=8vsls0%241qt%241%40hecate.umd.edu

I recall hearing some snippets about Intel giving some money to
Cygnus or some group to incorporate processor specific backend
optimizations into gcc. When I see the large delta here, I would
like to know if the "optimized gcc" was used or not.

> I read that as indicating that the G4 is comparable with Intel's
> chips for GUI and similar work, and singularly useless for HPC :-)

You know there are other uses for computers aside from HPC, right? :)

--
E PLURIBUS UNUM

ddaavveewwaanngg@@wwaamm..uummdd..eedduu

Julian Ruhe

Feb 26, 2002, 7:04:45 PM

Hi,

> As for the floating-point performance, I wonder how much of that is the
> compiler and how much is due to the FPU in the G4 or the memory subsystem?
> SPECfp loves memory bandwidth and I think the Macintosh only uses PC133
> SDRAM (versus DDR in Athlon systems and RDRAM in high-end P4 systems).
> Absoft Fortran has OK performance on x86 as per www.polyhedron.co.uk, so
> I have to assume that their PPC compiler isn't shabby either. But one can
> ever be sure.


As usual, ATLAS (http://www.netlib.org/atlas) knows the answer ;-). Newer
versions of ATLAS include hand-optimized kernels of ?GEMM (matrix
multiplication) for PIII, P4, Athlon and G4. Blocked matrix
multiplication is one of the most computationally intensive algorithms and
gives a good overview of the performance of a CPU (FPU, ALU, instruction
decoding ability, AGU, caches). Examples:

double precision (DGEMM) 1000x1000

 733 MHz G4e, 256K L2, 1MB L3 :  943 MFLOPS => 1.28 FLOPS/cyc (64% of peak)
1200 MHz Athlon               : 1905 MFLOPS => 1.59 FLOPS/cyc (79% of peak)
1500 MHz P4                   : 1941 MFLOPS => 1.29 FLOPS/cyc (65% of peak)
 800 MHz Itanium              : 2135 MFLOPS => 2.67 FLOPS/cyc (66% of peak)
 933 MHz PIII                 :  677 MFLOPS => 0.72 FLOPS/cyc (72% of peak)
         Alpha 21264          : no exact numbers              (90% of peak)

single precision (SGEMM) 1000x1000

 733 MHz G4e (AltiVec)        : 2632 MFLOPS => 3.59 FLOPS/cyc (??% of peak)
1000 MHz Athlon (3DNow!)      : 2298 MFLOPS => 2.23 FLOPS/cyc (58% of peak)
1500 MHz P4                   : 3922 MFLOPS => 2.61 FLOPS/cyc (65% of peak)
 933 MHz PIII                 : 1610 MFLOPS => 1.73 FLOPS/cyc (43% of peak)

Does the G4 have 8*MHz MFLOPS peak performance in single precision?
What I can say is that the Alphas have reached their absolute limit (GOTO's DGEMM!),
the Athlon has reached its limit within ATLAS's concept (a full assembly DGEMM will
probably reach 85% of peak), the P4 and IA64 still have room (MKL 5.1 is probably
faster), and about the G4 I am not sure. All in double precision.
In single precision, I think some performance improvements are still possible
on all architectures.
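
The FLOPS/cyc and percent-of-peak columns in the DGEMM figures above follow from dividing MFLOPS by the clock in MHz and then by the peak FLOPs per cycle. A minimal Python sketch of that arithmetic follows; the peak values (2 for G4e, Athlon and P4, 4 for Itanium, 1 for PIII) are assumptions inferred from the posted percentages, not vendor figures, and all names are illustrative.

```python
# name: (clock MHz, measured DGEMM MFLOPS, assumed peak FLOPs/cycle, posted % of peak)
# The peak FLOPs/cycle values are assumptions inferred from the posted percentages.
DGEMM = {
    "G4e 733 MHz": (733, 943, 2, 64),
    "Athlon 1200 MHz": (1200, 1905, 2, 79),
    "P4 1500 MHz": (1500, 1941, 2, 65),
    "Itanium 800 MHz": (800, 2135, 4, 66),
    "PIII 933 MHz": (933, 677, 1, 72),
}

def efficiency(mhz, mflops, peak_flops_per_cycle):
    """Return (FLOPs per cycle, percent of the assumed peak)."""
    flops_per_cycle = mflops / mhz
    return flops_per_cycle, 100.0 * flops_per_cycle / peak_flops_per_cycle

results = {
    name: efficiency(mhz, mflops, peak)
    for name, (mhz, mflops, peak, _posted) in DGEMM.items()
}

for name, (fpc, pct) in results.items():
    print(f"{name:16s} {fpc:.2f} FLOPs/cycle, {pct:.1f}% of assumed peak")
```

Under those assumed peaks, each recomputed percentage lands within one point of the posted value; the small differences look like rounding versus truncation.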

Some numbers can be found here:
http://www.netlib.org/atlas/atlas-comm/msg00458.html
http://www.netlib.org/atlas/atlas-comm/msg00235.html
http://www.netlib.org/atlas/atlas-comm/msg00392.html
http://www.netlib.org/atlas/atlas-comm/msg00187.html
http://www.netlib.org/atlas/atlas-comm/msg00380.html

Concerning SPEC2K, some subtests are heavily bandwidth dependent and some
are not. It is a pity that c't did not release details about SPECfp for the
G4. In an older c't article, they compared SPEC runs of P4s equipped with
RDRAM, DDR SDRAM and SDR SDRAM. There one can easily see which SPEC subtests
are bandwidth dependent and which are not. The interesting thing is that the
Athlon won all tests that were *not* bandwidth dependent against the P4 with
RDRAM and lost all tests that were bandwidth dependent. If I find the
article, I am going to post the numbers. Then we can at least compare
SPECint subtests on G4 vs. Athlon/P4.

Julian


Rupert Pigott

Feb 26, 2002, 7:23:27 PM

Norbert Juffa <ju...@acm.org> wrote in message
news:cf4fab37.02022...@posting.google.com...
[SNIP]

> While using same frequency parts and same version compilers seems most
> fair, it does seem to have little practical relevance (IMHO). All these
> results show is that architectural performance of a 1 GHz MPC7455 is
> comparable to a 1 GHz PIII (at least for integer work).

There are other considerations of course... Power consumption
and cost of the part. These have implications for the kinds of
system you can build and their price points. If I was sticking a
few hundred chips into a wardrobe, I'd want something which
gave decent performance, but had low external component count
and didn't burn megawatts.

Horses for courses. :)

PIII's certainly weren't designed with as much emphasis on
low-cost embedded systems as MPC7455's.

Cheers,
Rupert


Nick Maclaren

Feb 26, 2002, 7:27:12 PM

In article <a5h7he$mp$4...@grapevine.wam.umd.edu>,

David T. Wang <f...@bar.invalid> wrote:
>
>gcc != gcc. :)
>
>I ran some tests last year, using gcc 2.95.3, and it was apparently
>patched with pgcc. That brought most scores to within 10 to 15% of
>IRC 4.5
>
>http://groups.google.com/groups?selm=8vsls0%241qt%241%40hecate.umd.edu

Most interesting. That does make comparisons, er, confusing.

>I recall hearing some snippets about Intel giving some money to
>Cygnus or some group to incorporate processor specific backend
>optimizations into gcc. When I see the large delta here, I would
>like to know if the "optimized gcc" was used or not.

I heard that one, too, but only third-hand.

>> I read that as indicating that the G4 is comparable with Intel's
>> chips for GUI and similar work, and singularly useless for HPC :-)
>
>You know there are other uses for computers aside from HPC's right? :)

Rumour has it :-)

Sander Vesik

Feb 26, 2002, 7:48:27 PM

Julian Ruhe <ruhe...@linux.zrz.tu-berlin.de> wrote:
> Hi,
>

[snip]

> single precison (SGEMM) 1000x1000
>
> 733Mhz G4e (AliVec) : 2632 MFLOPS => 3.59 FLOPS/cyc (??% of peak)
>
> 1000MHz Athlon (3dnow!) : 2298 MFLOPS => 2.23 FLOPS/cyc (58% of peak)
> 1500 MHz P4 : 3922 MFLOPS => 2.61 FLOPS/cyc (65% of peak)
> 933 MHz PIII : 1610 MFLOPS => 1.73 FLOPS/cyc (43% of peak)
>
> Does the G4 have 8*MHz MFLOPS peak performance in single prec?

Actually, I think it's 4*MHz unless they use MADD. That makes it quite a high
percentage of peak; OTOH, it can move AltiVec register contents around
at the same time it is doing fp.
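
To put numbers on the two candidate peaks, here is a small Python sketch using the quoted 2632 MFLOPS at 733 MHz. The 4 and 8 FLOPs per cycle are the two hypotheses under discussion in this thread, not confirmed hardware figures, and the function name is illustrative.

```python
def percent_of_peak(mflops, mhz, flops_per_cycle):
    """Measured MFLOPS as a percentage of mhz * flops_per_cycle."""
    return 100.0 * mflops / (mhz * flops_per_cycle)

# Quoted SGEMM result for the 733 MHz G4e with AltiVec.
without_madd = percent_of_peak(2632, 733, 4)  # peak assumed to be 4*MHz
with_madd = percent_of_peak(2632, 733, 8)     # peak assumed to be 8*MHz

print(f"4 FLOPs/cycle peak: {without_madd:.1f}%")
print(f"8 FLOPs/cycle peak: {with_madd:.1f}%")
```

So the same measurement is about 90% of a 4*MHz peak but only about 45% of an 8*MHz peak.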

[snip]

>
> Julian
>
>

--
Sander

+++ Out of cheese error +++

Julian Ruhe

Feb 26, 2002, 8:13:32 PM

Hi,


> Concerned to SPEC2K, there are some subtests that that are heavily
> bandwidth dependent and some
> are not. It is a pitty that c't did not release details about SPECfp for
> the G4. In an older
> article of c't, they compared SPEC runs of P4s equipped with RDRAM,DDR
> SDRAM and SDR DRRAM. There
> one can easy see, which SPEC subtests are bandwidth dependent and which
> are not. The interesting thing
> is, that Athlon won all tests that were *not* bandwidth dependent agaist
> the P4 with RDRAM and lost all
> test that were bandwidth dependent. If I find the article, I am going to
> post the numbers. Then we
> can at least SPECint subtest on G4 vs Athlon/p4.
>

I found the article

SPECint:

164.gzip:
P4 1.8GHz PC800 :603
P4 1.8GHz PC133 :546 =>mainly CPU dependent

Athlon 1.53GHz PC2100:757

G4 1GHz :395

175.vpr:
P4 1.8GHz PC800 :342
P4 1.8GHz PC133 :290 =>mainly CPU dependent

Athlon 1.53GHz PC2100:355

G4 1GHz :316


176.gcc:
P4 1.8GHz PC800 :640
P4 1.8GHz PC133 :405 =>Bandwidth dependent

Athlon 1.53GHz PC2100:363

G4 1GHz :444

181.mcf:
P4 1.8GHz PC800 :503
P4 1.8GHz PC133 :282 =>Bandwidth dependent
Athlon 1.53GHz PC2100:282
G4 1GHz :319

186.crafty:
P4 1.8GHz PC800 :580
P4 1.8GHz PC133 :580 =>CPU dependent
Athlon 1.53GHz PC2100:879
G4 1GHz :334

197.parser:
P4 1.8GHz PC800 :535
P4 1.8GHz PC133 :474 =>mainly CPU dependent
Athlon 1.53GHz PC2100:521
G4 1GHz :259

252.eon:
P4 1.8GHz PC800 :739
P4 1.8GHz PC133 :748 =>CPU dependent
Athlon 1.53GHz PC2100:1074
G4 1GHz :77

253.perlbmk:
P4 1.8GHz PC800 :679
P4 1.8GHz PC133 :661 =>CPU dependent
Athlon 1.53GHz PC2100:883
G4 1GHz :345

254.gap:
P4 1.8GHz PC800 :826
P4 1.8GHz PC133 :693 =>mainly bandwidth dependent
Athlon 1.53GHz PC2100:700
G4 1GHz :312

255.vortex:
P4 1.8GHz PC800 :924
P4 1.8GHz PC133 :853 =>mainly CPU dependent
Athlon 1.53GHz PC2100:986
G4 1GHz :310

256.bzip2:
P4 1.8GHz PC800 :472
P4 1.8GHz PC133 :382 =>mainly CPU dependent
Athlon 1.53GHz PC2100:500
G4 1GHz :333

300.twolf:
P4 1.8GHz PC800 :376
P4 1.8GHz PC133 :298 =>mainly CPU dependent
Athlon 1.53GHz PC2100:389
G4 1GHz :522


-------- Now the numbers normalized to 1 GHz ----------


164.gzip:
P4 1.8GHz PC800 :335
P4 1.8GHz PC133 :303 =>mainly CPU dependent

Athlon 1.53GHz PC2100:494

G4 1GHz :395

175.vpr:
P4 1.8GHz PC800 :190
P4 1.8GHz PC133 :161 =>mainly CPU dependent

Athlon 1.53GHz PC2100:232

G4 1GHz :316


186.crafty:
P4 1.8GHz PC800 :322
P4 1.8GHz PC133 :322 =>CPU dependent
Athlon 1.53GHz PC2100:574
G4 1GHz :334

197.parser:
P4 1.8GHz PC800 :297
P4 1.8GHz PC133 :263 =>mainly CPU dependent
Athlon 1.53GHz PC2100:340
G4 1GHz :259

252.eon:
P4 1.8GHz PC800 :410
P4 1.8GHz PC133 :415 =>CPU dependent
Athlon 1.53GHz PC2100:701
G4 1GHz :77

253.perlbmk:
P4 1.8GHz PC800 :377
P4 1.8GHz PC133 :367 =>CPU dependent
Athlon 1.53GHz PC2100:577
G4 1GHz :345

255.vortex:
P4 1.8GHz PC800 :513
P4 1.8GHz PC133 :473 =>mainly CPU dependent
Athlon 1.53GHz PC2100:644
G4 1GHz :310

256.bzip2:
P4 1.8GHz PC800 :262
P4 1.8GHz PC133 :212 =>mainly CPU dependent
Athlon 1.53GHz PC2100:326
G4 1GHz :333

300.twolf:
P4 1.8GHz PC800 :208
P4 1.8GHz PC133 :165 =>mainly CPU dependent
Athlon 1.53GHz PC2100:254
G4 1GHz :522
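
The normalized figures above appear to be the raw scores scaled linearly to 1 GHz and truncated to a whole number. A short Python sketch of that scaling, using integer arithmetic on the clock in MHz; the function name is illustrative, and linear scaling with clock is itself a simplification that the list sidesteps by leaving out the bandwidth-dependent subtests:

```python
def normalize_to_1ghz(score, clock_mhz):
    """Scale a score linearly to 1000 MHz, truncating to an integer."""
    return score * 1000 // clock_mhz

# 164.gzip raw scores from the list further up: (score, clock in MHz).
gzip_raw = {
    "P4 1.8GHz PC800": (603, 1800),
    "P4 1.8GHz PC133": (546, 1800),
    "Athlon 1.53GHz PC2100": (757, 1530),
    "G4 1GHz": (395, 1000),
}

gzip_normalized = {
    name: normalize_to_1ghz(score, mhz) for name, (score, mhz) in gzip_raw.items()
}
for name, value in gzip_normalized.items():
    print(f"{name:22s}: {value}")
```

For 164.gzip this reproduces the values in the normalized list.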

What is the conclusion?

(1) The G4 wins 2 of 3 bandwidth-intensive tests against P4/PC133 and
Athlon/PC2100. This is probably due to the large and fast L3 cache.
The lost test is probably caused by the poor main memory subsystem.
(2) In CPU-intensive tests, G4+gcc is not bad compared to the normalized Athlon
and P4 (except 252.eon, which is clearly a compiler issue).
(3) The G4 is a solid CPU that lacks higher clock speeds. But it is not more
effective than the Athlon or P4, as G4 fans often say.
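
Point (1) can be tallied directly from the raw scores of the three subtests marked bandwidth dependent (176.gcc, 181.mcf, 254.gap). A small illustrative Python sketch with made-up names, using only numbers from the lists above:

```python
# Raw scores: (P4 1.8 GHz with PC133, Athlon 1.53 GHz with PC2100, G4 1 GHz).
bandwidth_tests = {
    "176.gcc": (405, 363, 444),
    "181.mcf": (282, 282, 319),
    "254.gap": (693, 700, 312),
}

def g4_wins(tests):
    """Return the subtests where the G4 score beats both x86 systems."""
    return [
        name for name, (p4, athlon, g4) in tests.items()
        if g4 > p4 and g4 > athlon
    ]

wins = g4_wins(bandwidth_tests)
print(f"G4 wins {len(wins)} of {len(bandwidth_tests)}: {wins}")
```

The G4 comes out ahead on 176.gcc and 181.mcf and behind on 254.gap.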

Julian

Julian Ruhe

Feb 26, 2002, 8:20:32 PM

Paul Hsieh

Feb 26, 2002, 11:45:25 PM

nm...@cus.cam.ac.uk (Nick Maclaren) wrote:
> "Norbert Juffa" <ju...@earthlink.net> writes:
> |> "Anton Ertl" <an...@mips.complang.tuwien.ac.at> wrote:
> |> > c't 5/2002, pages 182-183 publishes SPEC CPU2000 results for various
> |> > Apple G4s and a Pentium III/1000 (all with 512MB RAM in single-user
> |> > mode). Here they are:
> |> >
> |> > C compiler SPECint_base SPECfp_base
> |> > G4 800 gcc 242 147
> |> > G4 867 gcc 259 153
> |> > G4 1000 gcc 306 187
> |> > P3 1000 gcc 309
> |> > P3 1000 MSC 236

You know, I remember a time where I dared to suggest that the PPC
architecture was highly unlikely to have higher IPC than the P6 or
Athlon. Then one "Bruce Hoult" berated me for ignoring all the
authoritative Apple demos that clearly showed otherwise. He was
incredulous that I could possibly believe that the PPC did not have a
significantly higher IPC.

> |> > The Metroworks compiler produced slightly worse results than gcc. The
> |> > gcc version used was gcc-2.95.[23] on all machines. The compiler used
> |> > for the Fortran benchmarks was Absoft Pro Fortran 7.0.
> |>
> |> I read the German article pointed to by Julian and one thing that is
> |> pointed out there that wasn't clear to me is that gcc 2.95.3 for x86 Linux
> |> is from 1999. So the comparison seems to be between a 2 year old x86 CPU
> |> using a 3 year old compiler versus the latest and greatest from Apple? Are
> |> there any better compilers available for the Apple machine than the gcc
> |> used by c't? Maybe the IBM or Motorola compilers would be of help here?
>
> Not really. Intel's own Spec submission for the 1 GHz Pentium III
> states the hardware availability as August 1st 2001, and it was and
> is pretty comparable with a 2 GHz Pentium 4 on GUI codes (which are
> very much Apple's target).

Intel announced the 1Ghz P-III in early March 2000 during their IDF
conference, and actually shipped units to customers later that month.
The Spec submission predated this announcement, and thus I think they
were just trying to make sure people didn't get any advanced info on
their roadmap. Norbert's estimation of 2 years is only off by about a
week and a half.

> [...] The same generation of compiler was used for both systems, and nobody
> should describe gcc 2.95 as the latest and greatest on any platform. The
> comparison of the hardware speeds is fair.

Uhh ... the latest and greatest gcc is version 3.0; it came out
recently, it's a beta-quality compiler, and it is only currently available
for the x86 afaik. By going back to 2.95 I think they were making the
maximum possible concession to Apple by going with the latest compiler
available for the PPC and trying to match it up as close as possible
to something comparable for the x86. In other words, it's state of the
art for PPC, and ancient for the x86 (not the other way around.)

That said, does anyone have results from the rumored IBM compiler
(that supposedly was used to recompile numerous applications for Mac
OS X?)

> The fact that Intel's compiler delivers 402/451 versus the 309 of
> gcc tells us more about the effort Intel puts into tuning its
> compiler for SPEC than anything else.

I don't quite get your point. Apparently this compiler is also
optimized for Max Payne, 3DMark, ScienceMark, the FLOPS benchmarks,
FlaskMPEG, and 30 some odd random tests I read in a recent MSVC vs GCC
vs INTEL compiler shoot out (I cannot remember the web page, it was
linked on Slashdot.org)

As nVidia once put it "Yes we optimize for WinBench! And if our
competitors don't they're either lying or foolish! -- that said, we
optimize for everything else too"

> [...] Yes, a comparison with an equivalent compiler for the G4 would be
> interesting.

What about an equivalent manufacturing process? What about an
equivalent price point? What about an equivalent ship date?



> I read that as indicating that the G4 is comparable with Intel's
> chips for GUI and similar work, and singularly useless for HPC :-)

Of course you did. You read it on comp.mac.advocacy I am sure.
Everything I've read in the past 5 years about GUI performance is that
the x86 world is just plain faster. The recent sluggish Mac OS X 10.0
debacle (which was apparently fixed with the 10.1 patch) is a telling
indicator of the serious performance problems of Apple's platform in
advanced graphical environments.

Apple's secret Photoshop test remains the only application where Apple
can claim any kind of a lead, and that's just because 1) Steve Jobs
picks the filters of his choice (which have not been revealed) and 2)
Motorola added in the Photoshop acceleration instructions
(euphemistically called "AltiVec").

--
Paul Hsieh
http://www.pobox.com/~qed/apple.html

Norbert Juffa

Feb 27, 2002, 12:17:19 AM
nm...@cus.cam.ac.uk (Nick Maclaren) wrote in message news:<a5h629$kd0$1...@pegasus.csx.cam.ac.uk>...

> In article <cf4fab37.02022...@posting.google.com>,
> Norbert Juffa <ju...@acm.org> wrote:
[...]

> >> I read that as indicating that the G4 is comparable with Intel's
> >> chips for GUI and similar work, and singularly useless for HPC :-)
> >
> >Funny, my interpretation is different: The MPC7455 seems comparable "in
> >GUI work" to an Intel part that is outdated, but it is not even close to
> >an Athlon or P4 one might find in a comparably priced PC available today.
>
> Er, did you see the results of investigations by various electronic
> journals etc.? The Pentium 4 is pretty dire at such use, and I did
> really mean that it is no better than the Pentium III despite its
> high clock speed and Spec results.

Yes, I read many of those sites. What does it tell me? The P4 2.2 GHz is
about as fast as the Athlon 1.66 GHz, which is much faster than a 1 GHz
PIII. So, by transitive closure the Athlon 1.66 GHz and 2.2 GHz P4 are
still much faster than the G4. Now, there are exceptions to this when
AltiVec comes into play. If I understand the postings by Chris Cox of
Adobe correctly, on Photoshop the G4 > AthlonXP > P4 in terms of performance
looking across the top-of-the-line processors, with Athlon performance
close to the G4. I don't own Photoshop so I cannot confirm or refute this,
but Chris is quite knowledgeable about performance issues and does a lot of
Photoshop performance optimization and benchmarking so I have no reason to
doubt him until I see evidence to the contrary.

My conclusion is this: Except for a few cases where AltiVec comes into
play, the G4 performance is not even close to current generation Athlons
and P4s. This doesn't come as a surprise to me. Looking at the G4 and
the system architecture, and taking the frequency into account I'd say
the G4 performs pretty much as expected, although I am a bit surprised
at the low floating-point performance.


-- Norbert

Bruce Hoult

Feb 27, 2002, 12:26:11 AM
In article <796f488f.02022...@posting.google.com>,
q...@pobox.com (Paul Hsieh) wrote:

> nm...@cus.cam.ac.uk (Nick Maclaren) wrote:
> > "Norbert Juffa" <ju...@earthlink.net> writes:
> > |> "Anton Ertl" <an...@mips.complang.tuwien.ac.at> wrote:
> > |> > c't 5/2002, pages 182-183 publishes SPEC CPU2000 results for
> > |> > various
> > |> > Apple G4s and a Pentium III/1000 (all with 512MB RAM in
> > |> > single-user
> > |> > mode). Here they are:
> > |> >
> > |> > C compiler SPECint_base SPECfp_base
> > |> > G4 800 gcc 242 147
> > |> > G4 867 gcc 259 153
> > |> > G4 1000 gcc 306 187
> > |> > P3 1000 gcc 309
> > |> > P3 1000 MSC 236
>
> You know, I remember a time where I dared to suggest that the PPC
> architecture was highly unlikely to have higher IPC than the P6 or
> Athlon. Then one "Bruce Hoult" berated me for ignoring all the
> authoritative Apple demos that clearly showed otherwise. He was
> incredulous that I could possibly believe that the PPC did not have a
> significantly higher IPC.

Nice that I'm in your thoughts and that you remember how to spell my
name, but a pity that you don't recall that was *before* Motorola gave
in to the "MHz is everything" marketing people and redesigned the G4
with a longer pipeline and higher latency to cache.

Performance of the last of the old G4's -- the 533 MHz -- was much the
same as the next G4 at 733 MHz.

Given those implementation changes, you'd *expect* the new G4 to be
about the same IPC as the P3. And surprise, surprise, it is! Woopie.
The marketing department wins, the technical designers lose.
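As a rough check on the "about the same IPC" remark, the SPECint_base figures quoted at the top of this post can be divided by clock rate. This is only a minimal sketch, assuming that score per MHz is an acceptable stand-in for IPC; it is not a direct measurement, since compiler and memory system also feed into the score. The dictionary and function names are illustrative, not from the article.

```python
# SPECint_base figures as quoted from the c't table in this thread.
# Each entry maps a label to (clock in MHz, SPECint_base score).
results = {
    "G4 800 (gcc)": (800, 242),
    "G4 867 (gcc)": (867, 259),
    "G4 1000 (gcc)": (1000, 306),
    "P3 1000 (gcc)": (1000, 309),
    "P3 1000 (MSC)": (1000, 236),
}


def score_per_mhz(mhz, score):
    """Crude per-clock figure: benchmark score divided by clock rate."""
    return score / mhz


per_mhz = {name: score_per_mhz(mhz, score) for name, (mhz, score) in results.items()}

for name, value in per_mhz.items():
    print(f"{name:15s} {value:.4f} SPECint_base per MHz")
```

With gcc on both sides, the 1 GHz G4 and the 1 GHz P3 come out within about one percent of each other on this measure, which is all the comparison can really support.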


> Everything I've read in the past 5 years about GUI performance is that
> the x86 world is just plain faster. The recent sluggish Mac OS X 10
> debacle (which was apparently fixed with 10.1 patch) is a telling
> indicator of the serious performance problems of Apple's platform in
> advanced graphical environments.

I struggle to believe that even you can think that makes sense. OS X is
a new, unoptimized OS. Jobs had the Finder people, in particular,
rewriting the whole user interface right up to the day it shipped.
Trying to pin OS X 10.0.x poor performance on the CPU is ... well,
"cretinous" is too kind.

Try running comparable software on the two. Say, RedHat Linux vs
LinuxPPC. LinuxPPC is way faster.


> Apple secret photoshop test remains the only application where Apple
> can claim any kind of a lead and that's just because 1) Steve Jobs
> picks the filters of his choice (which have not been revealed) 2)
> Motorola added in the Photoshop acceleration instructions
> (euphemistically called "AltiVec")

Yuk yuk yuk

-- Bruce

Norbert Juffa

Feb 27, 2002, 12:27:10 AM
Peter Boyle <pbo...@holyrood.ed.ac.uk> wrote in message news:<Pine.SOL.4.33.020226...@holyrood.ed.ac.uk>...

> On 26 Feb 2002, Skipper Smith wrote:
>
> Hi,
>
> > Wow, evil is an interesting value judgement to place on a design that
> > optimized for die size and integer performance over FP performance because
> > of its target market. It issued one double-FP every two clocks ONLY in
>
> ;)
> Perhaps using an embedded optimized part in a desktop is what is evil.

I second that sentiment, and think it applies equally to the G4. I work
with G4 class processors in an embedded application (4-way non-symmetrical
multi-processing) and they are great for that. I have looked multiple times
at the possibility of using x86 mobile parts instead and it is a no-fit
because of the power consumption and other shortcomings. The short pipeline
is very advantageous when dealing with lots of "unpredictable" branches. The
support for large L2s is also very helpful.

The only similarly suitable parts I am aware of for high performance embedded
applications are high end MIPS processors. x86 isn't even playing.


-- Norbert

Paul Hsieh

Feb 27, 2002, 12:27:54 AM
nm...@cus.cam.ac.uk (Nick Maclaren) wrote:
> Norbert Juffa <ju...@acm.org> wrote:
> >While using same frequency parts and same version compilers seems most
> >fair, it does seem to have little practical relevance (IMHO). All these
> >results show is that architectural performance of a 1 GHz MPC7455 is
> >comparable to a 1 GHz PIII (at least for integer work).
>
> I would agree with the latter, but not the former! The former
> remark also applies to the choice of language - it isn't uncommon
> for different languages to be much better for different systems.

You're right. It's unfair to measure the PPC on niche languages like
"C". Perhaps you'd agree that we should measure the two running
AppleScript.

Give me a break. The fairest comparison is to run what developers use
on each platform. For Mac OS X versus Windows XP that would be
Metrowerks versus MSVC++. According to that article, Metrowerks did
worse than gcc, and other articles by c't magazine indicate that
MSVC++ is actually better than gcc on the x86 for Spec CPU. Either
way, it doesn't look good for the poor PPC. If we're picking the best
compiler that is available to developers for the PPC for this test,
perhaps we could fair it up by using Intel's compiler for the x86.



> >What I would like to know is how do the "best" systems available today
> >compare.

But don't you care about the color of your case? Don't you want your
PC to look like a desklamp? Don't you want your computer to think
different??

> >I can now buy a 1.66 GHz Athlon or a 2.2 GHz P4, and more
> >modern compilers are available for the x86 platform. The SPEC results
> >for those x86 machines with the latest compilers I can get from the SPEC
> >site. But it wouldn't be fair to compare these results to the G4 results
> >above if there are compilers available to the end user that deliver much
> >higher performance than gcc 2.95.2. Thus my questions about the availa-
> >bility of other compilers. Maybe no such compilers are available and gcc
> >is the best one can get? If anybody has insight into this please let us
> >know.
>
> That is a reasonable question, but it is critical to say what your
> criteria are for "best". The criteria for an HPC system are VERY
> different from the criteria for a GUI handheld or even PC.

Yeah, that's why companies like Cyrix, Exponential, and system vendors
like Amiga, Atari, and Acorn have been thriving. Because they know
that performance is not a serious concern from consumers and ... oh
wait ...



> >> I read that as indicating that the G4 is comparable with Intel's
> >> chips for GUI and similar work, and singularly useless for HPC :-)
> >
> >Funny, my interpretation is different: The MPC7455 seems comparable "in
> >GUI work" to an Intel part that is outdated, but it is not even close to
> >an Athlon or P4 one might find in a comparably priced PC available today.
>
> Er, did you see the results of investigations by various electronic
> journals etc.? The Pentium 4 is pretty dire at such use, and I did
> really mean that it is no better than the Pentium III despite its
> high clock speed and Spec results.

You obviously didn't read these articles very carefully. In general
they showed that the P-!!! 1Ghz was somewhat faster than the 1.3Ghz
P4, but only reached the performance of the 1.5Ghz P4 in a few
exceptional cases. Now, obviously this is indicative of the lower
architectural performance of the P4. But armed with the knowledge
that the P4 1.5Ghz is mostly faster than the P-!!! 1Ghz (both
outfitted by RDRAM, BTW), what can we say about a P4 @ 2.2Ghz with a
bigger L2 cache compared to say a P-!!! 1Ghz with PC133 (laugh) SDRAM?

(BTW, I hear Rambus demonstrated dual channel RDRAM at the IDF ...)



> >As for the floating-point performance, I wonder how much of that is the
> >compiler and how much is due to the FPU in the G4 or the memory subsystem?
> >SPECfp loves memory bandwidth and I think the Macintosh only uses PC133
> >SDRAM (versus DDR in Athlon systems and RDRAM in high-end P4 systems).
> >Absoft Fortran has OK performance on x86 as per www.polyhedron.co.uk, so
> >I have to assume that their PPC compiler isn't shabby either. But one can
> >never be sure.
>
> I don't know, but it is a good question, and very relevant to the
> viability of certain research projects.

Personally, I don't see how investigating a machine that clocks in at
187 on SpecFP is relevant to anything. A quick survey of the
submitted results on spec.org shows that it gets beaten by every major
shipping architecture, and typically by more than 2x. It actually
gets beaten by MIPS (another "designed for embedded" architecture) and
the old UltraSparcs -- now that's pretty lame.

This pathetic chip is obsolete by design. Why can't we just accept
this and move on?

Douglas Siebert

Feb 27, 2002, 12:53:19 AM
"Norbert Juffa" <ju...@earthlink.net> writes:

>Does anybody know whether Apple is planning to make an official SPEC
>submission? There got to be a reason SPEC2K was made compatible with
>the Macintosh platform (I think this did not happen until version 1.2?)


I imagine Apple is planning that for the G5. If the rumors about its
performance are true, it'll beat the Pentium 4 easily and give a run
for the money to POWER4 and 21364. Even if the rumors are overstated
it is certain to be far better than the problem-plagued G4, which was
a performance leader only in Steve Jobs' mind. I kind of want to see
the G5 top the SPEC list at introduction, if only out of curiosity to
see how overboard he is capable of going when he has something truly
impressive to crow about.

At any rate, testing PPC vs. x86 performance with gcc is sort of like
comparing 0-60 acceleration times in the rain.

--
Douglas Siebert dsie...@excisethis.khamsin.net

A good friend will help you move, a true friend will help you move a body.

Christopher Brian Colohan

Feb 27, 2002, 1:13:13 AM
"JD" <dy...@jdyson.com> writes:

I tried out the Intel compiler when they released it for Linux, and I
was not so happy with it. It may produce fast code, but when I tried
to use it to compile my processor simulator (the one piece of code I
really care about), it either crapped out with an internal error or
produced code that dumped core as soon as it was run...

Perhaps either their C++ support is not super reliable, or the Linux
port is still a bit shaky. But for now I am sticking with gcc.

Chris
--
Chris Colohan Email: ch...@colohan.ca PGP: finger col...@cs.cmu.edu
Web: www.cs.cmu.edu/~colohan Phone: (412)268-4751

Rupert Pigott

Feb 27, 2002, 1:18:51 AM
Bruce Hoult <br...@hoult.org> wrote in message
news:bruce-7AD27A....@news.paradise.net.nz...
[SNIP]

> I struggle to believe that even you can think that makes sense. OS X is
> a new, unoptimized OS. Jobs had the Finder people, in particular,

Cough cough cough...

Mach isn't new.
FreeBSD userland isn't new.

That amounts to a fair chunk of the OS, although to be fair
most people will care more about the virtual blinkenlicht
rendering code which may well be brand spanking new. I keep
hearing DPS type noises from OS-X though, and lemme see,
NeXT had a Mach/BSD core with a DPS rendering system.

[SNIP]

> Yuk yuk yuk

Keep laughing, we need cheering up while we're chained to this
here wheel of reincarnation.

Cheers,
Rupert


Douglas Siebert

Feb 27, 2002, 1:29:06 AM
q...@pobox.com (Paul Hsieh) writes:

>Intel announced the 1Ghz P-III in early March 2000 during their IDF
>conference, and actually shipped units to customers later that month.
>The Spec submission predated this announcement, and thus I think they
>were just trying to make sure people didn't get any advance info on
>their roadmap. Norbert's estimation of 2 years is only off by about a
>week and a half.


They announced it, and shipped a few, but it was well known that there
were very few available until the end of that summer (it was referred
to by some as a "paper launch" which Intel was rushed into by AMD's
success at ramping the Athlon's clock speed)

Peter Boyle

Feb 27, 2002, 1:33:20 AM

On Wed, 27 Feb 2002, Sander Vesik wrote:


> Actually, I think it's 4*MHz unless they use MADD. Which gives quite high
> percentage of peak, otoh, it can move Altivec register contents around
> at the same time it is doing fp.


Oh, come on. Peak is peak. No excuses.

You can quote fp instruction issue rates, speed of light for the
particular instruction mix, but doctoring peak is like claiming
a loss is a loan.... no wait Enron... bugger.

Peter Boyle pbo...@physics.gla.ac.uk

Jeremy Fox

Feb 27, 2002, 1:33:30 AM
Douglas Siebert <dsie...@excisethis.khamsin.net> wrote:
: "Norbert Juffa" <ju...@earthlink.net> writes:

: I imagine Apple is planning that for the G5. If the rumors about its
: performance are true, it'll beat the Pentium 4 easily and give a run
: for the money to POWER4 and 21364.

More credible recent rumors suggest that Motorola is readying new G4's
for the upcoming year. This G5 chip might exist but we have heard
precious little that's concrete about it so far.

http://www.theregister.co.uk/content/3/24018.html

Best G5 rumors so far are at

http://www.architosh.com/news/2002-02/2002b-0201-futurg5.phtml

which admittedly are pretty hype-prone.

--
------------------------
Jeremy T. Fox
jer...@stanford.edu

John S. Dyson

Feb 27, 2002, 1:42:27 AM

"Christopher Brian Colohan" <colo...@cs.cmu.edu> wrote in message
news:uclofib...@gs138.sp.cs.cmu.edu...

> "JD" <dy...@jdyson.com> writes:
>
> > "Nick Maclaren" <nm...@cus.cam.ac.uk> wrote in message
news:a5gjnf$1mo$1...@pegasus.csx.cam.ac.uk...
> > >
> > > The fact that Intel's compiler delivers 402/451 versus the 309 of
> > > gcc tells us more about the effort Intel puts into tuning its
> > > compiler for SPEC than anything else. Yes, a comparison with an
> > > equivalent compiler for the G4 would be interesting.
> > >
> > From what I have seen (and I use it a lot), the Intel C compiler isn't
> > just good at Spec, but is good in general. GCC has a long way to
> > go in order to ferret out the performance that the ICC compiler
> > can do.
>
> I tried out the Intel compiler when they released it for Linux, and I
> was not so happy with it. It may produce fast code, but when I tried
> to use it to compile my processor simulator (the one piece of code I
> really care about), it either crapped out with an internal error or
> produced code that dumped core as soon as it was run...
>
> Perhaps either their C++ support is not super reliable, or the Linux
> port is still a bit shaky. But for now I am sticking with gcc.
>
I use the ICC on Win2000, and haven't played with the Linux version
of the compiler on FreeBSD, for example (where I'd use it.) I really
haven't had any real troubles with the ICC compiler on Win2000, but
know nothing about the Linux version.

The ICC compiler is especially useful on C++, where if there are no
side-effects with the full program optimization, a lot of unneeded
manipulation can be expunged. It is also very good for DSP-style
code for multimedia apps. It does an excellent job at complementing
hand optimized code segments, where performance is critical.

It is unfortunate that the Linux version of the compiler hasn't worked
for you, because I have been considering using it.

John

Bruce Hoult

Feb 27, 2002, 1:45:37 AM
In article <a5hto6$jtu$1...@knossos.btinternet.com>, "Rupert Pigott"
<Dark...@btinternet.com> wrote:

> Bruce Hoult <br...@hoult.org> wrote in message
> news:bruce-7AD27A....@news.paradise.net.nz...
> [SNIP]
> > I struggle to believe that even you can think that makes sense. OS X is
> > a new, unoptimized OS. Jobs had the Finder people, in particular,
>
> Cough cough cough...
>
> Mach isn't new.
> FreeBSD userland isn't new.
>
> That amounts to a fair chunk of the OS

Yes it does, and that part runs just fine. I've got a P3/1000 and a
G4/867 here and I'm spending most of my time developing and running Un*x
stuff and the G4 with OS X is noticeably faster on CPU-intensive stuff.
If you do zillions of fork()s then it's not, but then that's expected
with Mach.
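To see the fork() cost mentioned above on a particular machine, one could time a loop of fork, immediate exit, and wait. This is a minimal sketch for Unix-like systems only; the function name and iteration count are arbitrary choices, and the number it prints says nothing about why one kernel is slower than another.

```python
import os
import time


def time_forks(count=200):
    """Return the average wall-clock seconds for one fork + exit + wait cycle."""
    start = time.perf_counter()
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            # Child process: exit at once, skipping interpreter cleanup.
            os._exit(0)
        # Parent process: reap the child before forking again.
        os.waitpid(pid, 0)
    return (time.perf_counter() - start) / count


if __name__ == "__main__":
    print(f"{time_forks() * 1e6:.1f} microseconds per fork+wait")
```

Running the same script on both machines gives a like-for-like figure, though Python's own process size inflates the absolute cost compared with a small C program.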


> although to be fair
> most people will care more about the virtual blinkenlicht
> rendering code which may well be brand spanking new. I keep
> hearing DPS type noises from OS-X though, and lemme see,
> NeXT had a Mach/BSD core with a DPS rendering system.

Except that OS X isn't DPS, but an entirely new rendering layer based on
PDF instead of PostScript, with I think no Adobe code in it since Adobe
wanted a per-copy amount of money for DPS that was reasonable for the
NeXT market of niche applications but too much for Apple's market.

-- Bruce

John S. Dyson

Feb 27, 2002, 1:45:40 AM

"Rupert Pigott" <Dark...@btinternet.com> wrote in message
news:a5hto6$jtu$1...@knossos.btinternet.com...

> Bruce Hoult <br...@hoult.org> wrote in message
> news:bruce-7AD27A....@news.paradise.net.nz...
> [SNIP]
> > I struggle to believe that even you can think that makes sense. OS X is
> > a new, unoptimized OS. Jobs had the Finder people, in particular,
>
> Cough cough cough...
>
> Mach isn't new.
> FreeBSD userland isn't new.
>
The vast majority of code that runs in a GUI system is the GUI code. Basic,
non GUI OS code shouldn't really use a majority of the CPU in most cases.
There are cases where OS code is predominant, but it is often in massive
file transfers or intense raw networking.

If you take the highly performant full FreeBSD kernel, and then put a slow
GUI on it, then it will run slowly.

John

Bruce Hoult

Feb 27, 2002, 1:50:30 AM
In article <a5hs8e$e7r$1...@sword.avalon.net>,
dsie...@excisethis.khamsin.net (Douglas Siebert) wrote:

> At any rate, testing PPC vs. x86 performance with gcc is sort of like
> comparing 0-60 acceleration times in the rain.

Funny you should mention that, since they're essentially identical in my
car, at least. (about 8.8 seconds -- it's a family station wagon, not a
performance car)

-- Bruce

Skipper Smith

Feb 27, 2002, 2:07:25 AM
Peter Boyle <pbo...@holyrood.ed.ac.uk> wrote:
>
>On 26 Feb 2002, Skipper Smith wrote:
>
>Hi,
>
>> Wow, evil is an interesting value judgement to place on a design that
>> optimized for die size and integer performance over FP performance because
>> of its target market. It issued one double-FP every two clocks ONLY in
>
>;)
>Perhaps using an embedded optimized part in a desktop is what is evil.
>
>Anyway, I'll resist the temptation to religiously chant that single cycle
>throughput is the only clean issue rate, instead say that while it is at
>least easy to write a scheduler for pure multiplies, the 5 of 7 is pretty
>hard to schedule for, and then remember you have both the multiplies, adds
>(and fmadds) in the same pipeline.

All it would have cost was a pipeline that was one or two stages longer
(or accepted additions to other parts of the chip), along with a 15% or so
larger FPU. That wasn't acceptable. As it was, even other unintended
compromises were permitted in the design of the 603e. Look at the
performance of an fres on the ORIGINAL 603e.

>It's fairly hard to know what optimal performance is for a given mix, let
>alone how to get it! Seriously, would designing a fully pipelined d.p.
>fpu have added that much fractional chip area and power?

Yes, it would have, given the defined constraints. People who wanted
high-performance FP were directed to the 604, which WAS able to do
single-cycle dispatching. The 604 burned about 4x the watts (granted,
only part of that was due to the FPU) and was a significantly larger die
(again, only partly the fault of

>> Yes, the G4 does much better because it expanded on most of the issues
>> that caused the FP unit pressures- dp multiplier and more renames. There
>
>Glad to hear it!

Oh, and I forgot to mention TRUE reservation stations, not the fake things
the 603(e) called reservation stations.

>> is still an issue related to the distance of the pipeline from the
>> dispatcher, though, IIRC.
>
>I'm curious - what is it?

The pipeline was moved further from the dispatcher to give priority to
Altivec. This leads to some signaling issues when the pipeline fills up
(how can the dispatcher dispatch an instruction when it can't get a signal
early enough in the clock cycle to let it know that space will be
available- reservation stations help a lot with this, though).

--
Skipper Smith Helpful Knowledge Consulting
Worldwide Microprocessor Architecture Training
PowerPC, ColdFire, 68K, CPU32 Hardware and Software

/* Remove no-spam. from the reply address to send mail directly */

Rupert Pigott

Feb 27, 2002, 3:28:38 AM
Bruce Hoult <br...@hoult.org> wrote in message
news:bruce-937E37....@news.paradise.net.nz...

> In article <a5hto6$jtu$1...@knossos.btinternet.com>, "Rupert Pigott"
> <Dark...@btinternet.com> wrote:
[SNIPPETY SNIP]

> > although to be fair
> > most people will care more about the virtual blinkenlicht
> > rendering code which may well be brand spanking new. I keep
> > hearing DPS type noises from OS-X though, and lemme see,
> > NeXT had a Mach/BSD core with a DPS rendering system.
>
> Except that OS X isn't DPS, but an entirely new rendering layer based on
> PDF instead of PostScript, with I think no Adobe code in it since Adobe
> wanted a per-copy amount of money for DPS that was reasonable for the
> NeXT market of niche applications but too much for Apple's market.

Take note of "DPS type noises" :P

I'm pretty sure I've stumbled over some free DPS implementation
recently. GNUStep related perhaps.

Cheers,
Rupert

Nick Maclaren

Feb 27, 2002, 4:55:26 AM
In article <cf4fab37.02022...@posting.google.com>,
Norbert Juffa <ju...@acm.org> wrote:
>nm...@cus.cam.ac.uk (Nick Maclaren) wrote in message news:<a5h629$kd0$1...@pegasus.csx.cam.ac.uk>...
>> In article <cf4fab37.02022...@posting.google.com>,
>> Norbert Juffa <ju...@acm.org> wrote:
>[...]
>> >> I read that as indicating that the G4 is comparable with Intel's
>> >> chips for GUI and similar work, and singularly useless for HPC :-)
>> >
>> >Funny, my interpretation is different: The MPC7455 seems comparable "in
>> >GUI work" to an Intel part that is outdated, but it is not even close to
>> >an Athlon or P4 one might find in a comparably priced PC available today.
>>
>> Er, did you see the results of investigations by various electronic
>> journals etc.? The Pentium 4 is pretty dire at such use, and I did
>> really mean that it is no better than the Pentium III despite its
>> high clock speed and Spec results.
>
>Yes, I read many of those sites. What does it tell me? The P4 2.2 GHz is
>about as fast as the Athlon 1.66 GHz, which is much faster than a 1 GHz
>PIII. So, by transitive closure the Athlon 1.66 GHz and 2.2 GHz P4 are
>still much faster than the G4. ...

Er, no. Some said that, but rather more indicated that a 1.7 GHz
Pentium 4 was comparable with a 733 MHz Pentium III. Now, things
HAVE changed as systems have been tuned for the Pentium 4, but it
is still the case that a LOT of important code is shipped as a
single binary - for very good reasons, if you know the third-party
software business.

Note that I said "comparable" - not "identical" - getting excited
over 10-20% improvements isn't worth the effort. The G4's posted
SpecFP is a factor of 4 down on the Pentium 4, which is a different
matter.

Nick Maclaren

Feb 27, 2002, 5:06:24 AM
In article <796f488f.02022...@posting.google.com>,

Paul Hsieh <q...@pobox.com> wrote:
>nm...@cus.cam.ac.uk (Nick Maclaren) wrote:
>> Norbert Juffa <ju...@acm.org> wrote:
>> >While using same frequency parts and same version compilers seems most
>> >fair, it does seem to have little practical relevance (IMHO). All these
>> >results show is that architectural performance of a 1 GHz MPC7455 is
>> >comparable to a 1 GHz PIII (at least for integer work).
>>
>> I would agree with the latter, but not the former! The former
>> remark also applies to the choice of language - it isn't uncommon
>> for different languages to be much better for different systems.
>
>You're right. It's unfair to measure the PPC on niche languages like
>"C". Perhaps you'd agree that we should measure the two running
>AppleScript.

You are being obtuse. A lot of the flame wars over Cray vector versus
commodity RISC missed the point that the former was best programmed
in Fortran and the latter often in C. The Cray vector systems were
dire when running C.

>Give me a break. The fairest comparison is to run what developers use
>on each platform. For Mac OS X versus Windows XP that would be
>Metrowerks versus MSVC++. According to that article, Metrowerks did
>worse than gcc, and other articles by c't magazine indicate that
>MSVC++ is actually better than gcc on the x86 for Spec CPU. Either
>way, it doesn't look good for the poor PPC. If we're picking the best
>compiler that is available to developers for the PPC for this test,
>perhaps we could fair it up by using Intel's compiler for the x86.

No, that tells you what the fastest SYSTEM for your purpose is,
which is what most users want to know. But it tells you very little
about what is the fastest machine (AS a machine), because the
difference could be entirely in the software.

>Personally, I don't see how investigating a machine that clocks in at
>187 on SpecFP is relevant to anything. A quick survey of the
>submitted results on spec.org shows that it gets beaten by every major
>shipping architecture, and typically by more than 2x. It actually
>gets beaten by MIPS (another "designed for embedded" architecture) and
>the old UltraSparcs -- now that's pretty lame.

Doubtless you don't. But perhaps you don't read German and aren't
familiar with the Apple and its users. That article clearly stated
that the SpecFP was dire, but that it was essentially irrelevant to
most Apple users. All that is true.

>This pathetic chip is obsolete by design. Why can't we just accept
>this and move on?

I don't know of ANY current general-purpose chip that isn't obsolete
by design, and there isn't any particular reason to believe that the
G4 is more obsolete than any other. Even if it WERE to underperform
on its target applications by a factor of two, that isn't the only
criterion.

Holger Bettag

Feb 27, 2002, 5:08:31 AM
Sander Vesik <san...@haldjas.folklore.ee> writes:

For a matrix multiplication, the bulk of flops should be fused multiply-adds,
thus peak is indeed 8 flops per MHz. The 2nd issue slot per clock can not
only be used for data movement, but also for fixed point operations, some
of which are of use for fp (e.g. sign changes). A 3rd issue slot is available
for loads and stores. The low percentage of peak might be due to a mismatch
between bandwidth and number crunching capability.
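The flop counting behind the "8 flops per MHz" figure can be sketched as follows. This is a minimal sketch, assuming 4 SIMD lanes and one vector fp instruction issued per cycle (the figures given elsewhere in this thread) and the usual convention that a fused multiply-add counts as 2 flops. The function names are illustrative, and the cycle bound ignores loads, stores and memory bandwidth entirely.

```python
def matmul_flops(n):
    """Flops for an n x n times n x n multiply, counting each multiply-add as 2."""
    return 2 * n ** 3


def peak_flops_per_cycle(simd_lanes, fp_issues_per_cycle, fused_multiply_add):
    """Peak flops per clock under the stated counting convention."""
    flops_per_lane = 2 if fused_multiply_add else 1
    return simd_lanes * fp_issues_per_cycle * flops_per_lane


def min_cycles(n, simd_lanes=4, fp_issues_per_cycle=1, fused_multiply_add=True):
    """Lower bound on cycles for the multiply if the FP pipe never stalls."""
    peak = peak_flops_per_cycle(simd_lanes, fp_issues_per_cycle, fused_multiply_add)
    return matmul_flops(n) / peak


if __name__ == "__main__":
    print(peak_flops_per_cycle(4, 1, True), "flops/cycle with fused multiply-add")
    print(peak_flops_per_cycle(4, 1, False), "flops/cycle without")
    print(min_cycles(100), "cycles at best for a 100 x 100 multiply")
```

Any shortfall against that bound then has to come from the other issue slots or the memory system, which is the bandwidth mismatch suggested above.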

Holger

Nick Maclaren

Feb 27, 2002, 5:08:43 AM
In article <a5hubi$eph$1...@sword.avalon.net>,

Douglas Siebert <dsie...@excisethis.khamsin.net> wrote:
>q...@pobox.com (Paul Hsieh) writes:
>
>>Intel announced the 1Ghz P-III in early March 2000 during their IDF
>>conference, and actually shipped units to customers later that month.
>>The Spec submission predated this announcement, and thus I think they
>>were just trying to make sure people didn't get any advance info on
>>their roadmap. Norbert's estimation of 2 years is only off by about a
>>week and a half.
>
>They announced it, and shipped a few, but it was well known that there
>were very few available until the end of that summer (it was referred
>to by some as a "paper launch" which Intel was rushed into by AMD's
>success at ramping the Athlon's clock speed)

And it didn't really hit the market in numbers for another 6 months.
I can't remember if it was another example of a faster chip not
being much use until the motherboard was upgraded - certainly, that
would account for Intel's dates on the Spec submissions.

Holger Bettag

Feb 27, 2002, 5:18:22 AM
ju...@acm.org (Norbert Juffa) writes:

[...]


> My conclusion is this: Except for a few cases where AltiVec comes into
> play, the G4 performance is not even close to current generation Athlons
> and P4s.

Correct.

> This doesn't come as a surprise to me. Looking at the G4 and
> the system architecture, and taking the frequency into account I'd say
> the G4 performs pretty much as expected, although I am a bit surprised
> at the low floating-point performance.
>

So am I; the FPU is actually quite nice.

With regard to AltiVec coming into play in few cases, I believe few people
ever thought about whether this is a matter of applicability of AltiVec.

Outside of Apple, there is no (AltiVec-)software market visible to the
general public. And inside Apple's pond, there is not enough money to be
made to justify big tuning efforts. Apple would need to give away licenses
of, say, VAST/AltiVec (an auto-vectorizer) for free if they wanted to make
a difference.

AltiVec itself can well make a difference; look at www.distributed.net for
a non-Photoshop example.

Holger

Sander Vesik

Feb 27, 2002, 6:28:02 AM
Peter Boyle <pbo...@holyrood.ed.ac.uk> wrote:
>
> On Wed, 27 Feb 2002, Sander Vesik wrote:
>
>
>> Actually, I think its 4*MHz unless they use MADD. Which gives quite high
>> percentage of peak, otoh, it can move Altivec register contents around
>> at the same time it is doing fp.
>
>
> Oh, come on. Peak is peak. No excuses.

Yes, peak is peak - they achieved 3.59 with ATLAS, which is almost 90% of
peak (assuming peak is 4 * MHz).

>
> You can quote fp instruction issue rates, speed of light for the
> particular instruction mix, but doctoring peak is like claiming
> a loss is a loan.... no wait Enron... bugger.

I'm not doctoring. The issue rate is 1 AltiVec fp instruction/cycle
and that instruction can operate on 4 fp values in parallel (SIMD).
Which IMHO comes down to MFLOPS of 4 * MHz. If they did use
fused mul-add and a mul-add counts as 2 flops for some reason,
then yes, it's 8 * MHz. Except the article didn't say which was the
case.

>
> Peter Boyle pbo...@physics.gla.ac.uk
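
The two readings of "peak" in this exchange differ only in whether a fused multiply-add is counted as one flop or two. A minimal sketch of that arithmetic follows, assuming the 3.59 figure is expressed in the same units as peak (flops per clock); the function names are hypothetical and only for illustration.

```python
# Peak single-precision flops per clock with one AltiVec fp instruction
# issued per cycle. Assumption: the quoted 3.59 is in flops per clock.
SIMD_LANES = 4  # four 32-bit floats per AltiVec register


def peak_flops_per_clock(flops_per_lane_op):
    """flops_per_lane_op is 1 for a plain add or multiply,
    2 if a fused multiply-add is counted as two flops."""
    return SIMD_LANES * flops_per_lane_op


def fraction_of_peak(achieved, peak):
    """Share of the theoretical peak that was actually reached."""
    return achieved / peak


achieved = 3.59
for label, per_op in (("mul-add counted as 1 flop", 1),
                      ("mul-add counted as 2 flops", 2)):
    peak = peak_flops_per_clock(per_op)
    share = fraction_of_peak(achieved, peak)
    print(f"{label}: peak {peak} flops/clock, achieved {share:.1%}")
```

Under the first reading the result is roughly 90% of peak; under the second it is roughly 45%, which is the gap the two posters are arguing about.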

Mark Horsburgh

Feb 27, 2002, 6:54:42 AM
On 26 Feb 2002 20:45:25 -0800, Paul Hsieh <q...@pobox.com> wrote:
<<< Rant snipped >>>

> Uhh ... the latest and greatest gcc is version 3.0, its come out
> recently, its a beta-quality compiler and is only currently available
> for the x86 afaik. By going back to 2.95 I think they were making the
> maximum possible concession to Apple by going with the latest compiler
> available for the PPC and trying to match it up as close as possible
> to something comparable for the x86. In otherwords, its state of the
> art for PPC, and ancient for the x86 (not the other way around.)
<<< More ranting snipped>>>

I'm going to ignore the rest of the rant, because it is more or less
just that. However, this section is just completely _made up_! You are
correct in saying that the latest and greatest gcc is 3.0(.4). However,
it targets 39 architectures including PPC and is by no means a
beta-quality compiler. Therefore, gcc-3.0 is the latest version for PPC
as well. However, if people are concerned about the stability of a 6
month old compiler (gcc-3.0 was released in June 2001), then 2.95 is the
correct fallback position. I also completely fail to see how 2.95 is
ancient for x86 since that is what _all_ current x86 Linux distributions
are using (apart from crap ones like RedHat). Besides, if it's ancient
for x86 then it's ancient for PPC - end of story.

You're in a completely unassailable position when you assert that x86
processors (AMD or Intel) are faster than Motorola ones for almost all
workloads. I fail to see how making things up will help your case - the
truth is _quite_ strong enough.

Mark

Bernd Paysan

Feb 27, 2002, 6:46:28 AM
Jeremy Fox wrote:
> More credible recent rumors suggest that Motorola is readying new
> G4's for the upcoming year. This G5 chip might exist but we have
> heard precious little that's concrete about it so far.
>
> http://www.theregister.co.uk/content/3/24018.html
>
> Best G5 rumors so far are at
>
> http://www.architosh.com/news/2002-02/2002b-0201-futurg5.phtml
>
> which admittedly are pretty hype-prone.

On the other hand, the Clawhammer has already been demoed yesterday
with Windows XP and SuSE Linux for x86-64, see

http://www.extremetech.com/article/0,3396,s=201&a=23304,00.asp
http://www.tomshardware.de/cpu/02q1/020227/index.html (lots of
photos, text in German; the American version of tomshardware doesn't
have that article online yet).

Tomshardware claims that the Windows XP was a 64 bit version, while
Fred Weber didn't give a definitive statement, just that the
availability of Linux for x86-64 will put Microsoft to shame if they
don't follow.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

Erik Corry

Feb 27, 2002, 7:13:37 AM
Mark Horsburgh <M.K.Ho...@damtp.cam.ac.uk> wrote:
> On 26 Feb 2002 20:45:25 -0800, Paul Hsieh <q...@pobox.com> wrote:
> <<< Rant snipped >>>
>> Uhh ... the latest and greatest gcc is version 3.0, its come out
>> recently, its a beta-quality compiler and is only currently available
>> for the x86 afaik. By going back to 2.95 I think they were making the
>> maximum possible concession to Apple by going with the latest compiler
>> available for the PPC and trying to match it up as close as possible
>> to something comparable for the x86. In otherwords, its state of the
>> art for PPC, and ancient for the x86 (not the other way around.)
> <<< More ranting snipped>>>

> I'm going to ignore the rest of the rant, because it is more or less
> just that. However, this section is just completely _made up_! You are
> correct in saying that the latest and greatest gcc is 3.0(.4). However,
> it targets 39 architectures including PPC and is is by no means a
> beta-quality compiler.

c't used 2.95.3 for MacOS because that is what
you get in the official developers' toolkit from Apple. I
guess they used the same compiler for x86 to be 'fair'.

--
Erik Corry er...@arbat.com
Interviewer: "Real programmers use cat as their editor."
Bill Joy: "That's right! There you go! It is too much trouble to say ed,
because cat's smaller and only needs two pages of memory."

Erik Corry

Feb 27, 2002, 7:25:32 AM
Bernd Paysan <bernd....@gmx.de> wrote:

> http://www.extremetech.com/article/0,3396,s=201&a=23304,00.asp
> http://www.tomshardware.de/cpu/02q1/020227/index.html (lots of
> photos, text in German, the american version of tomshardware doesn't
> have that article on line yet).

> Tomshardware claims that the Windows XP was a 64 bit version, while
> Fred Weber didn't give a definitive statement, just that the

According to all sources other than Tom's, the Windows version was
32 bit Windows. That includes AMD themselves.

http://www.amd.com/us-en/Corporate/VirtualPressRoom/0,,51_104_543~15118,00.html

> availability of Linux for x86-64 will put Microsoft to shame if they
> don't follow.

That is certainly true, though it was a little daring of an AMD
spokesman to point it out.

Congratulations to AMD and to the Linux-for-x86-64 team who
apparently did an entire port without a single chip to run it
on!

Nick Maclaren

Feb 27, 2002, 10:07:53 AM

In article <W2Qe8.18232$ZC3.1...@newsread2.prod.itd.earthlink.net>,
"Norbert Juffa" <ju...@earthlink.net> writes:
|>
|> ... So the comparison seems to be between a 2 year old x86 CPU
|> (a 1 GHz Pentium III) ...

I missed a trick here :-) I hadn't read Fister's presentation
at the IDF that is just going on.

To show the advantages of the Pentium 4 with 512 KB of cache,
he compared it with a 1 GHz Pentium III with 256 KB.

But, to show the advantages of the Itanic, he used an 800 MHz
Pentium III (and a 733 Itanic, to be fair)! The Sun comparison
has already been publicised, so I shall ignore it.

Well, if these G4 people were comparing against an obsolete
CPU, what was Fister doing?

Peter Boyle

Feb 27, 2002, 10:39:21 AM

On Wed, 27 Feb 2002, Sander Vesik wrote:

> > You can quote fp instruction issue rates, speed of light for the
> > particular instruction mix, but doctoring peak is like claiming
> > a loss is a loan.... no wait Enron... bugger.
>
> I'm not doctoring. the issue rate is 1 altivec fp instruction/cycle
> and that instruction can operate on 4 fp values in parallel (SIMD).
> Which IMHO comes down to MFLOPS of 4 * MHz. If they did use
> fused mul-add and a mull-add counts as 2 flops for some reason,
> then yes, its 8 * Mhz. Except the article didn't say which was the
> case.

I doubt Motorola marketing would agree with your view ;)

It doesn't matter what they used, there were 8 flops per clock
available, and that defines peak.

I'm curious why merging a multiply and add into one
instruction counts as 1 flop, but merging four adds into
one instruction counts as 4?

Peter

> >
> > Peter Boyle pbo...@physics.gla.ac.uk
> >
> >
> >
>
> --
> Sander
>
> +++ Out of cheese error +++
>

Peter Boyle pbo...@physics.gla.ac.uk

Douglas Siebert

Feb 27, 2002, 11:13:39 AM
Bernd Paysan <bernd....@gmx.de> writes:

>On the other hand, the Clawhammer has already been demoed yesterday
>with Windows XP and SuSE Linux for x86-64, see

>http://www.extremetech.com/article/0,3396,s=201&a=23304,00.asp
>http://www.tomshardware.de/cpu/02q1/020227/index.html (lots of
>photos, text in German, the american version of tomshardware doesn't
>have that article on line yet).

>Tomshardware claims that the Windows XP was a 64 bit version, while
>Fred Weber didn't give a definitive statement, just that the
>availability of Linux for x86-64 will put Microsoft to shame if they
>don't follow.


While I have little doubt that MS has a 64 bit port of Windows XP for
x86-64, I'd be very surprised if that's what was demoed. There's just
no reason for Microsoft to unnecessarily piss off Intel. If Microsoft
wanted to go out of their way to do that (for whatever reason) they'd
wait until Intel does a big showy announcement of McKinley, and do a
Clawhammer demo with AMD running on 64 bit Windows at that time. If
they don't want to piss Intel off, x86-64 Windows won't be known about
until Clawhammer is ready to ship.

Douglas Siebert

Feb 27, 2002, 11:27:27 AM
Erik Corry <er...@arbat.com> writes:

>The reason c't used 2.95.3 for MacOS is because that is what
>you get in the official developers' toolkit from Apple. I
>guess they used the same compiler for x86 to be 'fair'.


Yikes, Apple actually uses gcc for their native compiler? Talk about
handicapping yourself in a race. I know Jobs wanted to reinvent NeXT,
but that's just ridiculous!

I remember in the early PPC days, the way to get the best performance
on the Mac was to compile on an RS/6000. IBM's compilers produced
much better code than the stuff Apple's developer compiler did. All
the big name software (Photoshop, etc.) was done this way, and AFAIK
MacOS itself. They made the object format compatible for that reason
(among others, I'm sure). I wonder if OS/X has the same object file
format; if so, it should still be possible to compile code on an RS/6000
(using the Mac libraries and include files, not IBM's, obviously).

Norbert Juffa

Feb 27, 2002, 11:27:55 AM
"Nick Maclaren" <nm...@cus.cam.ac.uk> wrote in message
news:a5iaee$nja$1...@pegasus.csx.cam.ac.uk...

> In article <cf4fab37.02022...@posting.google.com>,
> Norbert Juffa <ju...@acm.org> wrote:
[...]
> >Yes, I read many of those sites. What does it tell me? The P4 2.2 GHz is
> >about as fast as the Athlon 1.66 GHz, which is much faster than a 1 GHz
> >PIII. So, by transitive closure the Athlon 1.66 GHz and 2.2 GHz P4 are
> >still much faster than the G4. ...
>
> Er, no. Some said that, but rather more indicated that a 1.7 GHz
> Pentium 4 was comparable with a 733 MHz Pentium III. Now, things

This seems like an isolated incident to me. I would be interested to know
which application demonstrates that P4 @ 1.7GHz == PIII @ 733 MHz.
I am not doubting that at least one such application exists. If you could
provide a link I would appreciate it.

I have seen results that indicate that the latest PIIIs with 1.4 GHz and
512K L2 can get close to a P4 2 GHz on some "GUI applications". But
doubling the L2 cache alone can make for a nice boost in some of the apps.

-- Norbert


Peter Boyle

Feb 27, 2002, 11:37:58 AM

On 26 Feb 2002, Skipper Smith wrote:


Hi,

> All it would have cost was a pipeline that was one or two stages longer
> (or accepted additions to other parts of the chip), along with a 15% or so
> larger FPU. That wasn't acceptable. As it was, even other unintended

I would guess that's an incremental 1.5-3.0 % on the whole die.
Not sure Apple would agree, but I guess IBM and Moto decided
they didn't care about fp performance with this chip :(

> Yes, it would have, given the defined constraints. People who wanted
> high-performance FP were directed to the 604, which WAS able to do
> single-cycle dispatching. The 604 burned about 4x the watts (granted,
> only part of that was due to the FPU) and was a significantly larger die
> (again, only partly the fault of)

The 604 had an entirely different life span: releases
in '94 - '97, and the 750 from '97 to today.
According to the Apple marketing blurb, the 750 was brought in to
replace the 604 (and indeed had vastly better integer perf, but
all the improved cache/bus structure bought it on FP was approximate
parity with the 604).

It is rather amusing to note that the G3 and G4 at 475 MHz obtain
identical integer scores: spec95(int/fp) of 21.4/13.8 and 21.4/20.4
respectively!

> The pipeline was moved further from the dispatcher to give priority to
> Altivec. This leads to some signaling issues when the pipeline fills up
> (how can the dispatcher dispatch an instruction when it can't get a signal
> early enough in the clock cycle to let it know that space will be
> available- reservation stations help a lot with this, though).

Presumably this is related to the fp latency being 4 and the AltiVec
fp latency 3 on the later G4s?

Peter

> --
> Skipper Smith Helpful Knowledge Consulting
> Worldwide Microprocessor Architecture Training
> PowerPC, ColdFire, 68K, CPU32 Hardware and Software
>
> /* Remove no-spam. from the reply address to send mail directly */
>

Peter Boyle pbo...@physics.gla.ac.uk

Norbert Juffa

Feb 27, 2002, 11:44:22 AM

"Rupert Pigott" <Dark...@btinternet.com> wrote in message
news:a5h8tr$iqd$1...@paris.btinternet.com...
>
> Norbert Juffa <ju...@acm.org> wrote in message
> news:cf4fab37.02022...@posting.google.com...
> [SNIP]

> > While using same frequency parts and same version compilers seems most
> > fair, it does seem to have little practical relevance (IMHO). All these
> > results show is that architectural performance of a 1 GHz MPC7455 is
> > comparable to a 1 GHz PIII (at least for integer work).
>
> There are other considerations of course... Power consumption
> and cost of the part. These have implications kinds of system
> you can build and price points of them. If I was sticking a
> few hundred chips into a wardrobe, I'd want something which
> gave decent performance, but had low external component count
> and didn't burn megawatts.

I fully agree. Different strokes for different folks as they say. And as I
have said elsewhere in this thread the G4 processors make excellent
embedded parts (at least for the stuff I am working on).

However, in discussing SPEC performance we are implicitly discussing
the realm of high-end PCs, workstations, and the like (although Apple
marketing would make you believe we are talking about supercomputers
in this particular thread). And in that particular context, the G4 turns out
to be an also-ran. I don't find this particularly surprising, but it's nice
to finally see a decent cross-platform benchmark comparison instead of
comparisons based on some Photoshop filters tweaked to the hilt using
AltiVec. At least with SPEC everybody has a chance to scrutinize the
benchmark programs themselves.


> PIII's certainly weren't designed with as much emphasis on
> low-cost embedded systems as MPC7455's.

I would be surprised to learn that they were designed with _any_ emphasis
on the needs of the embedded market.

-- Norbert

Sander Vesik

Feb 27, 2002, 11:49:58 AM
Peter Boyle <pbo...@holyrood.ed.ac.uk> wrote:
>
> On Wed, 27 Feb 2002, Sander Vesik wrote:
>
>> > You can quote fp instruction issue rates, speed of light for the
>> > particular instruction mix, but doctoring peak is like claiming
>> > a loss is a loan.... no wait Enron... bugger.
>>
>> I'm not doctoring. the issue rate is 1 altivec fp instruction/cycle
>> and that instruction can operate on 4 fp values in parallel (SIMD).
>> Which IMHO comes down to MFLOPS of 4 * MHz. If they did use
>> fused mul-add and a mull-add counts as 2 flops for some reason,
>> then yes, its 8 * Mhz. Except the article didn't say which was the
>> case.
>
> I doubt Motorola marketing would agree with your view ;)
>
> It doesn't matter what they used, there were 8 flops per clock
> available, and that defines peak.
>
> I'm curious why merging a multiply and add into one
> instruction counts as 1 flop, but merging four adds into
> one instruction counts as 4?

Because that mul-add may not be in use (rounding issues). And also,
four adds are *not* really merged into one instruction - all the adds
are independent of each other.

Architectures that have mul-add or similar have two "peak" rates, as
using the mul-add is not possible in all cases (and possibly never,
in some environments). The 4xMHz rate is always the peak; the other
may just not be applicable.
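
The "rounding issues" with a fused mul-add can be shown with a small sketch. It emulates a fused operation by computing the exact result with rational arithmetic and rounding once, then compares that with a separate multiply and add. It uses Python doubles rather than AltiVec's single precision, and the function names are hypothetical, so take it as an illustration of the effect rather than of any particular hardware.

```python
from fractions import Fraction


def mul_add_two_roundings(a, b, c):
    """Separate multiply and add: the product is rounded before the add."""
    return a * b + c


def mul_add_one_rounding(a, b, c):
    """Emulated fused multiply-add: exact product and sum, rounded once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# The exact product is 1 - 2**-54, which lies halfway between two doubles
# and rounds to 1.0 when the multiply is performed on its own.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

print("separate mul and add:", mul_add_two_roundings(a, b, c))
print("fused (emulated):    ", mul_add_one_rounding(a, b, c))
```

The separate version returns exactly zero while the fused version keeps the small residual, so code that has to reproduce results bit for bit across machines may be unable to use the fused instruction.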

Nick Maclaren

Feb 27, 2002, 11:52:20 AM

In article <fO7f8.20221$0C1.1...@newsread1.prod.itd.earthlink.net>,

"Norbert Juffa" <ju...@earthlink.net> writes:
|> "Nick Maclaren" <nm...@cus.cam.ac.uk> wrote in message
|> news:a5iaee$nja$1...@pegasus.csx.cam.ac.uk...
|>
|> > >Yes, I read many of those sites. What does it tell me? The P4 2.2 GHz is
|> > >about as fast as the Athlon 1.66 GHz, which is much faster than a 1 GHz
|> > >PIII. So, by transitive closure the Athlon 1.66 GHz and 2.2 GHz P4 are
|> > >still much faster than the G4. ...
|> >
|> > Er, no. Some said that, but rather more indicated that a 1.7 GHz
|> > Pentium 4 was comparable with a 733 MHz Pentium III. Now, things
|>
|> This seems like an isolated incident to me. I would be interested to know
|> which application demonstrates that P4 @ 1.7GHz == PIII @ 733 MHz.
|> I am not doubting that at least one such application exists. If you could
|> provide a link I would appreciate it.

It was quite a few of them, actually. I can't provide a reference
offhand, but look at the ones that appeared shortly after release.
The common property of the applications was that they were wildly
branching spaghetti C (which a huge number of GUIs are), so the cost of
a Pentium 4 pipeline drain was (and is) a serious matter.
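
A back-of-the-envelope model shows why branchy code erodes a clock-speed advantage when the faster part pays more cycles per pipeline drain. Every number below (base CPI, branch fraction, flush penalties) is a hypothetical placeholder chosen for illustration, not a measurement of any real Pentium III or Pentium 4.

```python
def time_per_instruction_ns(clock_mhz, base_cpi, branch_fraction,
                            mispredict_rate, flush_penalty_cycles):
    """Average time per instruction, charging a full pipeline flush
    for every mispredicted branch."""
    cpi = base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles
    return cpi * 1000.0 / clock_mhz  # 1000 / MHz = nanoseconds per cycle


def speedup(mispredict_rate, branch_fraction=0.2):
    """Speed of a hypothetical long-pipeline part relative to a
    hypothetical short-pipeline part at a given misprediction rate."""
    # Assumed: same base CPI, the higher-clocked part has twice the penalty.
    short_pipe = time_per_instruction_ns(733, 1.0, branch_fraction,
                                         mispredict_rate, 10)
    long_pipe = time_per_instruction_ns(1700, 1.0, branch_fraction,
                                        mispredict_rate, 20)
    return short_pipe / long_pipe


for rate in (0.0, 0.05, 0.15, 0.30):
    print(f"mispredict rate {rate:.0%}: "
          f"long pipeline is {speedup(rate):.2f}x faster")
```

With no mispredictions the speedup equals the clock ratio, and it shrinks steadily as the misprediction rate rises. In this toy model it stays above parity, so other factors such as cache size would have to account for the rest of any reported gap.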

|> I have seen results that indicate that the latest PIIIs with 1.4 GHz and
|> 512K L2
|> can get close to a P4 2 GHz on some "GUI applications". But doubling the L2
|> cache alone can make for a nice boost in some of the apps.

That is the other factor. No Pentium 4 below 2 GHz has a 512 KB
cache. I can't tell you which of the numerous Pentium III cache
approaches is best for this purpose, or was used in the comparisons.
Obviously, the 512 KB of the Pentium 4 2.2 may give a significant
advantage over 256 KB or a slower cache.


I rather tend to favour the belief that comparing CPU speeds is
a waste of time. All you can compare without significant effort
is system speeds, and the CPU is only one component. But a lot
of people do it, and the G4 bunch were trying to.

Also, this viewpoint implies that any talk of 'obsolete designs'
is nonsense - which brings back the thread of how fast would a
68020 run if reimplemented with modern process technology and a
large cache. It isn't obvious.

Norbert Juffa

Feb 27, 2002, 12:03:32 PM

"Anton Ertl" <an...@mips.complang.tuwien.ac.at> wrote in message
news:a5grb5$9lr$1...@news.tuwien.ac.at...
> In article <W2Qe8.18232$ZC3.1...@newsread2.prod.itd.earthlink.net>,
> "Norbert Juffa" <ju...@earthlink.net> writes:
[...]
> >I read the German article pointed to by Julian and one thing that is
> >pointed out there that wasn't clear to me is that gcc 2.95.3 for x86
> >Linux is from 1999. So the comparison seems to be between a 2 year old
> >x86 CPU using a 3 year old compiler versus the latest and greatest from
> >Apple?
>
> They used gcc-2.95.2 for the G4. Of course you are right wrt the
> CPUs.
>
> As for gcc-2.95 vs. gcc-3.0, you can find SPEC95 results at
>
> http://www.complang.tuwien.ac.at/anton/spec95/athlon-800/CINT95.041.asc
>
> http://www.complang.tuwien.ac.at/anton/spec95/athlon-800/gcc-3.0/CINT95.051.asc
>
> 26.8 for gcc-2.95.1 and 27.5 for gcc-3.0 on the same machine
> (Athlon-800).


Thanks for the data (data is so much better than conjecture!). I wonder
whether there might be bigger differences on the high-end Athlon/P4s? The
c't magazin article linked by Julian mentions that "using gcc 3.0 or the
Intel compiler, SPEC runs 10% to 20% faster" as compared to executables
produced with gcc 2.95.3, which seems to be a much bigger difference than
shown by your data set.
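
To make the size of that discrepancy concrete, here is a small sketch that turns the two quoted SPEC95 integer scores into a percentage gain. Note that these are SPEC95 numbers on an Athlon-800, while the c't statement concerns its own SPEC runs, so the two are not strictly comparable; the helper name is hypothetical.

```python
def percent_gain(old_score, new_score):
    """Relative improvement of new_score over old_score, in percent."""
    return (new_score - old_score) / old_score * 100.0


# Scores quoted above: gcc-2.95.1 vs gcc-3.0 on the same Athlon-800.
gain = percent_gain(26.8, 27.5)
print(f"gcc-3.0 over gcc-2.95.1: {gain:.1f}% faster")
```

That works out to about 2.6%, well short of the 10% to 20% mentioned in the article.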

> I found it interesting that gcc outperformed both proprietary
> compilers in this comparison.

When is the last time Microsoft cared much for performance in their C
compiler? I think it was around 1996-1997 when they de-throned Watcom as
the compiler with the fastest executables. The only C/C++ compiler on x86
singularly tuned for performance seems to be Intel C/C++. Unfortunately,
reports surface from time to time that it is still not very robust for
big, ambitious projects. It works fine for the stuff I do.

It is interesting to note that there is little competition in C/C++
compilers. It's basically a monoculture with MSVC dominating the Windows
world and gcc dominating the Linux world. I think some competition would
be good for end users. There is much more competition so far in the
Fortran compiler market on both Linux and Windows (4-5 players each) which
I really appreciate. Not surprisingly some nice improvements in the
quality of optimizations have taken place over the past few years.

-- Norbert

Norbert Juffa

Feb 27, 2002, 12:13:16 PM

"Jeremy Fox" <jer...@Stanford.EDU> wrote in message
news:a5hujq$lvi$1...@usenet.Stanford.EDU...
> Douglas Siebert <dsie...@excisethis.khamsin.net> wrote:
> : "Norbert Juffa" <ju...@earthlink.net> writes:
>
> : I imagine Apple is planning that for the G5. If the rumors about its
> : performance are true, it'll beat the Pentium 4 easily and give a run
> : for the money to POWER4 and 21364.

Ah, I did not write this. I think this part should be attributed to Douglas?

Anyhow I would be highly surprised if the G5 would give P4/Athlons/
Hammers/Alphas etc a run for the money at introduction. It's not like
the development of these platforms is standing still.

I have to admit that some of the G5 rumor pages are very well made up.
One was so good I almost fell for it (luckily the folks at aceshardware
set me straight before that could happen). It should be of interest that
rumors about the G5 have been around for years. I think I found old
references using Google going back to 1999 or earlier.

-- Norbert


Norbert Juffa

Feb 27, 2002, 12:23:29 PM

"Nick Maclaren" <nm...@cus.cam.ac.uk> wrote in message
news:a5iso9$eh5$1...@pegasus.csx.cam.ac.uk...

>
> In article <W2Qe8.18232$ZC3.1...@newsread2.prod.itd.earthlink.net>,
> "Norbert Juffa" <ju...@earthlink.net> writes:
> |>
> |> ... So the comparison seems to be between a 2 year old x86 CPU
> |> (a 1 GHz Pentium III) ...
>
> I missed a trick here :-) I hadn't read Fister's presentation
> at the IDF that is just going on.
>
> To show the advantages of the Pentium 4 with 512 KB of cache,
> he compared it with a 1 GHz Pentium III with 256 KB.
>
> But, to show the advantages of the Itanic, he used an 800 MHz
> Pentium III (and a 733 Itanic, to be fair)! The Sun comparison
> has already been publicised, so I shall ignore it.
>
> Well, if these G4 people were comparing against an obsolete
> CPU, what was Fister doing?

No doubt Intel had some marketing in mind. I don't think they
were trying to present an "honest" comparison of performance
between the two architectures using two parts built in the same
process (which would have them compare a 1.4 GHz PIII w/
512KB cache to a 2.2 GHz P4 w/ 512KB cache). What was
the context of the comparison?

I could imagine that Intel is disappointed that people are moving
to P4 more slowly than they had hoped. So in order to make P4
look good, they trashed the old PIII.

Seems to me they have done this before. Didn't they have a "4 is
better than 3" campaign when they switched from 386 to 486?

I will also point out that these cases are not comparable, as the
G4 SPEC comparison was done by a reputable, independent
German computer magazine, not by Apple. I'd guess that the
major reason for c't to do the comparison the way they did is
to avoid getting flamed by the Apple crowd. So instead of testing
the latest Apple box against the latest Dell box they tried to make
the configurations of the two systems as close as possible.

-- Norbert

Holger Bettag

Feb 27, 2002, 12:25:11 PM
Peter Boyle <pbo...@holyrood.ed.ac.uk> writes:

[...]


> The 604 had an entirely different life span: releases
> in '94 - '97, and the 750 from '97 to today.
> According to the Apple marketing blurb, the 750 was brought in to
> replace the 604 (and indeed had vastly better integer perf, but
> all the improved cache/bus structure bought it on FP was approximate
> parity with the 604).
>

Actually, the "third generation" of PPCs had continued both lines:
cheap low-power with the 750 (based on the 603e), and high-performance with
a PPC760. Unfortunately, the latter (based on 604e) did not perform
significantly better than its 'smaller' sibling, despite using lots more
silicon and power. The natural decision was to sell it under the name
604e and stop the line there.

The only thing left to be done for the next iteration of the cheap low power
family line was to beef up the FPU, which the MPC7400 does, along with
implementing AltiVec. The two lines finally merged once and for all with
the MPC7450, which on top of that implements a very nice integer multiplier
like the 604 always had.

> It is rather amusing to note that the G3 and G4 at 475 Mhz obtain
> identical spec95(int/fp) of 21.4/13.8 and 21.4/20.4 respectively!
>

With SPECfp, there is always the question whether improvements are due to
FP processing capability or bandwidth. The 7400 has a much improved bus
protocol compared to the 750.

> > The pipeline was moved further from the dispatcher to give priority to
> > Altivec. This leads to some signaling issues when the pipeline fills up
> > (how can the dispatcher dispatch an instruction when it can't get a signal
> > early enough in the clock cycle to let it know that space will be
> > available- reservation stations help a lot with this, though).
>
> Presumably this is related to the fp latency being 4 and the altivec
> fp latency 3 on the later G4's?
>

7400/7410: VFPU has 4 stages, FPU has 3 stages
7450/7451/7455: VFPU has 4 stages, FPU has 5 stages (appears like 6 sometimes)

Holger

Kai Habel

Feb 27, 2002, 12:26:30 PM


Hello,

to give you more data have a look at

http://www.suse.de/~aj/SPEC/CINT/sandbox-b/index.html

A. Jaeger tracks the SPEC2000 performance for the current gcc-cvs
compared to 3.0 and 2.95.

Bye Kai

Sander Vesik

Feb 27, 2002, 12:25:57 PM
Douglas Siebert <dsie...@excisethis.khamsin.net> wrote:
> Erik Corry <er...@arbat.com> writes:
>
>>The reason c't used 2.95.3 for MacOS is because that is what
>>you get in the official developers' toolkit from Apple. I
>>guess they used the same compiler for x86 to be 'fair'.
>
>
> Yikes, Apple actually uses gcc for their native compiler? Talk about
> handicapping yourself in a race. I know Jobs wanted to reinvent NeXT,
> but that's just ridiculous!

Well, how many other Objective-C compilers are out there?

>
> I remember in the early PPC days, the way to get the best performance
> on the Mac was to compile on an RS/6000. IBM's compiler's produced
> much better code than the stuff Apple's developer compiler did. All
> the big name software (Photoshop, etc.) was done this way, and AFAIK
> MacOS itself. They made the object format compatible for that reason
> (among others, I'm sure) I wonder if OS/X has the same object file
> format, if so it should still be possible to compile code on an RS/6000
> (using the Mac libraries and include files, not IBM's, obviously)
>

No, OS/X is said to have its own strange object format...

> --
> Douglas Siebert dsie...@excisethis.khamsin.net
>
> A good friend will help you move, a true friend will help you move a body.

--

David T. Wang

Feb 27, 2002, 1:08:37 PM
Norbert Juffa <ju...@acm.org> wrote:

> As for the floating-point performance, I wonder how much of that is the
> compiler and how much is due to the FPU in the G4 or the memory subsystem?
> SPECfp loves memory bandwidth and I think the Macintosh only uses PC133
> SDRAM (versus DDR in Athlon systems and RDRAM in high-end P4 systems).
> Absoft Fortran has OK performance on x86 as per www.polyhedron.co.uk, so
> I have to assume that their PPC compiler isn't shabby either. But one can
> ever be sure.

I believe that something else is going on. A Pentium III tied to a 100 MHz
FSB and 100 MHz ECC PC100 SDRAM (using Intel compilers) can still get a
SPECfp of about 205.

With higher frequency, more bandwidth, there ought to be some headroom
above the Intel P !!! scores.

--
E PLURIBUS UNUM

ddaavveewwaanngg@@wwaamm..uummdd..eedduu

Paul Hsieh

Feb 27, 2002, 5:21:50 PM
> >Yes, I read many of those sites. What does it tell me? The P4 2.2 GHz is
> >about as fast as the Athlon 1.66 GHz, which is much faster than a 1 GHz
> >PIII. So, by transitive closure the Athlon 1.66 GHz and 2.2 GHz P4 are
> >still much faster than the G4. ...
>
> Er, no. Some said that, but rather more indicated that a 1.7 GHz
> Pentium 4 was comparable with a 733 MHz Pentium III.

I'm going to have to ask for a citation on that. I don't believe any
test showed that dramatic of a difference. Pretty much all the
current reviews have concurred with Norbert's claims.

--
Paul Hsieh
http://www.pobox.com/~qed/apple.html

Paul Hsieh

Feb 27, 2002, 5:34:08 PM
dsie...@excisethis.khamsin.net (Douglas Siebert) wrote:
> q...@pobox.com (Paul Hsieh) writes:
>
> >Intel announced the 1Ghz P-III in early March 2000 during their IDF
> >conference, and actually shipped units to customers later that month.
> >The Spec submission predated this announcement, and thus I think they
> >were just trying to make sure people didn't get any advanced info on
> >their roadmap. Norbert's estimation of 2 years is only off by about a
> >week and a half.
>
> They announced it, and shipped a few, but it was well known that there
> were very few available until the end of that summer (it was referred
> to by some as a "paper launch" which Intel was rushed into by AMD's
> success at ramping the Athlon's clock speed)

That's true, but as I recall it only took a month or two before Intel
ramped up serious volume on the 1Ghz, still beating the supposed
August 2K date significantly. Apple's availability track record
recently certainly has been no better. Intel is saddled with the
unique problem of supplying the majority of the desktop CPUs to the
world, so "low volumes" for them may far exceed the peak shipments by
either AMD or Apple. So it's all relative anyway. If Intel says they
are available, and Dell actually shipped some (and there wasn't an
immediate recall) then there is little you can argue about whether or
not it actually shipped.

Paul Hsieh

Feb 27, 2002, 5:57:08 PM2/27/02
to
nm...@cus.cam.ac.uk (Nick Maclaren) wrote:
> Paul Hsieh <q...@pobox.com> wrote:
> >nm...@cus.cam.ac.uk (Nick Maclaren) wrote:

> >> Norbert Juffa <ju...@acm.org> wrote:
> >> >While using same frequency parts and same version compilers seems most
> >> >fair, it does seem to have little practical relevance (IMHO). All these
> >> >results show is that architectural performance of a 1 GHz MPC7455 is
> >> >comparable to a 1 GHz PIII (at least for integer work).
> >>
> >> I would agree with the latter, but not the former! The former
> >> remark also applies to the choice of language - it isn't uncommon
> >> for different languages to be much better for different systems.
> >
> >You're right. It's unfair to measure the PPC on niche languages like
> >"C". Perhaps you'd agree that we should measure the two running
> >AppleScript.
>
> You are being obtuse. [...]

It is you who is being obtuse. What language difference are you
referring to in the PPC versus x86 comparison? You think x86's are
unusually good architectures for either Fortran or C versus the PPC?
You think everything should be recompiled in the esoteric "Objective
C" because it's the wave of the future or something? I just don't
understand your comment.

> [...] A lot of the flame wars over Cray vector versus
> commodity RISC missed the point that the former was best programmed
> in Fortran and the latter often in C. The Cray vector systems were
> dire when running C.

Well that's nice. But Cray is a research architecture obviously built
for scientific purposes. Apple claims the Macs to be desktop
computers (they also claim they are supercomputers -- maybe they think
this because they suck at C-code like the Cray.) This example does
not help me understand what you are saying.

> >Give me a break. The fairest comparison is to run what developers use
> >on each platform. For Mac OS X versus Windows XP that would be
> >Metrowerks versus MSVC++. According to that article, Metrowerks did
> >worse than gcc, and other articles by c't magazine indicate that
> >MSVC++ is actually better than gcc on the x86 for Spec CPU. Either
> >way, it doesn't look good for the poor PPC. If we're picking the best
> >compiler that is available to developers for the PPC for this test,
> >perhaps we could fair it up by using Intel's compiler for the x86.
>
> No, that tells you what the fastest SYSTEM for your purpose is,
> which is what most users want to know.

(The whole point of what Spec CPU is trying to accomplish ... or most
people's informed purchasing decisions for that matter)

> [...] But it tells you very little about what is the fastest machine (AS a
> machine), because the difference could be entirely in the software.

If you want to know just how fast the machine can rev its engines,
you might as well just look at the machine specs and the
architecture (either way, you cannot dismiss clock rate.) But on that
account, with the exception of AltiVec, the G4 will lose even more
dramatically.

> >Personally, I don't see how investigating a machine that clocks in a
> >187 on SpecFP is relevant to anything. A quick survey of the
> >submitted results on spec.org show that it gets beat by every major
> >shipping architecture, and typically by more than 2x. They actually
> >get beat by MIPS (another "designed for embedded" architecture) and
> >the old UltraSparcs -- now that's pretty lame.
>
> Doubtless you don't. But perhaps you don't read German and aren't
> familiar with the Apple and its users. That article clearly stated
> that the SpecFP was dire, but that it was essentially irrelevant to
> most Apple users. All that is true.

The SpecInt looked pretty dire to me as well. 1 GHz P-IIIs are very
obsolete CPUs. We're talking notebook speed -- and the G4 can't
manage to even beat that. The SpecInt results are not as dramatically
bad; they just indicate that the Mac/G4 is a worse architecture. The
Spec FP results tell us that in fact it's a joke.

Paul Hsieh

Feb 27, 2002, 6:22:17 PM2/27/02
to
Bruce Hoult <br...@hoult.org> wrote:

> q...@pobox.com (Paul Hsieh) wrote:
> > nm...@cus.cam.ac.uk (Nick Maclaren) wrote:
> > > "Norbert Juffa" <ju...@earthlink.net> writes:
> > > |> "Anton Ertl" <an...@mips.complang.tuwien.ac.at> wrote:
> > > |> > c't 5/2002, pages 182-183 publishes SPEC CPU2000 results for various
> > > |> > Apple G4s and a Pentium III/1000 (all with 512MB RAM in single-user
> > > |> > mode). Here they are:
> > > |> >
> > > |> > C compiler SPECint_base SPECfp_base
> > > |> > G4 800 gcc 242 147
> > > |> > G4 867 gcc 259 153
> > > |> > G4 1000 gcc 306 187
> > > |> > P3 1000 gcc 309
> > > |> > P3 1000 MSC 236
> >
> > You know, I remember a time where I dared to suggest that the PPC
> > architecture was highly unlikely to have higher IPC than the P6 or
> > Athlon. Then one "Bruce Hoult" berated me for ignoring all the
> > authoritative Apple demos that clearly showed otherwise. He was
> > incredulous that I could possibly believe that the PPC did not have a
> > significantly higher IPC.
>
> Nice that I'm in your thoughts and that you remember how to spell my
> name, but a pity that you don't recall that was *before* Motorola gave
> in to the "MHz is everything" marketing people and redesigned the G4
> with a longer pipeline and higher latency to cache.

They did? Well when is Motorola going to release these high clock
rate CPUs? And why haven't we heard anything about them?

> Performance of the last of the old G4's -- the 533 MHz -- was much the
> same as the next G4 at 733 MHz.

Really? Based on what performance benchmarks? Certainly not Spec CPU
2K, and I have never found any Spec CPU 2K runs on a PPC besides from
IBM (who has gone a different route to achieving performance), or this
recent article.

I remember trying to get details about a fabled Spec95 run out of
Motorola and they suggested how the results *might* have come about,
but never straight out answered the question about how they were
really obtained. Apple and Motorola are cowards about revealing the
performance of their CPUs and these recent results show why.

> Given those implementation changes, you'd *expect* the new G4 to be
> about the same IPC as the P3. And surprise, surprise, it is! Woopie.
> The marketing department wins, the technical designers lose.

Whose marketing department? Who won what? I don't understand. The
currently shipping G4 CPUs are still clocked unacceptably low,
performance results such as these show the architecture doesn't make
up for it, and now they are trying to ship two processors in a box to
make it seem like they are delivering more performance. I don't get
it.

--
Paul Hsieh
http://www.pobox.com/~qed/
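
One rough way to put numbers on the per-clock argument in this exchange is to divide each SPECint_base figure quoted from c't by the part's clock rate. The sketch below does only that. Score per MHz is a crude stand-in for IPC, since instruction counts differ between PPC and x86 and between compilers, and the names `specint_base` and `score_per_mhz` are made up for this illustration.

```python
# SPECint_base figures as quoted in this thread from c't 5/2002 (gcc builds only).
# Each entry maps a machine label to (clock in MHz, SPECint_base).
specint_base = {
    "G4 800": (800, 242),
    "G4 867": (867, 259),
    "G4 1000": (1000, 306),
    "P3 1000": (1000, 309),
}


def score_per_mhz(results):
    """Return SPECint_base divided by clock rate for every machine."""
    return {name: score / mhz for name, (mhz, score) in results.items()}


per_mhz = score_per_mhz(specint_base)
for name, value in per_mhz.items():
    print(f"{name:8s} {value:.3f} SPECint_base per MHz")

# Per-clock standing of the 1 GHz G4 relative to the 1 GHz P3 (1.0 means equal).
g4_vs_p3 = per_mhz["G4 1000"] / per_mhz["P3 1000"]
print(f"G4 1000 relative to P3 1000, per clock: {g4_vs_p3:.3f}")
```

On these figures the 1 GHz G4 and the 1 GHz P3 land within about one percent of each other per clock, which is consistent with the "about the same IPC" remark quoted above, for integer code and gcc only.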

Bruce Hoult

Feb 27, 2002, 6:41:55 PM2/27/02
to
In article
<Pine.SOL.4.33.020227...@holyrood.ed.ac.uk>, Peter
Boyle <pbo...@holyrood.ed.ac.uk> wrote:

> It is rather amusing to note that the G3 and G4 at 475 MHz obtain
> identical spec95(int/fp) of 21.4/13.8 and 21.4/20.4 respectively!

That's what you'd expect, since those early G4's *were* G3 + better DP
FPU + AltiVec.

-- Bruce

Alexis Cousein

Feb 27, 2002, 3:16:29 AM2/27/02
to JD
JD wrote:
> From what I have seen (and I use it a lot), the Intel C compiler isn't
> just good at Spec, but is good in general.

These days, yes. Nick probably refers to a close encounter with an
older incarnation, for which I can't say I can *totally* disagree with
Nick: SPEC codes were certainly the first ones to be treated kindly,
although it is much to Intel's credit that they didn't stop working on
the compiler even when SPEC codes are compiled into decent code.

<these messages express my own views, not those of my employer>
Alexis Cousein Senior Systems Engineer
SGI Belgium and Luxemburg a...@brussels.sgi.com

JD

Feb 28, 2002, 12:27:33 AM2/28/02
to

"Alexis Cousein" <a...@brussels.sgi.com> wrote in message news:3C7C95DD...@brussels.sgi.com...

> JD wrote:
> > From what I have seen (and I use it a lot), the Intel C compiler isn't
> > just good at Spec, but is good in general.
>
> These days, yes. Nick probably refers to a close encounter with an
> older incarnation, for which I can't say I can *totally* disagree with
> Nick: SPEC codes were certainly the first ones to be treated kindly,
> although it is much to Intel's credit that they didn't stop working on
> the compiler even when SPEC codes are compiled into decent code.
>
(Sorry that I also made a private response -- but it says essentially
the same as below.)

Yep, I really meant only to imply that I have been using the recent
versions of Intel C and C++. It really seems to do most of the
optimizations that I would try to do in a perfect world (e.g. unlimited
resources), if I wrote the compiler. It seems to meet my expectations,
with only a few serious compiler bugs. (Maybe I have run into one
over the last year or so.) If using other optimizing compilers as
heavily, where the compiler is really doing a lot of stuff over and
above acting as a semantic macro expander, then I would have
expected equivalent or worse quality behavior than I had experienced
using the ICC.

I have to admit that GCC has a lot of programmer friendly features,
and a pretty good inline asm scheme, but ICC does normal C, C++
compiler optimizations very, very well. When stretching my C++
skills (which aren't nearly as expert as my normal C programming),
I seem to have fewer problems satisfying the G++ compiler and
template libraries.

John

Nick Maclaren

Feb 28, 2002, 4:17:21 AM2/28/02
to
In article <3C7C95DD...@brussels.sgi.com>,

Alexis Cousein <a...@brussels.sgi.com> wrote:
>JD wrote:
>> From what I have seen (and I use it a lot), the Intel C compiler isn't
>> just good at Spec, but is good in general.
>
>These days, yes. Nick probably refers to a close encounter with an
>older incarnation, for which I can't say I can *totally* disagree with
>Nick: SPEC codes were certainly the first ones to be treated kindly,
>although it is much to Intel's credit that they didn't stop working on
>the compiler even when SPEC codes are compiled into decent code.

Hmm. That is true, but perilously close to revisionism!

For the best part of 15 years, Intel's compilers were almost entirely
targeted at producing impressive (to the naive) benchmark figures,
and were neither useful nor released. There was one immortal occasion
when it came out that the reason that Intel neither passed on their
compiler nor used it internally was because it didn't compile any of
the standard languages! It accepted just the subset used in the
benchmarks, and even then was semantically incompatible (though it
was compatible on the benchmark codes).

Now, quite a long time back (5-10 years), Intel appeared to have a
change of heart, and to want to produce 'real' compilers. But it
took a LONG time for the change to take effect - which is not
surprising to those who know the technical, managerial and (most of
all) psychological problems involved in such changes of direction.
The effect that you are referring to is one of the lingering ones.

But I wasn't actually referring to that, so much as the fact that
comparing one machine using a compiler (real or otherwise) heavily
optimised for Spec with another using a compiler that is not much
optimised tells us more about the compilers than the machines!
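
To see how large that effect can be, the sketch below sets the machine-to-machine gap under one compiler against the compiler-to-compiler gap on one machine, using only SPECint figures quoted in this thread. It is an illustration under stated assumptions, not a result: the 402 figure for Intel's compiler is the lower of the two numbers ("402/451") quoted elsewhere in the thread, the thread does not say what distinguishes the two or which exact system produced them, and the variable names are invented here.

```python
# SPECint_base with gcc, as quoted from c't 5/2002.
gcc_on_g4_1000 = 306
gcc_on_p3_1000 = 309

# ASSUMPTION: lower of the two Intel-compiler figures ("402/451") quoted in the
# thread against gcc's 309; system configuration and base/peak status unknown.
icc_on_p3 = 402

# Gap between the two machines when both use gcc.
machine_gap = gcc_on_p3_1000 / gcc_on_g4_1000 - 1.0
# Gap between the two compilers on the x86 side.
compiler_gap = icc_on_p3 / gcc_on_p3_1000 - 1.0

print(f"same compiler, different machine: {machine_gap:+.1%}")
print(f"same ISA, different compiler:     {compiler_gap:+.1%}")
```

With these inputs the compiler gap (roughly 30%) dwarfs the machine gap (roughly 1%), so a cross-machine comparison that also changes the compiler would mostly be measuring the compiler.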

Bernd Paysan

Feb 28, 2002, 4:42:16 AM2/28/02
to
Sander Vesik wrote:
>> Yikes, Apple actually uses gcc for their native compiler? Talk about
>> handicapping yourself in a race. I know Jobs wanted to reinvent
>> NeXT, but that's just ridiculous!
>
> Well, how many other Objective-C compilers are out there?

That's only part of the story. How many other C compilers are there
that can compile (without a hitch) the whole bunch of free Unix
software? And since c't also compared GCC to MSC in the same article,
the Microsoft platform is much more handicapped. While MSC has a
plugin for ICC, most developers don't use it (especially because
turning on all those tuning options makes debugging more difficult,
and additionally you need training data for the feedback
optimization). So using GCC for real-world development is not as
bad as it looks on SPEC.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

Christoph Breitkopf

Feb 28, 2002, 5:30:06 AM2/28/02
to
JD wrote:

>> The fact that Intel's compiler delivers 402/451 versus the 309 of
>> gcc tells us more about the effort Intel puts into tuning its
>> compiler for SPEC than anything else. Yes, a comparison with an
>> equivalent compiler for the G4 would be interesting.
>>
>From what I have seen (and I use it a lot), the Intel C compiler isn't
>just good at Spec, but is good in general. GCC has a long way to
>go in order to ferret out the performance that the ICC compiler
>can do.

It seems to depend on the processor family, too. I downloaded
icc for Linux a few weeks ago, and compiled quite a few programs
with it, with all kinds of optimization, even feedback. Next
to no effect on a Pentium III machine (5% at most). I see
no difference at all with bzip2, for example. Of course, this
is a well written program to begin with, so maybe gcc has an
easier time here. I don't care that much about whether the
compiler speeds up badly written programs, however.

I think things may look much better for the Intel compiler on
a Pentium 4.

Regards,
Chris

JD

Feb 28, 2002, 6:48:55 AM2/28/02
to

"Christoph Breitkopf" <ch...@chr-breitkopf.de> wrote in message
news:a5l0rf$8bf5d$1...@ID-33690.news.dfncis.de...
The ICC compiler seems to be superb at eliminating
unused code, and allows for both modular and efficiently
generated code, even in libraries. A properly generated library
by ICC can produce inline optimized code. ICC can be
incredible in C++, but my experience is often with my
own template libraries, rather than the standard libraries.
In some cases, the very general nature of C++ libraries
(for example) doesn't necessarily cost much, because ICC
is so good at eliminating unused computations, across
files and even in libraries. This is one reason why the
linker phase in ICC appears to be a late pass of the compiler.

There are certainly cases where ICC doesn't perform a lot
differently than GCC, but those are going to be where
memory is more a bottleneck or where the code has already
been carefully optimized for GCC.

John

Niels Jørgen Kruse

Feb 28, 2002, 11:21:48 AM2/28/02
to
In article <8xwuwyh...@nmh.informatik.uni-bremen.de>, Holger Bettag
<hob...@informatik.uni-bremen.de> wrote:

> Peter Boyle <pbo...@holyrood.ed.ac.uk> writes:
>
> [...]

>> Presumably this is related to the fp latency being 4 and the altivec
>> fp latency 3 on the later G4's?
>>
>
> 7400/7410: VFPU has 4 stages, FPU has 3 stages
> 7450/7451/7455: VFPU has 4 stages, FPU has 5 stages (appears like 6 sometimes)

I think he was talking about *load* latencies. Much worse than the increase
in load latencies is the 3 clock throughput fp store. I guess this is due to
the fp value arriving late at the load/store unit, clashing with following
instructions in the l/s pipe, had they been allowed.

I am hoping that the shrink to .13 will change layout enough to fix this. At
least a doubling of L2 cache should double the number of cache blocks,
making layout more flexible. (Haven't seen a die photo of any G4+.)

--
Mvh./Regards, Niels Jørgen Kruse, Vanløse, Denmark

Peter Boyle

Feb 28, 2002, 5:15:43 PM2/28/02
to

On Thu, 28 Feb 2002, Niels Jørgen Kruse wrote:

> In article <8xwuwyh...@nmh.informatik.uni-bremen.de>, Holger Bettag
> <hob...@informatik.uni-bremen.de> wrote:
>
> > Peter Boyle <pbo...@holyrood.ed.ac.uk> writes:
> >
> > [...]
> >> Presumably this is related to the fp latency being 4 and the altivec
> >> fp latency 3 on the later G4's?
> >>
> >
> > 7400/7410: VFPU has 4 stages, FPU has 3 stages
> > 7450/7451/7455: VFPU has 4 stages, FPU has 5 stages (appears like 6 sometimes)
>
> I think he was talking about *load* latencies. Much worse than the increase
> in load latencies is the 3 clock throughput fp store. I guess this is due to
> the fp value arriving late at the load/store unit, clashing with following
> instructions in the l/s pipe, had they been allowed.

Well Done!

Actually, I was sufficiently hazy I had forgotten what I was talking
about. There was a thread on this in March, which I just looked up.

http://groups.google.com/groups?hl=en&frame=right&rnum=21&thl=1256544838,1255411850,1256508019,1257515525,1257883025,1257625817,1257437403,1257535549,1257431635,1257956142,1257902574,1257896135&seekm=j5Fp6.28164%24Fz.7046555%40typhoon.austin.rr.com#link28

Where Jeff Rurpley confirmed this, however I am ashamed to say my brain
over time had scrambled the extra cycle load latency into an extra cycle
operate latency (which apparently also exists!)....

Cheers,

Peter


> I am hoping that the shrink to .13 will change layout enough to fix this. At
> least a doubling of L2 cache should double the number of cache blocks,
> making layout more flexible. (Haven't seen a die photo of any G4+.)
>
> --
> Mvh./Regards, Niels Jørgen Kruse, Vanløse, Denmark
>

Peter Boyle pbo...@physics.gla.ac.uk

andrew essen

Feb 28, 2002, 6:29:57 PM2/28/02
to

Norbert Juffa wrote:
> I second that sentiment, and think it applies equally to the G4. I work
> with G4 class processors in an embedded application (4-way non-symmetrical
> multi-processing) and they are great for that. I have looked multiple times
> at the possibility of using x86 mobile parts instead and it is a no-fit
> because of the power consumption and other shortcomings. The short pipeline
> is very advantageous when dealing with lots of "unpredictable" branches. The
> support for large L2s is also very helpful.
>
> The only similarly suitable parts I am aware of for high performance embedded
> applications are high end MIPS processors. x86 isn't even playing.

i don't have much experience with embedded processors, but i'm
curious about what makes x86's inappropriate for embedded use.
looking at all the x86 vendors, it seems like there is almost
something for everybody. cheap integrated parts from national,
medium performance low power from via & transmeta, high performance
from intel & amd. it does seem like x86's tend to burn 2x the
watts/mips at the low end of the performance scale vs. the risc
crowd. interestingly, it's almost the opposite at the highest end.
embedded x86's also should have a fair amount of software support.

so what's wrong with embedded x86's? lack of integrated peripherals?
the 100% power tax? x86 isa just plain sucks too much?

thanks,
andrew essen
<my_last_name>@computer.org

Christopher Brian Colohan

Feb 28, 2002, 8:21:06 PM2/28/02
to
andrew essen <es...@nospam.invalid> writes:
> so what's wrong with embedded x86's? lack of integrated peripherals?
> the 100% power tax? x86 isa just plain sucks too much?

I would guess it is four things: (Note: I am not an embedded designer,
so please correct me if I am wrong...)

1. The power tax. Often embedded devices are battery powered, so
power consumption really matters.

2. Cost. Since the x86 has a lot of legacy baggage (supporting old
addressing modes, etc), the chip has to be larger and more complex to
support the entire ISA. In desktop CPUs this extra cost is minimal
compared to other portions of the chip -- but in smaller embedded
designs it would be more significant. Larger chips usually cost more.

3. Programmability. Embedded folks often like to program in
assembly. x86 assembly is not as nice to code in as RISC ISAs.

4. Marketing. Intel has embedded products already (ARM), and
probably sees no reason to squeeze their own existing products with a
new competitor. There is little reason for anyone else to produce an
x86 embedded machine.

Chris
--
Chris Colohan Email: ch...@colohan.ca PGP: finger col...@cs.cmu.edu
Web: www.cs.cmu.edu/~colohan Phone: (412)268-4751

Norbert Juffa

Feb 28, 2002, 10:56:09 PM2/28/02
to
andrew essen <es...@nospam.invalid> wrote in message news:<3C7EBD4E...@nospam.invalid>...
> Norbert Juffa wrote:
[...]

> > The only similarly suitable parts I am aware of for high performance embedded
> > applications are high end MIPS processors. x86 isn't even playing.
>
> i don't have much experience with embedded processors, but i'm
> curious about what makes x86's inappropriate for embedded use.
> looking at all the x86 vendors, it seems like there is almost
> something for everybody. cheap integrated parts from national,
> medium performance low power from via & transmeta, high performance
> from intel & amd. it does seem like x86's tend to burn 2x the
> watts/mips at the low end of the performance scale vs. the risc
> crowd. interestingly, it's almost the opposite at the highest end.
> embedded x86's also should have a fair amount of software support.
>
> so what's wrong with embedded x86's? lack of integrated peripherals?
> the 100% power tax? x86 isa just plain sucks too much?


I think it very much depends on the embedded application. There certainly
are embedded applications where embedded x86 parts are an excellent fit.
For example those that essentially are "embedded PCs" running Linux or
FreeBSD.

Where I work, we need lots of computational horsepower in a very tight
power envelope. Also, the nature of the task at hand means that using
large caches (2 MB) we can keep our working set in the cache. Difficult
to predict branches mean that all other things being equal, a shorter
pipeline is better. Finally, we don't want these parts to be super
expensive. There are also other issues such as support for extended
temperature ranges not available in standard PC-type parts (those are
classed as "commercial", next step up is "industrial", harshest is
"military"). Integrated peripherals are not important for my work, but
many other embedded applications need those.

The only parts that fit the bill are the high-end parts in the PPC and
MIPS lineups, such as Motorola's G4 series or the PMC-Sierra RM series.
I don't have statistics at hand but I think you will find that PPC and
MIPS dominate high performance embedded designs, not least because they
include many parts designed for the embedded market, or at least with the
embedded market in mind.

In the 500 MHz range, these offer power consumption of about 5W. On the x86
side, an AMD K6-III would be roughly comparable (short pipe, support for up
to 2MB L3, similar performance). But it has much higher power consumption.
Mobile x86 parts are expensive, plus they actually draw quite a bit of power
when running at full speed (e.g. PIII @ 500 MHz @ 1.1V max power about 9W).
For embedded designs, the power consumption we look at is not the typical
power, but max power.
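
For scale, the two power figures in the paragraph above work out as follows. This is only arithmetic on the numbers as stated; the post gives the PIII figure as max power but does not say whether the "about 5W" PPC/MIPS figure is max or typical, so the ratio is indicative at best. The helper name `milliwatts_per_mhz` is invented for this sketch.

```python
# Figures as stated in the post; both parts run at about 500 MHz.
clock_mhz = 500
ppc_mips_watts = 5.0    # "about 5W" for the high-end PPC/MIPS parts
mobile_p3_watts = 9.0   # "max power about 9W" for a PIII @ 500 MHz @ 1.1V


def milliwatts_per_mhz(watts, mhz):
    """Power normalised by clock rate, in mW per MHz."""
    return watts * 1000.0 / mhz


ppc_mips_mw = milliwatts_per_mhz(ppc_mips_watts, clock_mhz)
mobile_p3_mw = milliwatts_per_mhz(mobile_p3_watts, clock_mhz)
power_ratio = mobile_p3_watts / ppc_mips_watts

print(f"PPC/MIPS:    {ppc_mips_mw:.0f} mW/MHz")
print(f"mobile PIII: {mobile_p3_mw:.0f} mW/MHz")
print(f"mobile PIII draws {power_ratio:.1f}x the power at the same clock")
```

That is a little under the "2x the watts/mips" estimate raised in the question being answered, and it ignores any difference in work done per clock.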

I worked a number of years in the x86 world so I am not prejudiced against
x86. And since embedded work in our field is predominantly done in C/C++ the
ISA has no influence really (plus I am comfortable with x86 assembly).


-- Norbert

Stephen Fuld

Mar 1, 2002, 12:19:43 AM3/1/02
to

"Norbert Juffa" <ju...@acm.org> wrote in message
news:cf4fab37.02022...@posting.google.com...

> andrew essen <es...@nospam.invalid> wrote in message news:<3C7EBD4E...@nospam.invalid>...
> > Norbert Juffa wrote:
> [...]
> > > The only similarly suitable parts I am aware of for high performance embedded
> > > applications are high end MIPS processors. x86 isn't even playing.
> >
> > i don't have much experience with embedded processors, but i'm
> > curious about what makes x86's inappropriate for embedded use.
> > looking at all the x86 vendors, it seems like there is almost
> > something for everybody. cheap integrated parts from national,
> > medium performance low power from via & transmeta, high performance
> > from intel & amd. it does seem like x86's tend to burn 2x the
> > watts/mips at the low end of the performance scale vs. the risc
> > crowd. interestingly, it's almost the opposite at the highest end.
> > embedded x86's also should have a fair amount of software support.
> >
> > so what's wrong with embedded x86's? lack of integrated peripherals?
> > the 100% power tax? x86 isa just plain sucks too much?
>
>
> I think it very much depends on the embedded application. There certainly
> are embedded applications where embedded x86 parts are an excellent fit.
> For example those that essentially are "embedded PCs" running Linux or
> FreeBSD.

Yup. And a fair number of factory automation type applications (due to
things like lots of sw tools, small form factor complete systems, etc.
being available). And, at least until a few years ago, the embedded processor
in IBM's SCSI disk drives was an 8086 clone. The variety of characteristics of
embedded applications is truly astounding!

--
- Stephen Fuld
e-mail address disguised to prevent spam


andrew essen

Mar 1, 2002, 7:43:34 PM3/1/02
to

Christopher Brian Colohan wrote:
> andrew essen <es...@nospam.invalid> writes:
> > so what's wrong with embedded x86's? lack of integrated peripherals?
> > the 100% power tax? x86 isa just plain sucks too much?
>
> I would guess it is four things: (Note: I am not an embedded designer,
> so please correct me if I am wrong...)

that's not stopping me from speculating either!

> 1. The power tax. Often embedded devices are battery powered, so
> power consumption really matters.

in terms of 32b cpu's, that market seems to be pretty much locked
up by arm. mips & ppc seem to mostly end up in line powered
applications (e.g., nintendos and routers). in that space, my
impression is power only matters to the extent of not needing
ceramic packages, fans, etc.

> 2. Cost. Since the x86 has a lot of legacy baggage (supporting old
> addressing modes, etc), the chip has to be larger and more complex to
> support the entire ISA. In desktop CPUs this extra cost is minimal
> compared to other portions of the chip -- but in smaller embedded
> designs it would be more significant. Larger chips usually cost more.

this one is the most non-intuitive. for those very reasons,
x86's should be more expensive. but they are amazingly cheap due
to other reasons like volume and intense competition between intel
& amd. a quickie web check of cpu prices (retail quantity 1)...

1 ghz duron $73
1.53 ghz athlon ("xp1800+") $168
1.67 ghz athlon ("xp2000+") $298
0.95 ghz celeron $70
1.6 ghz p4-512k $163
2.0 ghz p4-512k $378

and then there's the really cheap stuff from national, via, etc.
i don't have a price list for riscs, but in general i remember
seeing things like ~$300 for stuff like 500-600 mhz mips chips.
frequency isn't everything, but a factor of 4 behind is a big deal.
x86 $/mips have been amazingly cheap since amd came out with the
k6. i don't know if a 1 ghz chip for <$100 or a 2 ghz chip for
<$400 is more amazing.
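
The price list above can be normalised to dollars per GHz to make the comparison explicit. The sketch below uses the x86 prices exactly as listed; the MIPS entry is only the recollection from the paragraph above ("~$300" for "500-600 mhz"), so it is treated as a range rather than a quote. Dollars per GHz ignores differences in work done per clock, so it is a rough guide only, and all names in the code are invented for this illustration.

```python
# Retail prices (quantity 1) as listed in the post: (label, clock in GHz, USD).
x86_parts = [
    ("Duron", 1.00, 73),
    ("Athlon XP1800+", 1.53, 168),
    ("Athlon XP2000+", 1.67, 298),
    ("Celeron", 0.95, 70),
    ("P4-512k 1.6", 1.60, 163),
    ("P4-512k 2.0", 2.00, 378),
]

# Recollection from the post, not a price list: about $300 for 500-600 MHz MIPS.
mips_price_usd = 300
mips_clock_range_ghz = (0.5, 0.6)


def usd_per_ghz(clock_ghz, price_usd):
    """Price normalised by clock rate."""
    return price_usd / clock_ghz


x86_usd_per_ghz = {label: usd_per_ghz(ghz, usd) for label, ghz, usd in x86_parts}

# Highest clock gives the cheapest $/GHz, so reverse the range: (low, high).
mips_usd_per_ghz = tuple(
    usd_per_ghz(ghz, mips_price_usd) for ghz in reversed(mips_clock_range_ghz)
)

for label, value in x86_usd_per_ghz.items():
    print(f"{label:15s} ${value:7.2f} per GHz")
print(f"MIPS (recalled)  ${mips_usd_per_ghz[0]:.0f}-${mips_usd_per_ghz[1]:.0f} per GHz")
```

Even the most expensive x86 part in the list comes out well under half the recalled MIPS figure per GHz, which is the gap the post is pointing at.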

> 3. Programmability. Embedded folks often like to program in
> assembly. x86 assembly is not as nice to code in as RISC ISAs.

x86 assembly certainly isn't at the top of my list of fun things
to do. although i'd guess most embedded 32b cpu's are programmed
in c/c++ anyways.

> 4. Marketing. Intel has embedded products already (ARM), and
> probably sees no reason to squeeze their own existing products with a
> new competitor. There is little reason for anyone else to produce an
> x86 embedded machine.

that could be. you never know what kind of weird things will come
out of marketing next. we want a vector unit and dessert topping
in one!

ae

andrew essen

Mar 1, 2002, 8:26:45 PM3/1/02
to

Norbert Juffa wrote:
> andrew essen <es...@nospam.invalid> wrote in message news:<3C7EBD4E...@nospam.invalid>...
> > so what's wrong with embedded x86's? lack of integrated peripherals?
> > the 100% power tax? x86 isa just plain sucks too much?
>
> I think it very much depends on the embedded application. There certainly
> are embedded applications where embedded x86 parts are an excellent fit.
> For example those that essentially are "embedded PCs" running Linux or
> FreeBSD.

coincidentally, my limited embedded experience consists of 1 time
dealing with exactly such a system. it was such a good fit i
wondered why x86's weren't more popular.

> Where I work, we need lots of computational horsepower in a very tight
> power envelope. Also, the nature of the task at hand means that using
> large caches (2 MB) we can keep our working set in the cache. Difficult
> to predict branches mean that all other things being equal, a shorter
> pipeline is better. Finally, we don't want these parts to be super
> expensive. There are also other issues such as support for extended
> temperature ranges not available in standard PC-type parts (those are
> classed as "commercial", next step up is "industrial", harshest is
> "military"). Integrated peripherals are not important for my work, but
> many other embedded applications need those.

hmm, i forgot completely about extended temperature ranges. i can
also readily believe that a given x86 will suck on specific
benchmarks even though the spec numbers look good -- all cpu's
have performance holes.

> The only parts that fit the bill are the high-end parts in the PPC and
> MIPS lineups, such as Motorola's G4 series or the PMC-Sierra RM series.
> I don't have statistics at hand but I think you will find that PPC and
> MIPS dominate high performance embedded designs, not least because they
> include many parts designed for the embedded market, or at least with the
> embedded market in mind.
>
> In the 500 MHz range, these offer power consumption of about 5W. On the x86
> side, an AMD K6-III would be roughly comparable (short pipe, support for up
> to 2MB L3, similar performance). But it has much higher power consumption.
> Mobile x86 parts are expensive, plus they actually draw quite a bit of power
> when running at full speed (e.g. PIII @ 500 MHz @ 1.1V max power about 9W).
> For embedded designs, the power consumption we look at is not the typical
> power, but max power.

the applications i was primarily thinking about have a cpu density
on the order of 1 cpu per rack unit. in that case 50-60w
cpu's aren't wonderful, but tolerable. as i just posted in
a nearby article, the price/performance of x86's on stuff like
integer spaghetti code is very good. what's the list price on
stuff like 500+ mhz mips/ppc's?

> I worked a number of years in the x86 world so I am not prejudiced against
> x86. And since embedded work in our field is predominantly done in C/C++ the
> ISA has no influence really (plus I am comfortable with x86 assembly).
>
> -- Norbert

thanks,
andrew essen
<my_last_name>@computer.org

Douglas Siebert

Mar 1, 2002, 9:37:18 PM3/1/02
to
nm...@cus.cam.ac.uk (Nick Maclaren) writes:

>That is the other factor. No Pentium 4 below 2 GHz has a 512 KB
>cache. I can't tell you which of the numerous Pentium III cache
>approaches is best for this purpose, or was used in the comparisons.
>Obviously, the 512 KB of the Pentium 4 2.2 may give a significant
>advantage over 256 KB or a slower cache.


Not true. When the 2.0 (called 2.0A) and 2.2 GHz Northwoods were
introduced, Intel also introduced the 1.6A and 1.8A Northwoods, with
the same 512K cache. They didn't make a big deal about them, but
they seem to be easy to get, at least on Pricewatch.

Since they are in .13u, they are supposedly the current favorite of
the overclocking crowd, since the Athlon XP is about at the top of
its frequency range and the Tualatin Pentium IIIs and Celerons were
AFAICT deliberately handicapped to increase their power consumption,
to reduce their appeal and overclocking potential (since Intel wants
the P3 line to die quickly and make room for the P4)

Terje Mathisen

unread,
Mar 2, 2002, 3:22:26 AM3/2/02
to
andrew essen wrote:

>
> Christopher Brian Colohan wrote:
> > 3. Programmability. Embedded folks often like to program in
> > assembly. x86 assembly is not as nice to code in as RISC ISAs.
>
> x86 assembly certainly isn't at the top of my list of fun things
> to do. although i'd guess most embedded 32b cpu's are programmed
> in c/c++ anyways.

Huh???

x86 asm is actually quite nice to program in, assuming you have a few
years experience doing so. :-)

Register pressure can suck somewhat on highly optimized code, but
otherwise there's no real problems.

Terje
--
- <Terje.M...@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"

Jonathan D Bradbury

unread,
Mar 2, 2002, 10:44:29 PM3/2/02
to

On Sat, 2 Mar 2002, Terje Mathisen wrote:

> andrew essen wrote:
> >
> > Christopher Brian Colohan wrote:
> > > 3. Programmability. Embedded folks often like to program in
> > > assembly. x86 assembly is not as nice to code in as RISC ISAs.
> >
> > x86 assembly certainly isn't at the top of my list of fun things
> > to do. although i'd guess most embedded 32b cpu's are programmed
> > in c/c++ anyways.
>
> Huh???
>
> x86 asm is actually quite nice to program in, assuming you have a few
> years experience doing so. :-)
>
> Register pressure can suck somewhat on highly optimized code, but
> otherwise there's no real problems.

Have you ever programmed in ARM assembly? The ISA is so slick compared to
x86. The barrel shifter before the ALU is also great for making nice
tight code. Also the lack of branching for conditionals is very nice :)
And not AS many strange addressing modes to think about either.

Jonathan
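
The two ARM features praised above (the shifter in front of the ALU and conditional execution) can be illustrated from the C side. This is only a sketch: the function names are made up for illustration, and which instructions a given compiler actually emits is an assumption, noted in the comments.

```c
#include <stdint.h>

/* Scaled index folded into the add. On classic ARM a compiler can
   typically emit one instruction with a shifted operand, such as
   ADD r0, r0, r1, LSL #2 (the exact output is an assumption). */
uint32_t scaled_add(uint32_t base, uint32_t index)
{
    return base + (index << 2);
}

/* Two short conditionals. On ARM these are candidates for
   conditionally executed moves (e.g. MOVLT / MOVGT) instead of
   branches; again, that is up to the compiler. */
int32_t clamp(int32_t x, int32_t lo, int32_t hi)
{
    if (x < lo) x = lo;
    if (x > hi) x = hi;
    return x;
}
```

On x86 the first function usually needs either a separate shift or an LEA with a scaled index, and the second needs compare-and-branch or CMOV sequences.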

Terje Mathisen

unread,
Mar 3, 2002, 12:29:33 PM3/3/02
to
Jonathan D Bradbury wrote:
>
> On Sat, 2 Mar 2002, Terje Mathisen wrote:
> > x86 asm is actually quite nice to program in, assuming you have a few
> > years experience doing so. :-)
> >
> > Register pressure can suck somewhat on highly optimized code, but
> > otherwise there's no real problems.
>
> Have you ever programmed in ARM assembly? The ISA is so slick compared to

Not seriously, no, but it does look nice.

> x86. The barrel shifter before the ALU is also great for making nice
> tight code. Also the lack of branching for conditionals is very nice :)

They must have done a few things right since the ARM seems to have a
lock on cell phones!

Terje

> And not AS many strange addressing modes to think about either.

--

Willus

unread,
Mar 3, 2002, 12:52:31 PM3/3/02
to
an...@mips.complang.tuwien.ac.at (Anton Ertl) wrote in message news:<a5grb5$9lr$1...@news.tuwien.ac.at>...
>
> As for gcc-2.95 vs. gcc-3.0, you can find SPEC95 results at
>
> http://www.complang.tuwien.ac.at/anton/spec95/athlon-800/CINT95.041.asc
> http://www.complang.tuwien.ac.at/anton/spec95/athlon-800/gcc-3.0/CINT95.051.asc
>
> 26.8 for gcc-2.95.1 and 27.5 for gcc-3.0 on the same machine
> (Athlon-800).

Here are a couple of other links that might help enlighten some of the
debates that have been going on in this thread:

http://www.willus.com/ccomp_benchmark.shtml
This shows numbers on five different codes for seven different
compilers (including Intel, gcc 2.95, MSVC) on Athlon, P3, and P4
systems. It shows good evidence that the latest Intel release is not
simply tuned for Spec2K.

http://www.willus.com/archive/p4summary.txt
(P4 vs P3 numbers from Intel for standard benchmarks--a little on the
old side)

-Willus

Benjamin Goldsteen

unread,
Mar 6, 2002, 11:13:21 AM3/6/02
to
Holger Bettag wrote:
>
> Outside of Apple, there is no (AltiVec-)software market visible to the
> general public. And inside Apple's pond, there is not enough money to be
> made to justify big tuning efforts. Apple would need to give away licenses
> of, say, VAST/AltiVec (an auto-vectorizer) for free if they wanted to make
> a difference.
>
> AltiVec itself can well make a difference, look at www.distributed.net for
> a non-Photoshop example.

For another example, Apple optimized BLAST for the G4 and the results give it
quite an edge over the P4:
http://www.apple.com/pr/library/2002/feb/07blast.html

For those not familiar, BLAST is a commonly used tool in the bioinformatics
community for sequence searching.

--
Benjamin Z. Goldsteen
Physiology & Biophysics
Mount Sinai School of Medicine
212-241-1614 / 212-860-3369 (FAX)

Peter Boyle

unread,
Mar 6, 2002, 11:47:47 AM3/6/02
to

On Wed, 6 Mar 2002, Benjamin Goldsteen wrote:

> For another example, Apple optimized BLAST for the G4 and the results give it
> quite an edge over the P4:
> http://www.apple.com/pr/library/2002/feb/07blast.html
>
> For those not familiar, BLAST is a commonly used tool in the bioinformatics
> community for sequence searching.

Wow, two G4 processors running hand coded Altivec assembler beat one Intel
processor running compiled code. That's an objective comparison
if ever I've seen one.

3x speedups are typical just coding in asm, without the use of vector
instructions (7x over GCC on ppc). I consider myself a powerpc advocate,
but reading between the lines of this would convince me to buy intel
chips.

Peter Boyle

Rupert Pigott

unread,
Mar 7, 2002, 3:11:33 AM3/7/02
to
"Peter Boyle" <pbo...@holyrood.ed.ac.uk> wrote in message
news:Pine.GSO.4.33.020306...@holyrood.ed.ac.uk...
[SNIP]

> 3x speedups are typical just coding in asm, without the use of vector
> instructions (7x over GCC on ppc). I consider myself a powerpc advocate,

Can't say I've "typically" seen 3x speed up from assembler on
anything other than trivial sequences of code. Maybe I've been
hanging out with the wrong assembly programmers. :)

OTOH I managed a 6x speedup over lovingly hand-crafted x86
assembler using plain old C and the old bitslice trick. :)

My code was portable too and would have picked up another
good chunk of speed on an arch which supported 64bit ints.

Cheers,
Rupert
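
For readers who have not met the bitslice trick: the idea is to store one bit position of many independent inputs in each machine word, so that every logical operation processes all of them at once. The following is a minimal sketch of that idea, not the code described in the post; the type and function names are hypothetical.

```c
#include <stdint.h>

/* Each 64-bit word carries the same bit position of 64 independent
   inputs, so one call performs 64 one-bit full adders in parallel. */
typedef struct {
    uint64_t sum;
    uint64_t carry;
} bs_result;

bs_result bs_full_add(uint64_t a, uint64_t b, uint64_t cin)
{
    bs_result r;
    r.sum   = a ^ b ^ cin;                 /* per-lane sum bit   */
    r.carry = (a & b) | (cin & (a ^ b));   /* per-lane carry out */
    return r;
}
```

Because the code uses only word-wide AND, OR and XOR, it is portable C, and a machine with native 64-bit integers handles twice as many lanes per operation as a 32-bit one.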


Peter Boyle

unread,
Mar 7, 2002, 9:47:32 AM3/7/02
to

On Thu, 7 Mar 2002, Rupert Pigott wrote:

> "Peter Boyle" <pbo...@holyrood.ed.ac.uk> wrote in message
> news:Pine.GSO.4.33.020306...@holyrood.ed.ac.uk...
> [SNIP]
> > 3x speedups are typical just coding in asm, without the use of vector
> > instructions (7x over GCC on ppc). I consider myself a powerpc advocate,
>
> Can't say I've "typically" seen 3x speed up from assembler on
> anything other than trivial sequences of code. Maybe I've been
> hanging out with the wrong assembly programmers. :)

Well, the sequences I (somewhat biasedly) call "typical" are FP loops from
main memory (with a pointer indirection table) to some of the arrays,
on reasonably sized loops, which have a *lot*
of ILP, provided you can make use of it.

Prefetching (even interprocedural prefetching for shorter vectors) makes
an enormous difference to the memory performance. This is most of
the difference I've seen over "vendor" workstation compilers.

The gcc comment is based on the loop unrolling being rather dumb
frankly, and failing to pack the pipelines even from L1, where
the vendor (alpha&sparc at least) compilers typically do rather well.

YMMV, and pointer chasing code clearly will be pretty difficult to speed up!
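
A rough sketch of the kind of loop meant here: a floating-point reduction that reaches its data through an index table, with a software prefetch issued a few iterations ahead. It assumes GCC's `__builtin_prefetch`; the function name and the prefetch distance are hypothetical and would need tuning per machine.

```c
#include <stddef.h>

#define PREFETCH_DISTANCE 8   /* hypothetical value, tune per machine */

/* Sum data[index[i]] for i in [0, n), prefetching the element that
   will be needed PREFETCH_DISTANCE iterations from now. */
double gather_sum(const double *data, const size_t *index, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__)
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&data[index[i + PREFETCH_DISTANCE]], 0, 1);
#endif
        sum += data[index[i]];
    }
    return sum;
}
```

A compiler generally cannot guess the access pattern behind the index table, which is one reason hand-placed prefetches can pay off on this sort of code.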

> OTOH I managed a 6x speedup over lovingly hand-crafted x86
> assembler using plain old C and the old bitslice trick. :)
>
> My code was portable too and would have picked up another
> good chunk of speed on an arch which supported 64bit ints.

;)

I heard of a HPC support team with a smart ex-scientist on board
that effectively kicked a user group off an MPP. He noticed
that their MPI based sparse linear solver was being used
on Poisson's equation (in a geological context I believe), and therefore
could be solved on a ('95 era) PC using 3-d FFT's. The speed up
was over 50x.

Peter
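
The FFT shortcut works because the Laplacian is diagonal in Fourier space. A sketch of the idea, assuming a periodic box on a regular grid (the boundary conditions of the actual problem are not given in the story):

```latex
\nabla^2 u = f
\quad\Longrightarrow\quad
-|\mathbf{k}|^2 \, \hat{u}(\mathbf{k}) = \hat{f}(\mathbf{k})
\quad\Longrightarrow\quad
\hat{u}(\mathbf{k}) = -\frac{\hat{f}(\mathbf{k})}{|\mathbf{k}|^2},
\qquad \mathbf{k} \neq 0
```

So the whole solve is one forward 3-d FFT of f, a pointwise division, and one inverse FFT, roughly O(N log N) work for N grid points with no iteration, which is why it can beat a general sparse iterative solver by a wide margin.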


> Cheers,
> Rupert
>
>
>


Anton Ertl

unread,
Mar 8, 2002, 12:32:51 PM3/8/02
to
In article <Ej8f8.20326$0C1.1...@newsread1.prod.itd.earthlink.net>,
"Norbert Juffa" <ju...@earthlink.net> writes:
>
>"Anton Ertl" <an...@mips.complang.tuwien.ac.at> wrote in message
>news:a5grb5$9lr$1...@news.tuwien.ac.at...

>> As for gcc-2.95 vs. gcc-3.0, you can find SPEC95 results at
>>
>> http://www.complang.tuwien.ac.at/anton/spec95/athlon-800/CINT95.041.asc
>>
>http://www.complang.tuwien.ac.at/anton/spec95/athlon-800/gcc-3.0/CINT95.051.
>asc
>>
>> 26.8 for gcc-2.95.1 and 27.5 for gcc-3.0 on the same machine
>> (Athlon-800).
>
>
>Thanks for the data (data is so much better than conjecture!). I wonder
>whether there
>might be bigger differences on the high-end Athlon/P4s? The c't magazin
>article linked
>by Julian mentions that "using gcc 3.0 or the Intel compiler, SPEC runs 10%
>to 20%
>faster" as compared to executable produced with gcc 2.95.3 , which seems to
>be a much
>bigger difference than shown by your data set.

The difference may also be due to the difference in the benchmarks
(CPU95 vs. CPU2000).
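
To put the two figures side by side, the relative gain can be computed as below. The helper name is made up for illustration; the inputs are the CINT95 numbers quoted above.

```c
/* Relative improvement of new_score over old_score, in percent. */
double speedup_percent(double old_score, double new_score)
{
    return (new_score / old_score - 1.0) * 100.0;
}
```

With 26.8 (gcc-2.95.1) and 27.5 (gcc-3.0) this gives a gain of about 2.6%, well short of the 10% to 20% the article reports for CPU2000.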

BTW, please don't use comb-style line breaking (for more info, read
<http://www.complang.tuwien.ac.at/anton/mail-news-errors.html#line-breaks>).

- anton
--
M. Anton Ertl Some things have to be seen to be believed
an...@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html

Maynard Handley

unread,
Mar 8, 2002, 8:35:34 PM3/8/02
to
In article <3C802A30...@nospam.invalid>, andrew essen
<es...@nospam.invalid> wrote:

> Norbert Juffa wrote:
> > andrew essen <es...@nospam.invalid> wrote in message news:<3C7EBD4E...@nospam.invalid>...
> > > so what's wrong with embedded x86's? lack of integrated peripherals?
> > > the 100% power tax? x86 isa just plain sucks too much?
> >
> > I think it very much depends on the embedded application. There
> > certainly are embedded applications where embedded x86 parts are an
> > excellent fit. For example those that essentially are "embedded PCs"
> > running Linux or FreeBSD.
>
> coincidentally, my limited embedded experience consists of 1 time
> dealing with exactly such a system. it was such a good fit i
> wondered why x86's weren't more popular.

For strange historical reasons to do with the code that was written as
people were toying with the idea, the first release of Apple's airport
(WiFi) base station had a 486 in it---and ran very hot. Not so hot that
cooling was a problem, but hot enough that in a home you notice it as
seeming to be hotter than you'd expect. I'm pretty sure that heat was from
the 486.
BTW I believe current versions of the base station both don't use a 486
and run a lot cooler.

Maynard

ben

unread,
Mar 10, 2002, 10:20:42 PM3/10/02
to
Peter Boyle wrote:

> On Wed, 6 Mar 2002, Benjamin Goldsteen wrote:
>
> > For another example, Apple optimized BLAST for the G4 and the results give it
> > quite an edge over the P4:
> > http://www.apple.com/pr/library/2002/feb/07blast.html
> >
> > For those not familiar, BLAST is a commonly used tool in the bioinformatics
> > community for sequence searching.
>
> Wow, two G4 processors running hand coded Altivec assembler beat one Intel
> processor running compiled code. That's an objective comparison
> if ever I've seen one.

The main purpose of my posting was to give an example of a real-world AltiVec-tuned
code. I am sure Intel could optimize the BLAST kernels to use the SSE2
instructions of the P4. I just wanted to note that the AltiVec instructions can be
used by real applications to do something useful. Something, perhaps, a little
more useful than speeding up PhotoShop filters...

I think it has been clear to most that the 1GHz G4 doesn't compare well to the 2GHz
P4 on plain compiled code.


George N. White III

unread,
Mar 11, 2002, 7:28:00 AM3/11/02
to
On Sun, 10 Mar 2002, ben wrote:

> Peter Boyle wrote:
>
> The main purpose of my posting was to give an example of a real-world
> AltiVec-tuned code. I am sure Intel could optimize the BLAST kernels to
> use the SSE2 instructions of the P4. I just wanted to note that the
> AltiVec instructions can be used by real applications to do something
> useful. Something, perhaps, a little more useful than speeding up
> PhotoShop filters...

Ah, but Photoshop filters are useful because there are many more people
interested in speeding up Photoshop than in running BLAST, with the
result that you can run BLAST on much cheaper hardware than you would
get if Apple's designers had optimized for a different workload.

Now if only Apple's customers would go for packaging that is more
suitable for building massively parallel systems instead of wanting
lots of PCI slots in tower configurations.

--
George N. White III <gn...@acm.org> Bedford Institute of Oceanography

Peter Boyle

unread,
Mar 11, 2002, 9:17:02 AM3/11/02
to

Please, snip posts cleanly. This is confusing - I wrote nothing included
below!
Peter

Peter Boyle pbo...@physics.gla.ac.uk

Terje Mathisen

unread,
Mar 11, 2002, 9:52:17 PM3/11/02
to
Rupert Pigott wrote:
>
> "Peter Boyle" <pbo...@holyrood.ed.ac.uk> wrote in message
> news:Pine.GSO.4.33.020306...@holyrood.ed.ac.uk...
> [SNIP]
> > 3x speedups are typical just coding in asm, without the use of vector
> > instructions (7x over GCC on ppc). I consider myself a powerpc advocate,
>
> Can't say I've "typically" seen 3x speed up from assembler on

Right.

> anything other than trivial sequences of code. Maybe I've been
> hanging out with the wrong assembly programmers. :)

Mike Abrash got almost a factor of 3 when converting John Carmack's
Quake sw 3D render code to asm, but that required a _lot_ of work.

> OTOH I managed a 6x speedup over lovingly hand-crafted x86
> assembler using plain old C and the old bitslice trick. :)

Lovingly handcrafting the wrong algorithm doesn't count you know. :-)

> My code was portable too and would have picked up another
> good chunk of speed on an arch which supported 64bit ints.

First you write ANSI portable C, checking out different algorithms.

Then you tweak the best/fastest versions in such a way as to make the
compiler spit out the same code as you would do using assembler, or as
near as you can get, while still staying portable.

Finally you think long and hard about it before using some #ifdef's to
conditionally replace that C with inline asm.

Terje
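
The three steps above can be sketched on a toy example. This is an illustration only, assuming GCC-style inline asm on x86; the function and the `USE_X86_ASM` switch are hypothetical names, not anything from the posts.

```c
#include <stdint.h>

/* Byte-swap a 32-bit word. The portable C version is the reference;
   the inline asm path is opt-in and limited to GCC-compatible
   compilers targeting x86. */
uint32_t swap32(uint32_t x)
{
#if defined(USE_X86_ASM) && defined(__GNUC__) && \
    (defined(__i386__) || defined(__x86_64__))
    __asm__("bswap %0" : "+r"(x));   /* single instruction on 486 and later */
    return x;
#else
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
#endif
}
```

Keeping the C version as the default means every platform still builds and can be tested against the same reference, and the asm path can be dropped if a later compiler generates equally good code on its own.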

Benjamin Goldsteen

unread,
Mar 12, 2002, 11:52:01 PM3/12/02
to
On 3/11/02 7:28 AM, in article
Pine.SGI.4.44.020311...@wendigo.bio.dfo.ca, "George N.
White III" <Whi...@dfo-mpo.gc.ca> wrote:

Whether a Macintosh is a good choice for high-density rackmount computing is
a different question. I would say "no" based on several factors. However,
if you want a rack-mount Macintosh, visit www.marathon.com. They will sell
you a kit to convert a PowerMac G4 into a 4U box (2 CPU per 4U -- not great
but no worse than what Sun is trying to sell me with their V880).

Marathon will also sell you a kit to convert old iMacs to 1U rackmount
chassis. Of course that would be a G3 processor and the whole start of the
discussion was the Altivec instruction set specific to the G4 processor.

On the other hand, I have heard rumors of real high-density rackmount
Macintosh systems (running MacOS X, of course) in the near future.

Anyway, how many people are actually bound by PhotoShop filters? Nothing is
wrong with faster, but how often do people with modern computers sit around
waiting for PhotoShop filter results? Even without Altivec/SSE, I am sure
most of that stuff runs pretty fast on a 1GHz G4 or Pentium III.

I wouldn't mind general improvements to the basic operation of the program,
though. How much computer power does it take to support an image
manipulation program? Some of the most basic tasks are pretty sluggish.
Did anyone ever use CLRpaint for SGI? Snappier on an old Indigo2/XZ than
recent versions of PhotoShop on any computer I've used.

On the other hand, perhaps most computer buyers don't care about the speed
of BLAST but that doesn't mean that the speed of BLAST doesn't affect most
people. Who wants scientists to have slower computers? BLAST is more
likely to save someone's life than PhotoShop filters...

Chris Morgan

unread,
Mar 13, 2002, 12:28:04 PM3/13/02
to
Benjamin Goldsteen <Benjamin....@physbio.mssm.edu> writes:

> Anyway, how many people are actually bound by PhotoShop filters? Nothing is
> wrong with faster, but how often do people with modern computers sit around
> waiting for PhotoShop filter results? Even without Altivec/SSE, I am sure
> most of that stuff runs pretty fast on a 1GHz G4 or Pentium III.

It's not necessarily that people sit around waiting, it's also that the
more power there is, the more that becomes possible - think of
real-time signal processing, say a high-quality reverb or two when
recording music.
--
Chris Morgan

"Not so bad offer to discuss about"

- Best recent email spam subject line

Russell Williams

unread,
Mar 13, 2002, 2:01:17 PM3/13/02
to
"Benjamin Goldsteen" <Benjamin....@physbio.mssm.edu> wrote in message
news:B8B44521.130%Benjamin....@physbio.mssm.edu...

> Anyway, how many people are actually bound by PhotoShop filters? Nothing is
> wrong with faster, but how often do people with modern computers sit around
> waiting for PhotoShop filter results? Even without Altivec/SSE, I am sure
> most of that stuff runs pretty fast on a 1GHz G4 or Pentium III.

For people doing 8X10s from digital cameras or film scans, it's not an
issue. But for many professionals a 100MB file is small and 1GB documents
are common. Speed is still very much an issue, and the speedups our
professional customers see from AltiVec / SSE and SMP are a very big deal
to them.

Russell Williams
not speaking for Adobe Systems

Niels Jørgen Kruse

unread,
Mar 13, 2002, 6:21:10 PM3/13/02
to
I artiklen <B8B44521.130%Benjamin....@physbio.mssm.edu> , Benjamin
Goldsteen <Benjamin....@physbio.mssm.edu> skrev:

> Whether a Macintosh is a good choice for high-density rackmount computing is
> a different question. I would say "no" based on several factors. However,
> if you want a rack-mount Macintosh, visit www.marathon.com. They will sell
> you a kit to convert a PowerMac G4 into a 4U box (2 CPU per 4U -- not great
> but no worse than what Sun is trying to sell me with their V880).

There is also a 2U box from Terra Soft, see
<http://www.terrasoftsolutions.com/products/gvs9000/>
