In short, the multiply support is the same as x86, except you have
prefixes to specify 64-bit operands and use more registers.
There are several forms that give single-word results, plus a couple
that give a double-word result in RDX:RAX.
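Here is a minimal portable C sketch of what the double-word forms compute: the full unsigned 64×64→128-bit product, with the high word corresponding to RDX and the low word to RAX. The function name `mul_u64_wide` is hypothetical. Building the product from 32-bit halves is just one standard way to do it without compiler extensions.

```c
#include <stdint.h>

/* Unsigned 64x64 -> 128-bit multiply built from 32-bit partial products.
 * Returns the low 64 bits (the RAX half) and stores the high 64 bits
 * (the RDX half) in *hi. */
static uint64_t mul_u64_wide(uint64_t a, uint64_t b, uint64_t *hi)
{
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;   /* contributes to bits   0..63  */
    uint64_t p1 = a_lo * b_hi;   /* contributes to bits  32..95  */
    uint64_t p2 = a_hi * b_lo;   /* contributes to bits  32..95  */
    uint64_t p3 = a_hi * b_hi;   /* contributes to bits  64..127 */

    /* Middle column: three terms, each < 2^32, so the sum cannot overflow. */
    uint64_t mid = (p0 >> 32) + (p1 & 0xFFFFFFFFu) + (p2 & 0xFFFFFFFFu);

    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return (mid << 32) | (p0 & 0xFFFFFFFFu);
}
```

With GCC or Clang, `unsigned __int128` gives the same result more directly, and on x86-64 that form typically compiles to a single widening multiply.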
I approve.
Providing a double-word result is of course critical for some things.
Some architectures don't have that, because they're dumb.
The additional registers are nice. When writing x86 code, I sometimes
think I really could use just one or two extra registers.
I don't know about the instruction timing, but seeing how the Athlon
beats everything except an Alpha, I bet it's fairly good.
--
Joe Keane, amateur mathematician
Excuse me....
Where does an Athlon beat a POWER4 in performance?
--
John D. McCalpin, Ph.D. mcca...@austin.ibm.com
Senior Technical Staff Member IBM POWER Microprocessor Development
"I am willing to make mistakes as long as
someone else is willing to learn from them."
> In article <vSe08.2608$gf.6...@sjc-read.news.verio.net>,
> Joe Keane <j...@jgk.org> wrote:
>>
>>I don't know about the instruction timing, but seeing how the Athlon
>>beats everything except an Alpha, I bet it's fairly good.
>
> Excuse me....
>
> Where does an Athlon beat a POWER4 in performance?
I think he was talking about multiple precision multiply. What are the
latencies and throughput for the POWER4 multiplier?
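Multiple-precision multiplication is where a double-word product pays off: every inner step needs the full product of two 64-bit limbs plus carries. Below is a minimal schoolbook sketch in C. It assumes GCC or Clang for the `unsigned __int128` extension and little-endian limb order. Those assumptions are mine, not the thread's, and `mp_mul` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

typedef unsigned __int128 u128; /* GCC/Clang extension */

/* Schoolbook product of little-endian limb arrays:
 * r[0 .. na+nb-1] = a[0 .. na-1] * b[0 .. nb-1].
 * r must not overlap a or b. */
static void mp_mul(uint64_t *r, const uint64_t *a, size_t na,
                   const uint64_t *b, size_t nb)
{
    for (size_t i = 0; i < na + nb; i++)
        r[i] = 0;

    for (size_t i = 0; i < na; i++) {
        uint64_t carry = 0;
        for (size_t j = 0; j < nb; j++) {
            /* a*b + r + carry <= (2^64-1)^2 + 2(2^64-1) = 2^128 - 1: fits. */
            u128 t = (u128)a[i] * b[j] + r[i + j] + carry;
            r[i + j] = (uint64_t)t;         /* low word (the "RAX" half)  */
            carry    = (uint64_t)(t >> 64); /* high word (the "RDX" half) */
        }
        r[i + nb] = carry; /* this limb has not been written yet */
    }
}
```

The cost is one double-word multiply per limb pair. That is why the latency and throughput of that one instruction dominate bignum performance.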
--
Mvh./Regards, Niels Jørgen Kruse, Vanløse, Denmark
> Excuse me....
> Where does an Athlon beat a POWER4 in performance?
Do you sell Power4 desktops (single- or dual-CPU machines) yet?
And I bet it beats Power4 in the performance you get per unit of money 8-)
--
Sander
+++ Out of cheese error +++
Let's see. We already had an Alpha mention. Now we have someone
claiming that Power4 doesn't matter because it doesn't sell to the
desktop market. This has got to be the shortest thread to ever hit
these two comp.arch laws (are there more?).
How about I just call you all Nazis and we end this thread right here.
Brannon
> [snip]
> > Do you sell Power4 desktops single/dual cpu machines yet?
>
> Let's see. We already had an Alpha mention. Now we have someone
> claiming that Power4 doesn't matter because it doesn't sell to the
> desktop market.
Last week I watched the MacWorld keynote with a friend who owns an Apple
VAR, and we happened to talk about the Power4. While not being aware of
exactly how much IBM sells them for, he seemed pretty sure that he could
sell a good number of higher powered Macs even if they were much more
expensive, provided that they ran Photoshop or Maya or whatever faster.
He thought there would be no downside to Apple rebadging IBM hardware
(with OS X on it) if they didn't want to do the R&D themselves, and that
the mere existence of something like that in the catalogue as an upgrade
path would make it easier to sell G4s in some quarters.
-- Bruce
> Last week I watched the MacWorld keynote with a friend who owns an Apple
> VAR, and we happened to talk about the Power4. While not being aware of
> exactly how much IBM sells them for, he seemed pretty sure that he could
> sell a good number of higher powered Macs even if they were much more
> expensive, provided that they ran Photoshop or Maya or whatever faster.
Is the lack of Altivec a problem?
--
Erik Corry er...@arbat.com
Interviewer: "Real programmers use cat as their editor."
Bill Joy: "That's right! There you go! It is too much trouble to say ed,
because cat's smaller and only needs two pages of memory."
> Let's see. We already had an Alpha mention. Now we have someone
> claiming that Power4 doesn't matter because it doesn't sell to the
> desktop market. This has got to be the shortest thread to ever hit
> these two comp.arch laws (are there more?).
I didn't say it doesn't matter. If Power4 is not sold into the workstation
market, it is trivially easy to beat it there: a 486 is sufficient.
> Last week I watched the MacWorld keynote with a friend who owns an Apple
> VAR, and we happened to talk about the Power4. While not being aware of
> exactly how much IBM sells them for, he seemed pretty sure that he could
> sell a good number of higher powered Macs even if they were much more
> expensive, provided that they ran Photoshop or Maya or whatever faster.
Apparently a good share of the machines sold for personal use are
high end. Most places I've worked seem to insist on just about the
most full-featured equipment they can get.
Basically, even a very high end PC is cheap compared to labor costs,
so I suppose there's a market for even faster computers.
-kzm
--
If I haven't seen further, it is by standing in the footprints of giants
The Alpha architecture doesn't have those on purpose: it destroys instruction
regularity, in particular by requiring extra decoding
interdependencies and at least one extra register write port. On
Alpha, you get to choose whether you want the lower or the upper
word of the result instead.
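Here is a rough C model of the Alpha approach: two independent operations, each writing a single result register, in place of one instruction that writes a register pair. The function names mirror Alpha's MULQ and UMULH mnemonics but are only illustrations. `unsigned __int128` is a GCC/Clang extension used for brevity.

```c
#include <stdint.h>

typedef unsigned __int128 u128; /* GCC/Clang extension */

/* Like Alpha MULQ: low 64 bits of the product, one destination register. */
static uint64_t mulq(uint64_t a, uint64_t b)
{
    return a * b; /* unsigned arithmetic wraps modulo 2^64 */
}

/* Like Alpha UMULH: high 64 bits of the unsigned product. */
static uint64_t umulh(uint64_t a, uint64_t b)
{
    return (uint64_t)(((u128)a * b) >> 64);
}

/* A full double-word product is simply both operations, each writing
 * one register, rather than one instruction writing two. */
static void mul_full(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    *lo = mulq(a, b);
    *hi = umulh(a, b);
}
```

Splitting the work this way is roughly what Bernd suggests a decoder could do internally with a double-word x86 multiply.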
Jan
However, nothing prevents AMD from decoding a double-word mul into two
operations.
--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
> Bruce Hoult <br...@hoult.org> wrote:
>
> > Last week I watched the MacWorld keynote with a friend who owns
> > an Apple VAR, and we happened to talk about the Power4. While
> > not being aware of exactly how much IBM sells them for, he
> > seemed pretty sure that he could sell a good number of higher
> > powered Macs even if they were much more expensive, provided
> > that they ran Photoshop or Maya or whatever faster.
>
> Is the lack of Altivec a problem?
Not if a Power4 (or 3) is faster than a G4 even without it :-)
IBM had the chance to adopt Altivec, but appeared to think they could
build faster machines without it. Whether that meant faster on vector
codes or only faster on scalar I don't know.
If the Power machines have huge memory and disk bandwidth then that
might make them worthwhile on a lot of Photoshop-type work even without
Altivec. Not everything is gaussian blur or rotate.
There aren't yet many programs that *require* Altivec. iDVD requires a
G4, and I assume that's because of Altivec.
Even without Photoshop and Sorenson encoding and so forth, if Apple is
going to break into the corporate market then they need some real server
hardware with all the good stuff that Suns and Intel-based servers have.
IBM has such machines, and presumably it shouldn't be *too* hard to get
OS X going on them.
I really don't know whether Power3 or Power4 Macs would be any faster
than G4 Macs. I sure hope someone at Apple *does* know and the
non-existence of them is for good reasons rather than because no one
checked.
-- Bruce
Remember, folks, Power4 is currently geared towards database and
scientific workloads. Neither of these markets uses the machines to play
DVDs or edit 2D images. I would guess that multimedia workloads don't
even appear on the designers' radar for these first designs.
If a version of Power4 is created for the consumer/artist market, I
would presume IBM would make sure multimedia performance was decent.
Perhaps they will bolt an AltiVec (or AltiVec-like) unit onto the side
of the design...
Chris
--
Chris Colohan Email: ch...@colohan.ca PGP: finger col...@cs.cmu.edu
Web: www.cs.cmu.edu/~colohan Phone: (412)268-4751
> Bruce Hoult <br...@hoult.org> writes:
> > Not if a Power4 (or 3) is faster than a G4 even without it :-)
> >
> > IBM had the chance to adopt Altivec, but appeared to think they could
> > build faster machines without it. Whether that meant faster on vector
> > codes or only faster on scalar I don't know.
>
> Remember, folks, Power4 is currently geared towards database and
> scientific workloads.
Right. I would have thought that the scientific-workload folks would be
interested in Altivec, or at least the ones for whom single precision
is enough.
> Neither of these markets use the machines to play
> DVDs or edit 2d images. I would guess that multimedia workloads don't
> even appear on the designer's radar for these first designs.
While there are some things in Photoshop which are CPU limited, I would
have thought that there was a lot more that were memory bandwidth
limited and which would therefore benefit from the kinds of machines IBM
makes.
> If a version of Power4 is created for the consumer/artist market, I
> would presume IBM would make sure multimedia performance was decent.
But do we know that it's bad at the moment?
-- Bruce
> IBM had the chance to adopt Altivec, but appeared to think they could
> build faster machines without it. Whether that meant faster on vector
> codes or only faster on scalar I don't know.
There were rumours at the time (sorry, no actual evidence) that
IBM didn't want to build AltiVec chips because their designers
thought AltiVec would be difficult to implement at high clock
speeds. With the IBM G3s now hitting 1 GHz, that looks like a pretty
good call in hindsight.
The new Nintendo has an IBM PPC variant in it. Presumably this
has some kind of SIMD instruction set (if only for bragging
rights compared with Sony and Xbox). Anybody know anything about
it?
> I really don't know whether Power3 or Power4 Macs would be any faster
> than G4 Macs. I sure hope someone at Apple *does* know and the
> non-existence of them is for good reasons rather than because no one
> checked.
Too small a market niche probably. Apple did in the past
rebadge some IBM servers, but they were never really popular
and eventually Apple gave up on them. One of the clone makers
(when Mac clones were allowed) tried to build massive crunchers
for the graphics market, but that wasn't viable either.
Presumably it is too expensive, and takes too long, to design
a specialised box that will sell in sufficient numbers before
the next iteration of the regular desktop machines overtakes
it.
Hugh Fisher
It packs two floats into one 64-bit FP register, much like AMD's 3DNow! does.
With multiply-add this results in 4 floating point operations per cycle,
~1.8 GFLOPS in all.
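Here is a small C model of the operation count. A struct stands in for "two floats in one 64-bit register", and a lane-wise multiply-add performs two multiplies and two adds per instruction. The names `paired_single`, `paired_madd` and `peak_gflops` are hypothetical, and the C expression is not necessarily fused. It only models the arithmetic.

```c
/* Two single-precision values sharing one 64-bit register slot (model). */
typedef struct { float ps0, ps1; } paired_single;

/* Lane-wise d = a*b + c: 2 lanes x (1 multiply + 1 add) = 4 flops. */
static paired_single paired_madd(paired_single a, paired_single b,
                                 paired_single c)
{
    paired_single d = { a.ps0 * b.ps0 + c.ps0, a.ps1 * b.ps1 + c.ps1 };
    return d;
}

/* Peak rate if one such instruction issues every cycle. */
static double peak_gflops(double clock_mhz)
{
    const int lanes = 2, flops_per_lane = 2; /* multiply + add */
    return lanes * flops_per_lane * clock_mhz / 1000.0;
}
```

By this arithmetic, the ~1.8 GFLOPS figure implies a clock of roughly 450 MHz.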
Cheers
Martin
Although I have no dealings with Apple (other than as a long-time
Mac user), I suspect that the current POWER4 chips have some
features that are disadvantages from Apple's point of view.
The POWER4 chips dissipate a lot of heat. I don't know what we
have disclosed in public, but it is certainly in excess of 100
watts for the dual-cpu chip.
The POWER4 chips are large (over 400 mm^2) and are therefore not
cheap.
Of course, future process technologies will make the chips smaller,
cheaper, and cooler.
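A common textbook approximation puts a rough number on "large and therefore not cheap". It gives the number of whole dies of area A on a round wafer of diameter d. The 200 mm wafer diameter below is my assumption for illustration, not an IBM figure. Yield, which also drops as dies grow, is ignored.

```latex
\[
  N_{\text{dies}} \approx \frac{\pi (d/2)^2}{A} - \frac{\pi d}{\sqrt{2A}}
\]
\[
  d = 200\,\text{mm}:\quad
  A = 400\,\text{mm}^2 \Rightarrow N \approx 78.5 - 22.2 \approx 56,
  \qquad
  A = 100\,\text{mm}^2 \Rightarrow N \approx 314.2 - 44.4 \approx 270
\]
```

Quadrupling the die area therefore cuts the candidate dies per wafer by almost a factor of five before any yield loss. A process shrink attacks cost directly for the same reason.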
Most scientific computing is done, and most scientific code is written,
in 64-bit floating-point. ("Real supercomputing" used to define 64-bit
fp as "single precision" and 128-bit as "double precision", but this
usage is less common today.)
That's not to say there aren't a few areas where 32-bit fp suffices.
But they're relatively uncommon even within the number-crunching world.
So alas, Altivec isn't much use for "normal" (64-bit fp) scientific
work.
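The following C example shows how soon 32-bit fp runs out. `float` has a 24-bit significand (about 7 decimal digits) and `double` has 53 bits (about 16), so at 2^24 a float can no longer even count by ones. The `volatile` qualifiers make each sum round to its declared type instead of staying in a wider register. The function names are illustrative.

```c
/* At 2^24, adding 1 to a float rounds straight back to 2^24. */
static int float_absorbs_one(void)
{
    volatile float big = 16777216.0f;  /* 2^24 */
    volatile float sum = big + 1.0f;   /* 2^24 + 1 is not representable */
    return sum == big;
}

/* A double still has 29 spare significand bits at that magnitude. */
static int double_keeps_one(void)
{
    volatile double big = 16777216.0;
    volatile double sum = big + 1.0;
    return sum != big;
}
```

Long accumulations such as sums, dot products, and time stepping lose low-order bits the same way. That loss is one reason 64-bit is the default for scientific codes.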
--
-- Jonathan Thornburg <jth...@aei.mpg.de>
Max-Planck-Institut fuer Gravitationsphysik (Albert-Einstein-Institut),
Golm, Germany http://www.aei.mpg.de/~jthorn/home.html
"Washing one's hands of the conflict between the powerful and the
powerless means to side with the powerful, not to be neutral."
-- quote by Freire / poster by Oxfam
Once upon a time, IBM was sticking PowerPC 604 processors into
some of their boxes and running POWER binaries on them. They'd
handle the missing instructions by simulating them on invalid
opcode trap, which was very slow, but since the 604 was running
most instructions so much faster than the POWER implementations
of the time (might have been "POWER implementations of similar
price bracket"; I don't recall exactly), the net performance was
very good.
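The trap-and-emulate scheme can be sketched as a toy interpreter. The "hardware" executes the opcodes it implements, and anything else falls through to a software handler standing in for the invalid-opcode trap. The ISA, opcodes and names here are all hypothetical. On real systems the hardware raises an illegal-instruction exception and the OS handler decodes the instruction word.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy ISA (hypothetical): the new CPU implements ADD and SUB directly
 * but lacks MULADD, which older binaries still contain. */
enum opcode { OP_ADD, OP_SUB, OP_MULADD };

struct insn { enum opcode op; int d, a, b, c; };

struct cpu {
    int64_t reg[8];
    unsigned long traps; /* how many instructions needed emulation */
};

/* Software emulation: stands in for the invalid-opcode trap handler. */
static void emulate(struct cpu *cpu, const struct insn *in)
{
    cpu->traps++;
    switch (in->op) {
    case OP_MULADD:
        cpu->reg[in->d] = cpu->reg[in->a] * cpu->reg[in->b] + cpu->reg[in->c];
        break;
    default:
        break; /* a real handler would deliver a genuine illegal-instruction fault */
    }
}

/* "Hardware" path: execute implemented opcodes, trap on the rest. */
static void run(struct cpu *cpu, const struct insn *prog, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        const struct insn *in = &prog[i];
        switch (in->op) {
        case OP_ADD: cpu->reg[in->d] = cpu->reg[in->a] + cpu->reg[in->b]; break;
        case OP_SUB: cpu->reg[in->d] = cpu->reg[in->a] - cpu->reg[in->b]; break;
        default:     emulate(cpu, in); break; /* invalid opcode -> trap */
        }
    }
}
```

Only the rare emulated instructions pay the trap cost. Everything else runs at the new chip's native speed.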
It seems to me that Apple might do the same thing with POWER4,
handling the AltiVec instructions via trap, if the resulting
system was much faster for most applications (i.e., those that did
not rely very heavily on AltiVec).
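The trade-off can be written as a simple break-even condition. Let f be the fraction of executed instructions that must be emulated, t_new the average time per native instruction on the new chip, T_trap the cost of one trap plus emulation, and t_old the average time per instruction on the machine being replaced. These symbols are mine, and the model ignores everything except instruction counts. Assuming T_trap > t_new:

```latex
\[
  (1-f)\,t_{\text{new}} + f\,T_{\text{trap}} < t_{\text{old}}
  \quad\Longleftrightarrow\quad
  f < \frac{t_{\text{old}} - t_{\text{new}}}{T_{\text{trap}} - t_{\text{new}}}
\]
```

Take hypothetical values t_new = 1, t_old = 2 (new chip twice as fast) and T_trap = 200. Emulation then pays off only while f < 1/199, roughly one emulated instruction in 200. So the scheme suits applications that do not lean heavily on AltiVec.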
-- TTK
On 15 Jan 2002, TTK Ciar wrote:
>
> Once upon a time, Bruce Hoult <br...@hoult.org> said:
> >> Is the lack of Altivec a problem?
> >
> >Not if a Power4 (or 3) is faster than a G4 even without it :-)
>
[Snipped emulate missing instruction]
> It seems to me that apple might do the same thing with POWER4,
> handling the AltiVec instructions via trap, if the resulting
> system was much faster for most applications (ie, those that did
> not rely very heavily on AltiVec).
And those that do will have G3 alternative code paths in any case.
Peter
Ideally, of course, one would trap to a dynamic recompilation
routine of appropriate complexity...
paul
> The POWER4 chips are large (over 400 mm^2) and are therefore not
> cheap.
>
> Of course, future process technologies will make the chips smaller,
> cheaper, and cooler.
I have been thinking that the low CPU count / lower margin systems are
waiting on a shrink of the POWER4.
There's no need, since most applications containing AltiVec code have
AltiVec and non-AltiVec code which they select at runtime so they can
run on all those iMacs and iBooks.
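The runtime selection works by picking an implementation once at startup and calling it through a function pointer. In this minimal C sketch, both kernels are plain-C stand-ins (a real program would write the fast one with AltiVec intrinsics) and all names are hypothetical. The `has_vector_unit` flag is assumed to come from whatever OS query the program uses.

```c
#include <stdbool.h>
#include <stddef.h>

/* Fallback path that runs on any PowerPC (e.g. G3 iMacs and iBooks). */
static void scale_scalar(float *x, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        x[i] *= k;
}

/* Stand-in for the AltiVec path: 4 floats per step, then the remainder. */
static void scale_vector(float *x, size_t n, float k)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        x[i] *= k; x[i + 1] *= k; x[i + 2] *= k; x[i + 3] *= k;
    }
    for (; i < n; i++)
        x[i] *= k;
}

typedef void (*scale_fn)(float *, size_t, float);

/* Chosen once at startup; afterwards every call goes through the pointer. */
static scale_fn select_scale(bool has_vector_unit)
{
    return has_vector_unit ? scale_vector : scale_scalar;
}
```

The same binary then runs correctly on machines with or without the vector unit. A CPU that lacked AltiVec would simply take the fallback path and never need to trap.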
-- Bruce
> There were rumours at the time (sorry, no actual evidence) that
> IBM didn't want to build AltiVec chips because their designers
> thought AltiVec would be difficult to implement at high clock
> speeds. With the IBM G3s now hitting 1Ghz, that looks a pretty
> good call in hindsight.
What is so difficult about dividing a (vector) pipeline into more stages?
Trying to keep more instructions in flight (than the G3) without splitting a
stage is another matter.
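A standard back-of-envelope model captures the trade-off. Suppose an operation needs total logic delay T and each pipeline stage adds a fixed latch overhead L. Splitting the operation into n stages then gives:

```latex
\[
  t_{\text{cycle}}(n) = \frac{T}{n} + L,
  \qquad
  f_{\text{clock}}(n) = \frac{1}{T/n + L} < \frac{1}{L},
  \qquad
  \text{latency}(n) = n\,t_{\text{cycle}}(n) = T + nL
\]
```

Take hypothetical values T = 4 ns and L = 0.25 ns, which are not G3 or G4 figures. Four stages give a 1.25 ns cycle (800 MHz) and a 5 ns latency. Eight stages give a 0.75 ns cycle (about 1.33 GHz) and a 6 ns latency, i.e. 8 cycles instead of 4. The split itself is mechanical, but the unit now needs about n independent operations in flight to stay busy.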
>I have been thinking that the low CPU count / lower margin systems are
>waiting on a shrink of the POWER4.
If so, they'll have to wait a while. IIRC, it's already made in .13u; .10u
is going to be a while. They are probably saving up POWER4 dies where one
core tested good and one tested bad to be used in low end desktops.
--
Douglas Siebert dsie...@excisethis.khamsin.net
He who laughs last thinks slowest.
Already selling them for HPC. Besides, we've got SSTAR and 630 for that.
Aaron Spink
speaking for Myself Inc.
> The POWER4 chips dissipate a lot of heat. I don't know what we
> have disclosed in public, but it is certainly in excess of 100
> watts for the dual-cpu chip.
> The POWER4 chips are large (over 400 mm^2) and are therefore not
> cheap.
> Of course, future process technologies will make the chips smaller,
> cheaper, and cooler.
Plus there is that issue with cache line size and Apple's use of dcbz.
YRI. (You recall incorrectly.) ;-)
POWER4 is made in the IBM 8S3 process, which is 0.18 microns.
The 0.13 micron process (IBM 9S) is operational, and we have POWER4
(shrink) parts in the lab produced in this process, but we have
not publicly announced any of the products that contain these
parts. The CMOS 9S process was first announced in December, 2000.
On the other hand, multimedia processing has a *lot* more in common
with scientific workloads than it does with office-style apps. And the
POWER architecture *does* have a streaming load-store unit that helps a
lot with these sorts of things (more for multimedia than for some
scientific workloads, actually: the latter are too likely to have too
many non-stride-1 streams for it to handle well).
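To illustrate why stride-1 streams are the easy case, here is a toy C model of a stream prefetcher. It tracks a few ascending streams of consecutive cache lines and counts the accesses that no stream covers. This is not POWER4's actual prefetch logic. The 128-byte line, the eight stream slots, and all names are assumptions made for the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 128 /* assumed cache-line size */
#define NSTREAMS   8   /* assumed number of stream-tracking slots */

typedef struct { uint64_t next_line; bool valid; } stream_slot;
typedef struct { stream_slot slot[NSTREAMS]; size_t victim; } prefetcher;

static void prefetcher_reset(prefetcher *p)
{
    for (size_t i = 0; i < NSTREAMS; i++)
        p->slot[i].valid = false;
    p->victim = 0;
}

/* True if an active ascending stream covers this access. */
static bool prefetcher_access(prefetcher *p, uint64_t addr)
{
    uint64_t line = addr / LINE_BYTES;
    for (size_t i = 0; i < NSTREAMS; i++) {
        stream_slot *s = &p->slot[i];
        if (!s->valid)
            continue;
        if (s->next_line == line + 1) /* still within the current line */
            return true;
        if (s->next_line == line) {   /* predicted next line: advance */
            s->next_line = line + 1;
            return true;
        }
    }
    /* Uncovered: start a candidate stream expecting the next line. */
    p->slot[p->victim].next_line = line + 1;
    p->slot[p->victim].valid = true;
    p->victim = (p->victim + 1) % NSTREAMS;
    return false;
}

/* Walk n elements at the given byte stride; count uncovered accesses. */
static size_t count_uncovered(uint64_t base, uint64_t stride, size_t n)
{
    prefetcher p;
    prefetcher_reset(&p);
    size_t misses = 0;
    for (size_t i = 0; i < n; i++)
        if (!prefetcher_access(&p, base + i * stride))
            misses++;
    return misses;
}
```

In this model, a stride-1 walk over 8-byte elements is covered after its first access. A 4 KB-stride walk never forms a stream, so every access is uncovered.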
Given my experience optimizing scientific loads on it (and on other
systems), I strongly suspect they *can* do at least as well on simpler
vector codes with their existing architecture as with Altivec.
Carlie J. Coats, Jr. co...@emc.mcnc.org
MCNC-Environmental Modeling Center phone: (919)248-9241
North Carolina Supercomputing Center fax: (919)248-9245
3021 Cornwallis Road P. O. Box 12889
Research Triangle Park, N. C. 27709-2889 USA
"My opinions are my own, and I've got *lots* of them!"
>
> In article <laranzu-1501...@1cust23.tnt3.cbr1.da.uu.net>,
> lar...@ozemail.deletethis.com.au (Hugh Fisher) wrote:
>
> > There were rumours at the time (sorry, no actual evidence) that
> > IBM didn't want to build AltiVec chips because their designers
> > thought AltiVec would be difficult to implement at high clock
> > speeds. With the IBM G3s now hitting 1Ghz, that looks a pretty
> > good call in hindsight.
>
> What is so difficult about dividing a (vector) pipeline into more stages?
>
I guess nothing. Compare the vector ALUs of MPC7400 and MPC7450.
Holger
I would just be curious to know how many bytes a dcbz will set to zero
on a POWER4. I wouldn't be surprised at all if the number is 32, same as
on all PowerPC processors ever built, no matter what the cache line size
is. Unfortunately I have never been able to find a POWER4 instruction
manual.
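Here is why the dcbz block size matters. Code that clears memory with dcbz and steps 32 bytes at a time assumes each dcbz zeroes exactly 32 bytes. If it instead zeroed a whole larger cache line, it would wipe neighbouring data. This C simulation shows the hazard. `sim_dcbz` only models the instruction's effect, and the 128-byte figure is an assumption chosen for illustration. As the post says, what POWER4's dcbz actually does is the open question.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simulated dcbz: zero the whole aligned block containing buf[off].
 * block_bytes is whatever the hardware really uses (assumed here). */
static void sim_dcbz(uint8_t *buf, size_t off, size_t block_bytes)
{
    size_t start = off - off % block_bytes;
    memset(buf + start, 0, block_bytes);
}

/* Clear len bytes at off, written by someone assuming 32-byte blocks
 * (off and len are multiples of 32). */
static void clear_assuming_32(uint8_t *buf, size_t off, size_t len,
                              size_t actual_block_bytes)
{
    for (size_t p = off; p < off + len; p += 32)
        sim_dcbz(buf, p, actual_block_bytes);
}
```

With 32-byte blocks, only the requested range is cleared. With 128-byte blocks, the same loop also zeroes the 96 bytes around it.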