RISC vs. CISC

Randall Hyde

Oct 26, 2001, 1:55:55 PM

This is a continuation of a subthread from
"Re:Disadvantages of Assembly" that I'm separating
out for the obvious reasons...
Randy Hyde

"Jerry Coffin" <jco...@taeus.com> wrote in message
news:MPG.1642ac2dc...@news.direcpc.com...
> In article <bF0C7.4404$O4.639...@newssvr13.news.prodigy.com>,
> rh...@transdimension.com says...
>
> [ ... ]
>
> > Well, I am certainly unaware of the fact that they use microcode.
> > See additional comments later.
>
> I'll put up a photomicrograph of a PPC 850 tomorrow morning. There's
> a section labeled "instruction sequencer". If you look in that,
> you'll see that on the left side of that part is random logic, but on
> the right side are a couple of rather solid-looking black areas.
> You'll notice that these are black, and each has a narrow stripe down
> the middle of a lighter color. Those are ROMs, and the stripe down
> the middle is the logic to decode an address, and put the bits from
> the memory back together into bytes/words/whatever. Offhand I don't
> remember which of those holds microcode, but one of them certainly
> does. You'll note that they're located basically directly between
> the instruction sequencer logic and the execution units -- the
> instruction sequencer basically generates addresses, feeds them to
> the microcode ROM, then the instructions go from there to the
> execution units.
>
> The file should be at:
>
> ftp://www.taeusworks.com/ppc850.jpg
>
> by roughly 10:00 AM Friday, and I'll leave it there until next Monday
> or Tuesday (or so). It's about 295K.

They may or may not be, I don't know.
It's dangerous to make the assumption that this is microcode ROM storage.
Now if an engineer who worked on this chip would verify that this is indeed
microcode and that the PPC uses microcode, I would very rapidly cede this
point to you. But I can't draw the same conclusions you do without any
firm evidence.

> [ ... ]
>
> > You'll forgive me for
> > wanting something beyond a nebulous claim made in a post to
> > a newsgroup, but knowing the work involved in adding a
> > microengine/sequencer and microcode ROM to a CPU to
> > support "a few instructions in microcode" doesn't sound right to me.
>
> Try designing something like a floating point divider or a
> reciprocal square root estimator in pure hardware, and you'll quickly
> find that a microcode sequencer is fairly trivial by comparison.

Of course it is. This has always been the case. Indeed, one of the
big problems in the Zilog vs. Intel wars many years ago is that
Intel went with microcode while Zilog continued to use random
logic. Zilog never was able to get the Z80K functional within a
reasonable market window to compete with the '386. The
comparison is quite similar to that between using assembly and
using a HLL. For large projects, limited schedules, and
a set of average engineers, the HLL approach is probably
going to be safer -- as is the microcode approach. That's
why IBM started using microcode in the 360 in the first
place, as an example.

>
> Also note that much like the better known architectures of the AMD and
> Intel chips, most RISC chips decode their instructions into an
> internal representation, and have instruction sequencers that have to
> schedule them among the execution units anyway. As such, they've
> basically already got the "guts" of a microcode instruction sequencer
> regardless of whether they use microcode or not. As such, using
> microcode, even for a relatively small number of instructions,
> doesn't really add a huge amount of complexity in any case.

And that may very well be the purpose of that "ROM" bank in the
picture. Feed 32 bits in to several ROMs and get several different
control lines out in order to fire the other sequential logic. A very
common decoding technique that is *not* microcode. Just a fancy
way of doing a decoder that can actually be faster than random
logic when the number of inputs gets high enough. But this is
conjecture with respect to those photomicrographs (just another
guess, much like yours concerning microcode).
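
The difference between a ROM used as a decoder and a ROM holding microcode can be sketched in a few lines. This is only an illustration, not a description of the PPC 850: the opcodes, control-line names, and table contents below are all invented. The point is that a decode ROM is a single lookup per instruction, with no sequencing state at all.

```python
# Hypothetical control lines, one bit each (names invented for illustration).
ALU_ADD   = 0b0001
ALU_SUB   = 0b0010
REG_WRITE = 0b0100
MEM_READ  = 0b1000

# A "decode ROM": the opcode is the address, the stored word is the set of
# control lines to assert. One lookup per instruction, no sequencing.
DECODE_ROM = {
    0x0: ALU_ADD | REG_WRITE,   # add  rd, ra, rb
    0x1: ALU_SUB | REG_WRITE,   # sub  rd, ra, rb
    0x2: MEM_READ | REG_WRITE,  # load rd, [ra]
}

def decode(opcode: int) -> int:
    """Return the control-line bitmask for an opcode in a single lookup."""
    return DECODE_ROM[opcode]

def asserted(mask: int) -> list[str]:
    """List the names of the control lines that are set in mask."""
    names = {ALU_ADD: "ALU_ADD", ALU_SUB: "ALU_SUB",
             REG_WRITE: "REG_WRITE", MEM_READ: "MEM_READ"}
    return [name for bit, name in names.items() if mask & bit]
```

Calling `asserted(decode(0x2))` lists the lines a hypothetical load would fire. Nothing here steps through a program, which is what separates this style of decoder from a microcode engine.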

>
> > I would love to be proved wrong because I usually argue against
> > the benefits of RISC just as vehemently as you are. However,
> > I really don't feel like entering a debate making claims like this
> > without well-established facts, so please pass this information along,
> > I would find it to be of great use.
>
> Unfortunately, there are only two groups of people who know enough to
> say much authoritatively: CPU designers generally _can't_ talk much
> about it because they're normally under non-disclosure agreements. I
> happen to belong to the other group: people who do reverse
> engineering, in our case to enforce patents.

I seriously doubt that admitting the use of microcode would be a problem
(other than, perhaps, the embarrassment of admitting that your "so-called
RISC" device fails to follow one of the basic tenets of RISC design, I
suppose).

> If you had a patent you
> thought somebody was infringing, we'd be quite happy to investigate
> the product in question and figure out whether it did or not. OTOH,
> the work's expensive enough that nobody can normally plan on doing it
> just for the sake of winning an argument -- if I started giving away
> a bunch of pictures just to prove my points, I'd get fired. I can
> give this one away because it's really not one of our photos at all
> -- it's one I got from the Microprocessor Report about three years
> ago (and the markings of what's there were theirs also).

Microcode is so old that the general idea is no longer covered by
any patents. I find it hard to believe that simply admitting the use
of microcode would give away any important trade secrets.
We're not talking about patent details here, we're talking
very general computer architecture stuff that you're going to find
in every college textbook on the subject. Now I will certainly
admit that in the past 3-4 years I've not been following the architectures
of the latest and greatest RISC processors and they may have started
slipping in microcode during that time, but prior to this point I'd never
heard of a major RISC device (that didn't simply use RISC as a
marketing gimmick, much like Intel uses the "RISC core" moniker)
that admitted to using microcode. Furthermore, this is the first time
in any newsgroup thread I've participated in where someone made
the assertion that mainstream RISC CPUs use microcode.
I am not saying that this isn't possible. I'm simply saying that this
is the first I've heard of it and you'll forgive me for being skeptical.

Now it's quite possible that modern CPU designers have
"lost the religion" and have injected microcode back into
their designs in order to make their lives easier. Clearly,
RISC instruction sets are a lot more complex today than they
were 10-15 years ago. Maybe this is because we've had a
bunch of engineers moving over to the RISC chips from
the CISC designs and they've influenced the designs of these
chips. But you still have to forgive me for not jumping to
conclusions on a matter such as this; I really want an
engineer to say "I worked on such and such chip, which
is unquestionably RISC, and it uses microcode."
Then I'll be a true believer. Note, as Hennessy and
Patterson point out, there is fundamentally nothing wrong
with using microcode in a RISC design, assuming memory
speeds are sufficiently high in relation to what's achievable
with random logic. Maybe we've crossed that threshold
because of the complexity of modern RISC devices.
I don't know.

>
> > Because the microsequencer has to read the microcode from the
> > microcode ROM (the slow memory element). And that delay is
> > currently (or, at least, when RISC design was fashionable) greater
> > than the amount of time needed to propagate the operations through
> > random logic.
>
> Ah, I see what you were talking about. The problem is that microcode
> ROMs are built with a logic process, not a normal RAM or ROM process
> at all. It _is_ possible to combine a memory process onto a logic
> chip (e.g. NeoMagic made a big splash with a process to combine DRAM
> and logic on a single chip a few years ago) but memory on a CPU is
> normally done using a logic process that doesn't suffer the speed
> disadvantages of a normal memory process.

Well, I don't know about modern processors, but back in the days
when I worked on an AMD 2900 bit slice device, we used fairly normal
ROM devices (if ECL is normal) for the microcode. Maybe
times have changed.

>
> Just for example, the microprocessor registers are really memory, but
> they're obviously VERY fast. In fact, in some cases, such as most
> recent Alphas, the registers are just about the fastest parts on the
> chip -- one of the reasons Alphas are so fast is that they allow you
> to both write to a register AND read that value from the register to
> feed to another instruction that depends on it, both in the same
> clock cycle. They do that by running them at double the clock speed
> of the rest of the chip...

Certainly they are memory in the sense that the bits in the registers
are latches, but their decoding and access is quite a bit different
than standard memory. Dual-port access to registers certainly
isn't limited to the Alpha; most modern pipelined processors
support this. AFAIK, multiport access is not achieved by
time-multiplexing, though this may very well be true for the
Alpha, I don't know.

>
> > I will assume here that this miscommunication was really my error
> > and I will not assume that you are ignorant of how microcode really
> > works, despite that everything about your reply suggests that this is
> > the case. Perhaps I should request that you provide an explanation
> > of how you feel microcode works? Maybe we're both using the
> > same term for two completely different things?
>
> It would be irrelevant -- I didn't realize that you were talking
> about microcode at all. In any case, it's mostly irrelevant because
> it assumes that microcode ROMs are similar to other ROMs, which
> really is NOT the case at all. At one time it's true that microcode
> was stored in ROMs that were similar to normal ROMs like you'd use
> for the BIOS, etc., but that was when the microcode ROM was a chip by
> itself, separate from the ALU chip(s). (In fact, if you look back
> far enough, you can find a couple of machines, such as a couple of
> the smallest ones in the IBM 360 series that actually stored
> microcode in the normal main memory -- but that was 30+ years ago).

You don't have to go back quite that far. The LSI-11 and AMD 2900
both used separate chips for the microcode storage. I've worked with
both. WCS (using RAM for microcode) was also done on occasion
(I suspect this is your reference to the 360 above).

However, whether the ROM was on-CPU or off CPU, I don't see how
that makes any difference. What is so special about microcode ROM
that you feel it's not a standard ROM technology? I don't understand.

>
> Simply put, the minute you put the microcode onto the same chip as
> the rest of the CPU you have _almost_ no choice but to also use a
> normal logic-type process to fabricate it. That reduces your density
> and increases the power consumption, but does NOT suffer from the
> speed problems you're talking about.

The fact that the on-chip microcode ROM might be faster than an off-chip
ROM does not mitigate the fact that the CPU must decode the incoming
instruction (often a very complex process on CISC machines, causing
a fair delay by itself), use the opcode to select a starting point in the
microcode, fetch the VLIW data, and operate on that. This sequence of steps
is usually slower (for simple instructions, like those typically found on
RISC devices) than decoding the instruction directly with random logic.
Now if the microcode is many times faster than the random decoding logic,
it *is* possible to step through a few microcode sequences to interpret an
instruction faster than it is to decode those instructions with random logic.
Although Hennessy and Patterson's book is rather old at this point
(and the numbers they give are way off), the basic principle is sound.
The last I heard (3-4 years ago), the microcode vs. random logic
tradeoff still favored random logic from a pure performance point of view.
That may have changed recently. It's also quite possible that CPU designers
are willing to give up some performance in order to get their designs out
the door faster (microcode *is* easier for a lot of stuff).
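
The microcode path described above (decode, dispatch to a starting address, fetch a wide control word, act on it, repeat) can be sketched as a toy interpreter. Everything here is hypothetical: the opcodes, the ROM contents, and the control-line names are invented and do not model any real CPU. The sketch only shows that every micro-step costs one ROM fetch, so even a one-step instruction pays for a dispatch plus a ROM access that direct decoding would avoid.

```python
from typing import NamedTuple

class MicroWord(NamedTuple):
    controls: tuple[str, ...]  # control lines asserted during this micro-cycle
    last: bool                 # True if this word ends the instruction

# Hypothetical microcode ROM: a flat array of wide control words.
MICROCODE_ROM = [
    MicroWord(("alu_add", "reg_write"), True),    # 0: add, single step
    MicroWord(("addr_calc",), False),             # 1: load, step 1
    MicroWord(("mem_read", "reg_write"), True),   # 2: load, step 2
]

# Dispatch table: opcode -> starting address in the microcode ROM.
DISPATCH = {"add": 0, "load": 1}

def execute(opcode: str) -> list[tuple[str, ...]]:
    """Step through the microcode for one instruction, one ROM fetch per cycle."""
    upc = DISPATCH[opcode]          # micro program counter
    trace = []
    while True:
        word = MICROCODE_ROM[upc]   # the ROM access on the critical path
        trace.append(word.controls)
        if word.last:
            return trace
        upc += 1
```

The length of the list returned by `execute` is the number of ROM fetches the instruction needed, which is the quantity the speed argument turns on.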

>
> > But it has nothing to do with microcode (which is generally VLIW
> > code).
>
> Well, sort of. Certainly microcode instruction words are typically
> very large, but no, they're NOT much like what VLIW normally means.
> Specifically, VLIW normally means packing a number of instructions
> together into a packet in much the same way as the Itanium does. The
> only chips I know of that use microcode much like that are the AMD's
> (K5, K6, etc.)

Traditionally, VLIW lets you control all the functional units on the CPU
with a single machine word. This is exactly what traditional microcode
needs to do. Now there is quite a bit of difference between microcode
and general purpose VLIW computers (and still more difference between
those and the IA-64), but microcode is (or, at least, *was*) a VLIW-like
architecture.
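
What "control all the functional units with a single machine word" means for traditional (horizontal) microcode can be shown with a small bit-field sketch. The 14-bit layout below is invented purely for illustration; real microwords were often far wider and laid out differently.

```python
# Hypothetical horizontal microword layout (field names and widths invented):
#   bits 0-3   ALU operation
#   bits 4-7   register file source select
#   bits 8-11  register file destination select
#   bit  12    memory read strobe
#   bit  13    memory write strobe
FIELDS = {
    "alu_op":    (0, 4),
    "src_reg":   (4, 4),
    "dst_reg":   (8, 4),
    "mem_read":  (12, 1),
    "mem_write": (13, 1),
}

def pack(**values: int) -> int:
    """Pack named field values into one microword; unnamed fields stay 0."""
    word = 0
    for name, value in values.items():
        shift, width = FIELDS[name]
        if value >> width:
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
    return word

def unpack(word: int) -> dict[str, int]:
    """Split a microword back into its per-unit control fields."""
    return {name: (word >> shift) & ((1 << width) - 1)
            for name, (shift, width) in FIELDS.items()}
```

Each field drives one functional unit directly, so a single word such as `pack(alu_op=3, src_reg=1, dst_reg=2, mem_read=1)` tells every unit what to do in that cycle. That is the sense in which such microcode resembles VLIW, without being a general-purpose VLIW instruction set.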

>
> > > BTW, I'm hardly the only person thinking this way either. Just for
> > > example, you might want to take a look at the new instruction set
> > > Ubicom has designed for the IP2022 -- while somebody who wasn't
> > > familiar with Hennessy and Patterson's work might think "RISC" (e.g.
> > > it has only 39 instructions) it clearly does NOT follow the RISC
> > > philosophy in any noticeable way (e.g. most instructions can operate
> > > directly on memory operands).
> >
> > A common misconception is that RISC = Reduced (Instruction Set) Computing.
> > In fact, RISC = (Reduced Instruction) Set Computing.
>
> Like I said, to somebody not familiar with Hennessy and Patterson's
> work, 39 instructions sounds like RISC, even though, as I noted
> there's no real relationship in this case.
>
> OTOH, you should also read some of the earliest papers on RISC as
> well: you'll find that early on, it really WAS largely about reducing
> the size of the instruction set. It was only after that proved to be
> grossly ineffective that they changed their minds and re-defined what
> it was supposed to mean.

Yes, limiting the amount of work that each instruction does tends
to reduce the number of instructions since you eliminate all those
instructions that do tons of weird things. But having a small
number of opcodes does not equate to RISC. I remember
people trying to claim the 6502 was a RISC device because
it had fewer instructions than the 8080 and 6800 families.
In fact, the 6502 was probably the most CISC-like of the
bunch when you considered the definition of RISC.

>
> > I'm counting on that too (you don't see me rushing to write an
> > IA-64 edition of AoA). OTOH, Intel thinks in terms of
> > decades and they definitely expect to be selling a different
> > processor line for general computing tasks 10-20 years
> > from now.
>
> From their attitudes, I'm not quite so sure...

Why give your customers reason to abandon your cash cow?
If Intel makes a big deal about switching everyone from IA-32
to IA-64, many customers may feel that they may as well
switch to SPARC or something else. OTOH, if they wait
until the IA-64 has a good showing (or, for that matter,
fails completely, don't forget the iAPX-432), they preserve
their x86 cash cow. Seems to me that they're making an
excellent marketing decision to keep their mouths shut until
they can really deliver (unlike, say, Palm or Apple).

>
> > However, the talk of the IA-64 not being used in desktop machines
> > will quickly fade when applications (e.g., digital video and others
> > yet to come) that need more than 4GB memory and memory
> > prices get to the point we can cheaply install 64GB in our machines
> > (by Moore's law, that's less than 10 years from now).
>
> Maybe. Keep in mind, however, that Intel has supported more than 4
> GB of physical memory for _years_ now -- since the Pentium Pro, to be
> exact. Yes, it's inevitable that eventually it'll be awfully handy
> to have more than 4 Gig per process as well, but I'd also note that
> the backward compatibility in the IA-64 isn't particularly compelling
> (as you've already pointed out). As such, if you're going to move to
> a 64-bit processor, IA-64 is NOT just an obvious choice.

I believe the examples I gave made it clear I was talking about
4 GB for a single application. Now granted, not everyone
needs to edit hour long DV movies in memory (which,
realistically, takes about 32 GB), but the future holds some new
applications that are real memory hogs (10 years ago, I would have
been astounded to believe that I'd set aside 64MB for an application
like Adobe GoLive).
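
The two limits being contrasted in this exchange are simple powers of two. As a small worked sketch, assuming byte addressing, a flat 32-bit virtual address space per process, and the 36-bit physical addressing introduced with the Pentium Pro:

```python
GIB = 2 ** 30

def addressable_gib(address_bits: int) -> int:
    """Size in GiB of a byte-addressed space with the given address width."""
    return 2 ** address_bits // GIB

per_process_32 = addressable_gib(32)   # what one 32-bit process can address
physical_36 = addressable_gib(36)      # what the machine as a whole can hold
```

Under those assumptions `per_process_32` is 4 and `physical_36` is 64, which is why a machine can carry more than 4 GB of RAM while any single application still sees at most 4 GB.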

Intel is sure working towards making the IA-64 the obvious choice
for 64-bit computing. Obviously, like all new chips and technologies,
they're marketing this for the servers; but with the possible exception
of the Pentium Pro, every server chip they've created in the IA-32
family has become a mainstay desktop CPU. OTOH, what do you think
is the possibility that Sun or HP will attempt to move one of their
64-bit chips into the consumer space? Of course, there's always
AMD with their Hammer; and it will, undoubtedly, do well during
the transition between 32 and 64 bits, but once people are writing
pure 64 bit code, I suspect we'll care as much about the Hammer
as we do about the fact that the Pentium can run 8086 code today.

[snipped: info on power consumption that I basically agree with]
[snipped: info on die size that is becoming increasingly irrelevant to this thread]

>
> > Originally, there was some argument about how RISC chips
> > could be cheaper than CISC chips because of the reduced
> > die size. To my knowledge, that fact has never been true.
> > x86 chips (and other CISC chips, for that matter) have competed
> > rather well with the RISC chips on price. With the exception
> > of the PowerPC, no RISC chip has made it into the
> > consumer general purpose PC market, so it's really hard to
> > draw comparisons here.
>
> Sort of true. Then again, there were things like the DEC 21164PC
> (yes, the "PC" really does mean the obvious thing -- it was aimed at
> the PC marketplace). It was approximately concurrent with the
> Pentium/MMX. It used a 137 square millimeter die in a .35 micron
> process with four layers of metal. The Pentium/MMX also used .35
> micron, 4 layer metal fabrication, but had a die area of 129 square
> millimeters. Interestingly enough, the Intel chip had more
> transistors (4.5 million vs. 3.4 million for the 21164PC) despite
> having a smaller die area. Usually, you get substantially higher
> density in the cache areas than in logic areas, so this _tends_ to
> indicate that the Intel chip was actually devoting a larger
> percentage of the chip to the cache than the DEC chip was. IOW,
> despite being a "RISC", the DEC chip apparently had substantially
> greater die area devoted to decoding and executing instructions.

More likely, DEC devoted a larger area of the die to interconnects.
This is one area where design expertise, leading way back to the 8088,
helps. The longer you've been reworking the same basic design, the
more you're able to tune it (e.g., move functional blocks around on the
die to reduce the expense of the interconnects).
This could also be due to something as simple as different design
tools (i.e., CAD software). But this was also offered as an
advantage to RISC by the original proponents - RISC design is
supposedly easier. It doesn't require the experience or care to
achieve a given performance level. Whether this is true or not
is certainly debatable, but that was an assertion.
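
The density comparison behind this exchange is easy to work out from the figures quoted above (4.5 million transistors on 129 square millimeters versus 3.4 million on 137). The sketch below uses only those quoted numbers; it says nothing about *why* the densities differ, which is the point in dispute.

```python
def density_per_mm2(transistors: float, die_area_mm2: float) -> float:
    """Average transistor density in transistors per square millimeter."""
    return transistors / die_area_mm2

pentium_mmx = density_per_mm2(4.5e6, 129)    # figures as quoted in the thread
alpha_21164pc = density_per_mm2(3.4e6, 137)
ratio = pentium_mmx / alpha_21164pc
```

That works out to roughly 34,900 versus 24,800 transistors per square millimeter, a ratio of about 1.4 in the Pentium/MMX's favor. A larger cache share, tighter interconnect layout, or better tools would each be consistent with that number.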

>
> > Intel machines work great with four processors. Eight gets iffy;
> > and at 16 you've reached the point of diminishing returns
> > (based on Xeon/PIII observations, I haven't looked into what
> > the PIV versions of Xeon will offer). It takes a *lot* of effort
> > and research to do better than this. If it were easy, everyone would
> > be doing it and they wouldn't stop with 32 or 64 processors, they'd
> > be scaling up to 256, 1024, or more (true massive parallel systems).
>
> Oh, don't get me wrong: I'm not trying to say that scaling an SMP
> machine to 1024 processors (for example) is easy. What I'm saying is
> that the difference between 8 and 32 (for example) is NOT really all
> that huge.

Crossing the 16-processor barrier in SMP has been a big problem in
the past. As you note below, the bus saturates so you have to go to
a different design.
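
The saturation effect can be illustrated with a deliberately crude model. It assumes every processor needs a fixed share of the shared bus to run at full speed and ignores contention, arbitration, and cache-coherence traffic, so it only shows the shape of the curve. The 1/8 share used below is an invented figure, not a measurement of any Xeon system.

```python
def effective_processors(n: int, bus_share: float) -> float:
    """Crude shared-bus model.

    Each processor is assumed to need `bus_share` of the bus bandwidth to run
    at full speed. Up to 1/bus_share processors run unhindered; beyond that
    the bus is the bottleneck and extra processors add nothing.
    """
    if not 0 < bus_share <= 1:
        raise ValueError("bus_share must be in (0, 1]")
    return min(float(n), 1.0 / bus_share)

# Assumed figure: each CPU needs 1/8 of the bus after its cache absorbs the rest.
curve = {n: effective_processors(n, 0.125) for n in (1, 2, 4, 8, 16, 32)}
```

In this model throughput grows linearly up to 8 processors and is flat from there on. Larger caches lower `bus_share` and move the knee outward, but they do not remove it, which is why larger systems move away from a single shared bus.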

>
> You'll also note that I more or less avoided the Xeons as an example:
> they use a relatively simple design in which all the processors use a
> shared bus to memory. Even with quite large local caches, that
> shared bus quickly gets saturated, and as you've noted the basic
> design rarely works well with more than 8 processors or so (though it
> depends somewhat on the size of cache local to each processor).
>
> The Hammer, however, has a design _very_ much like Sun uses: each
> processor has four separate high-speed busses. Each processor
> typically devotes one bus to connecting to memory, and the other
> three to talking to other processors. This means memory latency
> becomes non-uniform, depending on the number of other processors you
> go through before you reach the memory you're addressing. OTOH, Sun
> has proven it to work quite well with relatively large numbers of
> processors as well.

Well, if the Hammer is a NUMA device, it will be interesting to see
how it works out. But I still argue that a lot of care has to go into
the design of parallel systems that scale well beyond 16 processors.
That seems to be where Sun, HP, etc., are putting their efforts -
system performance not CPU performance. Most CISC chips today
can't compete up there. Personally, I don't have anything against the
x86 architecture and I don't particularly like RISC. So I'd love to
see the Hammer succeed in this marketplace. But success remains
to be seen.
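
The non-uniform latency described above comes down to how many processor-to-processor hops separate a CPU from the memory it is addressing. The sketch below counts hops with a breadth-first search. The topology is an assumption chosen only because it gives each processor exactly three links to its peers: eight nodes wired as a cube. It is not a claim about how Hammer or any Sun system is actually connected.

```python
from collections import deque

def hop_counts(links: dict[int, list[int]], start: int) -> dict[int, int]:
    """Breadth-first search: hops from `start` to every reachable node."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in links[node]:
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return hops

# Hypothetical 8-processor cube: node n links to the three nodes whose
# 3-bit number differs from n in exactly one bit.
CUBE = {n: [n ^ 1, n ^ 2, n ^ 4] for n in range(8)}

from_cpu0 = hop_counts(CUBE, 0)
```

From processor 0, its own memory is 0 hops away, three processors' memory is 1 hop away, three more are 2 hops away, and one is 3 hops away. Software that keeps its data near the processor using it sees the short paths; software that does not pays for the long ones.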

>
> I'd also note that a large part of the reason more processors is
> working well lately is also a change in work loads. Sun, for
> example, sells most of their big machines as big web servers. Web
> servers mostly work with a lot of relatively small pieces of data
> that are independent from each other, AND change only relatively
> rarely. This minimal coupling makes it MUCH easier to use lots of
> processors without spending lots of bandwidth keeping them in synch
> with each other.

True. Now why aren't the CISC processors doing well in this arena?
Do you believe that it's just marketing? Of course, you can always
take the "blade" approach of stacking a bunch of separate computers
into the same rack (common way of doing web servers with Intel
machines), but one would hardly call this "multiprocessing" since the
machines are operating independently of one another and the only
bus that connects them is a network connection (talk about NUMA!)

>
> > So I think you're underestimating the effort. OTOH, I must admit
> > that I haven't looked into these issues in a while, so maybe the world
> > has changed while I wasn't looking and things are a whole lot easier
> > now (or uniprocessor design is much more complex).
>
> To some extent, both of these are true. If you look at trends, it
> becomes fairly apparent that something has changed: at one time a
> machine with a lot of processors was a rare and expensive beast from
> a specialized manufacturer, and you could almost plan on having a
> full-time technical staff to keep it running at all and a bunch of
> custom programmers to take advantage of any more than a fraction of
> its capabilities.
>
> Nowadays, nearly every hardware vendor around has at least a few
> multiprocessor machines in their lineup. On the software side, every
> copy of Windows NT or 2K ever sold has supported at least two
> processors, and 4 and 8 processor versions are perfectly normal off-
> the-shelf items as well (in fact, I've never seen a dual processor
> machine running the version that supports only two processors --
> every dual processor machine I've seen has been running at least
> NT/2k Server, which supports 4 processors).

Ouch.
Every dual processor system I've bought is just running the
two-CPU version. The OS is a bit cheaper!
Randy Hyde

ccr...@crayne.org

Oct 27, 2001, 1:54:24 AM

In <DehC7.4258$mL5.154...@newssvr14.news.prodigy.com>, on 10/26/01
at 05:55 PM, "Randall Hyde" <rh...@transdimension.com> said:

:For large projects, limited schedules, and
:a set of average engineers, the HLL approach is probably
:going to be safer -- as is the microcode approach. That's
:why IBM started using microcode in the 360 in the first
:place, as an example.

Since you went to the trouble of forking this thread from the ASM vs HLL
one, I will not respond to that part of the above quote, but as to the
reason that IBM used microcode in the 360, it had nothing to do with
meeting schedules.

The IBM 360 family (to the best of my knowledge) was the first time that
anyone had committed to building a set of computer models with an order of
magnitude spread of processing speeds between the slowest and the fastest
(and corresponding differences in pricing), but which all executed the
same object code. If you think about this from an engineering point of
view, you might well ask (as many people did, at the time), "How is this
possible?".

The primary technique was microcode, which was then, and probably still
is, slower, but less expensive, than hard wired logic. Thus, the 360-30
was the most heavily microcoded model in the original announcement
package, and each higher numbered model used less microcode, and more hard
wired instructions. The 360-95 (which, as I recall, was not in the
original announcement) was the first of IBM's pipelined machines, and had
little (if any) microcode.

As to the early days of RISC, at least on the engineering side, reducing
the instruction set (however you choose to punctuate it) was never a goal,
but rather merely a means to an end. In those days, we were just beginning
to get away from the concept that the clock time must be set to allow the
slowest logic process to complete, and thought that carry prediction was a
slick new trick. So, if we logic designers could drop every instruction
which took more than one cycle to execute, and the circuit designers could
squeeze the clock width to the absolute minimum, we would have the cpu
equivalent of an Indy car.

Of course, this goal was met decades ago, and therefore, 'pure' RISC has
been obsolete ever since.

-- Chuck Crayne
-----------------------------------------------------------
ccr...@crayne.org
-----------------------------------------------------------

Tony Jester

Oct 27, 2001, 1:56:02 AM

Sir,

After reading over your AoA books and reading your many interesting
posts here, I just gotta ask (in a humble and respectful way), just
where DO you find the time to type so much? My first wife could type
120 wpm, but I'll bet you'd blow her doors off.

-Tony-

Kuno Woudt

Oct 29, 2001, 9:55:52 AM

Tony Jester <tje...@nospam.com> writes:

> After reading over your AoA books and reading your many interesting
> posts here, I just gotta ask (in a humble and respectful way), just
> where DO you find the time to type so much? My first wife could type
> 120 wpm, but I'll bet you'd blow her doors off.

haha, I don't think typing speed is the bottleneck with these kinds of
things.

Anyway, just curious - How many characters are in a word? I get around
220 cpm iirc.

-- kuno.

Randall Hyde

Oct 30, 2001, 2:55:51 PM

"Tony Jester" <tje...@nospam.com> wrote in message
news:3BDA4270...@nospam.com...

> Sir,
>
> After reading over your AoA books and reading your many interesting
> posts here, I just gotta ask (in a humble and respectful way), just
> where DO you find the time to type so much? My first wife could type
> 120 wpm, but I'll bet you'd blow her doors off.
>

Not at all.
I just spend *years* writing this stuff.
As for posts, Beth has me beat, hands down.
I know how much time *I* waste in the newsgroups,
I'd suspect that she lives here.
Randy Hyde
