
CISC architecture today


Seima Rao

Jan 26, 2014, 11:05:47 PM
Hi,

I have a doubt about which of system design or programming
language was the originating philosophy for CISC.

With RISCs, I think it was programming languages but with
CISC, what was the originating philosophy?

Sincerely,
Seima Rao.



Stephen Sprunk

Jan 26, 2014, 11:33:39 PM
On 26-Jan-14 22:05, Seima Rao wrote:
> I have a doubt about which of system design or programming language
> was the originating philosophy for CISC.
>
> With RISCs, I think it was programming languages

More specifically, RISC made it easier to write compilers for high-level
languages such as C, and it also made it simpler to design the CPUs
themselves and to get higher clock rates.

> but with CISC, what was the originating philosophy?

CISC was never a single, coherent philosophy; it's a retronym for
(nearly?) everything that predated RISC.

That said, CISC systems tend to have variable-length encodings,
non-orthogonal register sets and complex addressing modes, which are
great for human assembly programmers but not so great for compilers.

S

--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking

Joe Pfeiffer

Jan 26, 2014, 11:56:35 PM
Seima Rao <seim...@gmail.com> writes:

> Hi,
>
> I have a doubt about which of system design or programming
> language was the originating philosophy for CISC.
>
> With RISCs, I think it was programming languages but with
> CISC, what was the originating philosophy?

In addition to Stephen's response, I'd point you to the first chapter or
so of a book called "Advances in Computer Architecture" by Myers, which
came out just a few years before Patterson's RISC paper. It posits that
the purpose of a computer architecture, and the direction architectures
were about to move, was to come closer and closer to executing high
level languages directly in the hardware.

Ivan Godard

Jan 27, 2014, 1:03:38 AM
I agree with Myers that the machine should reflect the language, but he
failed as a prognosticator. Had he been right we would all be using
Burroughs stack instruction sets today - the ISA was a direct map to
Algol60.

And actually, despite the layer of religion slathered on the RISC/CISC
war, RISC was a temporal artifact of advances in packaging. There was a
roughly 10 year period in which you could fit a cut-down ISA in a single
chip, but you couldn't fit anything more substantial; the economics
meant that RISC would dominate everything but legacy and big iron. Ten
years later and the packaging economics no longer mattered. Now the only
reason for classic RISC is legacy and religion.

There was a similar packaging generation earlier. Data General succeeded
because they were able to fit a real computer on a single 19" board,
which had never been done. That lasted until packaging could fit a
computer on a chip and board packaging no longer mattered.


Louis Krupp

Jan 27, 2014, 3:17:37 AM
Namaskara.

It might be worth mentioning that in many English-speaking countries,
"doubt" and "question" don't necessarily mean exactly the same thing,
especially when used as nouns. dictionary.com, for example, has this
to say about the two words:

http://dictionary.reference.com/browse/doubt?s=t

http://dictionary.reference.com/browse/question?s=t

In American English, you would hear someone say "I have a question
about ..." if the speaker is looking for information. If you hear
someone say "I have my doubts about ...," it means that the speaker
believes that things might not be as they seem or as they've been
presented. "Doubt" in this case has a negative connotation, while
"question" is neutral.

Louis

Nick Maclaren

Jan 27, 2014, 3:51:36 AM
In article <lc4nj4$kcc$1...@dont-email.me>,
Stephen Sprunk <ste...@sprunk.org> wrote:
>On 26-Jan-14 22:05, Seima Rao wrote:
>> I have a doubt about which of system design or programming language
>> was the originating philosophy for CISC.
>>
>> With RISCs, I think it was programming languages
>
>More specifically, RISC made it easier to write compilers for high-level
>languages such as C, and it also made it simpler to design the CPUs
>themselves and to get higher clock rates.

If you are referring to the religious incarnation of RISC, the
first statement was mere polemic and was often the opposite of
the truth.

Ivan has given one key fact. There were and are good reasons for
simpler, more systematic ISAs and they have been invented and
reinvented many times. But the bandwagon that most people called
RISC was religious fanaticism, and their definitions of RISC were
dogmatism, though there were as many schisms as early Christianity.

>> but with CISC, what was the originating philosophy?
>
>CISC was never a single, coherent philosophy; it's a retronym for
>(nearly?) everything that predated RISC.

Not quite. Remember that, in the 1950s and 1960s, most serious
programs were written in machine-code or assembler, and the one
consistent CISC philosophy was to make it easier for the assembler
programmers, and remove some bottlenecks from their code. Beyond
that, I agree with you.


Regards,
Nick Maclaren.

Nick Maclaren

Jan 27, 2014, 3:59:05 AM
In article <tt3ce95ocsfd1n3e3...@4ax.com>,
Louis Krupp <lkr...@nospam.pssw.com.invalid> wrote:
>On Sun, 26 Jan 2014 20:05:47 -0800 (PST), Seima Rao
><seim...@gmail.com> wrote:
>
>> I have a doubt about which of system design or programming
>> language was the originating philosophy for CISC.
>>
>> With RISCs, I think it was programming languages but with
>> CISC, what was the originating philosophy?
>
>It might be worth mentioning that in many English-speaking countries,
>"doubt" and "question" don't necessarily mean exactly the same thing,
>especially when used as nouns. ...

True, but it's a standard English usage for those of us familiar
with more of the language aggregate that is English. From the OED:

a. The (subjective) state of uncertainty with regard to the truth
or reality of anything; undecidedness of belief or opinion. ...

c1300 Beket 375 Thanne was the Bischop in gret doute what were
therof to done.


Regards,
Nick Maclaren.

Michael S

Jan 27, 2014, 4:31:28 AM
On Monday, January 27, 2014 8:03:38 AM UTC+2, Ivan Godard wrote:
> On 1/26/2014 8:56 PM, Joe Pfeiffer wrote:
>
> > Seima Rao <seim...@gmail.com> writes:
>
> >
> >> Hi,
> >>
> >> I have a doubt about which of system design or programming
> >> language was the originating philosophy for CISC.
> >>
> >> With RISCs, I think it was programming languages but with
> >> CISC, what was the originating philosophy?
> >
>
> > In addition to Stephen's response, I'd point you to the first chapter or
> > so of a book called "Advances in Computer Architecture" by Myers, which
> > came out just a few years before Patterson's RISC paper. It posits that
> > the purpose of a computer architecture, and the direction architectures
> > were about to move, was to come closer and closer to executing high
> > level languages directly in the hardware.
> >
>
>
> I agree with Myers that the machine should reflect the language, but he
> failed as a prognosticator. Had he been right we would all be using
> Burroughs stack instruction sets today - the ISA was a direct map to
> Algol60.
>
>
> And actually, despite the layer of religion slathered on the RISC/CISC
> war, RISC was a temporal artifact of advances in packaging. There was a
> roughly 10 year period in which you could fit a cut-down ISA in a single
> chip, but you couldn't fit anything more substantial; the economics
> meant that RISC would dominate everything but legacy and big iron. Ten
> years later and the packaging economics no longer mattered.

But a spartan fixed-instruction-width load+op ISA (or not so spartan, as the TMS320C30) takes about the same area as pure load-store. I think load-op was not included not because it didn't fit, but because RISC designers believed that load+op is a misfeature standing in the way of static scheduling. Doesn't your own Mill follow the same philosophy?

> Now the only reason for classic RISC is legacy and religion.

So, you think that the recent ARM move from non-classic borderline-RISC toward classic RISC is driven by religion?

BGB

Jan 27, 2014, 4:57:53 AM
On 1/26/2014 10:33 PM, Stephen Sprunk wrote:
> On 26-Jan-14 22:05, Seima Rao wrote:
>> I have a doubt about which of system design or programming language
>> was the originating philosophy for CISC.
>>
>> With RISCs, I think it was programming languages
>
> More specifically, RISC made it easier to write compilers for high-level
> languages such as C, and it also made it simpler to design the CPUs
> themselves and to get higher clock rates.
>
>> but with CISC, what was the originating philosophy?
>
> CISC was never a single, coherent philosophy; it's a retronym for
> (nearly?) everything that predated RISC.
>
> That said, CISC systems tend to have variable-length encodings,
> non-orthogonal register sets and complex addressing modes, which are
> great for human assembly programmers but not so great for compilers.
>

though, in a more modern sense, it mostly boils down to the whole x86
vs ARM thing, where, ironically, IMHO the x86 instruction encoding is a
lot simpler and more sensible than the Thumb or Thumb2 encodings.

like, with x86, you can write logic to deal with a few generic
constructs and then cover most of the ISA, whereas with Thumb2 it is
just about necessary to write dedicated logic to deal with each
instruction form.


at the actual ISA level, x86 doesn't seem particularly difficult to
target, apart from its general low-levelness, but this would also apply
to a RISC-based ISA, and I suspect the x86 addressing modes are
"reasonably useful" for the most part ("A+(B<<N)+C" being a very common
construct underlying HLL code, and otherwise one would need to use
dedicated instructions).
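
for example, in C terms (an illustrative sketch only; the struct and
field names are made up, not from anything above):

  #include <stddef.h>

  struct rec { int pad; int f; };

  /* &a[i].f = base A, plus index i scaled by the element size (the B<<N
     part), plus the constant field offset C; on x86 all of this folds
     into one addressing mode, e.g. [base + index*8 + 4]. */
  int *field_addr(struct rec *a, long i)
  {
      return (int *)((char *)a + i * sizeof(struct rec)
                     + offsetof(struct rec, f));
  }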


variable length instructions are another one of those issues.

I think I had mentioned a random idea here before though for power-of-2
sized variable-length instructions.

for example:
0xxxxxxx, single byte instruction
10xxxxxx xxxxxxxx, 2-byte instruction
110xxxxx xxxxxxxx xxxxxxxx xxxxxxxx, 4 byte instruction
1110xxxx xxxxxxxx(x7), 8-byte instruction.
11110xxx xxxxxxxx(x15), 16-byte instruction.

with the likely restriction introduced that each instruction would need
to be aligned on the proper boundary (and likewise, could not cross a
boundary).

possibly, there could also be alignment restrictions on many jumps and
calls as well.

likewise, it would also be possible (assuming proper alignment) to
determine in a fixed number of steps whether or not any given address
points to the start of an instruction or into the body of another
instruction (and to identify this instruction), ... (possibly the CPU
could reject jumping into another instruction).
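
a rough C sketch of that (purely illustrative; the function names and
the walk from the 16-byte boundary are just one way of realizing the
rule above):

  #include <stdint.h>

  /* instruction length implied by the first byte, per the encoding above */
  static int insn_len(uint8_t b)
  {
      if ((b & 0x80) == 0x00) return 1;    /* 0xxxxxxx */
      if ((b & 0xC0) == 0x80) return 2;    /* 10xxxxxx */
      if ((b & 0xE0) == 0xC0) return 4;    /* 110xxxxx */
      if ((b & 0xF0) == 0xE0) return 8;    /* 1110xxxx */
      return 16;                           /* 11110xxx */
  }

  /* with each instruction aligned to its own size and never crossing a
     like-sized boundary, deciding whether 'off' starts an instruction
     only needs a walk from the enclosing 16-byte boundary, i.e. a fixed,
     small number of steps. */
  static int is_insn_start(const uint8_t *code, uint32_t off)
  {
      uint32_t p = off & ~(uint32_t)15;    /* enclosing 16-byte unit */
      while (p < off)
          p += insn_len(code[p]);
      return p == off;
  }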


so, for example, for a 32-bit word:
0xxxxxxx 0xxxxxxx 0xxxxxxx 0xxxxxxx, 4x single byte
0xxxxxxx 0xxxxxxx 10xxxxxx xxxxxxxx, 2x single byte, 1x 2-byte
10xxxxxx xxxxxxxx 10xxxxxx xxxxxxxx, 2x 2-byte
0xxxxxxx 10xxxxxx xxxxxxxx 0xxxxxxx, invalid
110xxxxx xxxxxxxx xxxxxxxx xxxxxxxx, 1x 4-byte
10xxxxxx xxxxxxxx 110xxxxx xxxxxxxx, invalid

example opcode forms:
0pppprrr, 4-bit opcode, regs=0-7
10pppppp rrrrssss, 6-bit opcode, 2 registers (0-15)
10pppppr rrsssttt, 5-bit opcode, 3 registers (0-7)
110opppp pppprrrr sssstttt cccccccc,
8-bit opcode, 3 registers (0-15), 8-bit constant,
1-bit order-hint
110opppp pppprrrr cccccccc cccccccc,
8-bit opcode, 1 register, 16-bit constant,
1-bit order-hint


possibly there could be a few special instructions:
NOPs would come in power-of-2 sizes, and might occasionally be needed
to maintain alignment;
ordering hints, which could basically exist to indicate to the CPU when
it can execute multiple instructions in parallel (as a few bits as part
of another instruction form);
ordering hints would likewise be aligned on a power-of-2 boundary and
will apply at most to the unit they begin on (with the CPU having the
option of ignoring them).


for example:
a 32-bit instruction on a 16-byte boundary with an ordering-hint set may
encode, for example, 4x32-bit instructions to execute in parallel.
possibly there could be rules to allow smaller bundles.

ex:
64-byte boundary:
ordering hint is seen;
checks 32-bytes forwards, order hint:
clear=same bundle
set=separate bundle
sequential-only opcode=separate bundle.

the bundle-size is divided in half until the pattern matches.


maybe useful would be if it could provide both 2-register and 3-register
binary arithmetic forms (ex: "I=I+J" vs "I=J+K").

possibly also useful could be complex-address loads/stores, ...


dunno where exactly this would fit (would probably guess CISC).


or such...

peterf...@gmail.com

Jan 27, 2014, 6:34:35 AM
On Monday, January 27, 2014 7:03:38 AM UTC+1, Ivan Godard wrote:

> I agree with Myers that the machine should reflect the language, but he
> failed as a prognosticator. Had he been right we would all be using
> Burroughs stack instruction sets today - the ISA was a direct map to
> Algol60.

Some say current ISAs map directly to C ;)

I am still immensely grateful that a small literary publishing company in Denmark began publishing computer literature around 1982, almost by chance. One of their authors wanted to know some more about computers as research for a novel he was writing and bought a ZX81... and then published both the novel and a GREAT beginner's book on ZX81 BASIC.

That's how a Danish translation of The Art of Software Testing ended up on the shelves of my local public library in Copenhagen. Only getting about half the maximum score on the triangle quiz was very educational :)


> years later and the packaging economics no longer mattered. Now the only
> reason for classic RISC is legacy and religion.

It also enables small teams of young and inexperienced people to get a design off the ground with a fair chance of success.

> There was a similar packaging generation earlier. Data General succeeded
> because they were able to fit a real computer on a single 19" board,
> which had never been done. That lasted until packaging could fit a
> computer on a chip and board packaging no longer mattered.

But they should have been able to get it into a chip (or a small chipset)... maybe the problem was that the computers were too big and complex for that at a time when simpler and smaller computers could just about fit?

Discontinuities in technology can be tough, sometimes.

-Peter

Bill Findlay

Jan 27, 2014, 7:13:54 AM
On 27/01/2014 08:59, in article lc574p$868$1...@needham.csi.cam.ac.uk, "Nick Maclaren" wrote:
And in Scotland, you can say " I doubt it will rain"
when you mean "I believe that it will rain". 8-)

--
Bill Findlay
with blueyonder.co.uk;
use surname & forename;


EricP

Jan 27, 2014, 9:59:55 AM
To add to all of the above...

Memory prices were a big factor in these design decisions.
If memory cost a million dollars per megabyte (or whatever
in 2014 dollars) then there is a very strong pressure to use
CISC to maximize the utilization of each instruction bit.
Most CISC instructions are essentially subroutines.
Using 1 extra bit to indicate memory indirection can eliminate
a whole instruction. A similar rationale, repeated for other instructions,
puts more and more functionality into the instruction set because the
marginal increase in cpu complexity is small compared to the
marginal savings in memory.

This rationale breaks down when memory is cheap. Now the design
cost and time of a complex CPU become large and therefore
desirable to eliminate.

At the same time that memory costs drop dramatically,
RISC emerges onto the scene. It is in the right place
at the right time to take advantage of lower memory cost.
It can take advantage of the lower design costs and shorter
time to market to get product price and performance advantages.

Note that a fixed instruction size RISC cpu uses about 3 times
as many instructions as a CISC cpu. Further about 25% of the RISC
code space is zeros to pad to a fixed size. In an era of high cost
memory such wastage would make a product uncompetitive.

Eric


Robert Wessel

Jan 27, 2014, 10:54:42 AM
While there may be RISC/CISC pairs where a 3:1 instruction count holds
(perhaps original MIPS vs. VAX?), on some common systems the counts
are much closer to 1:1. Say x86 vs. most RISCs. That's been the
result of a fair number of studies over the years. A recent
comparison of ARM and x86:

http://research.cs.wisc.edu/vertical/papers/2013/hpca13-isa-power-struggles.pdf

(See figure 4 on page 7.)

Nick Maclaren

Jan 27, 2014, 11:12:37 AM
In article <i10de9l03n4uau341...@4ax.com>,
Robert Wessel <robert...@yahoo.com> wrote:
While that is partially true, it hides the fact that earlier compilers
optimised for memory efficiency, and modern ones generally do not.
As the difference between compilers and optimisation levels has always
been of the same order (i.e. 2:1-3:1), it's hard to compare.

Also ARM has always been a pragmatic (i.e. not dogmatic) RISC, and is
much closer in many ways to many of the CISC designs of the 1960s than
to the dogmatic RISC designs. The real message is that well-designed
simplicity easily matches the functionality of accumulated complexity.


Regards,
Nick Maclaren.

Stefan Monnier

Jan 27, 2014, 12:23:50 PM
> I have a doubt about which of system design or programming
> language was the originating philosophy for CISC.
> With RISCs, I think it was programming languages but with
> CISC, what was the originating philosophy?

While instruction set design is linked to issues about how you're going
to write programs (e.g., should the ISA make it easy to write assembly code,
or easy for compilers to use?), the way I see it CISC and RISC are also
largely linked to "pipelined vs non-pipelined":

With a non-pipelined implementation, the time spent going to the
next instruction and decoding it is "wasted", so you want to execute
fewer (hence larger) instructions. That pushes you to the CISCy side.

With a pipeline, you don't care nearly as much about the cost of
fetching an instruction and decoding it since that can be overlapped
with execution. Instead you focus on keeping your pipeline full.
This pushes you towards instructions that each take the same number of
pipeline stages, as is the case on the RISCy side.

Of course, all that depends also on hardware constraints.
Pipelines existed for CISCy systems before RISC came along, but RISC
made it possible to get efficient pipelines while fitting the packaging
constraints of the time (as Ivan mentioned).


Stefan

Paul A. Clayton

Jan 27, 2014, 2:22:43 PM
On Monday, January 27, 2014 11:12:37 AM UTC-5, Nick Maclaren wrote:
> In article <i10de9l03n4uau341...@4ax.com>,
> Robert Wessel <robert...@yahoo.com> wrote:
[snip]
>>While there may be RISC/CISC pairs where a 3:1 instruction count holds
>>(perhaps original MIPS vs. VAX?), on some common systems the counts
>>are much closer to 1:1. Say x86 vs. most RISCs. That's been the
>>result of a fair number of studies over the years. A recent
>>comparison of ARM and x86:
>>
>>http://research.cs.wisc.edu/vertical/papers/2013/hpca13-isa-power-struggles.pdf
>>
>>(See figure 4 on page 7.)
>
> While that is partially true, it hides the fact that earlier compilers
> optimised for memory efficiency, and modern ones generally do not.
> As the difference between compilers and optimisation levels has always
> been of the same order (i.e. 2:1-3:1), it's hard to compare.

I think Intel microarchitectures may also influence what
optimal code is. I seem to recall that for a time, there
was a benefit in splitting load-op instructions into
separate load and operate instructions. If the processor
does not have a micro-op cache and load-op instructions
are decoded into two micro-ops, a decode template like
4-1-1-1 might give load-op (and other "complex" instructions)
a greater penalty.

> Also ARM has always been a pragmatic (i.e. not dogmatic) RISC, and is
> much closer in many ways to many of the CISC designs of the 1960s than
> to the dogmatic RISC designs. The real message is that well-designed
> simplicity easily matches the functionality of accumulated complexity.

The pragmatism of ARM also means that it has not aged quite
as well. Optimizations for real-world conditions (i.e.,
pragmatic not dogmatic) at one time (like no cache and a very
short scalar in-order pipeline) make the ISA a little more
painful for modern high-performance implementations.

The delayed branch of MIPS was similarly pragmatic. (I would
*guess* that the HI and LO registers for multiply/divide results
were also at least partially pragmatic. Using GPRs would have
made pipelining more complex.)

It also becomes easier to be both pragmatic and dogmatic if
there are many dogmas (and even easier if one can select
individual teachings rather than having to embrace a
particular "school"). :-)

Ivan Godard

Jan 27, 2014, 4:33:41 PM
One problem is the addressing mode encoding, which is a big consumer of
entropy. If every op has all the modes then Decode gets bigger and
slower because you need shifters in the decode, and the decoded signals
are bigger. Classic RISC not only did away with load-op, they did away
with address modes.

> I think load-op was not included not because it didn't fit, but because RISC
> designers believed that load+op is a misfeature standing in the way
> of static scheduling. Doesn't your own Mill follow the same
> philosophy?


Load-op impacts scheduling when there are multiple pipelines (a problem
avoided by OOO) but is no issue when there is only a single pipeline, as
in classic RISC. The Mill is wide-issue, so a load-op would have to
indicate two different pipe slots, one for the load part and one for the
op part, if hazards are to be avoided.

>> Now the only reason for classic RISC is legacy and religion.
>
> So, you think that the recent ARM move from non-classic
> borderline-RISC toward classic RISC is driven by religion?


No, by simplification. My guess (IANAHG) is that the predication latency
was the problem; they either had to expose a latency or dump
predication entirely.

Ivan Godard

Jan 27, 2014, 4:42:07 PM
On 1/27/2014 3:34 AM, peterf...@gmail.com wrote:
> On Monday, January 27, 2014 7:03:38 AM UTC+1, Ivan Godard wrote:
>
>> I agree with Myers that the machine should reflect the language,
>> but he failed as a prognosticator. Had he been right we would all
>> be using Burroughs stack instruction sets today - the ISA was a
>> direct map to Algol60.
>
> Some say current ISAs map directly to C ;)

That's backwards: C maps directly to the PDP-11.


>
>> years later and the packaging economics no longer mattered. Now the
>> only reason for classic RISC is legacy and religion.
>
> It also enables small teams of young and inexperienced people to get
> a design off the ground with a fair risk of success.

Yes, RISC is reasonable for student projects not intended for commercial
constraints. However, RISC is proselytized as the salvation of the world.

>> There was a similar packaging generation earlier. Data General
>> succeeded because they were able to fit a real computer on a single
>> 19" board, which had never been done. That lasted until packaging
>> could fit a computer on a chip and board packaging no longer
>> mattered.
>
> But they should have been able to get it into a chip (or a small
> chipset)... maybe the problem was that the computers were too big and
> complex for that at a time when simpler and smaller computers could
> just about fit?

They did; it was called Eclipse. (See "The Soul of a New Machine").
However, everybody else could too, so they became one of the pack
instead of having an unbeatable technical advantage, and the usual
business issues sunk them.

PRC did compiler work for Wang Labs, which faced the same problem of
competitors catching up on the tech and then being better at squeezing
nickels. An Wang's solution was to go and invent a completely new
billion-dollar industry, where he had a regained advantage. He did it
three times: the calculator; core memory; and word processing. Each was
good for a 15 year run before the bean counters ate his lunch. He
probably would have done it again if he had lived long enough, but
instead he installed Freddie, who was an utter boob.

William Clodius

Jan 27, 2014, 10:07:36 PM
Paul A. Clayton <paaron...@gmail.com> wrote:

> <snip>
> It also becomes easier to be both pragmatic and dogmatic if
> there are many dogmas (and even easier if one can select
> individual teachings rather than having to embrace a
> particular "school"). :-)
It is also easier if there are a variety of different practical uses of
the technology, each with different tradeoffs, so pragmatism in different
domains makes different dogmas attractive. Among the tradeoffs that have
led to different choices: processor cost, processor processing rate,
memory usage, bus size, power usage, robustness (e.g. military
applications), and probably some I haven't thought of.

EricP

Jan 27, 2014, 10:22:49 PM
Robert Wessel wrote:
> On Mon, 27 Jan 2014 09:59:55 -0500, EricP
>>
>> Note that a fixed instruction size RISC cpu uses about 3 times
>> as many instructions as a CISC cpu. Further about 25% of the RISC
>> code space is zeros to pad to a fixed size. In an era of high cost
>> memory such wastage would make a product uncompetitive.
>
>
> While there may be RISC/CISC pairs where a 3:1 instruction count holds
> (perhaps original MIPS vs. VAX?), on some common systems the counts
> are much closer to 1:1. Say x86 vs. most RISCs. That's been the
> result of a fair number of studies over the years. A recent
> comparison of ARM and x86:
>
> http://research.cs.wisc.edu/vertical/papers/2013/hpca13-isa-power-struggles.pdf
>
> (See figure 4 on page 7.)

Thanks. I have only quickly scanned it so far but...

Note that x86 vendors recommend compilers generate load/store/reg-reg
RISC style instructions. Performance can be penalized if not so done.
So comparing modern x86 vs arm compiler outputs is unlikely to show
any difference.

You would really have to compare with an old VAX Fortran binary as
that compiler really did try to use all the fancy addressing modes.

Something like:

do i = m,n
do j = p,q
do k = r,s
a[i] = b[j] + c[k]
end do
end do
end do

might use VAX triple indexed auto increment deferred:
addl3 (r0)+[r1], (r2)+[r3], (r4)+[r5]

is equivalent to (note left to right parse order was important
if address modes had side effects):

tmp1 = (r0<<2)+r1
r0 += 1
tmp2 = *((r2<<2)+r3)
r2 += 1
tmp3 = *((r4<<2)+r5)
r4 += 1
tmp4 = tmp2 + tmp3
*tmp1 = tmp4

Eric

Terje Mathisen

Jan 28, 2014, 2:58:53 AM
Paul A. Clayton wrote:
> On Monday, January 27, 2014 11:12:37 AM UTC-5, Nick Maclaren wrote:
>> While that is partially true, it hides the fact that earlier compilers
>> optimised for memory efficiency, and modern ones generally do not.
>> As the difference between compilers and optimisation levels has always
>> been of the same order (i.e. 2:1-3:1), it's hard to compare.
>
> I think Intel microarchitectures may also influence what
> optimal code is. I seem to recall that for a time, there

may???

I've made several conference presentations about how microarch changes
have very strongly modified what optimized code looks like, illustrated
with 6-7 totally different versions of the same program.

This particular program (a port of word count/wc) started out as a
classic 8088 program, with lodsb and cmp to determine the character class.

> was a benefit in splitting load-op instructions into
> separate load and operate instructions. If the processor
> does not have a micro-op cache and load-op instructions
> are decoded into two micro-ops, a decode template like
> 4-1-1-1 might give load-op (and other "complex" instructions)
> a greater penalty.

The final Pentium version of wc used zero load-op instructions; instead
it inverted the inner loop so that the load-use distance was maximized.
This resulted in about 1.5 cycles/character total processing speed.
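
(Purely as a sketch of the idea in C - not the actual asm versions
discussed - the class-table load for the next byte can be issued an
iteration early, so the load and its use are far apart; all names below
are made up for the illustration.)

  #include <stddef.h>

  /* cls[] maps each byte to a class; bit 0 set = "word" character. */
  size_t count_words(const unsigned char *p, size_t n,
                     const unsigned char *cls)
  {
      size_t words = 0;
      unsigned in_word = 0;
      unsigned next = n ? cls[p[0]] : 0;       /* first load, issued early */

      for (size_t i = 0; i < n; i++) {
          unsigned cur = next;
          if (i + 1 < n)
              next = cls[p[i + 1]];            /* load for the next round */
          if ((cur & 1) && !in_word)
              words++;                         /* word start */
          in_word = cur & 1;
      }
      return words;
  }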

For the PPro and later OoO cpus I wrote 5-7 different versions of the
code, then used runtime measurements to select the fastest one on the
actual running platform.

Anyway, microarchitecture does indeed influence what optimal code is! :-)

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Michael S

Jan 28, 2014, 4:26:31 AM
On Monday, January 27, 2014 11:33:41 PM UTC+2, Ivan Godard wrote:
> On 1/27/2014 1:31 AM, Michael S wrote:
>
> > On Monday, January 27, 2014 8:03:38 AM UTC+2, Ivan Godard wrote:
>
>
> >> And actually, despite the layer of religion slathered on the
> >> RISC/CISC war, RISC was a temporal artifact of advances in
> >> packaging. There was a roughly 10 year period in which you could
> >> fit a cut-down ISA in a single chip, but you couldn't fit anything
> >> more substantial; the economics meant that RISC would dominate
> >> everything but legacy and big iron. Ten years later and the
> >> packaging economics no longer mattered.
> >
>
> > But a spartan fixed-instruction-width load+op ISA (or not so spartan, as
> > the TMS320C30) takes about the same area as pure load-store.
>
> One problem is the addressing mode encoding, which is a big consumer of
> entropy. If every op has all the modes then Decode gets bigger and
> slower because you need shifters in the decode, and the decoded signals
> are bigger. Classic RISC not only did away with load-op, they did away
> with address modes.
>

I said "spartan" which does not necessarily mean "complete" or "orthogonal".
Not all data processing instructions have to be available in load-op form, and not all addressing modes available in load instructions have to be available in load-op instructions, in order to catch something like 75% of the potential saving in instruction count, esp. in the inner loops.

Without doing a quantitative analysis, my gut feeling is that in a MIPS-like ISA the majority of the saving would be achieved by providing just 5 load-op operations (mem+reg, mem-reg, reg-mem, mem+imm, imm-mem), where imm could be shorter than the normal MIPS immediate, and just one addressing mode - register indirect with post-increment. Post-increment is very non-MIPS and not future-proof, but pragmatic for a mid-80s time frame.

>
>
> > I think load-op was not included not because it didn't fit, but because RISC
> > designers believed that load+op is a misfeature standing in the way
> > of static scheduling. Doesn't your own Mill follow the same
> > philosophy?
>
>
> Load-op impacts scheduling when there are multiple pipelines (a problem
> avoided by OOO) but is no issue when there is only a single pipeline, as
> in classic RISC.

You mean, with a skewed pipeline? But it only helps for cache hits. Even with a skewed pipeline, load+op will still disturb the scheduling of cache misses. Besides, a skewed pipeline is longer than a non-skewed one, so the branch miss penalty is higher. And those early RISC designers were assuming either very minimalistic branch predictors or no predictors at all.
Overall, I think that they didn't consider a skewed pipeline as a real option.

> The Mill is wide-issue, so a load-op would have to
> indicate two different pipe slots, one for the load part and one for the
> op part, if hazards are to be avoided.
>

> >> Now the only reason for classic RISC is legacy and religion.
> >
>
> > So, you think that the recent ARM move from non-classic
> > borderline-RISC toward classic RISC is driven by religion?
>
>
> No, by simplification. My guess (IANAHG) is that the predication latency
> was the problem; they either had to expose a latency or dump
> predication entirely.
>

Expose latency? In the ISA? It's not the Mill we are talking about, it's ARM. I'm very sure that they never even considered exposing latency at the ISA level.

As to simplification, maybe AArch64 promises simplifications 5-10 years down the road, but for all early implementations it's more like a complication, because in order to succeed in a competitive market they will all need both good AArch64 performance and good Thumb2 performance.

Quadibloc

Jan 28, 2014, 5:32:34 AM
On Sunday, January 26, 2014 9:33:39 PM UTC-7, Stephen Sprunk wrote:

> That said, CISC systems tend to have variable-length encodings,
> non-orthogonal register sets and complex addressing modes, which are
> great for human assembly programmers but not so great for compilers.

As has already been pointed out, making life easy for human assembly programmers might be considered the philosophy of CISC.

Variable-length encodings, although a characteristic of recent CISC designs such as the IBM 360, the x86, or the 680x0, are not essential to CISC.

Take the IBM 704 and its descendants like the IBM 7094, take the SDS/XDS Sigma series of computers, or even the PDP-8 and the PDP-10.

One thing that is a very common feature of CISC designs is that they usually aren't load-store architectures. Instead, instructions add to a register - or even to the accumulator - from a memory location. That was true as far back as the EDSAC. That is what remains to distinguish the many historical CISC designs that don't have variable-length encoding from RISC.

And as to the reasonableness of making life easier for human assembly-language programmers:

Note that OS/360 was written in assembly language.

There was a time before higher-level languages. And then for a while higher-level languages imposed a huge cost in performance.

FORTRAN for the IBM 704 then came along, providing the first optimizing compiler. But it had a restricted domain - it wasn't C.

It's only very recently that GCC came along, providing a quality compiler that's available for free that anyone can port to a new architecture. So it used to be that anyone bringing out a new computer design would have to follow that by quite a bit of assembler-language coding in order to have the nucleus of a toolset for creating the software to make that new computer usable.

So it made very good sense that in the old days, computers looked like the IBM 360 rather than looking like the Itanium; people back then were acting rationally.

John Savard

Quadibloc

Jan 28, 2014, 5:49:09 AM
On Monday, January 27, 2014 1:51:36 AM UTC-7, Nick Maclaren wrote:

> But the bandwagon that most people called
> RISC was religious fanaticism, and their definitions of RISC were
> dogmatism, though there were as many schisms as early Christianity.

Although that was true to an extent originally - thus, one part of the original definition of RISC, that all instructions must take only one cycle, and so you can't have hardware floating-point, has been dropped - it isn't relevant today.

RISC does speed up the design cycle of a new architecture, thus one has ARM and MIPS and PowerPC while CISC is embodied in legacy architectures, the 360 and the x86 and ColdFire.

While load-store architectures seem to throw away an opportunity for making the load instruction do more work, by specifying an arithmetic operation too, they must have some virtue.

One obvious candidate is that scheduling arithmetic operations in heavily pipelined machines is complex enough without throwing in the unpredictability of making some such operations intrinsically dependent on memory accesses.

Actually, the Control Data 6600 embodied what might be a good idea for chips today - instead of a big cache on the chip, have all the RAM on the chip itself, with a predictable access time (built from static RAM cells such as those used for cache, not DRAM on chip for more capacity but slower speed), with external DRAM accessed through explicit block transfer instructions, like a drum or disc.

But the reason that *isn't* done is also obvious. Just as, for different reasons, the Itanium was obsolete before it was available, since chips keep getting more powerful, designing an architecture around, say, having exactly 4 megabytes of on-chip RAM, no more, no less, guarantees immediate obsolescence once one can put 8 megabytes of RAM on the chip instead.

Plus, cache makes the large off-chip RAM behave almost as if it's all as fast as the cache - transparently to the programmer. That's clearly better than managing things explicitly for the case where an unpredictable number of tasks are operating simultaneously, thanks to a bloated operating system and a graphical user interface.

So the complexity of designing machines that cope tolerably well with the occasional cache miss isn't going to go away.

John Savard

Quadibloc

Jan 28, 2014, 5:57:28 AM
On Monday, January 27, 2014 2:42:07 PM UTC-7, Ivan Godard wrote:
> On 1/27/2014 3:34 AM, peterf...@gmail.com wrote:

> > It also enables small teams of young and inexperienced people to get
> > a design off the ground with a fair risk of success.

> Yes, RISC is reasonable for student projects not intended for commercial
> constraints. However, RISC is proselytized as the salvation of the world.

No, RISC isn't just good for student projects.

Currently, the benefit of RISC is that it allows companies that are not Intel to design microprocessors and bring them to market before they're obsolete.

John Savard

Quadibloc

Jan 28, 2014, 6:00:10 AM
On Monday, January 27, 2014 7:59:55 AM UTC-7, EricP wrote:

> This rationale breaks down when memory is cheap.

I don't see why. After all, the number of pins on a chip is still fixed, and L1 cache is still expensive, and so if you can fit the same program in half the space, you are freeing up the data bus to spend more time on data and less time on instructions.

Hence, Thumb.

John Savard

Terje Mathisen

Jan 28, 2014, 9:04:51 AM
This only works on real programs, not benchmarketing!

Pretty much all iterations of specint/fp have had programs where the
inner loop code (even with some bloated encodings) would fit inside the
code cache, at which point the actual encoding efficiency doesn't really
matter.

It is only when you run real applications like DBs, where the "inner
loop" can easily be larger than both L1 and L2 code caches, that having
a nice & tight encoding makes a visible difference. :-)

Stephen Sprunk

Jan 28, 2014, 2:39:09 PM
Current x86 designs are often described as a CISC decoder in front of a
RISC core, so I'm not sure how much benefit that is to x86's competitors
anymore.

S

--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking

Nick Maclaren

Jan 28, 2014, 3:32:24 PM
In article <5877f560-2c87-42ea...@googlegroups.com>,
Quadibloc <jsa...@ecn.ab.ca> wrote:
>
>> But the bandwagon that most people called
>> RISC was religious fanaticism, and their definitions of RISC were
>> dogmatism, though there were as many schisms as early Christianity.
>
>Although that was true to an extent originally - thus, one part of the
>original definition of RISC, that all instructions must take only one cycle,
>and so you can't have hardware floating-point, has been dropped - it isn't
>relevant today.

Not entirely. Would you like to explain why the RISC fanatics still
describe systems like the ICL 1900 as CISC?

>RISC does speed up the design cycle of a new architecture, thus one has ARM
>and MIPS and PowerPC while CISC is embodied in legacy architectures, the 360
>and the x86 and ColdFire.

None of ARM, MIPS or PowerPC are particularly new any longer. The
simple fact, which was well-known long before the RISC bandwagon
got rolling, is that a simpler, cleaner design is faster to develop
and much faster to debug. Just as for software. So what else is new?


Regards,
Nick Maclaren.

Bill Findlay

Jan 28, 2014, 4:00:47 PM
On 28/01/2014 20:32, in article lc944o$9lo$1...@needham.csi.cam.ac.uk, "Nick
Maclaren" <n...@needham.csi.cam.ac.uk> wrote:

> In article <5877f560-2c87-42ea...@googlegroups.com>,
> Quadibloc <jsa...@ecn.ab.ca> wrote:
>>
>>> But the bandwagon that most people called
>>> RISC was religious fanaticism, and their definitions of RISC were
>>> dogmatism, though there were as many schisms as early Christianity.
>>
>> Although that was true to an extent originally - thus, one part of the
>> original definition of RISC, that all instructions must take only one cycle,
>> and so you can't have hardware floating-point, has been dropped - it isn't
>> relevant today.
>
> Not entirely. Would you like to explain why the RISC fanatics still
> describe systems like the ICL 1900 as CISC?

Store-to-store move operations?

Quadibloc

Jan 28, 2014, 6:07:54 PM
On Tuesday, January 28, 2014 1:32:24 PM UTC-7, Nick Maclaren wrote:

> Not entirely. Would you like to explain why the RISC fanatics still
> describe systems like the ICL 1900 as CISC?

As a "normal instruction" contains a 12-bit field, presumably for an address, I presume it's not a load-store machine. Perhaps this question would be better asked of the CDC 6600, but it had block move instructions - and clearly the vector instructions of the CRAY I would disqualify it as RISC, even if its instruction set is load-store.

John Savard

Paul Wallich

Jan 28, 2014, 8:11:06 PM
On 1/28/14 9:04 AM, Terje Mathisen wrote:
> Quadibloc wrote:
>> On Monday, January 27, 2014 7:59:55 AM UTC-7, EricP wrote:
>>
>>> This rationale breaks down when memory is cheap.
>>
>> I don't see why. After all, the number of pins on a chip is still
>> fixed, and L1 cache is still expensive, and so if you can fit the
>> same program in half the space, you are freeing up the data bus to
>> spend more time on data and less time on instructions.
>
> This only works on real programs, not benchmarketing!
>
> Pretty much all iterations of specint/fp have had programs where the
> inner loop code (even with some bloated encodings) would fit inside the
> code cache, at which point the actual encoding efficiency doesn't really
> matter.
>
> It is only when you run real applications like DBs, where the "inner
> loop" can easily be larger than both L1 and L2 code caches, that having
> a nice & tight encoding makes a visible difference. :-)

What's happened, as on-chip clocks keep getting faster and faster and
off-chip communication keeps getting relatively slower, is that memory
has gotten expensive again. Not so much in terms of dollars (although if
you wanted 128-wide SRAM in quantity and at interesting speeds to the
CPU it would cost you) as in time. Which is also money.

paul

Joe keane

Jan 28, 2014, 11:17:05 PM
In article <1bzjmif...@snowball.wb.pfeifferfamily.net>,
Joe Pfeiffer <pfei...@cs.nmsu.edu> wrote:
>It posits that
>the purpose of a computer architecture, and the direction architectures
>were about to move, was to come closer and closer to executing high
>level languages directly in the hardware.

I prefer Laurel & Hardy.

Nick Maclaren

Jan 29, 2014, 3:13:26 AM
In article <c9132d94-b36d-447d...@googlegroups.com>,
Quadibloc <jsa...@ecn.ab.ca> wrote:
>
>> Not entirely. Would you like to explain why the RISC fanatics still
>> describe systems like the ICL 1900 as CISC?
>
>As a "normal instruction" contains a 12-bit field, presumably for an addres=
>s, I presume it's not a load-store machine. Perhaps this question would be =
>better asked of the CDC 6600, but it had block move instructions - and clea=
>rly the vector instructions of the CRAY I would disqualify it as RISC, even=
> if its instruction set is load-store.

Both you and Bill may well be right, but my point was that RISC/CISC
is still an active religious debate, even today. I have always
agreed with many of the principles (not all!), but I thought the
labelling was nonsense - and still do. And you know what opinion
I have of the religious fanatics!

As I have said before, good results come from good design; and
there are many ways of achieving that.


Regards,
Nick Maclaren.

Bill Findlay

Jan 29, 2014, 5:44:59 AM
On 29/01/2014 08:13, in article lcad76$n53$1...@needham.csi.cam.ac.uk, "Nick Maclaren" wrote:
Yup.
So many would force a choice between black and white in a universe of
rainbow colours.

Paul A. Clayton

Jan 29, 2014, 9:56:45 AM
On Wednesday, January 29, 2014 5:44:59 AM UTC-5, Bill Findlay wrote:
> On 29/01/2014 08:13, in article lcad76$n53$1...@needham.csi.cam.ac.uk, "Nick
> Maclaren" <n...@needham.csi.cam.ac.uk> wrote:
[snip]
>> As I have said before, good results come from good design; and
>> there are many ways of achieving that.
>
> Yup.
> So many would force a choice between black and white in a universe of
> rainbow colours.

So you favor segregation! In computer architecture, I think
I am inclined toward black (even irrationally so), that the
emission spectrum should match the object temperature in a
natural manner (though I recognize that this limits the
object to indirect response to external illumination). I
like that white seeks a balanced spectrum and reflection of
the external illumination in a diffuse (not harsh, not
imitative [i.e., image reflecting]) way, but the mismatch
between the object's internal nature and its emission seems
inappropriate. (Of course, at a certain high temperature,
black is very much like white.)

(I am not certain whether a smiley should be attached.
There was some intention of humor--but not much success--
and there was some small intention of communicating
personal preferences in computer architecture--with perhaps
even less success--and even a light and shallow criticism
of quota-based social integration.)

Bill Findlay

Jan 29, 2014, 11:07:42 AM
On 29/01/2014 14:56, in article
ccb2efa2-1d8a-497f...@googlegroups.com, "Paul A. Clayton"
<paaron...@gmail.com> wrote:

> On Wednesday, January 29, 2014 5:44:59 AM UTC-5, Bill Findlay wrote:
>> On 29/01/2014 08:13, in article lcad76$n53$1...@needham.csi.cam.ac.uk, "Nick
>> Maclaren" <n...@needham.csi.cam.ac.uk> wrote:
> [snip[
>>> As I have said before, good results come from good design; and
>>> there are many ways of achieving that.
>>
>> Yup.
>> So many would force a choice between black and white in a universe of
>> rainbow colours.
>
> So you favor segregation!

Eh?

Stephen Sprunk

Jan 29, 2014, 12:54:50 PM
On 28-Jan-14 04:32, Quadibloc wrote:
> On Sunday, January 26, 2014 9:33:39 PM UTC-7, Stephen Sprunk wrote:
>> That said, CISC systems tend to have variable-length encodings,
>> non-orthogonal register sets and complex addressing modes, which
>> are great for human assembly programmers but not so great for
>> compilers.
>
> As has been already pointed out, making life easy for human assembly
> programmers might be considered the philosophy of CISC.

Right; that was what I meant to imply by the last part.

> Variable-length encodings, although a characteristic of recent CISC
> designs such as the IBM 360, the x86, or the 680x0, though, are not
> essential to CISC.

Of course; I said "tend to have", since it's a common feature but not
obligatory.

> One thing that is a very common feature of CISC designs is that they
> usually aren't load-store architectures.

Ah, I missed that one, but it's somewhat connected to complex addressing
modes.

> It's only very recently that GCC came along, providing a quality
> compiler that's available for free that anyone can port to a new
> architecture. So it used to be that anyone bringing out a new
> computer design would have to follow that by quite a bit of
> assembler-language coding in order to have the nucleus of a toolset
> for creating the software to make that new computer usable.
>
> So it made very good sense that in the old days, computers looked
> like the IBM 360 rather than looking like the Itanium; people back
> then were acting rationally.

Of course. But times change. An interesting question is why, other
than simple inertia, CISC designs continue to be successful.

Paul A. Clayton

Jan 29, 2014, 12:58:52 PM
On Wednesday, January 29, 2014 11:07:42 AM UTC-5, Bill Findlay wrote:
> On 29/01/2014 14:56, in article
> ccb2efa2-1d8a-497f...@googlegroups.com, "Paul A. Clayton"
> <paaron...@gmail.com> wrote:
>
>> On Wednesday, January 29, 2014 5:44:59 AM UTC-5, Bill Findlay wrote:
>>> So many would force a choice between black and white in a universe of
>>> rainbow colours.
>>
>> So you favor segregation!
>
> Eh?

Of (light) colours! I *DID* note that it was a less than
successful attempt at humour.

Both black and white provide an integration of the spectrum
(one could say black practically excludes visible components
of the light spectrum, but I was being more favorable to the
"pro-black" position by associating it with natural thermal
emission [and perfect absorption]). A rainbow separates
colours, so favoring a rainbow model favors "segregation".

Yes, that was a very weak attempt at humour, perhaps increased
in weakness by its oddness (even by geeky standards). The
accusation was only meant playfully (I don't *think* there
is much outrage about segregating light even if one takes the
comment in a particularly serious tone--which is difficult since
it was immediately used as an analogy for computer architecture
and later indicated as inaccurate and a weak attempt at humour.).

Nick Maclaren

Jan 29, 2014, 1:17:21 PM
In article <lcbf9c$g10$1...@dont-email.me>,
Stephen Sprunk <ste...@sprunk.org> wrote:
>
>> It's only very recently that GCC came along, providing a quality
>> compiler that's available for free that anyone can port to a new
>> architecture. So it used to be that anyone bringing out a new
>> computer design would have to follow that by quite a bit of
>> assembler-language coding in order to have the nucleus of a toolset
>> for creating the software to make that new computer usable.
>>
>> So it made very good sense that in the old days, computers looked
>> like the IBM 360 rather than looking like the Itanium; people back
>> then were acting rationally.
>
>Of course. But times change. An interesting question is why, other
>than simple inertia, CISC designs continue to be successful.

Eh? That's well known, and nothing whatsoever to do with the merits
or demerits of the designs.


Regards,
Nick Maclaren.

Paul A. Clayton

Jan 29, 2014, 1:20:00 PM
On Wednesday, January 29, 2014 12:54:50 PM UTC-5, Stephen Sprunk wrote:
> On 28-Jan-14 04:32, Quadibloc wrote:
[snip]
>> So it made very good sense that in the old days, computers looked
>> like the IBM 360 rather than looking like the Itanium; people back
>> then were acting rationally.

I think the argument about the "semantic gap" was quite
misguided. Even for assembly level programming, using a
macro is not more difficult than using a complex instruction
(with the exception of a few cases in which interrupt atomicity
is important).

It is also easy for stale insights to continue to guide design.

> Of course. But times change. An interesting question is why, other
> than simple inertia, CISC designs continue to be successful.

"Simple inertia" (i.e., legacy software and corporate connections)
is most of the reason.

The variable length encodings common to CISC also tend to make
extension easier (this further plays to binary compatibility).
CISCs (having a higher emphasis on "solutions" versus
"primitives") are also more likely to have effectively obsolete
instructions which can then be reclaimed to provide opcode
space for extension.

The only relatively new CISC that I know of is Renesas RX. This
ISA is targeted rather strictly at lower-end 32-bit embedded
systems where code density is very important and much of the
additional complexity of highly variable instruction length
and a non-load/store architecture is not significant. RX is
also a modern design, so it lacks some of the accumulated
quirks of legacy CISC ISAs.

Ivan Godard

Jan 29, 2014, 1:24:45 PM
On 1/29/2014 9:54 AM, Stephen Sprunk wrote:

> Of course. But times change. An interesting question is why, other
> than simple inertia, CISC designs continue to be successful.

Because the intermediate state needed by the RISC equivalent is not
representable?

Take FMA, a classic complex op that is (nominally) equivalent to two
simple ops. Except that it's not; the intermediate in FMA is longer than
the result of a multiply, so FMA has better precision. And because the
hardware doesn't have to normalize the intermediate, the FMA is faster
than the mul->add sequence too.
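
(A small C illustration of the precision point - just standard C99
fma() from <math.h>, nothing Mill-specific; an illustrative sketch, not
code for any machine discussed here.)

  #include <math.h>
  #include <stdio.h>

  int main(void)
  {
      double a = 1.0 + 0x1p-52;     /* 1 + one ulp */
      double b = 1.0 - 0x1p-52;
      double c = -1.0;

      /* a*b is exactly 1 - 2^-104; rounded to double it becomes 1.0,
         so the separate multiply and add lose the small term entirely. */
      double split = a * b + c;

      /* fma() adds c to the unrounded product, so the term survives. */
      double fused = fma(a, b, c);

      printf("mul then add: %g\n", split);   /* 0              */
      printf("fma:          %g\n", fused);   /* about -4.9e-32 */
      return 0;
  }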

The same is true of address modes. The Mill uses a conventional
base/index/offset mode structure. On a classic RISC, "A[i].f" would
involve a shift and two adds in the ALU before the actual fetch of the
data; total three cycles (two in dual-issue) and 4x32=128 bits of code.
Instead, the Mill has a three-input mixed-length address-adder that does
it all in one cycle and needs only 16 to 46 bits of code.

So CISC continues to be successful because it is faster, cleaner, and
lower power. That should be enough for all but zealots.

Stephen Sprunk

Jan 29, 2014, 1:32:21 PM
On 29-Jan-14 12:20, Paul A. Clayton wrote:
> On Wednesday, January 29, 2014 12:54:50 PM UTC-5, Stephen Sprunk
> wrote:
>> Of course. But times change. An interesting question is why,
>> other than simple inertia, CISC designs continue to be successful.
>
> "Simple inertia" (i.e., legacy software and corporate connections) is
> most of the reason.

That's what I figured, but I thought there might be something else. For
instance, IIRC there's been mention here that mem-op instructions allow
better I-cache utilization and decode throughput.

> The variable length encodings common to CISC also tend to make
> extension easier (this further plays to binary compatibility). CISCs
> (having a higher emphasis on "solutions" versus "primitives") are
> also more likely to have effectively obsolete instructions which can
> then be reclaimed to provide opcode space for extension.

OTOH, RISCs tend to waste instruction bits to make decoding simpler, so
there is usually room to cram in lots of totally new opcodes if you're
willing to accept a little more decoding complexity.

Nick Maclaren

Jan 29, 2014, 1:57:14 PM
In article <lcbhfn$u0s$1...@dont-email.me>,
Stephen Sprunk <ste...@sprunk.org> wrote:
>On 29-Jan-14 12:20, Paul A. Clayton wrote:
>>
>>> Of course. But times change. An interesting question is why,
>>> other than simple inertia, CISC designs continue to be successful.
>>
>> "Simple inertia" (i.e., legacy software and corporate connections) is
>> most of the reason.
>
>That's what I figured, but I thought there might be something else. For
>instance, IIRC there's been mention here that mem-op instructions allow
>better I-cache utilization and decode throughput.

Nah. It's lost in the noise. The "legacy" problems are grossly
overstated, but the belief that they are critical does mean that
the "decision makers" have to have a better mousetrap snapped on
their nose before they will take notice. Modern Intel CPUs aren't
bad, and it would be VERY hard to produce a noticeably better
general-purpose design given that a new CPU would probably lag
Intel by a process generation or two.

For most embedded, a similar situation arises with ARM, and there
really aren't many specialised niches that would provide an entry
into the mainstream.

And then there is the issue that any non-x86 would be a Microsoft-
free environment, which is perceived by many to be a disqualifying
factor.

Since IBM blew it with the PowerPC, the only chance of any real
change is in the Far East, and none of the potential players look
interested. This isn't good news, not even for Intel, as the
more established a monopoly becomes, the less likely it is to
survive when it eventually falls.

But don't sell your Intel stocks short just yet :-)


Regards,
Nick Maclaren.

Ivan Godard

Jan 29, 2014, 2:01:40 PM
On 1/29/2014 10:32 AM, Stephen Sprunk wrote:

> OTOH, RISCs tend to waste instruction bits to make decoding simpler, so
> there is usually room to cram in lots of totally new opcodes if you're
> willing to accept a little more decoding complexity.

Or simply use an internal per-machine-version specialization like the
IBM AS/400 (or the Mill) does.

Ivan Godard

unread,
Jan 29, 2014, 2:04:30 PM1/29/14
to
On 1/29/2014 10:20 AM, Paul A. Clayton wrote:

> The only relatively new CISC that I know of is Renesas RX. This
> ISA is targeted rather strictly at lower-end 32-bit embedded
> systems where code density is very important and much of the
> additional complexity of highly variable instruction length
> and a non-load/store architecture are not significant. RX is
> also a modern design, so it lacks some of the accumulated
> quirks of legacy CISC ISAs.
>

The Mill is certainly not a RISC. Is it a CISC? Well, having address
modes and complex ops like call would argue that it is. But is it
meaningful to place any wide-issue machine on the RISC-CISC continuum?

EricP

unread,
Jan 29, 2014, 4:26:40 PM1/29/14
to
Nick Maclaren wrote:
>
> Since IBM blew it with the PowerPC, the only chance of any real
> change is in the Far East, and none of the potential players look
> interested.

The Institute of Computing Technology, Chinese Academy of Sciences,
appear to have hitched their wagons to a MIPS variant called
Loongson/Godson. They are up to MIPS64 in a 28 nm process now.
I suppose that makes it their official national processor.

http://en.wikipedia.org/wiki/Loongson

Eric

Quadibloc

unread,
Jan 29, 2014, 4:56:25 PM1/29/14
to
On Wednesday, January 29, 2014 10:54:50 AM UTC-7, Stephen Sprunk wrote:

> Of course. But times change. An interesting question is why, other
> than simple inertia, CISC designs continue to be successful.

It's been noted that CISC isn't really that much outclassed by RISC.

But as far as I know, there have *been* no new CISC designs offered to the market; the last one was the 680x0, which didn't succeed. The few that are still available - x86 and z/Architecture - obviously do owe their existence to "simple inertia", whatever their merits.

It's true that the Itanium is other than RISC - it's VLIW - but it was aimed at maximizing performance.

John Savard

Quadibloc

unread,
Jan 29, 2014, 4:59:55 PM1/29/14
to
On Wednesday, January 29, 2014 11:57:14 AM UTC-7, Nick Maclaren wrote:
> Modern Intel CPUs aren't
> bad, and it would be VERY hard to produce a noticeably better
> general-purpose design given that a new CPU would probably lag
> Intel by a process generation or two.

Yes, but if you aren't Intel, the extra effort a CISC architecture would require could make the difference between success or failure.

John Savard

Michael S

unread,
Jan 29, 2014, 6:29:54 PM1/29/14
to
On Wednesday, January 29, 2014 11:56:25 PM UTC+2, Quadibloc wrote:
> On Wednesday, January 29, 2014 10:54:50 AM UTC-7, Stephen Sprunk wrote:
>
>
>
> > Of course. But times change. An interesting question is why, other
> > than simple inertia, CISC designs continue to be successful.
>
> It's been noted that CISC isn't really that much outclassed by RISC.
>
> But as far as I know, there have *been* no new CISC designs offered to the market; the last one was the 680x0, which didn't succeed.

I'd rather count x386 as separate from x86. Which makes it newer than 68K.

> The few that are still available - x86 and z/Architecture - obviously do owe their existence to "simple inertia", whatever their merits.
>
>

There were several relatively new CISC ISAs in embedded world. The latest of the notable ones is Renesas RX Family.

As to general-purpose computers, pay attention that there were no new CISCs and no new RISCs for more than 20 years. And the latest new general-purpose RISC (Alpha) didn't succeed any better than the last (according to your definition) general-purpose CISC. Both lasted for approximately a decade. Arguably, 68K made more waves than Alpha. And it is still sort of alive in the form of ColdFire.

>
> It's true that the Itanium is other than RISC - it's VLIW - but it was aimed at maximizing performance.
>
>

IMHO, Itanium is closer to RISC than to VLIW. Closer, but not RISC.

>
> John Savard

Paul A. Clayton

unread,
Jan 29, 2014, 8:54:36 PM1/29/14
to
On Wednesday, January 29, 2014 6:29:54 PM UTC-5, Michael S wrote:
[snip]
> There were several relatively new CISC ISAs in embedded world.
> The latest of the notable ones is Renesas RX Family.

What are the other CISC ISAs? (I suspect I noticed RX because it
received more hype than usual, though I also have had more
exposure to articles about embedded systems in the last few years
than previously. When "someone" mentioned it on the Real World
Technologies forum, I had already heard of it.)

> As to general-purpose computers, pay attention that there were
> no new CISCs and no new RISCs for more than 20 years.

For general-purpose computers (at least in the last 10 years
or more), one would either be competing against Intel or
need to rely on a somewhat locked-in base of users (the Unix-
RISC vendors). Migrating locked-in users is not easy and not
likely to be attempted for a small gain. (The Itanium migration
had the promise of noticeably better performance on an ISA
basis [albeit false promise], the advantage of Intel manufacturing,
and the advantage of a merchant processor [which pretty much also
turned out to be a false hope].)

I would consider ARM's AArch64 a new RISC and it is likely
to be included in general-purpose computers. (It is not clear
if smart phones count; they do run user-installed applications.)

In the embedded space, assuming my very limited exposure is
not too biased, it seems that (somewhat) new RISCs are more
common than (somewhat) new CISCs (e.g., Tensilica's Xtensa,
EnSilica's eSi-RISC, Motorola's M-Core (now retired?), CRIS in
Axis Communication's ETRAX, Andes Technologies' AndeStar [Is
ARC, now of Synopsys, a RISC? SuperH {early 1990s?} might be
too old to count in such a list; some of these might also be
"too old".]). New VLIWs (for embedded systems) seem to be more
popular than new CISCs (here I might be more biased in exposure
than for RISC vs. CISC). On the other hand, I know very little
about DSPs and probably even less about 16-bit (much less
8-bit or 4-bit) processors.

(It is not obvious if Thumb2 and microMIPS count as new
ISAs. They are substantially different encodings and have
somewhat different operations. They can even have
implementations that do not support the parent ISA.
*Technically* they should probably be considered distinct
ISAs since the encodings are more different than one
might typically justify by "just a mode", but this seems
to be a gray area.)

(Admittedly, the choice of RISC [and VLIW] is biased by start-up
costs. Developing a simple core and software support for a new
RISC [or even a VLIW] ISA is easier now than in the late 1980s;
the production costs are higher, but if one is selling designs
rather than hardware this might be less of a problem.)

> And the latest new general-purpose RISC (Alpha) didn't succeed any
> better than the last (according to your definition) general-purpose
> CISC. Both lasted for approximately a decade. Arguably, 68K made more
> waves than Alpha. And it is still sort of alive in the form of ColdFire.

The 68K was also much more of a merchant processor and
made a transition to embedded systems.

[snip]
> IMHO, Itanium is closer to RISC than to VLIW. Closer, but
> not RISC.

EPIC seems more VLIW-ish than RISCy. The support for single
stepping is more RISC-like, but "explicit parallelism"
(independence marking) seems VLIW-ish. (I would tend to view
VLIW as closely related to RISC. Both emphasize compiler-based
optimization. VLIW tends to provide more explicit mechanisms
to exploit compile-time optimization.)

Paul A. Clayton

unread,
Jan 29, 2014, 9:56:02 PM1/29/14
to
On Wednesday, January 29, 2014 1:24:45 PM UTC-5, Ivan Godard wrote:
> On 1/29/2014 9:54 AM, Stephen Sprunk wrote:
>
>> Of course. But times change. An interesting question is why, other
>> than simple inertia, CISC designs continue to be successful.
>
> Because the intermediate state needed by the RISC equivalent is not
> representable?
>
> Take FMA, a classic complex op that is (nominally) equivalent to two
> simple ops. Except that it's not; the intermediate in FMA is longer than
> the result of a multiply, so FMA has better precision. And because the
> hardware doesn't have to normalize the intermediate, the FMA is faster
> than the mul->add sequence too.

I *think* the original provision of FMADD in MIPS used
intermediate rounding. At first I thought this was to
support compatibility (FMADD gives the same result as
FMUL and FADD performed sequentially), but years later I
discovered that intermediate rounding actually *saves*
hardware. Without intermediate rounding, the full
precision multiply must be computed and added with the
addend source (in some cases, which usually means doing
it for all cases to provide uniform latency and simplify
hardware). *With* intermediate rounding, the carry
propagation of the rounding can be performed with the
addition's carry propagation (there is only a repetition
of one binary digit input which is less delay than 53[?]
extra bits of addition for full precision) and even the
determination of the rounding could (I think) be less
work than generating all the bits and then ORing them.

*In theory* hardware could fuse FMUL-FADD into FMADD with
intermediate rounding and avoid the extra propagation
and normalization costs. This would **NOT** provide the
extra precision advantage of FMADD without intermediate
rounding.
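
A small way to see that difference in practice, as a hedged sketch (it
assumes a C99 libm whose fma() is genuinely fused rather than emulated
with a rounded multiply): with a true FMA the product reaches the adder
unrounded, so fma(x, y, -x*y) recovers the rounding error of the
separate multiply, while an FMADD with intermediate rounding would
always return exactly zero here.

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* chosen so that x*y is not exactly representable as a double */
    double x = 1.0 + 0x1p-27;
    double y = 1.0 + 0x1p-27;

    /* nonzero => the product was not rounded before the add (true FMA);
       zero would mean an intermediate rounding happened */
    double err = fma(x, y, -(x * y));
    printf("rounding error of x*y: %a\n", err);
    return 0;
}

(Link with -lm; even a software fma() fallback is normally correctly
rounded and should show the same effect.)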

> The same is true of address modes. The Mill uses a conventional
> base/index/offset mode structure. On a classic RISC, "A[i].f" would
> involve a shift and two adds in the ALU before the actual fetch of the
> data; total three cycles (two in dual-issue) and 4x32=128 bits of code.
> Instead, the Mill has a three-input mixed-length address-adder that does
> it all in one cycle and needs only 16 to 46 bits of code.

regD := MEM[regS1 + regS2<<n + offset] is avoided by classic
RISC. Even with n determined by the access size (traditional
scaling), this probably leaves fewer bits for the offset value
with classic RISC instruction encoding. AArch64 *does* provide
base plus scaled register addressing.
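
For concreteness, a sketch of the source construct behind that form
(struct and field names invented; the commented instruction sequences
are plausible hand-written code, not the output of any particular
compiler):

struct rec { int tag; int f; };   /* sizeof == 8, f at offset 4 */

int get_f(const struct rec *A, long i)
{
    return A[i].f;
    /* with a base+index*scale+offset memory operand (e.g. x86):
           mov  eax, [rdi + rsi*8 + 4]
       classic load/store RISC without a scaled-index mode:
           sll  t0, a1, 3        ; i * sizeof(struct rec)
           addu t0, t0, a0       ; + base A
           lw   v0, 4(t0)        ; load f at offset 4            */
}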

It is not obvious how common such accesses are, how many
are not susceptible to compile-time optimization, and finally
how much energy/performance impact the extra instructions
have.

(I happen to like exploiting a single carry propagation for
such address generation, but the lack of such exploitation
does not seem especially disastrous [albeit potentially a
symptom of more general design issues].)

> So CISC continues to be successful because it is faster,
> cleaner, and lower power. That should be enough for all but
> zealots.

I think Mitch Alsup, who has worked on designs for both CISC
(x86) and RISC (m88k, SPARC) ISAs, would disagree. Yes,
classic RISC has issues (I believe Mitch's "Modern RISC"
has base+scaled_index+offset memory operations, variable
length instructions, and perhaps other "violations" of
classic RISC design.), but that does not mean that RISC
principles are just the ravings of "zealots".

(I have some CISC sympathies in part from code density
affection and in part from providing a higher abstraction
for hardware [when such facilitates microarchitectural
optimization]. I would probably design an ISA more like
CLIPPER or m68k than m88k. However, I also underappreciate
complexity issues, probably especially for compilers.)

The lines between CISC and RISC can also get blurry. Some
RISC-oriented ISAs provide load-op-store instructions for
atomic memory accesses, violating perhaps the most
distinguishing measure of RISC (even if only in a very
limited way).

Terje Mathisen

unread,
Jan 30, 2014, 3:06:59 AM1/30/14
to
Michael S wrote:
> On Wednesday, January 29, 2014 11:56:25 PM UTC+2, Quadibloc wrote:
>> But as far as I know, there have *been* no new CISC designs offered
>> to the market; the last one was the 680x0, which didn't succeed.
>
> I'd rather count x386 as separate from x86. Which makes it newer than
> 68K.

Not separate imho.

"If I say that a dog's tail is another leg, how many legs does it have?"

"Four. Calling a tail a leg does not make it so."

If the 386 had done a real change of the instruction encoding, maybe
moving to three-operand and 16 or 32 registers, then it would have been
really separate, but as it was, with everything just extended to 32
bits, I thought it was a very nice & natural extension.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Nick Maclaren

unread,
Jan 30, 2014, 4:55:46 AM1/30/14
to
In article <f8534ec2-dfe6-4156...@googlegroups.com>,
Quadibloc <jsa...@ecn.ab.ca> wrote:
>On Wednesday, January 29, 2014 10:54:50 AM UTC-7, Stephen Sprunk wrote:
>
>> Of course. But times change. An interesting question is why, other
>> than simple inertia, CISC designs continue to be successful.
>
>It's been noted that CISC isn't really that much outclassed by RISC.
>
>But as far as I know, there have *been* no new CISC designs offered
>to the market; the last one was the 680x0, which didn't succeed. ...

The 680x0 certainly DID succeed - it failed later, but that is
another matter. Indeed, at one point, it came very close to
blocking Intel from establishing itself in the then emerging commodity
workstation market - the reasons it didn't were not due to the
CPU design.

>> Modern Intel CPUs aren't
>> bad, and it would be VERY hard to produce a noticeably better
>> general-purpose design given that a new CPU would probably lag
>> Intel by a process generation or two.
>
>Yes, but if you aren't Intel, the extra effort a CISC architecture
>would require could make the difference between success or failure.

There is no evidence that making a success of CISC is any harder
than doing so for RISC. The point there is that the RISC dogmas
do not imply simplicity, nor does abandoning them imply complexity,
as has been shown by the RISC designs themselves. As someone said,
the real-estate limits of the 1980s WERE an issue, but not today.

As I and others have said, there isn't a route into the mainstream
markets that doesn't involve competing directly with either Intel
or ARM, so all new designs have been very specialised. This will
change, sometime, but when and how? My guess is in the 2020s or
2030s, but I have not even a clue what will happen.


Regards,
Nick Maclaren.

Michael S

unread,
Jan 30, 2014, 5:31:16 AM1/30/14
to
On Thursday, January 30, 2014 10:06:59 AM UTC+2, Terje Mathisen wrote:
> Michael S wrote:
> > On Wednesday, January 29, 2014 11:56:25 PM UTC+2, Quadibloc wrote:
> >> But as far as I know, there have *been* no new CISC designs offered
> >> to the market; the last one was the 680x0, which didn't succeed.
> >
> > I'd rather count x386 as separate from x86. Which makes it newer than
> > 68K.
>
> Not separate imho.
>
> "If I say that a dog's tail is another leg, how many legs does it have?"
>
> "Four. Calling a tail a leg does not make it so."
>
> If the 386 had done a real change of the instruction encoding, maybe
> moving to three-operand and 16 or 32 registers, then it would have been
> really separate, but as it was, with everything just extended to 32
> bits, I thought it was a very nice & natural extension.
>

If SIB is not a real change of the instruction encoding then what is?
And not only an encoding, semantics too : orthogonality of GPRs in address generation, scaled index.

Nick Maclaren

unread,
Jan 30, 2014, 5:40:29 AM1/30/14
to
In article <c8690609-0803-4b27...@googlegroups.com>,
Michael S <already...@yahoo.com> wrote:
>On Thursday, January 30, 2014 10:06:59 AM UTC+2, Terje Mathisen wrote:
>>
>> >> But as far as I know, there have *been* no new CISC designs offered
>> >> to the market; the last one was the 680x0, which didn't succeed.
>> >
>> > I'd rather count x386 as separate from x86. Which makes it newer than
>> > 68K.
>>
>> Not separate imho.
>>
>> "If I say that a dog's tail is another leg, how many legs does it have?"
>>
>> "Four. Calling a tail a leg does not make it so."
>>
>> If the 386 had done a real change of the instruction encoding, maybe
>> moving to three-operand and 16 or 32 registers, then it would have been
>> really separate, but as it was, with everything just extended to 32
>> bits, I thought it was a very nice & natural extension.
>
>If SIB is not a real change of the instruction encoding then what is?
>And not only an encoding, semantics too : orthogonality of GPRs in
>address generation, scaled index.

Then I will raise you the changes from the System/360 to /370, to
/XA to /ESA and probably more recent ones. Mere extensions to an
existing design do not make it a new design.

Terje and I will stand on our legs - you are welcome to spin on
your tail!


Regards,
Nick Maclaren.

Michael S

unread,
Jan 30, 2014, 5:47:17 AM1/30/14
to
On Thursday, January 30, 2014 3:54:36 AM UTC+2, Paul A. Clayton wrote:
>
> > IMHO, Itanium is closer to RISC than to VLIW. Closer, but
> > not RISC.
>
> EPIC seems more VLIW-ish than RISCy. The support for single
> stepping is more RISC-like, but "explicit parallelism"
> (independence marking) seems VLIW-ish.

IPF group boundaries are almost 100% hints. Right now I can't think of a situation in which they have semantic meaning. Which does not mean that such situations do not exist ;)
And apart from group stops, what exactly is "explicitly parallel" in the IPF ISA?

> (I would tend to view
> VLIW as closely related to RISC. Both emphasize compiler-based
> optimization. VLIW tends to provide more explicit mechanisms
> to exploit compile-time optimization.)

There was a discussion here a month or two ago about what makes a VLIW: 72 posts by 14 authors. It's too early to start over again.

Michael S

unread,
Jan 30, 2014, 6:01:29 AM1/30/14
to
If I'm not mistaken, the changes you mentioned are backward compatible in the sense that new instructions can be freely interchanged with old ones in a mostly-old environment. Not dissimilar to SSE or AVX.
With x386 you can't use the new addressing modes in the middle of old 16-bit code; you have to switch the CPU to a different "mode".
And, BTW, despite a much higher level of interoperability, IBM seems to consider the 64-bit extension to S/370 a new ISA.

>
> Terje and I will stand on our legs - you are welcome to spin on your tail!
>

Is that an attempt at humor?

>
>
> Regards,
>
> Nick Maclaren.

Nick Maclaren

unread,
Jan 30, 2014, 6:35:55 AM1/30/14
to
In article <fb57ff92-2755-4752...@googlegroups.com>,
Michael S <already...@yahoo.com> wrote:
>>
>> >> "If I say that a dog's tail is another leg, how many legs does it have?"
>> >>
>> >> "Four. Calling a tail a leg does not make it so."
>> >>
>> >> If the 386 had done a real change of the instruction encoding, maybe
>> >> moving to three-operand and 16 or 32 registers, then it would have been
>> >> really separate, but as it was, with everything just extended to 32
>> >> bits, I thought it was a very nice & natural extension.
>>
>> >If SIB is not a real change of the instruction encoding then what is?
>> >And not only an encoding, semantics too : orthogonality of GPRs in
>> >address generation, scaled index.
>>
>> Then I will raise you the changes from the System/360 to /370, to
>> /XA to /ESA and probably more recent ones. Mere extensions to an
>> existing design do not make it a new design.
>
>If I'm not mistaken, the changes you mentioned are backward compatible
>in the sense that new instructions can be freely interchanged with old
>ones in a mostly-old environment. Not dissimilar to SSE or AVX.

You are mistaken, sufficiently so to be just plain wrong.


Regards,
Nick Maclaren.

Terje Mathisen

unread,
Jan 30, 2014, 6:49:54 AM1/30/14
to
Michael S wrote:
> On Thursday, January 30, 2014 10:06:59 AM UTC+2, Terje Mathisen wrote:
>> Not separate imho.
>>
>> "If I say that a dog's tail is another leg, how many legs does it have?"
>>
>> "Four. Calling a tail a leg does not make it so."
>>
>> If the 386 had done a real change of the instruction encoding, maybe
>> moving to three-operand and 16 or 32 registers, then it would have been
>> really separate, but as it was, with everything just extended to 32
>> bits, I thought it was a very nice & natural extension.
>>
>
> If SIB is not a real change of the instruction encoding then what is?
> And not only an encoding, semantics too : orthogonality of GPRs in address generation, scaled index.

OK, I'll give you SIB and the register orthogonality! :-)

(Even though I personally experienced it more like tearing down some
problematic barriers. :-) )

Yes, it was a real improvement, even if in many ways much smaller than
the 32->64 step for the same architecture.

BGB

unread,
Jan 30, 2014, 1:29:53 PM1/30/14
to
though not really a hardware person, I am left to wonder why, for
example, the CISC ISA stuff can't just run in firmware?...

like, the underlying chip hardware runs a MIPS-like ISA or similar and
runs mostly out of cache, and the higher-level (CISC style) ISA is
mostly just dynamically converted into machine-code fragments and
internal function calls (for less common and more complex instructions).

if done well, the performance overhead of the translation should be
fairly modest. the main performance hit would then be when jumping to a
new address or an out-of-cache address, which would require being
translated again.

could presumably work pretty well for low-cost low-performance chips,
and could also (likely) simplify the hardware design part.
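
a rough sketch of the idea (all names hypothetical, the actual
translator stubbed out), mainly to show where the "translated again"
cost lands:

#include <stdint.h>
#include <stddef.h>

#define TCACHE_SLOTS 4096                  /* direct-mapped, power of two */

typedef void (*fragment_fn)(void);         /* one translated native fragment */

static struct { uint64_t guest_pc; fragment_fn code; } tcache[TCACHE_SLOTS];

static void fragment_stub(void) { }        /* stand-in for emitted native code */

static fragment_fn translate_block(uint64_t guest_pc)
{
    (void)guest_pc;                        /* real CISC->native translation omitted */
    return fragment_stub;
}

static fragment_fn lookup_or_translate(uint64_t guest_pc)
{
    size_t slot = (size_t)(guest_pc >> 1) & (TCACHE_SLOTS - 1);
    if (tcache[slot].code == NULL || tcache[slot].guest_pc != guest_pc) {
        tcache[slot].guest_pc = guest_pc;  /* miss: pay the translation cost */
        tcache[slot].code = translate_block(guest_pc);
    }
    return tcache[slot].code;              /* hit: just run the cached fragment */
}

int main(void)
{
    lookup_or_translate(0x401000)();       /* first jump: translate, then run */
    lookup_or_translate(0x401000)();       /* same target again: cache hit */
    return 0;
}

jumps to new (or evicted) guest addresses go through translate_block(),
everything else is a table lookup, which is roughly the cost profile
described above.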


> John Savard
>

Stephen Sprunk

unread,
Jan 30, 2014, 1:37:50 PM1/30/14
to
On 30-Jan-14 12:29, BGB wrote:
> though not really a hardware person, I am left to wonder why, for
> example, the CISC ISA stuff can't just run in firmware?...
>
> like, the underlying chip hardware runs a MIPS-like ISA or similar and
> runs mostly out of cache, and the higher-level (CISC style) ISA is
> mostly just dynamically converted into machine-code fragments and
> internal function calls (for less common and more complex instructions).
>
> if done well, the performance overhead of the translation should be
> fairly modest. the main performance hit would then be when jumping to a
> new address or an out-of-cache address, which would require being
> translated again.
>
> could presumably work pretty well for low-cost low-performance chips,
> and could also (likely) simplify the hardware design part.

Isn't that exactly what Transmeta did?

Intel and AMD have been doing the same thing, except in microcode.

Stephen Sprunk

unread,
Jan 30, 2014, 1:41:14 PM1/30/14
to
On 29-Jan-14 17:29, Michael S wrote:
> On Wednesday, January 29, 2014 11:56:25 PM UTC+2, Quadibloc wrote:
>> But as far as I know, there have *been* no new CISC designs offered
>> to the market; the last one was the 680x0, which didn't succeed.
>
> I'd rather count x386 as separate from x86. Which makes it newer than
> 68K.

386 is just one step along the x86 path, not a different ISA.

>> The few that are still available - x86 and z/Architecture -
>> obviously do owe their existence to "simple inertia", whatever
>> their merits.
>
> There were several relatively new CISC ISAs in embedded world. The
> latest of the notable ones is Renesas RX Family.
>
> As to general-purpose computers, pay attention that there were no new
> CISCs and no new RISCs for more than 20 years. And the latest new
> general-purpose RISC (Alpha) didn't succeed any better than the last
> (according to your definition) general-purpose CISC. Both lasted for
> approximately a decade. Arguably, 68K made more waves than Alpha. And
> it is still sort of alive in the form of ColdFire.

ARM may be most popular for embedded systems because that's the only
market that was still open to something new, but it seems like a
"general purpose" ISA overall, and it's RISC-ish.

Stephen Sprunk

unread,
Jan 30, 2014, 1:45:56 PM1/30/14
to
On 30-Jan-14 04:47, Michael S wrote:
> On Thursday, January 30, 2014 3:54:36 AM UTC+2, Paul A. Clayton
> wrote:
>>> IMHO, Itanium is closer to RISC than to VLIW. Closer, but not
>>> RISC.
>>
>> EPIC seems more VLIW-ish than RISCy. The support for single
>> stepping is more RISC-like, but "explicit parallelism"
>> (independence marking) seems VLIW-ish.
>
> IPF group boundaries are almost 100% hints. Right now I can't think
> of a situation in which they have semantic meaning. Which does not
> mean that such situations do not exist ;) And apart from group stops,
> what exactly is "explicitly parallel" in the IPF ISA?

AIUI, stops explicitly separate groups of instructions that can be
executed in parallel, rather than requiring the processor to analyze
dependencies. The groups aren't really hints at all, just an odd way of
encoding instructions. I never understood why they didn't just encode
individual RISC-like instructions, each with an explicit stop bit.

Nick Maclaren

unread,
Jan 30, 2014, 1:46:58 PM1/30/14
to
In article <lce65u$1qi$2...@dont-email.me>,
Stephen Sprunk <ste...@sprunk.org> wrote:
>On 30-Jan-14 12:29, BGB wrote:
>> though not really a hardware person, I am left to wonder why, for
>> example, the CISC ISA stuff can't just run in firmware?...
>>
>> like, the underlying chip hardware runs a MIPS-like ISA or similar and
>> runs mostly out of cache, and the higher-level (CISC style) ISA is
>> mostly just dynamically converted into machine-code fragments and
>> internal function calls (for less common and more complex instructions).
>>
>> if done well, the performance overhead of the translation should be
>> fairly modest. the main performance hit would then be when jumping to a
>> new address or an out-of-cache address, which would require being
>> translated again.
>>
>> could presumably work pretty well for low-cost low-performance chips,
>> and could also (likely) simplify the hardware design part.
>
>Isn't that exactly what Transmeta did?

Yes. But it was crippled by attempting to emulate an architecture
not designed for such an implementation.

>Intel and AMD have been doing the same thing, except in microcode.

As did most of the mainframes, back in their heyday.

While I am not a hardware person either, there is a lot of solid
evidence that this approach could give major benefits, PROVIDED
that the 'CISC ISA' was designed for compiler use. One aspect I
have previously posted is that language-dependent interpretation
code could provide efficient 'hardware' checking of things like
array bounds in high-level languages and vastly improve RAS and
security. But, unfortunately, benchmarketing rules and those
aspects are an afterthought, if considered at all :-(


Regards,
Nick Maclaren.

Stephen Sprunk

unread,
Jan 30, 2014, 1:49:25 PM1/30/14
to
On 30-Jan-14 04:31, Michael S wrote:
> On Thursday, January 30, 2014 10:06:59 AM UTC+2, Terje Mathisen
> wrote:
>> Michael S wrote:
>>> I'd rather count x386 as separate from x86. Which makes it newer
>>> than 68K.
>>
>> If the 386 had done a real change of the instruction encoding,
>> maybe moving to three-operand and 16 or 32 registers, then it would
>> have been really separate, but as it was, with everything just
>> extended to 32 bits, I thought it was a very nice & natural
>> extension.
>
> If SIB is not a real change of the instruction encoding then what
> is? And not only an encoding, semantics too : orthogonality of GPRs
> in address generation, scaled index.

SIB was a kludgey extension, and a completely optional one aside from
having to encode one (very rare) modRM value a bit differently.

Register orthogonality came via new opcodes with explicit registers; the
old opcodes with implicit registers are still there, except some of them
were (much later) dropped in x86-64 as redundant.

peterf...@gmail.com

unread,
Jan 30, 2014, 1:52:25 PM1/30/14
to
On Thursday, January 30, 2014 7:29:53 PM UTC+1, BGB wrote:

> though not really a hardware person, I am left to wonder why, for
> example, the CISC ISA stuff can't just run in firmware?...
>
>
> like, the underlying chip hardware runs a MIPS-like ISA or similar and
> runs mostly out of cache, and the higher-level (CISC style) ISA is
> mostly just dynamically converted into machine-code fragments and
> internal function calls (for less common and more complex instructions).
>
>
> if done well, the performance overhead of the translation should be
> fairly modest. the main performance hit would then be when jumping to a
> new address or an out-of-cache address, which would require being
> translated again.
>
>
> could presumably work pretty well for low-cost low-performance chips,
> and could also (likely) simplify the hardware design part.

That's pretty much what they do, these days.

http://en.wikipedia.org/wiki/Microcode#Vertical_microcode

There are many variants of the idea: a CPU running a native instruction set with a JIT that translates from the official instruction set to the internal one (Transmeta, various emulators for CPU transitions from Apple (68K emulator on PowerMacs), HP, DEC (FX!32)). The visibility of the internal instruction set varies: the operating system never saw it on the Transmeta chips but the emulator and the operating system were intimately intertwined on the PowerMac.

I believe the Chinese MIPS (Loongson) does this when running x86 code (with instruction set extensions for decoding and emulating x86 instructions faster).

Trace caches do something similar, but in hardware. They could, in theory, be coupled to a "sidecar" processor that optimized the traces in the cache.

The Alpha would implement all the "hard" parts in PAL code. HP-PA had millicode. Later VLSI VAXen would trap the "decimal" instructions for COBOL and implement them as completely ordinary VAX instructions running from ROM at a fixed address.

x86 CPUs do it with canned "microcode" from internal ROM (mostly normal µops but with the occasional extra instruction and access to some extra internal registers + a way to handle atomicity and precise exceptions ("what IP do we assign to this?")).

Newer z/Architecture CPUs use both internal microcode ROM and traps to emulation code. Dunno if it's in ROM at a fixed address or loaded into RAM by the OS or a bootloader (sorry, IPL'er?) or a hypervisor.

-Peter

Quadibloc

unread,
Jan 30, 2014, 2:06:38 PM1/30/14
to
On Thursday, January 30, 2014 11:29:53 AM UTC-7, BGB wrote:

> though not really a hardware person, I am left to wonder why, for
> example, the CISC ISA stuff can't just run in firmware?...

> if done well, the performance overhead of the translation should be
> fairly modest.

"Modest" doesn't really cut it. Performance is a very competitive aspect of computer design. Thus, while current CISC implementations do involve a sort of translation layer between CISC instructions and RISC-like micro-ops, it's wired into the chip to keep overhead to an absolute minimum.

John Savard

BGB

unread,
Jan 30, 2014, 3:22:18 PM1/30/14
to
On 1/30/2014 12:37 PM, Stephen Sprunk wrote:
> On 30-Jan-14 12:29, BGB wrote:
>> though not really a hardware person, I am left to wonder why, for
>> example, the CISC ISA stuff can't just run in firmware?...
>>
>> like, the underlying chip hardware runs a MIPS-like ISA or similar and
>> runs mostly out of cache, and the higher-level (CISC style) ISA is
>> mostly just dynamically converted into machine-code fragments and
>> internal function calls (for less common and more complex instructions).
>>
>> if done well, the performance overhead of the translation should be
>> fairly modest. the main performance hit would then be when jumping to a
>> new address or an out-of-cache address, which would require being
>> translated again.
>>
>> could presumably work pretty well for low-cost low-performance chips,
>> and could also (likely) simplify the hardware design part.
>
> Isn't that exactly what Transmeta did?
>
> Intel and AMD have been doing the same thing, except in microcode.
>

pretty close, albeit:
one doesn't necessarily need to run (solely) x86;
the CPU could provide access to a "closer to the metal" ISA (*);
the CPU hardware and firmware don't necessarily need to be made by the
same people or company;
...

for example, a person could do lower-end x86 devices running on cheaper
3rd party hardware.

then, potentially, one provides an optional mode to more efficiently
execute JVM bytecode or .NET CIL or similar (probably identified via
CPUID magic).


*: such as a specialized register-IR.

or at least a more abstracted form of an x86-like ISA. for example,
though it turned out to be fairly pointless for my uses (vs more
specialized bytecode formats), I once designed an ISA I had called "x86
aleph", which was basically an x86 variant with the native word-size and
a few other details dropped or abstracted out (mostly to make it easier
to interpret or JIT-compile efficiently and gloss over the CPU mode).
(IIRC, I originally got the idea partly from NaCl and PNaCl).

this variant, however, offered little real direct advantage over other
possible register-IR designs (and would not have been binary compatible
with raw x86 nor able to directly leverage existing compilers, and still
involved a similarly complex JIT to native x86 or x86-64 as would
another bytecode design).


in the case of a "CPU" running everything in a JIT for x86 or similar,
such a design could offer a few possible advantages though:
able to leverage the existing decoder logic;
relative familiarity from programmers, where an ISA which looks a lot
like x86 will be a lot more familiar looking than a more novel design
(such as something more like Dalvik or LLVM IR);
could be used for more "target neutral" binaries;
...


changes were mostly fairly modest:
prefixes could explicitly qualify word-sizes (32/64/"Native", *2);
status flags were formally dropped, CMP+Jcc and TEST+Jcc and similar
were made special;
there were differences in terms of prefixes and similar;
some instructions and instruction forms were dropped (vs normal x86);
...

*2: "Native" basically means "the same as the natural pointer size",
which is important in the case of code which may be used in either 32
bit or 64 bit modes.

sometimes more information would be needed, such as for dealing with
things like structure-layout and similar (target specific), so there
were special cases for this.

there would have also been some amount of embedded metadata (ex:
metadata about function argument lists and stack layout, generally
encoded using special NOP instructions or prefixes), ... with COFF
objects generally being used for holding the bytecode.


IIRC, I never fully implemented it though.
it is supported by my x86 assembler though, and potentially I "could"
possibly consider reviving it at some point.

or such...

Stephen Sprunk

unread,
Jan 30, 2014, 3:24:01 PM1/30/14
to
Transmeta failed because their "Code Morphing" (i.e. x86 translation)
firmware ate up all of the power savings of using a simpler VLIW core,
and users described it as "sluggish" to boot. That doesn't sound like a
"modest" cost.

Stephen Sprunk

unread,
Jan 30, 2014, 3:37:21 PM1/30/14
to
On 29-Jan-14 12:57, Nick Maclaren wrote:
> In article <lcbhfn$u0s$1...@dont-email.me>, Stephen Sprunk
> <ste...@sprunk.org> wrote:
>> On 29-Jan-14 12:20, Paul A. Clayton wrote:
>>>> Of course. But times change. An interesting question is why,
>>>> other than simple inertia, CISC designs continue to be
>>>> successful.
>>>
>>> "Simple inertia" (i.e., legacy software and corporate
>>> connections) is most of the reason.
>>
>> That's what I figured, but I thought there might be something else.
>> For instance, IIRC there's been mention here that mem-op
>> instructions allow better I-cache utilization and decode
>> throughput.
>
> Nah. It's lost in the noise. The "legacy" problems are grossly
> overstated, but the belief that they are critical does mean that the
> "decision makers" have to have a better mousetrap snapped on their
> nose before they will take notice. Modern Intel CPUs aren't bad, and
> it would be VERY hard to produce a noticeably better general-purpose
> design given that a new CPU would probably lag Intel by a process
> generation or two.

AMD does a credible job of stepping up when Intel screws up every few
years. The major difference is that Intel has the market power to
absorb such mistakes, whereas any x86 competitor has to get it right
every single time or die a horrible death.

> For most embedded, a similar situation arises with ARM, and there
> really aren't many specialised niches that would provide an entry
> into the mainstream.

ARM also doesn't seem to be as thoroughly evil as Intel, so there's not
much motivation to develop an alternative--except by Intel.

> And then there is the issue that any non-x86 would be a Microsoft-
> free environment, which is perceived by many to be a disqualifying
> factor.

Many of us would consider that an advantage, but you'll miss out on the
mass market, which is where Intel and AMD get the thrust necessary to
make the x86 pig fly.

> Since IBM blew it with the PowerPC, the only chance of any real
> change is in the Far East, and none of the potential players look
> interested. This isn't good news, not even for Intel, as the more
> established a monopoly becomes, the less likely it is to survive when
> it eventually falls.

China may have the power to break the Wintel monopoly; they can do it
within their internal market by fiat, and then export it cheaply to the
rest of the world.

> But don't sell your Intel stocks short just yet :-)

Indeed.

BGB

unread,
Jan 30, 2014, 4:03:55 PM1/30/14
to
On 1/30/2014 2:06 AM, Terje Mathisen wrote:
> Michael S wrote:
>> On Wednesday, January 29, 2014 11:56:25 PM UTC+2, Quadibloc wrote:
>>> But as far as I know, there have *been* no new CISC designs offered
>>> to the market; the last one was the 680x0, which didn't succeed.
>>
>> I'd rather count x386 as separate from x86. Which makes it newer than
>> 68K.
>
> Not separate imho.
>
> "If I say that a dog's tail is another leg, how many legs does it have?"
>
> "Four. Calling a tail a leg does not make it so."
>
> If the 386 had done a real change of the instruction encoding, maybe
> moving to three-operand and 16 or 32 registers, then it would have been
> really separate, but as it was, with everything just extended to 32
> bits, I thought it was a very nice & natural extension.
>

they did significantly change how the Mod/RM byte worked, added a SIB
byte, ...

in contrast while the move to 64-bits did break binary compatibility, it
also made fewer sweeping changes over-all to the ISA (apart from the REX
prefix).


though, admittedly, I am not entirely happy with the REX prefix. how
they implemented this single feature has caused a mess for programmers
which has now extended into its second decade, and had it been done more
like SSE and AVX (without the otherwise needless breaks in binary
compatibility), the transition to 64-bits could have been smoother.

though likely, otherwise, we could have ended up with globs of 32-bit
only code being crammed into the low 4GB of a process with lots of
64-bit code hanging around with code/data above the 4GB mark.
or otherwise 32-bit apps utilizing the 16x 64-bit registers.


though, I am less happy with how ABI people responded, making an ABI
(SysV/AMD64) which both doesn't really match the performance profile of
the CPUs and also is IMO needlessly complex. though, in my case, for my
projects, I was able to "simplify" it to a degree... and for the most
part, code doesn't notice.

most of the complexity is in uncommon edge-cases, and even with throwing
away a big part of the ABI rules, one can still produce output code
which can (for the most part) remain function-call-compatible with
natively compiled code.

a major place the ABI turns into a mess is when dealing with
passing/returning complex structures by-value. normally, the structures
are supposed to be broken up and partly passed in registers and partly
passed on the stack.

my simplified rules:
it either goes into a single register (if it will fit), or will be
passed/returned via passing a pointer (in a register).

since passing large structs by value is exceedingly rare in my code,
this doesn't actually seem to really be a problem.

I still remain annoyed that the ABI doesn't provide a place to spill
register arguments (forcing the use of temporaries for spilling register
arguments, severely complicating things like "va_list", ...).

but alas...


the pain is high enough to result in ongoing annoyance but not
sufficiently high to deal with the annoyance of using multiple ABIs (and
generate occasional ABI conversion stubs).


or such...

BGB

unread,
Jan 30, 2014, 4:31:12 PM1/30/14
to
On 1/30/2014 2:24 PM, Stephen Sprunk wrote:
> On 30-Jan-14 13:06, Quadibloc wrote:
>> On Thursday, January 30, 2014 11:29:53 AM UTC-7, BGB wrote:
>>> though not really a hardware person, I am left to wonder why, for
>>> example, the CISC ISA stuff can't just run in firmware?...
>>>
>>> if done well, the performance overhead of the translation should be
>>> fairly modest.
>>
>> "Modest" doesn't really cut it. Performance is a very competitive
>> aspect of computer design. Thus, while current CISC implementations
>> do involve a sort of translation layer between CISC instructions and
>> RISC-like micro-ops, it's wired into the chip to keep overhead to an
>> absolute minimum.
>
> Transmeta failed because their "Code Morphing" (i.e. x86 translation)
> firmware ate up all of the power savings of using a simpler VLIW core,
> and users described it as "sluggish" to boot. That doesn't sound like a
> "modest" cost.
>

hence the "if done well" part.


getting good performance from a JIT requires a bit of work:
"if done well", it can produce code with performance competitive with
native code, with the actual time spent in the JIT compiler kept fairly
small;
if not done well, then one either ends up with the JIT eating lots of
time, or with output code that is still several times slower than native.

I am not really familiar with the specifics of Transmeta's
implementation, but don't necessarily think that the idea is itself
non-workable.

though, more likely that it makes sense more in the context of
cost-minimization, than in trying for maximum performance.


an example would be, say, a person wants to run something like 32-bit
Windows XP on Raspberry Pi style HW (using cheap 3rd party SOCs, ...).

so, an x86 emulation layer is put into the ROM (in place of an OS
firmware image or similar), then proceeds to boot the thing like it were
a lower-end PC (treating the SD card as an ATA HDD, ...).


Joe keane

unread,
Jan 30, 2014, 5:58:02 PM1/30/14
to
In article <f8534ec2-dfe6-4156...@googlegroups.com>,
Quadibloc <jsa...@ecn.ab.ca> wrote:
>But as far as I know, there have *been* no new CISC designs offered to
>the market; the last one was the 680x0, which didn't succeed.

Hammer is the most important ISA of the last 20 years.

I'm telling you!

I guess it gets a big yawn here.

Vince Weaver

unread,
Jan 30, 2014, 10:32:08 PM1/30/14
to
On 2014-01-30, Paul A. Clayton <paaron...@gmail.com> wrote:
> (It is not obvious if Thumb2 and microMIPS count as new
> ISAs. They are substantially different encodings and have
> somewhat different operations. They can even have
> implementations that do not support the parent ISA.
> *Technically* they should probably be considered distinct
> ISAs since the encodings are more different than one
> might typically justify by "just a mode", but this seems
> to be a gray area.)

Thumb2 is an interesting case because it is pretty much entirely
compatible at the assembly level with ARM32. You just set a flag
in the assembly source to choose which to target. Sure there are some
games with the IT instruction (to handle the lack of predication)
but it's a neat trick.

I've heard people bring up the 8080/8008 or 8080/z80 architectures
as being somehow compatible at the assembly level but I don't think
that's anything near as clean as Thumb2/ARM32.

Vince

Vince Weaver

unread,
Jan 30, 2014, 10:39:22 PM1/30/14
to
On 2014-01-30, Stephen Sprunk <ste...@sprunk.org> wrote:
>
> Transmeta failed because their "Code Morphing" (i.e. x86 translation)
> firmware ate up all of the power savings of using a simpler VLIW core,
> and users described it as "sluggish" to boot. That doesn't sound like a
> "modest" cost.

Many years ago I worked at a company making Transmeta-powered "web-pads".

From what I understood (I could be remembering wrong) the problem was
that for the code morphing firmware to run properly it needed a huge amount
of RAM as cache, something like 16MB. These days that wouldn't be a lot,
but the systems we were building only had 64MB to begin with, so having
that cut back to 48MB really slowed things down (the machines ran
Linux, and, just before it all got shut down in the .com crash, they
were trying Windows on the things).

I'm not sure how power hungry it all was, but it was amazing that the
chips ran cool to the touch even without a heatsink, something you didn't
see in other x86 chips at the time.

Vince

Joe keane

unread,
Jan 30, 2014, 11:01:56 PM1/30/14
to
In article <0ca37cbf-dc54-4775...@googlegroups.com>,
Paul A. Clayton <paaron...@gmail.com> wrote:
>It is not obvious how common such accesses are,

78% dd(rr)
13% (rr)
6% dd(rr,rr,s)
3% (rr,rr,s)

EricP

unread,
Jan 30, 2014, 11:23:09 PM1/30/14
to
I take it that dd(rr,rr,s) is offset+base_reg+(index_reg<<scale).
What language construct makes use of that, the offset in particular?

Eric

Robert Wessel

unread,
Jan 31, 2014, 12:24:53 AM1/31/14
to
Index into an array in a structure, with the array positioned other
than at the beginning of the structure.

Ivan Godard

unread,
Jan 31, 2014, 12:50:03 AM1/31/14
to
Or A[i].f

Stephen Fuld

unread,
Jan 31, 2014, 2:53:55 AM1/31/14
to
On 1/29/2014 9:54 AM, Stephen Sprunk wrote:

snip

> Of course. But times change. An interesting question is why, other
> than simple inertia, CISC designs continue to be successful.


Yes, it is an interesting question. I think it is inertia, but it isn't
that simple. A new contender has to provide enough of an advantage over
the incumbent to make it worth while to switch. Initially, RISC seemed
to provide such an advantage, primarily through improved performance and
smaller cores which, other things equal would be less costly. However,
there were three major forces which moved against that advantage.


1. Intel, and to a lesser extent, AMD spent huge amounts of money on
engineers to develop advances in micro architecture and in better
fabrication capabilities than their competitors. These resulted in CISC
becoming competitive in performance with RISCs.

2. Technology advanced to the point where the smaller cores didn't make
much difference. Today, the cores occupy a small percentage of chip
area compared to caches, which, of course, are the same size whether the
core is RISC or CISC.

3. Regression toward the mean. RISC designs inevitably got more complex
and moved away from the original strict RISC dogma to develop designs
that were closer to CISC. And the CISC designs got more RISC like with
things like unified register set (X86), and becoming RISC internally
with a small amount of decode logic to be able to still present the CISC
ISA to the user.

These three factors tended to reduce any advantages that RISC had, until
there was so little RISC advantage that the RISC designs couldn't
overcome the inertia of existing software base and customer expertise.



--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Terje Mathisen

unread,
Jan 31, 2014, 4:05:38 AM1/31/14
to
Arrays stored within structs, typically with some scalar items first.

With a linked list of such structs you get very nice benefits from that
particular addressing mode.
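
A hedged illustration of that shape (type and field names invented):
walking such a list, n->samples[i] is exactly displacement + base +
scaled index, i.e. dd(rr,rr,s), and the displacement is nonzero because
the array does not start at offset zero.

#include <stddef.h>

struct node {
    struct node *next;           /* scalar items first ...       */
    int          count;
    double       samples[16];    /* ... then the embedded array  */
};

double sum_list(const struct node *n)
{
    double s = 0.0;
    for (; n != NULL; n = n->next)
        for (int i = 0; i < n->count; i++)
            s += n->samples[i];  /* disp(samples) + n + i*8, one operand */
    return s;
}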

Nick Maclaren

unread,
Jan 31, 2014, 5:27:53 AM1/31/14
to
In article <TyFGu.491734$tR7.1...@fx22.iad>,
n-D array indexing in loops probably does.


Regards,
Nick Maclaren.

Andreas Eder

unread,
Jan 31, 2014, 9:38:22 AM1/31/14
to
On 29 Jan 2014, Nick Maclaren wrote:

> And then there is the issue that any non-x86 would be a
> Microsoft- free environment, which is perceived by many to be a
> disqualifying factor.

Or a blessing :-)

'Andreas
--
ceterum censeo redmondinem esse delendam.

Stephen Sprunk

unread,
Jan 31, 2014, 11:33:56 AM1/31/14
to
On 30-Jan-14 15:03, BGB wrote:
> On 1/30/2014 2:06 AM, Terje Mathisen wrote:
>> Michael S wrote:
>>> On Wednesday, January 29, 2014 11:56:25 PM UTC+2, Quadibloc
>>> wrote:
>>>> But as far as I know, there have *been* no new CISC designs
>>>> offered to the market; the last one was the 680x0, which didn't
>>>> succeed.
>>>
>>> I'd rather count x386 as separate from x86. Which makes it newer
>>> than 68K.
>>
>> ...
>> If the 386 had done a real change of the instruction encoding,
>> maybe moving to three-operand and 16 or 32 registers, then it would
>> have been really separate, but as it was, with everything just
>> extended to 32 bits, I thought it was a very nice & natural
>> extension.
>
> they did significantly change how the Mod/RM byte worked, added a
> SIB byte, ...

Neither really changed how the ISA worked overall, which is remarkable
when you consider they went from a 16:16 segmented memory model to a
flat 32-bit memory model, doubled the width of the GPRs and made the
GPRs (mostly) orthogonal.

> in contrast while the move to 64-bits did break binary compatibility,
> it also made fewer sweeping changes over-all to the ISA (apart from
> the REX prefix).

They could have hacked 64-bit into the existing model, but doubling the
number of GPRs was long overdue, and that necessarily broke backward
compatibility.

> though, admittedly, I am not entirely happy with the REX prefix. how
> they implemented this single feature has caused a mess for
> programmers which has now extended into its second decade,

REX is only a "mess" for assembler authors, and there are only a handful
of those worldwide; it's transparent to everyone else.

> and had it been done more like SSE and AVX (without the otherwise
> needless breaks in binary compatibility), the transition to 64-bits
> could have been smoother.

See above.

I gotta say, though, the VEX prefix is really clever.

> though, I am less happy with how ABI people responded, making an ABI
> (SysV/AMD64) which both doesn't really match the performance profile
> of the CPUs and also is IMO needlessly complex. though, in my case,
> for my projects, I was able to "simplify" it to a degree... and for
> the most part, code doesn't notice.

What's "needlessly complex" about it? The only major change from the
32-bit ABI is the switch to a register calling convention, which means
less stack pressure--also long overdue. There are complicated rules for
what goes where, but that's the nature of the beast; other platforms
with register calling conventions have roughly the same complexity.

> I still remain annoyed that the ABI doesn't provide a place to spill
> register arguments (forcing the use of temporaries for spilling
> register arguments, severely complicating things like "va_list",
> ...).

IIRC, the ABI requires space to be reserved on the stack for register
parameters so the callee has a place to spill them if needed.

Varargs are passed the same way they were in the 32-bit ABI.

Paul A. Clayton

unread,
Jan 31, 2014, 2:54:03 PM1/31/14
to
What is the source (workload; dynamic vs. static) for these
numbers? The frequency of indexed addressing seems to *hint*
at these being dynamic values.

(I am also curious how the statistics were gathered. ISA and
compiler peculiarities may have influenced instruction selection.)

I am surprised that use would be that high. I would have
thought that (rr, rr, s) would have been converted by the
compiler to remove the scale factor in most cases (though
perhaps that violates the language and makes debugging
more difficult) by incrementing by the scale factor or
otherwise using index*scale. (Using index*scale has the
potential disadvantage of requiring more storage, e.g.,
if the index fits in 8 bits or 16 bits but index*scale
does not and the value is stored in memory rather than
being kept in a register.)
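
The conversion assumed above is ordinary strength reduction; a sketch
of the two forms a compiler can choose between (illustrative only):

/* scaled-index form: the address a + i*8 is formed on every iteration,
   which is where an (rr,rr,s) mode earns its keep */
double sum_indexed(const double *a, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* strength-reduced form: the pointer is bumped by the element size,
   so a plain (rr) access suffices inside the loop */
double sum_reduced(const double *a, int n)
{
    double s = 0.0;
    for (const double *p = a, *end = a + n; p != end; p++)
        s += *p;
    return s;
}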

(I also wonder how many of the dd(rr) accesses were
stack or global/TLS scalar accesses rather than
"structure" accesses. The frequency of register-indirect
also seems a tiny bit high; I would have guessed 10%.)

I would also have guessed that most A[i].m accesses
would have been simplified in a similar manner
(particularly for unit stride accesses).

S.a[i] accesses might be less friendly to optimization.
For variable length arrays, one also could not generally
place the array at the start of the structure. (Having
the array index backward from the "base" of the structure
might not be practical. Other code might expect forward
indexing and using an offset "base" address might not
work well with common memory allocation systems.)
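
For the variable-length case a C99 flexible array member makes the
constraint concrete (names invented for the sketch): the array must be
the last field, so every element access carries the fixed displacement
of the scalar fields in front of it.

struct vec {
    int    len;
    double a[];          /* flexible array member: has to come last */
};

double vec_get(const struct vec *v, int i)
{
    return v->a[i];      /* load from v + offsetof(struct vec, a) + i*8 */
}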

BGB

unread,
Jan 31, 2014, 3:09:32 PM1/31/14
to
On 1/31/2014 10:33 AM, Stephen Sprunk wrote:
> On 30-Jan-14 15:03, BGB wrote:
>> On 1/30/2014 2:06 AM, Terje Mathisen wrote:
>>> Michael S wrote:
>>>> On Wednesday, January 29, 2014 11:56:25 PM UTC+2, Quadibloc
>>>> wrote:
>>>>> But as far as I know, there have *been* no new CISC designs
>>>>> offered to the market; the last one was the 680x0, which didn't
>>>>> succeed.
>>>>
>>>> I'd rather count x386 as separate from x86. Which makes it newer
>>>> than 68K.
>>>
>>> ...
>>> If the 386 had done a real change of the instruction encoding,
>>> maybe moving to three-operand and 16 or 32 registers, then it would
>>> have been really separate, but as it was, with everything just
>>> extended to 32 bits, I thought it was a very nice & natural
>>> extension.
>>
>> they did significantly change how the Mod/RM byte worked, added a
>> SIB byte, ...
>
> Neither really changed how the ISA worked overall, which is remarkable
> when you consider they went from a 16:16 segmented memory model to a
> flat 32-bit memory model, doubled the width of the GPRs and made the
> GPRs (mostly) orthogonal.
>

could be.

though, segmentation could be used with 32-bit code, generally no one
bothered (apart from using FS/GS for things like thread-local-storage
and similar).


>> in contrast while the move to 64-bits did break binary compatibility,
>> it also made fewer sweeping changes over-all to the ISA (apart from
>> the REX prefix).
>
> They could have hacked 64-bit into the existing model, but doubling the
> number of GPRs was long overdue, and that necessarily broke backward
> compatibility.
>

I don't agree on this point.

they added SSE and so on without breaking compatibility (which,
similarly, added new registers).

potentially, the added GPRs could have been added with similar
properties, just there would have been a slight lag until any 32-bit
OSes preserved them ("use at your own risk"), ...

though, it could have led to a potentially longer/less efficient
instruction encoding.

later on, the VEX prefix and similar were added, with the restriction of
keeping only the 8x 32-bit GPRs seeming fairly arbitrary at the
ISA level. theoretically, they could have just allowed using a VEX
prefix in-place of a REX prefix and getting the extended GPRs in 32-bit
mode. but, they didn't (my x86 interpreter actually did it this way,
calling this new construction "PREX" for "Pseudo-REX").

theoretically, this could have also altered the development path of
64-bit ISAs though.


>> though, admittedly, I am not entirely happy with the REX prefix. how
>> they implemented this single feature has caused a mess for
>> programmers which has now extended into its second decade,
>
> REX is only a "mess" for assembler authors, and there are only a handful
> of those worldwide; it's transparent to everyone else.
>

the encoding of REX (among a few other things) is a big part of the
break in compatibility between 32-bit and 64-bit code.

the alternative would be a path where there was no real break between 32
and 64-bit modes, and where things expanded more "naturally".


>> and had it been done more like SSE and AVX (without the otherwise
>> needless breaks in binary compatibility), the transition to 64-bits
>> could have been smoother.
>
> See above.
>
> I gotta say, though, the VEX prefix is really clever.
>

yeah.

though sad that it wasn't used to address the existing issues, say, as a
REX alternative.

"hey, now you can use the 16x 64-bit GPRs in 32-bit code!".


>> though, I am less happy with how ABI people responded, making an ABI
>> (SysV/AMD64) which both doesn't really match the performance profile
>> of the CPUs and also is IMO needlessly complex. though, in my case,
>> for my projects, I was able to "simplify" it to a degree... and for
>> the most part, code doesn't notice.
>
> What's "needlessly complex" about it? The only major change from the
> 32-bit ABI is the switch to a register calling convention, which means
> less stack pressure--also long overdue. There are complicated rules for
> what goes where, but that's the nature of the beast; other platforms
> with register calling conventions have roughly the same complexity.
>

there was the Win64 convention, which was a little more sane (and
considerably simpler).

it also passes and returns any struct that doesn't fit in a single
register by just passing a reference in a register, ...

but, I am talking about SysV/AMD64...


the main ugly, needlessly-complex case in the ABI is the set of rules for
passing structures by value, which, for small structs, effectively involve
decomposing them and passing the individual fields in registers, with
multiple fields sometimes packed into a single register (anything bigger
than two eightbytes just goes on the stack), ...

I was just like "screw this" and didn't bother with a lot of this.


ex, struct foo_s { int x, y; float a, b; };   (16 bytes, so it gets decomposed)

void foo(struct foo_s foo, int s, int t, int u, int v);


would be passed as:
RDI: x and y (packed into one register)
XMM0: a and b (packed)
RSI: s
RDX: t
RCX: u
R8: v

in my lazy/hacked version:
RDI: &foo
RSI: s
RDX: t
RCX: u
R8: v

and, in Win64:
RCX: &foo
RDX: s
R8: t
R9: u
[RSP+32]: v


however:
struct bar_s { float x, y, z, w; };
void bar(struct bar_s bar, int s, int t, int u, int v);

both versions (of SysV/AMD64):
RDI: s
RSI: t
RDX: u
RCX: v
XMM0: x, y
XMM1: z, w

Win64:
RCX: &bar
RDX: s
R8: t
R9: u
[RSP+32]: v


the ABI also returns small structures decomposed into registers, whereas
my lazy version just passes an address (in a register) to put the
returned struct into (whenever the whole struct can't be returned either
in RAX or XMM0).
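for a concrete (if simplified) sketch of the return side -- struct names made
up, register assignments as I understand the two ABIs, so treat it as
illustrative rather than gospel:

struct pair_s { long long a, b; };     /* 16 bytes                          */
struct big_s  { long long a, b, c; };  /* 24 bytes, too big for 2 registers */

struct pair_s ret_pair(void);  /* SysV:  returned in RAX:RDX                */
                               /* Win64: hidden pointer passed in RCX,      */
                               /*        address returned in RAX            */
struct big_s  ret_big(void);   /* SysV:  hidden pointer passed in RDI,      */
                               /*        address returned in RAX            */
                               /*        (i.e. much like my lazy version)   */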


>> I still remain annoyed that the ABI doesn't provide a place to spill
>> register arguments (forcing the use of temporaries for spilling
>> register arguments, severely complicating things like "va_list",
>> ...).
>
> IIRC, the ABI requires space to be reserved on the stack for register
> parameters so the callee has a place to spill them if needed.
>
> Varargs are passed the same way they were in the 32-bit ABI.
>

you sure you aren't thinking of Win64 (the Windows 64-bit ABI)?...


Win64 provides space to spill into, but SysV/AMD64 (the Linux/OSX/...
ABI) does not.

in SysV/AMD64, after the register args, the first place on-stack is used
for the first non-register argument.

like:
if you have integer args A-K, then A-F are passed in regs (SysV/AMD64
passes the first 6 integer args in registers, vs 4 for Win64), and the
first slot on-stack holds G.

in a more sanely designed ABI, unused space would be left on the stack
for arguments A-F, with G following immediately afterwards, but SysV/AMD64
doesn't do that.


varargs *does not* work the same as in x86 cdecl.

in x86 cdecl, you just need a pointer to the stack, and walk along
linearly. this will not work with SysV/AMD64, and the algorithm for
walking the argument list is a bit more involved.


however, in Win64, it is possible to spill the register arguments and
then read the argument list similarly to x86 cdecl.
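for reference, this is (roughly) what SysV/AMD64 makes "va_list" look like
under the hood -- going from the ABI document and typical gcc headers, with
the type name changed here, so treat it as a sketch:

typedef struct {
    unsigned int gp_offset;    /* offset into reg_save_area of the next integer
                                  register arg (starts at 0, capped at 48)      */
    unsigned int fp_offset;    /* offset of the next XMM register arg (starts
                                  at 48, capped at 176)                         */
    void *overflow_arg_area;   /* pointer to the next stack-passed arg          */
    void *reg_save_area;       /* block where the prologue spilled the reg args */
} sysv_va_list[1];             /* the real thing is the compiler builtin
                                  __builtin_va_list                             */

so va_arg() has to check the offsets, figure out whether the next arg lives in
the register-save area or the overflow area, and bump the right field, vs just
incrementing a pointer like in cdecl.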


Dombo

unread,
Jan 31, 2014, 4:40:49 PM1/31/14
to
On 31-Jan-14 8:53, Stephen Fuld wrote:
> On 1/29/2014 9:54 AM, Stephen Sprunk wrote:
>
> snip
>
>> Of course. But times change. An interesting question is why, other
>> than simple inertia, CISC designs continue to be successful.
>
>
> Yes, it is an interesting question. I think it is inertia, but it isn't
> that simple. A new contender has to provide enough of an advantage over
> the incumbent to make it worthwhile to switch. Initially, RISC seemed
> to provide such an advantage, primarily through improved performance and
> smaller cores which, other things being equal, would be less costly. However,
> there were three major forces which moved against that advantage.
>
>
> 1. Intel, and to a lesser extent, AMD spent huge amounts of money on
> engineers to develop advances in micro architecture and in better
> fabrication capabilities than their competitors. These resulted in CISC
> becoming competitive in performance with RISCs.
>
> 2. Technology advanced to the point where the smaller cores didn't
> make much difference. Today, the cores occupy a small percentage of
> chip area compared to caches, which, of course, are the same size
> whether the core is RISC or CISC.

Which might also imply that higher code density is more beneficial than
simpler instruction decoding logic.

BGB

unread,
Jan 31, 2014, 6:43:44 PM1/31/14
to
taken further, this also applies to data:
one can also gain a speedup in some cases by using more densely packed
data structures in memory (smaller element types, ...), mostly by helping
the data fit in cache better (sometimes at the cost of some extra
bit-twiddling);
in some cases, even keeping data in partially compressed forms can help
(where the cost of the added complexity is outweighed by letting more data
fit in cache; in some cases the compression strategy may actually lower
the overall computational complexity as well).

even if all this kind of goes in the face of a lot of conventional wisdom.
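
a contrived example of the sort of thing I mean (field names made up, numbers
illustrative rather than measured):

struct sample_fat {               /* 24 bytes (with alignment padding) */
    long long id;
    int       kind;
    int       flags;
    double    weight;
};

struct sample_packed {            /* 12 bytes: same information, roughly half
                                     the cache footprint, at the cost of some
                                     shifting/scaling when the fields are used */
    unsigned int   id;
    unsigned short kind;
    unsigned char  flags;
    unsigned char  pad;
    int            weight_fx16;   /* weight stored as 16.16 fixed-point */
};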

timca...@aol.com

unread,
Jan 31, 2014, 7:05:37 PM1/31/14
to
On Tuesday, January 28, 2014 6:07:54 PM UTC-5, Quadibloc wrote:
> Perhaps this question would be better asked of the CDC 6600, but it had block
> move instructions -

The CMU was an optional feature, I think developed primarily for the Cobol customers. It was not offered as an option on later models (e.g. 7600).
It allowed denser code, but I would be surprised if it was really any
faster than a well written asm function.


- Tim

Robert Wessel

unread,
Jan 31, 2014, 11:25:29 PM1/31/14
to
Well the 8080 was not source compatible with the 8008, but it was
close enough that conversion was not usually too difficult (there were
some major differences - for example, instead of a stack in main
memory, the 8008 had a hardware stack for return addresses).

The 8086 was source compatible with the 8080, in that if you used the
right register mapping, all 8080 instructions could be encoded in
semantically equivalent 8086 sequences (all but a handful* of 8080
instructions encoded as one or two** 8086 instructions). A number of
programs were ported from 8080 to 8086 that way.

The 8080-to-Z-80 relationship is a bit different. The Z-80 was a strict
binary superset of the 8080, so any 8080 binary could run on a Z-80 (the
usual caveats about resources and timing apply), but there were a bunch
of additional instructions. Zilog, OTOH, defined a completely different
set of assembler mnemonics for the Z-80, although obviously it was
possible to translate any 8080 instruction into a Z-80 instruction.


*If memory serves, there were five 8080 instructions that required
more than two 8086 instructions to emulate, none more than five
instructions, although if you didn't need the exact flag setting
behavior, several had shorter sequences.

**Many of the 8080 instructions that required two 8086 instructions
were fairly rarely used - for example the 8080 had conditional
subroutine call and return instructions - the 8086 equivalent was the
obvious reversed-sense conditional branch around an unconditional call
or return.

Stephen Fuld

unread,
Feb 1, 2014, 2:07:27 AM2/1/14
to
On 1/31/2014 1:40 PM, Dombo wrote:
> On 31-Jan-14 8:53, Stephen Fuld wrote:

snip

>> 2. Technology advanced to the point where the smaller cores didn't
>> make much difference. Today, the cores occupy a small percentage of
>> chip area compared to caches, which, of course, are the same size
>> whether the core is RISC or CISC.
>
> Which might also imply that higher code density is more beneficial than
> simpler instruction decoding logic.

Yes, but . . . Cache is a diminishing return game. As caches get
larger with improved technology, the benefits of higher code density
diminish. I don't have good figures on this, but how many programs
today are significantly hurt by i-cache misses? I suspect it is very
modest.

But if the penalty for the extra decode difficulty is small, and the
extra benefit of smaller footprint is small, it gets hard to see a big
advantage to either.

Stephen Fuld

unread,
Feb 1, 2014, 2:13:28 AM2/1/14
to
On 1/29/2014 1:56 PM, Quadibloc wrote:
> On Wednesday, January 29, 2014 10:54:50 AM UTC-7, Stephen Sprunk wrote:
>
>> Of course. But times change. An interesting question is why, other
>> than simple inertia, CISC designs continue to be successful.
>
> It's been noted that CISC isn't really that much outclassed by RISC.
>
> But as far as I know, there have *been* no new CISC designs offered to the market; the last one was the 680x0, which didn't succeed.

This doesn't change the thrust of the argument, but I think both the
National Semi NS 16032 and the Western Electric MAC32/32100 postdate the
680x0; neither of them succeeded even as well as the 680x0 did.

Ivan Godard

unread,
Feb 1, 2014, 2:16:13 AM2/1/14
to
On 1/31/2014 11:07 PM, Stephen Fuld wrote:
> On 1/31/2014 1:40 PM, Dombo wrote:
>> On 31-Jan-14 8:53, Stephen Fuld wrote:
>
> snip
>
>>> 2. Technology advanced to the point where the smaller cores didn't
>>> make much difference. Today, the cores occupy a small percentage of
>>> chip area compared to caches, which, of course, are the same size
>>> whether the core is RISC or CISC.
>>
>> Which might also imply that higher code density is more beneficial than
>> simpler instruction decoding logic.
>
> Yes, but . . . Cache is a diminishing return game. As caches get
> larger with improved technology, the benefits of higher code density
> diminish. I don't have good figures on this, but how many programs
> today are significantly hurt by i-cache misses? I suspect it is very
> modest.

Ever hear of bloatware?

Terje Mathisen

unread,
Feb 1, 2014, 4:02:58 AM2/1/14
to
BGB wrote:
> On 1/31/2014 3:40 PM, Dombo wrote:
>> Which might also imply that higher code density is more beneficial than
>> simpler instruction decoding logic.
>>

Almost certainly true these days: Doing a bit more work (i.e. when
decoding) in order to reduce the amount of code bytes loaded is almost
certainly a win.
>
> taken further, this also applies to data:
> one can also gain a speedup in some cases by using more densely packed
> data-structures in memory (using smaller element types, ...), mostly by
> helping data to fit in cache better (sometimes in the face of an
> increased risk of bit-twiddly);
> in some cases, even keeping data in partially compressed forms can help
> (where the cost of the additional complexity is outweighed by allowing
> more data to fit in cache, though in some cases the compression strategy
> may actually make the overall computational complexity lower as well).
>
> even if all this kind of goes in the face of a lot of conventional wisdom.
>
David Stafford, the guy I mentioned yesterday, took my .sig and modified
it slightly:

"almost all programming can be viewed as an exercise in compression"

I will of course claim that compression is just one of the things you
have to consider in order to make your caches as efficient as possible.
:-)

Terje Mathisen

unread,
Feb 1, 2014, 4:07:37 AM2/1/14
to
Stephen Fuld wrote:
> On 1/31/2014 1:40 PM, Dombo wrote:
>> On 31-Jan-14 8:53, Stephen Fuld wrote:
>
> snip
>
>>> 2. Technology advanced to the point where the smaller cores didn't
>>> make much difference. Today, the cores occupy a small percentage of
>>> chip area compared to caches, which, of course, are the same size
>>> whether the core is RISC or CISC.
>>
>> Which might also imply that higher code density is more beneficial than
>> simpler instruction decoding logic.
>
> Yes, but . . . Cache is a diminishing return game. As caches get
> larger with improved technology, the benefits of higher code density

Larger caches are slower caches.

> diminish. I don't have good figures on this, but how many programs
> today are significantly hurt by i-cache misses? I suspect it is very
> modest.

Modest in number but not in business impact?

I.e. Oracle and other DBs.
>
> But if the penalty for the extra decode difficulty is small, and the
> extra benefit of smaller footprint is small, it gets hard to see a big
> advantage to either.

Sure, all designers are trying to find the sweet spot near the optimal
point on the space/complexity curve.

I do believe though that with the decreasing cost of gates and the
increasing cost of memories that are both large and fast (relative to core
speeds), the balance point has moved a bit towards a more compact
instruction set.

Nick Maclaren

unread,
Feb 1, 2014, 5:02:02 AM2/1/14
to
In article <lcid8n$sgk$1...@speranza.aioe.org>,
Terje Mathisen <terje.m...@tmsw.no> wrote:
>BGB wrote:
>> On 1/31/2014 3:40 PM, Dombo wrote:
>>> Which might also imply that higher code density is more beneficial than
>>> simpler instruction decoding logic.
>>>
>
>Almost certainly true these days: Doing a bit more work (i.e. when
>decoding) in order to reduce the amount of code bytes loaded is almost
>certainly a win.
>>
>> taken further, this also applies to data:
>> one can also gain a speedup in some cases by using more densely packed
>> data-structures in memory (using smaller element types, ...), mostly by
>> helping data to fit in cache better (sometimes in the face of an
>> increased risk of bit-twiddly);
>> in some cases, even keeping data in partially compressed forms can help
>> (where the cost of the additional complexity is outweighed by allowing
>> more data to fit in cache, though in some cases the compression strategy
>> may actually make the overall computational complexity lower as well).
>>
>> even if all this kind of goes in the face of a lot of conventional wisdom.

Modern performance is almost entirely about data access, and actual
execution performance (in the strict sense) is usually lost in the
noise. Execution does make a very big difference, of course, when it
slows down the data access, so the two aren't independent.

And, no, that is not conventional wisdom!

>David Stafford, the guy I mentioned yesterday, took my .sig and modified
>it slightly:
>
>"almost all programming can be viewed as an exercise in compression"
>
>I will of course claim that compression is just one of the things you
>have to consider in order to make your caches as efficient as possible.
>:-)

"Almost all programming can be viewed as an exercise in buzzword
bingo" :-)

While there is truth in both statements, you know as well as I do
that such aphorisms are rarely more than over-simplifications.
Not merely would I say that caching is just one of the things you
should consider when optimising data access, but the algorithmic
aspects (i.e. using a different logical/mathematical formulation
of the same problem) are still major factors in programming.


Regards,
Nick Maclaren.

Terje Mathisen

unread,
Feb 1, 2014, 8:32:38 AM2/1/14
to
This is the _real_ core of my .sig!

Any improved program/function/algorithm which you
invent/implement/verify & debug is in fact a way to cache the most
important information by far, i.e. the best way to solve a particular
problem!
:-)

This is also the point where I lose many/most of the people who ask me
about that quote. :-(

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Nick Maclaren

unread,
Feb 1, 2014, 9:28:54 AM2/1/14
to
In article <lcit2b$k9q$1...@speranza.aioe.org>,
Terje Mathisen <terje.m...@tmsw.no> wrote:
>>>
>>> David Stafford, the guy I mentioned yesterday, took my .sig and modified
>>> it slightly:
>>>
>>> "almost all programming can be viewed as an exercise in compression"
>>>
>>> I will of course claim that compression is just one of the things you
>>> have to consider in order to make your caches as efficient as possible.
>>> :-)
>>
>> "Almost all programming can be viewed as an exercise in buzzword
>> bingo" :-)
>>
>> While there is truth in both statements, you know as well as I do
>> that such aphorisms are rarely more than over-simplifications.
>> Not merely would I say that caching is just one of the things you
>> should consider when optimising data access, the algorithmic
>> aspects (i.e. using a different logical/mathematical formulation
>> of the same problem) are still major factors in programming.
>
>This is the _real_ core of my .sig!
>
>Any improved program/function/algorithm which you
>invent/implement/verify & debug is in fact a way to cache the most
>important information by far, i.e. the best way to solve a particular
>problem!
>:-)

We are certainly agreed there!

>This is also the point where I lose many/most of the people who ask me
>about that quote. :-(

Including me, I am afraid! I can't make the term 'caching' make
sense in terms of (say) using the strong pseudo-primality test
instead of brute force. And that there are equivalent algorithms
that are even less closely related.

I take your point in terms of (say) comparison sorting, and even
many forms of numerical decomposition, where the theory is used to
order the operations so that fewer are needed. But cases where even the
basic operations are different are another matter.


Regards,
Nick Maclaren.

Terje Mathisen

unread,
Feb 1, 2014, 11:45:12 AM2/1/14
to
Including you. :-(

English is definitely not my native language.

What I'm trying to say is that the entire history of algorithms &
methods consists of inventors coming up with brand-new (or just
improved) ways of solving particular problems, then publishing them.
The act of implementing & publishing each such algorithm is imho an
extremely efficient way to cache the underlying idea, saving the rest of
us from having to re-invent it.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"