Intel got a RISC cooking?

Eliot Shepard

Mar 24, 1994, 6:25:52 PM

With the P6 announcement, etc., Intel's public position is clearly
supportive of the 80x86 line for the foreseeable future. However, my
impression of them is of a company that covers its back well. With the
rising public awareness of RISC and its theoretical and demonstrated
advantages, I am curious about how well Intel is hedging its bets.

This is probably closely guarded information, but have there been any
rumors of Intel working on a "mass-market" RISC chip (i.e., other than
the ix60s)? Any specifics?

-Eliot

--
Eliot Shepard Smithsonian Astrophysical Observatory
el...@asc.harvard.edu #include <std/disclaimer.h>
URL: http://hea-www.harvard.edu/~eliot/eshepard.html

Herman K.S. Lau

Mar 25, 1994, 8:25:32 AM

Eliot Shepard (el...@asc.harvard.edu) wrote in comp.arch:

> -Eliot

As a close to complete newbie in the field of computer architecture, I
would like to ask this question (please, no flames, I don't know what I'm
doing): Is it possible for Intel to make a RISC that is compatible with
the x86 line? And if so is this a bad idea? Thanks.

Torben AEgidius Mogensen

Mar 25, 1994, 1:16:36 PM

l...@eniac.seas.upenn.edu (Herman K.S. Lau) writes:

>As a close to complete newbie in the field of computer architecture, I
>would like to ask this question (please, no flames, I don't know what I'm
>doing): Is it possible for Intel to make a RISC that is compatible with
>the x86 line? And if so is this a bad idea? Thanks.

That depends both on what you mean by "RISC" and what you mean by
"compatible". Below I will use the following "definition" of RISC:

A RISC has:

1) Fixed instruction size with no operands in the following words.

2) Few different instruction formats (1-4).

3) A relatively large register set (32 or more general purpose
registers).

A RISC does NOT:

1) Access more than one memory location in any single instruction.

2) Have instructions that perform an unbounded amount of
computation.

3) Support unaligned instruction placement or memory access.

Some leeway is usually allowed. Some generally-agreed-upon RISCs have
instructions that access multiple memory locations, but with some
restrictions (usually only adjacent memory locations are addressed).
Some have less than 32 GP registers. Some support unaligned memory
access (but usually not instruction placement). But if you break too
many rules or break one in too drastic a way, you won't get recognized
as RISC.

If a processor should be fully compatible with the x86 line, it should
implement its full instruction set. But the x86 instruction set breaks
all the rules above: there are many different instruction formats of
many different sizes, there are few GP registers, some instructions
(block move etc.) can do an unbounded amount of work and access an
unbounded number of memory locations. Also, unaligned memory access is
supported. Hence a fully compatible processor can't be RISC.
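
The checklist above can be sketched as a simple predicate. This is only an illustration: the `IsaTraits` fields, the `broken_rules` helper, and the trait values given for x86 and for a made-up `toy_risc` are assumptions chosen to mirror the argument, not measured properties of any real chip.

```python
from dataclasses import dataclass


@dataclass
class IsaTraits:
    """Coarse properties of an instruction set (illustrative values only)."""
    name: str
    fixed_instruction_size: bool
    instruction_formats: int      # rough count; "many" is modelled as 10 below
    gp_registers: int
    max_memory_operands: int      # memory locations touched by one instruction
    unbounded_instructions: bool  # e.g. block moves
    unaligned_access: bool


def broken_rules(isa: IsaTraits) -> list:
    """Return the rules from the list above that this ISA violates."""
    broken = []
    if not isa.fixed_instruction_size:
        broken.append("fixed instruction size")
    if isa.instruction_formats > 4:
        broken.append("few instruction formats")
    if isa.gp_registers < 32:
        broken.append("large register set")
    if isa.max_memory_operands > 1:
        broken.append("one memory location per instruction")
    if isa.unbounded_instructions:
        broken.append("bounded work per instruction")
    if isa.unaligned_access:
        broken.append("aligned access only")
    return broken


# Assumed trait values, following the description in the text.
x86 = IsaTraits("x86", False, 10, 8, 2, True, True)
toy_risc = IsaTraits("toy_risc", True, 3, 32, 1, False, False)

if __name__ == "__main__":
    print(x86.name, "breaks:", broken_rules(x86))
    print(toy_risc.name, "breaks:", broken_rules(toy_risc))
```

A real classification would allow the "leeway" mentioned earlier, for instance by counting violations rather than demanding an empty list.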

A partly compatible processor could implement most of the x86
instruction set using a different encoding of the instructions. It
could even have a larger register set and make any operation possible
on one register be possible on any other (no special segment registers
etc.). This way the majority of the x86 instructions could map to
single instructions in the new processor and the remainder could map
to (short) sequences of instructions. This processor could very well
be RISC, but it wouldn't solve all problems of object code
translation. In particular, indirect (computed) jumps and
self-modifying code are difficult to handle.
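
To make the indirect-jump problem concrete, here is a minimal sketch of such a translator. The source and target instruction encodings, the `translate` function, and the `lookup_and_branch` pseudo-op are all hypothetical. The point is only that a direct jump can be patched at translation time, while a computed jump still holds an old-code address and has to be mapped at run time.

```python
# Hypothetical toy encoding: (old address, mnemonic, operands).
SOURCE = [
    (0x100, "add_mem", ("r1", 0x2000)),  # r1 += mem[0x2000], one source instruction
    (0x106, "inc", ("r1",)),
    (0x107, "jmp", (0x100,)),            # direct jump: target known statically
    (0x10C, "jmp_reg", ("r2",)),         # indirect jump: target known only at run time
]


def translate(source):
    """Map each source instruction to one or more load/store-style target ops."""
    target, addr_map = [], {}
    for addr, op, args in source:
        addr_map[addr] = len(target)     # index where this translation starts
        if op == "add_mem":
            reg, mem = args
            target += [("load", "tmp", mem), ("add", reg, reg, "tmp")]
        elif op == "inc":
            target.append(("addi", args[0], args[0], 1))
        elif op == "jmp":
            target.append(("branch_old", args[0]))        # resolved in pass 2
        elif op == "jmp_reg":
            # The register holds an OLD address, so the map is needed at run time.
            target.append(("lookup_and_branch", args[0]))
        else:
            raise ValueError(f"unhandled op {op}")
    # Pass 2: direct jumps are rewritten to new-code indices.
    target = [("branch", addr_map[t[1]]) if t[0] == "branch_old" else t
              for t in target]
    return target, addr_map


if __name__ == "__main__":
    code, amap = translate(SOURCE)
    for i, ins in enumerate(code):
        print(i, ins)
```

Self-modifying code is harder still, since a store into the old code region would invalidate translations that this sketch has no way to track.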

A third possibility is to forget about a direct mapping of x86
instructions to the new processor, but provide facilities that avoid
the worst problems with emulation, such as condition codes and data
formats.
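
Condition codes show why such facilities would help. On a target with no flags register, an emulator has to recompute the flags in software after every arithmetic instruction, roughly as sketched below. The function name is made up for this example, and the parity and auxiliary-carry flags are left out for brevity.

```python
MASK32 = 0xFFFFFFFF


def add32_with_flags(a: int, b: int):
    """32-bit add returning (result, flags) the way an x86-style ADD sets them."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    flags = {
        "CF": full > MASK32,              # unsigned carry out of bit 31
        "ZF": result == 0,
        "SF": bool(result >> 31),
        # Signed overflow: both operands share a sign that the result lacks.
        "OF": bool(((~(a ^ b)) & (a ^ result)) >> 31 & 1),
    }
    return result, flags


if __name__ == "__main__":
    print(add32_with_flags(0x7FFFFFFF, 1))
    print(add32_with_flags(0xFFFFFFFF, 1))
```

Each flag costs at least one extra operation per emulated add, which is the overhead a small amount of hardware support could remove.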

Of course, Intel could choose not to worry about compatibility at all,
just going for the fastest possible processor. If the majority of
customers migrate to WNT or OSF/1, there is less reason to provide
efficient support for running old x86 binaries.

Torben Mogensen (tor...@diku.dk)

P.S.
If Intel is really working on a RISC processor to replace the x86
series, don't expect to hear about it until it is (nearly) ready to
market. Preannouncement of P6 etc. will largely protect Intel's sales of
its existing chips (by delaying migration to other processor
families), but preannouncement of a RISC processor may have the
opposite effect.

103t_e...@west.cscwc.pima.edu

Mar 27, 1994, 5:06:43 PM

In article <lovejoyaC...@netcom.com>, love...@netcom.com (Alan Lovejoy) writes:
> What is far less clear is the OS and GUI software that most people will
> want to run on their PowerPCs in 1999. Could be WNT. Could be Pink.
> Could be Workplace OS with personalities for MacOS, Windows, OS/2 and Unix.
> Should be a very interesting five years ahead.
>

You left out Magic Cap with a voice-interface...

Lawson

Dennis O'Connor

Mar 28, 1994, 9:12:35 AM

r...@tyrell.net (Bob Daniel) writes:
] Last I heard from a source inside Intel...
]
] P6 will be x86 compatible but have a RISC command set. It is a transition
] chip for P7 which will be 100% (?) RISC with no x86 compatibility. x86
] in P7 will be done in emulation.

This does not agree with my understanding of the situation. I have
to admit to not even being in the same state as the guys doing the
P6 and P7 work. But this statement doesn't agree with what I do know
about the projects.

The official public statements are that Intel will keep making
faster and faster processors that will continue to execute the
Intel Architecture ISA. That said, the implementation of
these processors is another issue entirely, as would be any
extensions to the ISA.

Sorry, I can't say any more than that.

--
Dennis O'Connor doco...@sedona.intel.com
Intel i960(R) Microprocessor Division Solely responsible for what I do.
"In spite of the cost of living, it's still popular."

Eric Koldinger

Mar 28, 1994, 6:42:43 PM

In article <940327101...@geweke.ppp.msu.edu> gewe...@studentg.msu.edu (Andrew Geweke) writes:
>In article <lovejoyaC...@netcom.com>, love...@netcom.com (Alan

>If the "big boy" clone makers start making PPC615-based machines, they're
>going to do so at a price/performance point that Intel absolutely can't
>reach. Intel is rumored to be sinking just ungodly amounts of money into R&D.
>That translates into much higher prices for the P5, P6, and P7. A $350 PPC
>615 that runs significantly faster than a $1000 P5 or P6 is going to have a
>huge impact.

An interesting idea. However, let's assume that both Intel and IBM have
reasonably equivalent engineers and fabs, with reasonably equivalent costs
for each. The primary factors in cost of a chip are going to be the size
of the chip, and the amount of markup the company charges (profit) (R&D
costs can be amortized over the entire lifecycle of the chip, assuming a
reasonably large number of sales). The Intel chips are typically larger than the
current crop of PPC's mainly because of maintaining compatibility with the
Intel Architecture (and doing so at high performance). Where do you get the
idea that once IBM starts adding Intel Architecture compatibility to the
PPC line that their chips will be significantly smaller? Especially if they
aim to hit the same performance point? Assuming they're building a real PPC
chip with Intel compatibility, they might even be larger, as they'll have to
support two instruction sets on the same chip. If this is true, we can expect
the prices of the two chips less profit to be somewhere in the same range. And
the profit markup will be determined by demand (although IBM could and
probably would keep their profit margin low to get acceptance for the
chip). Furthermore, size will depend significantly on which PPC core they
build on. 604 and 620 will probably both be significantly larger than the
current 601 core.
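
The dependence of cost on die size can be sketched with a first-order model. Everything numeric here is an assumption added for illustration: the wafer cost, wafer diameter, and defect density are invented, edge loss is ignored, and the Poisson yield formula is a textbook approximation rather than anything Intel or IBM has published.

```python
import math


def cost_per_good_die(die_area_mm2: float,
                      wafer_cost: float = 1000.0,        # hypothetical currency units
                      wafer_diameter_mm: float = 200.0,  # hypothetical wafer size
                      defects_per_mm2: float = 0.01):    # hypothetical defect density
    """Wafer cost divided by the expected number of working dies."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    dies_per_wafer = wafer_area / die_area_mm2               # ignores edge loss
    die_yield = math.exp(-defects_per_mm2 * die_area_mm2)    # Poisson yield model
    return wafer_cost / (dies_per_wafer * die_yield)


if __name__ == "__main__":
    for area in (100.0, 150.0, 300.0):
        print(f"{area:6.0f} mm^2 -> {cost_per_good_die(area):8.2f} per good die")
```

Under these assumptions, doubling the die area more than doubles the cost per good die, because fewer dies fit on the wafer and a smaller fraction of them work. That is why adding a second instruction set to a PPC core is unlikely to come cheap.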

I can see a few ways that they could make their chip smaller than a genuine
Intel processor. 1) They're smarter than we are. Possible, but I highly
doubt they're smart enough to get Intel Architecture compatibility for
free. 2) they don't implement the entire Intel Architecture. This is a big
possibility, but if they don't run all the software that's out there,
they're looking for trouble. The only thing I see them being able to leave
out is some of the 32-bit OS-level support (could be emulated in the PPC
OS), but it's not clear how much that would buy them. 3) Magic. I'm
discounting that one.

All this is, of course, moot, as IBM has (to the best of my knowledge)
neither confirmed nor denied the existence of 615 development, nor announced
any plans to make the chip available to outside customers (I heard one
RUMOR that it would only be for IBM's use, to differentiate themselves from
the clones that they hope are going to be built).
--
Eric Koldinger Making the world safe for MP.
kol...@ichips.intel.com Intel Corp, 5200 NE Elam Young Pkwy, JF1-19
Intel Jones Farm Hillsboro, Oregon 97124-6497
This is a private posting and doesn't indicate opinions or positions of Intel.


Bob Daniel

Mar 28, 1994, 8:32:49 PM

In article <DOCONNOR.94...@sedona.intel.com> doco...@sedona.intel.com (Dennis O'Connor) writes:
>
>r...@tyrell.net (Bob Daniel) writes:
>] Last I heard from a source inside Intel...
>] P6 will be x86 compatible but have a RISC command set. It is a transition
>] chip for P7 which will be 100% (?) RISC with no x86 compatibility. x86
>] in P7 will be done in emulation.

>The official public statements are that Intel will keep making
>faster and faster processors that will continue to execute the
>Intel Architecture ISA. That said, the implementation of
>these processors is another issue entirely, as would be any
>extensions to the ISA.

This does not conflict with the above. To "continue to execute" can
very well be done in emulation.

Alan Lovejoy

Mar 28, 1994, 11:51:04 PM

In article <kolding.764877348@ichips> kol...@ichips.intel.com (Eric Koldinger) writes:
>
>All this is, of course, moot, as IBM has (to the best of my knowledge)
>neither confirmed nor denied the existence of 615 development, nor announced
>any plans to make the chip available to outside customers (I heard one
>RUMOR that it would only be for IBM's use, to differentiate themselves from
>the clones that they hope are going to be built).
>--
> Eric Koldinger Making the world safe for MP.
> kol...@ichips.intel.com Intel Corp, 5200 NE Elam Young Pkwy, JF1-19
> Intel Jones Farm Hillsboro, Oregon 97124-6497
>This is a private posting and doesn't indicate opinions or positions of Intel.

I have been informed that IBM neither confirms nor denies anything about any
alleged "PPC 615" project or product. Motorola has nothing to do with it.

So what would be the best possible strategy for the PPC alliance to follow
in order to capture the largest possible market share?

First, they have to maximize the price/performance of systems built with
PPC processors. They can try very aggressive CPU implementations (lots of
complicated logic for superscalar, superpiped instruction execution and
other gimmicks), or they can try a very simple engine running at very high
clock rates (the Alpha approach). They can try both approaches (with
different processor models, of course).

The problem with this is that, unless they can get performance relative to
x86 chips that is better by at least a factor of 3, performance alone will
not suffice to really grab much market share from Intel. The prospects of
getting the necessary relative performance have been and will be debated here
endlessly, so I won't address it further now.

The second avenue of attack is software. Apple has shown that it can gain
and keep some market share from the Intel/Microsoft duo with superior
software. The way that Lotus 1-2-3 helped MS-DOS/x86 overtake CP/M/Z80
illustrates how a "killer app" could be used to help PPC versus x86.
Unfortunately, you can't get a killer app just by wishing for one.
And you can't necessarily guarantee that software won't be ported or
duplicated on MS-DOS/Windows/x86 platforms. Apple has been more successful
than just about anyone at preventing migration of their software and/or
software capabilities to other platforms, but even so one can do Desktop
Publishing and point-and-click icon-based computing on Windows/x86 machines.

A third avenue of attack is emulation of Windows/x86. IBM is apparently
devoting considerable talent and resources to this. See the April issue
of BYTE for details. Briefly, they are working on sophisticated x86 to
PPC dynamic run-time **translation** (not "interpretation") that appears
similar to what the ParcPlace Smalltalk virtual machine does, only better.
The BYTE article states that they expect to get x86 emulation speeds that
average only 1/3 of native PPC execution speeds using their dynamic
translation technology.

Related to this would be hardware support in the PPC for x86 instruction
execution. An interesting tactic would be to use some combination of
translation software and execution hardware to get near-native performance
when executing x86 instructions. Perhaps the hardware could use something
analogous to instruction decode logic to translate x86 instructions into
PPC instructions for normal execution (with software support for saving and
reusing already-translated code, and for other issues such as op-code
translations that are too difficult to do completely in hardware).
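
The "saving and reusing already-translated code" part can be sketched as a translation cache. This is a toy under stated assumptions: `TranslationCache`, `make_toy_translator`, and `execute` are names invented here, the "guest" is a one-block loop, and a translated block is just a Python callable. It is not a description of IBM's actual software.

```python
class TranslationCache:
    """Translate each guest block once, then reuse the translated version."""

    def __init__(self, translate_block):
        self._translate = translate_block  # guest address -> host callable
        self._cache = {}
        self.translations = 0              # how many times we paid for translation

    def run(self, guest_pc, state):
        block = self._cache.get(guest_pc)
        if block is None:                  # first visit: translate and remember
            block = self._translate(guest_pc)
            self._cache[guest_pc] = block
            self.translations += 1
        return block(state)                # translated block returns next guest pc


def make_toy_translator():
    """Toy guest: the block at address 0 decrements state['n'] until it is zero."""
    def translate_block(guest_pc):
        if guest_pc != 0:
            raise KeyError(guest_pc)

        def block(state):
            state["n"] -= 1
            return 0 if state["n"] > 0 else None   # None means halt
        return block
    return translate_block


def execute(cache, state, pc=0):
    while pc is not None:
        pc = cache.run(pc, state)
    return state


if __name__ == "__main__":
    cache = TranslationCache(make_toy_translator())
    print(execute(cache, {"n": 5}), "translations:", cache.translations)
```

The loop body runs five times but is translated only once, which is where dynamic translation gains over instruction-by-instruction interpretation.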

This tactic would be especially effective in a PPC that used a high clock
rate and large L1 cache instead of complicated pipelines and multiple
execution units with Tomasulo scheduling to achieve high performance.

Suppose such a PPC cpu could execute x86 code at least half as fast as the
best x86 chip then available. Since machines with this hypothetical CPU
could execute System7, WorkplaceOS/Taligent, WindowsNT, Windows and
PowerOpen software, at speeds that are quite competitive with the fastest
x86 machines, their market draw would be rather considerable.
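
The arithmetic behind this can be made explicit. A minimal sketch, assuming emulated x86 code runs at a fixed fraction of native PPC speed (the 1/3 figure attributed to BYTE above) and that `native_ratio`, a name introduced here, is native PPC speed divided by the speed of the best x86 chip:

```python
def emulated_speed_vs_x86(native_ratio: float,
                          emulation_fraction: float = 1 / 3) -> float:
    """Speed of emulated x86 code on the PPC, relative to the best x86 chip."""
    return native_ratio * emulation_fraction


def native_ratio_needed(target: float = 0.5,
                        emulation_fraction: float = 1 / 3) -> float:
    """Native PPC advantage needed for emulation to reach `target` of x86 speed."""
    return target / emulation_fraction


if __name__ == "__main__":
    print("needed native advantage for half speed:", native_ratio_needed())
    print("emulated speed at a 3x native advantage:", emulated_speed_vs_x86(3.0))
```

With software translation alone, half of x86 speed needs a native advantage of about 1.5, and matching x86 needs about 3. Hardware assistance would raise `emulation_fraction` and lower both requirements.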

I think this is what the PPC alliance has to try to accomplish. The
combination of better native performance, better software and reasonably
fast execution of x86 code can succeed at capturing market share from
the Windows/x86 platform. All three components of the strategy are
necessary--and sufficient, if done reasonably well.

--
Alan Lovejoy | INTERNET: love...@netcom.com | VOICE: 510-208-5664
"Do not go gentle into that good night. Old age should burn and rave
at the closing of the day. Rage, rage at the dying of the light!"

Phil Locascio

Mar 31, 1994, 10:02:27 AM

Eliot Shepard (el...@asc.harvard.edu) wrote:
: With the P6 announcement, etc., Intel's public position is clearly

: -Eliot

Hi there. At the time I post this there are 30 responses, so I'm sorry to add
bandwidth. On the subject of Intel RISC chips, I was under the impression they
have been making them for years under the guise of the 80860 (i860). This was
introduced in 1990 (source: BYTE) and as a single-chip microprocessor makes the
486 look pretty weeny, with a peak performance of 66 MFLOPS (double precision @
33 MHz). OK, it was never designed to compete with the 486 (well, not directly)
but was used to implement iPSC/860 supercomputers. I have seen a number of
advertisements for boards that plug into PCs with 2 i860s mounted for use as a
sort of coprocessor environment. I'm sure that Intel has been developing RISC
chips for years, and the latest incarnation of the i860 (50 MHz?) shows that
they can indeed make them.
I think one of the most important points about the PowerPC chip is that it
provides competition from the point of view of RISC/CISC compatibility. Since
the x86 range has compatibility to worry about, if an emulation can produce
adequate performance, then RISC applications will be developed in earnest for
the future.

Phil (ih...@uk.ac.wmin)

--
______________________________________________________________________________
| + |
| Phil LoCascio + _____ ____ |
| + / / / / / / |

Anton Ertl

Mar 31, 1994, 4:49:54 PM

In article <CnIx8...@westminster.ac.uk>, ih...@westminster.ac.uk (Phil Locascio) writes:
|> On the subject of Intel RISC chips, I was under the impression they
|> have been making them for years under the guise of the 80860 (i860).

The 80860 can hardly be called an architecture. It has several
programmer-visible implementation details that will make future
implementations hard. This is not an architecture for a long line of
binary compatible processors that you would imagine as the successor
for the x86 architecture.

On the other hand, IMO the 80860 is ahead of its time. I think we
have to say goodbye to binary compatibility. (V)LIW machines can be
made cheaper (and/or more powerful) than equivalent superscalars, and
the more instructions are to be executed simultaneously the larger the
difference gets AFAIK. But VLIWs are binary incompatible. So I think
in the long run we have to go to an architecture independent code
format (something like ANDF), and have to convert the code for the
specific processor at installation or load time. Binary compatibility
does not work so well even nowadays. E.g., consider the need to
"Pentium-optimize" code.
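
As a rough picture of install-time conversion, the sketch below lowers one neutral program to two different targets. The tuple format, the target names, and the reserved scratch register `tmp` are all invented for illustration. Real ANDF uses a much richer tree-structured format, so this shows only the idea, not the format.

```python
# Hypothetical neutral format: (dest, op, src1, src2).
NEUTRAL = [("t", "add", "a", "b"), ("u", "mul", "t", "c")]


def install_three_address(code):
    """Target with separate destination and two sources (RISC-like)."""
    return [f"{op} {d}, {s1}, {s2}" for d, op, s1, s2 in code]


def install_two_address(code):
    """Target whose destination is also the first source (x86-like)."""
    out = []
    for d, op, s1, s2 in code:
        if d == s2 and d != s1:
            # Moving s1 into d would clobber s2, so park s2 in a scratch register.
            out += [f"mov tmp, {s2}", f"mov {d}, {s1}", f"{op} {d}, tmp"]
            continue
        if d != s1:
            out.append(f"mov {d}, {s1}")
        out.append(f"{op} {d}, {s2}")
    return out


INSTALLERS = {
    "three_address": install_three_address,
    "two_address": install_two_address,
}


def install(code, target):
    """Pick the converter for the machine the software is being installed on."""
    return INSTALLERS[target](code)


if __name__ == "__main__":
    for target in INSTALLERS:
        print(target, install(NEUTRAL, target))
```

The distributed program is the same in both cases. Only the installer knows the target, so a new machine needs a new installer rather than new binaries from every vendor.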

BTW, some older RISCs also have implementation stuff in the
architecture, most notably delayed branches. They will probably be a
problem in the long run. The Alpha, which is newer and was designed
with long-term objectives, does not have them, and the 88110 manual
discourages using them, because future implementations might not
support them in hardware (will there be future implementations of the
88k architecture?).
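
To illustrate why a delay slot is an implementation detail that leaks into the architecture, here is a toy interpreter that runs the same program with and without one. The program encoding and the `run` function are assumptions made for this sketch.

```python
def run(program, delay_slot: bool, max_steps: int = 100):
    """Execute a toy program and return the labels of the executed operations.

    program: list of ("op", label) or ("br", target_index) tuples.
    With delay_slot=True, the instruction after a branch executes before
    control transfers to the branch target.
    """
    trace, pc, pending = [], 0, None
    for _ in range(max_steps):
        if pc >= len(program):
            break
        kind, arg = program[pc]
        next_pc = pc + 1
        if pending is not None:       # this instruction sits in a delay slot
            next_pc, pending = pending, None
        if kind == "br":
            if delay_slot:
                pending = arg         # transfer after the next instruction
            else:
                next_pc = arg         # transfer immediately
        else:
            trace.append(arg)
        pc = next_pc
    return trace


PROGRAM = [("op", "A"), ("br", 3), ("op", "B"), ("op", "C")]

if __name__ == "__main__":
    print("no delay slot:  ", run(PROGRAM, delay_slot=False))
    print("with delay slot:", run(PROGRAM, delay_slot=True))
```

Operation B runs only on the delay-slot machine, so binaries written for one model misbehave on the other. A later implementation with a different pipeline must still emulate the slot.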

- anton
--
M. Anton Ertl Some things have to be seen to be believed
an...@mips.complang.tuwien.ac.at Most things have to be believed to be seen

Mark L. Neidengard

Mar 31, 1994, 5:23:24 PM

In article <2neuvi$k...@email.tuwien.ac.at>,

Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>In article <CnIx8...@westminster.ac.uk>, ih...@westminster.ac.uk (Phil Locascio) writes:
>|> On the subject of Intel RISC chips, I was under the impression they
>|> have been making them for years under the guise of the 80860 (i860).
>
>The 80860 can hardly be called an architecture. It has several
>programmer-visible implementation details that will make future
>implementations hard. This is not an architecture for a long line of
>binary compatible processors that you would imagine as the successor
>for the x86 architecture.

What is true is that the i860 is the successor to the x86s. Intel has
officially discontinued further development of the chip. However,
the i860 actually had two generations (XR and XP) and was able to
progress in that fashion. It is anything but obvious to me that
the i860 need be a dead end. In fact, even though development
has stopped, the processor is still being used, for example in the
Paragon XP/S supercomputers. Of course it would be ludicrous to
build future generations of machines with the i860 as more powerful
processors are available, but at the time the Paragon was introduced
it was easily viable enough to keep running today.

>
>On the other hand, IMO the 80860 is ahead of its time. I think we
>have to say goodbye to binary compatibility. (V)LIW machines can be
>made cheaper (and/or more powerful) than equivalent superscalars, and
>the more instructions are to be executed simultaneously the larger the
>difference gets AFAIK. But VLIWs are binary incompatible. So I think
>in the long run we have to go to an architecture independent code
>format (something like ANDF), and have to convert the code for the
>specific processor at installation or load time. Binary compatibility
>does not work so well even nowadays. E.g., consider the need to
>"Pentium-optimize" code.

I don't think this follows. Experience seems to be showing that
VLIW is not the way to get the best processor performance. Instead,
superscalar instruction units allow architectures to be designed
that easily benefit from more instruction units or faster
instruction units, without needing recompilation. Of course,
recompilation will help, but nothing is made fundamentally
incompatible by adding another floating point adder.
One still wants to compile for optimum effect on any machine, but
in terms of backward compatibility, given a good underlying
architecture, it seems we can have our cake and eat it too.

--
/!\/!ark /!\!eidengard, CS Major, E&AS | Operator, Concurrent Supercomputing
VLSI Design: Asynchronous, Analog, and | Facility: Home to the Touchstone
otherwise; Massively Parallel Hardware | Delta and Paragon XP/S L38; only at
And Software Design and Implementation | The California Institute of Technology

Mark L. Neidengard

Mar 31, 1994, 5:30:20 PM

In article <2nf0uc$9...@gap.cco.caltech.edu>,

Mark L. Neidengard <mnei...@frappe.ugcs.caltech.edu> wrote:

>What is true is that the i860 is the successor to the x86s.

I meant to say is NOT the successor to the x86. Drat; now I've started
flame wars forever.

Anton Ertl

Apr 1, 1994, 4:30:42 PM

In article <2nf0uc$9...@gap.cco.caltech.edu>, mnei...@frappe.ugcs.caltech.edu (Mark L. Neidengard) writes:
|> In article <2neuvi$k...@email.tuwien.ac.at>,
|> Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
|> >On the other hand, IMO the 80860 is ahead of its time. I think we
|> >have to say goodbye to binary compatibility. (V)LIW machines can be
|> >made cheaper (and/or more powerful) than equivalent superscalars, and
|> >the more instructions are to be executed simultaneously the larger the
|> >difference gets AFAIK. But VLIWs are binary incompatible. So I think
|> >in the long run we have to go to an architecture independent code
|> >format (something like ANDF), and have to convert the code for the
|> >specific processor at installation or load time. Binary compatibility
|> >does not work so well even nowadays. E.g., consider the need to
|> >"Pentium-optimize" code.
|>
|> I don't think this follows. Experience seems to be showing that
|> VLIW is not the way to get the best processor performance.

Could you elaborate on that? How would a VLIW have less performance
than an equivalent (i.e. the same functional units etc.) superscalar?
It's just the superscalar without the instruction grouping hardware.

|> Instead,
|> superscalar instruction units allow architectures to be designed
|> that easily benefit from more instruction units or faster
|> instruction units, without needing recompilation.

Right. Binary compatibility is the point that superscalars have over
VLIWs.

Now the question is: is binary compatibility valuable enough to
outweigh the cost advantage of a VLIW? Currently yes. But consider
degree 10 machines (i.e. up to 10 instructions/cycle) and imagine wide
availability of, say, ANDF installers. Now you can buy a superscalar
and run the old binaries at low speed. You can use the installer to
get a higher speed binary for your superscalar. Or you can buy a VLIW,
which will have either lower cost than the superscalar or more power
(e.g. more functional units) and use the installer to produce a fast
binary for that.

What would you buy? That probably depends on the difference in price
and/or performance. I remember having read that the hardware for
checking dependences between instructions grows quadratically with the
degree. If this is right, the difference between superscalars and
VLIWs in cost or power should be decisive for high-degree machines.
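
The quadratic growth can be seen from a simple count. A sketch under simplifying assumptions: each instruction has one destination and a fixed number of source registers, and the hardware only checks read-after-write hazards inside one issue group. The function name is made up for this example.

```python
def dependence_comparisons(width: int, sources_per_instr: int = 2) -> int:
    """Register comparisons needed to check one issue group of `width` instructions.

    Every later instruction's sources are compared against every earlier
    instruction's destination, giving width*(width-1)/2 instruction pairs.
    """
    pairs = width * (width - 1) // 2
    return pairs * sources_per_instr


if __name__ == "__main__":
    for width in (1, 2, 4, 10):
        print(f"degree {width:2d}: {dependence_comparisons(width):3d} comparisons")
```

Going from degree 2 to degree 10 multiplies the issue width by 5 but the comparison count by 45, which is the kind of gap that would favour a VLIW at high degrees.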

Wayne Schlitt

Apr 1, 1994, 10:49:12 PM

In article <2nf0uc$9...@gap.cco.caltech.edu> mnei...@frappe.ugcs.caltech.edu (Mark L. Neidengard) writes:
< In article <2neuvi$k...@email.tuwien.ac.at>,
< Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
< >
< >The 80860 can hardly be called an architecture. It has several
< >programmer-visible implementation details that will make future
< >implementations hard.
<

< What is true is that the i860 is the successor to the x86s. Intel has

^ not

< officially discontinued further development of the chip. However,
< the i860 actually had two generations (XR and XP) and was able to
< progress in that fashion.

The XR and XP are really fairly similar chips. I gave my Intel books
to a friend, so I can't check the details, but, if I remember
correctly, the major differences are the size of the caches and
multiprocess support. The 486DX33 and the DX4 have more differences
than that.

-wayne
--
The Fundamental Problem with USENET is that you have at least a couple
of hours, if not a day or so to think up that witty, absolutely
devastating retort... The other Fundamental Problem is people don't
even take a couple of minutes to think before they hit that send key...

Jeremy Alvah Moore

Apr 2, 1994, 3:43:40 AM

Doesn't SGI use the i860 in their Reality Engine2 as the graphics processor
in the Crimson and Onyx line? I do not know about the internals of
processors (yet), so can someone explain what the i860 is designed for and
what its specs are? Thanks ...
--
--------------------------------------------------------------
Jeremy A. Moore
moo...@cs.odu.edu
Old Dominion University
Norfolk, VA
--------------------------------------------------------------

Mark L. Neidengard

Apr 2, 1994, 7:54:15 PM

In article <2nhi7i$s...@email.tuwien.ac.at>,

Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>In article <2nf0uc$9...@gap.cco.caltech.edu>, mnei...@frappe.ugcs.caltech.edu (Mark L. Neidengard) writes:
>|>
>|> I don't think this follows. Experience seems to be showing that
>|> VLIW is not the way to get the best processor performance.
>
>Could you elaborate on that? How would a VLIW have less performance
>than an equivalent (i.e. the same functional units etc.) superscalar?
>It's just the superscalar without the instruction grouping hardware.

What I should have said is that VLIW is not the best way to get
processor performance, rather than the other way around. Good
point.

>|> Instead,
>|> superscalar instruction units allow architectures to be designed
>|> that easily benefit from more instruction units or faster
>|> instruction units, without needing recompilation.
>
>Right. Binary compatibility is the point that superscalars have over
>VLIWs.
>
>Now the question is: is binary compatibility valuable enough to
>outweigh the cost advantage of a VLIW? Currently yes. But consider
>degree 10 machines (i.e. up to 10 instructions/cycle) and imagine wide
>availability of, say, ANDF installers. Now you can buy a superscalar
>and run the old binaries at low speed. You can use the installer to
>get a higher speed binary for your superscalar. Or you can buy a VLIW,
>which will have either lower cost than the superscalar or more power
>(e.g. more functional units) and use the installer to produce a fast
>binary for that.

It is certainly plausible that silicon real-estate not devoted to
an instruction issuer could be devoted to another functional unit.
But then the burden falls increasingly on the compiler. Chips like
the i860 that only have two functional units visible to the
instruction length have had successful compilers built for them. As
the parallelism rises though, utilization of the chip will be harder
and harder to code for with static instruction unit utilization.
But setting this aside, the prospects are actually looking rather
grim for having commercially appealing VLIW systems to run our software
faster and cheaper. ANDF certainly does not have much of a foothold
to speak of, and direct software compatibility is a tremendous
issue. Even if everyone were to start writing all their software
in ANDF right this minute, there is still a tremendous body of
software already in existence that people want to run faster. I
am willing to have one functional unit devoted to instruction
dispatching; the benefits seem to justify it in my mind.
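
The utilization problem with static scheduling can be sketched with a greedy packer. The operation list, unit latencies, and the `pack_bundles` and `utilization` helpers are assumptions for illustration. Real VLIW compilers such as trace schedulers do far more than this.

```python
def pack_bundles(ops, width):
    """Greedy in-order list scheduling into fixed-width bundles.

    ops: list of (name, set of producer names), in program order.
    Every op has unit latency, so it may issue in any bundle after the
    bundles of all its producers.
    """
    bundle_of, bundles = {}, []
    for name, deps in ops:
        b = max((bundle_of[d] + 1 for d in deps), default=0)
        while b < len(bundles) and len(bundles[b]) >= width:
            b += 1                      # skip bundles that are already full
        while len(bundles) <= b:
            bundles.append([])
        bundles[b].append(name)
        bundle_of[name] = b
    return bundles


def utilization(bundles, width):
    """Fraction of issue slots that hold a real operation."""
    return sum(len(b) for b in bundles) / (len(bundles) * width)


# A short dependence chain with a little side work (hypothetical program).
OPS = [("a", set()), ("b", set()), ("c", {"a", "b"}),
       ("d", {"c"}), ("e", set()), ("f", {"d", "e"})]

if __name__ == "__main__":
    for width in (1, 2, 4):
        bundles = pack_bundles(OPS, width)
        print(width, bundles, utilization(bundles, width))
```

For this program the schedule is four bundles long at both width 2 and width 4, so the wider machine only leaves more slots empty. The compiler has to find more independent work to justify each added unit.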

Reinhard Kirchner

Apr 6, 1994, 10:35:41 AM

From article <2nkih7$3...@gap.cco.caltech.edu>, by mnei...@sloth.ugcs.caltech.edu (Mark L. Neidengard):

>
> It is certainly plausible that silicon real-estate not devoted to
> an instruction issuer could be devoted to another functional unit.
> But then the burden falls increasingly on the compiler. Chips like
> the i860 that only have two functional units visible to the
> instruction length have had successful compilers built for them. As
> the parallelism rises though, utilization of the chip will be harder
> and harder to code for with static instruction unit utilization.
> But setting this aside, the prospects are actually looking rather
> grim for having commercially appealing VLIW systems to run our software
> faster and cheaper. ANDF certainly does not have much of a foothold
> to speak of, and direct software compatibility is a tremendous
> issue. Even if everyone were to start writing all their software
> in ANDF right this minute, there is still a tremendous body of
> software already in existence that people want to run faster. I
> am willing to have one functional unit devoted to instruction
> dispatching; the benefits seem to justify it in my mind.
>

There has been a commercial VLIW, the Multiflow Trace, which had 7, 14 or
21 functional units, and at least the 7-unit machine did quite well.
I know a German university that bought one instead of a vector machine
because their applications were not very vectorizable, but were fast
on the Multiflow. I was also told that compilation took a long time,
which seems obvious.

Multiflow had to go out of business; the stated reason was that they could not design
new hardware fast enough. The single chip superscalar processors got too
fast for their low volume board processors.

And now it comes: the designer of these machines, who did his doctoral
thesis on the VLIW compiler, Mr. Fisher, was then hired by INTEL.
This was after the i860 design, so there may be a VLIW in Intel's future.

The trouble with the i860 was not only the 2-unit VLIW mode, but the
underlying need for software to handle the pipelines. How should a compiler
use results it only gets three instructions later?
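
A toy model may make the complaint clearer. It is loosely based on the description in this post, not on the i860 manual: the `ExposedPipeline` class, its three stages, and the use of `None` operands as a flush are all assumptions. The pipeline advances only when software issues another operation, and each issue hands back the result of the operation started three issues earlier.

```python
from collections import deque


class ExposedPipeline:
    """Toy adder pipeline that advances only when software issues an operation."""

    def __init__(self, stages: int = 3):
        # Oldest in-flight result sits at index 0; None marks an empty stage.
        self._stages = deque([None] * stages, maxlen=stages)

    def issue(self, a, b):
        """Start a+b and return whatever leaves the last stage (None if empty).

        Passing a=None issues a dummy operation, which is how software
        drains results that are still in flight.
        """
        leaving = self._stages[0]
        self._stages.append(None if a is None else a + b)  # maxlen drops the oldest
        return leaving


if __name__ == "__main__":
    pipe = ExposedPipeline()
    inputs = [(1, 2), (3, 4), (5, 6), (7, 8), (None, None), (None, None)]
    for a, b in inputs:
        print((a, b), "->", pipe.issue(a, b))
```

The compiler must therefore pair every result with an instruction issued three steps later, and insert dummy operations at the end of a sequence to collect the remaining results.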

I am also not very optimistic about ANDF, but simply because of the
organisation that does it. It comes out of projects of the European Community,
and there is little professionalism in this. A lot is done at small companies,
universities etc., many people are travelling around ... and so on.
This is my personal impression, which I hope is wrong. I would have much
more hope if Sun, HP, DEC or IBM would stand behind ANDF. ( Or Microsoft,
Novell, Borland, to name a few )

Reinhard Kirchner
Univ. of Kaiserslautern, Germany
kirc...@informatik.uni-kl.de

Donald Lindsay

Apr 6, 1994, 4:40:13 PM

In article <1994Apr6.1...@uklirb.informatik.uni-kl.de>,

Reinhard Kirchner <kirc...@uklira.informatik.uni-kl.de> wrote:
>And now it comes: the designer of these machines, who did his doctoral
>thesis on the VLIW compiler, Mr. Fisher, was then hired by INTEL.

Actually, Fisher was the advisor for that thesis, and went to HP. But
you are half correct: Intel has some people who have done VLIW.
--
Don D.C.Lindsay Carnegie Mellon Computer Science

Reinhard Kirchner

Apr 6, 1994, 4:52:20 PM

From article <1994Apr6.1...@uklirb.informatik.uni-kl.de>, by kirc...@uklira.informatik.uni-kl.de (Reinhard Kirchner):

>
> And now it comes: the designer of these machines, who did his doctoral
> thesis on the VLIW compiler, Mr. Fisher, was then hired by INTEL.

I got several postings stating that Mr. Fisher is with HP not Intel.
But it is true that Intel bought the compiler technology from Multiflow.

I also remember some source saying the successor of PA-RISC will be VLIW.
The Apollo Prism which HP took over in the Apollo merger was a VLIW
with two functional units ( from what could be read in the sales papers )

>
> The trouble with the i860 was not only the 2 unit VLIW mode, but the
> underlying and per software to handle pipelines. How shall a compiler
> use results he gets three instructions later ?

I got mail saying this was not the problem and could be handled easily.
The pain with the i860 was exception handling. I remember the very complicated
descriptions of exception handling in the 860 user manual.

Allen J. Baum

Apr 6, 1994, 7:04:22 PM

In article <1994Apr6.1...@uklirb.informatik.uni-kl.de>,
kirc...@uklira.informatik.uni-kl.de (Reinhard Kirchner) wrote:

> And now it comes: the designer of these machines, who did his doctoral
> thesis on the VLIW compiler, Mr. Fisher, was then hired by INTEL.
> This was after the i860 design, so there may be a VLIW in Intel's future.

Err..
There were 2 commercial VLIW machines: Multiflow and Cydrome

Multiflow was headed by Josh Fisher. He now works at HP Labs - not Intel !!
Cydrome was headed by Bob Rau. He now works at HP Labs - also.

(Hmm, I'm not exactly sure what their positions were at their respective
companies, so 'headed' is a deliberately vague term. Apologies to all if I've
got it wrong.)

Mauricio Breternitz Jr.

Apr 6, 1994, 8:27:14 PM

Reinhard Kirchner writes:
>
>And now it comes: the designer of these machines, who did his doctoral
>thesis on the VLIW compiler, Mr. Fisher, was then hired by INTEL.
>This was after the i860 design, so there may be a VLIW in Intel's future.
>

Just a factual nit. Josh Fisher, then a professor at Yale (you there, Josh ?)
advised the thesis on VLIW compilation, by J. Ellis ("Bulldog, A Compiler for VLIW Architectures".)
The thesis got an ACM Best Dissertation award, and the book is available from MIT Press.

I heard that Intel had bought rights to the Multiflow compiler, though.
--
Mauricio Breternitz Jr, Ph.D. mb...@futserv.austin.ibm.com
IBM AWD - Internal ZIP 9461
11400 Burnet Road - Austin, TX 78758
Austin TX 78758 phone (512) 838 8531

Torben AEgidius Mogensen

Apr 7, 1994, 9:23:37 AM

mnei...@sloth.ugcs.caltech.edu (Mark L. Neidengard) writes:

> But setting this aside, the prospects are actually looking rather
> grim for having commercially appealing VLIW systems to run our software
> faster and cheaper. ANDF certainly does not have much of a foothold
> to speak of, and direct software compatibility is a tremendous
> issue. Even if everyone were to start writing all their software
> in ANDF right this minute, there is still a tremendous body of
> software already in existence that people want to run faster.

A VLIW would be quite good at emulating other processors. We can use
some functional units to decode an instruction while other functional
units perform the action of the previous instruction. In essence we
simulate the pipeline of the emulated processor by assigning the
different functional units to different pipeline stages. Quite a lot
of functional units can be effectively utilized in this fashion.

This means that a VLIW can be used to run existing software reasonably
fast (better than traditional RISCs) by emulation, while running new
software very fast.
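
To make the decode/execute overlap concrete, here is a minimal sketch in
Python. The toy accumulator instruction set, the names decode,
emulate_sequential and emulate_pipelined, and the step counts are all
illustrative assumptions, not a model of any real VLIW or guest processor.
Each pass of the pipelined loop stands for one wide instruction word in
which one functional unit decodes guest instruction i+1 while another
executes guest instruction i. Guest branches, which would force the
software pipeline to be refilled, are ignored.

```python
# Hypothetical toy guest ISA: (opcode, operand) pairs acting on one accumulator.
GUEST_PROGRAM = [("load", 5), ("add", 3), ("mul", 4), ("sub", 2)]


def decode(raw):
    """Decode stage: turn a raw guest instruction into a callable action."""
    op, n = raw
    actions = {
        "load": lambda acc: n,
        "add": lambda acc: acc + n,
        "sub": lambda acc: acc - n,
        "mul": lambda acc: acc * n,
    }
    return actions[op]


def emulate_sequential(program):
    """Decode, then execute, one stage per step (no overlap)."""
    acc, steps = 0, 0
    for raw in program:
        action = decode(raw)
        steps += 1
        acc = action(acc)
        steps += 1
    return acc, steps


def emulate_pipelined(program):
    """Each loop pass models one wide word: decode i+1 alongside execute i."""
    acc, words = 0, 0
    decoded = None  # instruction decoded in the previous wide word
    pc = 0
    while pc < len(program) or decoded is not None:
        # "Decode unit": works on the next guest instruction, if any is left.
        next_decoded = decode(program[pc]) if pc < len(program) else None
        # "Execute unit": performs the action decoded one word earlier.
        if decoded is not None:
            acc = decoded(acc)
        decoded = next_decoded
        pc += 1
        words += 1
    return acc, words


print(emulate_sequential(GUEST_PROGRAM))
print(emulate_pipelined(GUEST_PROGRAM))
```

On this four-instruction program the sequential emulator needs 8 steps and
the overlapped one 5 wide words; for n straight-line guest instructions that
is 2n versus n+1.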

Note that there is nothing fundamentally wrong with combining VLIW
with superscalarity. If the original design of a VLIW has e.g. 4
functional units, a later implementation can be e.g. two-way
superscalar within each of the 4 "visible" functional units.
Obviously, this goes against the design philosophy of VLIW, but no
more than superscalarity goes against the basic RISC (KISS)
philosophy.

Torben Mogensen (tor...@diku.dk)

Mark L. Neidengard

Apr 7, 1994, 2:04:27 PM

In article <1994Apr7.0...@odin.diku.dk>,

Torben AEgidius Mogensen <tor...@diku.dk> wrote:
>mnei...@sloth.ugcs.caltech.edu (Mark L. Neidengard) writes:

>A VLIW would be quite good at emulating other processors. We can use
>some functional units to decode an instruction while other functional
>units perform the action of the previous instruction. In essence we
>simulate the pipeline of the emulated processor by assigning the
>different functional units to different pipeline stages. Quite a lot
>of functional units can be effectively utilized in this fashion.

So, we trade the real-estate of an instruction issuer for self-modifying
code. This is an interesting idea; it could certainly work. I will still
prefer superscalar to VLIW because of its ability to dynamically utilize
on-chip resources, rather than statically at compile time.

>This means that a VLIW can be used to run existing software reasonably
>fast (better than traditional RISCs) by emulation, while running new
>software very fast.

I'm not sure how a VLIW translator is superior to a superscalar RISC chip
with comparable execution units. Both would translate instructions and
shell them out to the functional units on the chip. The VLIW wants to
rewrite part of its instruction stream before that stream is "read" by the
functional units, probably making prefetch-and-decode useless. A
superscalar chip need not pound the instruction caching system; the
intelligent scheme would be to store many pending instructions in partially
decoded form, and treat "foreign" instructions as data. Then, we issue
a native instruction to start taking data from this block and making
partially decoded instructions from it, let this fill the prefetch-and-decode
queue as though the translated instructions had been decoded from native
instructions in memory, and presto. When we're done reading the foreign
instruction block, the translator starts requesting more native instructions,
keeping the prefetch pipeline full. Again, the VLIW approach may be
simpler, but not necessarily the most efficient.

>Note that there is nothing fundamentally wrong with combining VLIW
>with superscalarity. If the original design of a VLIW has e.g. 4
>functional units, a later implementation can be e.g. two-way
>superscalar within each of the 4 "visible" functional units.
>Obviously, this goes against the design philosophy of VLIW, but no
>more than superscalarity goes against the basic RISC (KISS)
>philosophy.

You are completely correct that superscalarity within a VLIW somewhat
defeats the purpose =) However, saying that RISC is about KISS _per_se_ is a
little misleading, IMHO. I would characterize RISC as being about minimizing
instruction execution time by minimizing (as much as necessary) work per
instruction. This says nothing about implementing superscalarity, pipelines,
etc. The ability to execute two streams of RISC instructions simultaneously is
orthogonal to the expression of computation as numerous short steps rather than
few long steps.

Michael Gerlek

Apr 7, 1994, 3:39:48 PM

In article <Cnuu5...@austin.ibm.com> mb...@austin.ibm.com (Mauricio Breternitz Jr.) writes:
> I heard that Intel had bought rights to the Multiflow compiler, though.

"The Multiflow compiler technology has been purchased by Intel, HP,
DEC, Fujitsu, Hughes, HAL Computer, and SGI."

quoted from "The Multiflow Trace Scheduling Compiler"
Lowney et al.
Journal of Supercomputing, 7, 51-142 (1993)

-[mpg]
ger...@cse.ogi.edu

John R. Mashey

Apr 7, 1994, 5:15:51 PM

In article <baum-060...@baum-allen.apple.com>, ba...@apple.com (Allen J. Baum) writes:

|> Err..
|> There were 2 commercial VLIW machines: Multiflow and Cydrome
|>
|> Multiflow was headed by Josh Fisher. He now works at HP Labs - not Intel !!
|> Cydrome was headed by Bob Rau. He now works at HP Labs - also.

Actually, the refugees are fairly well sprinkled around.
Multiflow folks: HP, Intel, DEC, SGI (at least).
Cydrome folks: HP, SGI (at least).

-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: ma...@sgi.com
DDD: 415-390-3090 FAX: 415-967-8496
USPS: Silicon Graphics 6L-005, 2011 N. Shoreline Blvd, Mountain View, CA 94039-7311

Vijay Gill

Apr 7, 1994, 5:22:21 PM

In article <GERLEK.94...@dis.cse.ogi.edu>,
Michael Gerlek <ger...@dis.cse.ogi.edu> wrote:

[ deleted ]

[
multiflow technology bought by - DEC, Fujitsu, Hughes, HAL Computer,
and SGI.
]

> quoted from "The Multiflow Trace Scheduling Compiler"
> Lowney et al.
> Journal of Supercomputing, 7, 51-142 (1993)

I have been trying to locate papers on VLIW, including the Bulldog
compiler, but archie searches on bulldog, VLIW, multiflow and cydrome
all fail to lead to anything. If anyone has any idea where such
papers/theses might be found, please mail me and I'll summarize
and post the results here.

On a related VLIW note, I asked about HP's VLIW plans on
comp.sys.hp.hardware and I've been told that HP's VLIW is supposed to
be binary compatible with the PA-RISC series. Would it not be more
efficient to design a VLIW ISA from scratch? And how does HP plan to
get binary compatibility without some major problems? As far as I can
see, to really use the hardware, programs had to be recompiled for
either the 7 unit TRACE or the 14 unit machine. How could HP
circumvent this? Or is this even possible?

--
Vijay Gill |The (paying) customer is always right.
wr...@cs.umbc.edu | - Piercarlo Grandi
vi...@umbc.edu | Eagles may soar, but weasels don't get
These are my opinions only. | sucked into jet engines.

Chuck Narad

Apr 7, 1994, 8:14:16 PM

In article <CnwFy...@odin.corp.sgi.com>, ma...@mash.engr.sgi.com writes:
> In article <baum-060...@baum-allen.apple.com>, ba...@apple.com (Allen J. Baum) writes:
>
> |> Err..
> |> There were 2 commercial VLIW machines: Multiflow and Cydrome
> |>
> |> Multiflow was headed by Josh Fisher. He now works at HP Labs - not Intel !!
> |> Cydrome was headed by Bob Rau. He now works at HP Labs - also.
>
> Actually, the refugees are fairly well sprinkled around.
> Multiflow folks: HP, Intel, DEC, SGI (at least).
> Cydrome folks: HP, SGI (at least).
>

Make that 3; there was the Culler-7 from Culler Scientific Systems in
Santa Barbara. A couple of dozen systems were shipped, although only
about a dozen by Culler; the rest were sold by SAXPY (who bought the
technology), and by Glen Culler & Associates, who bought it back
from SAXPY. The Culler-7 predated both Multiflow and Cydrome by a
couple of years.

Refugees from Culler are scattered around everywhere; SGI, Sun, Apple,
Thinking Machines, HaL, Sequent, Informix, to name a few. (A big hello
to ex-Cullerites everywhere; send me mail, I'm getting a list together
for some kind of reunion which, given the nature of our work, may be
held in parallel in several party-units :-)

cheers,
chuck/

----------------------------------------
Chuck Narad -- diver/adventurer/engineer

Henry J. Cobb

Apr 8, 1994, 4:07:51 AM

>In article <2neuvi$k...@email.tuwien.ac.at> an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:

>On the other hand, IMO the 80860 is ahead of its time. I think we
>have to say goodbye to binary compatibility. (V)LIW machines can be
>made cheaper (and/or more powerful) than equivalent superscalars, and
>the more instructions are to be executed simultaneously the larger the
>difference gets AFAIK. But VLIWs are binary incompatible. So I think
>in the long run we have to go to an architecture independent code
>format (something like ANDF), and have to convert the code for the
>specific processor at installation or load time. Binary compatibility
>does not work so well even nowadays. E.g., consider the need to
>"Pentium-optimize" code.

>--

>M. Anton Ertl Some things have to be seen to be believed
>an...@mips.complang.tuwien.ac.at Most things have to be believed to be seen

___
Sorry VLIW is only a win if the code generator has "perfect"
knowledge of the run-time situation.

Superscalar chips can drift parts ahead and behind other parts
(like the PPC's integer and FP units) to take advantage of fetch bandwidth
and cache stalls.

The proof lies in the vast difficulty of getting the full
performance of the i860 in hand-crafted assembly vs. the 601's power from
compiled code.
--
Henry J. Cobb hc...@fly2.berkeley.edu, SFB Tyrant-for-life
A "Bjorn again" programmer.

Josh Fisher

Apr 8, 1994, 5:20:48 PM

>Reinhard Kirchner said:
>And now it comes: the designer of these machines, who did his doctoral
>thesis on the VLIW compiler, Mr. Fisher, was then hired by INTEL.
>This was after the i860 design, so there may be a VLIW in Intel's future.

>Then Mauricio Breternitz said:
>Just a factual nit. Josh Fisher, then a professor at Yale (you there,
>Josh ?) advised the thesis on VLIW compilation, by J. Ellis
>("Bulldog, A Compiler for VLIW Architectures".)

I'm here (what, live without "You have 23,519 messages in comp.arch. Read now?")

Enough people have commented and been confused; I guess I should set
it straight. I developed trace scheduling for VLIWs in my own thesis
at Courant and wrote a trace scheduler at Yale. My student John Ellis
took over the engineering of that compiler, reimplemented trace
scheduling, did very impressive experiments, and wrote a wonderful
thesis. That this thesis won the ACM Best Thesis award shows that life
is sometimes very fair.

I joined HP Labs after Multiflow crashed, and have been here ever since. I
now work at the HP Labs Cambridge Research Office in MA.

If for whatever reason all this history interests you, the section called
"who did what" in Chapter 1 of John's thesis sets it out very clearly
(like everything else in the thesis).

-- Josh

Scott Mahlke

Apr 10, 1994, 8:10:24 PM

I do not fancy myself as a VLIW proponent, but it seems VLIW is taking a bit
more criticism than it deserves here. First, VLIW does not require "perfect"
knowledge of the run-time behavior of the architecture. For example, Cydra 5 had
a memory latency register used to specify a variable load latency, thus the
compiler did not need to know the exact load latency of the machine. Also,
dynamic scheduling is not synonymous with superscalar; I refer you to the paper
by Rau in Micro-26, Dynamically Scheduled VLIW Processors.

Also, I do not think any proof exists in the VLIW vs superscalar argument by
comparing the i860 and the 601. The i860 had a lot of difficulties achieving
its maximum performance that had little to do with it being VLIW. Using this
argument, you could pick an unsuccessful implementation of any processor or feature
and condemn any similar implementations.

Scott
--
Scott Mahlke Center for Reliable and High-Performance Computing
mah...@crhc.uiuc.edu University of Illinois at Urbana-Champaign

Rich Lethin

Apr 15, 1994, 3:40:38 PM

In article <2oeee2$1...@news.arc.nasa.gov>,
Hugh LaMaster <lama...@pioneer.arc.nasa.gov> wrote:
>This is the standard reference for the Bulldog compiler.
>I don't know if it is in print, but it is from the MIT press.

A good follow-up on the Bulldog compiler is the paper about
the commercialized version of this compiler. See:

@ARTICLE{LOWNEY93,
AUTHOR = {P. G. Lowney and S. G. Freudenberger and
T. J. Karzes and
W. D. Lichtenstein and
R. P. Nix and
J. S. O'Donnell and
J. C. Ruttenberg},
TITLE = {The Multiflow Trace Scheduling Compiler},
JOURNAL = {The Journal of Supercomputing},
YEAR = {1993},
VOLUME = {7},
NUMBER = {1-2},
PAGES = {51-142},
MONTH = May,
NOTE = {Extensive discussion of the architecture, algorithms,
tricks, and performance of the Multiflow compiler}
}

Hugh LaMaster

Apr 12, 1994, 3:23:46 PM

In article <2o1fgd...@topdog.cs.umbc.edu>, wr...@cs.umbc.edu (Vijay Gill) writes:

|> I have been trying to locate papers on VLIW, including the Bulldog
|> compiler, but archie searches on bulldog, VLIW, multiflow and cydrome
|> all fail to lead to anything.

This is the standard reference for the Bulldog compiler.

I don't know if it is in print, but it is from the MIT press.

Ellis, John R. (John Rolfe), Bulldog: a compiler for VLIW
architectures, p. 320, MIT Press, Cambridge Mass., 1986. (LC:
QA76.76.C65 E45 1986 DD: 005.4/53 dc19 ISBN: 026205034X
LCCN: 85-24092 r924) "Originally presented as the author's
thesis (doctoral)--Yale, 1985. ACM doctoral dissertation
award--1985.

(Original bib format:)

%A John R. (John Rolfe) Ellis
%T Bulldog: a compiler for VLIW architectures
%C Cambridge Mass.
%I MIT Press
%D 1986
%P 320
%K Bulldog (Computer file). Compilers (Computer programs).
Computer architecture. Very Long Instruction Word (VLIW) computers.
%O (LC: QA76.76.C65 E45 1986 DD: 005.4/53 dc19 ISBN: 026205034X
LCCN: 85-24092 r924) "Originally presented as the author's thesis
(doctoral)--Yale, 1985. ACM doctoral dissertation award--1985.

--
Hugh LaMaster, M/S 233-9, UUCP: ames!lamaster
NASA Ames Research Center Internet: lama...@ames.arc.nasa.gov
Moffett Field, CA 94035-1000 Or: lama...@george.arc.nasa.gov
Phone: 415/604-1056 #include <std_disclaimer.h>

Anton Ertl

Apr 13, 1994, 2:22:41 PM

In article <HCOBB.94A...@fly2.berkeley.edu>, hc...@fly2.berkeley.edu (Henry J. Cobb) writes:
|> Sorry VLIW is only a win if the code generator has "perfect"
|> knowledge of the run-time situation.
|>
|> Superscalar chips can drift parts ahead and behind other parts
|> (like the PPC's integer and FP units) to take advantage of fetch bandwidth
|> and cache stalls.

Well, actually I was talking about a superscalar without
reordering. You are talking about a superscalar with dynamic
scheduling. The difference between VLIW and a statically scheduled
superscalar is just the grouping hardware. For dynamic reordering you
have to add lots of hardware support. And I'm not sure that the only
cost is transistors and development time. You may have to increase the
cycle time and/or add stages to the pipeline. There is an exception:
Integer and floating point units are essentially independent, so
moving FP instructions over integer instructions is not that difficult
(However, this might introduce imprecise exceptions. Does the PPC have
precise exceptions?).

Re: your point about the incomplete knowledge of the code generator:
The main problems are that it does not know where a branch will go and
it does not know how long a load will take. The hardware does not know
these things either, until they happen. At that point in time the
knowledge is not as helpful as it would have been earlier (especially
at compile time as we will see below).

Now these things can be predicted. The best dynamic (hardware) branch
prediction mechanisms achieve a 3% miss rate, and profile-based branch
prediction (which is a form that can be used by the compiler) can
achieve similar accuracy (my colleague Andi Krall
(an...@mips.complang.tuwien.ac.at) has a paper on that in the next
PLDI). There's also work on predicting cache misses (by Abraham and
Sugumar (of umich) and maybe others; sorry, I have no reference
handy). The prediction mechanisms help both dynamic and static scheduling
strategies, but they reduce the impact of not knowing the result.
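
As a hedged illustration of the two kinds of prediction, the Python sketch
below compares a profile-based static prediction with a single 2-bit
saturating counter on a synthetic loop-closing branch. The trace generator,
the function names and the trace itself are assumptions made up for this
example; it does not reproduce the 3% figure or any published measurement.

```python
def loop_branch_trace(iterations, trips):
    """Outcomes of a loop-closing branch: taken (iterations - 1) times,
    then not taken once, repeated for `trips` executions of the loop."""
    return ([True] * (iterations - 1) + [False]) * trips


def static_profile_misses(training, run):
    """Profile-based prediction: always predict the majority direction
    observed in a training run (what a compiler could encode statically)."""
    predict_taken = sum(training) * 2 >= len(training)
    return sum(1 for outcome in run if outcome != predict_taken)


def two_bit_counter_misses(run):
    """Dynamic prediction with one 2-bit saturating counter.
    Counter values 0..3; a value of 2 or 3 means 'predict taken'."""
    counter, misses = 2, 0
    for outcome in run:
        if (counter >= 2) != outcome:
            misses += 1
        counter = min(3, counter + 1) if outcome else max(0, counter - 1)
    return misses


trace = loop_branch_trace(iterations=10, trips=100)
print(static_profile_misses(trace, trace), two_bit_counter_misses(trace))
```

On this trace both predictors miss exactly once per loop exit, i.e. 100 of
1000 branches. That only shows how a profile-derived static prediction can
match a simple dynamic predictor when branch behaviour is stable between
the training run and the real run; it says nothing about real workloads.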

The compiler has a big advantage over dynamic scheduling hardware: It
can look forward a long way to see what value will be needed when and
schedule the code that will produce that value at that time. The
hardware is quite limited in this respect.

I don't think you will get the benefits you expect from knowing the
dynamic behaviour. E.g. if you get a cache miss not predicted during
compilation, that increases load latency by, say, 10 cycles, your
dynamic scheduler will soon (after a few cycles) have collected a lot
of instructions dependent on that load, and there will hardly be any
ready instructions available, so many of your functional units become
idle. Or even worse, the dependent instructions might fill up the
instruction window (or the reservation stations, depending on model),
blocking execution completely. After the load completes, execution
will take a few cycles to rev up again (first only the instructions
directly dependent on the load are executed, then the instructions
depending on them etc.). So, in the end, the cache miss results in an
effective execution time penalty even with dynamic scheduling,
although it will be a bit lower than on a VLIW.
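
The window-filling effect can be illustrated with a deliberately crude
model, sketched here in Python. Everything in it is an assumption made for
illustration: one instruction enters the window per cycle, an independent
instruction executes at once and frees its slot, an instruction that
depends on the missing load stays parked until the load returns, and
stalled_cycles, window_size and miss_latency are hypothetical names. It
models no real processor.

```python
def stalled_cycles(stream, window_size, miss_latency):
    """Count cycles during a load miss in which issue is blocked because
    the window is full of instructions waiting for the load.

    stream: list of booleans, True if that instruction depends on the load.
    """
    waiting = 0  # dependent instructions parked in the window
    stalls = 0
    pc = 0       # index of the next instruction to enter the window
    for _cycle in range(miss_latency):
        if pc >= len(stream):
            break
        if waiting >= window_size:
            stalls += 1   # window full: nothing can be issued this cycle
            continue
        if stream[pc]:
            waiting += 1  # parks until the load returns
        # An independent instruction executes immediately and frees its slot.
        pc += 1
    return stalls


half_dependent = [True, False] * 10
print(stalled_cycles(half_dependent, window_size=4, miss_latency=10))
print(stalled_cycles([True] * 20, window_size=4, miss_latency=10))
print(stalled_cycles(half_dependent, window_size=8, miss_latency=10))
```

With half of the following instructions depending on the load, a 4-entry
window and a 10-cycle miss, issue is blocked for the last 3 cycles of the
miss; if every instruction depends on the load it is blocked for 6. An
8-entry window never fills in the first case.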

So, does dynamic scheduling pay off? I don't think so for code
compiled for that implementation. But it sure is nice if you have a
binary for another implementation (e.g. running a 486 binary on a
Cyrix M1). But since I don't believe in the long-term future of binary
compatibility, I don't believe in dynamic scheduling either.

- anton

Robert Everitt Heiss

Apr 24, 1994, 10:21:00 PM

an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>I don't think you will get the benefits you expect from knowing the
>dynamic behaviour. E.g. if you get a cache miss not predicted during
>compilation, that increases load latency by, say, 10 cycles, your
>dynamic scheduler will soon (after a few cycles) have collected a lot
>of instructions dependent on that load, and there will hardly be any
>ready instructions available, so many of your functional units become
>idle. Or even worse, the dependent instructions might fill up the
>instruction window (or the reservation stations, depending on model),
>blocking execution completely.

Even independent instructions could stall instruction issue if they
must complete in order after the load. Another finite buffer :-(

Will optimizing compilers account for all the chaos in each new
dynamic superscalar chip? Humans might be significantly better at
implementation-specific software tuning.

Desnogues

Apr 25, 1994, 8:22:11 AM

In article <2ogv7i$m...@email.tuwien.ac.at>, an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
|> So, does dynamic scheduling pay off? I don't think so for code
|> compiled for that implementation. But it sure is nice if you have a
|> binary for another implementation (i.e. running a 486 binary on a
|> Cyrix M1). But since I don't believe in the long-term future of binary
|> compatibility, I don't believe in dynamic scheduling either.

A good example of this is the (almost-) dynamic scheduling SuperSPARC:
they state in the User's book that by running code not compiled for it,
you can get 1.1 instr/cycle, and 1.4 instr/cycle with SuperSPARC code.
Anyway, the SuperSPARC runs all SPARC code; such old SPARC code will
surely disappear, but this approach helps while waiting for all code to
be recompiled.
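
As a rough reading of the two figures quoted above, here is the arithmetic
in Python. It assumes, purely for illustration, that the old and the
recompiled binary execute the same number of instructions at the same
clock rate, so that execution time scales with 1/IPC; the function name is
hypothetical.

```python
def recompile_speedup(ipc_old_code, ipc_new_code):
    """Speedup from recompiling, assuming equal instruction counts and clock."""
    return ipc_new_code / ipc_old_code


# Figures quoted above: 1.1 instr/cycle for old code, 1.4 for SuperSPARC code.
speedup = recompile_speedup(1.1, 1.4)
print(f"speedup from recompiling: {speedup:.2f}x")
```

Under those assumptions recompiling buys about 1.27x, so the old binaries
already get most of the way there.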

Laurent Desnogues

John R. Mashey

Apr 25, 1994, 5:52:36 PM

In article <1994Apr24.2...@zeus.aix.calpoly.edu>, rhe...@harp.aix.calpoly.edu (Robert Everitt Heiss) writes:
|
|> Will optimizing compilers account for all the chaos in each new
|> dynamic superscalar chip? Humans might be significantly better at
|> implementation-specific software tuning.

Dynamic-scheduled chips are likely to be less sensitive to
compiler issues than static ones.

--

Stefan Monnier

Apr 25, 1994, 3:02:29 PM

In article <1994Apr24.2...@zeus.aix.calpoly.edu>,

Robert Everitt Heiss <rhe...@harp.aix.calpoly.edu> wrote:
> Will optimizing compilers account for all the chaos in each new
> dynamic superscalar chip? Humans might be significantly better at
> implementation-specific software tuning.

For a two line function, maybe. For a 100K program... good luck !
The problem is one of graph theory, optimisation, dynamic programming,
etc...
In short: big matrices, big computations. The problem is way too
complex to be solved by a human. Of course, it's also too complex to
be really solved optimally by a computer, but at least the computer
won't be disgusted by the repetitiveness and won't get mad at the
number of things it has to remember and take into account !!

Stefan
--

-----------------------------------------------------
-- On the average, people seem to be acting normal --
-----------------------------------------------------

Anton Ertl

Apr 26, 1994, 8:37:38 AM

In article <1994Apr24.2...@zeus.aix.calpoly.edu>, rhe...@harp.aix.calpoly.edu (Robert Everitt Heiss) writes:

|> Will optimizing compilers account for all the chaos in each new
|> dynamic superscalar chip?

I think that the compiler should treat the chip like a static
superscalar; it will then achieve good performance.

|> Humans might be significantly better at
|> implementation-specific software tuning.

But humans are expensive and error-prone.

Terje Mathisen

Apr 26, 1994, 7:01:52 PM

In <2pgm25$7...@disun41.epfl.ch>, mon...@di.epfl.ch (Stefan Monnier) writes:
>In article <1994Apr24.2...@zeus.aix.calpoly.edu>,
>Robert Everitt Heiss <rhe...@harp.aix.calpoly.edu> wrote:
>> Will optimizing compilers account for all the chaos in each new
>> dynamic superscalar chip? Humans might be significantly better at
>> implementation-specific software tuning.
>
>For a two line function, maybe. For a 100K program... good luck !
>The problem is one of graph theory, optimisation, dynamic programming,
>etc...
>In clear: big matrices, big computations. The problem is way too
>complex to be solved by a human. Of course, it's also too complex to
>be really solved optimally by a computer, but at least the computer
>won't be disgusted by the repetitiveness and won't get mad at the
>number of things he has to remember and take into account !!
>

I've seen this argument in favour of compilers over humans a number of
times, which does not make it any truer. I have yet to see any compiler
for any architecture which wasn't consistently beaten by anywhere from 25
to 300% by competent asm programmers.

If you have a counter-example though, I'd love to see it!

-Terje

Stefan Monnier

Apr 26, 1994, 8:12:43 PM

In article <2pjof0$1m...@vkhdib01.hda.hydro.com>,

Terje Mathisen <ter...@hda.hydro.com> wrote:
> I've seen this argument in favour of compilers over humans a number of
> times, which does not make it any truer. I have yet to see any compiler
> for any architecture which wasn't consistently beaten by anywhere from 25
> to 300% by competent asm programmers.
>
> If you have a counter-example though, I'd love to see it!

All right, then !
Go ahead, and use assembly !

Stefan

Andrew Geweke

Apr 26, 1994, 10:19:23 PM

In article <2pjsjr$9...@disuns2.epfl.ch>, mon...@di.epfl.ch (Stefan Monnier)
writes:

Yeah, I agree. If Mr. Mathisen is willing to hand-tune assembly for me for as
much as it would cost me to buy a good optimising compiler to do the same
thing, I'll go for it. I simply can't afford to go mucking about in RISC
assembly to achieve a speedup -- which isn't anywhere near 300% on nearly all
seriously good compilers.

Terje Mathisen

Apr 27, 1994, 6:58:11 AM

No, I am not going to hand-tune general code for you, or even myself. What I do
is to make a _very_ careful analysis of where my algorithm is spending most of its
time, then rearrange my algorithm/data/lookup tables to fit what I observe when
profiling the code.

Finally, I look into modifying the basic algorithm to fit any special quirks in the cpu.

Using these sorts of techniques, I have written a wc (Word Count) clone where the
engine will count lines, words and characters (user-specified word separator set),
at a rate of 1.5 cycles/byte on a Pentium (2.5 cycles/byte on a 486). This is 50 to
100% faster (in cycles/byte) than what I've been able to do on any other cpu.

It is also 1 to 2 orders of magnitude faster than the standard Unix wc.

The important thing here though is that all the speedups, except for the last
doubling, are due to better algorithms and data structures. I don't think a
compiler will ever use BL and BH as both independent 8-bit registers and
combine them as a 16-bit index. The index is also used both in little- and
big-endian mode, as I alternate between BL and BH for the current state.
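
For readers who want to see the table-driven idea without the x86 details,
here is a minimal sketch in Python. It is not the assembly engine described
above and does not reproduce the BL/BH register trick; the two-state
machine, the table layout and the names build_table and count are
assumptions chosen for illustration. The point is only that the per-byte
work reduces to one lookup indexed by (state, byte), with the
user-specified separator set baked into the table.

```python
def build_table(separators):
    """table[state][byte] -> (next_state, word_increment).
    State 0 = between words, state 1 = inside a word."""
    table = [[None] * 256 for _ in range(2)]
    for byte in range(256):
        is_sep = byte in separators
        # A word is counted on the transition from "between" to "inside".
        table[0][byte] = (0, 0) if is_sep else (1, 1)
        table[1][byte] = (0, 0) if is_sep else (1, 0)
    return table


def count(data, separators=b" \t\n\r"):
    """Return (lines, words, characters) for a bytes object."""
    table = build_table(separators)
    state = lines = words = 0
    for byte in data:
        state, inc = table[state][byte]
        words += inc
        lines += byte == 0x0A  # newline
    return lines, words, len(data)


print(count(b"hello world\nfoo  bar\n"))
```

The sample input yields 2 lines, 4 words and 21 characters. In the assembly
version the state and the current byte would be combined into a single
index register instead of a two-level Python list.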

Preston Briggs

Apr 27, 1994, 5:15:19 PM

ter...@hda.hydro.com (Terje Mathisen) writes:

>No, I am not going to hand-tune general code for you, or even myself.
>What I do is to make a _very_ careful analysis of where my algorithm
>is spending most of it's time, then rearrange my
>algorithm/data/lookup tables to fit what I observe when profiling the
>code. Finally, I look into modifying the basic algorithm to fit any
>special quirks in the cpu.
>
>Using these sorts of techniques, I have written a wc (Word Count) clone

I hate to see that. You started so well, but you might as well have
ended by saying:

"Using these sorts of techniques, I have wasted a lot of time."

I suppose, as an educational experience, such an exercise might be
worthwhile; otherwise, ...

In the old days, people used assembly because there were no compilers.
After compilers became somewhat available, people still used assembly
when they were required to for performance reasons. As optimizers
improved and, more significantly, machine performance improved,
there was less performance-driven need for assembly code.

Today, there is still a place for assembly-level effort, but the costs
are higher than ever. Humans cost more and more in relation to
hardware, hardware changes faster and faster, and customers demand
more and more features (that is, the specifications of the program
change faster and faster).

Similar arguments apply to using Fortran and C versus things like
Smalltalk, Lisp, ML, or whatever. In the old days, people used
Fortran because they believed they couldn't afford to use Lisp.
Later, people used C because they believed they couldn't afford to use
Smalltalk. (Of course, some people have used Lisp all along, claiming
that they couldn't afford the time to use Fortran or C or C++ or
whatever.)

It seems reasonable to expect that, as machines continue to improve,
the number of applications requiring _any_ assembly coding will
approach zero. Similarly, I would expect that the number of
applications that require Fortran or C (versus your heart's desire)
will also approach zero.

Now, we'll all see a bunch of posts from people claiming that their
particular special application is so special and particular that it
requires assembly or Fortran or whatever. However, I'm not going to
believe them. I believe people use the language they _want_ to use.

Preston Briggs

randy hyde

Apr 27, 1994, 8:17:00 PM

>>>>>>
Today, there is still a place for assembly-level effort, but the costs
are higher than ever. Humans cost more and more in relation to
hardware, hardware changes faster and faster, and customers demand
more and more features (that is, the specifications of the program
change faster and faster).
<<<<<<

Ah, the workstation mentality.
Yes, there probably is no need to write accounting programs in assembly
language now that we have fast RISC machines; however, there are
many cases where assembly is still the *primary* language of choice --
low-cost embedded systems.

Yes, there are embedded systems using 68K, 32K, 80x86, and RISC chips,
but they are *not* low-cost systems (like PICs, 8031/51, 6811, etc.).
Although many will argue that memory costs are not relevant anymore,
this simply is not true. Yes, a 27256 costs less than a 2764 and
probably consumes less power. However, real low-cost embedded systems
use on-chip ROMs which are limited to about 8K or so, these days.
So writing code which is small is very important.
Likewise, processors which run at lower clock frequencies cost less
and consume less power (allowing you to use smaller batteries).
Hence assembly is important because it's fast.

Yes, C compilers do exist for the 8031, 6811, and like chips.
They produce crappy code. Almost everyone I know who writes
serious applications (complex, not trivial things that could
be done in 8031 BASIC) has wound up writing their code in
assembly.

Finally, your post ignores a very simple fact:
Some algorithms are *easier* to implement in assembly language
than in any other language. True, "C" has replaced assembly
in many such cases, but there are still many times I drop
into assembly because it is the more natural language to
use for the given algorithm.

Donald Lindsay

Apr 27, 1994, 8:49:05 PM

In article <2pl2e3$1o...@vkhdib01.hda.hydro.com>,

Terje Mathisen <ter...@hda.hydro.com> wrote:
>>> > I've seen this argument in favour of compilers over humans a number
>>> > of times, which does not make it any truer. I have yet to see
>>> > any compiler for any architecture which wasn't consistently beaten
>>> > by anywhere from 25 to 300% by competent asm programmers.
>>> >
>>> > If you have a counter-example though, I'd love to see it!

>No, I am not going to hand-tune general code for you, or even myself.

If you did, you would find that you aren't beating good compilers by
300%. In fact, you wouldn't beat them by 10% on typical large
routines, and on really big stuff, you would wind up budgeting your
time, to the point where on the average, you did much worse.

I have seen a head-to-head test that was medium size - some thousands
of lines of assembler, not a piddly 5 or 50 lines. Our compiler
generated *better* code than the human had.

So: you've "seen this argument in favour of compilers over humans a
number of times", because it's true, when the task to be done hasn't
been carefully chosen to favor the human.

--
Don D.C.Lindsay Carnegie Mellon Computer Science

John D. McCalpin

Apr 27, 1994, 9:56:20 PM

In article <2pm6j7$h...@larry.rice.edu>,

Preston Briggs <pre...@noel.cs.rice.edu> wrote:
>
>Today, there is still a place for assembly-level effort, but the costs
>are higher than ever. Humans cost more and more in relation to
>hardware, hardware changes faster and faster, and customers demand
>more and more features (that is, the specifications of the program
>change faster and faster).

Preston usually says such reasonable things, but today I think that he
is slipping a bit....

Hardware does not only include the little workstations that all of us
are so fond of playing with, it also includes really big expensive
machines that are still quite a bit more expensive than people. For
example, the number I heard attached to the Cray C916 installation at
Vicksburg, Mississippi, was $42 million. If careful code optimization
can delay the need for replacement of such a machine for even one year,
the cost savings would be millions of dollars (counting amortization).
That would certainly cover *my* salary....

>It seems reasonable to expect that, as machines continue to improve,
>the number of applications requiring _any_ assembly coding will
>approach zero. Similarly, I would expect that the number of
>applications that require Fortran or C (versus your heart's desire)
>will also approach zero.

You are obviously not currently working on problems that require several
orders of magnitude more computing power than are currently available!
It is simple economics to recognize that there will always be an advantage
to being able to use a hardware resource fully. Those who are able to do
so will have a competitive advantage in certain niches of research or
product design.

I have seen no evidence in the last decade that all of these fancy CS
languages are close to providing competitive levels of performance for
scientific and engineering application codes. :-)

I am not ignoring the advantages obtained by improvements in the speed and
accuracy of coding. Superiority in those areas provides certain types
of economic advantage as well, but they are different advantages.

>Now, we'll all see a bunch of posts from people claiming that their
>particular special application is so special and particular that it
>requires assembly or Fortran or whatever.

I would certainly not argue for assembly language programming, but I
think that there is a great deal of room for more attention to be paid
to performance optimization. This often involves reading assembly
language compiler output and figuring out how to get the Fortran or C
compiler to make the right decisions. Such knowledge of the machine
characteristics is crucial for all types of high performance computers,
especially parallel machines and machines with very high vector/scalar
speed ratios (like the NEC SX-3).

It is all too easy for researchers and other users of free computer
time to claim that hardware is cheap. In companies that pay attention
to the real cost and effectiveness of humans and computers, I hope that
there is a willingness to balance modest human resources applied to
optimization against the very real costs of continual computer
replacement. It is, in principle, not a difficult trade-off to
calculate (although the details of the accounting might be awkward).

>However, I'm not going to
>believe them. I believe people use the language they _want_ to use.

There is a name for that logical fallacy, but I can't remember it right
now.... (Since "want" is not an externally observable item, it is too
easy to define it in terms of observed behavior. Of course, at that point
the whole concept of "want" has no predictive value.)

--
John D. McCalpin mcca...@perelandra.cms.udel.edu
Assistant Professor mcca...@brahms.udel.edu
College of Marine Studies, U. Del. John.M...@mvs.udel.edu

David Rifkind

Apr 27, 1994, 6:43:18 PM

The compilers haven't won yet. Terje is a very good assembly language
programmer--'competent' doesn't begin to cover it--and has proved it
against highly-touted compilers more than once. If you restrict
yourself to thinking like a compiler, you won't see 300% improvements,
or even 25% consistently. But, GIVEN THE RIGHT PROBLEM, you can use the
processor in ways the compiler isn't allowed to think about, or ways you
can't describe to the compiler.

I wrote a Gaussian convolution routine for the AMD 29K that ran over ten
times as fast as my best effort in C. It used the entire local register
file as an explicit cache, loading data in bands then scrolling the
kernel window through the band by moving the register stack pointer.
How could you describe that to a compiler? The idea of the local
register file as an array and the RSP as an index into it is quite
outside the compiler's model for code generation.

Anyway, anecdotes aside:

If you get to choose the problem, the compiler will probably win.
If I get to choose the problem, I can probably win.

Nobody is arguing that programs should be written entirely in
assembly language, or mostly in assembly, or even that more than a
few programs need ANY assembly language. (Well, I'm not, and I hope
nobody else is.)
--
"We consider that we are entitled to at least 13 pages of gross and
stupid errors. After that we shall have to explain." -- Charles Fort

randy hyde

Apr 28, 1994, 12:29:35 AM

>>>>>
I have seen a head-to-head test that was medium size - some thousands
of lines of assembler, not a piddly 5 or 50 lines. Our compiler
generated *better* code than the human had.
<<<<<

In general, I have found just the opposite of your findings --
the bigger the program the more opportunity I had to optimize it.
Without insulting you, I would point out that your example is just
that -- a single example. Who was this "the human?" What
credentials? How good was s/he? I have a class full of assembly
language students which almost any compiler can beat. Telling me
that a compiler beat one human isn't saying very much.
Keep in mind, compilers only output code which their authors
instructed them to. Therefore, a compiler is only as good
as the person who wrote that compiler. Surely you are not going
to tell me that the best assembly language programmer in the
world wrote your compiler?

Yes, compilers, on the average, generate better assembly code
than your average assembly language programmer. However, a
good assembly language programmer can easily achieve a 100%
improvement over the best compilers by restructuring code.
And you, obviously, know very little if anything about
Terje's capabilities. I do. He is one of the best.
I seriously doubt any 80x86 compiler could come close to
beating him.

Finally, don't forget a *big* advantage assembly language
programmers have over C (or other HLL) compilers -- the
"-S" option. I can have the compiler do its best job, learn
from what it did, and then tweak the code to make it better.
Having done this with GCC -O6 output, I've been able to achieve
3x performance improvements in some compression routines
(several hundred lines of highly hand optimized C code) I've
written.

Preston Briggs

Apr 28, 1994, 12:24:35 AM

mcca...@perelandra.cms.udel.edu (John D. McCalpin) writes:

>>Today, there is still a place for assembly-level effort, but the costs
>>are higher than ever. Humans cost more and more in relation to
>>hardware, hardware changes faster and faster, and customers demand
>>more and more features (that is, the specifications of the program
>>change faster and faster).
>
>Preston usually says such reasonable things, but today I think that he
>is slipping a bit....

Maybe so. But look again at what I wrote (you kindly provided some of
it above). It ought to be hedged enough to please anyone. In fact, I
have trouble believing that anyone is disagreeing with me. It seems
very "Mom and apple pie."

>Hardware does not only include the little workstations that all of us
>are so fond of playing with, it also includes really big expensive
>machines that are still quite a bit more expensive than people.

Sure, and I wrote "there is still a place for assembly-level effort"
and I meant it. The point of the paragraph is that the size of that
place is shrinking every day. Maybe I should qualify that and say
"the relative size". (How much money is spent on supers and their
programs versus micros and their programs?)

>You are obviously not currently working on problems that require several
>orders of magnitude more computing power than are currently available!

That's not true. I work, like all optimizer people, on NP-complete
problems. However, I recognize that optimal solution of the problems will
potentially consume any machine, so I explore heuristic solutions.

>It is simple economics to recognize that there will always be an advantage
>to being able to use a hardware resource fully. Those who are able to do
>so will have a competitive advantage in certain niches of research or
>product design.

Yes. I only argue that the size of that niche has shrunk over the
years and will continue to shrink.

>I have seen no evidence in the last decade that all of these fancy CS
>languages are close to providing competitive levels of performance for
>scientific and engineering application codes. :-)

It took quite a while for scientists and engineers to begin using
compilers on their early machines. And it took a while for them to
use vectorizing compilers on their supers. And I think it'll take a
while before many of them will use HPF on their distributed-memory
machines. But these things are all happening, even though there's a
cost in performance, in programming awkwardness in certain areas, etc.
And I believe engineers and scientists are going to use "fancy CS
languages" too. How can you argue that the evolution of tools will
stop, dead in its tracks, today? (Why not last week? Why not
tomorrow? Why ever?)

>>Now, we'll all see a bunch of posts from people claiming that their
>>particular special application is so special and particular that it
>>requires assembly or Fortran or whatever.
>
>I would certainly not argue for assembly language programming, but I
>think that there is a great deal of room for more attention to be paid
>to performance optimization.

Sounds like an argument for assembly language programming to me.
And I don't mind it; I keep saying there's a place for it. I also
think there's a place for tuning up Fortran programs. Everybody makes
their own decisions, based on their own priorities.

>(like the NEC SX-3).

Speaking of "special and particular!" So why don't you use assembly
on this beast? It's certainly expensive enough and your problems are
big enough.

[room reserved for detailed arguments in reply]

Doesn't it seem like all those arguments can be applied to the use of
whatever-comes-next versus Fortran?

>>I believe people use the language they _want_ to use.
>
>There is a name for that logical fallacy

It figures that I would walk into an old trap.
Unfortunately, I was a bit rushed on the original post and didn't get
to finish it quite like I had intended. My intended point was that
most people choose some language and code away. When asked later about
their decision, they simply invent a justification. I think, in most
cases, "performance" is a bogus justification. Much more reasonable
are things like:

I know Fortran (or Lisp, or whatever)

I've got a C compiler (or Prolog interpreter) on my machine

My boss told me to use Cobol

Everybody else at CMU uses ML

Hyde argued that assembly is still required in many embedded
applications. McCalpin argues that Fortran is still required for many
scientific programs. I don't deny it; but I still think that we're
going to see more scientific problems attacked and solved in languages
besides Fortran and we're going to see more use of high-level
alternatives to assembly in embedded applications (and the same thing
will happen in every other area).

Preston Briggs

John D. McCalpin

Apr 28, 1994, 1:40:04 AM

In article <2pmn24$o...@news.udel.edu>

mcca...@perelandra.cms.udel.edu (John D. McCalpin) writes:

jdm> I would certainly not argue for assembly language programming, but I
jdm> think that there is a great deal of room for more attention to be paid
jdm> to performance optimization.

In article <2pmvo3$j...@larry.rice.edu>,
Preston Briggs <pre...@noel.cs.rice.edu> replies:

pb> Sounds like an argument for assembly language programming to me.
pb> And I don't mind it; I keep saying there's a place for it. I also
pb> think there's a place for tuning up Fortran programs. Everybody makes
pb> their own decisions, based on their own priorities.

Two comments:

(1) Performance optimization <> Assembly language programming.

The reason that assembly language is almost never needed any more is
that modern compilers are pretty good. *BUT* they are not always
"pretty good" if you just give them the code that expresses what you
want to do in an obvious manner. Sometimes it takes a great deal of
"tweaking" to get the compiler to generate really good code, and this
tweaking generally requires looking at the generated machine code to
see what is going on in detail. Sometimes your set of "tweaks" is
useful on more than one machine, quite often it is not.

An interesting example of this is from LAPACK. Despite the algorithmic
superiority of the blocked BLAS-3 based algorithms, many of the
routines run *slower* than the equivalent BLAS-1 based LINPACK
routines, when each is compiled from the Fortran sources. This is
because the coding style required to get good performance is different
for different vendors' compilers, and the LAPACK sample BLAS (which is
not intended to be tuned for any particular architecture/implementation/
compiler) is not a particularly good match for any of the systems that
I have looked at.
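
As a rough illustration of how one computation can be written in two
coding styles, here is a minimal sketch in C (not Fortran, and not taken
from LAPACK or any BLAS). The function names and the block-size
parameter bs are made up for this example, and bs is assumed to be
greater than zero. Both routines compute C += A*B on row-major n x n
matrices. Which one runs faster depends on the compiler and the machine.

```c
#include <stddef.h>

/* C += A*B for row-major n x n matrices: plain triple loop. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double s = c[i * n + j];
            for (size_t k = 0; k < n; k++)
                s += a[i * n + k] * b[k * n + j];
            c[i * n + j] = s;
        }
}

/* Same computation, tiled into bs x bs blocks so that a block of A
   and a block of B can be reused while they are still in cache. */
void matmul_blocked(size_t n, size_t bs,
                    const double *a, const double *b, double *c)
{
    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t kk = 0; kk < n; kk += bs)
            for (size_t jj = 0; jj < n; jj += bs)
                for (size_t i = ii; i < n && i < ii + bs; i++)
                    for (size_t k = kk; k < n && k < kk + bs; k++) {
                        double aik = a[i * n + k];
                        for (size_t j = jj; j < n && j < jj + bs; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
}
```

The tiled version only pays off when the blocks fit the cache and the
compiler handles the inner loop well, so the block size would have to be
tuned per machine.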

(2) While it is true that decisions are made, it is not generally true that
they are based on priorities (at least not visible ones). See my
seriously non-architectural comments below....

pb> My intended point was that
pb> most people choose some language and code away. When asked later about
pb> their decision, they simply invent a justification. I think, in most
pb> cases, "performance" is a bogus justification. Much more reasonable
pb> are things like:
pb>
pb> I know Fortran (or Lisp, or whatever)
pb> I've got a C compiler (or Prolog interpreter) on my machine
pb> My boss told me to use Cobol
pb> Everybody else at CMU uses ML

This understanding of the actual motivations is much more realistic than
assuming that people base their actions on "priorities". This "post-
justifying" mode of operation is so standard for us that we do not usually
even realize that most of what we do is done *before* the thinking that
ostensibly led to the actions.

The point here (and I recognize that Skinnerian behaviorism is fairly
far afield from computer architecture) is that it is often a mistake to
assume that people are making these decisions (i.e. choice of programming
languages and levels of effort expended on optimization) based on rational
priorities (like cost-benefit analyses).

My argument (which is not *generally* opposed to Preston's comments),
is that in the high performance computing enterprise, it is quite uncommon
to see a cost-benefit analysis (of human optimization vs buying more
hardware) that is both objective and done before the decisions are made.

pb> McCalpin argues that Fortran is still required for many
pb> scientific programs. I don't deny it; but I still think that we're
pb> going to see more scientific problems attacked and solved in languages
pb> besides Fortran [...]

I don't think that I mentioned Fortran at all! What I said is that
there is still an important part of the market for which hardware
resources (like cpu time) cannot be judged to be inexpensive compared
to the cost of human resources without performing some sort of
quantitative analysis.

I wonder how much of the huge peecee market for 486 (and faster)
machines is based on an objective assessment of the net productivity
gains obtained through the use of the latest and most bloated software?
How much of the market is just "keeping up with the Joneses?"
Considering that these machines are typically replaced every 2-3 years,
one would need to demonstrate improved productivity of at least $1k-$2k
per year per machine....

Alan Watson

Apr 28, 1994, 1:04:22 AM

In article <2pn01f$2...@galaxy.ucr.edu>

rh...@mirage.NoSubdomain.NoDomain (randy hyde) wrote:
>Yes, compilers, on the average, generate better assembly code
>than your average assembly language programmer. However, a
>good assembly language programmer can easily achieve a 100%
>improvement over the best compilers by restructuring code.
>And you, obviously, know very little if anything about
>Terje's capabilities. I do. He is one of the best.
>I seriously doubt any 80x86 compiler could come close to
>beating him.

I think there are several interesting questions here, some of which are
of a chicken and egg nature, but nevertheless:

On a CISC machine, you probably know more about the architecture and
the implementation than a good compiler; on a simple RISC probably you
do not. How important is that knowledge in obtaining those
improvements? How much of the improvements are relatively independent
of the architecture and implementation? Do you think a skilled
assembly programmer would be able to achieve such impressive margins
over a good compiler on a simple RISC machine?

What kind of things can you do in assembly on simple RISCs that you
cannot do in, say, C? (To partially answer my own question, sometimes
one can do n x n -> 2n and sometimes one can get at the carry flag, but
what else?)
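
For reference, a minimal sketch of how these two operations are usually
approximated in portable C, assuming a C99 compiler with <stdint.h>.
The function names are hypothetical. Whether the compiler turns either
one into a single widening multiply or an add-with-carry instruction
depends on the compiler and the target.

```c
#include <stdint.h>

/* 32 x 32 -> 64 bit multiply: widen one operand before multiplying
   so the full product is kept. */
uint64_t mul_32x32_to_64(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* 32-bit add that also reports the carry out. Unsigned wraparound is
   well defined in C, so the carry can be recovered by comparing the
   wrapped sum with one of the operands. */
uint32_t add_carry_out(uint32_t a, uint32_t b, unsigned *carry)
{
    uint32_t sum = a + b;
    *carry = (sum < a);
    return sum;
}
```

The carry here is recomputed with a comparison rather than read from a
flag, which is the gap between C and assembly that the question is
about.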

How much better would a good assembly programmer do against a good
compiler on a superscalar RISC machine compared to a simple RISC
machine?

--
Alan Watson | And finally there is the fourth category,
al...@oldp.astro.wisc.edu | the rarest, the category of people who
Department of Astronomy | live in the imaginary eyes of those who
University of Wisconsin -- Madison | are not present. They are the dreamers.
| -- Milan Kundera

Paul Dietz

Apr 28, 1994, 1:32:11 AM

In article <2pmh7s$f...@galaxy.ucr.edu>,
randy hyde <rh...@mirage.NoSubdomain.NoDomain> wrote:

> Ah, the workstation mentality.
> Yes, there probably is no need to write accounting programs in assembly
> language now that we have fast RISC machines; however, there are
> many cases where assembly is still the *primary* language of choice --
> low-cost embedded systems.

Really? I would have thought the positive effect of more efficient
programs would be more than offset by the negative effect of longer
product development time.

Paul

randy hyde

Apr 28, 1994, 3:42:53 AM

>>>>>
Really? I would have thought the positive effect of more efficient
programs would be more than offset by the negative effect of longer
product development time.
<<<<<

(above reference to using assembly for embedded systems.)

If you are only building a small number of systems, you are absolutely
right. Spend a lot of money on hardware to make the software as
easy to write as possible. However, if you are building Barbie
dolls that say "I love Ken" whenever you squeeze them, an extra
penny matters *a lot*. Software development in such a product is
a fixed cost which is easily amortized over several million dolls.
However, that extra penny increases the cost of each doll.
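
The trade-off can be put as a break-even count. The sketch below is
only an illustration: the function name is made up, and the dollar
figures in the usage note are hypothetical, not numbers from any real
product.

```c
/* Number of units at which a one-time extra development cost is paid
   back by a per-unit saving. Both amounts are in cents, and the
   result is rounded up to a whole unit. */
long break_even_units(long extra_dev_cost_cents, long saving_per_unit_cents)
{
    return (extra_dev_cost_cents + saving_per_unit_cents - 1)
           / saving_per_unit_cents;
}
```

If writing the code in assembly cost an extra $50,000 (5,000,000 cents)
and saved one cent per doll, break_even_units(5000000, 1) gives
5,000,000 dolls. Every doll beyond that count is a net saving.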

Finally, have you ever used a C compiler on the 6811 or 8031?
It really isn't a whole lot better than using straight assembly.
There is one professor here at UCR who forces his students to
create 6811 code in C because the professor does not want to learn
6811 assembly. The student spent three weeks dealing with library
bugs before completely giving up. I have some students in my
microcomputer systems design course who are designing code for
a robot with a 6811 microcontroller. They spent less than three
weeks *learning* 6811 *and* writing their code. Assembly isn't
as bad as everyone makes it out to be. Further, given some of the
C code I've seen around here, it's not a whole lot less maintainable
either.

Of course, why would anyone use C on a workstation when Icon and SETL
are available? :-)

randy hyde

Apr 28, 1994, 3:36:11 AM

>>>>
How much better would a good assembly programmer do against a good
compiler on a superscalar RISC machine compared to a simple RISC
machine?
<<<<

I can say nothing about simple RISC vs. superscalar RISC, but
on Pentiums vs. 80386 humans tend to do much better than C
compilers (expert humans, that is). We'll see how this
stands up with the next crop of Pentium compilers.

As for RISCs, in general, compilers tend to fill the branch
slots and data hazard slots (in pipelined machines) with
quite a few more NOPs than humans. I would imagine this
is even worse with superscalar (no personal experience there,
sorry).

However, this argument for HLLs will never win.
The only reasonable argument is that compilers can be
rewritten for new architectures and (generally) only
a recompile is necessary to take advantage of that
new architecture.

I really doubt any Pentium compiler can beat my hand
written Pentium code. But 80386 compilers routinely
beat the 8088 code I wrote ten years ago. People who
need the O(1) performance boost possible with assembly
are willing to live with this. They want the fastest
possible code *on the cheapest hardware*. It was
amazing how long the Apple II continued to compete
with 8088 PCs and 68K Macs mainly due to the fact
that most serious Apple II apps were written in
assembly while PC and Mac apps were written in Pascal
or C. This easily added five years to the life of
that machine.

Ralph Schmidt

Apr 28, 1994, 5:47:16 AM

mcca...@perelandra.cms.udel.edu (John D. McCalpin) writes:

>I would certainly not argue for assembly language programming, but I
>think that there is a great deal of room for more attention to be paid
>to performance optimization. This often involves reading assembly
>language compiler output and figuring out how to get the Fortran or C
>compiler to make the right decisions. Such knowledge of the machine
>characteristics is crucial for all types of high performance computers,
>especially parallel machines and machines with very high vector/scalar
>speed ratios (like the NEC SX-3).

How true...but it's sad that this philosophy is lost in the main
software houses. Why do computers get more expensive and you get
less performance every new generation than you should expect ?
This nonsense that memory is so cheap...let's waste it...is simply
a big lie. Why do we need far more than 16MB to run the current
OS generation without a big pain ?
Can somebody estimate how much this policy increases system
prices today ?
It's so poor for the sw-engineers that they can release such
products without feeling ashamed.

Sorry for wasting some emotions:-B

Regards
--
Ralph Schmidt la...@uni-paderborn.de
University of Paderborn (Germany)

Stefan Monnier

Apr 28, 1994, 10:25:58 AM

In article <2pmh7s$f...@galaxy.ucr.edu>,
randy hyde <rh...@mirage.NoSubdomain.NoDomain> wrote:

> Yes, there are embedded systems using 68K, 32K, 80x86, and RISC chips,
> but they are *not* low-cost systems (like PICs, 8031/51, 6811, etc.).
> Although many will argue that memory costs are not relevant anymore,
> this simply is not true. Yes, a 27256 costs less than a 2764 and
> probably consumes less power. However, real low-cost embedded systems
> use on-chip ROMs which are limited to about 8K or so, these days.
> So writing code which is small is very important.
> Likewise, processors which run at lower clock frequencies cost less
> and consume less power (allowing you to use smaller batteries).
> Hence assembly is important because it's fast.

Right, but memory and chips are getting faster and faster, even the
real low cost ones. 8K of on-chip ROM isn't so bad. With 8K you can
start using higher-level languages (I said higher, not high, cause I
don't think C is high-level :-), and 16K will be here sooner than you
think (for the exact same price and power consumption and pin-count),
...
You say there is no good C compiler: all right, then the problem is
not in the language, but in the availability of the language.
But what I don't understand is why you don't use Forth, which seems
like the perfect embedded-systems language: programs are impressively
small and it's easy to translate into assembly the little subroutines
(all right, I know some Forth'ers don't like the subroutine term,
but...) that need speed, ...

> Finally, your post ignores a very simple fact:
> Some algorithms are *easier* to implement in assembly language
> than in any other language. True, "C" has replaced assembly
> in many such cases, but there are still many times I drop
> into assembly because it is the more natural language to
> use for the given algorithm.

I can see the advantage of assembly for implementing some parts of
systems software, like interrupt handlers, context switchers, ... But
these are not algorithms. I just can't imagine an algorithm that's
easier to implement in assembly than in C or Forth or BCPL (I could
see some that are harder in SML, but even those are rare). So, what are
those "assembly-bound" algorithms ?

Stefan

Terje Mathisen

Apr 28, 1994, 12:03:18 PM

OK, consider an array of 16-bit data, where when things
changed, I needed to modify both the data-block itself and the
neighbouring blocks.

Because I knew the memory layout (little-endian), and the fact that
only the nearest byte of the blocks before and after was actually
going to change, I could use a single inline asm statement to
update all three blocks:

add dword ptr [di-1], eax

The only penalty being that [di-1] was misaligned, so it took a couple
more cycles than it would have if 32-bit aligned.
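
To make the effect of that single add concrete, here is a small sketch
in C that restates it byte by byte. It is an illustration under stated
assumptions, not the original code: the array is treated as raw bytes
in little-endian order, and the function names and parameters are made
up for this example.

```c
#include <stdint.h>

/* Add delta to the 32-bit little-endian value stored in p[0..3].
   Written byte by byte so it behaves the same on any host. */
void add_le32(unsigned char *p, uint32_t delta)
{
    uint32_t v = (uint32_t)p[0]
               | ((uint32_t)p[1] << 8)
               | ((uint32_t)p[2] << 16)
               | ((uint32_t)p[3] << 24);
    v += delta;
    p[0] = (unsigned char)(v & 0xFFu);
    p[1] = (unsigned char)((v >> 8) & 0xFFu);
    p[2] = (unsigned char)((v >> 16) & 0xFFu);
    p[3] = (unsigned char)((v >> 24) & 0xFFu);
}

/* Update 16-bit block i of a little-endian array with one 32-bit add
   that starts one byte before the block: hi_prev lands on the high
   byte of block i-1, cur on block i, and lo_next on the low byte of
   block i+1. Requires i >= 1 and at least one block after block i. */
void update_three_blocks(unsigned char *blocks, unsigned i,
                         uint8_t hi_prev, uint16_t cur, uint8_t lo_next)
{
    uint32_t delta = (uint32_t)hi_prev
                   | ((uint32_t)cur << 8)
                   | ((uint32_t)lo_next << 24);
    add_le32(blocks + 2u * i - 1u, delta);
}
```

As with the single add instruction, a carry out of one field runs into
the next one, so the trick only works when that is either wanted or
known not to happen.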

The best example of where asm helps, is when you can make intelligent
use of the condition code registers. Consider the TCP/IP checksum: It
is a 16-bit add with carry wraparound, which makes it byte-sex
independent. This thing is used on every packet sent or received
by the ip protocol stack, so it really helps to make it efficient.

On a Pentium I came up with this inner loop algorithm:

dword_loop:
mov edx,[esi]
lea esi,[esi+4]
adc eax,edx
dec ecx
jnz dword_loop

This handles 32 instead of 16 bits at a time, using just 3 cycles per 4 bytes
without any other unrolling. I still believe it to be faster than any
algorithm written in a compiled language could be.

On a 64-bit chip (Alpha) you could just add 32-bit values together without
worrying about carry, since it would flow into the upper half. That would
need support for 64-bit ints in the compiler though.
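
A minimal sketch of that idea in C, assuming a compiler that provides
uint64_t. The function names are hypothetical, the buffer length is
assumed to be a multiple of 4 bytes, and the result is the folded
ones'-complement sum (the value stored in a header would be its
complement). The 16-bit version is included only as a reference to
compare against.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a wide accumulator down to 16 bits, adding each carry back in
   at the bottom (end-around carry). */
uint16_t fold16(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Reference version: one 16-bit word at a time. */
uint16_t ones_sum_by16(const unsigned char *p, size_t len)
{
    uint64_t sum = 0;
    for (size_t i = 0; i + 2 <= len; i += 2) {
        uint16_t w;
        memcpy(&w, p + i, sizeof w);  /* native-order load, any alignment */
        sum += w;
    }
    return fold16(sum);
}

/* Wide version: add 32-bit words into a 64-bit accumulator. Carries
   collect in the upper half and are folded back in once at the end. */
uint16_t ones_sum_by32(const unsigned char *p, size_t len)
{
    uint64_t sum = 0;
    for (size_t i = 0; i + 4 <= len; i += 4) {
        uint32_t w;
        memcpy(&w, p + i, sizeof w);
        sum += w;
    }
    return fold16(sum);
}
```

Both versions load words in the host's native byte order, which is fine
here because the end-around-carry sum comes out the same either way.
Whether a compiler gets this loop close to the hand-written one is a
separate question that the sketch does not settle.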

-Terje

Freek Wiedijk

Apr 28, 1994, 11:31:54 AM

pre...@noel.cs.rice.edu (Preston Briggs) writes:
>It seems reasonable to expect that, as machines continue to improve,
>the number of applications requiring _any_ assembly coding will
>approach zero.

I don't know. For instance: is Apple's QuickTime extension
hand optimized in assembly? Should it be? Or is it better
left written in lisp?

Freek
--
Third theory of Phenomenal Dynamics: The difference between
a symbol and an object is quantitative, not qualitative.

Torben AEgidius Mogensen

Apr 28, 1994, 10:55:08 AM

rh...@mirage.NoSubdomain.NoDomain (randy hyde) writes:

>a compiler is only as good
>as the person who wrote that compiler. Surely you are not going
>to tell me that the best assembly language programmer in the
>world wrote your compiler?

The idea that "the program is only as good as the programmer" has
several problems. It is true that the programmer can do the same steps
that he/she instructed the computer to do and get the same result, and
in some cases use knowledge that wasn't incorporated into the program
to obtain a better result. But this assumes that the programmer has
the time to do this, and that he/she doesn't make any errors while
doing so. For very small problems this may be true, but for large
problems it just isn't the case. The human coder realizes this and
spends his/her efforts where he/she perceives it will do the most
good, using the usual 10%/90% rule (applied recursively). But for very
large problems even the 10% gets so large that the compiler will do
better, simply because it will be more thorough.

This becomes even more true when the complexity of the target machine
increases: latency, multiple issue etc. While it is true that finding
optimal codings is NP-complete in most cases (and hence not solvable
by the compiler), even heuristics require a large amount of search. So
while a human coder may have better "heuristics" than a compiler, the
amount of search he/she is able to do is a lot smaller.

As to what gives the best result: an advanced heuristic with little
search or a simple heuristic with much search, is not quite clear, and
depends on the problem. Chess players have shown that the best human
heuristics (which have quite limited search) outdo simple programmed
heuristics with massive search. But chess programs even on modest
computers will beat the majority of merely competent chess players,
even the programmers who made the programs. For other problems (like
travelling salesman) simple computerized heuristics win over even the
best humans, given similar time frames.

Torben Mogensen (tor...@diku.dk)

Henry G. Baker

Apr 28, 1994, 4:31:54 AM

In article <2pmn24$o...@news.udel.edu> mcca...@perelandra.cms.udel.edu (John D. McCalpin) writes:
>In article <2pm6j7$h...@larry.rice.edu>,
>Preston Briggs <pre...@noel.cs.rice.edu> wrote:
>>
>>Today, there is still a place for assembly-level effort, but the costs
>>are higher than ever. Humans cost more and more in relation to
>>hardware, hardware changes faster and faster, and customers demand
>>more and more features (that is, the specifications of the program
>>change faster and faster).

The easiest way to make sure that assembly code is no better than
compiled code is to tie the machine code so closely to the compiled
language that there is no hope of improvement. E.g., the Burroughs
machines. This is unlikely, however, if the machine must be the
target of several different languages -- e.g., Fortran, C, Ada, Cobol.

One way to guarantee assembly programmers lifetime employment is to
offer little 'extras' that compilers have a devil of a time taking
advantage of. E.g., extra temporary registers in the floating point
pipeline to hold 'constants' (80860), pipelines that don't empty
themselves, so you can't take a branch without emptying the pipe
yourself (80860); special modes that require extraordinary complexity
in the compiler (80860 dual instruction mode). A real source of
extra income for the consulting hacker is unadvertised pipeline
stalls that happen only on some (undocumented) operand values;
you have to special-case these, even when they work, else suffer
substantial performance penalties.

Another place where compilers lose is the different memory systems
across a range of compatible instruction sets. It would be nearly
impossible for a compiler to optimize for all of these different
variations, but an assembly language programmer with something to
prove can probably gain substantially if he takes advantage of a
particular machine model.

Henry G. Baker

unread,

Apr 28, 1994, 4:37:38 AM4/28/94

to

In article <2pn01f$2...@galaxy.ucr.edu> rh...@mirage.NoSubdomain.NoDomain (randy hyde) writes:
>>>>>>
>I have seen a head-to-head test that was medium size - some thousands
>of lines of assembler, not a piddly 5 or 50 lines. Our compiler
>generated *better* code than the human had.
><<<<<
>
>In general, I have found just the opposite of your findings --
>the bigger the program the more opportunity I had to optimize it.

A potentially large source of improvement over many compilers is their
requirement to stay with the standard calling conventions. In my
experience, the calling conventions are often chosen to allow
multiple language compatibility, and are therefore quite heavyweight.
If you can dump the fixed callee/caller ratio, the fixed argument
to register mappings, and the standard stack frame stuff, you can
often gain 2X to 3X.

awd

unread,

Apr 28, 1994, 1:41:21 PM4/28/94

to

In article <2po2vm$b...@disuns2.epfl.ch>
mon...@di.epfl.ch (Stefan Monnier) writes:

> I can see the advantage of assembly for implementing some parts of
> systems software, like interrupt handlers, context switchers, ... But
> these are not algorithms. I just can't imagine an algorithm that's
> easier to implement in assembly than in C or Forth or BCPL (I could
> see some that are harder in SML, but even those are rare). So, what are
> those "assembly-bound" algorithms ?
>
>
> Stefan

For example, we have a lossless codec that implements a state machine
in assembly. It is very difficult to get a compiler to do this because
it always will keep a state variable around. This code uses the program
counter as its state variable (i.e. if I'm executing this code then I'm
in state x). OTOH, we have recently used an asm translator to move the
code to RISC. We, frankly, did not care about optimizing for speed in
this first pass. We knew that we had a good algorithm that would be
"acceptable" on the RISC machine. We were not so concerned about
exploiting RISC semantics, though that will come...
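
A minimal C sketch of the contrast described above, assuming a toy
word-counting recognizer rather than the codec itself (the function
names are made up for illustration). The first version keeps a state
variable and re-tests it for every byte; the second has no state
variable at all, so "which label am I executing" is the state, which is
roughly what using the program counter as the state means.

```c
#include <assert.h>
#include <stddef.h>

/* Version A: explicit state variable, re-tested for every input byte. */
enum state { IN_GAP, IN_WORD };

size_t count_words_explicit(const char *s)
{
    assert(s != NULL);
    enum state st = IN_GAP;
    size_t words = 0;

    for (; *s != '\0'; s++) {
        switch (st) {
        case IN_GAP:
            if (*s != ' ') { words++; st = IN_WORD; }
            break;
        case IN_WORD:
            if (*s == ' ') st = IN_GAP;
            break;
        }
    }
    return words;
}

/* Version B: no state variable; the label being executed is the state. */
size_t count_words_implicit(const char *s)
{
    assert(s != NULL);
    size_t words = 0;

in_gap:
    if (*s == '\0') return words;
    if (*s++ == ' ') goto in_gap;   /* still between words */
    words++;                        /* first byte of a new word */
    /* fall through into the "in word" state */
in_word:
    if (*s == '\0') return words;
    if (*s++ == ' ') goto in_gap;   /* word ended */
    goto in_word;
}
```

Whether a given compiler turns version A into something as tight as
version B depends on the compiler; the sketch only shows the two shapes.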

Andrew

Terje Mathisen

unread,

Apr 28, 1994, 8:06:01 AM4/28/94

to

In <2pm6j7$h...@larry.rice.edu>, pre...@noel.cs.rice.edu (Preston Briggs) writes:
>ter...@hda.hydro.com (Terje Mathisen) writes:
>
>>No, I am not going to hand-tune general code for you, or even myself.
>>What I do is to make a _very_ careful analysis of where my algorithm
>>is spending most of its time, then rearrange my
>>algorithm/data/lookup tables to fit what I observe when profiling the
>>code. Finally, I look into modifying the basic algorithm to fit any
>>special quirks in the cpu.
>>
>>Using these sorts of techniques, I have written a wc (Word Count) clone
>
>I hate to see that. You started so well, but you might as well have
>ended by saying:
>
> "Using these sorts of techniques, I have wasted a lot of time."
>
>I suppose, as an educational experience, such an exercise might be
>worthwhile; otherwise, ...

Sorry, I should have given the full background for this particular asm
hack: A challenge and a coding competition.

When I write _anything_, I try to start by stepping way back from the
problem at hand, and analyse it properly.

- How many people/machines will use the resulting program?

- How often will they run it?

- How much time will each user spend waiting for it?

- Is this a one-shot program, like a filter, that will be loaded, run, and
discarded, or will it stay loaded for a long time, using RAM and CPU
resources, like a device driver.

If, after this process, I determine that I can solve the task with a Perl
script, I'm very happy. For a GUI app, I'll use PowerBuilder or Visual
Basic if appropriate.

If I need more performance and less RAM impact, I might decide upon
C(++) or Pascal.

If it is really speed-critical, but not totally memory-limited, I'll try to
modify the problem to where it fits the intended cpu architecture,
write it in C(++), and either use a profiler or just insert event
counters all over the place to measure how often each part gets
executed.
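
A minimal sketch of what "insert event counters all over the place"
could look like in C. The event names, the COUNT_EVENT macro and the
instrumented binary search are illustrative assumptions, not the code
from the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One counter per interesting event; names are hypothetical. */
enum event { EV_CALL, EV_PROBE, EV_HIT, EV_MISS, EV_COUNT };

static unsigned long event_count[EV_COUNT];
static const char *const event_name[EV_COUNT] = {
    "call", "probe", "hit", "miss"
};

#define COUNT_EVENT(ev) (event_count[(ev)]++)

/* Binary search over a sorted array, instrumented with counters. */
int find(const int *a, size_t n, int key)
{
    assert(a != NULL || n == 0);
    size_t lo = 0, hi = n;          /* search the half-open range [lo, hi) */

    COUNT_EVENT(EV_CALL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        COUNT_EVENT(EV_PROBE);
        if (a[mid] == key) {
            COUNT_EVENT(EV_HIT);
            return (int)mid;
        }
        if (a[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    COUNT_EVENT(EV_MISS);
    return -1;
}

/* Dump the counters, one "name count" pair per line. */
void report_events(FILE *out)
{
    for (int i = 0; i < EV_COUNT; i++)
        fprintf(out, "%-6s %lu\n", event_name[i], event_count[i]);
}
```

Calling report_events(stderr) at exit shows which paths are hot, which
is the information needed before deciding what (if anything) to
hand-tune.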

Finally, when I cannot get any more speed out of the compiler, and I
determine that my 10,000 users still would be helped if I could improve
it, I'll hand-tune the _really_ critical inner loops in asm.

I am _not_ an asm/<particular language> bigot, except that I've never met
a Cobol program that I liked. :-)

I also agree that the number of applications that really need to be written
totally in asm is rapidly approaching zero. This is because compilers
are improving, modern cpus are easier to work with for the compiler, and
the average quality of programmers is probably going down, while the
applications to be written are becoming more and more complicated.

In the beginning of the computer era, only the real enthusiasts (hackers)
got involved, because it was the most interesting thing in the world for
them. Currently, the field has become wide open, with a lot of CS
students that take the courses because they think the jobs will be well-
paying. Really great code needs a lot more dedication than that!

-Terje

Chris Gray

unread,

Apr 28, 1994, 3:04:47 PM4/28/94

to

In article <2pm6j7$h...@larry.rice.edu> pre...@noel.cs.rice.edu (Preston Briggs) writes:

Such excellent flamebait, how can I resist?

Similar arguments apply to using Fortran and C versus things like
Smalltalk, Lisp, ML, or whatever. In the old days, people used
Fortran because they believed they couldn't afford to use Lisp.
Later, people used C because they believed they couldn't afford to use
Smalltalk. (Of course, some people have used Lisp all along, claiming
that they couldn't afford the time to use Fortran or C or C++ or
whatever.)

...

However, I'm not going to
believe them. I believe people use the language they _want_ to use.

Personally, I'm definitely not a fan of C, and I think C++ is a big
mistake (emphasis on the big!). However, I would prefer to use C for
most kinds of work over Lisp or Smalltalk (or Forth or ...). Give me
a language that I can READ! Higher levels of abstraction are good, but
only if the resulting program can be read by someone not familiar with
it and the zillions of abstractions it uses. It is NOT just syntactic
sugar - it is readability, which has been shown to be very important
to future modifications and program maintenance.

I don't need to say much about assembler - lots of other folk have
commented on it. An assembler programmer who treats a given problem
the same as a C programmer will probably not produce better code than
a good compiler, but if you have an experienced assembler programmer,
who knows the machine well, and who steps back and thinks about how
the problem can best be done on the machine, then significant
performance improvements are possible. This becomes less and less
as the machines become simpler and simpler, of course.
--
Chris Gray c...@myrias.ab.ca [C macros - a powerful way to make bugs!]
Myrias Computer Technologies Inc. (403) 463-1337

Preston Briggs

unread,

Apr 28, 1994, 3:07:34 PM4/28/94

to

rh...@mirage.NoSubdomain.NoDomain (randy hyde) writes:
>Keep in mind, compilers only output code which their authors
>instructed them to.

No! Don't believe him. Compilers only output code discovered using
the methods implemented by the compiler writer. In some cases, those
methods will be more exhaustive than the methods used by humans (e.g.,
register allocation via coloring). In other cases, they will be less
thorough (for example, Baker pointed out that most compilers are
limited by the machine's calling conventions).
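
To make "register allocation via coloring" concrete, here is a minimal
C sketch. It assumes a simple greedy coloring in index order, not the
full simplify/select/spill scheme a production allocator would use, and
all names (color_registers, MAX_VALUES, SPILLED) are made up for the
example. Values that are live at the same time "interfere" and must not
share a register; a value that cannot get one is marked as spilled.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VALUES 16
#define SPILLED    (-1)

/*
 * interferes[i][j] is true when values i and j are live at the same time.
 * Assigns each of the n values a register in 0..num_regs-1 (or SPILLED)
 * and returns the number of spilled values.
 */
int color_registers(int n, bool interferes[MAX_VALUES][MAX_VALUES],
                    int num_regs, int reg_of[MAX_VALUES])
{
    assert(n >= 0 && n <= MAX_VALUES && num_regs >= 0);
    int spills = 0;

    for (int v = 0; v < n; v++) {
        reg_of[v] = SPILLED;
        /* Try each register until one is free of interfering neighbours. */
        for (int r = 0; r < num_regs && reg_of[v] == SPILLED; r++) {
            bool taken = false;
            for (int u = 0; u < v; u++)
                if (interferes[v][u] && reg_of[u] == r)
                    taken = true;
            if (!taken)
                reg_of[v] = r;
        }
        if (reg_of[v] == SPILLED)
            spills++;
    }
    return spills;
}
```

Even this naive version checks every pair of values mechanically, which
is the sense in which such a method can be more exhaustive than a human
working by eye.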

>Therefore, a compiler is only as good
>as the person who wrote that compiler.

Well, I think that better compiler writers produce better compilers;
however, I don't agree that the code produced by a compiler has much
relationship to the assembly-coding abilities of the compiler writer.

>Surely you are not going
>to tell me that the best assembly language programmer in the
>world wrote your compiler?

Nope. However, I do try to use compilers written by some of the best
compiler writers in the world.

Preston Briggs

Mark Lacey

unread,

Apr 28, 1994, 3:04:55 PM4/28/94

to Stefan Monnier

>>>>> "Stefan" == Stefan Monnier <mon...@di.epfl.ch> writes:

Stefan> I can see the advantage of assembly for implementing some
Stefan> parts of systems software, like interrupt handlers,
Stefan> context switchers, ... But these are not algorithms. I
^^^^^
Stefan> just can't imagine an algorithm that's easier to implement
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Stefan> in assembly than in C or Forth or BCPL (I could see some
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Stefan> that are harder in SML, but even those are rare). So, what
Stefan> are those "assembly-bound" algorithms ?

Well, try implementing FFTs or filters on our (Motorola's) DSPs
efficiently in anything but the DSP assembly language. Why can it be
done so well in assembly language? Because of addressing modes and an
architecture that are built for doing those types of things very
efficiently, and very easily (there are bit-reversed and modulo
addressing modes in the architecture).
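
For readers who have not met these addressing modes, a rough C sketch
of what they compute. The function names are illustrative assumptions
and nothing here is DSP-specific code; on the DSP the index arithmetic
below is done by the addressing hardware as part of the memory access,
whereas a C compiler for a general-purpose CPU has to emit it as
ordinary instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Reverse the low `bits` bits of i; with bits = 3, 1 (001) becomes 4 (100). */
unsigned bit_reverse(unsigned i, unsigned bits)
{
    assert(bits <= 16);
    unsigned r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1u);
        i >>= 1;
    }
    return r;
}

/* Put a 2^bits element array into bit-reversed order, as an FFT needs. */
void bit_reverse_permute(float *x, unsigned bits)
{
    assert(x != NULL && bits <= 16);
    unsigned n = 1u << bits;
    for (unsigned i = 0; i < n; i++) {
        unsigned j = bit_reverse(i, bits);
        if (i < j) {                /* swap each pair only once */
            float t = x[i];
            x[i] = x[j];
            x[j] = t;
        }
    }
}

/* Modulo addressing: advance an index around a circular filter buffer. */
size_t modulo_next(size_t index, size_t length)
{
    assert(length > 0 && index < length);
    return (index + 1 == length) ? 0 : index + 1;
}
```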

On the overall subject of "humans vs. compilers" --- of course there
is always going to be some need and some room for assembly language
programming. What is quickly diminishing is the need (due to
processors getting faster and faster) and possibility (due to programs
getting larger and larger and the amount of time it takes to write an
assembly language routine taking typically longer than writing a C [or
whatever] routine) for writing in assembly language. With superscalar
architectures and good compilers for those architectures it is getting
harder and harder to beat the compiler without spending a significant
amount of time on the problem.

As for the person who said ``the best assembly language programmer can
beat the best compiler...'' --- that is true, but when they followed
with ``the best assembly language programmers aren't writing the
compilers'' --- that might also be true, but that sort of misses the
point for a couple reasons --- even if they were, they still wouldn't
be able to beat themselves (and probably wouldn't be able to beat most
compiler writers who had a grasp of the architecture). Knowing how to
write a good compiler is more than just knowing how to program the
chip well in assembly language.

Stefan> Stefan

Mark M. Lacey
la...@dsp.sps.mot.com

Anton Rang

unread,

Apr 28, 1994, 1:36:49 PM4/28/94

to

In article <2pnbbt$4...@galaxy.ucr.edu> rh...@mirage.NoSubdomain.NoDomain (randy hyde) writes:
>(above reference to using assembly for embedded systems.)
>

>Finally, have you ever used a C compiler on the 6811 or 8031?
>It really isn't a whole lot better than using straight assembly.

> [...]

Another argument against using C for these sorts of embedded systems
is that embedded processors very often have special instructions which
are useful to the assembly programmer, but difficult or impossible for
a compiler to generate. (For instance, the table lookup instructions
on the 68K series, or the timer control instructions on Motorola's
embedded chips with TPUs.)
--
Anton Rang (ra...@winternet.mpls.mn.us)

Preston Briggs

unread,

Apr 28, 1994, 4:58:40 PM4/28/94

to

c...@myrias.ab.ca (Chris Gray) writes:

>I would prefer to use C for
>most kinds of work over Lisp or Smalltalk (or Forth or ...). Give me
>a language that I can READ!

>Chris Gray c...@myrias.ab.ca [C macros - a powerful way to make bugs!]

I love your comment when considered together with your signature.

In any case, I think your essential point is good. The challenge
to the higher-level language designers is to come up with a language
and toolset that makes their ideas so attractive (fun, easy-to-learn,
easy-to-use, safe, readable, writable, whatever) that users are
willing to give up a little performance.

But if you want a language you can read, and you only know how to read
C, then I'm afraid you're stuck.

Preston Briggs

Donald Lindsay

unread,

Apr 28, 1994, 6:10:56 PM4/28/94

to

In article <2pnbbt$4...@galaxy.ucr.edu>,

randy hyde <rh...@mirage.NoSubdomain.NoDomain> wrote:
>The student spent three weeks dealing with library
>bugs before completely giving up.

We *were* talking about "good" compilers, which do exist in plenty
elsewhere.

>I have some students in my
>microcomputer systems design course who are designing code for
>a robot with a 6811 microcontroller. They spent less than three
>weeks *learning* 6811 *and* writing their code. Assembly isn't
>as bad as everyone makes it out to be.

I've written and maintained almost 100,000 lines of assembler -
mostly for 8-bit micros - and I assert that assembler is usually a
waste of time. Even for your barbie-doll example, there are superior
alternatives. I've written several thousand lines in an assembler-
alternative, PL/360, and it was a joy compared.

> Further, given some of the
>C code I've seen around here, it's not a whole lot less maintainable
>either.

Then fix your students: they're broken.

randy hyde

unread,

Apr 28, 1994, 7:15:56 PM4/28/94

to

>>>>>
You say there is no good C compiler: all right, then the problem is
not in the language, but in the availability of the language.
But what I don't understand is why you don't use Forth, which seems
like the perfect embedded-systems language: programs are impressively
small and it's easy to translate into assembly the little subroutines
(all right, I know some Forth'ers don't like the subroutine term,
but...) that need speed, ...
<<<<<<

Please explain the advantages of FORTH over asm! :-)
Quite frankly, they both have the same set of disadvantages.
Indeed, I find FORTH considerably more difficult to read
than assembly. FORTH is often more compact at the expense
of speed -- the classic tradeoff.

As for better C (or whatever) compilers, the day a C compiler
is available for any machine which always does a better job
than me, will be the day I stop using assembly altogether.
Alas, people continue to talk about how great "so-and-so's"
C compiler is and use this as an excuse to continue using
Borland C++! Hey, if I've got a high performance C compiler
on a RISC workstation and utmost speed isn't the overriding
concern, I will use C (heck, I even use Borland C++ much of
the time). OTOH, great compilers on RISC workstations are
not a good reason to use weak C compilers on microcontrollers.

People seem to lose sight of one very important point-
The whole point of a language is to make software development
easier. If your C compiler has so many bugs that you spend
most of your time fighting the bugs (in the compiler) rather
than writing C code, you're *much* better off in assembly language
(assuming, of course, that the assembler isn't buggy :-)).

Further, another thing that UNIX/C++ programmers forget about
embedded systems is that you do not really have the "C Standard
Library," the "UNIX C Library," or "XLib" available. In fact,
you're quite lucky if you've got decent string routines and
floating point. Try writing C without making any standard library
calls sometime. You will have a hard time convincing me that
this is considerably easier than writing the code directly in
assembly language (on a microcontroller).

>>>>>
I can see the advantage of assembly for implementing some parts of
systems software, like interrupt handlers, context switchers, ... But
these are not algorithms. I just can't imagine an algorithm that's
easier to implement in assembly than in C or Forth or BCPL (I could
see some that are harder in SML, but even those are rare). So, what are
those "assembly-bound" algorithms ?
<<<<<

Just what is an algorithm, by your definition?
E.g., there are many context switching algorithms...

However, not to nit-pick; I've written lots of keyboard scanning
algorithms in my time. I've written them in C, Pascal, 80x86 assembly,
and in 8031 assembly. The assembly versions were only a few lines
longer than the C and Pascal versions (generally on the order of
100 well commented lines vs. 80 well commented lines).
I did get the assembly versions working *faster* than the C versions
despite the fact that I wrote the C version *after* writing the
assembly versions. If the code was not for hire, I would post it just
to show you.
This is but one example of such algorithms I've found to be easier to
code in assembly. However, I could babble on forever and still not
convince anyone who is not already convinced, so I won't even try.
However, one big class of algorithms which is often easier in assembly
than in C are those involving table lookups.
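
For reference, the C side of a table-lookup comparison might look like
the sketch below. It is not the for-hire code mentioned above: the 4x4
keypad layout, the row_bits input format and the function name are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define KEY_ROWS 4
#define KEY_COLS 4

/* Hypothetical 4x4 keypad: keymap[row][col] is the character produced. */
static const char keymap[KEY_ROWS][KEY_COLS] = {
    { '1', '2', '3', 'A' },
    { '4', '5', '6', 'B' },
    { '7', '8', '9', 'C' },
    { '*', '0', '#', 'D' },
};

/*
 * row_bits[r] has bit c set when the key at row r, column c is pressed.
 * Returns the first pressed key found, or 0 when no key is down.
 */
char scan_keys(const unsigned char row_bits[KEY_ROWS])
{
    assert(row_bits != NULL);
    for (int r = 0; r < KEY_ROWS; r++)
        for (int c = 0; c < KEY_COLS; c++)
            if (row_bits[r] & (1u << c))
                return keymap[r][c];
    return 0;
}
```

On a microcontroller with an indexed or table-lookup instruction, the
keymap[r][c] access is the part an assembly programmer would expect to
express in one or two instructions.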

randy hyde

unread,

Apr 28, 1994, 7:26:29 PM4/28/94

to

>>>>>
For other problems (like
travelling salesman) simple computerized heuristics win over even the
best humans, given similar time frames.
<<<<<

Now I don't believe *anyone* would be willing to argue that
a human could produce better code given the same amount of time
that a compiler takes.

As for your argument about the 90/10 rule, I have only one comment.
It seems from your discussion that you take the attitude of
"hey, this program's so big I'm not even going to try to speed it up."
This is a real shame; relying on advances in hardware to improve
performance isn't working out as well as everyone thought it would.

I don't think there is anyone here who seriously proposes that we rewrite
all programs in excess of 100,000 lines of code in assembly language.
However, the truth is, *most* programs are not large. The average program
today is (well) under 10,000 lines of code. Even those 1,000,000 line
giant systems we hear about in Software Engineering texts generally consist
of 100s of separate programs. Programs under 10,000 lines of C code
are *easily* written, tested, debugged, documented, etc., in assembly
language. Only other issues, such as portability or desired maintenance
across various family members (e.g., 80x86), are a reasonable *performance*
argument. There are, of course, many arguments for not using assembly
which have nothing to do with performance; like "gee, we don't have
anyone working for us who is a good assembly language programmer."
However, such project members should admit this and quit claiming
that their C code is going to be just as fast as well written
assembly.

randy hyde

unread,

Apr 28, 1994, 7:35:02 PM4/28/94

to

>>>>>
But in the end, they died and the investment in those applications
died with them (that is, their value went to zero). Now consider a
serious application written in a portable language, say emacs, TeX,
unix, Smalltalk, compilers, database systems, ... Machines come and
go, but they continue to run. People are able to maintain and improve
them without fear of losing their investment of time. Users are willing
to devote time to learning them, since there's hope they'll be around
tomorrow. They can be sold cheaper, since developers can expect a
reasonable lifetime in which to recoup their investment.
<<<<<

All machines will die someday, so will all applications (yuck! you're
still using TeX! :-] )

Quite frankly, your list amazes me: emacs -- a 1960's based editor;
TeX -- a 1970's attempt at word processing;
unix -- a 1960's attempt at an operating system;
etc.

It's too bad this software is still with us. Otherwise we might
have something truly great today.

Of course, for every application you list which is still with us
today, I can list ten (written in HLLs) which are dead and gone.
The choice of language has little to do with this -- the market
decides such things. The main reason software like emacs, TeX,
and so on, are still with us is because they are FREE. If people
actually had to pay for these things, they would have died decades
ago.

As for people being able (and willing) to maintain software,
boy, I wish this were the case. I purchased a lot of software
in my life which the developers just gave up on. It was the
market, not their choice of programming languages, which
killed those products. Indeed, writing in a lower level language
has often been a marketing advantage for many firms because their
software was faster and more efficient than the competition's.

randy hyde

unread,

Apr 28, 1994, 7:43:17 PM4/28/94

to

>>>>>
No! Don't believe him. Compilers only output code discovered using
the methods implemented by the compiler writer. In some cases, those
methods will be more exhaustive than the methods used by humans (e.g.,
register allocation via coloring). In other cases, they will be less
thorough (for example, Baker pointed out that most compilers are
limited by the machine's calling conventions).
<<<<<

Have you ever looked at the output of some of these compilers?
If a compiler writer doesn't know the idiosyncrasies of the
machine architecture, how can they possibly write a good code
generator? Let me give a stupid example: there are still compilers
around today that will take "i = i + 1;" and compile it as
    mov ax, i
    add ax, 1
    mov i, ax
Yes, most compilers don't do this, but I can find stupid sequences
like this in output from almost any compiler.
I question your comment about register coloring. On both RISC
and CISC architectures I am generally able to do most of my work
within the existing register set without resorting to memory
variables. Compilers, doing the same task, generally do not fare
as well. Yes, if I had to "hand compile" the C code and had to
use some fixed register coloring algorithm it would take me
forever. However, as pointed out earlier, I use better heuristics
than most compilers. Rarely do I have to make more than two passes
over the code to get everything into registers.

I would be a fool and a liar if I claimed that a compiler has never
beaten me on some given sequence of C instructions. However, I found
that even though the compiler beats me in a few sections of the code,
overall I still win by a factor of 2x or 3x over the best written
C code (and I do compare well written C code against well written
assembly).

<<<<<

randy hyde

unread,

Apr 28, 1994, 7:54:15 PM4/28/94

to

>>>>>
We *were* talking about "good" compilers, which do exist in plenty
elsewhere.
<<<<<

As with all religious discussions, this is ultimately what the
HLL vs assembly discussion always comes down to.

>>> FLAME ON <<<<

C'mon folks, it's time to grow up.
Someone asks for an example where asm is a better choice, I gave one.
Simply waving one's hands and saying "gee, if it hurts don't do that."
completely begs the question.
Just because there are great C compilers for an R4400 is not a good
reason to use C on a 6811. IF YOU CAN FIND ME A GOOD C COMPILER FOR
the 68HC11, I will use it (and I will have my students use it).
We've searched, we haven't found any.
We need to write 6811 code, our robot has a 6811 chip in it, we don't
have a choice. And the last time I checked, GCC wasn't generating
6811 code.

The argument is not "Is assembly language appropriate for everything."
The argument is "Is assembly language good for anything at all."

I believe I've given several examples which show that assembly is still
good for something.

<<<<
I've written and maintained almost 100,000 lines of assembler -
mostly for 8-bit micros - and I assert that assembler is usually a
waste of time. Even for your barbie-doll example, there are superior
alternatives. I've written several thousand lines in an assembler-
alternative, PL/360, and it was a joy compared.
<<<<

Okay, I'll bite. What are the superior alternatives for a Barbie doll?
I make a part of my living coding embedded systems. If you can come up
with a better way to do this, please let me know -- I really do want to
get rich off this.

While I have never used PL/360, I have used PL/M. While marginally
lower level than C (i.e., I had access to the condition codes) it really
is no better than C, ultimately. Finally, PL/M has the same problem
as C -- no high quality compilers available for the 6811 (they do exist
for the 8031, but the quality is questionable).

randy hyde

unread,

Apr 28, 1994, 7:57:46 PM4/28/94

to

<<<<<
I've written and maintained almost 100,000 lines of assembler -

mostly for 8-bit micros.
<<<<<

This reminds me of a discussion on CompuServe around 1985.
Several programmers were bragging about how many lines of code they'd
written in their lifetimes and who'd made the elusive "Million Line Club."
Back then I'd figured I'd written about 500,000 lines of code (mostly
6502 and 80x86) in assembly and, perhaps, another 100,000 lines in HLLs.
I wonder if I've reached the million line point yet?
I've probably written another 200,000 lines of assembly since
then, perhaps 100,000 lines of HLL code; but I haven't kept close
count.

This is one advantage of writing in assembly :-)
You reach the million line mark a lot faster!

John R. Mashey

unread,

Apr 28, 1994, 7:33:08 PM4/28/94

to

Rational people try to optimize whatever costs the most.
On general-purpose computers, that are likely to get upgrades, and have
long-lived generations of software, most people write most code in
COBOL, FORTRAN, C, C++, etc, or 4GLs, or the highest level they can get
away with. Sometimes they write assembler code because they are actually doing
things that are either difficult to do in a higher level language, or
executed so often that it's well worth it.

Sometimes the cost function has an infinity: IF this program fits in X bytes,
on some 8-bit micro (in on-chip ROM, perhaps), then this product costing
$10 to build has a market of millions. IF this program requires X+1 bytes
of memory, then it needs an off-chip PROM, and it uses too much power,
or it costs too much, or it doesn't fit in a credit card-size thing any more,
and it's not worth building ... hence the cost of that 1 byte might as well
be infinite.
This is not an argument that assembler is preferred, merely that project
domains exist where it is still plausible ... and likely to be so for a while.

-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: ma...@sgi.com
DDD: 415-390-3090 FAX: 415-967-8496
USPS: Silicon Graphics 6L-005, 2011 N. Shoreline Blvd, Mountain View, CA 94039-7311

Mike McCarty

unread,

Apr 28, 1994, 8:42:20 PM4/28/94

to

In article <1994Apr28....@cs.rochester.edu>,

Well, it's like this.

When you add hardware to the product, you MULTIPLY cost by number of
units shipped.

When you add software to the product, you DIVIDE cost by number of
units shipped.

For example, adding a $3.50 extra ROM to hold a larger image generated
by a compiler (instead of tighter assembly) to a board and shipping
10000 of them costs $35000.00

OTOH, spending $5000.00 extra to write in assembler (for extra
programming time) and shipping 10000 costs $0.50 per board.

So in this (hypothetical) example, the more software / less hardware
approach is 7 times as cost effective.
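
The arithmetic above, written out as a small C sketch. The function
names are made up for illustration, and the figures are the
hypothetical ones from this post.

```c
#include <assert.h>

/* Hardware cost scales with volume: per-unit cost times units shipped. */
double hardware_total(double per_unit, long units)
{
    assert(units > 0);
    return per_unit * (double)units;
}

/* A one-time software cost is amortized: total divided by units shipped. */
double software_per_unit(double one_time, long units)
{
    assert(units > 0);
    return one_time / (double)units;
}

/* How many times more the hardware route costs than the software route. */
double cost_ratio(double hw_per_unit, double sw_one_time, long units)
{
    assert(sw_one_time > 0.0);
    return hardware_total(hw_per_unit, units) / sw_one_time;
}
```

With a $3.50 ROM, $5000.00 of extra programming and 10000 boards, the
hardware total is $35000.00, the software cost per board is $0.50, and
the ratio is 7.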

See?

Mike

char *p="char *p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}

Mike McCarty

unread,

Apr 28, 1994, 8:45:21 PM4/28/94

to

In article <RANG.94Ap...@icicle.winternet.mpls.mn.us>,

Although I agree that assembly is often the language of choice for small
embedded systems, I do not believe that your argument can really be said
to support that concept.

What it _does_ support is the position that different languages are
useful for different parts of any given project.

All other things being equal, to handle the kinds of things you
mentioned (even in an embedded project) I would write a few access
routines in assembler, the rest in C (or similar) language.

Jim Nitchals

unread,

Apr 28, 1994, 7:50:05 PM4/28/94

to

fr...@phil.ruu.nl (Freek Wiedijk) writes:

>I don't know. For instance: is Apple's QuickTime extension
>hand optimized in assembly? Should it be? Or is it better
>left written in lisp?

Portions of the 68K version of QuickTime are written in highly
hand-optimized assembly; other portions are written in 68K C.

The PowerPC plug-in is written in C, but it was written with
great attention being paid to the code the compiler generated.
The 90/10 rule works well here, because a lot of the 90% of
the code that isn't executed often is actually emulated 68K
instructions. The 10% of the code that bottlenecks the system
is native PowerPC code, but didn't require any assembly language.

QuickTime may not be a 100% fair case to use, because it's highly
bandwidth limited by RAM (both video RAM and main memory) and
by disk I/O.

The incremental performance improvements of assembly seem to
be dwindling with RISC for many purposes. Pixel-pushing, for
example, tends to thrash caches and can be slow writing to video
RAM if there's bus contention with the video's needs. In
the one case where you thought every cycle was essential,
it now turns out that writing in C is as good as you can get.
The PowerPC can execute a lot of internal instructions while
letting its write cache empty.

Most of the stupidity I saw in old compilers had to do with
using memory instead of registers, or unnecessarily moving
registers around. With the PowerPC's architecture of <src1> op
<src2> -> <dest>, where dest can be different than either source,
the compilers don't need to be smart about managing register
resources. And with 32 registers, I see a lot less code
that actually has to put temporary variables out in RAM.

There are deficiencies in the C language for expressing rotates
and mask inserts, although a good compiler can figure out
the intent of convoluted code written to accomplish those
tasks.
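
A short C sketch of the "convoluted code" in question. C has no rotate
or mask-insert operator, so both are spelled out with shifts and masks;
many optimizing compilers recognize the rotate pattern and emit a
single rotate instruction, but that is not guaranteed. The function
names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate x left by n bits (0 <= n < 32) without undefined shifts. */
uint32_t rotl32(uint32_t x, unsigned n)
{
    assert(n < 32);
    /* (-n & 31) equals 32 - n for n in 1..31, and 0 when n is 0. */
    return (x << n) | (x >> (-n & 31u));
}

/* Copy the bits of src selected by mask into dst; keep the rest of dst. */
uint32_t mask_insert(uint32_t dst, uint32_t src, uint32_t mask)
{
    return (dst & ~mask) | (src & mask);
}

/* Rotate src left by n, then insert the masked bits into dst. */
uint32_t rotate_mask_insert(uint32_t dst, uint32_t src,
                            unsigned n, uint32_t mask)
{
    return mask_insert(dst, rotl32(src, n), mask);
}
```

The last function is the combination a processor with a rotate-and-
mask-insert instruction can do in one step; in C the compiler has to
rediscover that intent from the three separate operations.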

Times are definitely changing. Two years ago I swore never
to write code in C. Now the RS6000 C compiler is generating
code nearly as good as I can do by hand most of the time.
Maybe I'm just getting old :)

- Jim Nitchals
QuickTime engineering team, Apple Computer Inc.

--
---------------------------------------------------------------------------
"A lot of people say there's too much personal freedom. When
personal freedom's being abused, you have to move to limit it."
- President Clinton as quoted on MTV, April 19, 1994

Mike McCarty

unread,

Apr 28, 1994, 8:57:48 PM4/28/94

to

Forth? I prefer assembler to Forth. While I agree that it can be
impressively small (even smaller than assembler in some cases), and is
reasonably fast (1/3 to 1/2 as fast as assembler in some cases), the
LANGUAGE ITSELF has flaws which cause me not to want to use it.

C also has flaws, along with all other programming languages. But Forth
I just can't abide.

For one thing, RPN (there, I said it!).

For another, STACK ORIENTED (not the way I think).

Very weak error handling.

Very very weak information hiding.

Pushes a machine-oriented view of the problem onto the user. I prefer
object-oriented code. While this is not _precluded_ by Forth, it
certainly discourages it.

I know that there are Forth advocates out there who will jump in with
their religiously fervent arguments. I will not respond to them.

It is not implementation that I object to. ANY language can be
implemented by threading. It is SYNTAX. BLEAH!

If you disagree, well, everyone has the right to be wrong.

To paraphrase Dijkstra: "Forth Considered Harmful".

Preston Briggs

Apr 28, 1994, 2:46:56 PM

rh...@mirage.NoSubdomain.NoDomain (randy hyde) writes:
>amazing how long the Apple II continued to compete
>with 8088 PCs and 68K Macs mainly due to the fact
>that most serious Apple II apps were written in
>assembly while PC and Mac apps were written in Pascal
>or C. This easily added five years to the life of
>that machine.

But in the end, they died and the investment in those applications
died with them (that is, their value went to zero). Now consider a
serious application written in a portable language, say emacs, TeX,
unix, Smalltalk, compilers, database systems, ... Machines come and
go, but they continue to run. People are able to maintain and improve
them without fear of losing their investment of time. Users are willing
to devote time to learning them, since there's hope they'll be around
tomorrow. They can be sold cheaper, since developers can expect a
reasonable lifetime in which to recoup their investment.

Preston Briggs

Paul Dietz

Apr 28, 1994, 10:15:21 PM

In article <2pp73c$c...@sun001.dsccc.com>,
Mike McCarty <jmcc...@spd.dsccc.com> wrote:

>>Really? I would have thought the positive effect of more efficient
>>programs would be more than offset by the negative effect of longer
>>product development time.

>Well, it's like this.
...

>When you add software to the product, you DIVIDE cost by number of
>units shipped.

...
>See?

No shit. I wasn't talking about the direct cost of paying the
programmers. I was talking about the fact you lose market share
because your product comes out N months after the competitors', and
miss the early period when prices and margins are high. Or because
you cycle down the learning curve more slowly than they do.

Paul

Mike McCarty

Apr 29, 1994, 12:34:07 AM

In article <1994Apr28.2...@cs.rochester.edu>,

There has been a lot of conversation/hot air generated on this thread.
Your reply seems to add to the latter category.

I wasn't trying to talk down to you (although your response leads me to
believe that you may have thought so). You also seem to think that I
misunderstood your point. I believe I did not.

I guess that the bottom line to all this for me is that each language
has its strong points and its weak points. Compiled languages are often
not the appropriate vehicle for developing _small_ programs.

Trying to develop an embedded application which _must_ fit in 8K
because of cost can result in it taking _much longer_ to develop with a
compiled language than with assembler. I really _hate_ playing RAMcram
with a compiled language.

You seemed to be implying that development time would be _less_ with a
compiled language than with assembler.

So I think that "No shit." is a little bit strong to use. I was _not_
stating the obvious. I think that you may have missed a point here.

Alan Watson

Apr 29, 1994, 1:32:46 AM

In article <2ppklv$g...@sun001.dsccc.com>

jmcc...@spd.dsccc.com (Mike McCarty) wrote:
>Trying to develop an embedded application which _must_ fit in 8K
>because of cost can result in it taking _much longer_ to develop with a
>compiled language than with assembler. I really _hate_ playing RAMcram
>with a compiled language.

As a data point, see the quote in my signature.

--
Alan Watson | B can be thought of as C without types;
al...@oldp.astro.wisc.edu | more accurately, it is BCPL squeezed into
Department of Astronomy | 8K bytes of memory and filtered through
University of Wisconsin -- Madison | Thompson's brain. -- Dennis Ritchie

Chris Vick

Apr 28, 1994, 8:47:17 PM

While I have no doubts about the skills of the individuals in this
discussion, I find it hard to accept an argument which says that
people should not use compilers for high level languages because
the world's finest assembly language programmers could write
the application better in assembly. In a variety of embedded products
I have seen that the assembly code which is actually written by
average/normal programmers is little if any better than the code
produced by quality C compilers. Frequently the code is worse
because the compiler writers often have more and better information
about the quirks of the hardware (pipeline stalls resulting from
certain instruction orderings and the like) which are overlooked
by the applications programmers. Thus, while I will continue to write
some code in assembly (particularly critical special purpose codes on complex
architectures), I question the notion that this practice is of general utility to the
AVERAGE PROGRAMMER.

Chris
vi...@tools.micro.ti.com

Mark Lacey

Apr 29, 1994, 2:26:14 AM

to randy hyde

>>>>> "randy" == randy hyde <rh...@mirage.NoSubdomain.NoDomain> writes:

randy> I don't think there is anyone here who seriously proposes
randy> that we rewrite all programs in excess of 100,000 lines of
randy> code in assembly language. However, the truth is, *most*
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
randy> programs are not large. The average program today is
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
randy> (well) under 10,000 lines of code. Even those 1,000,000
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Exactly what programs are you talking about? Student projects or
something? Sorry, but MOST commercial programs are well over 10,000
lines of code. Even well written C programs that are developed with
efficiency in mind are often well over 10,000 lines of code (of course
how one measures lines of code is always a tricky subject).

Mark M. Lacey
la...@dsp.sps.mot.com

Anton Ertl

Apr 29, 1994, 11:41:08 AM

In article <2pp21c$b...@galaxy.ucr.edu>, rh...@mirage.NoSubdomain.NoDomain (randy hyde) writes:
|> Please explain the advantages of FORTH over asm! :-)

[in the context of embedded systems]

Interactivity. Portability.

Concerning interactivity, Forth development systems allow you direct,
interactive control of your embedded controller.

Concerning portability, currently this means that you can port your
programs from one controller to another, but have to stick with one
Forth vendor ("If you have seen one Forth, you have seen one
Forth"). However, the recently approved ANS Forth standard will
probably improve this situation ("For the first time, I have been able
to use the same program of more than 20 lines on more than one Forth
system").

|> Indeed, I find FORTH considerably more difficult to read
|> than assembly.

Perhaps you should become one of the "hackers, who understand the
difference between unreadability and illiteracy". :-)

I have seen Forth programs that are very hard to understand. I have
also seen well-written Forth programs. I find my Forth code much more
readable than my assembly code.

|> FORTH is often more compact at the expense
|> of speed -- the classic tradeoff.

This is true for the traditional threaded implementations. The
traditional answer is to use assembly for the hot spots. There are
also a number of compilers that produce native code, but AFAIK they
are mainly available in PC and workstation environments.
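
To make "threaded implementation" concrete, here is a minimal sketch, assuming a simplified model in which a compiled word is just a list (a "thread") of primitives, literals, and other threads. The names `run`, `dup`, `mul`, `add` and `SQUARE` are made up for this illustration and do not come from any particular Forth system.

```python
def run(thread, stack):
    """Inner interpreter: walk a thread, executing primitives and nested threads."""
    for word in thread:
        if callable(word):
            word(stack)          # primitive: runs native code directly
        elif isinstance(word, list):
            run(word, stack)     # colon definition: another thread to walk
        else:
            stack.append(word)   # literal: push the value
    return stack


# Primitives operate on the shared data stack.
def dup(stack):
    stack.append(stack[-1])


def mul(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a * b)


def add(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)


SQUARE = [dup, mul]                      # roughly ": SQUARE DUP * ;"
program = [3, SQUARE, 4, SQUARE, add]    # roughly "3 SQUARE 4 SQUARE +"
print(run(program, []))
```

The thread itself is compact (one cell per word), but every word pays for a dispatch through the inner interpreter, which is where the size-for-speed tradeoff comes from.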

If you have more questions about Forth, post to comp.lang.forth, which
is read by some embedded controller guys.

Concerning Mike McCarty, who claims that Forth discourages
object-oriented programming: This is simply not true. Forth does not
support it, but it certainly does not discourage it. Indeed it's a bit
better than some other languages in this respect. In other languages
(C, Pascal) you have to fight the type checking if you want to do
anything like object-oriented programming; in Forth this is not
necessary. There are a number of object-oriented variants of Forth
around (i.e. Forth systems with built-in support for object
orientation), e.g. Kevo, Mops, and Yerk.

- anton
--
M. Anton Ertl Some things have to be seen to be believed
an...@mips.complang.tuwien.ac.at Most things have to be believed to be seen

HALLAM-BAKER Phillip

Apr 29, 1994, 1:21:51 PM

Just did a search on current projects:

Files searched: 511 Buffered I/O count: 1047
Records searched: 99335 Direct I/O count: 846
Characters searched: 2658523 Page faults: 43
Records matched: 0 Elapsed CPU time: 0 00:00:08.46
Lines printed: 0 Elapsed time: 0 00:00:32.98

Ie about 100k lines (records). But this does not include different code
versions etc, nor UNIX code. Also I have another 50K lines stored offline.

This is all the sources. Each project generates about twice as much low
level code, mainly in C or occam.

Then I have a videogame written in assembler with 100K lines on its own...

Occam used to be great for numbers of lines.

--
Phillip M. Hallam-Baker

Not Speaking for anyone else.

Brad Kepley

Apr 29, 1994, 11:47:10 AM

In article <LACEY.94A...@zamboni.sps.mot.com> la...@zamboni.sps.mot.com (Mark Lacey) writes:
>Exactly what programs are you talking about? Student projects or
>something? Sorry, but MOST commercial programs are well over 10,000
>lines of code. Even well written C programs that are developed with
>efficiency in mind are often well over 10,000 lines of code (of course
>how one measures lines of code is always a tricky subject).

Embedded systems. R.Hyde said he was an embedded systems programmer. I'll
bet for embedded systems the *average* program might be under 1000 lines,
or is well on its way towards that given how cheap and easy it is to
put a PIC processor in a toaster or what have you. Maybe what you meant
was that what one calls a "real program" is a tricky subject.

--

randy hyde

Apr 29, 1994, 2:57:35 PM

<<<<<
Exactly what programs are you talking about? Student projects or
something? Sorry, but MOST commercial programs are well over 10,000
lines of code. Even well written C programs that are developed with
efficiency in mind are often well over 10,000 lines of code (of course
how one measures lines of code is always a tricky subject).
<<<<<<

Most programs are not commercial products.
For every Microsoft Word, there are a hundred little filter programs
which someone has written for personal use. If the world
of software development consisted only of commercial programs
everything would be written in assembly because we'd probably
have hundreds of programmers for every commercial program sold!

randy hyde

Apr 29, 1994, 3:26:33 PM

>>>>
I think Randy Hyde misses several different points.
<<<<

No, I quite understand these points (I do teach the compilers and
programming languages courses here at UCR, I would hope I understand
something about compilers).

To address your points:

#1: Writing an optimizing compiler is different than writing
an assembly language program...

Why would you think that I don't understand this? All I said
was that a compiler writer who does not know how to program
in assembly language will be a lousy compiler writer. I've
seen code output by various compilers that did not take
machine idioms into account. While the compiler did a good
job of peephole optimization and common subexpression elimination,
it generated lousy straight-line code. Fortunately, most
modern compilers have had someone who knew what they were doing
working on the native code generation. Consider GCC; I doubt
the original authors knew assembly on every machine it's running
on; fortunately, the chip manufacturers and other good asm
programmers have supplied the machine dependent stuff.

#2: Good compilers are only produced when costs justify it.
Exactly. So if you don't have a good compiler (e.g., for a
6811 or 8031) why use a poor compiler just because good
ones exist for other architectures? I simply stated that
"Gee, there is no good compiler for the 6811 and I need
to generate 6811 code. This is an example of why you
would want to use assembly." I cannot believe how many
people have sent me mail telling me how great the C
compilers are on their workstations. This doesn't help
me at all. I need to generate 6811 code. Why do so
many people have a hard time understanding this?

#3: Machines are changing.
Yes, they are. However, my 6811-based robot code will never run
on any other machine. It is the epitome of machine-dependent
code. The keyboard scanning code I've written in the past will
never be portable. About the time they decide to use a new
processor, the keyboards will be different as well (heck, they
have lots of different keyboards with the *same* processor and
given the system constraints, the code is not portable on the
same processor.) Someone mentioned that it won't be long before
we have 16K available. It's been a long time already and I don't
see any *cheap* 16K 6811 or 8031 or PIC parts on the horizon.
Further, if I can get my code into 4K, rather than 8K, there is
an important price difference. Yes, someone might supply me with
a 32-bit embedded processor with 16K RAM and 1M ROM in the future.
I might even use such a device on a project for which it is well
suited. However, I would *never* suggest to an employer that
they should do this so I can use C rather than assembly on the
embedded controller. The per-unit costs will almost surely
be higher.

#4: A compiler never forgets.
It doesn't learn, either. Yes, new compilers come out (I understand
that), but this doesn't help most people in the middle of a project.
I loathe having to change system software in the middle of a project.
It almost always makes things worse.

#5: People writing compilers did not fall off the turnip truck.
I never said compilers weren't good. Only that humans can do better
*if they are motivated to do so.*

People- Quit reading my comments as "no one should ever program in HLLs."
I am simply pointing out some areas in software engineering where assembly
is still the best choice for many applications and compilers will probably
never catch up. People keep giving me lots of examples where you would
not want to use assembly (e.g., 100,000 line programs). So what? *I* know
there are applications where you wouldn't want to use assembly. I know
several HLLs, VHLLs, and several assembly languages. Probably a lot more
than many in this conference (side note: yes *you* probably know more than
me, therefore both of us probably know more than most people here). In
particular, I am sick of people who only know C/C++ and, possibly, Pascal
trying to tell me that C is the only way to go. I could just as easily
argue, as some already have in this thread, that C is too low level and
everyone should be programming in a VHLL like Prolog, Icon, SNOBOL4, SETL,
FLEX/BISON, etc. I, like Terje, tend to choose the *best* language I know
for a given job. Best, as defined by the task requirements including
efficiency, life cycle, programming effort required, expected maintenance,
commercial viability of product, and so on. For some tasks, assembly is
the best choice. Let me give a non-embedded system application which I
am working on currently. I am working on a distributed shared memory system.
In order to improve performance, we are compressing the pages of data before
transmitting them. Alas, when the codec was written in C *on a workstation
with a good C compiler (gcc)* the cost of compression/decompression and
transmission of the data was greater than simply transmitting the uncompressed
data. Most researchers argued that this meant that compression was not
viable. However, I rewrote the codec in assembly and hand tuned it.
Now compression pays off (most of the time). Faster machines would not have
solved this problem. *Everything* would have executed faster except time
on the wire (which is only about 10% of the total) so that wouldn't have
bought us anything. Likewise, a better compiler would have sped everything
up. By writing the codec in assembly, it became viable. Of course, I could
have rewritten *everything* in assembly and made the codec non-viable again,
but this is a good example of where writing a short assembly program
(about 500 lines) improved system performance and there is probably no
way I could have achieved this in C.
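
The break-even condition behind the codec example can be written down as a small sketch. Everything here is a hypothetical illustration: the function name, the 4096-byte page, the 1,000,000 bytes/s wire speed, the 0.5 compression ratio and the codec timings are assumed numbers, not measurements from the system described above.

```python
def compression_pays_off(page_bytes, ratio, wire_bytes_per_s,
                         compress_s, decompress_s):
    """Return True when sending a compressed page beats sending it raw.

    ratio is compressed size / original size (0.5 halves the page).
    """
    raw_time = page_bytes / wire_bytes_per_s
    compressed_time = (compress_s
                       + (page_bytes * ratio) / wire_bytes_per_s
                       + decompress_s)
    return compressed_time < raw_time


# Assumed slow codec: 2 ms to compress, 1 ms to decompress.
print(compression_pays_off(4096, 0.5, 1_000_000, 0.002, 0.001))
# Assumed hand-tuned codec: 0.8 ms to compress, 0.4 ms to decompress.
print(compression_pays_off(4096, 0.5, 1_000_000, 0.0008, 0.0004))
```

With these made-up figures the raw page takes about 4.1 ms on the wire, so the slow codec (about 5.0 ms in total) loses and the fast one (about 3.2 ms) wins. Only the codec time changed between the two calls.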

David Chase

Apr 29, 1994, 2:34:35 PM

I think Randy Hyde misses several different points.

#1 Writing an optimizing compiler is a different task
than writing an assembly language program. A major
part of writing an optimizing compiler is figuring
how to get a computer to analyze a program, which is
utterly unlike assembly language programming. Once we
have that information, then things begin to look
more alike, though the human and the computer work
with different sorts of information about a program.

#2 Good compilers are produced only when the costs justify
it. The large numbers of workstations shipped, together
with the importance of price-for-performance, and benchmarks
that are not quite so easily hacked, has made it profitable
to pay attention to the performance of compiler-optimized code.

#3 Machines are changing. Old machines were designed with
a different idea of how they should be programmed. New
machines were designed (and will be designed) with knowledge
of transformations that a compiler can (or will) be able to
apply. Low-level parallelism and locality of reference
appear to be the name of the game for future machines, and
compilers do a decent job of exploiting those (in some cases,
such as software pipelining, they do a stupendous job). I've
tried to write some of this code by hand, and it is unbelievably
tedious (the number of lines of properly-pipelined fully-general
i860 assembly language that I can write per day is less than
10. I talked to someone once who did this for a living, and he
said that was his experience as well).

#4 A compiler "never forgets". Any trick I threw into the last compiler
that I worked on, is still there, and will be applied wherever the
compiler sees the opportunity. There are "tricks" that require
a certain amount of computation for their application anyway -- for
example, what if you need to multiply by a constant that is not
a power of two? What if you need to multiply a double-register
quantity by not a power of two? I cannot even remember all the
different branchless idioms that I coded into Sun's SC3.0 code
generator, but it does. It was my experience, working on the Rice
Vectorizer long ago, that humans had a pretty damn hard time figuring
out whether a given program had been correctly transformed or not,
which is an even stronger statement than "humans have a pretty damn
hard time performing the transformation".

#5 A corollary of #2 -- the people writing optimizing compilers for
workstation manufacturers did not just fall off the turnip truck.
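
The multiply-by-a-constant case in #4 can be sketched as follows. This is the plain shift-and-add decomposition, not the code from any particular compiler; real code generators typically also use subtraction (for example x*15 as (x << 4) - x) and pick the cheapest sequence. The function names are made up for the illustration.

```python
def shift_add_plan(constant):
    """List the shift counts whose shifted copies of x sum to x * constant."""
    if constant <= 0:
        raise ValueError("this sketch handles positive constants only")
    return [bit for bit in range(constant.bit_length())
            if (constant >> bit) & 1]


def multiply_by_constant(x, constant):
    """Multiply using only shifts and adds, following the plan above."""
    total = 0
    for shift in shift_add_plan(constant):
        total += x << shift
    return total


print(shift_add_plan(10))            # 10 is binary 1010: shifts 1 and 3
print(multiply_by_constant(7, 10))
```

The point of the "never forgets" argument is that a table or procedure like this, once in the compiler, gets applied to every constant at every call site without anyone having to remember it.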

yours,

David Chase, speaking for myself

randy hyde

Apr 29, 1994, 2:54:10 PM

>>>>>
No ****. I wasn't talking about the direct cost of paying the

programmers. I was talking about the fact you lose market share
because your product comes out N months after the competitors', and
miss the early period when prices and margins are high. Or because
you cycle down the learning curve more slowly than they do.

Paul
<<<<<

OTOH, you do get to learn from *their* mistakes :-)

Donald Lindsay

Apr 30, 1994, 4:11:01 AM

In article <2pp497$b...@galaxy.ucr.edu>,

randy hyde <rh...@mirage.NoSubdomain.NoDomain> wrote:
>Even for your barbie-doll example, there are superior
>alternatives. I've written several thousand lines in an assembler-
>alternative, PL/360, and it was a joy compared.
>
>Okay, I'll bite. What are the superior alternatives for a Barbie doll.

>While I have never used PL/360, I have used PL/M. While marginally

>lower level than C (i.e., I had access to the condition codes) it really
>is no better than C, ultimately. Finally, PL/M has the same problem
>as C -- no high quality compilers available for the 6811 (they do exist
>for the 8031, but the quality is questionable).

This should be in comp.compilers, but anyway:

OK, some history. PL/360 was invented in the late 60's as something
to write the Algol-W compiler in. This inspired at least two or three
PL's for the PDP-11 (eg LIL) and SPL. (IBM rewrote its mainframe OS
from assembler to SPL in the early 70's.) Then C and Pascal (etc)
started to replace assembler, and essentially killed off the PLs.

Intel came in on the tail end of the PL wave with PL/M, but neither
the language nor its implementation impressed anyone. This was partly
because it tried to make an 8080 look like something it wasn't (16
bit), an idea completely foreign to PL/360.

The essential PL idea was this: each construct generated exactly one
(or exactly two) instructions, and the programmer knew which one(s).
At all times, you were in control of what was being generated.
For example:

add r4, r5
add r4, r6
shl r4, 2

is instead

r4 := r5 + r6 shl 2;

which can be read left-to-right with less deciphering. Actually you
could define synonyms (local names) for the registers:

nlevels := tablelen + tableoffset shl elemsize;

and each PL had some syntax that suited its machine's addressing modes
and condition codes. PL/360 had a nice macro facility for specifying
parameterized bit patterns, so I could cope with irregular op codes:

compare(4,field(p+1),"quish");
if = then begin
set(anyparm);
set(smparm);
end
else

A machine with baroque addressing modes would complicate its PL's
syntax. However, on a simple machine, there isn't much you can do, so
the PL is correspondingly simple. PL/360 was so simple that its
compiler, written in itself, wasn't much over 3000 lines long. No
precedence, no type conversions, no optimization - this compiler held
the speed record, and was bug-free. If you want your own 6811 PL, you
should be able to get a slew by making it the term project in a
compiler writing course.
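
The "no precedence, one construct per instruction" idea is simple enough to sketch in a few lines. This is only an illustration under assumptions: the `translate` function, the token format and the mnemonics are hypothetical, and the initial move is written `mov` here, which is an assumption about how the first operand reaches the destination register. It is not PL/360's actual grammar or instruction set.

```python
def translate(statement):
    """Translate 'dst := a op b op c ...;' strictly left to right.

    No precedence: each operator becomes exactly one instruction on dst.
    """
    mnemonics = {"+": "add", "-": "sub", "shl": "shl", "shr": "shr"}
    dest, expr = statement.rstrip(";").split(":=")
    dest = dest.strip()
    tokens = expr.split()
    code = [f"mov {dest}, {tokens[0]}"]          # load the first operand
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        code.append(f"{mnemonics[op]} {dest}, {operand}")
    return code


for line in translate("r4 := r5 + r6 shl 2;"):
    print(line)
```

Because the mapping is one-to-one and purely left to right, the programmer can predict the emitted instructions just by reading the statement.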

Thatcher Ulrich

Apr 30, 1994, 5:35:53 AM

Mike McCarty (jmcc...@spd.dsccc.com) wrote:
: >
: >Really? I would have thought the positive effect of more efficient

: >programs would be more than offset by the negative effect of longer
: >product development time.
: >
: > Paul
: Well, it's like this.

: When you add hardware to the product, you MULTIPLY cost by number of
: units shipped.

: When you add software to the product, you DIVIDE cost by number of
: units shipped.

: For example, adding a $3.50 extra ROM to hold a larger image generated
: by a compiler (instead of tighter assembly) to a board and shipping
: 10000 of them costs $35000.00

: OTOH, spending $5000.00 extra to write in assembler (for extra
: programming time) and shipping 10000 costs $0.50 per board.

: So in this (hypothetical) example, the more software / less hardware
: approach is 7 times as cost effective.

: See?

Well, it's like this: getting to market 6 months later than your
competitor results in a reduction of sales volume to 1000 instead
of 10000. Your software ends up costing $5/board.

(All numbers pulled directly OOA to support my argument.)
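
For anyone who wants to replay the arithmetic from both posts, a small sketch (the function names are made up; the dollar figures are the hypothetical ones quoted above):

```python
def hardware_cost(extra_per_unit, units):
    """Total cost of a per-unit hardware addition."""
    return extra_per_unit * units


def software_cost_per_unit(extra_development, units):
    """Per-unit share of a one-time software cost."""
    return extra_development / units


# Original example: 10000 units shipped.
print(hardware_cost(3.50, 10000))            # extra ROM, total dollars
print(software_cost_per_unit(5000, 10000))   # extra programming, dollars per board

# Late-to-market variant: only 1000 units shipped.
print(hardware_cost(3.50, 1000))
print(software_cost_per_unit(5000, 1000))
```

At 10000 units the ROM costs 7 times the extra programming ($35000 vs. $5000). At 1000 units the ROM totals $3500, less than the $5000 of programming, so the conclusion flips with volume.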

-Thatcher

Henry G. Baker

May 1, 1994, 1:53:31 AM

In article <LACEY.94A...@zamboni.sps.mot.com> la...@zamboni.sps.mot.com (Mark Lacey) writes:

>Well, try implementing FFTs or filters on our (Motorola's) DSPs
>efficiently in anything but the DSP assembly language. Why can it be
>done so well in assembly language? Because of addressing modes and an
>architecture that are built for doing those types of things very
>efficiently, and very easily (there are bit-reversed and modulo
>addressing modes in the architecture).

There is another approach to this problem. On a number of
occasions, I have written a specialized compiler for a single
problem. These specialized compilers used to be called (assembly
language) _macros_, but with the gelded macro processors like CPP, I
doubt that many young programmers have seen a 'real' (i.e.,
Turing-universal) macro processor like those on the IBSYS assemblers
and 360 assemblers. More recently, I have utilized specialized
compiler/macros written in Lisp to generate fairly complex assembly
language code.

The key thing about these systems is that they work because they are
not a large, heavy-weight compiler. There is no parser, there is no
complex general-purpose, complete language compiler, but just a
simple, recursive walk of a simple computer generated expression. In
most cases, I generate code backwards, so I get the effect of a 2-pass
compiler from a single 'pass'. This approach has been used to
generate code for very complex pipeline operations with a compiler
program that was less than 20 pages long.

In this way, I was able to embed my detailed knowledge about writing
assembly code for this machine into a small program, so that if the
code didn't work, I could modify the program that produced it, rather
than performing global modifications on a large assembly program.
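
A minimal sketch of what such a lightweight generator might look like, assuming a tiny expression tree and a made-up three-operand instruction set. None of the names (`gen`, `compile_expr`, the mnemonics, the register names) come from the systems described above, and the sketch simply fails if an expression needs more scratch registers than the three it is given.

```python
def gen(expr, dest, code, free):
    """Emit instructions (in reverse order) that leave expr's value in dest.

    expr is a variable name or a tuple (op, left, right); free is a list of
    unused scratch register names. All mnemonics are illustrative only.
    """
    if isinstance(expr, str):
        code.append(f"load {dest}, {expr}")
        return
    op, left, right = expr
    tmp = free.pop()
    # The last instruction of this subtree is emitted first ...
    code.append(f"{op} {dest}, {dest}, {tmp}")
    # ... then the code that must run before it, operand by operand.
    gen(right, tmp, code, free)
    gen(left, dest, code, free)
    free.append(tmp)


def compile_expr(expr, dest="r0"):
    """Compile an expression tree and return the code in execution order."""
    code = []
    gen(expr, dest, code, ["r3", "r2", "r1"])
    code.reverse()   # flip the backwards listing into execution order
    return code


for line in compile_expr(("add", "a", ("mul", "b", "c"))):
    print(line)
```

Generating backwards means that by the time an operand's code is emitted, the generator already knows which register its consumer wants the value in. A forward generator would need a second pass to learn that.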

It is a shame that a side-effect of modern compiler courses is to
convince most of the students that "Oh, _I_ could never do all that".
Well, perhaps you don't have to "do _all_ that".

(One caution: if you take this approach, you should probably not
mention the "c" (compiler) word to your boss. He'll probably get
cold feet and take you off the project, because he has also been
convinced that compilers are large and expensive projects.)

Amos Shapir

May 1, 1994, 11:40:57 AM

ter...@hda.hydro.com (Terje Mathisen) writes:

>>> > If you have a counter-example though, I'd love to see it!
...
>The important thing here though is that all the speedups, except for the last
>doubling are due to better algorithms and data structures. I don't think a
>compiler will ever use BL and BH as both independent 8-bit registers and
>combine them as a 16-bit index. The index is also used both in little- and
>big-endian mode, as I alternate between BL and BH for the current state.

BL and BH, how quaint! (to paraphrase a scene in one of the Star Trek
movies).

A case in point is National Semiconductor's 32764 Swordfish -- it has
two integer arithmetic units, and the compiler schedules instructions
between them; the listing it produces is generated in two columns,
with sequence numbers to indicate the original order of instructions.
Such code can be tweaked by a human being (after all, a human wrote
the compiler in the first place) but so much state has to be kept in
mind, it's usually not worth the bother.

I agree there are a lot more 86-type machines around than such
two-headed monsters, and will be for quite some time, but the trend is
obvious.
--
Amos Shapir Net: am...@cs.huji.ac.il
Paper: The Hebrew Univ. of Jerusalem, Dept. of Comp. Science.
Givat-Ram, Jerusalem 91904, Israel
Tel: +972 2 585706,586950 GEO: 35 11 46 E / 31 46 21 N

Urs Hoelzle

May 1, 1994, 4:14:28 PM

lind...@cs.cmu.edu (Donald Lindsay) writes:

>OK, some history. PL/360 was invented in the late 60's as something
>to write the Algol-W compiler in. This inspired at least two or three
>PL's for the PDP-11 (eg LIL) and SPL. (IBM rewrote its mainframe OS
>from assembler to SPL in the early 70's.) Then C and Pascal (etc)
>started to replace assembler, and essentially killed off the PLs.

[description of PL/360 omitted]

Just to add a little bit of trivia: and the designer and implementor of
PL/360 was...?

...Niklaus Wirth, that's right. (Despite "Real Programmers Don't Use
Pascal", I guess he was a Real programmer :-)

-Urs

----------------------------------------------------------------------------
Urs Hoelzle u...@cs.stanford.EDU
Computer Systems Laboratory, CIS 57, Stanford University, Stanford, CA 94305

luis fernandes

May 1, 1994, 4:22:25 PM

In article <CozEI...@cs.cmu.edu>
lind...@cs.cmu.edu (Donald Lindsay) writes:

In article <2pnbbt$4...@galaxy.ucr.edu>,
randy hyde <rh...@mirage.NoSubdomain.NoDomain> wrote:
>The student spent three weeks dealing with library
>bugs before completely giving up.

We *were* talking about "good" compilers, which do exist in plenty
elsewhere.

Exactly! Someone got this free 6811 C compiler off the net, that
someone crufted together in his spare time, and we found we were
spending far too much time helping people trying to coerce the code
to make the compiler understand something trivial like:

#define SOMECHAR 'x'
/* ... */
if(c==SOMECHAR) foo();

without casting SOMECHAR with some obscenely horrendous construct.

Then we spent something like $1000 to get this really fabulous
compiler and everything is beautiful once again.

We even have one adventurous type who writes C++ code, uses cfront
(from Sun's C++ compiler) to generate C, and then compiles the C
(with minor touch-ups for idiosyncrasies) for an embedded system
(his model-railroad controller)!

-lf

Terje Mathisen

May 2, 1994, 6:41:25 AM

In <amos.767791874@hazard>, am...@cs.huji.ac.il (Amos Shapir) writes:

>ter...@hda.hydro.com (Terje Mathisen) writes:
>>The important thing here though is that all the speedups, except for the last
>>doubling are due to better algorithms and data structures. I don't think a
>>compiler will ever use BL and BH as both independent 8-bit registers and
>>combine them as a 16-bit index. The index is also used both in little- and
>>big-endian mode, as I alternate between BL and BH for the current state.
>
>BL and BH, how quaint! (to paraphrase a scene in one of the Star Trek
>movies).
>

"how quaint!" I like your quote!

However, I gave this example just to show that there are some ideas that
compilers never, ever, will be able to use. In this particular task, the
ability to use the two low bytes in a register both independently and
together, is the single key to the last 50-100% speedup of my code.

Do you suggest that my code is somehow invalid because it uses a "quaint"
capability only available on a majority of all computers ever sold? :-)

On a RISC machine, the fastest alternate solution I've found is to shift the
current state value down and add in the new state, read in from a 64K
lookup table. This approach _will_ need one or two more instructions per
iteration of the unrolled loop though, which is costly when the inner loop
is down to three cycles/four instructions.

-Terje