
Optimizing in assembly language


Randall Hyde

Mar 1, 2001, 2:34:42 AM
As usual, the thread on "compiler's optimization vs. hand written
assembly" has flared up with lots of opinions (mostly dealing with
software engineering issues rather than dealing with the simple
predicate "is it possible to write faster code in hand written
assembly?")

Clearly, given the necessary resources (time, skill, etc.), an expert
assembly programmer *can* beat a compiler for a given *instance* of the
program. However, the software engineers rightly point out the
following problems:

1) The assembly code will only be optimal for one processor (e.g.,
moving from a PIII to a PIV breaks the optimizations). Recompiling
the HLL code (assuming a high quality compiler is available, it may
not be) generates new optimized machine code with very little effort.
The assembly code will have to be rewritten. Sometimes the assembly
code (on the older processor) is *still* better, but this is becoming
less and less likely as processors change (e.g., PII to PIII was okay,
but PIII to PIV was a whole different ball game).

2) Highly optimized assembly language code is the stuff that legends
(and case studies) are made of. Simple assembly language isn't
difficult to understand, but optimized assembly code (like highly
optimized machine-specific code in general) is very difficult to
understand. Therefore, the likelihood of a large number of defects is
higher in optimized code (vs. simple code) *in any language*, but
certainly assembly.

3) Very few software modules are static; they require maintenance and
change. Highly optimized assembly code is often "sequence dependent"
and cannot be changed without rewriting large, unrelated (to the
change) sections. The result is often less-than-optimal code because
either the programmer doesn't change it or the changes break the
optimizations.

4) Very few people are good enough at assembly language to write the
kind of code that can consistently beat compilers.

5) etc. (fill in the blanks with your favorite argument against assembly.)

It *is* possible to write machine-specific optimized code in HLLs.
The Berkeley strings library for C stands out as a prime example of
code that does really well on 32-bit machines. However, the resulting
code has nearly all the defects of assembly language (portability to a
different CPU is about the only advantage). The optimizations may
very well break with a new version of the compiler and they may also
break with new CPUs.

Basically, it boils down to this: compilers are really great at
bookkeeping but lousy at dealing with the "big picture" (global
optimizations notwithstanding). Humans (the "assembly coder") are
really great at the big picture and lousy with the details.

So here's my question: why can't we write an "optimizing assembler"
that lets the programmer specify machine sequences and the assembler
does the same kinds of "book keeping" that a HLL compiler would do.
Sure, optimization is a bit more difficult at this level and you'd
have to allow the programmer to disable the optimizations on a
statement by statement basis (since sometimes the programmer really
does know better). This would allow the programmer to do what they
know best while taking advantage of the current state of the art in
optimization technology.

Of course, the real problem is "who would be willing to write such a
thing?" (and maintain it.) But the idea seems interesting to me. Let
the machine choose the best instruction organization given a sequence
in machine code. Let the machine rearrange registers in instructions
to make better use of them (and move memory accesses into registers,
if needed). Let the machine eliminate dead code. Let the machine do
code movement to improve performance.

While some assembly programmers might find this whole process
offensive ("I don't want a compiler touching my beautiful code."), it
would allow "less than expert" assembly programmers to at least match,
and usually better, the code emitted by a typical HLL compiler.

Many of the software engineering problems would remain (do you really
want to write a large amount of code in assembly?), but some of the
issues, like the appearance of new CPUs with their own idiosyncrasies,
would diminish or go away entirely.

Randy Hyde

Jim Granville

Mar 4, 2001, 1:39:08 AM
Randall Hyde wrote:
>
<snip Summary>

> So here's my question: why can't we write an "optimizing assembler"
> that lets the programmer specify machine sequences and the assembler
> does the same kinds of "book keeping" that a HLL compiler would do.
> ...

> Of course, the real problem is "who would be willing to write such a
> thing?" (and maintain it.) But the idea seems interesting to me. Let
> the machine choose the best instruction organization given a sequence
> in machine code. Let the machine rearrange registers in instructions
> to make better use of them (and move memory accesses into registers,
> if needed). Let the machine eliminate dead code. Let the machine do
> code movement to improve performance.

I am not sure exactly what you mean by 'Let the machine choose'?

Doing this ON-CHIP is probably too costly & risky in silicon, but the
pathway taken by the Crusoe Chip, and being picked up by AMD looks
very promising.

They do a more extensive task of morphing, but the idea of an
'Optimise for Core' software layer would seem to meet your target.

Transmeta must have solved the problem of 'morphing data tables by
mistake' :-) but I have not seen reports of how they cope with the most
paranoid self-modifying code often found in dongles?

Given the huge expense in silicon engineering to rack up +20% in speed,
some smart software would seem an ideal answer, and would be readily
funded by the chip makers, as it would allow their newest releases to
shine, and avoid that serious SW lag problem.

I did see an interesting release from Transmeta claiming some ??%
improvement in their latest morphing SW release.
You can see scope here to 'spot and replace' some legacy code in
common operating systems, and improve the operation.

-jg

Shankar Unni

Mar 4, 2001, 1:54:32 AM
Randall Hyde wrote:


> So here's my question: why can't we write an "optimizing assembler"
> that lets the programmer specify machine sequences and the assembler
> does the same kinds of "book keeping" that a HLL compiler would do.

It's not such a wild-assed thought.

The key would be to wrap the assembler instructions in some high-level
construct that allowed you to match up the input and output memory
operands to concrete symbolic memory locations (stack or heap or
global variables), so that you could intermix assembly and C (HLL)
code, and have the two be optimized together.

GCC has a notation like this, except that it doesn't actually touch
the assembly sections, but merely uses those notations to decide what
side-effects a snippet of assembly code has. But it's not all that
hard to extend the model to also go in and try to fiddle with the
instructions themselves.

I also know of several attempts at "optimizing assemblers" (IIRC, the
HP-PA assembler used to have an "optimization" flag that attempted to
do some basic instruction rearranging to achieve reasonable
pipelining and reduce stalls).

Also, the MIPS assembler tried (for the R3000) to ensure that you
didn't have any hazards (e.g. register write-read) in the code
(because the architecture didn't have any such protections).
--
Shankar Unni su...@speakeasy.net
[Quite a while ago I read a paper about a Bell Labs language called LIL
that was intended as a level between C and assembler. They eventually
declared it a failure because in every case where they thought it would
be useful, it turned out to be easier overall to make their C compiler
emit better code and write in C. -John]


Tzvetan Mikov

Mar 8, 2001, 12:31:59 PM
Randall Hyde <rh...@transdimension.com> wrote in message

> So here's my question: why can't we write an "optimizing assembler"
> that lets the programmer specify machine sequences and the assembler
> does the same kinds of "book keeping" that a HLL compiler would do.

I did something similar in an experimental version of an 8051 C
compiler which I had developed as a student. It was a very simple
design, practically no register allocation, only ad-hoc optimizations
in the code generator. In order to produce at least somewhat nice code
I had implemented some simple optimizations on the generated assembler
instructions - remove chained jumps, convert long jumps to short ones,
replace jump to a ret with ret, eliminate the basic cases of
unreachable code (only the cases when the code doesn't jump to
itself), eliminate immediately obvious redundant loads to the
accumulator. Not much, but at least distinguished it from SmallC and
the likes ...

To do anything reasonable with my compiler one had to use really *a
lot of* inline assembler, had to manually pass function parameters in
registers, etc. In the end it was being used like a convenient macro
processor around the inline assembler ... :-)

A few years ago I had to revamp the compiler which meanwhile had been
used for a few projects. So, I added value numbering within basic
blocks. Because of the (ridiculously) bad design of the rest of the
compiler the optimizer couldn't use any high-level information (like
the location of temporaries on the stack, etc). It had to operate only
with machine registers and byte-sized memory locations, so it was
practically independent from anything else and able to work on
arbitrary 8051 assembler code. The interesting part was allowing it to
optimize the inline assembly statements (which actually happened by
mistake) - I got very encouraging results. Most of the improvement in
the output came from better "gluing" of the assembler code with the
surrounding C code - the redundant stores putting things in
temporaries just to load them into a register in the assembler could be
removed without having to complicate the inline assembly syntax the
way GCC or Watcom C do. In a few cases the optimizer actually managed
to remove instructions within the inline assembler - that naturally
made me enormously proud (although I had written the sub-optimal
assembler code in the first place :-).

I am not aware of any other compiler performing such optimizations,
but I think it could be very useful and not terribly hard to
implement.

-tzvetan

Jonathan Guthrie

Mar 8, 2001, 12:30:51 PM
Randall Hyde <rh...@transdimension.com> wrote:
> So here's my question: why can't we write an "optimizing assembler"
> that lets the programmer specify machine sequences and the assembler
> does the same kinds of "book keeping" that a HLL compiler would do.

Microware's 6809 assembler for OS-9 in the early 80's could run a
peephole optimizer on the input file. It didn't work all that well,
though, and reduced the size of the code, in my dimly-remembered
experience, more than the run time.

T.Shackell

Mar 8, 2001, 12:28:21 PM
Randall Hyde <rh...@transdimension.com> wrote:

> 4) Very few people are good enough at assembly language to write the
> kind of code that can consistently beat compilers.

True.

> It *is* possible to write machine-specific optimized code in HLLs.
> The Berkeley strings library for C stands out as a prime example of
> code that does really well on 32-bit machines. However, the resulting
> code has nearly all the defects of assembly language (portability to a
> different CPU is about the only advantage). The optimizations may
> very well break with a new version of the compiler and they may also
> break with new CPUs.

*nods* C is a low level language.

> Basically, it boils down to this: compilers are really great at
> bookkeeping but lousy at dealing with the "big picture" (global
> optimizations notwithstanding). Humans (the "assembly coder") are
> really great at the big picture and lousy with the details.

compilers compile programs from 'the inside', therefore they *can* be
beaten by a human who sees the code from a higher level.

> So here's my question: why can't we write an "optimizing assembler"
> that lets the programmer specify machine sequences and the assembler
> does the same kinds of "book keeping" that a HLL compiler would do.
> Sure, optimization is a bit more difficult at this level and you'd
> have to allow the programmer to disable the optimizations on a
> statement by statement basis (since sometimes the programmer really
> does know better). This would allow the programmer to do what they
> know best while taking advantage of the current state of the art in
> optimization technology.

I think a better approach and one that is currently used is to
put little *chunks* of hand optimized ASM into a higher level language.
This has many advantages over the scheme you suggest.
- coders on the whole can deal with a smaller number of lines of code
- they can focus their optimizations to key areas
- most of the code is very portable only the ASM bits are not
- small chunks of assembler are easier to understand than a whole program
at an intermediate level.
- the programmer can see what the compiler is doing by simply reverse
engineering their own HLL code.

In most cases speed is not so critical as to require *any* part of a
program to be reduced to ASM, let alone all of it! The uses for ASM
that I have found have mainly been restricted to optimizing inner
loops that the CPU spends a huge amount of time in. Here writing the
rest of the code in a higher level language and embedding blocks of
assembly for critical points is clearly the right solution.

Using ASM is a last resort; finding a better algorithm is always the
better way to go.

If *all* your code needs to be faster I would suggest you either:
- use a better algorithm, or
- raise the target spec of the machine to run it on

> Of course, the real problem is "who would be willing to write such a
> thing?" (and maintain it.) But the idea seems interesting to me. Let
> the machine choose the best instruction organization given a sequence
> in machine code. Let the machine rearrange registers in instructions
> to make better use of them (and move memory accesses into registers,
> if needed). Let the machine eliminate dead code. Let the machine do
> code movement to improve performance.

.. or compile the section of code you're interested in and reverse
engineer it; I have done this on several occasions.

> While some assembly programmers might find this whole process
> offensive ("I don't want a compiler touching my beautiful code."), it
> would allow "less than expert" assembly programmers to at least match,
> and usually better the code emitted by a typical HLL compiler.

Less than expert assembly programmers are not likely to be writing
programs that need the extra speed. If they do then they should learn
optimized assembly; most people will never need to.

> Many of the software engineering problems would remain (do you really
> want to write a large amount of code in assembly?), but some of the
> > issues, like the appearance of new CPUs with their own idiosyncrasies,
> would diminish or go away entirely.

many of the software engineering problems would be *worse*: instead of
having mostly HLL with small amounts of lower level languages mixed
in, you now have a medium-level soup. This could be very hard to
maintain.

It's a nice idea but I think what is currently done is more sensible.

Tom Shackell

Laurent Desnogues

Mar 8, 2001, 12:31:29 PM
Randall Hyde wrote:
> So here's my question: why can't we write an "optimizing assembler"
> that lets the programmer specify machine sequences and the assembler
> does the same kinds of "book keeping" that a HLL compiler would do.
> Sure, optimization is a bit more difficult at this level and you'd
> have to allow the programmer to disable the optimizations on a
> statement by statement basis (since sometimes the programmer really
> does know better). This would allow the programmer to do what they
> know best while taking advantage of the current state of the art in
> optimization technology.

This is kind of done for the C6x DSP processors. The C6x are VLIW
(8-way) and the "optimizing assembler" does software pipelining and
schedules instructions. I am not so sure about all that is done at the
assembly level, but it's a good starting point, and to the best of my
knowledge it's the first time such a tool has been commercially available.

Oh, BTW, I am not trying to sell you anything ;)

Regards,

Laurent

A Johnstone

Mar 8, 2001, 1:17:17 PM
:> So here's my question: why can't we write an "optimizing assembler"
:> that lets the programmer specify machine sequences and the assembler
:> does the same kinds of "book keeping" that a HLL compiler would do.

Slightly off topic, but a year or two ago we wrote some reverse
assembly tools for a DSP
(http://www.cs.rhul.ac.uk/research/languages). One of the side effects
of this work was an exhaustive dataflow analysis of the human-written
assembler source, and this identified all registers that were in use
in a function, and more importantly all registers which were live
across the sum of all function calls for a particular function. This
allowed register mis-use to be easily identified.

I haven't written assembler in real anger for a few years, but a tool
like that would have saved many programmer days of effort. I wonder
why assemblers don't come with good dataflow analysis tools?

Adrian

--
Dr Adrian Johnstone, Senior Lecturer in Computing, Computer Science Dep,
Royal Holloway, University of London, Egham, Surrey, TW20 0EX, England.
Email a.joh...@rhul.ac.uk Tel:+44(0)1784 443425 Fax:+44(0)1784 439786

Randall Hyde

Mar 10, 2001, 3:54:22 PM

"T.Shackell" <t...@ukc.ac.uk> wrote in message

> I think a better approach and one that is currently used is to
> put little *chunks* of hand optimized ASM into a higher level language.
> This has many advantages over the scheme you suggest.

Alas, it confuses the HLL compiler's optimizer to no end, and often the
assembly interface is at the procedure-call level, which destroys
what little optimization you gain in assembly. I've had much better
luck with slightly larger chunks of assembly code (generally, I
replace at the module level rather than the procedure level).

> - coders on the whole can deal with a smaller number of lines of code

Of course, I have several examples where it took *fewer* lines of code
to express the algorithm in assembly than in C/C++, so I don't
completely buy the party line that HLLs are always more efficient
because it takes fewer lines of code to express an algorithm. In
general this may be true, but code isn't "generally" written in
assembly so this is a dangerous statement to make about a specific
problem.

> - they can focus their optimizations to key areas

This is language independent.

> - most of the code is very portable only the ASM bits are not

Of course, splicing in lots of in-line assembly makes the code *less*
portable than putting the assembly all in one module.

> - small chunks of assembler are easier to understand than a whole program
> at an intermediate level.

Agreed. However, I'd suggest that a given algorithm, implemented in
the appropriate language for that algorithm, is going to be the
easiest to read and understand. Sometimes assembly is the most
appropriate implementation language for a given algorithm (which is
why some of my assembly code contained fewer lines than the C/C++
code; assembly was more appropriate in that case).

> - the programmer can see what the compiler is doing by simply reverse
> engineering their own HLL code.
>
> In most cases speed is not so critical as to require *any* part of a
> program to be reduced into ASM let alone all of it! The uses for ASM
> that I have found have mainly been restricted to optimizing inner
> loops that the CPU spends a huge ammount of time in. Here writing the
> rest of the code in a higher level language and embedding blocks of
> assembly for critical points is clearly the right solution.

I must say that I rarely attempt to write fast assembly code.
Most of the time that I use assembly it's because:
(1) It is the most appropriate language for implementing the algorithm.
(2) I need absolute control over the execution sequence (e.g., order of
evaluation in an arithmetic expression with side effects).
(3) I'm doing it for fun.

Writing optimal code in *any* language is hard work. The best
optimizing compiler in the world isn't going to make up for sloppy
coding practices (even when using the best underlying algorithm).

Given that I don't pay attention to things like instruction
scheduling, register allocation, etc. (i.e., the things that
optimizers do a very good job of), it's quite amazing that my code
typically runs on par with the output of a compiler. But my assembly
code could probably run faster if it were run through some sort of
optimizer; hence my original question.

> Using ASM is a last resort, finding a better algorithm is always the
> better way to go.

Using any optimization technique is a last resort to finding a better
algorithm. Alas, better algorithms are usually difficult to come by
unless you've written some really stupid code to begin with; which, of
course, probably describes 80% of the code that's out there :-(.

> If *all* your code needs to be faster I would suggest you either:
> - use a better algorithm, or
> - raise the target spec of the machine to run it on

What happens when you're using the best known algorithm running on the
fastest machine you can afford?

Seriously, though, a factor of two's difference in running speed can
mean the difference between success and failure in the commercial
marketplace. Raising the target spec is cheating. If the
specifications call for a response in xxx.yyy seconds on a 500 MHz
machine, you don't meet specs by changing this to a 1GHz machine.

As we're seeing today, the average consumer has stopped buying the
fastest machine available, so the expectation that next year's
machines will be fast enough isn't a good bet anymore. Given the
troubles that CPU manufacturers are having boosting the speed of their
uniprocessor systems recently, I'd suggest that the days of this free
ride may be coming to a close.

> less than expert assembly programmers are not likely to be writing
> programs that need the extra speed. If they do then they should
> learn optimized assembly, most people will never need to.

There are two problems with this approach. First, why shouldn't the
machine be put to use to help out those who need to use assembly for
one reason or another (e.g., they need the control or space savings,
speed is a secondary consideration)?

Second, the claim that assembly is not portable is, strictly
speaking, not true. An 80x86 assembly program will run on many
different CPUs, say the Pentium, the Pentium II/III, the Pentium IV,
the Athlon, etc. As HLL proponents are fond of pointing out, code
that is optimized for one of these processors may be suboptimal on
another. With a HLL, all it takes is a recompile (plus, of course, a
new compiler) and your code is running fast on the other processor(s).
Why shouldn't assembly language programmers who need to work in
assembly language be able to do the same thing?

> > Many of the software engineering problems would remain (do you really
> > want to write a large amount of code in assembly?), but some of the
> > issues, like the appearance of new CPUs with their own idiosyncrasies,
> > would diminish or go away entirely.
>
> many of the software engineering problems would be *worse*, instead of
> having mostly HLL with small ammounts of lower level languages mixed
> in you now have a medium level soup. This could be very hard to
> maintain.

You're assuming that such an enabling technology would encourage more
people to use assembly language rather than a HLL and that they would
code in assembly what they would otherwise code in C. While some of
this might happen, I just don't see this being a big deal. And in the
cases where it did happen, it would probably happen because there were
solid engineering or personal reasons for doing so. I could be wrong,
but I doubt that an optimizing assembler would spark a renaissance in
assembly language programming. It would simply be a tool that would
make maintenance of assembly code much easier (since, for example, you
wouldn't have to maintain different versions of the code for different
processors that have different optimization characteristics).


> It's a nice idea but I think what is currently done is more sensible.

No optimizations at all?
Force someone to use an inappropriate language because they can't
maintain the assembly code across all CPUs in a family?
I'm not sure I buy the line that "just use a HLL, it's better in the
long run." I'd rather let the software's designer weigh all the
options and make that decision on a case-by-case basis
as it pertains to their projects.
Randy Hyde

Scottie

Mar 10, 2001, 4:01:13 PM
> So here's my question: why can't we write an "optimizing assembler"
> that lets the programmer specify machine sequences and the assembler
> does the same kinds of "book keeping" that a HLL compiler would do.

IIRC DEC/MIPS used at least a bit of this approach. There were some
problems, the most obvious of which were related to the use of memory
as semaphores. Code motion to improve memory bandwidth removed full
protection of critical sections.

    ...
    while (lock != 0)
        ;
    lock = 1
    ...update data with multiple instructions...
    lock = 0
    ...

Dead store elimination, if implemented, can remove the locks on the
resources above entirely, leaving code like:

    ...
    while (lock != 0)
        ;
    ...update data with multiple instructions...
    lock = 0
    ...

or even (on discovering the protection loop always exits with lock == 0):

    ...
    while (lock != 0)
        ;
    ...update data with more than one indivisible instruction...
    ...

A "clever" assembler may notice the "lock=1" code is useless because
lock is not examined anywhere in the update, so it may simply
eliminate the lock = 1 altogether. Often this is right, but if we are
concerned with multiple threads viewing the data, the optimization is
wrong. It is tough to determine which bits of assembly are
intentional and which are artifact, and I wouldn't trust an assembler
to do it myself.

Scott David Daniels

Tom Payne

Mar 12, 2001, 2:37:04 AM
Randall Hyde <rh...@transdimension.com> wrote:
[...]
: Clearly, given the necessary resources (time, skill, etc.), an expert
: assembly programmer *can* beat a compiler for a given *instance* of the
: program.

It's not clear to me. For a finite machine, given sufficient
resources, a programmer and a compiler could each find an optimum
(i.e., unbeatable) "instance" of the program.

: So here's my question: why can't we write an "optimizing assembler"
: that lets the programmer specify machine sequences and the assembler
: does the same kinds of "book keeping" that a HLL compiler would do.
: Sure, optimization is a bit more difficult at this level and you'd
: have to allow the programmer to disable the optimizations on a
: statement by statement basis (since sometimes the programmer really
: does know better).

AFAIK, all low-level language translators (attempt to) decompile
programs in their source language into some sort of intermediate code
that can be optimized and code-generated into the target language.
AFAIK, such systems don't handle self-modifying code and tend to have
problems with indirect jumps.

Tom Payne

Randall Hyde

Mar 14, 2001, 12:11:54 AM
"Tom Payne" <t...@hill.cs.ucr.edu> wrote in message

> Randall Hyde <rh...@transdimension.com> wrote:
> [...]
>: Clearly, given the necessary resources (time, skill, etc.), an expert
>: assembly programmer *can* beat a compiler for a given *instance* of the
>: program.
>
> It's not clear to me. For a finite machine, given sufficient
> resources, a programmer and a compiler could each find an optimum
> (i.e., unbeatable) "instance" of the program.

Touché.
However, very few real-world programs are optimal. So I must amend
my claim to "the typical case".

> AFAIK, all low-level language translators (attempt to) decompile
> programs in their source language into some sort of intermediate code
> that can be optimized and code-generated into the target language.
> AFAIK, such systems don't handle self-modifying code and tend to have
> problems with indirect jumps.

These same problems exist in HLLs (yes, people do write self-modifying
code in HLLs, I've helped many a person do this).

Clearly to pull off an optimizer in assembly language, the assembly
must be made a little richer than the traditional variety. You need
the ability to declare "volatile" locations (so the critical region
"optimization" mentioned earlier in this thread doesn't occur)
and it would help if one could provide hints in the language
about the addresses a pointer may contain. Other hints,
like "pure" functions, etc., would be quite useful.

As to the intermediate language, I *have* seen research that attempts
to reduce machine code to a set of basic micro-operations in an
attempt to translate the machine code to a different processor. While
this research has yet to produce anything practical, clearly this is
an optimization problem and the same techniques could be applied to an
optimizing assembler (in which case the target processor would be the
same as the source processor).

Randy Hyde

Paolo Bonzini

Mar 22, 2001, 1:17:45 AM
> Transmeta must have solved the problem of 'morphing data tables by
> mistake':-) but I have not seen reports of how they cope with the most
> paranoid self modifying code often found in dongles ?

Both solutions are simple:
1) the code looks at x86 data, not at translated data; nobody knows
about the translated data, in fact. And the morpher never morphs
unreachable paths (being unreachable is a transitive relationship).

2) paranoid self-modifying code will be slow; Transmeta says the
morpher will spend less time working on paranoid code, possibly
by interpreting it instead of compiling it. The problem is that JITs
often create code that the morpher will think of as self-modifying
(for example JITs that cache native code as it is needed,
instead of compiling everything at startup).

Eric Boon

Mar 22, 2001, 1:21:59 AM
Scottie wrote:

That problem can be overcome easily by qualifying 'lock' as a
'volatile' variable, a variable whose value can be changed
externally. That's very useful for e.g. memory-mapped I/O,
semaphores :-), etc. Anything a program does to a volatile variable
should be left untouched by an optimizer.

Eric

Tom Payne

Mar 22, 2001, 1:19:45 AM
Randall Hyde <rh...@transdimension.com> wrote:
: "Tom Payne" <t...@hill.cs.ucr.edu> wrote in message

:> Randall Hyde <rh...@transdimension.com> wrote:
:> [...]
:>: Clearly, given the necessary resources (time, skill, etc.), an expert
:>: assembly programmer *can* beat a compiler for a given *instance* of the
:>: program.
:>
:> It's not clear to me. For a finite machine, given sufficient
:> resources, a programmer and a compiler could each find an optimum
:> (i.e., unbeatable) "instance" of the program.

: Touché.
: However, very few real-world programs are optimal. So I must amend
: my claim to "the typical case".

Since we are talking about programs for some given finite
architecture, every real-world program has an optimum equivalent. For
resource-constrained program optimization, the current situation is
similar to man-vs.-machine chess of, say, 20 years ago. Each year,
however, it becomes more difficult for humans to continue winning, and
it is not clear to me that even the best humans will always be able to
win in chess or program optimization.

[...]
: Clearly to pull off an optimizer in assembly language, the assembly
: must be made a little richer than the traditional variety.

Also, some restrictions would help, e.g., preventing self modification
and unrestricted indirect jumps.

[...]
: As to the intermediate language, I *have* seen research that attempts
: to reduce machine code to a set of basic micro-operations in an
: attempt to translate the machine code to a different processor. While
: this research has yet to produce anything practical,

VirtualPC and FX!32.

: clearly this is
: an optimization problem and the same techniques could be applied to an
: optimizing assembler (in which case the target processor would be the
: same as the source processor).

Once the flow graph of an assembly language program has been
determined, I would assume that the usual optimization tricks can be
applied. So, what are the special difficulties in determining the
flow graph of an assembly language program?

Tom Payne

Barry Watson

Mar 26, 2001, 1:41:32 PM
Tom Payne wrote...

>Once the flow graph of an assembly language program has been
>determined, I would assume that the usual optimization tricks can be
>applied. So, what are the special difficulties in determining the
>flow graph of an assembly language program?

For those interested: David W. Goodwin, "Interprocedural Dataflow
Analysis in an Executable Optimizer", ACM SIGPLAN '97, pp. 122-133, and
Robert Muth, "Register Liveness Analysis of Executable Code",
Department of Computer Science, University of Arizona, cover the
issues of CFGs (actually interprocedural CFGs) for executables.
Problems include computed jumps, calls via function pointers (e.g.
virtual functions in C++), and exception handling. Muth is pretty good
on these.

Barry Watson
Compiler Designer
Ericsson Utvecklings AB

Martin Ward

Mar 26, 2001, 1:42:15 PM
Tom Payne <t...@hill.cs.ucr.edu> writes:

> Once the flow graph of an assembly language program has been
> determined, I would assume that the usual optimization tricks can be
> applied. So, what are the special difficulties in determining the
> flow graph of an assembly language program?

Self-modifying code, and branches to a register (e.g.: read some
characters from the keyboard, do some arithmetic, treat the result as
an address, and branch to it...).
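
Martin's branch-to-register problem has a direct C analogue (all names
below are invented for the sketch): an indirect call whose target is
computed from runtime data, so the successors of the call site cannot
be read off the program text by a static analyzer.

```c
/* All names invented for the sketch. The call through 'table[...]'
 * is the C analogue of a branch to a register: which function runs
 * depends on runtime input, so the control-flow edge is invisible to
 * purely static inspection of the instruction stream. */
static int op_inc(int x) { return x + 1; }
static int op_dbl(int x) { return x * 2; }

typedef int (*op_fn)(int);
static op_fn table[] = { op_inc, op_dbl };

int dispatch(int input, int x)
{
    /* "do some arithmetic, treat the result as an address, branch" */
    return table[input % 2](x);
}
```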

See http://www.dur.ac.uk/martin.ward/martin/papers/icsm99-t.ps.gz
for a few more problems with analysing assembler (and our solution...)

Martin

Marti...@durham.ac.uk http://www.dur.ac.uk/martin.ward/ Erdos number: 4

Shankar Unni

Mar 26, 2001, 1:52:21 PM
I wrote:

> It's not such a wild-assed thought.

And our esteemed moderator added:

> [Quite a while ago I read a paper about a Bell Labs language called LIL
> that was intended as a level between C and assembler. They eventually
> declared it a failure because in every case where they thought it would
> be useful, it turned out to be easier overall to make their C compiler
> emit better code and write in C. -John]

Absolutely. No argument in 99% of the cases.

The only place where it makes sense to do this is if you have to write
snippets of code that either do not fit into a standard C model
(usually interleaving assembly inside regular C code), or touch
registers or other resources that fall outside the usual "registers and
memory" model, *AND* you want to be able to do at least some
instruction rearranging around or even through the assembly code.

This becomes even more important if, for the convenience of your
programmers, you provide macros that do standard things in a convenient
fashion (e.g. DSP blocks), and you don't want to go around adding
high-level built-ins for each of those building blocks.

--
Shankar Unni su...@speakeasy.net

Joachim Durchholz

Mar 26, 2001, 1:49:40 PM
Tom Payne <t...@hill.cs.ucr.edu> wrote:
>
> Once the flow graph of an assembly language program has been
> determined, I would assume that the usual optimization tricks can be
> applied. So, what are the special difficulties in determining the
> flow graph of an assembly language program?

Assembly is too powerful; there are too many ways to change the
control flow in unorthodox manners. For example:

The result of tail call optimization replaces a subroutine call with a
jump. The disassembler has no way of determining that a jump
instruction is really a subroutine call other than by simulating stack
evolution, and that's damn hard, in particular if the original
assembly was already optimized.

It's easy to change the return address by twiddling with stack contents.

The output of an OO compiler will generate lots of jump tables. An
assembly optimizer will have to establish that these tables are never
overwritten to do any cross-subroutine optimizations. If the OO
software places these tables in the heap (maybe because it's loading
the code dynamically), and the heap is managed by a copying garbage
collector, the disassembler will have a rather hard time determining
that the addresses in the jump table will stay unchanged. (In fact any
optimizer that sees the code of a moving garbage collector will have a
hard time establishing that the GC doesn't change the observable
system state, so this problem isn't limited to determining the flow
graph but to optimization in general. The usual solution is to
withhold the GC mechanism from the optimizer. Of course, this leaves
the combination of GC and optimization prone to bugs - the optimizer
may think that an address is fixed while it really isn't.)


And I didn't even start to think about self-modifying code, or code
that uses some writable data both for display and as a code address or
machine instruction (well, the latter techniques are luckily not very
common anymore, but you get the idea).

Regards,
Joachim

Randall Hyde

Mar 27, 2001, 11:47:02 PM

"Joachim Durchholz" <joac...@gmx.de> wrote

> Assembly is too powerful; there are too many ways to change the
> control flow in unorthodox manners. For example:
>
> The result of tail call optimization replaces a subroutine call with a
> jump. The disassembler has no way of determining that a jump
> instruction is really a subroutine call other than by simulating stack
> evolution, and that's damn hard, in particular if the original
> assembly was already optimized.

Not that it matters, but presumably the optimizer would be working
with source code, not disassembling machine instructions.

As for tail recursion, a good data flow analyzer could figure this out
in most cases. It's not "damn hard" to do at all. In those cases
where the DFA cannot determine the target address, it can still figure
out what's going on and revert to "safe" optimization mode.

> It's easy to change the return address by twiddling with stack contents.

Again, a DFA can figure this out. I did this kind of work over 15
years ago when I wrote a static code analyzer for 6502 code (which
makes considerable use of this trick since the indirect jump
instruction was weak).

> The output of an OO compiler will generate lots of jump tables. An
> assembly optimizer will have to establish that these tables are never
> overwritten to do any cross-subroutine optimizations. If the OO
> software places these tables in the heap (maybe because it's loading
> the code dynamically), and the heap is managed by a copying garbage
> collector, the disassembler will have a rather hard time determining
> that the addresses in the jump table will stay unchanged. (In fact any
> optimizer that sees the code of a moving garbage collector will have a
> hard time establishing that the GC doesn't change the observable
> system state, so this problem isn't limited to determining the flow
> graph but to optimization in general. The usual solution is to
> withhold the GC mechanism from the optimizer. Of course, this leaves
> the combination of GC and optimization prone to bugs - the optimizer
> may think that an address is fixed while it really isn't.)

(1) We are talking about optimizing assembly source code here,
not optimizing binary machine code. Therefore, worrying about
the output of a compiler isn't an issue.

(2) Of course, we *are* talking about assembly language here
and it's certainly possible to write in assembly anything that
can be done in a HLL. So you still have a valid point.
I'll address that point next.

>
>
> And I didn't even start to think about self-modifying code, or code
> that uses some writable data both for display and as a code address or
> machine instruction (well, the latter techniques are luckily not very
> common anymore, but you get the idea).

A common saying in Computer Science is "optimize for the most common
case." Very few people use self-modifying code (in various forms)
these days. As long as the optimizer can recognize such code and not
do anything stupid with it, that's great. It's not like HLL
optimizers are successful at optimizing all code either (it is easy to
trip up most optimizers, particularly if the language supports an
indirect jump, gotos, or in-line assembly).

I've heard a ton of complaints that "an assembly-based optimizer can't
do a perfect job" or "an assembly-based optimizer will fail under
these circumstances." So what? If it produces better code most of
the time and less than optimal code some of the time, I'd be real
happy with it. I could even live with an assembly optimizer that
choked on some code once in a great while as long as the optimizer
could be turned off around the offending code (or, less desirable, for
the entire assembly process).

The truth is, 99% of the assembly code I've ever written wouldn't
present any special problems to a traditional optimizer. Perhaps I'm
better than most (or worse than most, depending on how you want to
look at it), but I suspect that if an assembly based optimizer
provided an 80% solution, that would be great. If 19.9% of the time
it couldn't produce better code, I could live with that. If 0.1% of
the time it threw up its hands and said "I can't deal with this code."
then that would be fine. I could even live with it generating bad
code with no indication of error a tiny fraction of the time if there
was some way I could manually verify the output (e.g., emission of
assembly code rather than a binary object file).

Whether or not the resulting code is faster than the code emitted by C
is irrelevant. Lots of code is written in assembly for many different
reasons; efficiency is only one of them. For most of the code I write
in assembly, I don't take any special pains to make it fast. Clearly,
there are some very simple transformations that could double or triple
the speed of my code (e.g., better instruction scheduling). Tools
already exist to help you locate such problems with your code (e.g.,
Intel's VTune). The next step is to actually make the modifications
for you during the compile/assembly process. Even if the "assembly
optimizer" limited itself to doing the obvious and easy (or trivial)
things, I argue this would be a boon to those who use assembly.

Randy Hyde

Joachim Durchholz

Apr 4, 2001, 12:26:00 AM
Shankar Unni <su...@speakeasy.net> wrote:
>
> The only place where it makes sense to do this is if you have to
> write snippets of code that either does not fit into a standard C
> model (usually interleaving assembly inside regular C code), or
> touches registers or other resources that fall outside the usual
> "registers and memory" model, *AND* and you want to be able to do
> at least some instruction rearranging around or even through the
> assembly code.

Hmm... there are a few additional cases where C is simply inappropriate:
1) You want to check for integer overflow.
2) You need exceptions.

Oh, and slightly off-topic, there are a few other things that annoy
those who want to use C as a backend for their compiler:
3) It has no support for tail call recursion.
4) It has no support for automatic garbage collection.
(The following are from http://www.cminusminus.org/faq.html:)
5) It cannot return multiple values in registers
6) It cannot bind global variables to registers
7) It has no support for lightweight concurrency

Not all of these features are required for each language, but many
(most?) languages need at least one of them.

Regards,
Joachim

Eric O'Dell

Apr 10, 2001, 1:20:11 AM
On 4 Apr 2001 00:26:00 -0400, Joachim Durchholz <joac...@gmx.de> wrote:

>Hmm... there are a few additional cases where C is simply inappropriate:
>1) You want to check for integer overflow.
>2) You need exceptions.
>
>Oh, and slightly off-topic, there are a few other things that annoy
>those who want to use C as a backend for their compiler:
>3) It has no support for tail call recursion.
>4) It has no support for automatic garbage collection.
>(The following are from http://www.cminusminus.org/faq.html:)
>5) It cannot return multiple values in registers
>6) It cannot bind global variables to registers
>7) It has no support for lightweight concurrency
>
>Not all of these features are required for each language, but many
>(most?) languages need at least one of them.

In fairness, most C compilers I've seen support direct manipulation of
the registers, but generally not in a portable or standardized
fashion. Of course, direct register manipulation isn't portable
anyway. And automatic garbage collection is certainly possible in C,
but you have to do the work yourself or find one of the several
available libraries to do it for you. Ditto on exceptions.

-e.

Andreas Krall

Apr 10, 2001, 1:20:34 AM
"Joachim Durchholz" <joac...@gmx.de> writes:
> Hmm... there are a few additional cases where C is simply inappropriate:
...

>
> Oh, and slightly off-topic, there are a few other things that annoy
> those who want to use C as a backend for their compiler:
> 3) It has no support for tail call recursion.
> 4) It has no support for automatic garbage collection.
> (The following are from http://www.cminusminus.org/faq.html:)
> 5) It cannot return multiple values in registers
> 6) It cannot bind global variables to registers

Here you mix up the language definition of C with current
implementations of C compilers. Some C compilers do not implement the
above-mentioned features, but many implement proper tail-call
optimization, struct and union returns in registers, and
interprocedural register allocation. One of my students has
implemented proper tail recursion for GCC for the x86 and the Alpha
backends. His master's thesis and the code will be available very soon
from our web server.

ad 4) There is always the Boehm/Weiser GC

ad 5)
Just read the calling conventions for different compilers, e.g. the SGI
calling conventions:

A struct with only one or two floating point fields is returned in $f0
(and $f2 if needed).
Any other struct or union results of at most 128 bits are returned in $2
and $3 (remainder if necessary).
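
As a concrete sketch (the function is invented for illustration, not
taken from any ABI document), this is the kind of small aggregate such
conventions let a compiler hand back entirely in a register pair:

```c
/* Hypothetical function for the sketch. A struct this small fits the
 * "at most 128 bits" rule quoted above, so a compiler following such
 * a convention can return it entirely in registers (e.g. $2 and $3
 * under the SGI convention), with no memory round-trip. */
struct pair { int quot; int rem; };

struct pair divmod(int a, int b)
{
    struct pair r = { a / b, a % b };
    return r;
}
```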

--
an...@complang.tuwien.ac.at Andreas Krall
http://www.complang.tuwien.ac.at/andi/ Inst. f. Computersprachen, TU Wien
tel: (+431) 58801/18511 Argentinierstr. 8/4/1851
fax: (+431) 58801/18598 A-1040 Wien AUSTRIA EUROPE

Morrisett

Apr 10, 2001, 1:24:04 AM
"Joachim Durchholz" <joac...@gmx.de> wrote in message

> 5) It cannot return multiple values in registers

I don't understand this -- what's to prevent a (small) struct
from being returned in registers?

-Greg
[Nothing, many C compilers do that. -John]

jacob navia

Apr 10, 2001, 1:30:03 AM
I would like to point out that some of the supposed problems of C as
an assembly language are just wrong, using the specific example of the
lcc-win32 system. No environment is "perfect", but I think lcc-win32
comes close, and it is widely used as a back-end compiler.

> Hmm... there are a few additional cases where C is simply inappropriate:
> 1) You want to check for integer overflow.

Under lcc-win32 you just write:

    c = a + b;
    if (_overflow()) {
        /* Overflow handling */
    }
> 2) You need exceptions.

lcc-win32 lets you use the exceptions defined by the OS.
The setjmp/longjmp facility allows you to implement any exception
scheme you need.

> Oh, and slightly off-topic, there are a few other things that annoy
> those who want to use C as a backend for their compiler:
> 3) It has no support for tail call recursion.

Not very difficult to implement in the generated C with a few assignments
and a goto...
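
A minimal sketch of what that generated C might look like for a *self*
tail call (the function is invented for the example; calls to other
functions are the hard case, since C has no cross-function goto):

```c
/* Sketch of the "assignments and a goto" transformation for a self
 * tail call. The recursive call gcd(b, a % b) is rewritten as
 * parameter rebinding plus a jump back to the entry point, so no
 * stack frame is consumed per "call". */
int gcd(int a, int b)
{
top:
    if (b == 0)
        return a;
    {   /* the tail call gcd(b, a % b) becomes: */
        int t = a % b;
        a = b;      /* assignments rebinding the parameters... */
        b = t;
    }
    goto top;       /* ...and a goto back to the entry point */
}
```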

> 4) It has no support for automatic garbage collection.

lcc-win32 features an automatic garbage collector (Boehm's)

> (The following are from http://www.cminusminus.org/faq.html:)
> 5) It cannot return multiple values in registers

Multiple values in registers?
In general, C shields you from assembly: the register layout is defined
by the compiler. Nothing prevents you from building your own schemes,
of course.

> 6) It cannot bind global variables to registers

On the x86, with only 6 registers, this would be impossible, but on
other architectures the C compiler lets you assign globals to the
global registers with some compiler-specific syntax.

> 7) It has no support for lightweight concurrency

Well, under Windows you can use the thread facility of the operating
system. Lcc-win32 supports it, of course.

>
> Not all of these features are required for each language, but many
> (most?) languages need at least one of them.

lcc-win32 is used as a back end for Objective-C, Eiffel, and other
customer-specific languages.

But a language system is not just the compiler back end. When looking
at the alternatives you should also consider the existence of a
configurable linker and, above all, a configurable debugger. It is
those features that actually make a system usable as a back end or
not. The compiler is just a part of it.

felix

Apr 10, 2001, 1:33:34 AM
Joachim Durchholz wrote in message <01-0...@comp.compilers>...

>
>Hmm... there are a few additional cases where C is simply inappropriate:
>1) You want to check for integer overflow.
>2) You need exceptions.
>
>Oh, and slightly off-topic, there are a few other things that annoy
>those who want to use C as a backend for their compiler:
>3) It has no support for tail call recursion.
>4) It has no support for automatic garbage collection.
>(The following are from http://www.cminusminus.org/faq.html:)
>5) It cannot return multiple values in registers
>6) It cannot bind global variables to registers
>7) It has no support for lightweight concurrency
>
>Not all of these features are required for each language, but many
>(most?) languages need at least one of them.

It is generally true that C has its problems when considered as
an "assembly" language, but I would like to add something to
each point:

1) Integer overflow detection is a big problem in C (I would say
perhaps even the biggest). The obvious solution of performing
integer and floating-point arithmetic in parallel will of course
always be inferior. It would be interesting, though, to measure
how much of the fp arithmetic can be done truly in parallel
on modern RISC processors with separate execution units
for integer and fp math.

2) Well, setjmp()/longjmp() are there. I see no reason not to
use them.

3) Tail-recursion is hard in C. Even GCC can't do it in the
general case. Possible solutions: a driver-loop (or "trampoline")
as used in Steele's Rabbit, in Gambit or sml2c, or Continuation Passing
Style.

4) The Boehm collector is a very convenient way out of this.
And writing a precise GC isn't impossible, either.

5) CPS would be a possible answer: a return is transformed into
a call, so multiple values can be passed in registers without problem.

6) GNU C can use global registers.

7) With underlying concepts like trampolines or CPS lightweight
threads are no problem, either. The next release of the
Gambit Scheme compiler is supposed to offer threads based on first-class
continuations that outperform all types of native Linux- and Java-
threads.

Some additional points:

8) First class continuations: CPS or trampolines.

9) Language issues: aliasing rules in C can limit the set of
possible optimizations (GNU C offers "-fstrict-aliasing"; how
far this helps, I can't say).

10) Platform issues: SPARC register windows are often more
in the way than helpful (GNU C has "-mflat").
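
The driver-loop/trampoline idea from point 3 (and the basis for the
lightweight threads of point 7) can be sketched minimally like this;
all names are invented for illustration:

```c
#include <stddef.h>

/* Minimal driver loop ("trampoline"): each step *returns* its
 * successor instead of calling it, so the C stack stays flat no
 * matter how long the chain of "tail calls" runs. */
typedef struct thunk thunk;
struct thunk { thunk (*fn)(void); };  /* a step yields the next step */

int counter = 0;

thunk step_done(void) { thunk t = { NULL }; return t; }

thunk step_count(void)
{
    thunk t;
    /* "tail call" ourselves until the count reaches 5, then finish */
    t.fn = (++counter < 5) ? step_count : step_done;
    return t;
}

void trampoline(thunk (*start)(void))
{
    thunk next = { start };
    while (next.fn != NULL)   /* the driver loop */
        next = next.fn();
}
```

The cost, as noted below, is that arguments now travel through globals
or an explicit argument area rather than the C calling convention.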

Concepts like CPS or trampolines/driver-loops have, of course, their
own problems. I'm not saying any of these is the perfect solution.
But an important point is IMHO that C (or GNU C - limiting yourself
to GNU C as a target language would be perfectly acceptable)
offers a couple of things:

- Portability
- Easy availability
- Efficiency
- Availability of tools like profilers, debuggers, etc.

The huge amount of effort put into GCC, for example, is hard to match.
This is one reason why I think C-- will have a hard time ever
competing seriously with it.
The huge amount of work needed to write a robust and optimizing native
code backend is IMHO out of the question for anything but large
projects.


cheers,
felix

Fergus Henderson

Apr 10, 2001, 3:19:41 AM
"felix" <felixu...@freenet.de> writes:

>2) Well setjmp()/longjmp() are there. I see no reason not to
> use it.

One problem is that on some systems they are very slow. E.g. they may
do a system call to save/restore the signal mask.

However, there are alternatives: Posix has sigsetjmp(), which you can
tell to *not* save the signal mask. And GNU C has __builtin_setjmp()
and __builtin_longjmp(), which compile to a quite small amount of
inline code -- they are quite efficient.

However, even that may still not be very satisfactory if your source
language has LOTS of exception handlers (e.g. C++ destructors) -- in
that case you really want a table-driven exception handling approach,
one that doesn't add significant run-time overhead per exception
handler, but you can't get that in C.
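
For concreteness, a one-level sketch of the setjmp/longjmp scheme
under discussion (all names invented; a real backend would keep a
stack of jmp_bufs and respect setjmp's clobbered-locals rules):

```c
#include <setjmp.h>

/* One-level setjmp/longjmp "exceptions": longjmp is the throw,
 * returning a second time from setjmp is the catch. */
static jmp_buf handler;

static int checked_div(int a, int b)
{
    if (b == 0)
        longjmp(handler, 1);   /* "throw" on division by zero */
    return a / b;
}

int try_div(int a, int b, int fallback)
{
    if (setjmp(handler) != 0)
        return fallback;       /* "catch": control lands here on throw */
    return checked_div(a, b);  /* the "try" body */
}
```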

>3) Tail-recursion is hard in C. Even GCC can't do it in the
> general case. Possible solutions: a driver-loop (or "trampoline")
> as used in Steele's Rabbit, in Gambit or sml2c, or Continuation Passing
> Style.

The trouble with the driver loop approach is that you can't pass
arguments in registers anymore (unless you use global register
variables -- and see below for the problems with that), and you have
to use your own call stack, rather than the C call stack. This also
makes interoperability more difficult.

The trouble with continuation passing style is that C compilers don't
do general tail call optimization (indeed, doing so in C is tricky,
because the program might take the address of local variables). So a
straightforward CPS will suffer from stack leaks. There are
approaches to avoid this, but they do have their drawbacks.

>4) The Boehm collector is a very convenient way out of this.
> And writing a precise GC isn't impossible, either.

...


>6) GNU C can use global registers.

Unfortunately, recent versions of GNU C don't support global
registers. It's still in the manual, but gcc often rejects programs,
reporting "internal error / impossible asm", if you use them. For
example, if you add global register variable declarations to the Boehm
collector, it doesn't compile with gcc 2.95.3. And the GCC
maintainers have not expressed any interest in fixing the bugs,
instead the response has been that the global register variable
feature was a bad idea.

>10) Platform issues: SPARC register windows are often more
> in the way than helpful (GNU C has "-mflat").

Unfortunately GCC's `-mflat' does not work in combination with `-fpic',
or at least it didn't last time I looked (2.95.2?).

>Concepts like CPS or trampolines/driver-loops have of course their
>own problems. I'm not saying any of these is the perfect solution.
>But an important points is IMHO that C (or GNU C - limiting yourself
>to GNU C as a target language would be perfectly acceptable)
>offers a couple of things:
>
>- Portability
>- Easy availability
>- Efficiency
>- Availability of tools like profilers, debuggers, etc.
>
>The huge amount of effort put into GCC, for example, is hard to match.
>This is one reason why I think C-- will have a hard time ever to
>compete seriously with it.
>The huge amount of work needed to write a robust and optimizing native
>code backend is IMHO out of the question for anything but large
>projects.

I agree with all of the above. Because of this, I think it's worth
also pursuing another approach, complementary to that of C--, namely
adding support for the features that we need (for example, proper tail
calls) to C or GNU C.

--
Fergus Henderson <f...@cs.mu.oz.au> | "I have always known that the pursuit
| of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh> | -- the last words of T. S. Garp.

Fergus Henderson

Apr 12, 2001, 2:46:40 AM
"jacob navia" <ja...@jacob.remcomp.fr> writes:

>I would like to point out that some of the problems of C as an
>assembly language are just wrong, using the specific example of the
>lcc-win32 system. No environment is "perfect" but I think lcc-win32
>comes close, and is widely used as a back end compiler.
>
>> Hmm... there are a few additional cases where C is simply inappropriate:
>> 1) You want to check for integer overflow.
>
>Under lcc-win32 you just write
> c = a+b;
> if (_overflow()) {
> // Overflow handling
> }

That's not portable, though. And it's probably not very efficient
either, since you have to use lcc-win32. (How does lcc-win32 compare
to GNU C and other C compilers? The original lcc on which lcc-win32
is based generates significantly worse code than gcc.)
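
A portable alternative (my sketch, not taken from any of the compilers
discussed here) is to test for signed overflow *before* the addition,
using only operations with defined behaviour:

```c
#include <limits.h>

/* Portable pre-check for signed addition overflow: never performs
 * the overflowing operation itself, so there is no undefined
 * behaviour. Slower than reading a flags register, but it works
 * under any conforming C compiler. */
int add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   /* would exceed INT_MAX */
    return a < INT_MIN - b;       /* would fall below INT_MIN */
}
```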

>> 2) You need exceptions.
>
>lcc-win32 lets you use the exceptions defined by the OS.

That's not portable, though.

>> Oh, and slightly off-topic, there are a few other things that annoy
>> those who want to use C as a backend for their compiler:
>> 3) It has no support for tail call recursion.
>
>Not very difficult to implement in the generated C with a few assignments
>and a goto...

It is indeed difficult if you want to support separate compilation,
since C doesn't allow a "goto" to jump from one function to another.
And compiling the whole source program to a single C function is
likely to cause trouble for the C compiler, either exceeding fixed
limits or getting blow-outs in compile times.

>> 6) It cannot bind global variables to registers
>In the x86, with only 6 registers this would be impossible,

Sure it would be possible. Three for the front-end compiler and three
for the C compiler, for example. That's how we do it when using GNU C
global register variables in the Mercury compiler. (In recent
versions of GNU C it doesn't work reliably, but that's another
story... AFAIK, there's no reason why it shouldn't, just a lack of
people with sufficient time and expertise to fix it).

>but in other
>architectures, the C compiler lets you assign globals to the global
>registers with some compiler specific syntax.

Only GNU C allows that, AFAIK.

>> 7) It has no support for lightweight concurrency
>
>Well, under windows you can use the thread facility of the operating system.
>Lcc-win32 supports it of course.

That's not portable, though.

If you tie yourself to lcc-win32, then you lose many of the advantages
of compiling to C.

VBDis

Apr 12, 2001, 2:49:27 AM
"Joachim Durchholz" <joac...@gmx.de> schreibt:

>Hmm... there are a few additional cases where C is simply inappropriate:

C was /designed/ as a replacement for assembly language, and IMO this
goal has been reached nowadays, too. No real assembler will define or
deal with exceptions and similar constructs, which go beyond the
creation of CPU instructions.

AFAIR one of the strongest arguments and restrictions of the language
specification was that the assembly programmer should be able to
exactly predict the native instructions created by the C
compiler. This expectation cannot hold when the instruction set
prevents the creation of such a unique instruction stream from a C
primitive (statement or expression), as is true for current
"incompatible" (Intel or RISC) processors. Similarly, every assumption
about "improved" parameter passing, overflow handling, etc. is outside
the scope of C.

It should be clear that no single high-level language can replace an
assembler for /all/ current processors. So the usability of C IMO
depends on how the /compiler/ can be /adapted/ to a specific set of
processors. Some extensions of the C language can be handled in the
preprocessor; others must be built into the language and compiler
itself.

C++ and C# are far away from C and from processors, and establish
their own model of execution on the target machine. Better to use C,
or any other HLL that better suits your expectations of OO
programming.


IMO it's easier to make an assembler understand C than to make a C
compiler understand machine-specific constructs. No smiley here,
opposed to my first intention. C was designed for a specific set of
processors existing at that time (decades ago), not at all for
processors in general.

DoDi

Fergus Henderson

Apr 14, 2001, 5:15:24 PM
vb...@aol.com (VBDis) writes:

>C was /designed/ as a replacement for assembly language, and IMO this
>goal is reached nowadays, too. No real assembler will define or deal
>with exceptions and like constructs, which are beyond the creation of
>CPU instructions.

That depends on what you mean by a "real" assembler.
ILASM, the assembler for the Intermediate Language used for
the Microsoft.NET platform, deals with exceptions and like constructs.
Likewise Jasmin and the various other Java bytecode assemblers do the same.
Are they real assemblers? If not, why not?

Randall Hyde

Apr 14, 2001, 5:17:46 PM
"VBDis" <vb...@aol.com> wrote in message news:01-0...@comp.compilers...

> "Joachim Durchholz" <joac...@gmx.de> schreibt:
>
> >Hmm... there are a few additional cases where C is simply inappropriate:
>
> C was /designed/ as a replacement for assembly language, and IMO this
> goal is reached nowadays, too. No real assembler will define or deal
> with exceptions and like constructs, which are beyond the creation of
> CPU instructions.

Guess it depends upon how you define "real" assembler. HLA (the High
Level Assembler) certainly provides syntax for handling exceptions.
HLA doesn't feel any less "real" because it provides this capability
(after all, if it offends your sensibilities, you can always ignore
the exception handling syntax). Note, btw, that I implemented the
same syntax HLA uses within MASM using macros. Few people would argue
that MASM is not a real assembler.

> IMO it's easier to make an assembler understand C, than to make an C
> compiler understand machine specific constructs. No smilie here,
> opposed to my first intention. C was designed for a specific set of
> processors, existing at that time (many decades ago), not at all for
> processors in general.

OTOH, most modern processors in wide use were certainly designed to
execute C code efficiently. Indeed, I strongly suspect that the
architectural changes appearing in more modern processors are more
responsible for the narrowing gap between compiler output and
human-written assembly code performance than the quality of
optimizers is. When CPU manufacturers like Intel deprecate all the
old cool instructions by making them slow (and others just don't
include them at all), it really begins to limit the options assembly
programmers have (and limits them to the same sequences the compilers
would generate). This isn't necessarily a bad thing, of course, just
an explanation.

Randy Hyde
[Many of those old cool instructions looked to me like they were
suffering from the Vax fallacy, complex instructions that might make
it easy to write or generate code for some constructs, but are really
hard to make run fast. -John]

felix

Apr 14, 2001, 5:15:09 PM
Fergus Henderson wrote

>>3) Tail-recursion is hard in C. Even GCC can't do it in the
>> general case. Possible solutions: a driver-loop (or "trampoline")
>> as used in Steele's Rabbit, in Gambit or sml2c, or Continuation Passing
>> Style.
>
>The trouble with the driver loop approach is that you can't pass
>arguments in registers anymore (unless you use global register
>variables -- and see below for the problems with that), and you have
>to use your own call stack, rather than the C call stack. This also
>makes interoperability more difficult.

There is an article by Eliot Miranda floating around in the
comp.compilers archives somewhere, where he uses some interesting
postprocessing on GCC's assembly output, applied to threaded code.

>The trouble with continuation passing style is that C compilers don't
>do general tail call optimization (indeed, doing so in C is tricky,
>because the program might take the address of local variables). So a
>straightforward CPS will suffer from stack leaks. There are
>approaches to avoid this, but they do have their drawbacks.

Which would that be?

cheers,
felix

VBDis

Apr 15, 2001, 10:51:00 PM
In article <01-0...@comp.compilers>, "Randall Hyde"
<rh...@transdimension.com> writes:

>Guess it depends upon how you define "real" assembler.

Right, I only had one point in mind, and forgot about all the other
things an assembler is expected to do :-(

What I had in mind was that an assembler is not normally allowed to
insert instructions, reorder them, or modify the instructions
themselves. At least that was the position of some hard-core assembly
programmers, who thought they knew better how to write efficient code,
and possibly also wrote self-modifying code.

In short, when someone says "do it exactly this way", then an assembler should
do it that way. In all other cases the assembler can provide optimizations and
address calculations, macros and other handy stuff.

DoDi
[I think enough angels have danced on the head of this pin. -John]

Jim Granville

Apr 18, 2001, 2:25:40 AM
VBDis wrote:
>
> In article <01-0...@comp.compilers>, "Randall Hyde"
> <rh...@transdimension.com> writes:
>
> >Guess it depends upon how you define "real" assembler.
>
> Right, I only had one point in mind, and forgot about all the other
> things an assembler is expected to do :-(
>
> What I had in mind was that an assembler is not normally allowed to
> insert instructions, reorder them, or modify the instructions
> themselves. At least that was the position of some hard-core assembly
> programmers, who thought they knew better how to write efficient code,
> and possibly also wrote self-modifying code.

Yes, but does it HAVE to be this way ?

> In short, when someone says "do it exactly this way", then an assembler should
> do it that way. In all other cases the assembler can provide optimizations and
> address calculations, macros and other handy stuff.

We work a lot with PLD tools/compilers, and one attribute they have is
control over KEEP/collapse of logic nodes (i.e. downstream logic
minimise/optimise).
Sometimes in PLD code you want to say "don't fiddle with this at all!"

A similar option could benefit (all?) assemblers, and allow work on
lower-level optimise engines, as other threads have discussed.

- jg

Joachim Durchholz

May 3, 2001, 1:43:35 PM
Andreas Krall <an...@complang.tuwien.ac.at> wrote:
> "Joachim Durchholz" <joac...@gmx.de> writes:
> > Hmm... there are a few additional cases where C is simply
> > inappropriate: ...
> >
> > Oh, and slightly off-topic, there are a few other things that annoy
> > those who want to use C as a backend for their compiler:
> > 3) It has no support for tail call recursion.
> > 4) It has no support for automatic garbage collection.
> > (The following are from http://www.cminusminus.org/faq.html:)
> > 5) It cannot return multiple values in registers
> > 6) It cannot bind global variables to registers
>
> Here you mix up the language definition of C with current
> implementations of C compilers.

This doesn't really make a difference if I select a backend.
Sure, I can restrict my language to a specific C implementation. But
that's exactly the type of restriction that will cripple a language.

> Some C compilers do not implement the above mentioned features, but
> many implement proper tail call recursion,

This does *not* help. Tail-call optimization removes a factor of N from
the stack-space complexity of many typical functional programs, and
that's so important that some languages even specify this optimization
as *required* for a valid implementation.

> ad 4) There is always the Boehm/Weiser GC

... which doesn't do finalization.
(Or does it?)

Regards,
Joachim

Joachim Durchholz

May 7, 2001, 11:16:15 PM
Morrisett <jmor...@twcny.rr.com> wrote:
> "Joachim Durchholz" <joac...@gmx.de> wrote in message
> > 5) It cannot return multiple values in registers
>
> I don't understand this -- what's to prevent a (small) struct
> from being returned in registers?

Nothing. But putting the results into a struct comes with its own set of
problems:
1) The HLL compiler can't force the C compiler to do that. I'm pretty
sure that several brain-damaged C compilers can't do that.
2) What's more important: The HLL compiler has no way to decide how many
results should go into the result struct. If it puts too few results
into the struct, it wastes register space. If it puts too many results
into the struct, no results at all will be passed in registers.
3) What should the caller do with a struct from a result? You can't
assign it to another struct in C. You can't even take its address. All
you can do with it is (a) select a member from it (which essentially
means you're back at a single result) and (b) pass it to another
function (which will result in unlimited stack growth unless you're
*very* careful *and* have a C compiler that can do tail call
elimination).

Regards,
Joachim
--
This is not an official statement from my employer.

Hans Boehm

May 7, 2001, 11:18:22 PM
Joachim Durchholz <joac...@gmx.de> wrote in message
news:01-0...@comp.compilers...

> Andreas Krall <an...@complang.tuwien.ac.at> wrote:
> > ad 4) There is always the Boehm/Weiser GC
>
> ... which doesn't do finalization.
> (Or does it?)

It does. In several flavors.

Hans-J. Boehm

Jonathan Thornburg

May 13, 2001, 1:15:30 AM
Joachim Durchholz <joachim....@gmx.de> wrote:
[[if a C function returns a struct]]

>3) What should the caller do with a struct from a result? You can't
=========

>assign it to another struct in C.
=================================

You can't??? I was under the impression that structure assignment was
already present even in K&R Classic. And gcc 2.95.2 -Wall -pedantic
has no qualms with a test program:

#include <stdio.h>

struct foo { int x, y; };
struct foo fn(int i, int j, int k);

int main(void)
{
    struct foo result;
    result = fn(3,1,4);
    printf("result.x=%d result.y=%d\n", result.x, result.y);
    return 0;
}

struct foo fn(int i, int j, int k)
{
    struct foo s;
    s.x = i+j+k;
    s.y = i*j*k;
    return s;
}

--
-- Jonathan Thornburg <jth...@thp.univie.ac.at>
http://www.thp.univie.ac.at/~jthorn/home.html
Universitaet Wien (Vienna, Austria) / Institut fuer Theoretische Physik

David Thompson

May 15, 2001, 12:32:54 AM
Joachim Durchholz <joac...@gmx.de> wrote :
[ returning a struct from a function in C, possibly in registers ]

> 3) What should the caller do with a struct from a result? You can't
> assign it to another struct in C. You can't even take its address. All
> you can do with it is (a) select a member from it (which essentially
> means you're back at a single result) and (b) pass it to another
> function ...

You certainly can assign it -- that's normally the reason
people use a struct return type (to return multiple values),
even on systems that don't or can't put it in registers.

You can't take the address of a non-lvalue whether
implemented in register(s) or not.

--
- David.Thompson 1 now at worldnet.att.net
