Plenty of Registers?

Andrew Molitor

unread,

Oct 8, 1996, 3:00:00 AM10/8/96

to

I've been spending altogether too much time grubbing about in
compiler output lately, and I've made the following observations:

- you practically never use all the registers
- you spend far too much time thrashing data in and out
of registers, because you can't prove the memory wasn't
modified between here and there.

I'm not sure whether this is because C sucks (and C++ sucks
a WHOLE LOT MORE), or because gcc sucks. In many cases, the thrashing
seems to genuinely be because gcc can't prove anything, and is being
conservative.

C++ is worse because you spend so much time monkeying about
with class data, about which gcc/g++ can prove nothing, so orthodox
C++ methods (short, very few locals, mostly class data bashing) use
laughably few registers.

Is it possible to write a C or C++ compiler that does much
better than gcc in these areas? It's my understanding that languages
like Pascal and Modula-{2|3} are at least theoretically much more
compiler friendly. How true is this?

Things are getting bad enough that I (once a devout Trust Your
Compiler fanatic) am beginning to look at assembler again, since all the
cunning scheduling in the world cannot make up for all the unnecessary
bashing about in memory that's some unspeakable number of cycles away.
If Modula-3 will bring me back to HLL nirvana, I would be pleased.

Kate Mohr

unread,

Oct 8, 1996, 3:00:00 AM10/8/96

to

Paul says:
I have done the same thing with Borland C and it appears to be much
better but I am an assembly freak so I type __asm a lot anyway.
MS compilers have always been wasteful, I think they just want
every body to get a supercomputer and then there would be no problem.

Larry Kilgallen

unread,

Oct 8, 1996, 3:00:00 AM10/8/96

to

In article <53c83o$6...@news.network.com>, amol...@blefscu.network.com (Andrew Molitor) writes:

> Is it possible to write a C or C++ compiler that does much
> better than gcc in these areas? It's my understanding that languages

While wasted instructions are a metric of compiler inadequacy,
the mere existence of unused registers is not.

> better than gcc in these areas? It's my understanding that languages
> like Pascal and Modula-{2|3} are at least theoretically much more
> compiler friendly. How true is this?

Strongly typed languages (and perhaps Ada is most strongly typed
of commercial languages) give more information to the compiler
about programmer intentions. The compiler knows full well the
user is not going to be referring to a later array element by
incrementing the pointer because such operations are not supported.

> Things are getting bad enough that I (once a devout Trust Your
> Compiler fanatic) am beginning to look at assembler again, since all the
> cunning scheduling in the world cannot make up for all the unnecessary
> bashing about in memory that's some unspeakable number of cycles away.

Take a look at the compiler output of a vendor who is in the Benchmark
Races. DEC's GEM backend for their Alpha compilers is claimed to be
quite good at such things. Given that claim, I presume that HP, Sun,
etc. have also put a lot of effort into their side of the competition.

GCC would seem to be in the worst possible position for efficient
code generation -- there is no hardware vendor providing funds to
make it better. Furthermore, GCC is _portable_, and making something
portable can easily defeat efforts to make it efficient.

Larry Kilgallen

Maynard Handley

unread,

Oct 8, 1996, 3:00:00 AM10/8/96

to

In article <53c83o$6...@news.network.com>, amol...@blefscu.network.com

(Andrew Molitor) wrote:

> I've been spending altogether too much time grubbing about in
> compiler output lately, and I've made the following observations:
>
> - you practically never use all the registers
> - you spend far too much time thrashing data in and out
> of registers, because you can't prove the memory wasn't
> modified between here and there.
>
> I'm not sure whether this is because C sucks (and C++ sucks
> a WHOLE LOT MORE), or because gcc sucks. In many cases, the thrashing
> seems to genuinely be because gcc can't prove anything, and is being
> conservative.

I religously declare every variable I can REGISTER, on the theory that an
intelligent compiler will use this as a hint that no store to memory can
touch this var so it does not need to be swapped in/out. (This is as
opposed to declaring the thing register to keep it in a register because I
like to think the compiler can at least get register allocation correct.)

Do the compilers out there actually use this hint in this fashion?

Maynard

--
My opinion only

Ian Kemmish

unread,

Oct 8, 1996, 3:00:00 AM10/8/96

to

In article <53c83o$6...@news.network.com>, amol...@blefscu.network.com says...
>

> I'm not sure whether this is because C sucks (and C++ sucks
>a WHOLE LOT MORE), or because gcc sucks. In many cases, the thrashing
>seems to genuinely be because gcc can't prove anything, and is being
>conservative.

Last time I bothered to check (about four years ago, before I finally gave up
on the gcc optimiser bugs), the mips compilers generated better code than gcc
for mips. One speicifc thing I remember is that gcc would repeatedly load the
same value, even though one knew it had'nt changed.

Larry Kilgallen mentions the DEC GEM compiler back end. This also looks pretty
good, and I gather was originally written by some of the same people as the
mips compilers.

> C++ is worse because you spend so much time monkeying about
>with class data, about which gcc/g++ can prove nothing, so orthodox
>C++ methods (short, very few locals, mostly class data bashing) use
>laughably few registers.

If you care about this stuff, you can get into the habit of regarding C as a
high-level assembler for your critical routines....

The interesting challenge a few years ago was producing C that compiled equally
well on machines with and without condition codes. I get the impression that
in these days of multiple issue, that's the least of your worries anymore:-):-)

============================================================================
Ian Kemmish 18 Durham Close, Biggleswade, Beds SG18 8HZ
i...@five-d.com Tel: +44 1767 601 361 Fax: +44 1767 312 006
Info on Jaws and 5D's other products on http://www.five-d.com/5d
============================================================================
`The customer is King, but the proprietor is God'

Cliff Click

unread,

Oct 8, 1996, 3:00:00 AM10/8/96

to

amol...@blefscu.network.com (Andrew Molitor) writes:

> I've been spending altogether too much time grubbing about in
> compiler output lately, and I've made the following observations:
>
> - you practically never use all the registers
> - you spend far too much time thrashing data in and out
> of registers, because you can't prove the memory wasn't
> modified between here and there.
>

> I'm not sure whether this is because C sucks (and C++ sucks
> a WHOLE LOT MORE), or because gcc sucks. In many cases, the thrashing
> seems to genuinely be because gcc can't prove anything, and is being
> conservative.
>

> Is it possible to write a C or C++ compiler that does much
> better than gcc in these areas?

gcc is a fine compiler that is amazingly well ported.
But it's optimizations are not all that great (OK, but not great).
==> A better compiler will help

Nothing can optimize badly written code (and badly written code can
be written in any language).
==> Write better code ("Better code"? What's that? :-)

Ok, so write "optimizable" code. Subroutine calls to functions not in
this compilation smash all global variables and all locals whose
address has "escaped". So don't use global variables, don't take
the address of locals, and don't use pass-by-reference. For C++ this
implies not using the "this" pointer (:-).

Seriously, for C++ you need some C++ specific optimizations. These
optimizations certainly would apply to the equivalently written C code
(and Fortran, if you could write the right kind of code), but in
practice they don't appear. Hence scarce compiler writers spend their
scarce time optimizing the common C & Fortran stuff, and C++/Java
stuff languishes. :-(. Bummer.

Ask Spec for a C++ performance benchmark, wait a year, then buy the
compiler that gets all those hot numbers on your favorite box. :-)

Or switch to another compiler; I've seen lotsa (research) compilers do
quite well on C++ code when I was in school. Go check the
comp.compilers monthly posting on available free/shareware C++
compilers.

Cliff

--
Cliff Click, Ph.D. Compiler Researcher & Designer
RISC Software, Motorola PowerPC Compilers
cli...@risc.sps.mot.com (512) 891-7240
http://members.aol.com/mjclick1

Zalman Stern

unread,

Oct 8, 1996, 3:00:00 AM10/8/96

to

Ian Kemmish (i...@five-d.com) wrote:
: Larry Kilgallen mentions the DEC GEM compiler back end. This also looks pretty

: good, and I gather was originally written by some of the same people as the
: mips compilers.

I don't think so. DEC may have had access to the MIPS compiler sources
though. (Well I'm pretty sure various people at DEC did have such access,
the question is whether the people writing their new compiler system looked
at the MIPS sources.)

: > C++ is worse because you spend so much time monkeying about

: >with class data, about which gcc/g++ can prove nothing, so orthodox
: >C++ methods (short, very few locals, mostly class data bashing) use
: >laughably few registers.

Not to mention lots of indirect calls about which very little can be
proven.

You can win a little bit by using const everywhere you can. Then there's
inlining. Occasionally you end up explicitly introducing temporaries to
"communicate" with the compiler:

int foo, bar;

bar = foo;
silly_call_by_ref(&bar);
/* Compiler now knows foo wasn't modified by above call. */

But yes, in general there are problems with the semantics of C here.

Steven Correll

unread,

Oct 8, 1996, 3:00:00 AM10/8/96

to

In article <CLIFFC.96...@ami.sps.mot.com>,

Cliff Click <cli...@ami.sps.mot.com> wrote:
>Seriously, for C++ you need some C++ specific optimizations. These
>optimizations certainly would apply to the equivalently written C code
>(and Fortran, if you could write the right kind of code), but in
>practice they don't appear. Hence scarce compiler writers spend their
>scarce time optimizing the common C & Fortran stuff, and C++/Java
>stuff languishes. :-(. Bummer.

Another problem is that until about a year ago, the C++ draft standard
was changing in real time, and some of the novelties introduced during
the course of standardization (e.g. RTTI and exceptions) impacted not just
the typical compiler front end, but also affected code generation and runtime
support. As compilers catch up with all the new stuff, one hopes that people
can devote more attention to code-generation quality.
--
Steven Correll == PO Box 66625, Scotts Valley, CA 95067 == s...@netcom.com

Cliff Click

unread,

Oct 9, 1996, 3:00:00 AM10/9/96

to

hand...@apple.com (Maynard Handley) writes:

I religously declare every variable I can REGISTER, on the theory that an
intelligent compiler will use this as a hint that no store to memory can
touch this var so it does not need to be swapped in/out. (This is as
opposed to declaring the thing register to keep it in a register because I
like to think the compiler can at least get register allocation correct.)

Do the compilers out there actually use this hint in this fashion?

This basically degrades to an assertion that the address of the
variable is never taken. Any compiler worth it's salt *already*
notices this fact, so no, the REGISTER keyword isn't helping the
compiler any. (It might help YOU remember that address-of isn't legal
here).

A better keyword is "const", as well as avoiding global variables like
the plague. As mentioned previously, if you must pass-by-reference to
return results, try to have NO other uses of the referenced variable.
An annoying routine from the specint95 vortex benchmark looks like
this:

int j;
...
foo( &j );
if( ... )
for( ; j<lim; j++ ) /* nicely stride through an array with a MEMORY */
...array[j]... /* based index variable. Bleah. */

What you really want here is:

int j;
register int j2;
...
foo( &j );
j2 = j;
if( ... )
for( ; j2<lim; j2++ )
...uses j2...

Hoisting "j" into a register in the first example is doable, but
difficult. I don't believe gcc attempts this.

Stephen Clarke

unread,

Oct 9, 1996, 3:00:00 AM10/9/96

to

Maynard Handley (hand...@apple.com) wrote:

: I religously declare every variable I can REGISTER, on the theory that an

: intelligent compiler will use this as a hint that no store to memory can
: touch this var so it does not need to be swapped in/out. (This is as
: opposed to declaring the thing register to keep it in a register because I
: like to think the compiler can at least get register allocation correct.)

: Do the compilers out there actually use this hint in this fashion?

It's not really necessary. Any reasonable C compiler knows that
if you never take the address of a local variable, then no store can touch
it (except direct stores to the variable itself, obviously) - this is
true whether or not it's declared "register".

However, you __could__ look on the use of register as a useful (if
not-so-obvious) engineering practice: mark all the variables you want
efficient access to as register, then if you do make an alias (i.e. take
the address) of one - the compiler will report it as an error. This
then gives you the opportunity to do the copy-to-temp, alias temp, copy-
from-temp transformation mentioned earlier on this thread.

Steve
--------------
PACT (Partnership in Advanced Computer Technologies)
Email: ste...@pact.srf.ac.uk
Mail : 10 Priory Road, Clifton, Bristol, BS8 1TU. UK.
Phone: +44 117 970 7188, Fax +44 117 970 7171.

Alberto Moreira

unread,

Oct 9, 1996, 3:00:00 AM10/9/96

to

In article <CLIFFC.96...@ami.sps.mot.com>, cli...@ami.sps.mot.com
says...

> An annoying routine from the specint95 vortex benchmark looks like
> this:
>
> int j;
> ...
> foo( &j );
> if( ... )
> for( ; j<lim; j++ ) /* nicely stride through an array with a
MEMORY */
> ...array[j]... /* based index variable. Bleah. */
>
> What you really want here is:
>
> int j;
> register int j2;
> ...
> foo( &j );
> j2 = j;
> if( ... )
> for( ; j2<lim; j2++ )
> ...uses j2...
>
> Hoisting "j" into a register in the first example is doable, but
> difficult. I don't believe gcc attempts this.

It seems conceptually possible to convert the above sequence to

foo (&j);
if (...){
reg = j;
for (; reg<lim; reg++) { ... substitute "reg" for "j" ... }
j=reg;
}

as long as the body of the for loop is reasonably well behaved. This
could be done as the intermediate code is generated ("reg" would be a
temporary here) or further down the pike by the code generator itself.
A healthy dose of global flow analysis might even preclude the need
to generate the "j=reg" portion. In machines such as the Pentium where
adding to memory is relatively cheap, it may be possible to generate
code that uses "reg" as a shadow value, and incrementing both register
and memory location when the "reg++" must be done:

reg=j;
for (; reg<lim; reg++,j++) { ... }

If the body of the "for" loop is heavy, spending a few more cycles
at loop entry and exit seems a reasonable tradeoff if it means that
array indexes will be in registers inside the loop. For example, having
the above "j" in a register means that the code generator can take
advantage of complex addressing modes (for ex., array[ebx*4] for one-
dimensional integer or float arrays, or array[ecx][ebx*8] for two-
dimensional double arrays).

Alberto.

Maynard Handley

unread,

Oct 10, 1996, 3:00:00 AM10/10/96

to

In article <CLIFFC.96...@ami.sps.mot.com>, cli...@ami.sps.mot.com

(Cliff Click) wrote:

> hand...@apple.com (Maynard Handley) writes:
>
> I religously declare every variable I can REGISTER, on the theory that an
> intelligent compiler will use this as a hint that no store to memory can
> touch this var so it does not need to be swapped in/out. (This is as
> opposed to declaring the thing register to keep it in a register because I
> like to think the compiler can at least get register allocation correct.)
>
> Do the compilers out there actually use this hint in this fashion?
>

> This basically degrades to an assertion that the address of the
> variable is never taken. Any compiler worth it's salt *already*
> notices this fact, so no, the REGISTER keyword isn't helping the
> compiler any. (It might help YOU remember that address-of isn't legal
> here).
>

This is interesting information (and is probably why in my experience I
have not seen too much of the continual memory load/storing that started
this thread).
My experience is mostly looking at the output of the PPC compilers MrC 2.0
from Apple and mcc 2.0 from Motorola. Both of these are pretty reasonable
compilers and seem to do a good job of implementing both the standard
compiler optimizations along with some nifty PPC-specific tricks like
using multiple condition registers.
However they both seem to do, IMHO, a fairly sub-optimal job of
micro-scheduling by which I mean things like performing loads as early as
possible, calculating condition registers as early as possible,
interleaving instructions to make use of multiple functional units on each
cycle.
It seems like one could write some sort of post-processor that would take
a given code fragment (application or DLL) and run through performing such
micro-scheduling as aggresively as possible.

Has much work been done on this subject? Is it considered so hard to do in
software that the scheduling I see in these compilers is pretty much state
of the art? For example Apple has a tool, MrPlus, that postprocess a code
fragment (after linking, hence with global knowledge of what's going on)
performing a few common modifications, mostly fiddling with the registers
that are saved/restored on function calls, cleaning up function calls that
are not cross-TOC, and sometimes inling cross-TOC glue, with the usual
result of shrinking the code by about 5% in size. However it does not do
any of the more aggressive re-arranging of code I've suggested.

Cliff Click

unread,

Oct 10, 1996, 3:00:00 AM10/10/96

to

hand...@apple.com (Maynard Handley) writes:

> It seems like one could write some sort of post-processor that would take
> a given code fragment (application or DLL) and run through performing such
> micro-scheduling as aggresively as possible.
>
> Has much work been done on this subject? Is it considered so hard to do in
> software that the scheduling I see in these compilers is pretty much state
> of the art?

Lots of work has been done on post-processing scheduling (ala Mips
assembler). Lots of work has been done on scheduling in general.
Moto's mcc is a fairly stock block-only list scheduler, so it's not
really state-of-the art. A new (global) scheduler is fairly high
on our list of things to do.

Michael Golden

unread,

Oct 11, 1996, 3:00:00 AM10/11/96

to

Maynard Handley wrote:
>
> This is interesting information (and is probably why in my experience I
> have not seen too much of the continual memory load/storing that started
> this thread).
> My experience is mostly looking at the output of the PPC compilers MrC 2.0
> from Apple and mcc 2.0 from Motorola. Both of these are pretty reasonable
> compilers and seem to do a good job of implementing both the standard
> compiler optimizations along with some nifty PPC-specific tricks like
> using multiple condition registers.
> However they both seem to do, IMHO, a fairly sub-optimal job of
> micro-scheduling by which I mean things like performing loads as early as
> possible, calculating condition registers as early as possible,
> interleaving instructions to make use of multiple functional units on each
> cycle.

As I understand it, good register allocation (which prevents the continual loading
and storing of values sometimes known as "spill code") and good instruction
scheduling are not independent operations.

If you do the best possible job of instruction scheduling, you tend to keep values
alive for a long time, making it difficult to find enough registers in which
to put those values, and potentially introducing more spill code.

If you do the best possible job of register allocation, anti- and
output- dependences (write-after-read and write-after-write) can prevent
optimal instruction scheduling, even though those dependences aren't required
by the data flow in the program; they result from the limited number of
available registers.

The optimal solution is a trade-off between the two. Some of the papers
from the IMPACT group at The University of Illinois discuss this trade-off in
more detail.

So it kind of makes sense that your compiler might do a good job of eliminating
spill code and not such a good job of instruction scheduling.

--

Michael
------------------------------------------------------------------------------
The Division Of AMD Formerly Known As NexGen michael...@amd.com
m/s 362 -- AMD -- PO Box 3453 -- Sunnyvale, CA -- 94088-3453
Phone:(408) 325-8067 FAX:(408) 435-0262

Joe Keane

unread,

Oct 13, 1996, 3:00:00 AM10/13/96

to

A good programmer knows that aliasing is a major issue for compilers,
and good code is written with that in mind.

The basic assumption is that any store can change any value, unless the
value doesn't have an address. Now people will tell you that this is
pessimistic, and there are cases where you can improve on this model.
That's true, but in general the pessimistic assumption is quite
accurate, and it's the assumption that code should be written to.

In large part the strategy consists of loading any values that will be
useful into local variables, and storing them back if they're changed.
Some typical candidates are fields of a structure and elements of an
array you're iterating through. Long and repeated chains of arrows and
brackets are the sign that this has not been done.

*This is not a transparent change.*

Pulling things into locals changes the meaning of the code and its
behavior when aliasing is present. It depends on the programmer's
knowledge to know that certain values aren't changed. Of course, in
almost all cases we know that there's not any weird aliasing going on.
But the compiler doesn't know this, and it must not assume such things
on its own, or you have a buggy compiler. But tell that to Microsoft.
Indeed, buggy compilers feature prominently in programmer hell.

One result is that you'll have lots of local variables. In large part
this is good since it gives the optimizer room to work. I assume that
my compiler does a competent job of liveness analysis and a decent job
of register allocation. Even so, with lots of locals many will end up
not in registers, so you get extra code to copy the values from their
original place to the stack and back. It's still worth it though, to
make it clear to the compiler, and the human reader, what's going on.

*Don't take the address of any variable you care about.*

When a variable is passed into or returned from a function by reference,
it's now corroded since you've taken its address. This is very bad if
the variable is used a lot later, for example it may be an important
structure pointer that you assume will be in a register. Again we know
what the intention is, but once the address is taken the compiler must
consider all sorts of possibilities. I highly recommend to use an extra
temporary variable in this case. And again it may seem like you're just
copying things around more than necessary, but there's a large benefit
from making clear when things are changed and when they are not.

As noted, in modern C, the main purpose of the `register' keyword is to
prevent you from taking the address. Of course *you* know not to do it,
but some other idiot that changes your code may not understand that.

--
Joe Keane, amateur mathematician

Jan Vorbrueggen

unread,

Oct 14, 1996, 3:00:00 AM10/14/96

to

Joe Keane <j...@jgk.org> writes:

> A good programmer knows that aliasing is a major issue for compilers,
> and good code is written with that in mind.
>
> The basic assumption is that any store can change any value, unless the
> value doesn't have an address. Now people will tell you that this is
> pessimistic, and there are cases where you can improve on this model.
> That's true, but in general the pessimistic assumption is quite
> accurate, and it's the assumption that code should be written to.

While good advice, this only applies to C, C++ and similarly brain-dead (in
this respect) languages. Even FORTRAN did better, and other languages are
quite good at it (try occam, for instance 8-|).

Jan

Michael D. Nahas

unread,

Oct 14, 1996, 3:00:00 AM10/14/96

to

I think the thing to notice is that compilers are all making
worst case assumptions about thing.

For example: a load through a pointer could conceivable load
any location in memory, so everything must be stored before
such a load takes place.

This is made easier in strongly typed languages, where only
things with the same type as the pointer need to be saved.

In addition to strong typing, I think assertions should be
a part of the language. In C, they are a macro call. As
part of the language, the compiler can do some static analysis
to confirm they are true, and in production code use them
to determine which optimizations can be applied.

There are many techinques that can be used to make code faster,
but many cannot be applied because the compiler must take into
account the worst case. Many compilers contain an "unsafe"
optimization level - anything that can be added to the language
to make these operations safe must be considered.

Mike

--
/********************************************************\
* Michael David Nahas na...@virginia.edu *
* Graduate Student in CS University of Virginia *
\********************************************************/

Cliff Click

unread,

Oct 14, 1996, 3:00:00 AM10/14/96

to

Joe Keane <j...@jgk.org> writes:

A good programmer knows that aliasing is a major issue for compilers,
and good code is written with that in mind.

Alas. The compress benchmark from SpecInt95 is written using lots of
globals. And pointers passed in from functions defined in another
file which might alias. To get good performance out of compress you
simply must:

compile all files together,
discover there's really no aliasing going on, and
update the globals in memory outside of all major loops.

In short: compilers are scored on how well they handle bad code. :-(

Bernd Paysan

unread,

Oct 14, 1996, 3:00:00 AM10/14/96

to

Cliff Click wrote:
> In short: compilers are scored on how well they handle bad code. :-(

I think it's really time for a free SPEC compeating benchmark suite. It
should be allowed to anybody who runs the program to change it - and
give the changes out (according to GPL). This is how free software
works, and if somebody would find that gzip had a brain-damaged problem
with global variables, he would change gzip, not gcc. The result works
better on every (or almost every) other CPU. This will stabilize, when
the programs perform really good, so after a short period of boosting
values, we'll have a stable basis. Unlike SPEC, a benchmark suite should
take the "basis machine" literally, so whatever is the shortest
benchmark time measured for this machine, it sets the standard "1.0".
Finally, we would have a measurement of how fast well written programs
compiled with stable and useful compilers run on different CPUs.

The compiler tweaking can have interesting effects. E.g. Intel optimized
one SPECint92 benchmark (on the compiler side), doing something DEC did
for the Alpha compiler before. However, Intel did it wrong (resulting in
20% higher SPECint92 values for Pentium), but didn't discover it,
because the SPEC suite ran successful. This isn't the sort of compiler I
would want to buy. And the source change of this program would be quite
short, and increase speed even for those compiler builder who don't care
about weird programming style.

--
Bernd Paysan
"Late answers are wrong answers!"
http://www.informatik.tu-muenchen.de/~paysan/

Doug Siebert

unread,

Oct 14, 1996, 3:00:00 AM10/14/96

to

cli...@ami.sps.mot.com (Cliff Click) writes:

>In short: compilers are scored on how well they handle bad code. :-(

Sounds like a good real world test to me. The great majority of code is
bad code. (That has to be one of the Moore's law type things, anyone know
what its name is?)

--
Douglas Siebert Director of Computing Facilities
douglas...@uiowa.edu Division of Mathematical Sciences, U of Iowa

"It is easier to apologize than to get permission" -- Grace Hopper

Chris G. Perrott

unread,

Oct 15, 1996, 3:00:00 AM10/15/96

to

Doug Siebert wrote:
>
> cli...@ami.sps.mot.com (Cliff Click) writes:
>
> >In short: compilers are scored on how well they handle bad code. :-(
>
> Sounds like a good real world test to me. The great majority of code is
> bad code. (That has to be one of the Moore's law type things, anyone know
> what its name is?)

I think it's Sturgeon's Law: "90% of everything is crap."
--
Chris Perrott

Cliff Click

unread,

Oct 15, 1996, 3:00:00 AM10/15/96

to

Bernd Paysan <pay...@informatik.tu-muenchen.de> writes:

> Cliff Click wrote:
> > In short: compilers are scored on how well they handle bad code. :-(
>

> I think it's really time for a free SPEC compeating benchmark suite. It
> should be allowed to anybody who runs the program to change it - and give
> the changes out (according to GPL). This is how free software works, and
> if somebody would find that gzip had a brain-damaged problem with global
> variables, he would change gzip, not gcc. The result works better on
> every (or almost every) other CPU. This will stabilize, when the programs
> perform really good, so after a short period of boosting values, we'll
> have a stable basis.

This will lead to lots of competing implementations for the functionality.

A The AJAX compiler does inlining, no alias analysis and targets a machine
with a 2 clock divide. The AJAX version of the code is very careful about
avoiding address-of operations and merrily takes the difference of two
pointers-to-weird-sized-structs.

B The WHOOZIT compiler does not inline, has a 17 cycle divide and does great
interprocedural alias analysis. The WHOOZIT version is carefully
hand-inlined and does divide-by-constant with multiplication by reciprical
(using shifts & adds no doubt).

Neither version will do well on somebody else's compiler/machine combo.

> Finally, we would have a measurement of how fast well written programs
> compiled with stable and useful compilers run on different CPUs.

Yes, a nice measure. Certainly not what Spec is measuring. Spec does
measure something useful, just not "well written programs".

ijpeg actually looks pretty good; so does gcc if you assume an optimizing
compiler with a certain level of competancy. Vortex is simply awful.
li and m88ksim have a few performance-miserable places where tweaking the
source would help the compiler enourmously. I mentioned compress's endless
global variables. Go is very sensitive to data layout and size of your
datacache. Perl looks ok, but you have to optimize a somewhat peculiar code
structure (giant switch statement with the high-frequency arms having very
little code).

Bernd Paysan

unread,

Oct 15, 1996, 3:00:00 AM10/15/96

to

Cliff Click wrote:
> This will lead to lots of competing implementations for the functionality.
>
> A The AJAX compiler does inlining, no alias analysis and targets a machine
> with a 2 clock divide. The AJAX version of the code is very careful about
> avoiding address-of operations and merrily takes the difference of two
> pointers-to-weird-sized-structs.
>
> B The WHOOZIT compiler does not inline, has a 17 cycle divide and does great
> interprocedural alias analysis. The WHOOZIT version is carefully
> hand-inlined and does divide-by-constant with multiplication by reciprical
> (using shifts & adds no doubt).
>
> Neither version will do well on somebody else's compiler/machine combo.

Well, this is not the intention of a free benchmark. One important rule
of the game is that you have to use a free compiler, and there's only
one that gives good performance now: GCC. Yes, you can tweak the .md
files for your processor or write a new scheduler (like the Haifa
scheduler which looks a lot better than the current scheduler) or other
parts that helps your architecture, and - many architectures are quite
similar - others too.

> > Finally, we would have a measurement of how fast well written programs
> > compiled with stable and useful compilers run on different CPUs.
>
> Yes, a nice measure. Certainly not what Spec is measuring. Spec does
> measure something useful, just not "well written programs".

A useful performance critical program should be "well written". We all
know that this is not the fact. But there's BapCO to measure the
bloated, snail-like programs the typical PC masochist is using for every
day's work ;-).

> ijpeg actually looks pretty good; so does gcc if you assume an optimizing
> compiler with a certain level of competancy. Vortex is simply awful.
> li and m88ksim have a few performance-miserable places where tweaking the
> source would help the compiler enourmously. I mentioned compress's endless
> global variables. Go is very sensitive to data layout and size of your
> datacache. Perl looks ok, but you have to optimize a somewhat peculiar code
> structure (giant switch statement with the high-frequency arms having very
> little code).

You pointed at wounds, and I add: Who is using XLisp? I've seen better
free lisp versions, however, the only version of Lisp I "use" frequently
is emacs lisp (so I would care about performance). m88ksim certainly
isn't useful as is, it's just a class of applications that may be used.

However, the main problem of programs with "a few performance-miserable
places" is that someone will find out a compiler tweak that helps this
place. This had been done several times in SPEC history (and in the
history of other benchmarks, remember "Byte Sieve"?), leading to
unrealistic speedups. You won't expect the next version of a compiler to
compile one of your programs to something that runs an order of
magnitude faster, but exactly this happend. The improvements a compiler
tweak can do on well written programs is less, and I expect, it helps
more programs.

Another important difference is that all these tweaks are public. You
are required to release your tweaks to the source of the program, and
the source of the compiler. It might be that a tweak that gives you 20%
gives your competition 30% speed increase. Some tweaks may be absurd, so
there must be a difference between tweaks that find their way to the
"official" release of the program, and those that are buggy or hurt
performance on many machines but the one the tweak came from.

Dale Pontius

unread,

Oct 18, 1996, 3:00:00 AM10/18/96

to

In article <1996Oct8.110824.1@eisner>,
kilg...@eisner.decus.org (Larry Kilgallen) writes:

>In article <53c83o$6...@news.network.com>, amol...@blefscu.network.com (Andrew Molitor) writes:
>
>> better than gcc in these areas? It's my understanding that languages
>> like Pascal and Modula-{2|3} are at least theoretically much more
>> compiler friendly. How true is this?
>
>Strongly typed languages (and perhaps Ada is most strongly typed
>of commercial languages) give more information to the compiler
>about programmer intentions. The compiler knows full well the
>user is not going to be referring to a later array element by
>incrementing the pointer because such operations are not supported.
>

Certainly languages like Modula-{2|3} and Ada can be better at
conveying their intent to an optimizer, and they should be able
to be optimized better...

But are they? They're currently way out of the mainstream, and
is sufficient focus placed on optimization in Ada/Modula-{2|3}
compilers to realize this capability?

Dale Pontius
(NOT speaking for IBM)

Kaz Kylheku

unread,

Oct 18, 1996, 3:00:00 AM10/18/96

to

In article <y43ezi6...@mailhost.neuroinformatik.ruhr-uni-bochum.de>,

Jan Vorbrueggen <j...@mailhost.neuroinformatik.ruhr-uni-bochum.de> wrote:
>Joe Keane <j...@jgk.org> writes:
>
>> A good programmer knows that aliasing is a major issue for compilers,
>> and good code is written with that in mind.
>>

>> The basic assumption is that any store can change any value, unless the
>> value doesn't have an address. Now people will tell you that this is
>> pessimistic, and there are cases where you can improve on this model.
>> That's true, but in general the pessimistic assumption is quite
>> accurate, and it's the assumption that code should be written to.
>
>While good advice, this only applies to C, C++ and similarly brain-dead (in
>this respect) languages. Even FORTRAN did better, and other languages are
>quite good at it (try occam, for instance 8-|).

That is not all together fair. Yes, there are certain well-known aliasing
problems in C which interfere with optimization. But it's not as pessimistic as
Joe Keane describes. Thus the ``good advice'' above does certainly not apply.

Really, the only serious uncertainty comes from non-local pointers which
reference non-local objects. It's not known how such pointers came to have the
values that they have, so it's possible that they refer to the same object,
even if they are of different type.

There cannot be aliasing problems between pointers that originated outside
of a function and that function's local variables.

Then there are special rules about pointer arithmetic: if, through pointer
arithmetic, you try to make a pointer which aims outside of the object which
the original pointer references, you are invoking undefined behavior. To the
compiler developer, this means that once you assign the address of an object to
a pointer, the compiler can assume that accesses and stores through that
pointer affect only that object until the pointer is reassigned to another
object.

So you see, it's far too naive and pessimistic to assume that any store can
change any value.

Stanley Chow

unread,

Oct 18, 1996, 3:00:00 AM10/18/96

to

In article <548dra$s...@bcrkh13.bnr.ca>,

Kaz Kylheku <k...@vision.crest.nt.com> wrote:
>In article <y43ezi6...@mailhost.neuroinformatik.ruhr-uni-bochum.de>,
>Jan Vorbrueggen <j...@mailhost.neuroinformatik.ruhr-uni-bochum.de> wrote:
>>Joe Keane <j...@jgk.org> writes:
>>
>>> The basic assumption is that any store can change any value, unless the
>>> value doesn't have an address. Now people will tell you that this is

^^^^^^^^^^^^^^^^^^^^^

>>> pessimistic, and there are cases where you can improve on this model.

^^^^^^^^^^^^^^^^^^^^^^^^^^^

>>> That's true, but in general the pessimistic assumption is quite

^^^^^^^^^^

>>> accurate, and it's the assumption that code should be written to.

>

[Kaz does the "people will tell you" bit snipped]

[Kaz does the "cases where you are improve" bit snipped]

>So you see, it's far too naive and pessimistic to assume that any store can
>change any value.

But you have not disproofed the "in general" bit.

It is fine to say "in theory, it is possible to do very good alias
analysis even for C++"; but that is only vaguely related to what
is *actually* done in production compilers.

I suspect Joe is right in his pessimistic assumption as far as the
current crop of compilers are concerned.

--
Stanley Chow; sc...@bnr.ca, stanley....@nt.com; (613) 763-2831
Bell Northern Research Ltd., PO Box 3511 Station C, Ottawa, Ontario
Me? Represent other people? Don't make them laugh so hard.

Dann Corbit

unread,

Oct 18, 1996, 3:00:00 AM10/18/96

to

Stanley Chow <sc...@bnr.ca> wrote in article <548olh$3...@bcarh8ab.bnr.ca>...
[snip]

> It is fine to say "in theory, it is possible to do very good alias
> analysis even for C++"; but that is only vaguely related to what
> is *actually* done in production compilers.
>
> I suspect Joe is right in his pessimistic assumption as far as the
> current crop of compilers are concerned.

I can tell several of my C compilers to assume that no aliasing takes
place when compiling a module.

Zalman Stern

unread,

Oct 19, 1996, 3:00:00 AM10/19/96

to

Dann Corbit (a-c...@microsoft.com) wrote:
: I can tell several of my C compilers to assume that no aliasing takes

: place when compiling a module.

I think its IBM's xlc which has a command line switch to assert no
aliasing, a #pragma to assert no aliasing, and a command line switch to
disable the #pragma so you can find out why your program doesn't work when
you use the #pragma :-)

Seriously, this is a useful feature so long as you only use it on isolated
swaths of code that you intentionaly wrote in a language that is almost
like C, but very much different in an important respect...

Stanley Chow

unread,

Oct 19, 1996, 3:00:00 AM10/19/96

to

In article <01bbbd81$2a243f30$2dac399d@a-cnadc5>,

Dann Corbit <a-c...@microsoft.com> wrote:
>Stanley Chow <sc...@bnr.ca> wrote in article <548olh$3...@bcarh8ab.bnr.ca>...
>[snip]
>> It is fine to say "in theory, it is possible to do very good alias
>> analysis even for C++"; but that is only vaguely related to what
>> is *actually* done in production compilers.
>>
>> I suspect Joe is right in his pessimistic assumption as far as the
>> current crop of compilers are concerned.

>I can tell several of my C compilers to assume that no aliasing takes
>place when compiling a module.

Hmm, so you think your compiler has good alias analysis as long as
it has a switch/pragma to assume no aliasing?

Interesting view point, but I must stress that I set a much higher
standard.

I know I shouldn't ask this, but I just can refuse, are you just so
used to it from work? Or do you blame the language for causing you to
have such low expectations?

Larry Kilgallen

unread,

Oct 19, 1996, 3:00:00 AM10/19/96

to

In article <547vib$g...@mdnews.btv.ibm.com>, pon...@btv.ibm.com (Dale Pontius) writes:

> Certainly languages like Modula-{2|3} and Ada can be better at
> conveying their intent to an optimizer, and they should be able
> to be optimized better...
>
> But are they? They're currently way out of the mainstream, and
> is sufficient focus placed on optimization in Ada/Modula-{2|3}
> compilers to realize this capability?

DEC's Alpha compilers use a common back end, and I as I understand it
there are inputs to that back end used exclusively by the higher level
languages because they have semantic information unavailable in C.

By higher level languages, I mean Ada, Pascal and PL/I since DEC
has not released a Modula-anything product for Alpha.

Larry Kilgallen

Dann Corbit

unread,

Oct 21, 1996, 3:00:00 AM10/21/96

to

Stanley Chow <sc...@bnr.ca> wrote in article

<54b3or$i...@bcarh8ab.bnr.ca>...

> In article <01bbbd81$2a243f30$2dac399d@a-cnadc5>,
> Dann Corbit <a-c...@microsoft.com> wrote:
> >Stanley Chow <sc...@bnr.ca> wrote in article
<548olh$3...@bcarh8ab.bnr.ca>...
> >[snip]
> >> It is fine to say "in theory, it is possible to do very good alias
> >> analysis even for C++"; but that is only vaguely related to what
> >> is *actually* done in production compilers.
> >>
> >> I suspect Joe is right in his pessimistic assumption as far as the
> >> current crop of compilers are concerned.
> >I can tell several of my C compilers to assume that no aliasing takes
> >place when compiling a module.
>
> Hmm, so you think your compiler has good alias analysis as long as
> it has a switch/pragma to assume no aliasing?
>
> Interesting view point, but I must stress that I set a much higher
> standard.
>
> I know I shouldn't ask this, but I just can refuse, are you just so
> used to it from work? Or do you blame the language for causing you to
> have such low expectations?

The ONLY value is that I can examine the assembly. If I think that
the compiler has been overly pessimistic, and I know how I plan to
use the code, I can tell the compiler not to be pessimistic about
simultaneous modifications of array contents. In general, it's not
a good idea to do that. I don't believe, however, that the compiler
can ever know (for certain) that it can prefetch array values unless
I do some horrid system dependent trick like a critical section.
Not only do I think that C compilers can never be 100% safe in this
type of optimization, I belive it is a mathematical impossibility,
given the semantics of C.
--
1. I speak for myself
2. I'm just a contractor, I don't work for Microsoft!

Dann Corbit

unread,

Oct 21, 1996, 3:00:00 AM10/21/96

to

> On the calling side, if the compiler can see that the parameters are
> definitely not aliased, it calls the first version. Otherwise it calls
> the second version. This is completely safe.

But the compiler can't possibly see that. For example:

char *p = /* (some calculation) */;
p[4]++;

The rules of the game are that (some calculation)
can be anything, even:
(char *)sin(log((double)rand()))*exp((double)rand())*rand();
Yes, that's an absurd example. But There's really no way
to guarantee that two different pointers don't point to the
same address. The cost of creating two functions for every
call that I make with a pointer seems a bit extreme, especially
when there is no way to guarantee that its guess will be correct.

--
1. I speak for myself
2. I'm just a contractor, I don't work for Microsoft!

D. J. Bernstein <d...@koobera.math.uic.edu> wrote in article
<1996Oct2108...@koobera.math.uic.edu>...

> Dann Corbit <a-c...@microsoft.com> wrote:
> > Not only do I think that C compilers can never be 100% safe in this
> > type of optimization, I belive it is a mathematical impossibility,
> > given the semantics of C.
>

> It's possible. In fact, it's easy.
>
> The compiler creates two versions of each routine, one for ``no
> parameters are aliased'' and one for ``parameters may be aliased.''
>
> On the calling side, if the compiler can see that the parameters are
> definitely not aliased, it calls the first version. Otherwise it calls
> the second version. This is completely safe.
>
> The upshot is that, given f2c output, this compiler will produce code
> optimized just as heavily as if it were Fortran code.
>
> ``C's extra permissiveness makes it impossible to optimize as heavily''
> is just another way to say ``we're too lazy to write the optimizer.''
>
> ---Dan
>

D. J. Bernstein

unread,

Oct 21, 1996, 3:00:00 AM10/21/96

to

Craig Burley

unread,

Oct 21, 1996, 3:00:00 AM10/21/96

to

That's assuming you define "optimize" as "generate as a program that can
be arbitrarily large to run on a theoretical machine with no practical limit
on program size and no penalty for larger vs. smaller programs".

Those of us who work on machines with things like instruction caches have
a definition of "optimize" different from yours. Not that yours is wrong,
just that it doesn't mean much to the rest of us.

Still, I think the specific approach you describe is quite a reasonable one
to provide in compilers that use it only when they see that a particular
routine _could_ be a lot faster in the non-aliased case (or, hey, in the
aliased case -- this can happen in real programs) and that the speed-up would
be worth it (by some metric, say, the routine is called from some of the
"critical" regions of the program). I've even done this sort of coding by hand
in a few cases, if I can stretch the definition of "this approach" a bit.

Selective use of code duplication is something I'm quite in favor of, because
it can make some things run a lot faster with, hopefully, lots less bloat than
arbitrary use of code duplication.

There are some routines that, given the number of inputs (including global areas
such as Fortran COMMON), would have a significant expense associated with
just checking whether any aliasing was going on -- even just checking the
"important" aliasing that the optimizer decided was worthwhile. So, the
design should include balancing the estimates of gains from using the assume-no-
alias model versus the costs of testing whether the assume-no-alias model is
valid for a particular invocation.

It would be a lot easier on us compiler writers if you architecture types would
just make machines that supported arbitrarily large programs with no performance
penalty. Throw in arbitrarily long instruction sequences with no penalty and
you'd really have something -- writing compilers would be so easy, I couldn't find
anyone to pay me to do it!

In summary, yes, we're too lazy to write the optimizer. But, C programmers
don't care that much about performance -- otherwise they'd be writing in
Fortran anyway.

;-)

--

"Practice random senselessness and act kind of beautiful."
James Craig Burley, Software Craftsperson bur...@gnu.ai.mit.edu

Kaz Kylheku

unread,

Oct 21, 1996, 3:00:00 AM10/21/96

to

In article <1996Oct2108...@koobera.math.uic.edu>,

D. J. Bernstein <d...@koobera.math.uic.edu> wrote:

>``C's extra permissiveness makes it impossible to optimize as heavily''
>is just another way to say ``we're too lazy to write the optimizer.''

Or perhaps to even read the Holy Standard to discover the true extent of that
permisiveness.

D. J. Bernstein

unread,

Oct 21, 1996, 3:00:00 AM10/21/96

to

Dann Corbit <a-c...@microsoft.com> wrote:
> But the compiler can't possibly see that.

Nonsense.

If the parameters are two different local arrays, they aren't aliased.

If the compiler is creating the non-aliased version of ``f(a,b,c) {
int d[10]; g(a,b,c,d); }'' then the parameters to g aren't aliased.

> There's really no way
> to guarantee that two different pointers don't point to the
> same address.

Nonsense. A tiny amount of local analysis suffices to prove the
non-aliasing in every possible program output by f2c.

---Dan

D. J. Bernstein

unread,

Oct 21, 1996, 3:00:00 AM10/21/96

to

Craig Burley <bur...@gnu.ai.mit.edu> wrote:
> That's assuming you define "optimize" as "generate as a program that can
> be arbitrarily large to run on a theoretical machine with no practical limit
> on program size and no penalty for larger vs. smaller programs".

This is a frivolous objection. The linker can delete unused routines.

_If_ you translate a program from Fortran to C, and then run it through
this compiler and linker, you will get compiled output that is just as
small as the Fortran compiler's output.

The compiler runs more slowly, but that's the penalty you pay for having
a more permissive language. The myth that I'm addressing is that there
has to be a run-time penalty.

> would have a significant expense associated with
> just checking whether any aliasing was going on

The optimization that I'm talking about has _zero_ extra expense at run
time. In the fast version of a routine, the compiler can safely assume,
with no extra tests, that the parameters aren't aliased.

You're talking about a traditional optimization for the slow version:
check at run time whether the parameters are aliased, and call the fast
version if possible. As you note, this may or may not be worthwhile.

---Dan

Kaz Kylheku

unread,

Oct 21, 1996, 3:00:00 AM10/21/96

to

In article <548olh$3...@bcarh8ab.bnr.ca>, Stanley Chow <sc...@bnr.ca> wrote:
>In article <548dra$s...@bcrkh13.bnr.ca>,
>Kaz Kylheku <k...@vision.crest.nt.com> wrote:
>>In article <y43ezi6...@mailhost.neuroinformatik.ruhr-uni-bochum.de>,
>>Jan Vorbrueggen <j...@mailhost.neuroinformatik.ruhr-uni-bochum.de> wrote:
>>>Joe Keane <j...@jgk.org> writes:
>>>
>>>> The basic assumption is that any store can change any value, unless the
>>>> value doesn't have an address. Now people will tell you that this is
> ^^^^^^^^^^^^^^^^^^^^^
>>>> pessimistic, and there are cases where you can improve on this model.
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>> That's true, but in general the pessimistic assumption is quite
> ^^^^^^^^^^
>>>> accurate, and it's the assumption that code should be written to.
>
>>

I see that Nortel's internal C++ newsgroup wasn't enough to contain you! :)

>[Kaz does the "people will tell you" bit snipped]

Which people?

>[Kaz does the "cases where you are improve" bit snipped]

Which cases?

>>So you see, it's far too naive and pessimistic to assume that any store can
>>change any value.
>
>But you have not disproofed the "in general" bit.

Sure I have. The pessimistic assumption is sufficient to avoid violating
the abstract semantics, but never accurate. One should rely on the precise
rules of the C language standard.

In addition to what I already mentioned, the C language standard prohibits
access to a storage object other than through an lvalue whose type is identical
to the declaration of the type of that object, or through a character lvalue.
Furthermore, you may access an object through a qualified lvalue of the same
type, or through a struct or union aggregate (i.e. I assign two struct, and in
the process modify an int member without having gone through an int lvalue).
The purpose of these rules is to specify precise circumstances under which
aliasing can and cannot occur.

What this means is that if someone access a location via, say, a pointer to
double, the compiler does not have to worry about the contents of an array of
int. Any compiler that does is slightly brain-dead (but that does not make
the C language brain-dead, does it?)

>It is fine to say "in theory, it is possible to do very good alias
>analysis even for C++"; but that is only vaguely related to what

Perhaps, but I never said that. I said that the assumption that any store
operation can modify any object is too pessimistic. You don't have to worry
about those store operations which invoke undefined behavior.

>is *actually* done in production compilers.

Who cares what is actually done? We discuss the C language in comp.lang.c
not what production compilers do with it. It's possible to make a conforming
C implementation which directly implements the abstract semantics of C directly
with no optimization whatsoever.

The rules for do make certain analyses obvious and easy. Furthermore, knowing
the rules you can write code to make the job easier for the implementation.

>I suspect Joe is right in his pessimistic assumption as far as the
>current crop of compilers are concerned.

The assumption is not about what the current crop of compilers do or don't do,
but about what you can get away with as an implementor of the C language.

As such, the assumption is needlessly pessimistic, and mirrors a common view
held by programmers who have never read the C standard: the idea that
``anything goes in C''. The fact is that the C language standard pretty much
dicates what assumptions you may make, and there is no reason to be more
conservative than that. It also prohibits many constructs which programmers
take for granted, (such as my example of using a pointer to double to index
into an array of integers).

Nobody calls C braindead and gets away with it around here!

Dann Corbit

unread,

Oct 21, 1996, 3:00:00 AM10/21/96

to

D. J. Bernstein <d...@koobera.math.uic.edu> wrote in article
<1996Oct2120...@koobera.math.uic.edu>...

I'v thought about it a bit, and you may well be right about f2c output.
It's not possible in general, in C to do this. BUT, if we know:

1. The source is entirely FORTRAN 77 (No extern calls to C/C++)
2. That f2c does not perform aliasing in any internal lib routine
3. The generated C source has not been altered by hand

Then it may be correct. I'm surprised you can prove it with so little
effort, given how difficult it is to do program proving in general.

D. J. Bernstein

unread,

Oct 21, 1996, 3:00:00 AM10/21/96

to

Dann Corbit <a-c...@microsoft.com> wrote:
> It's not possible in general, in C to do this.

The compiler can easily prove that certain function calls don't have
aliased arguments. Smarter compilers can prove this for more and more
functions. It doesn't take much effort to cover the vast majority of
scientific programs.

There's no single algorithm that will produce perfect code for _every_
C program. However, I'm not compiling _every_ C program.

---Dan

Kaz Kylheku

unread,

Oct 21, 1996, 3:00:00 AM10/21/96

to

In article <1996Oct2120...@koobera.math.uic.edu>,

D. J. Bernstein <d...@koobera.math.uic.edu> wrote:
>Dann Corbit <a-c...@microsoft.com> wrote:
>> But the compiler can't possibly see that.
>
>Nonsense.
>
>If the parameters are two different local arrays, they aren't aliased.
>
>If the compiler is creating the non-aliased version of ``f(a,b,c) {
>int d[10]; g(a,b,c,d); }'' then the parameters to g aren't aliased.
>
>> There's really no way
>> to guarantee that two different pointers don't point to the
>> same address.
>
>Nonsense. A tiny amount of local analysis suffices to prove the
>non-aliasing in every possible program output by f2c.

Are you sure about that one? I don't doubt that there no instance of aliasing
in the translation, but what I doubht is that the C compiler can trivially
detect it just by the rules of the C standard.

I have not played with f2c much, but if the translation passes pointers to
global variables to implement call by reference, the compiler may have to work
hard.

In a function like

foo(char *a, char *b)

{

}

the compiler cannot readily assume that the objects referred a and b are not
aliased, even if this was generated from a Fortran program in which all calls
to foo are well behaved. The compiler, on the other hand, may assume the
non-aliasing of pointers to objects of different types, as long as neither of
them is a char pointer, and it is not true that one of them is a pointer to a
structure which contains (directly or recursively) an instance of the type
pointed at by the other.

I think that the C compiler would have to do a little hard global work to know
that a and b are never aliased. And what if foo is generated with external
linkage? The compiler cannot know whether a function call from another
translation unit will not call with aliased arguments.

KPT Ben

unread,

Oct 21, 1996, 3:00:00 AM10/21/96

to

k...@vision.crest.nt.com (Kaz Kylheku) wrote:

>What this means is that if someone access a location via, say, a pointer
to
>double, the compiler does not have to worry about the contents of an
array of
>int. Any compiler that does is slightly brain-dead (but that does not
make
>the C language brain-dead, does it?)

Be careful here... on the PowerPC platform among others, conversions
between integer and floating-point involve writing a double value to
memory (with a double*), then reading part of that value back into an
integer register (with an int*). Screwing up the aliasing here would be
deadly to the conversion operation. Just an example, but there are
certainly a lot of special-cases to watch out for.

My favorite example of aliasing occurs in a median filter I wrote which
makes use of several sparse histograms overlapping in memory. In the code
I end up with four index values a,b,c,d whose corresponding elements I
need to increment by 1, with the catch that if a,b,c,d are not all
distinct, any collisions should be incremented by only one, not two or
three; i.e.

av = array[a];
bv = array[b];
cv = array[c];
dv = array[d];

array[a] = av + 1;
array[b] = bv + 1;
array[c] = cv + 1;
array[d] = dv + 1;

The curious aspect of this code is that it can NOT be written

array[a]++;
array[b]++;
array[c]++;
array[d]++;

because the compiler will assume that aliasing can occur, and will not be
able to preload all four values as in the first code example. It is
precisely the optimization of assuming no aliasing (and preloading the
array elements) that causes the code to behave correctly when aliasing
does occur :)

--
Ben Weiss
Imaging Scientist
MetaTools Inc.

D. J. Bernstein

unread,

Oct 22, 1996, 3:00:00 AM10/22/96

to

Kaz Kylheku <k...@vision.crest.nt.com> wrote:
> I think that the C compiler would have to do a little hard global work to know
> that a and b are never aliased.

The work is easy and entirely local. The compiler simply has to create
two versions of each routine. Go read what I wrote in the first article.

---Dan

Carlie Coats

unread,

Oct 22, 1996, 3:00:00 AM10/22/96

to

In article <1996Oct2202...@koobera.math.uic.edu>,

D. J. Bernstein <d...@koobera.math.uic.edu> wrote:

...assuming that all subroutines have only two arguments. What's the
combinatorial explosion when the number of arguments increases? Do
you want to do this in general?) (Hint: the number of variant object
files for N arguments grows as 2^N - N)

3 args A, B, C. Possibilities are:
noalias
alias A,B
alias B,C
alias A,C
alias A,B,C (Note that there may be some interesting
copy-propagation optimizations here)

4 args A, B, C, D. Possibilities are:
noalias
alias A,B
alias A,C
alias A,D
alias B,C
alias B,D
alias A,B,C
alias A,B,D
alias A,C,D
alias B,C,D
alias A,B,C,D

Carlie J. Coats, Jr. co...@ncsc.org *or* x...@hpcc.epa.gov
MCNC Environmental Programs phone: (919)248-9241
North Carolina Supercomputing Center fax: (919)248-9245
3021 Cornwallis Road P. O. Box 12889
Research Triangle Park, N. C. 27709-2889 USA
"My opinions are my own, and I've got *lots* of them!"

D. J. Bernstein

unread,

Oct 22, 1996, 3:00:00 AM10/22/96

to

In article <54ieb2$7...@inxs.ncren.net>, Carlie Coats <co...@mcnc.org> wrote:
> ...assuming that all subroutines have only two arguments.

No, there's no such assumption.

For a C compiler to do just as good a job as a Fortran compiler on code
that could have been written in Fortran, it needs just two versions of
each routine: one with Fortran's restrictions on aliasing, and one
allowing arbitrary aliasing.

Your ``combinatorial explosion'' is nonexistent.

---Dan

Chris Torek

unread,

Oct 22, 1996, 3:00:00 AM10/22/96

to

In article <1996Oct2202...@koobera.math.uic.edu>

D. J. Bernstein <d...@koobera.math.uic.edu> writes:
>>The work is easy and entirely local. The compiler simply has to create
>>two versions of each routine. Go read what I wrote in the first article.

In article <54ieb2$7...@inxs.ncren.net>, Carlie Coats <co...@mcnc.org> wrote:
>...assuming that all subroutines have only two arguments. What's the
>combinatorial explosion when the number of arguments increases?

This is true but irrelevant -- in Dan's scheme we still generate only
two variants, one that assumes optimal (i.e., no) aliasing, and one
that assumes pessimal aliasing. If the function is called with code
that obeys Fortran-style constraints, the fast unaliased code is used,
otherwise the slow code is used, even though it is probably suboptimal
for the particular case being run.
--
In-Real-Life: Chris Torek, Berkeley Software Design Inc
El Cerrito, CA Domain: to...@bsdi.com +1 510 234 3167

Stanley Chow

unread,

Oct 22, 1996, 3:00:00 AM10/22/96

to

In article <54ge7t$r...@bcrkh13.bnr.ca>,

Kaz Kylheku <k...@vision.crest.nt.com> wrote:
>In article <548olh$3...@bcarh8ab.bnr.ca>, Stanley Chow <sc...@bnr.ca> wrote:
>
>I see that Nortel's internal C++ newsgroup wasn't enough to contain you! :)

Me strong. Firewall not strong. Me come out.

And just to be nitpickingly correct, and to deflect the accusation that
I dared blaspheme against the "Holy Standard" in the Holy Temple of
comp.lang.c, this started as a thread in comp.arch and someone crossed
over to comp.lang.c; so don't blame me for starting it. I just helped to
fan the flames :-)

>>>So you see, it's far too naive and pessimistic to assume that any store can
>>>change any value.
>>
>>But you have not disproofed the "in general" bit.
>
>Sure I have. The pessimistic assumption is sufficient to avoid violating
>the abstract semantics, but never accurate. One should rely on the precise
>rules of the C language standard.

I am not aware that the "Holy Standard" actually ran programs. How exactly
can I invoke this? Does it generate better code than gcc?

>What this means is that if someone access a location via, say, a pointer to
>double, the compiler does not have to worry about the contents of an array of
>int. Any compiler that does is slightly brain-dead (but that does not make
>the C language brain-dead, does it?)

Didn't we just recently go through a thread with extensive discussion
about why the standard is distinct from the implementations? I don't
think I want to repeat that.

>Who cares what is actually done? We discuss the C language in comp.lang.c
>not what production compilers do with it. It's possible to make a conforming
>C implementation which directly implements the abstract semantics of C directly
>with no optimization whatsoever.

I guess comp.lang.c has become much more theoretical since I stopped
reading it. It used to have all kinds of people asking all sorts of
practical questions.

>[...] The fact is that the C language standard pretty much

>dicates what assumptions you may make, and there is no reason to be more
>conservative than that. It also prohibits many constructs which programmers
>take for granted, (such as my example of using a pointer to double to index
>into an array of integers).

Hmm, unless I am have serious reading difficulties, you are disagreeing
with yourself.

>Nobody calls C braindead and gets away with it around here!

I agree, people (including me) actually get real work done in C, so it
cannot be braindead; more like severely brain-damaged :-)

Stanley Chow

unread,

Oct 22, 1996, 3:00:00 AM10/22/96

to

In article <54gt07$8...@bcrkh13.bnr.ca>,
Kaz Kylheku <k...@vision.crest.nt.com> wrote:
>In article <1996Oct2120...@koobera.math.uic.edu>,

>D. J. Bernstein <d...@koobera.math.uic.edu> wrote:
>>

>>If the compiler is creating the non-aliased version of ``f(a,b,c) {

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>>int d[10]; g(a,b,c,d); }'' then the parameters to g aren't aliased.
>

>Are you sure about that one? I don't doubt that there no instance of aliasing
>in the translation, but what I doubht is that the C compiler can trivially
>detect it just by the rules of the C standard.

I think you are missing the point.

D.J. is suggesting that in the *non-aliased* version, the compiler can
assume all parameters are non-aliased, which will allow it to tell if
calls inside use aliased parameters.

>In a function like
>
>foo(char *a, char *b){}
>
>the compiler cannot readily assume that the objects referred a and b are not
>aliased, even if this was generated from a Fortran program in which all calls
>to foo are well behaved.

When compiling the *non-aliased* version, the compiler *can* assume a
and b are non-aliased; that is, after all, the whole point of having
a duplicate.

What the compiler must also do, is to decide which version to call. D.J.
claims that with simple analysis, all f2c output will end up on the
non-aliased version.

Note that the compiler is still conformant to the standards, and will
even make some "normal" C programs faster.

Alicia Carla Longstreet

unread,

Oct 22, 1996, 3:00:00 AM10/22/96

to

Kaz Kylheku wrote:
>
> In article <1996Oct2108...@koobera.math.uic.edu>,

> D. J. Bernstein <d...@koobera.math.uic.edu> wrote:
>

> >``C's extra permissiveness makes it impossible to optimize as heavily''
> >is just another way to say ``we're too lazy to write the optimizer.''
>
> Or perhaps to even read the Holy Standard to discover the true extent of that
> permisiveness.

One of the reasons a good assembly language programmer CAN produce
highly optimized code is its total permissiveness, if the machine can do
it, you can write a thousand assembly routines to get the job done.

'C''s permissiveness gives 'C' programmers that VERY SAME ability to
produce highly optimized, efficient code. To do this we need to know:
our architecture.
our OS.
our environment.
our compiler.
our language.

It is my scincere belief that a GOOD 'C' programmer can produce code
nerly as good as a top notch assembler programmer in 1/4th the time.
Time and time again some assembly language programmer has attempted to
refute this, no one has succeeded YET!!!.

You can produce poor code, even with assembler. Good code is still in
the hands of the programmer. Don't depend on the compiler, do it
yourself.

Alicia Carla Longstreet
A thing worth doing is worth doing well.

Tanmoy Bhattacharya

unread,

Oct 22, 1996, 3:00:00 AM10/22/96

to

Tanmoy Bhattacharya <tan...@qcd.lanl.gov> writes:

>
> k...@vision.crest.nt.com (Kaz Kylheku) writes:
>
> > to foo are well behaved. The compiler, on the other hand, may assume the
> > non-aliasing of pointers to objects of different types, as long as neither of
> > them is a char pointer, and it is not true that one of them is a pointer to a

> ^^^^ character

> > structure which contains (directly or recursively) an instance of the type

> ^^^^^^^^^ or union (I think: the standard's 5 minutes slippery snowy road
> away.)

And, naturally pointer to array as well.

> > pointed at by the other.
>

> To clarify, this statement actually applies to the type of the pointer
> which is being dereferenced. Thus in `void f(void*x, int*y)' x and y
> may be aliased if the only access through x is through (T*)x, (char*)x
> or an appropriate type of struct or union and y is accesed only
> through (T*)y or vice-versa. The declared type really doesn't matter:
> except what can be derived from alignment requirements.

I checked the standard today: unions are in fact allowed, not only
structs. This puts a completely new light on the argument, and I have
also posted incorrectly on this issue in the past. I am surprised that
nobody corrected me yet!

Basically, the question is exemplified by the following code:

int x (int *y, float *z) {
*z = 0;
*y = 3;
return *z * very_complicated_function(*y);
}

can the compiler optimize it to return 0; if it knows that
every_complicated_function has no side effects? The answer, I now
believe is no.

The problem is that I can call it now as

int main(void) {
union {int y; float z;} t;
x(&t.y, &t.z);
}

Now, the *y = 3 statement stores a value in t.y. After this the effect
of accesing *z, or t.z, unfortunately, does not lead to `undefined
behaviour', but rather to `implementation defined' behaviour. Which
means, the implementation has to document what happens: and the only
sensible way, I think it can do so is to access the value in *z in the
return statement. (If the implementation defined rule was that when
t.y is stored into t.z becomes zero, we just have to change the
example function to *z = 1 and ask whether the multiplication can be
optimized away.)

In other words, any two pointer parameters to a function have to be
assumed to be aliased irrespective of their type (at least, for all
sensible readings of `implementation defined'). I am sure that barring
some alignment criteria, this is true not only of function parameters,
but a much more general occurence.

Cheers
Tanmoy
--
tan...@qcd.lanl.gov(128.165.23.46) DECNET: BETA::"tan...@lanl.gov"(1.218=1242)
Tanmoy Bhattacharya O:T-8(MS B285)LANL,NM87545 H:#9,3000,Trinity Drive,NM87544
Others see <gopher://yaleinfo.yale.edu:7700/00/Internet-People/internet-mail>,
<http://alpha.acast.nova.edu/cgi-bin/inmgq.pl>or<ftp://csd4.csd.uwm.edu/pub/
internetwork-mail-guide>. -- <http://nqcd.lanl.gov/people/tanmoy/tanmoy.html>
fax: 1 (505) 665 3003 voice: 1 (505) 665 4733 [ Home: 1 (505) 662 5596 ]

Tanmoy Bhattacharya

unread,

Oct 22, 1996, 3:00:00 AM10/22/96

to

k...@vision.crest.nt.com (Kaz Kylheku) writes:

> to foo are well behaved. The compiler, on the other hand, may assume the
> non-aliasing of pointers to objects of different types, as long as neither of
> them is a char pointer, and it is not true that one of them is a pointer to a
^^^^ character
> structure which contains (directly or recursively) an instance of the type
^^^^^^^^^ or union (I think: the standard's 5 minutes slippery snowy road
away.)

> pointed at by the other.

To clarify, this statement actually applies to the type of the pointer
which is being dereferenced. Thus in `void f(void*x, int*y)' x and y
may be aliased if the only access through x is through (T*)x, (char*)x
or an appropriate type of struct or union and y is accesed only
through (T*)y or vice-versa. The declared type really doesn't matter:
except what can be derived from alignment requirements.

Cheers

Bernd Paysan

unread,

Oct 22, 1996, 3:00:00 AM10/22/96

to

Chris Torek wrote:
> >...assuming that all subroutines have only two arguments. What's the
> >combinatorial explosion when the number of arguments increases?
>
> This is true but irrelevant -- in Dan's scheme we still generate only
> two variants, one that assumes optimal (i.e., no) aliasing, and one
> that assumes pessimal aliasing. If the function is called with code
> that obeys Fortran-style constraints, the fast unaliased code is used,
> otherwise the slow code is used, even though it is probably suboptimal
> for the particular case being run.

The pessimistic routine may do run-time checks first. E.g. if it knows
if a and b differs by at least 10, no aliasing occurs, it can jump to
the optimistic routine. And if your linker allows to link selected
functions (e.g. if your compiler generates .a files instead of .o
files), the double coding doesn't produce longer executables, if all
callers behave well.

--
Bernd Paysan
"Late answers are wrong answers!"
http://www.informatik.tu-muenchen.de/~paysan/

Craig Burley

unread,

Oct 23, 1996, 3:00:00 AM10/23/96

to

In article <1996Oct2121...@koobera.math.uic.edu> d...@koobera.math.uic.edu (D. J. Bernstein) writes:

Craig Burley <bur...@gnu.ai.mit.edu> wrote:
> That's assuming you define "optimize" as "generate as a program that can
> be arbitrarily large to run on a theoretical machine with no practical limit
> on program size and no penalty for larger vs. smaller programs".

This is a frivolous objection. The linker can delete unused routines.

_If_ you translate a program from Fortran to C, and then run it through
this compiler and linker, you will get compiled output that is just as
small as the Fortran compiler's output.

Okay, I didn't pick up on your original post as _both_ insisting all
this useful analysis could be done statically _and_ that it would
cover all the cases. Sorry. I focused on the implication (which you
clearly stated in other posts I've since read) that no case would be
left unhandled by this technique and assumed you realized that your
link-time solution was only a (major) part of the technique, with
run-time switching left for the remaining cases. Now I realize you
don't fully appreciate the problem.

The kind of analysis you're talking about _would_ be highly useful, IMO,
but would definitely not cover all the cases -- perhaps "enough" for
almost everyone, though.

The simple fact is that once you lower Fortran to C code, you've lost
some important _semantic_ information. Fortran allows some things to
be _expressed_ that cannot be expressed in C. I'd said this many times
before on comp.lang.fortran, comp.lang.c, even gnu.misc.discuss, and
there are always objections -- but none that stand up to the light of day,
usually because they're made by people who don't know the Fortran standard
as well as they must before posting such objections, and in some cases,
don't even know the C standard all that well, though ultimately I think
it's because they have learned to focus too much on "can I make this work"
rather than "what does this _mean_", just as "men are from mars" and
thus tend to think "I'll turn the lights on" is a full and complete
response to "I'm afraid of the dark". ;-)

Before I start yet another approach to explaining why you cannot express
everything in C that you can in FORTRAN 77, I'll ask this question --
meanwhile, review old archives of comp.lang.fortran and gnu.misc.discuss
to see my earlier attempts at explaining this, if you dare ;-):

Is a CD of a recording of Beethoven's 9th Symphony capable of
_completely_ expressing that composition? That is, is it an
equivalent expression -- does it contain the full semantic content
of the score -- or is it merely a particular _implementation_?

The answer is that such a CD is _an_ _implementation_. Using the most
advanced reverse-engineering systems imaginable to extract higher-level
semantics out of lower-level encodings (such as is done by optical-
character-recognition programs), you can reconstruct much, maybe most,
maybe all _you care about_ of the score. But, you cannot reconstruct
the entire score -- certainly not the myriads of notes taken by the
director and all the musicians on their respective copies, all of which
might have some useful information, and all of which could have
contributed to that particular implementation, plus an arbitrary number
of alternate implementations _that cannot be determined as legitimate
alternate implementations simply by examining that one CD, no matter how
much reverse-engineering is done on it_.

Similarly, Fortran has a few higher-level semantic facilities that C
simply lacks, though it can provide adequate (sometimes excellent)
_implementations_ of them, including:

.AND. and .OR.
Functions
Complex arithmetic (though I'm not absolutely sure about this one)

So, using f2c to convert Fortran to C code produces an _implementation_
of that code, one that, in turn, can be implemented a myriad of ways
(just as that CD can be played on an arbitrary number of possible
stereo systems), but that _cannot_ be used to _fully_ recover the
semantic content of the original (and, no, I'm not talking about
comments!).

As a modest example, no C (or, I believe, C++) _translation_ of the
following Fortran program is possible, though of course C versions
of various implementations of it are easily written:

LOGICAL L1, L2, L3, L4, L5
READ *, L1, L2
L5 = (L1.AND.L3()) .OR. (L2.AND.L4())
PRINT *, L5
END

LOGICAL FUNCTION L3()
PRINT *, 'In L3'
L3 = .TRUE.
END

LOGICAL FUNCTION L4()
PRINT *, 'In L4'
L4 = .FALSE.
END

The reason is that there exists at least one valid implementation
of the above program that cannot be discovered from a C translation of
the above as done by a program like f2c:

LOGICAL L1, L2
REAL *, L1, L2
PRINT *, L1
END

Another valid implementation that cannot be discovered (and which will
get less argument from the Fortran crowd):

LOGICAL L1, L2, L3, L4
READ *, L1, L2
IF (L1 .OR. L2) L2 = L4()
IF (L1 .OR. L2) L1 = L2 .OR. L3()
PRINT *, L1
END

(same L3() and L4() functions as in the original)

If you don't believe the original Fortran code above can be implemented
by the third (most recent) example, you had better carefully read the
Fortran standards (77 or 90 -- I think 77 is on-line somewhere).

There are a few pieces of semantic information C conveys that cannot be
expressed in Fortran, even where Fortran (77) has the "same" operators.
For example:

% and / (at least, I'm sure about %)

That is, there's simply no way to _fully_ express a C routine like this
in Fortran:

int divmod (int a, int b, int *q, int *r)
{
*q = a / b;
*r = a % b;
}

In this case, math types will certainly argue that Fortran's way is "better",
but the point remains that it is "lower" in the sense that it is further
down the tree of successive refinement. C leaves undefined what happens for
a larger space of the operands to % (and /), while Fortran doesn't offer
precisely equivalent operator(s), so while MOD() and / can be used, they
say _more_ about the potential range of operands, and hence further limit
the possible implementations -- just as C's "equivalents" to the list of
Fortran features above say _more_ about what's going on than Fortran, also
limiting the possible implementations.

(C's catering to machine realities, as in the cases of % and /, now
seems widely seen as a short-sighted decision, or at least no longer
appropriate for modern programming. Fortran's version of this kind of
thing seems to have been more carefully thought out, and longer-lived,
despite being done by committee. I think a reason for this is the
earlier standardization effort, which did "fix" these things, in FORTRAN
77, when supercomputers and stuff were pretty well understood, while C
grew as an ad-hoc language for much longer, allowing things to be
ensconsed before a public forum was made available to decide what things
really should be standardized. Though, I believe ANSI C did clarify some
things as "unrefined" that previously were "assumed" to work a certain
way, allowing more implementation choices -- despite what lots of programmers
think, "x ^= y ^= x ^= y;" is not a standard-approved way to swap the values
in x and y, for example, though in most or all implementations of C, it
happens to work. Fortran also had the advantage of being designated
as "the language of choice for high-performance computing" a _long_ time
before C was even seriously considered for that role -- before moving
characters around quickly was considered that important. ;-)

The above examples are easier to talk about than array aliasing, which is
really an issue of what a programmer does or doesn't mean when writing
a procedure and/or a call to one, but the same kinds of examples can be
constructed (and do happen, in practice, in production code) that make
link-time array-aliasing detection by a C compilation system _fail_ to
use the non-aliased version of a procedure where the Fortran compilation
system would _succeed_. (For that matter, even a run-time system, where
the decision was made on entry to the procedure, could, I believe, be
convinced to choose the aliased version of a procedure while running an
f2c translation of a valid Fortran program...but I don't want to get
into proving that at this point.)

In summary: Fortran programmers _mean_ something different than C programmers
do when they write programs and use constructs that, on the surface, seem
quite similar (especially when they are given similar names). Since neither
language is a subset of the other languages' full range of meaning, programs
written in them cannot be fully translated to the other language without
some loss of information. The space of potential lost information can
be _partially_ recovered using techniques such as what djb suggestions (which
I still think is a great idea, for anyone with a C development system that can
handle it more easily than simply learning how to also read Fortran), but never
entirely recovered.

Burkhard Neidecker-Lutz

unread,

Oct 23, 1996, 3:00:00 AM10/23/96

to

Digitals GEM C compiler does this. It's called "dynamic disambiguation".
Instead of making the choice at compile time, a simple aliasing test is
inserted that makes the choice at runtime. How to decide whether to
use this ? Some heuristics and feedback profile information is good
enough.

Burkhard Neidecker-Lutz

EUROMEDIA - Distributed Multimedia Archives for Cooperative TV Production
CEC Karlsruhe , European Applied Research Center, Digital Equip. Corp.
email: nei...@kar.dec.com

Ralph Silverman

unread,

Oct 23, 1996, 3:00:00 AM10/23/96

to

Alicia Carla Longstreet (ca...@ici.net) wrote:

--
***************begin r.s. response*******************

those wishing to associate
assembler and 'c' language
forms, effectively,
may find useful
free available
products supportive of
such...
(these for (ms pc dr)dos,
(286 good)),
sphinx c--
(widely available, freeware)
fosters a freeform mixing of
such...

desmet pcc (v1.2b, v1.2c shareware)
each support
the linkage of assembler routines
as well as in-line assembler...

evidently, (internet postings),
hi-tech ppd (v7.50 shareware),
'c' language development environment,
(available as pacific.exe,
a self-extracting compressed
file),
supports both in-line assembler
and independent assembly...

***************end r.s. response*********************
Ralph Silverman
z007...@bcfreenet.seflin.lib.fl.us

Steven Correll

unread,

Oct 23, 1996, 3:00:00 AM10/23/96

to

In article <1996Oct2121...@koobera.math.uic.edu>,

D. J. Bernstein <d...@koobera.math.uic.edu> wrote:

>Craig Burley <bur...@gnu.ai.mit.edu> wrote:
>> That's assuming you define "optimize" as "generate as a program that can
>> be arbitrarily large to run on a theoretical machine with no practical limit
>> on program size and no penalty for larger vs. smaller programs".
>
>This is a frivolous objection. The linker can delete unused routines.

How does the C compiler know that it's safe to call the fast version of
a routine? Do you:

--Put your entire program in a single source file so the optimizer can
trace the actual arguments back through arbitrarily many nested
function invocations to determine whether they are disjoint?
--Optimize after linking the separate compilations, so the optimizer
can see the entire program?
--Use the fast version only when the declarations of the actual arguments
appear in the same compilation as the invocation?
--Teach the C compiler to recognize #pragma "generated_by_f2c"?

Optimize-after-linking has been commercially available since the late
80s at least, but hasn't become common. A number of commercial linkers
do delete unused functions, and aspects of C++ (the one-definition rule
and templates) may make it increasingly attractive to build this into
the linker. But given how complicated modern linkers already are, vendors
aren't likely to provide this feature just to facilitate use of f2c.

When you program in Fortran, there's still something to be said for using
a Fortran compiler.
--
Steven Correll == PO Box 66625, Scotts Valley, CA 95067 == s...@netcom.com

D. J. Bernstein

unread,

Oct 23, 1996, 3:00:00 AM10/23/96

to

Craig Burley <bur...@gnu.ai.mit.edu> wrote:
> Now I realize you don't fully appreciate the problem.

If you go back to my old comp.lang.misc postings you'll see that I'm
very much in favor of adding semantics to languages---for example, I've
said many times that

``divide x by y, rounding down in all cases''
``divide x by y, rounding any way you want if y is negative''
``divide x by y---oh, by the way, y is positive''

are different operations that should have different names.

However, the C language is already expressive enough to let the compiler
recognize Fortran-style array manipulations.

---Dan

D. J. Bernstein

unread,

Oct 23, 1996, 3:00:00 AM10/23/96

to

In article <sjcDzq...@netcom.com>, Steven Correll <s...@netcom.com> wrote:
> How does the C compiler know that it's safe to call the fast version of
> a routine?

It does some simple local analysis, as I said.

> --Put your entire program in a single source file

No need.

> --Optimize after linking the separate compilations, so the optimizer
> can see the entire program?

No need.

> --Use the fast version only when the declarations of the actual arguments
> appear in the same compilation as the invocation?

No need.

> --Teach the C compiler to recognize #pragma "generated_by_f2c"?

No need.

Perhaps you should go back and read what I wrote.

---Dan

Preston Briggs

unread,

Oct 23, 1996, 3:00:00 AM10/23/96

to

D. J. Bernstein <d...@koobera.math.uic.edu> wrote:

>Yup. f2c can insert
>
> assert((1 != 1) || (i != j));

This would be easier if you didn't trim so aggressively.
On the other hand, most people don't trim at all...

Anyway, I have 2 complaints. 1st, you're treating "assert" like an
intrinsic part of the C language. It isn't. Could mean all sorts of
things.

2nd, and more to the point, Fortran 77 allows me to write

call zap(x, x, y, z, result)

if I like. Where zap might be defined

subroutine zap(a, b, c, d, e)
real a, b, c(100), d(100), e
e = 0.0
do i = 1, 100
e = e + a*d(i) + b*c(i)
enddo
end

At the call, it looks like I've introduced an alias, but it's benign
and legal. The Fortran compiler will accumulate hold a, b, and e in
registers, giving fine, tight code. Even parallelizable. You'll look
at the call, panic, and call the slow routine which will have to store
e and reload a and b on every iteration. Certainly won't
parallelize.

Preston Briggs
pre...@tera.com

Preston Briggs

unread,

Oct 23, 1996, 3:00:00 AM10/23/96

to

>2nd, and more to the point, Fortran 77 allows me to write
>
> call zap(x, x, y, z, result)
>
>if I like.

BTW, here's the chapter and verse.
In ANSI X3.9-1978, page 15-20, paragraph 15.9.3.6

For example, if a subroutine is headed by

SUBROUTINE XYZ (A,B)

and is referenced by

CALL XYZ (C,C)

then the dummy arguments A and B each become associated with
the same actual argument C and therefore with each other.
Neither A nor B may become defined during this execution of
subroutine XYZ or by any procedures referenced by XYZ.

Preston Briggs

unread,

Oct 23, 1996, 3:00:00 AM10/23/96

to

> Nonsense. A tiny amount of local analysis suffices to prove the
> non-aliasing in every possible program output by f2c.

Nope. Consider the following statement

call saxpy(A(1, i), A(1, j), mumble)

Perfectly legal Fortran. We're just passing two different columns of
A to saxpy and we're confident that i != j. But unless your C
compiler can prove that i and j differ, using "a tiny amount of local
analysis", you're going to have to play it safe and call the slow
version of saxpy.

Does this come up in practice? Sure, consider Linpack. Bummer to
have to call a slow version of saxpy on Linpack!

You see, the user is allowed to introduce all sorts of apparent
aliasing at the call statement, but only if he's certain that no real
aliasing occurs. In other words, the compiler is allowed to assume
the user knows what he is doing, regadless of the facts :-)

Preston Briggs

D. J. Bernstein

unread,

Oct 23, 1996, 3:00:00 AM10/23/96

to

Preston Briggs <pre...@cs.rice.edu> wrote:
> Anyway, I have 2 complaints. 1st, you're treating "assert" like an
> intrinsic part of the C language. It isn't.

Actually, it is.

> 2nd, and more to the point, Fortran 77 allows me to write
> call zap(x, x, y, z, result)
> if I like.

Really? Is this something that changed since Fortran 4?

My comments are about a language where aliased procedure arguments are
prohibited outright. Similar comments do apply to languages where the
restrictions are more complicated---e.g., read-only arguments may be
aliased---though you need a smart linker to handle the general case.

---Dan

D. J. Bernstein

unread,

Oct 23, 1996, 3:00:00 AM10/23/96

to

Preston Briggs <pre...@cs.rice.edu> wrote:
> Nope.

Yup. f2c can insert

assert((1 != 1) || (i != j));

Admittedly, the necessary assertions for Fortran 90 can't be expressed
without a ``noalias'' construct in C, but I'm not talking about Fortran
90---I'm talking about Fortran. :-)

---Dan

Preston Briggs

unread,

Oct 23, 1996, 3:00:00 AM10/23/96

to

>> 1st, you're treating "assert" like an
>> intrinsic part of the C language. It isn't.
>
>Actually, it is.

No, it's part of the standard library. Which means ordinary people
can use assert() for anything they want, as long as assert.h isn't
included. Look it up! However, I'll grant that you can probably
control the f2c process tightly enough that such problems are avoided.
Of course, you may have problems with ordinary C code.

>> 2nd, and more to the point, Fortran 77 allows me to write
>> call zap(x, x, y, z, result)
>> if I like.

>Really? Is this something that changed since Fortran 4?

It's true for Fortran 77; I'm not sure about Fortran 4.
Perhaps you could look it up.

Preston Briggs

Tanmoy Bhattacharya

unread,

Oct 23, 1996, 3:00:00 AM10/23/96

to

d...@koobera.math.uic.edu (D. J. Bernstein) writes:

>
> Preston Briggs <pre...@cs.rice.edu> wrote:
> > Anyway, I have 2 complaints. 1st, you're treating "assert" like an

> > intrinsic part of the C language. It isn't.
>
> Actually, it is.

True.

>
> > 2nd, and more to the point, Fortran 77 allows me to write
> > call zap(x, x, y, z, result)
> > if I like.
>
> Really? Is this something that changed since Fortran 4?
>

No, Fortran allows you to pass the same argument as long as none of
the corresponding dummy arguments in the actual procedure called `gets
defined'. The last is a technical term which encompasses all
situations where they would get modified. In the example provided,
neither of the two formal arguments corresponding to the first two x's
(a and b) were being modified, so the fortran version is correct.

> My comments are about a language where aliased procedure arguments are
> prohibited outright. Similar comments do apply to languages where the
> restrictions are more complicated---e.g., read-only arguments may be
> aliased---though you need a smart linker to handle the general case.

This is the case in Fortran: if an argument is read-only, then it can
be aliased. However, in version of Fortran prior to Fortran 90 there
was no `read-only' specification: so, read-only only depended on
whether the parameter was actually modified or not. IF(.FALSE.)
modify(a) was also allowed, so the requirement was not syntactic.

Mayan Moudgill

unread,

Oct 24, 1996, 3:00:00 AM10/24/96

to

In article <54lici$l...@listserv.rice.edu>,

There are several disambiguation/anti-aliasing tricks that can be used
inside saxpy; obvious one is loop cloning
Let the original be:
float
saxpy( float a[], float b[], int size)
{
float sum = 0.0;
for( i = 0; i < size; i++ ) {
sum = a[i]*b[i];
}
return sum;
}

Then, loop cloning gives you:
saxpy( float a[], float b[], int size)
{
if( a+size <= b || b + size <= a ) {
/*** fast, vector loop ***/
}
else {
/*** slow, sequential loop ***/
}
}

or better yet:
saxpy( float a[], float b[], int size )
{
if( a+size <= b || b + size <= a ) {
/*** fast, vector loop ***/
}
else if( a == b ) {
/** fast loop that computes Sum a[i]*a[i]
}
else {
/*** slow, sequential loop ***/
}
}

Voila! saxpy is as fast as possible, no matter what the input paramters
look like.

:)
Mayan
--
------------------------------------------------------------------------------
| Mayan Moudgill | These are _MY_ opinions. Any resemblance |
| ma...@watson.ibm.com | to IBM's opinions is purely coincidental |
------------------------------------------------------------------------------

Preston Briggs

unread,

Oct 24, 1996, 3:00:00 AM10/24/96

to

>> call saxpy(A(1, i), A(1, j), mumble)
>>
>>Perfectly legal Fortran. We're just passing two different columns of
>>A to saxpy and we're confident that i != j. But unless your C
>>compiler can prove that i and j differ, using "a tiny amount of local
>>analysis", you're going to have to play it safe and call the slow
>>version of saxpy.

>There are several disambiguation/anti-aliasing tricks that can be used

>inside saxpy; obvious one is loop cloning

Yes, of course. I was only responding to Bernstein's assertion that
everything could be done at compile-time, using minimal local analysis
at the call site. My example was only supposed to show that things
weren't as simple as he claimed (for Fortran 77, and whatever other
disclaimers I need to add).

For the record, I'm a big fan of loop cloning, procedure cloning,
runtime dependence analysis, etc. I encourage everyone to look for
good ideas here and publish them as soon as possible.

Preston Briggs

Richard A. O'Keefe

unread,

Oct 24, 1996, 3:00:00 AM10/24/96

to

k...@vision.crest.nt.com (Kaz Kylheku) writes:
>What this means is that if someone access a location via, say, a pointer to
>double, the compiler does not have to worry about the contents of an array of
>int. Any compiler that does is slightly brain-dead (but that does not make
>the C language brain-dead, does it?)

This is all very well, but in the cases I care about, all of the objects
are the *same* type. For example, suppose I want to compute the
intersection of two sets:

#define N 1024

void
intersect(register unsigned *dst, register unsigned const *src)
{
int i;

for (i = 0; i < N; i++) dst[i] &= src[i];
}

This unrolls nicely to

void
intersect(register unsigned *dst, register unsigned const *src)
{
int i;

for (i = 0; i < N; i += 4) {
dst[i+0] = dst[i+0] & src[i+0];
dst[i+1] = dst[i+1] & src[i+1];
dst[i+2] = dst[i+2] & src[i+2];
dst[i+3] = dst[i+3] & src[i+3];
}
}

There is a significant performance boost on this machine, with its
16-byte cache subblocks, load buffer, and store buffer, if the loop body
is rescheduled to

load r4, dst[i+0] \ these four
load r5, dst[i+1] | become one
load r6, dst[i+2] | load buffer entry
load r7, dst[i+3] / loading one subblock
load r8, src[i+0] \ so do these,
load r9, src[i+1] | so if there is a collision in
load rA, src[i+2] | the cache, it happens ONCE,
load rB, src[i+3] / not 4 times
and r4, r8
store r4, dst[i+0]
and r5, r9
store r5, dst[i+1]
and r6, rA
store r6, dst[i+2]
and r7, rB
store r7, dst[i+3]

C's aliasing rules say the dst and src arguments *may* overlap, so
this performance-enhancing reorganisation (or ones like it, I may
have got this wrong, which is why I don't write code like this by hand)
is *forbidden*. Any optimisation that found it to be safe would have
to be link-time optimisation, which is not currently available to me.

I have the same problem with operations on arrays of floats.

There is an excellent chance that the 'restrict' keyword will be part
of C9X. There wouldn't be the support for it that there is if this
were not a real problem.

>Nobody calls C braindead and gets away with it around here!

Not brain-dead, just "challenged".

--
Mixed Member Proportional---a *great* way to vote!
Richard A. O'Keefe; http://www.cs.rmit.edu.au/%7Eok; RMIT Comp.Sci.

Reid E Tatge

unread,

Oct 24, 1996, 3:00:00 AM10/24/96

to

In article f...@qcd.lanl.gov, Tanmoy Bhattacharya <tan...@qcd.lanl.gov> () writes:
>...

>I checked the standard today: unions are in fact allowed, not only
>structs. This puts a completely new light on the argument, and I have
>also posted incorrectly on this issue in the past. I am surprised that
>nobody corrected me yet!

OK, hold on a sec...

>The problem is that I can call it now as
>
>int main(void) {
> union {int y; float z;} t;
> x(&t.y, &t.z);
>}
>

>In other words, any two pointer parameters to a function have to be
>assumed to be aliased irrespective of their type (at least, for all
>sensible readings of `implementation defined').

Ultimately, I suspect this will boil down to what one thinks
"implementation defined" means. But given that...

You should look at RFI #28, X3J11 Doc #91-009 (I'm language-lawyer-for-a-day).
It explicitly deals with the case of pointers into unions. It reads:

>Case 1: unions f(&u.i, &u.d)
>
> Ref 3.3.2.3, Page 43, Lines 5-11:
> ... if a member of a union object is accessed after a value has been stored
> in a different member of the object, the behavior is implementation-defined.
>
>Therefore, an alias is not permitted and the optimization is allowed.

(There's also other interesting cases discussed, but this one is most pertinent)

So, we can discuss the proper interpretation of "implementation defined", but the
intent of the committee on this issue is clear.

So, how about more discussion of loop/function cloning, interprocedural
dependence analysis, ...

-Reid

Steven Correll

unread,

Oct 25, 1996, 3:00:00 AM10/25/96

to

In article <1996Oct2318...@koobera.math.uic.edu>,

D. J. Bernstein <d...@koobera.math.uic.edu> wrote:

Surely I'm missing something, and since repeated reading of your initial
posting didn't reveal it, I had hoped that by asking questions I would elicit
the missing piece, rather than a sarcastic put-down. I notice you chose to
excise and ignore the following:

>--Use the fast version only when the declarations of the actual arguments
> appear in the same compilation as the invocation?

Could you explain the simple local analysis which enables a C compiler to
decide that "parent" can call the unaliased version of "child" if this is
the entire contents of a compilation?

void parent(int *a, int *b)
{
child(a, b);

D. J. Bernstein

unread,

Oct 25, 1996, 3:00:00 AM10/25/96

to

In article <sjcDzt...@netcom.com>, Steven Correll <s...@netcom.com> wrote:
> Could you explain the simple local analysis which enables a C compiler to
> decide that "parent" can call the unaliased version of "child" if this is
> the entire contents of a compilation?

[ void parent(int *a,int *b) { child(a,b); } ]

As I said, the compiler compiles two different versions of parent(). One
version allows arbitrary aliasing. The other version does not.

The compiler does the same thing for child().

Now, as I said, it is a purely local problem for the compiler to see
that, in the non-aliased version of parent(), it can call the
non-aliased version of child().

> I notice you chose to excise and ignore the following:

False. I quoted it and responded to it. Perhaps you need new glasses.

---Dan

William Clodius

unread,

Oct 25, 1996, 3:00:00 AM10/25/96

to

D. J. Bernstein wrote:
> <snip>

>
> As I said, the compiler compiles two different versions of parent(). One
> version allows arbitrary aliasing. The other version does not.
>
> The compiler does the same thing for child().
>
> Now, as I said, it is a purely local problem for the compiler to see
> that, in the non-aliased version of parent(), it can call the
> non-aliased version of child().

> <snip>

While this moves the complexity to local analysis, it does not eliminate
or simplify the analysis. In the general case each compiled procedure
analyzed on the local level must retain information about which
arguments or global variables are modified or might interact with one
another, and record the information in sufficiently detailed form that
almost all cases where aliasing would not be a problem are identified
and all cases where aliasing would be a problem are identified. This
information depends on the procedures it has called, so that the
complexity of such tables would grow with the number of call levels in
an NP manner for the general case. As a result compile times, and
runtime compiler sizes, would typically be viewed as unacceptable for
programs of more than a few thousand lines.

For approaches to exploit special cases where the analysis could be
linear in code size see

http://suif.stanford.edu/~bwilson/papers/pldi95/paper.html

and other work on the SUIF compilers.

--

William B. Clodius Phone: (505)-665-9370
Los Alamos National Laboratory Email: wclo...@lanl.gov
Los Alamos, NM 87545

Bob Wilson

unread,

Oct 25, 1996, 3:00:00 AM10/25/96

to

William Clodius wrote:
> While this moves the complexity to local analysis, it does not eliminate
> or simplify the analysis. In the general case each compiled procedure
> analyzed on the local level must retain information about which
> arguments or global variables are modified or might interact with one
> another, and record the information in sufficiently detailed form that
> almost all cases where aliasing would not be a problem are identified
> and all cases where aliasing would be a problem are identified. This
> information depends on the procedures it has called, so that the
> complexity of such tables would grow with the number of call levels in
> an NP manner for the general case. As a result compile times, and
> runtime compiler sizes, would typically be viewed as unacceptable for
> programs of more than a few thousand lines.
>
> For approaches to exploit special cases where the analysis could be
> linear in code size see
>
> http://suif.stanford.edu/~bwilson/papers/pldi95/paper.html
>
> and other work on the SUIF compilers.

Thanks for the plug, William, but my work is really addressing
a more general problem. If you're only concerned about compiling
Fortran code translated by f2c, you really can get away with some
very simple local analysis to select between two versions of the
procedures. The only aliases at issue here are among the formal
parameters of each procedure (at least that's my understanding
based on very limited experience with Fortran). Fortran-77 has no
pointers, so the problem is greatly simplified.

P.S. I don't know about other analyses in the SUIF compiler, but
just to avoid any confusion, my analysis is not linear. It is quite
fast for certain classes of programs, but we have discovered other
programs for which it is prohibitively slow.

Kaz Kylheku

unread,

Oct 25, 1996, 3:00:00 AM10/25/96

to

In article <sjcDzt...@netcom.com>, Steven Correll <s...@netcom.com> wrote:

>In article <1996Oct2318...@koobera.math.uic.edu>,

>D. J. Bernstein <d...@koobera.math.uic.edu> wrote:

>Could you explain the simple local analysis which enables a C compiler to
>decide that "parent" can call the unaliased version of "child" if this is
>the entire contents of a compilation?
>

> void parent(int *a, int *b)
> {
> child(a, b);
> }

What would then happen is that the aliased version of parent would call
the aliased version of child, and the non-aliased parent would call
the non-aliased version of child. The latter would safely assume that a
and b are not aliased, and could safely call the version of child which
assumes the same thing.

In the more complex example

void parent(int *a, int *b)
{

int c, d;

child(a, b);

child(&c, &d);
}

The aliased version of parent would make its first call to child() to the
aliased version of child, but the second call would always be to the unaliased
version. There is no necessity that a function compiled for aliased
arguments should call only other functions designated for aliased arguments.

This sound like a good approach; unfortunately I don't know of any compilers
that do this---at least not across translation unit boundaries. The ones I work
with generate exactly one symbol for each external function. If this mechanism
were at work, it would leave evidence by having two symbols for each function.

It's a neat idea though and could work well in practice. The increase in
code density would probably make it not worthwhile.

I would prefer a special keyword which would designate that a function's
arguments are to be treated as non-aliased. Then, if need be, I could
explicitly write two versions of the function. The standard C library
already does this with memcpy() and memmove() which are really the same
function except that the latter assumes aliasing.

Perhaps an extension of C could use the ``volatile'' keyword to denote a
function whose arguments are to be treated as non aliased. When compiling the
caller, the implementation could emit a diagnostic if it determines that a call
is being made with potentially aliased parameters to a function qualified with
this keyword. The semantics of the keyword would be a little too overloaded
for my tastes (as is already the case with ``static'').

Or perhaps a volatile designation could be applied to the individual parameter.
An parameter would be considered non-aliased to any other parameter if it has
no volatile qualifier. A volatile qualifier on an identifier would mean that it
is aliased with all other parameters that are also likewise qualified (subject
to C's rules regarding the circumstances under which aliasing may occur).
This would break backward compatibility with the existing standard though
and some code which currently passes aliased arguments might not work correctly without the addition of volatile keywords.

William Clodius

unread,

Oct 25, 1996, 3:00:00 AM10/25/96

to

Bob Wilson wrote:
>
> <snip>

>
> Thanks for the plug, William, but my work is really addressing
> a more general problem. If you're only concerned about compiling
> Fortran code translated by f2c, you really can get away with some
> very simple local analysis to select between two versions of the
> procedures. The only aliases at issue here are among the formal
> parameters of each procedure (at least that's my understanding
> based on very limited experience with Fortran). Fortran-77 has no
> pointers, so the problem is greatly simplified.

I entered this late, with all the traffic on it I rarely visit
comp.lang.c. I thought that the discussion was about the optimization of
C code in general and not just f2c's output. Was I mistaken? Was the
discussion confined to scientific code? Such C code may be similar
enough to f2c's output in its minimal use of dangerous aliasing that the
same analysis applies.

There is more to be concerned about than the formal arguments. Things
get hairier if the code plays tricks with COMMON BLOCKS, Fortran's
equivalent of global variables, particularly if EQUIVALENCE is used.
Then aliasing among COMMON BLOCK entities and arguments would have to be
considered. I would expect that globals, however, are less commonly
used in Fortran than in C simply because they are more difficult to use
in Fortran.

There are a few other less common tricks that can cause analysis
problems in Fortran. How well does your analysis deal with accessing of
different portions of an array via different arguments in the same call?
Another potential f2c problem is the ENTRY statement, which can give
calls of different "procedures" access to the same local variables.

>
> P.S. I don't know about other analyses in the SUIF compiler, but
> just to avoid any confusion, my analysis is not linear. It is quite
> fast for certain classes of programs, but we have discovered other
> programs for which it is prohibitively slow.

I remember you mentioning gcc as a (probable?) example in
comp.compilers.

D. J. Bernstein

unread,

Oct 25, 1996, 3:00:00 AM10/25/96

to

Kaz Kylheku <k...@vision.crest.nt.com> wrote:
> I would prefer a special keyword which would designate that a function's
> arguments are to be treated as non-aliased.

In general, any semantics that the compiler works with should be
expressible in the language itself, so that all optimizations can be
performed at the source level.

Knuth suggested this twenty years ago. Unfortunately, most language
designers and compiler writers seem to think they are in competition,
rather than cooperation, with human programmers.

---Dan

Zalman Stern

unread,

Oct 26, 1996, 3:00:00 AM10/26/96

to

Steven Correll (s...@netcom.com) wrote:
[Yes, I'm nuking a bunch of text here that is probably relevant.]
: Could you explain the simple local analysis which enables a C compiler to
: decide that "parent" can call the unaliased version of "child" if this is
: the entire contents of a compilation?

: void parent(int *a, int *b)
: {
: child(a, b);

: }

The compiler generates two functions that correspond to the following:

void parent@possible_alias@(int *a, int *b) {
child@posible_alias@(a, b);
}

void parent@no_alias@(int *a, int *b) {
child@no_alias@(a, b);
}

At the higher level call, the choice of which version of parent to call
will be made and if there are no calls to the other versions of parent and
child, they will be tossed.

-Z-

D. J. Bernstein

unread,

Oct 26, 1996, 3:00:00 AM10/26/96

to

William Clodius <wclo...@lanl.gov> wrote:
> Was the discussion confined to scientific code?

Yes.

Jim Giles has been claiming for at least seven years that C compilers
can never do as well as Fortran for scientific code, since the compiler
has to allow aliasing.

His response to compile-several-alternate-versions was to complain about
the number of potential aliasing patterns. (Sound familiar?)

My response, on 11 April 1990: ``As you delight in pointing out, most
parameters ever passed in real programs aren't aliased to each other.
Hence it's enough to compile one version where no parameters are aliased
and one version where all parameters are aliased if at all possible. (If
it's likely that some parameters may be aliased, the compiler can
introduce an intermediate level where just those parameters are aliased,
provided that the programmer or a global analysis points them out.)''

Richard Caley published the same idea independently a year later. I'm
sure it's been discovered many times. Anyway, it's nice to see the basic
concept finally working its way into some usable compilers.

---Dan

Preston Briggs

unread,

Oct 27, 1996, 2:00:00 AM10/27/96

to

>Hence it's enough to compile one version where no parameters are aliased
>and one version where all parameters are aliased if at all possible. (If
>it's likely that some parameters may be aliased, the compiler can
>introduce an intermediate level where just those parameters are aliased,
>provided that the programmer or a global analysis points them out.)''

>Richard Caley published the same idea independently a year later. I'm
>sure it's been discovered many times. Anyway, it's nice to see the basic
>concept finally working its way into some usable compilers.

Andrie Ershov suggested the same idea in 1966.
Other uses of cloning, implemented by Convex, for instance,
include improving interprocedural constant propagation.
It goes like this...

If you see several calls to a procedure, with a few of 'em passing one
costant argument, then clone a copy of the procedure especially for
them and take advantage of the constant. Same trick works with basic
blocks (if you have a block with multiple predecessors, you can clone
to improve the quality of information passed forward).

Nice to be able to determine when it's worth doing.

Preston Briggs

Craig Burley

unread,

Oct 28, 1996, 3:00:00 AM10/28/96

to

In article <327143...@lanl.gov> William Clodius <wclo...@lanl.gov> writes:

There is more to be concerned about than the formal arguments. Things
get hairier if the code plays tricks with COMMON BLOCKS, Fortran's
equivalent of global variables, particularly if EQUIVALENCE is used.
Then aliasing among COMMON BLOCK entities and arguments would have to be
considered. I would expect that globals, however, are less commonly
used in Fortran than in C simply because they are more difficult to use
in Fortran.

In practical terms, no, COMMON is still crucial to optimize well -- though
in terms of lines of code, perhaps it is less "commonly" used. ;-)

There are a few other less common tricks that can cause analysis
problems in Fortran. How well does your analysis deal with accessing of
different portions of an array via different arguments in the same call?
Another potential f2c problem is the ENTRY statement, which can give
calls of different "procedures" access to the same local variables.

I think ENTRY is less of a problem. f2c (and, for the time being at least,
due to using the C-like gcc back end IR, g77) implements a procedure with
ENTRY as a single C "static" (really, private) procedure with the union of
all argument lists of its entry points plus an extra argument that specifies
which actual entry point is being called (plus, maybe, other args that are
not relevant to this). The actual externally visible entry points are
compiled as simple-minded wrappers around the private procedure, each specifying
its own identification number to indicate which actual entry point was
called.

So, not only can link-time intelligence fully handle this case, so can
the compiler -- all the code is in a single "source file" visible to
the C compiler, it can always "inline" the private procedure, making one
copy each for the public entry points, it can then delete the dead code
resulting from the constant propagation that culminates in an effective
"switch (5) { case 1: goto entry1; case 2: goto entry2; ... }", and so on.

For people who are still a bit confused about all this -- the simplest way to
state the effect of the Fortran language's restriction regarding aliasing
that I can think of is this:

Looking at the specification statements (declarations) in a single program
unit (procedure), if you cannot _prove_ that two "writable entities"
(variables and arrays) are aliases, you can _assume_ they are not, and
therefore reorder the actual writes to them as you please, as long as
internal ordering dependencies are obeyed (of course).

The linguistic implications of this are that not only can Fortran _compilers_
reorder the writes, so can Fortran _programmers_.

As a simple illustration, consider the following utility C routine:

extern int c; /* "Common" data area. */

int dosomething(int *a, int *b)
{
*b = *a + c;
c = -1;
*a = 0;

return *b;
}

Does a C programmer _know_ what the above function is supposed to do? No.
Not a _single_ C programmer in the audience can describe what the above
function does in general terms, because he cannot know how it is called
in every instance. Instead, he must resort to using strict, childlike
imperative language that merely mimicks the above code (almost like the
classic worthless commentary "a = a + 1; /* Add 1 to a */" ;-), because
the ordering of operations is crucial to correct operation of the procedure.

For example, is it safe to assume one of the above function's important
duties is to return the sum of the first argument (passed by "reference")
and the global `c'? No. If the caller aliases the arguments certain
ways, the function will not do that. If you assume the function is correct
as written in the context of the containing program, then you must not
assume that returning the sum of the _inputs_ is important. In some
cases, the function might be expected to return -1 or 0 _regardless_ of
the sum of the inputs, if the second argument passed in is the same as
the first or the same as the address of `c'.

This means you cannot safely reorder the above to something that might be
faster on some machines:

extern int c;

int dosomething(int *a, int *b)
{
int tmp_a = *a;
int tmp_c = c;
int tmp_sum;

*a = 0;
tmp_sum = tmp_a + tmp_c;
c = -1;

return (*b = tmp_sum);
}

That's because the latter version of the routine does _not_ do the same
thing as the former. So, it is not a proper translation, although it
might be proper in the context of a program that never calls the function
with aliasing of its writable inputs.

Now, the Fortran "version" of the original is more restricted (and thus
_not_ a clone, or "translation", in the full sense of the word). If you
see _this_ rendition of the above in Fortran, life is quite a bit simpler:

INTEGER FUNCTION DOSOMETHING(A, B)
INTEGER A, B, C
COMMON C

B = A + C
C = -1
A = 0

DOSOMETHING = B
END

(I said "simpler", not "prettier". ;-)

Not only can the _compiler_ reorder the above, so can the programmer:

INTEGER FUNCTION DOSOMETHING(A, B)
INTEGER A, B, C, TMP_A, TMP_C, TMP_SUM
COMMON C

TMPA = A
TMPC = C

A = 0
TMPSUM = TMPA + TMPC
C = -1

B = TMPSUM
DOSOMETHING = TMPSUM
END

In both cases, the ordering of the writes to the outputs was reversed, but
only in the Fortran case did that reversal produce a procedure that, as far
as callers were concerned, _identical_ in behavior to the original.

As I said in a previous post, linguistic issues are much more than
whether a particular implementation can be reached from a particular
expression in a particular language. As such, C is a lower-level
language than Fortran in many ways. Similarly, Fortran is a lower-level
language than C in some ways (though, I think, fewer, especially fewer
ways that are important given modern computers and compilers).

IMO, C more than makes up for its problems when it comes to some kinds
of programming because it has wonderful notational convenience (e.g.
&&, ||, pointers, struct/union, and so on). And, normally, "some kinds
of programming" includes all the kinds of programming I do, which is
why I haven't had to do any "serious" programming in Fortran since, oh,
around 1981 or so (1979 or so if you discount maintenance of legacy
code -- after that, the legacy code I maintained in Fortran was usually
my own, which wasn't too bad, since I'd written it at a level of something
akin to PL/1 ;-).
--

"Practice random senselessness and act kind of beautiful."
James Craig Burley, Software Craftsperson bur...@gnu.ai.mit.edu

Craig Burley

unread,

Oct 28, 1996, 3:00:00 AM10/28/96

to

In article <54r9rn$f...@bcrkh13.bnr.ca> k...@vision.crest.nt.com (Kaz Kylheku) writes:

It's a neat idea though and could work well in practice. The increase in
code density would probably make it not worthwhile.

I _really_ doubt this -- at least, from what little I've seen, Fortran
programs (that are in the set of interesting ones for this discussion,
meaning, programs needing high performance) are normally _not_ deep/bushy
in terms of call stacks, library interactions, and so on.

One interesting data point for this impression of mine is that my former
permanent (yeah, right ;-) employer, Numerix, built a machine that was
"killer" for its price/point at the time for Fortran applications.

And, no machine Numerix built could cope with a call stack of more than 15
levels!!

If I ever get around to building my envisioned code-generation system, it
will certainly include the kind of optimization described in a general
sense (it'd probably have a Fortran front end anyway, so it wouldn't be
needed for anti-aliasing there per se, but if it had a C front end, it
would have this optimization).

That's why the optimization described has been something I've wanted to see
at least while I was working at Numerix way back when, perhaps before,
though I can't really remember thinking much about it before then. (I've
wanted link-time optimization for much longer than that, but didn't care
or know about, or at least understand the performance implications of, the
aliasing problem until I got there.) We kept dreaming about doing a C
compiler for the machine someday, so I often considered what "special" stuff
would make typical C code go as fast as possible on Numerix' VLIW machines --
link-time optimization would be a crucial component.

Oh, another data point from my experience at Numerix: Fortran's anti-aliasing
provision is so crucial that _both_ their compilers (one developed outside,
another newer one inside) assumed no aliasing, _despite_ the fact that everyone
thought such an assumption was non-standard! That is, they _documented_ their
compiler as "Fortran, except...you can't alias arguments/COMMON that you
write...". I was the first one to notice that this restriction _was_ blessed
by the standard, which made a bunch of people (including the manager of the
compiler development group) _very_ happy, since they could take this item
off one list ("unsupported Fortran features") and put it on another ("things
you might be surprised by", or something like that ;-).

In other words, Numerix "re-invented the wheel" of anti-aliasing assumptions
simply because doing so was such a huge win for high-performance code on
their machines, not realizing that ANSI FORTRAN 77 had already "invented"
this wheel for them.

(Did some important code violate the restriction anyway? Yup. That's life.)

Craig Burley

unread,

Oct 28, 1996, 3:00:00 AM10/28/96

to

In article <1996Oct2320...@koobera.math.uic.edu> d...@koobera.math.uic.edu (D. J. Bernstein) writes:

Preston Briggs <pre...@cs.rice.edu> wrote:
> 2nd, and more to the point, Fortran 77 allows me to write
> call zap(x, x, y, z, result)
> if I like.

Really? Is this something that changed since Fortran 4?

My comments are about a language where aliased procedure arguments are

prohibited outright. Similar comments do apply to languages where the
restrictions are more complicated---e.g., read-only arguments may be
aliased---though you need a smart linker to handle the general case.

Ah, now I see where you've misunderstood the problem. Sorry I didn't
see this earlier.

Fortran _does_ allow some argument aliasing -- just not aliasing
of arguments/dummies that are _defined_ in a particular invocation
of a procedure.

I don't recall Fortran IV disallowing read-only-argument aliasing either,
but I certainly didn't get into language-lawyering until well after
F77 - probably not until around 1986 or so -- and I've paid little
attention to going back and "really" learning the older Fortrans I was
so "familiar" with as a youth.

My assumption was that F66 didn't disallow aliasing at all, but that
could well be mistaken.

Craig Burley

unread,

Oct 28, 1996, 3:00:00 AM10/28/96

to

In article <1996Oct2318...@koobera.math.uic.edu> d...@koobera.math.uic.edu (D. J. Bernstein) writes:

Craig Burley <bur...@gnu.ai.mit.edu> wrote:
> Now I realize you don't fully appreciate the problem.

If you go back to my old comp.lang.misc postings you'll see that I'm
very much in favor of adding semantics to languages---for example, I've
said many times that

``divide x by y, rounding down in all cases''
``divide x by y, rounding any way you want if y is negative''
``divide x by y---oh, by the way, y is positive''

are different operations that should have different names.

We're in agreement on that sort of thing. That wasn't what I was
talking about, though.

However, the C language is already expressive enough to let the compiler
recognize Fortran-style array manipulations.

Hmm, I didn't realize that's what we were talking about. I thought
you were saying that link-time optimization would allow every possible
valid implementation (generation of machine code) of a Fortran program
by a Fortran compilation system to be output by a C compilation system
given the same Fortran program translated by an f2c-like utility.

Actually, this might be true. Very very hairy, but true.

For example, how does your f2c-like utility translate this?

SUBROUTINE X(A)
COMMON B
CALL FOO(A)
END

How does the C translation of the above assert that A and B cannot
be aliased _if and only if_ that _particular_ invocation of X (and
therefore the subsequent invocation of FOO) does not modify either
A or B -- and communicate this to FOO, which of course is in a
separate source file, even when FOO looks like this?

SUBROUTINE FOO(R)
CALL BAR(R)
END

That is, FOO doesn't even mention B?

Walter Spector

unread,

Oct 28, 1996, 3:00:00 AM10/28/96

to

Craig Burley wrote:
>
> In article <1996Oct2320...@koobera.math.uic.edu> d...@koobera.math.uic.edu (D. J. Bernstein) writes:
>
> Preston Briggs <pre...@cs.rice.edu> wrote:
> > 2nd, and more to the point, Fortran 77 allows me to write
> > call zap(x, x, y, z, result)
> > if I like.
>
> Really? Is this something that changed since Fortran 4?

> I don't recall Fortran IV disallowing read-only-argument aliasing either,

As a Point of Historical Accuracy, the Fortran-66 Standard
discusses this very issue in paragraph 10.2.2 "Associations That
Effect Definition".

Basically Fortran-66 == 77 == 90 == 95 on this point. If two
variables or array elements are associated (via COMMON, EQUIVALENCE,
and/or argument substitution) then redefining one name causes the
other associated name to be considered 'undefined' accordingly.

4HWalt
2H--
4HWalt, 7HSpector
10H(wws@cray., 4Hcom)
10HMountain V, 10Hiew, Calif, 10Hornia
_._ _._ _.... _. ._.

Craig Burley

unread,

Oct 28, 1996, 3:00:00 AM10/28/96

to

In article <54mem2$u...@watnews2.watson.ibm.com> ma...@cax.watson.ibm.com (Mayan Moudgill) writes:

In article <54lici$l...@listserv.rice.edu>,
Preston Briggs <pre...@cs.rice.edu> wrote:
>> Nonsense. A tiny amount of local analysis suffices to prove the
>> non-aliasing in every possible program output by f2c.
>
>Nope. Consider the following statement
>

> call saxpy(A(1, i), A(1, j), mumble)
>

[...]

There are several disambiguation/anti-aliasing tricks that can be used
inside saxpy; obvious one is loop cloning

[...]

Right, but keep in mind that we all know about these techniques. The
question is whether the statement by Dan Bernstein, which Preston
quoted in the first two lines of the quote from his post, is true.

The optimisations you talk about contribute to code bloat. When the
tests and resulting dead code can be eliminated at link time, which is
often the case, that problem goes away.

Voila! saxpy is as fast as possible, no matter what the input paramters
look like.

"As fast as possible", to Fortran programmers, means "doesn't even waste
time testing for aliasing at entry to a routine". ;-)

Hence, djb's (and mine and other peoples') advocacy of link-time
optimisation.

The question is whether "a tiny amount of local analysis suffices to
prove the non-aliasing in every possible program output by f2c". The
answer is: no. djb not only appears to have been unaware that Fortran
doesn't prohibit aliasing of read-only arguments, he apparently is
unaware (though presumably by now he knows it) that the anti-aliasing
prohibition applies on an element-by-element basis.

The fact that the prohibition does not strictly follow the "envelope" of
the signature of a procedure makes djb's assertion untrue, but that does
not change the fact that the kinds of analysis he's talking about would
certainly be highly useful, and it's the "laziness of compiler writers"
(I think that's a reasonable paraphrase of one of his statement) that
keeps them from being done. (Actually, to me it's the laziness of those
of us who design and build the entire code-development toolchain -- linkers
are the culprit here, as, too often, compiler writers know they are
limited by what linkers and other tools can do.)

(By "envelope of the signature of a procedure", I mean that Fortran's
anti-aliasing prohibition is written using terms that are valid only
when the program is actually executing. Static analysis can never
fully determine that every possible C program output by an f2c-like
conversion of a valid Fortran program has no aliasing, because static
analysis cannot possibly confirm that all possible inputs to the
program from, e.g., files will conform to the effective requirements
the program places on them to _be_ a valid Fortran program. Well, I'm
not a theoretician, so maybe the statement is false, but certainly
"a tiny amount of local analysis" is _not_ sufficient.)

Arch Robison

unread,

Oct 28, 1996, 3:00:00 AM10/28/96

to

In article <1996Oct2108...@koobera.math.uic.edu> d...@koobera.math.uic.edu (D. J. Bernstein) writes:
>The upshot is that, given f2c output, this compiler will produce code
>optimized just as heavily as if it were Fortran code.
...
>``C's extra permissiveness makes it impossible to optimize as heavily''
>is just another way to say ``we're too lazy to write the optimizer.''

As a professional compiler writer, I disagree. At KAI, we've looked as hard
as anyone at optimizing C++. We sell a C++ compiler with a unique optimizer
that was built from scratch to optimize C++. For a discussion of some
of the issues, see:

http://www.kai.com/publications/comp_phys/index.html.

There are many problems with the ``two-version'' code scheme that
was suggested as an "easy" solution to C's aliasing rules.

1. It's not that easy. In real codes, the compiler cannot
easily determine when actual parameters are not aliased.
Remember, in FORTRAN, I can pass two subsections of an
array as a(i) and a(j) as two different parameters, and the callee
*knows* they are not aliased. There is no such guarantee in C.
In which case the compile-time aliasing decision must be deferred to
run-time, and the run-time check may eat up the savings.
Thus the claim about optimizing f2c output is false.

2. It will at least double the size of executables.
And worse if n-versioning is done for pairwise aliasing.

But there is a more serious impediment to optimizing C/C++, which is economics.
Most arm-chair critics of optimizers (and I suspect some academics in the field)
believe that optimizers are limited by theoretical advances or laziness.
Such critics think that somehow superduper optimization technology can make
any language as fast as FORTRAN. This might be true in theory, but that
optimizer is going to be much more complicated, run slower, and cost a lot
more than the FORTRAN optimizer. The cost is where FORTRAN wins, and no
amount of theory and hacking can change that in a free market.

So the bottom line is: C++ will never be as cheap to optimize as FORTRAN.

Having said the negative, let me say something positive. Though C++ is
far more expensive to optimize than FORTRAN, the market is willing to pay
for some optimization. And there are advanced techniques the are economical
to implement. We look at C++ codes and decide what optimizations will earn
their keep. For some kinds of codes, we can get FORTRAN-like speeds.
Furthermore, if C++ ever adopts the restrict keyword (proposed for C9x),
C++ optimizers for FORTRAN-like codes could be as cheap as FORTRAN optimizers.

Finally, if you really want better C++ optimization, don't just kibitz.
Buy our compiler and tell us that you bought it for it's optimizer.
There's nothing like sales of mousetraps to motivate better ones.
And if optimization is not an important enough issue to spend $$,
wonder no more about why optimizers do not improve.

[Does this win me the monthly ``brazen self-promotion'' award?]

Arch D. Robison Kuck & Associates Inc.
rob...@kai.com 1906 Fox Drive
217-356-2288 Champaign IL 61820
Lead Developer for KAI C++ http://www.kai.com/C_plus_plus/index.html

P.S. My chair at work really has no arms.

Walter Spector

unread,

Oct 28, 1996, 3:00:00 AM10/28/96

to

I wrote:
> As a Point of Historical Accuracy, the Fortran-66 Standard
> discusses this very issue in paragraph 10.2.2 "Associations That
> Effect Definition".
>
> Basically Fortran-66 == 77 == 90 == 95 on this point. If two
> variables or array elements are associated (via COMMON, EQUIVALENCE,
> and/or argument substitution) then redefining one name causes the
> other associated name to be considered 'undefined' accordingly.

Before I get flamed, the above rule applies to COMMON and
EQUIVALENCE when types are different between the names
With argument substitution it is always in effect - regardless
of type of the two arguments.

Walt
--
Walt Spector
(w...@cray.com)
Mountain View, California

Mark Stoodley

unread,

Oct 28, 1996, 3:00:00 AM10/28/96

to

In article <326D96...@ici.net>,
Alicia Carla Longstreet <ca...@ici.net> wrote:
>Kaz Kylheku wrote:
>>
>> In article <1996Oct2108...@koobera.math.uic.edu>,

>> D. J. Bernstein <d...@koobera.math.uic.edu> wrote:
>>

>> >``C's extra permissiveness makes it impossible to optimize as heavily''
>> >is just another way to say ``we're too lazy to write the optimizer.''
>>

>> Or perhaps to even read the Holy Standard to discover the true extent of that
>> permisiveness.
>
>One of the reasons a good assembly language programmer CAN produce
>highly optimized code is its total permissiveness, if the machine can do
>it, you can write a thousand assembly routines to get the job done.
>
>'C''s permissiveness gives 'C' programmers that VERY SAME ability to
>produce highly optimized, efficient code. To do this we need to know:
>our architecture.
>our OS.
>our environment.
>our compiler.
>our language.
>
>It is my scincere belief that a GOOD 'C' programmer can produce code
>nerly as good as a top notch assembler programmer in 1/4th the time.
>Time and time again some assembly language programmer has attempted to
>refute this, no one has succeeded YET!!!.
>
>You can produce poor code, even with assembler. Good code is still in
>the hands of the programmer. Don't depend on the compiler, do it
>yourself.
>
>Alicia Carla Longstreet
>A thing worth doing is worth doing well.

Ok, this is slightly :-) off-topic for this thread, but I felt I had to
ponder/rant a little... It's also a little long, so you may want to skim
it first. This is cross-posted to comp.lang.c because this whole thread
seems to be, but I admit I don't read that group...please address specific
follow-ups to this article to comp.arch or email to me directly (otherwise
they'll be unintentially ignored :-).

I wonder how many "hack" compiler optimizations are included in compilers
because programmers take advantage of C's "permissiveness". For example,
many programmers use pointers in situations where array notation could be
used because they believe that compilers cannot efficiently compile the
array notation. This simply makes it more difficult for the compiler to
perform other optimizations, resulting in poor output code and confirming
programmers' beliefs that compilers don't do their jobs (chicken and egg
problem here). This is not to say that pointers are not useful in many
situations (they most definitely are), but I think that going behind the
compiler's back is not a long-term solution. I'm sure a commercial compiler
writer could come up with an impressively long list of hacks that programmers
use that foil a present-day compiler's optimizer.

Compiler technology has gotten a lot better in the decades since we were
first visited by C (although I won't claim that all commercial compilers
have taken full advantage of this). Unfortunately, because of the way
programmers write C programs the compiler's job is actually more difficult.
It is now necessary to reverse engineer the programmer's intent to be able
to take advantage of the advances in compiler technology.

Almost every time a programmer applies an optimization by hand to source
code, information is lost that a future (or even current) compiler may have
been able to use to generate even better code (I won't even start talking
about the maintainability/testability of such hand optimizations). For
example, scheduling is something that is realistically best left to the
compiler (given today's processors). Source code optimizations more often
than not have adverse effects upon the schedules generated by compilers
because lost information can result in fake dependencies. To get rid of
these fake dependencies, additional analyses must be performed to effectively
"undo" the optimizations that the programmer applied by hand. This is
wasted compile time for programmers, wasted development time for compiler
writers, and wasted time for researchers who have to figure out how to
"undo" before they can demonstrate the benefits of newer and better
optimizations. Although some researchers benefit because the "undo"ing is
sometimes worthy of research effort all by itself :-)

My point? I just think programmers should write code as naturally
as possible without taking advantage of C's permissiveness unless it's
absolutely necessary (i.e. for critical performance reasons). You don't
have to "depend on the compiler" but you should at least give it a fair
chance. "Do it yourself" only if it is absolutely necessary and contain
what you do as much as possible: it's in yours (and your successors in
maintenance) as well as in compiler writers' best interests. A better
design may be more appropriate than source code optimizations.

Architectures change, operating systems change policies, programs are
ported to different environments and languages, and compilers get better
(hopefully :-). You can't perfectly plan for the future, but planning
for the present is almost a guarantee of future troubles.

Mark
PhD Candidate, University of Toronto
sto...@eecg.toronto.edu

D. J. Bernstein

unread,

Oct 29, 1996, 3:00:00 AM10/29/96

to

In article <553fd7$o...@kai.com>, Arch Robison <rob...@kai.com> wrote:
> 1. It's not that easy.

Sure it is. Almost all of the non-aliasing in scientific C code can be
proven this way.

> 2. It will at least double the size of executables.

It will have no effect on the size of executables.

> But there is a more serious impediment to optimizing C/C++, which is
> economics.

Right. I'm not saying that this particular optimization is a good use of
a compiler writer's time. I'm simply pointing out the error in slipping
from ``an optimizer can't _always_ do this right'' to ``an optimizer can
_never_ do this right.''

---Dan

Terje Mathisen

unread,

Oct 29, 1996, 3:00:00 AM10/29/96

to D. J. Bernstein

(mailed & posted)

D. J. Bernstein wrote:

>
> Mark Stoodley <sto...@eecg.toronto.edu> wrote:
> > Almost every time a programmer applies an optimization by hand to source
> > code, information is lost that a future (or even current) compiler may have
> > been able to use to generate even better code
>

> Use a language that lets you say ``pick any of the following chunks of
> code.'' End of problem.

>
> > For example, scheduling is something that is realistically best left to the
> > compiler (given today's processors).
>

> Sure, if you don't care about speed.
>
> Here's a problem for anyone who thinks that today's compilers understand
> the Pentium. Do a distribution count on each byte position in an array
> of 1000000 4-byte words: i.e.,
>
> for (i = 0;i < 1000000;++i) {
> ai = a[i];
> ++c0[ai & 255];
> ++c1[(ai >> 8) & 255];
> ++c2[(ai >> 16) & 255];
> ++c3[(ai >> 24) & 255];
> }
>
> Use your favorite language and your favorite compiler. Some times,
> expressed in millions of cycles:
>
> 29.4 gcc 1.42, which doesn't know anything about the Pentium, -O2
> 32.7 gcc 2.6.3, which apparently knows even less, -O3
> 10.7 me, in asm

That has to be _very_ close to optimal code:

With 10.7 cycles/dword you have maximum 21 instruction slots to work
with, assuming nearly zero loop overhead/cache misses:

Unless you changed the semantics of the code quite a bit, you must have
loaded the 4-byte words into a register and then split it into separate
parts, something that takes at least 6 instructions/pipeline slots.

The four array updates, which will use half the L1 cache (4K), uses 3
instructions each (load/increment/store).

Since I suspect that read cache misses on the 4M array will cost you
quite a bit (in my experience, something like 10 cycles to load the
first byte in a cache line, assuming a hit in L2), I think you must load
the byte values directly, instead of loading dwords and then split them,
i.e. something like this:

byte *ba = (byte *) &a;
for (i = 0;i < 4000000; i +=4 ) {
a0 = ba[i];
a1 = ba[i+1];
a2 = ba[i+2];
a3 = ba[i+3];
b0 = c0[a0];
b1 = c1[a1];
b2 = c2[a2];
b3 = c3[a3];
b0++;
b1++;
b2++;
b3++;
c0[a0] = b0;
c1[a1] = b1;
c2[a2] = b2;
c3[a3] = b3;
}

Unfortunately, this version requires 8 temporary registers inside the
loop, plus the loop index, so it is very hard to make it fit on a
Pentium.

It will also be impossible to load adjacent byte values in the same
cycle, since that will result in a L1 cache bank collision.

I would probably unroll this code even furter, into 4 iterations each of
which process 3 bytes, with cache bank staggering:

for (i = 0;i < 4000000-11; i +=12 ) {
a0 = ba[i]; /* First byte in first dword */
a1 = ba[i+5]; /* Second " " second " */
a2 = ba[i+10]; /* Third " " third " */
++c0[a0]; /* Assume these three lines to be split */
++c1[a1]; /* just like in the preceeding example! */
++c2[a2];

a0 = ba[i+3]; /* [0][3] */
a1 = ba[i+4]; /* [1][0] */
a2 = ba[i+9]; /* [2][1] */
++c3[a0]; ++c0[a1]; ++c1[a2];

a0 = ba[i+2]; /* [0][2] */
a1 = ba[i+7]; /* [1][3] */
a2 = ba[i+8]; /* [2][0] */
++c2[a0]; ++c3[a1]; ++c0[a2];

a0 = ba[i+1]; /* [0][1] */
a1 = ba[i+6]; /* [1][2] */
a2 = ba[i+11]; /* [2][3] */
++c1[a0]; ++c2[a1]; ++c3[a2];
}

This version is the only way I can see that can get rid of all AGI
stalls, while avoiding cache bank collisions and still stay within the 7
maximum available registers. There is probably no way for a compiler to
generate the corresponding asm code.

On a RISC cpu, or any machine with lots of registers, it would be easy
to compile the first version into nearly-optimal code.

Most of the times that I've beaten Pentium compilers by a lot, I have
done so by being able to use more registers, and (sometimes) use them in
non-obvious ways.

Unfortunately, the Pentium-optimized asm code I would write for this
function would run very poorly on a PentiumPro, due to the Partial
Register Stalls.

To fix this, I would need to select a PPro-optimized version (using
MOVZX) at runtime, after checking the cpuid.

Terje

--
- <Terje.M...@hda.hydro.com>
Using self-discipline, see http://www.eiffel.com/discipline
"almost all programming can be viewed as an exercise in caching"

Richard A. O'Keefe

unread,

Oct 29, 1996, 3:00:00 AM10/29/96

to

d...@koobera.math.uic.edu (D. J. Bernstein) writes:

>My comments are about a language where aliased procedure arguments are
>prohibited outright.

That language isn't Fortran. Aliasing in which any of the parties is
_modified_ is banned, but arguments which are not modified may be
aliased to things that are not modified.

>Similar comments do apply to languages where the
>restrictions are more complicated---e.g., read-only arguments may be
>aliased

_That_ is Fortran, at least since the Fortran 77 standard, and I don't
think that's one of the things they changed. The intent has always been
that a Fortran compiler should be free to select between pass-by-reference
and copy-in-copy-out, for each argument individually.

>---though you need a smart linker to handle the general case.

Smart linkers are a proven technology. Indeed, the ETH thesis about
"Semantic Dictionary Encoding" pointed out that a smart linker could
afford to spend more time optimising if it was given a more compact
object representation because then it spent less reading.

When the 'restrict' keyword is adopted in C9X, which it probably will
be, many of the arguments about aliasing in C will go away.

However, this still won't *quite* solve the problem, though it won't
make a lot of practical difference. This is also illegal Fortran:

SUBROUTINE SMASHER
INTEGER TWEEDL
COMMON /SMASHED/ TWEEDL
TWEEDL = 0
END

SUBROUTINE VICTIM(K)
INTEGER K
CALL SMASHER
K = K+1
END

PROGRAM MAIN
INTEGER TWEEDL
COMMON /SMASHED/ TWEEDL
TWEEDL = 1
CALL VICTIM(TWEEDL)
PRINT *, TWEEDL
END

I compile this with one compiler, run it, and get the answer 1.
I compile this with another compiler, using the -arg=local flag, which
you generally want for speed, run it, and get the answer 2.
Is either compiler broken? No, both are fully supported by the standard.
It is the _program_ which violates the standard.

So it is not sufficient merely to look at the arguments of the current
procedure to determine whether impermissible aliasing is present.

Stanley Chow

unread,

Oct 29, 1996, 3:00:00 AM10/29/96

to

In article <553l7v$bv0$1...@goanna.cs.rmit.edu.au>,
Richard A. O'Keefe <o...@goanna.cs.rmit.edu.au> wrote:
[Deleting illegal FORTRAN program]

>I compile this with one compiler, run it, and get the answer 1.
>I compile this with another compiler, using the -arg=local flag, which
>you generally want for speed, run it, and get the answer 2.
>Is either compiler broken? No, both are fully supported by the standard.
>It is the _program_ which violates the standard.

So, like any other compiler, f2c/c can choose to compile it anyway and
be correct.

>So it is not sufficient merely to look at the arguments of the current
>procedure to determine whether impermissible aliasing is present.

Depends on your goal. Do you want to check for "impermissible aliasing"
or do you want to simply compile correct programs?

--
Stanley Chow; sc...@bnr.ca, stanley....@nt.com; (613) 763-2831
Bell Northern Research Ltd., PO Box 3511 Station C, Ottawa, Ontario
Me? Represent other people? Don't make them laugh so hard.

Lawrence Kirby

unread,

Oct 29, 1996, 3:00:00 AM10/29/96

to

In article <54m53f$4...@listserv.rice.edu>
pre...@cs.rice.edu "Preston Briggs" writes:

>>> 1st, you're treating "assert" like an
>>> intrinsic part of the C language. It isn't.
>>
>>Actually, it is.
>
>No, it's part of the standard library. Which means ordinary people
>can use assert() for anything they want, as long as assert.h isn't
>included.

You're right but for the wrong reason. The standard C library is an
intrinsic part of the C language so anything in the library is part of
the language too. Most of the library is defined in terms of functions
e.g. fopen() time() and even putchar() are defined as functions. They
are *always* reserved identifiers when defined with external linkage
i.e. it is not dependent on what header files are included. Assert is
unusual since it is defined as a function-like macro. As a macro it
is only reserved when assert.h is included.

--
-----------------------------------------
Lawrence Kirby | fr...@genesis.demon.co.uk
Wilts, England | 7073...@compuserve.com
-----------------------------------------

Ted

unread,

Oct 29, 1996, 3:00:00 AM10/29/96

to

In article <846550...@genesis.demon.co.uk>, Lawrence Kirby
<fr...@genesis.demon.co.uk> writes

>In article <54m53f$4...@listserv.rice.edu>
> pre...@cs.rice.edu "Preston Briggs" writes:
>
>>>> 1st, you're treating "assert" like an
>>>> intrinsic part of the C language. It isn't.
>>>
>>>Actually, it is.
>>
>>No, it's part of the standard library. Which means ordinary people
>>can use assert() for anything they want, as long as assert.h isn't
>>included.
>
>You're right but for the wrong reason. The standard C library is an
>intrinsic part of the C language so anything in the library is part of
>the language too. Most of the library is defined in terms of functions
>e.g. fopen() time() and even putchar() are defined as functions. They
>are *always* reserved identifiers when defined with external linkage
>i.e. it is not dependent on what header files are included. Assert is
>unusual since it is defined as a function-like macro. As a macro it
>is only reserved when assert.h is included.
>

Lawerence, is this true for all function like macros? Is it possible to
unreserve it (eg by redefining it......i think?).
--
Ted

D. J. Bernstein

unread,

Oct 29, 1996, 3:00:00 AM10/29/96

to

Mark Stoodley <sto...@eecg.toronto.edu> wrote:
> Almost every time a programmer applies an optimization by hand to source
> code, information is lost that a future (or even current) compiler may have
> been able to use to generate even better code

Use a language that lets you say ``pick any of the following chunks of
code.'' End of problem.

> For example, scheduling is something that is realistically best left to the

> compiler (given today's processors).

Sure, if you don't care about speed.

Here's a problem for anyone who thinks that today's compilers understand
the Pentium. Do a distribution count on each byte position in an array
of 1000000 4-byte words: i.e.,

for (i = 0;i < 1000000;++i) {
ai = a[i];
++c0[ai & 255];
++c1[(ai >> 8) & 255];
++c2[(ai >> 16) & 255];
++c3[(ai >> 24) & 255];
}

Use your favorite language and your favorite compiler. Some times,
expressed in millions of cycles:

29.4 gcc 1.42, which doesn't know anything about the Pentium, -O2
32.7 gcc 2.6.3, which apparently knows even less, -O3
10.7 me, in asm

---Dan

D. J. Bernstein

unread,

Oct 29, 1996, 3:00:00 AM10/29/96

to

Terje Mathisen <Terje.M...@hda.hydro.com> wrote:

> D. J. Bernstein wrote:
> > 10.7 me, in asm
> That has to be _very_ close to optimal code:

Yes. I don't see how anyone could avoid 1 instruction to set up each
index, and 3 to increment each counter, for a total of 8 cycles.

> Unless you changed the semantics of the code quite a bit, you must have
> loaded the 4-byte words into a register and then split it into separate
> parts, something that takes at least 6 instructions/pipeline slots.

That's what I'm doing; as close as I want to 9 cycles.

Right now I've unrolled 2x. You've accounted for 18 cycles of real work;
the other 3 cycles are pushing and clearing and popping and decrementing
ecx, incrementing ebp, and jumping. I could save 3 of the overhead
instructions by using an extra register, but unrolling is easier.

My scheduling isn't perfect---there's one cycle with a load and a store.
Presumably bank conflicts account for .06 lost cycles per word, on
average; another .14 are lost to cache effects. I could reduce bank
conflicts by unrolling further, or eliminate them entirely by
interleaving the count arrays.

> I think you must load the byte values directly,

It should be possible to get down to 8 that way. The only problem is
that, as you noted, 12 loads and stores doesn't fit safely into 8
cycles.

With 6 temporary registers, here's a pattern that seems to work:

U U U U U U U U U U U U U U U U U U U U U U U U U U U U
1 [---|--|---][---|--|---][---|--|---][---|--|---]
44444444 55555555 66666666 44444444
2 [----|-|---][----|-|---][----|-|---][----|-|---]
5555555 6666666 4444444 5555555
3 [--|---|---][--|---|---][--|---|---][--|---|---]
666666666 444444444 555555555 666666666

1, 2, and 3 are byte registers; 4, 5, and 6 are live word registers;
[---|--|---] shows the timing of loading the register, loading the
count, incrementing the count, and storing the count.

It's easy enough to express this scheduling in C. Unfortunately, the
compiler will probably generate movzx, instead of clearing registers
outside the loop. I wonder if there are any compilers that will handle

register union { uint8 al; uint32 eax; } a;

a.eax = 0;
...
a.al = *ebp;
++c0[a.eax];

the way I want them to.

---Dan

Eduardo Horvath

unread,

Oct 29, 1996, 3:00:00 AM10/29/96

to

Arch Robison (rob...@kai.com) wrote:

: 1. It's not that easy. In real codes, the compiler cannot

: easily determine when actual parameters are not aliased.
: Remember, in FORTRAN, I can pass two subsections of an
: array as a(i) and a(j) as two different parameters, and the callee
: *knows* they are not aliased. There is no such guarantee in C.
: In which case the compile-time aliasing decision must be deferred to
: run-time, and the run-time check may eat up the savings.
: Thus the claim about optimizing f2c output is false.

Isn't that what the `const' keyword is for? I don't have the ANSI spec, but I
thought that the `const' and `volatile' keywords were introduced to the
language to explicitly deal with aliasing and allow more aggressive
optimization, and a compiler could assume any `const' variable cannot be
updated.

The C++ ARM does not delve deeply into the issue. It states that `const'
objects cannot be updated through that reference, and that the meaning of
`volatile' is entirely implementation defined.

Is there any reason that a compiler needs to assume a `const' object can be
updated through an alias? If not, it would seem that a C (or C++) compiler
could apply the same level of optimization to a *properly* declared routine
that a FORTRAN compiler could.

Now the fact that the majority of C code does not use those keywords is a
function of improper programming, not an inherent limitation of the language.

Eduardo

#include <std_disclaimer>

Mark Stoodley

unread,

Oct 29, 1996, 3:00:00 AM10/29/96

to

In article <1996Oct2903...@koobera.math.uic.edu>,

D. J. Bernstein <d...@koobera.math.uic.edu> wrote:

>Mark Stoodley <sto...@eecg.toronto.edu> wrote:
>> Almost every time a programmer applies an optimization by hand to source
>> code, information is lost that a future (or even current) compiler may have
>> been able to use to generate even better code
>

>Use a language that lets you say ``pick any of the following chunks of
>code.'' End of problem.

And then when a new compiler comes out you compile your code some number
of times to figure out which version of your source code works the best now?
And which runs best on platform X with compiler A and on platform Y with
compiler B, etc?

>> For example, scheduling is something that is realistically best left to the
>> compiler (given today's processors).
>

>Sure, if you don't care about speed.

Maybe if I care about how much time, effort and money I'm going to spend,
too. Both now and throughout the software's lifetime.

>
>Here's a problem for anyone who thinks that today's compilers understand
>the Pentium. Do a distribution count on each byte position in an array
>of 1000000 4-byte words: i.e.,
>
> for (i = 0;i < 1000000;++i) {
> ai = a[i];
> ++c0[ai & 255];
> ++c1[(ai >> 8) & 255];
> ++c2[(ai >> 16) & 255];
> ++c3[(ai >> 24) & 255];
> }

I assume you noticed that you've already applied an optimization by hand
here, right? You've unrolled the loop four times and "vectorized" the load.
The compiler no longer realizes that you're looping through an array of
bytes. Maybe that information is useful, maybe it isn't (probably not given
the quality of optimizing compilers that target X86 processors).

:-)

>
>Use your favorite language and your favorite compiler. Some times,
>expressed in millions of cycles:
>
> 29.4 gcc 1.42, which doesn't know anything about the Pentium, -O2
> 32.7 gcc 2.6.3, which apparently knows even less, -O3
> 10.7 me, in asm

How about my favourite processor? :-) (i.e. non-X86)

Part of what you're calling "scheduling" includes instruction *selection*
(and, I suspect, a number of other optimizations) which is not something that
compilers typically do well. I think a compiler would fare better for a
less nasty ISA such one based on RISC design philosophy.

This is a single example of a fairly small (and somewhat pathological) loop
and you're targetting a processor that *nobody* likes to generate code for.
How long did it take you to write your version in asm? How long did it take
to debug it? How maintainable is it *by another person*? How portable is it
to other architectures and even to different generations of the same
architecture (does it run as quickly on the Pentium Pro)? Would you want
to sit down and apply your scheduling magic to a loop with maybe fifty or
several hundred instructions?

While I'm sure there are hundreds of examples where a human did a better
job than the compiler did, I'm also sure that there is an equal number of
counter examples.

Mark

Alberto Moreira

unread,

Oct 29, 1996, 3:00:00 AM10/29/96

to

In article <3275C2...@hda.hydro.com>, Terje.M...@hda.hydro.com
says...

> D. J. Bernstein wrote:

[code omitted for brevity...]

> Unfortunately, this version requires 8 temporary registers inside the
> loop, plus the loop index, so it is very hard to make it fit on a
> Pentium.

One way to look at it is as follows. There are four threads, each
needs address generation, load, increment and store. The address
generation must be separated from the load by one cycle (on
both pipes). Call them A0, L0, I0, S0, A1, L1, I1, S1, etc.
Let "__" be the necessary cycle to avoid an AGI. If things
on a horizontal line can happen at the same time, we have:

A0
__ A1
L0 __ A2
I0 L1 __ A3
S0 I1 L2 __ ++
S1 I2 L3
I3
S3 __ JMP

Where "++" is the necessary i++, "--" is a bottom of the
loop decrement and "jmp" is the jump. Notice that to
avoid cache conflicts A0, A1, A2 and A3 must be in
separate cycles.

One way of scheduling the two pipes in 10 cycles
wraps the loop so that S3 from the previous iteration
is scheduled simultaneously with A0:

1. S3, A0
2. A1
3. L0, A2
4. I0, L1
5. S0, I1
6. L2, A3
7. S1, I2
8. S2, L3
9. I3, ++
10. --, JMP

This scheduling has the benefit of needing at most
3 address registers and 2 data registers at any
one time:

lea esi,a
xor eax,eax
xor ebx,ebx
xor ebx,ecx
mov al,[esi+3]
mov ebp,loopcount
mov edi,[eax*4].c3

top:

S3: mov [eax*4].c3,edi
A0: mov al,[esi]

A1: mov bl,[esi+1]
nop

L0: mov edx,[eax*4].c0
A2: mov cl,[esi+2]

I0: inc edx
L1: mov edi,[ebx*4].c1

S0: mov [eax*4].c0,edx
I1: inc edi

L2: mov edx,[ecx*4].c2
A3: mov al,[esi+3]

S1: mov [ebx*4].c1,edi
I2: inc edx

S2: mov [ecx*4].c2,edx
L3: mov edi,[eax*4].c3

I3: inc edi
add esi,4

dec ebp
jnz top

S3: mov [eax*4].c3,edi

In the above loop, eax, ebx and ecx are used
as indexes; edx and edi as data registers;
esi indexes the array, and ebp is the loop count.
Four threads of four slots each is sixteen slots;
plus two loop increments and one branch makes
nineteen.

Also, going backwards through the array might shave
one additional slot, bringing it down to 9 cycles per
iteration; but I can't see how to do it.

> It will also be impossible to load adjacent byte values in the same
> cycle, since that will result in a L1 cache bank collision.
> I would probably unroll this code even furter, into 4 iterations each
> of which process 3 bytes, with cache bank staggering:

[code again deleted for brevity...]

> This version is the only way I can see that can get rid of all AGI
> stalls, while avoiding cache bank collisions and still stay within the
> 7 maximum available registers. There is probably no way for a compiler
> to generate the corresponding asm code.

I don't know if a compiler could generate the code I wrote either,
but it could get pretty close if the code generator did the sort
of dag analysis I did to arrive at my code. That is, add dummy
nodes and dummy links to represent AGIs, cache conflicts, etc.

> On a RISC cpu, or any machine with lots of registers, it would be easy
> to compile the first version into nearly-optimal code.

I may be wrong, but how could additional registers help ? Of course,
if we had 8 arrays instead of four, I'd agree with you. And note how
the index scale factor helps.

> Most of the times that I've beaten Pentium compilers by a lot, I have
> done so by being able to use more registers, and (sometimes) use them
> in non-obvious ways.
>
> Unfortunately, the Pentium-optimized asm code I would write for this
> function would run very poorly on a PentiumPro, due to the Partial
> Register Stalls.

> To fix this, I would need to select a PPro-optimized version (using
> MOVZX) at runtime, after checking the cpuid.

I'm not as familiar with the P6 as I'm with the P5, so maybe there's
more that can be done to improve this code for the P6. One of them,
of course, is to replace the byte loads with MOVZX.

Alberto.

D. J. Bernstein

unread,

Oct 30, 1996, 3:00:00 AM10/30/96

to

Alberto Moreira <AMor...@nine.com> wrote:
> 1. S3, A0
> 2. A1
> 3. L0, A2
> 4. I0, L1
> 5. S0, I1
> 6. L2, A3
> 7. S1, I2
> 8. S2, L3
> 9. I3, ++

Cute. Here's how I'd express the idea:

U U U U U U U U U U U U U U U U U U U U U U U U U U U

1 [--|-|-] [--|-|-] [--|-|-] [--|-|-]
44444 44444 44444 44444
2 [---|-|-] [---|-|-] [---|-|-] [---|-|-]
55555 55555 55555 55555
3 [----|-|-] [----|-|-] [----|-|-] [----|-|-]
44444 44444 44444 44444

This shows how to get as close as you want to 8 cycles, except for cache
bank conflicts, with the same 5 temporary registers.

I wonder why we haven't heard any results from the compiler weenies. :-)

---Dan