Why C is much slower than Fortran

Vincent Ma

unread,

Apr 11, 1999, 3:00:00 AM4/11/99

to

Guys,

I been trying to program in both C and Fortran. The code contains
nothing but lots of multiply and add such as

for i = 0, imax
........

A(i,j,k) = B(i,j,k)*C(i,j,k) + .......

......

I found the code generated by C compiler runs MUCH slower than that
generated by Fortran compiler. (I used MS compilers with full
optimization on). I even looked at the assembly codes, they are quite
similar but the assembly code generated by Fortran compiler looks nicer
(and runs MUCH faster). Does anybody know why?

David Klein

unread,

Apr 11, 1999, 3:00:00 AM4/11/99

to

Vincent Ma <vin...@hotmail.com> writes:

Your data accesses are well "localized" in fortran, but not in
C. Fortran stores arrays by coloumn so successive values of I access
adjacent elements in memory. C stores arrays by row, so you are
jumping all over memory in C. All this is just a guess though, since
It makes assumptions about what is hidden in your '......' and how
your C code is written.

--
Use of tools distinguishes Man from Beast. And UNIX users from WINDOZE lusers.

Jon Bell

unread,

Apr 11, 1999, 3:00:00 AM4/11/99

to

Vincent Ma <vin...@hotmail.com> wrote:
>
>I found the code generated by C compiler runs MUCH slower than that
>generated by Fortran compiler.

1. Number-crunching speed has always been a high priority for Fortran
programmers, so Fortran compiler writers have a *lot* of accumulated
experience in optimizing such calculations.

2. As someone else has pointed out, storing matrices row-wise versus
column-wise can make a difference, at least on some hardware
architectures.

3. I vaguely remember reading that because of the way arguments are
passed to subroutines in Fortran versus C/C++, Fortran compilers can
make certain assumptions that speed up array calculations, that C/C++
compilers cannot. The word "aliasing" comes to mind, but I don't
remember the details.

--
Jon Bell <jtb...@presby.edu> Presbyterian College
Dept. of Physics and Computer Science Clinton, South Carolina USA
[ Information about newsgroups for beginners: ]
[ http://www.geocities.com/ResearchTriangle/Lab/6882/ ]

bg...@my-dejanews.com

unread,

Apr 11, 1999, 3:00:00 AM4/11/99

to

Vincent Ma <vin...@hotmail.com> writes:

> I been trying to program in both C and Fortran. The code contains
> nothing but lots of multiply and add such as
>
> for i = 0, imax

> A(i,j,k) = B(i,j,k)*C(i,j,k) + .......

The two main reasons your C code is slower are likely to be, in order:

(1) the stride of the inner loop. In Fortran the first index of a
multidimensional array varies fastest (so the above code is optimal,
assuming that you've shown us the only or innermost loop). In C the
last index varies fastest, so you'd want to recode your C as
for (i=0; i<=imax; i++)
a[k,j,i] = b[k,j,i]*c[k,j,i] + ... ;
Note that you'll have to permute indices consistently throughout your code.
The penalty for non-unit strides in accessing memory is less efficient
use of cache and/or interleaving.

(2) aliasing constraints. In Fortran A, B and C are guaranteed not to
overlap (unless they have the POINTER or the TARGET attribute). In C
they are only guaranteed not to overlap if they are explicitly
declared as arrays rather than pointers (and function arguments are
always pointers, even if you declare them with [] notation). Some C
compilers have extensions that let you tell them to assume no overlap.
*Judicious* use of these may help somewhat, but be careful not to make
false promises to your compiler.

Siemel Naran

unread,

Apr 11, 1999, 3:00:00 AM4/11/99

to

On Sun, 11 Apr 1999 14:09:27 GMT, bg...@my-dejanews.com

>(1) the stride of the inner loop. In Fortran the first index of a
>multidimensional array varies fastest (so the above code is optimal,
>assuming that you've shown us the only or innermost loop). In C the
>last index varies fastest, so you'd want to recode your C as
> for (i=0; i<=imax; i++)
> a[k,j,i] = b[k,j,i]*c[k,j,i] + ... ;
>Note that you'll have to permute indices consistently throughout your code.
>The penalty for non-unit strides in accessing memory is less efficient
>use of cache and/or interleaving.

Don't fully understand your argument. But a good solution would be
to write a class 3dmatrix whose access operator uses whichever method
is fastest.

>(2) aliasing constraints. In Fortran A, B and C are guaranteed not to
>overlap (unless they have the POINTER or the TARGET attribute). In C
>they are only guaranteed not to overlap if they are explicitly
>declared as arrays rather than pointers (and function arguments are
>always pointers, even if you declare them with [] notation). Some C
>compilers have extensions that let you tell them to assume no overlap.
>*Judicious* use of these may help somewhat, but be careful not to make
>false promises to your compiler.

Not only that, but A[1] which has type double** may overlap with B[2]
which also has type double**. So if you are using A[i][j][k] where
k is varying, then A[i][j] will have to be recomputed each time in the
body of the for loop. I believe that the use of slices eliminates
this aliasing problem, which is one reason why code with valarray and
slices is just as fast as Fortran -- but I'm not fully sure about
this.

--
----------------------------------
Siemel B. Naran (sbn...@uiuc.edu)
----------------------------------

Lars Gregersen

unread,

Apr 12, 1999, 3:00:00 AM4/12/99

to

On Sun, 11 Apr 1999 13:24:44 GMT, jtb...@presby.edu (Jon Bell) wrote:

> Vincent Ma <vin...@hotmail.com> wrote:
>>
>>I found the code generated by C compiler runs MUCH slower than that
>>generated by Fortran compiler.
>
>1. Number-crunching speed has always been a high priority for Fortran
>programmers, so Fortran compiler writers have a *lot* of accumulated
>experience in optimizing such calculations.

This is true, but if you have the same simple loops written in Fortran
and C you should get essentially the same code (and speed).

>2. As someone else has pointed out, storing matrices row-wise versus
>column-wise can make a difference, at least on some hardware
>architectures.

The way the data is stored is not so much of a problem as how it is
accessed :-) If you optimize the matrix multiplication to read data in
the right order things will be just as fast in C as it is in Fortran.

>3. I vaguely remember reading that because of the way arguments are
>passed to subroutines in Fortran versus C/C++, Fortran compilers can
>make certain assumptions that speed up array calculations, that C/C++
>compilers cannot. The word "aliasing" comes to mind, but I don't
>remember the details.

This is a minor concern. Aliasing will create a lot of extra reads and
writes, but it will access memory that has just been read and
therefore still is in the cache. If your algorithms run so they
optimally use the cache you will get the fastest perfomance. You can
get an optimized BLAS from Intel (this is free) and other suppliers of
CPUs that will run optimally on a specific processor.

Lars

------------------------------
Lars Gregersen (l...@kt.dtu.dk)
http://www.gbar.dtu.dk/~matlg

Tak Pui Lou

unread,

Apr 12, 1999, 3:00:00 AM4/12/99

to

I found something on this website which explain why fortran is better in
number crunching:

http://www.accu.org/events/public/c_news.htm

Lou

Michel OLAGNON

unread,

Apr 12, 1999, 3:00:00 AM4/12/99

to

In article <37119b58....@news.dtu.dk>, l...@kt.dtu.dk (Lars Gregersen) writes:
>On Sun, 11 Apr 1999 13:24:44 GMT, jtb...@presby.edu (Jon Bell) wrote:
>
>> Vincent Ma <vin...@hotmail.com> wrote:
>>>
>>>I found the code generated by C compiler runs MUCH slower than that
>>>generated by Fortran compiler.
>>
>>1. Number-crunching speed has always been a high priority for Fortran
>>programmers, so Fortran compiler writers have a *lot* of accumulated
>>experience in optimizing such calculations.
>
>This is true, but if you have the same simple loops written in Fortran
>and C you should get essentially the same code (and speed).

If you try it with the .AXPY BLAS, you should notice that you need to
specify more in C than the ``simple loop'' in order to get a similar
rescheduling of assembly instructions as you can get with many Fortran
optimizing compilers.

>
>>2. As someone else has pointed out, storing matrices row-wise versus
>>column-wise can make a difference, at least on some hardware
>>architectures.
>
>The way the data is stored is not so much of a problem as how it is
>accessed :-) If you optimize the matrix multiplication to read data in
>the right order things will be just as fast in C as it is in Fortran.
>
>>3. I vaguely remember reading that because of the way arguments are
>>passed to subroutines in Fortran versus C/C++, Fortran compilers can
>>make certain assumptions that speed up array calculations, that C/C++
>>compilers cannot. The word "aliasing" comes to mind, but I don't
>>remember the details.
>
>This is a minor concern. Aliasing will create a lot of extra reads and
>writes, but it will access memory that has just been read and
>therefore still is in the cache. If your algorithms run so they

>optimally use the cache you will get the fastest performance. You can

>get an optimized BLAS from Intel (this is free) and other suppliers of
>CPUs that will run optimally on a specific processor.

Aliasing prohibits rescheduling of instructions within unrolled loops.
In my experience, that can make a difference by a ratio of 2.
I can but support the recommendation that you get optimized BLAS. Remember
that BLAS should be called the Fortran way, even in C, without any aliasing
between the arguments

Michel

--
| Michel OLAGNON email : Michel....@ifremer.fr|
| IFREMER: Institut Francais de Recherches pour l'Exploitation de la Mer|
| Centre de Brest - B.P. 70 phone : +33-2-9822 4144|
| F-29280 PLOUZANE - FRANCE fax : +33-2-9822 4650|
| http://www.ifremer.fr/ditigo/molagnon/molagnon.html |
| NOTE: http://www.ifremer.fr/isope99/ |

Peter S. Shenkin

unread,

Apr 12, 1999, 3:00:00 AM4/12/99

to

Vincent Ma wrote in message <3710584B...@hotmail.com>...
>Guys,

>
>I been trying to program in both C and Fortran. The code contains
>nothing but lots of multiply and add such as
>
>for i = 0, imax

> ........

>
> A(i,j,k) = B(i,j,k)*C(i,j,k) + .......
>

In the Fortran code, the i-index should be the innermost (the one that
changes fastest) in the nested triple loop. In the C code, the k index
should
be innermost.

If you do it this way, Fortran might still be faster, but I doubt it will be
a big
difference.

This is because in Fortran A(1,j,k) and A(2,j,k) are adjacent in memory,
whereas in C, A(i,j,1) and A(i,j,2) are adjacent. Non-local memory
references destroy cache warmth on cache-based processors (most processors
today).

So please post again telling us exactly how you did it and exactly what the
timings are, and also telling us what platform you're running on, what
compilers, and what compiler options.

-P.

Vincent Ma

unread,

Apr 12, 1999, 3:00:00 AM4/12/99

to Peter S. Shenkin

Dear friends,

Thanks for the reply. Here are more detailed information:

-------------------------------------------------------------
#include <stdio.h>

void main()
{
float E[36][21][52];
float H1[36][21][52];
float H2[36][21][52];

short int ID[43][31][52];
float A1[10];
float A2[10];
float A3[10];

int i,j,k,nlp,nmax;
int imax, jmax, kmax;

/* ============= set parameters ============== */

imax = 35;
jmax = 20;
kmax = 50;

/* ============== input nmax ================= */

nmax = 1000;

/* ----------- set initial conditions ---------- */

for(k = 1; k <= kmax-1; k++)
{
for(j = 1; j <= jmax-1; j++)
{
for(i = 1; i <= imax-1; i++)
{

ID[i][j][k] = 0;
E[i][j][k] = 4.;
H1[i][j][k] = 5.;
H2[i][j][k] = 6.;
A1[0] = 0.;
A2[0] = 2.;
A3[0] = 3.;
}
}
}

/* ############################## */

for (nlp = 0; nlp <= nmax; nlp++)
{
for(i = 0; i <= imax-1; i++)
{
for(j = 1; j <= jmax-1; j++)
{
for(k = 1; k <= kmax-1; k++)
{
E[i][j][k] = A1[ID[i][j][k]]*E[i][j][k] + A2[ID[i][j][k]]
* (H1[i][j][k] - H1[i][j-1][k] + A3[ID[i][j][k]] * (H2[i][j][k] - H2[i][j][k-1]
);
}
}
}

}

return;

}

-----------------------------------------------------------------------------------------------------------------------

Fortran test code is same. I ran both on Pentium 133 with 64 meg RAM with MS
Powerstation and Visual C++ 5 with full optimization on. The execution time are

Fortran -- 11 sec (i loop innermost)
C -- 101 sec (i loop innermost)
C -- 62 sec (k innermost).

The difference is still a lot (6 times). I've tried many variations, it seems no
matter what I did, C is always slower. My previous experience on workstation
told me the same story that fortran is much faster than C (at least for this
scientific computing case).

bg...@my-dejanews.com

unread,

Apr 12, 1999, 3:00:00 AM4/12/99

to

sbn...@localhost.localdomain (Siemel Naran) writes:

> >(1) the stride of the inner loop. In Fortran the first index of a
> >multidimensional array varies fastest (so the above code is optimal,
> >assuming that you've shown us the only or innermost loop). In C the
> >last index varies fastest, so you'd want to recode your C as
> > for (i=0; i<=imax; i++)
> > a[k,j,i] = b[k,j,i]*c[k,j,i] + ... ;
> >Note that you'll have to permute indices consistently throughout your code.
> >The penalty for non-unit strides in accessing memory is less efficient
> >use of cache and/or interleaving.

> Don't fully understand your argument. But a good solution would be
> to write a class 3dmatrix whose access operator uses whichever method
> is fastest.

Can't do that in C, which is what the original poster was asking
about. (Yes, I know this is crossposted to the C++ newsgroup, but
that's his fault not mine.) Also, your proposed class doesn't obviate
the need for the user to specify which index ordering is desired, so
he still needs to fully understand the issue.

> >(2) aliasing constraints. In Fortran A, B and C are guaranteed not to
> >overlap (unless they have the POINTER or the TARGET attribute).

[...]

> Not only that, but A[1] which has type double**

No it doesn't. If you declare
float A[100][200][300];
in C, A[1] is of type (float [200][300]), which degrades to (float *)
in pointer contexts.

It *is* true that if the original poster happened to declare
float ***A;
and then proceeded to allocate and fill in vectors of pointers to
pointers, the additional overhead in address arithmetic would be an
even better explanation for the speed difference he observed.

> may overlap with B[2]
> which also has type double**. So if you are using A[i][j][k] where
> k is varying, then A[i][j] will have to be recomputed each time in the
> body of the for loop. I believe that the use of slices eliminates
> this aliasing problem, which is one reason why code with valarray and
> slices is just as fast as Fortran -- but I'm not fully sure about
> this.

Am I the only one to find this explanation confused?

There are several issues here. One is the exact declaration of A and B
(and by B I mean any operand in the expression on the right hand side
of the assignment). There are three main possibilities:

/* 1. */
typedef float array[100][200][300]; array A, B;

In this case, A and B do not overlap. The C compiler knows it, and all
is well.

/* 2. */
func (float A[][200][300], float B[][200][300])
{
int i, j, k;

for (i=0; i<100; i++) for (j=0; j<200; j++) for (k=0; k<300; k++)
A[i][j][k] = B[i][j][k];
}

In this case it is true that B[2] could be A[1], in which case a
dependency exists and the first loop cannot be unrolled. Whether
there is a dependency in the other two loops I'll leave for a real
C expert to answer; I'll only point out that *if* the language
allows B[0][0][1] to overlap with A[0][0][0], then even the innermost
loop cannot be unrolled. (My doubt here is mainly about the wording of
the ANSI C standard; I know that most actual implementations will
happily let me create such an overlap.)

/* 3. */
float **A, **B; /* initialisation not shown */

In this case I have no doubt that A[0][0][0] may be aliased with
B[0][0][1], preventing the compiler from prefetching elements of B.

Fortran has exactly the same problem if A has the TARGET
or the POINTER attribute and B is an expression containing at least
one operand of the same type as A (the ranks need not be equal) and
also having the POINTER or the TARGET attribute. (Except that if both
A and B are mere TARGETs, the compiler may still be able to prove that
there is no overlap. Similar proofs may be possible where the pointers
involved have just been associated, so that the compiler knows what
they are pointing to. I'll ignore those special cases.)
Fortunately, Fortran programmers know they shouldn't use those
attributes unless they really need to. (Most of the time they don't.)

Another issue is whether "A[i][j] needs to be recomputed". Clearly
this only needs to happen if the value of i or j changes in mid-loop.
C allows this, while Fortran doesn't. However, in practice this is
rarely a serious obstacle to optimisation in C because the compiler
can often prove that i and j are not modified by the body of the loop.
It could be a problem if i and j were global variables (there is
rarely a good reason for that) and the body of the loop involved
external function calls (but then each iteration would take long
enough that unrolling the loop ceases to be such a pressing need).

Vincent Ma

unread,

Apr 12, 1999, 3:00:00 AM4/12/99

to

> Powerstation and Visual C++ 5 with full optimization on

Optimized for Maximun speed.

Vincent Ma

unread,

Apr 12, 1999, 3:00:00 AM4/12/99

to jst...@poly.phys.cwru.edu

Jonathan Stott wrote:

> I have an idea that one of the optimizers is simply much better than
> the other. Perhaps your Fortran compiler discovered that the second
> loop always sets the matrix E[][][] to zero and so only needs to be
> computed once?
>
> Running your sample programs through gcc and g77 (current versions of
> each) on an old Ultrix workstation gives essentially the same
> result (within 1%):
>

Johnathan,

Thank you, I found that you are absolutely correct. I ran your codes on
workstation and it turned out to be the same. I played with the code a little
bit, it makes lots of difference by changing which loops first. And I also
found that Visual C++ 5 doesn't seem to optimize the code that well. The speed
difference is still 6 times. But on the workstation, they are the same. Thank
you for solving me the problem.

Nick Ambrose

unread,

Apr 12, 1999, 3:00:00 AM4/12/99

to

David Klein wrote:

> Vincent Ma <vin...@hotmail.com> writes:
>
> >
> > Guys,
> >
> > I been trying to program in both C and Fortran. The code contains
> > nothing but lots of multiply and add such as
> >
>

> Your data accesses are well "localized" in fortran, but not in
> C. Fortran stores arrays by coloumn so successive values of I access
> adjacent elements in memory. C stores arrays by row, so you are
> jumping all over memory in C. All this is just a guess though, since
> It makes assumptions about what is hidden in your '......' and how
> your C code is written.

Seems like a pretty good guess to me also. One other thing is that Fortran is able
to assume no arrays are aliased, whereas C/C++ often arent able to do that (if
thats the problem, you should look at valarray)

Nick

Eric Tetz

unread,

Apr 13, 1999, 3:00:00 AM4/13/99

to

Nick Ambrose wrote

> > Your data accesses are well "localized" in fortran, but not in
> > C. Fortran stores arrays by coloumn so successive values of I access
> > adjacent elements in memory. C stores arrays by row, so you are
> > jumping all over memory in C. All this is just a guess though, since
> > It makes assumptions about what is hidden in your '......' and how
> > your C code is written.
>
> Seems like a pretty good guess to me also. One other thing is that
Fortran is able
> to assume no arrays are aliased, whereas C/C++ often arent able to do
that (if
> thats the problem, you should look at valarray)

Check out the latest issue (one of the latest anyway) of DJJ. It has an
article about C++ beating Fortran in a speed race...

Noone Really

unread,

Apr 13, 1999, 3:00:00 AM4/13/99

to

bg...@my-dejanews.com wrote:
> > Not only that, but A[1] which has type double**
>
> No it doesn't. If you declare
> float A[100][200][300];
> in C, A[1] is of type (float [200][300]), which degrades to (float *)
> in pointer contexts.

The response is equally incorrect.

float[200][300] is converted to float(*)[300] by taking the address of
its first element except when it is the operand of sizeof or &.
---

Carlie Coats

unread,

Apr 14, 1999, 3:00:00 AM4/14/99

to

In article <37122569...@hotmail.com>,

Vincent Ma <vin...@hotmail.com> wrote:
> Dear friends,
>
> Thanks for the reply. Here are more detailed information:

[snip... proves he is doing triply-nested

E[i][j][k] = A1[ID[i][j][k]]*E[i][j][k] + A2[ID[i][j][k]]
* (H1[i][j][k] - H1[i][j-1][k] + A3[ID[i][j][k]]
* (H2[i][j][k] - H2[i][j][k-1] );

the "right way in both Fortran and C...]

> Fortran test code is same. I ran both on Pentium 133 with 64 meg RAM with MS
> Powerstation and Visual C++ 5 with full optimization on. The execution time are
>
> Fortran -- 11 sec (i loop innermost)
> C -- 101 sec (i loop innermost)
> C -- 62 sec (k innermost).
>
> The difference is still a lot (6 times). I've tried many variations, it seems no
> matter what I did, C is always slower. My previous experience on workstation
> told me the same story that fortran is much faster than C (at least for this
> scientific computing case).

IIRC, in the gcc/g77 development process, Craig Burley and other g77
developers had quite a job convincing Richard Stallman that certain
loop-&-array optimizations were utterly necessary if you wanted to
get reasonably competitive performance for g77. This code looks like
it would be extremely sensitive to such optimizations -- particularly
the expressions "A1[ID[i][j][k]]". So quote possibly M$ believes
this sort of code is sufficiently rare in VC++ that they don't want
to implement these classes of optimization. In fact, it would seem
they are not alone among C compiler writers in this regard.

Carlie J. Coats, Jr. co...@ncsc.org
MCNC Environmental Programs phone: (919)248-9241
North Carolina Supercomputing Center fax: (919)248-9245
3021 Cornwallis Road P. O. Box 12889
Research Triangle Park, N. C. 27709-2889 USA
"My opinions are my own, and I've got *lots* of them!"

Toon Moene

unread,

Apr 14, 1999, 3:00:00 AM4/14/99

to

Carlie Coats wrote:

> IIRC, in the gcc/g77 development process, Craig Burley and other g77
> developers had quite a job convincing Richard Stallman that certain
> loop-&-array optimizations were utterly necessary if you wanted to
> get reasonably competitive performance for g77.

That's so, but I would be hilariously amused if MS VC++ actually is so
bad as gcc-2.7.2.3 was.

In my talk at Linux Expo I'll probably show what that compiler made of
this loop on my Motorola 68040 based NeXTStation:

void copy3(int n; float a[n][n][n], float b[n][n][n], int n)
{
int i, j, k;

for (i = 0; i < n; i++)
for (j = 0; j < n; j++)
for (k = 0; k < n; k++)
b[i][j][k] = a[i][j][k];
}

[ This is GCCese for the following Fortran loop: ]

subroutine copy3(a, b, n)
real a(n,n,n), b(n,n,n)
do i = 1, n
do j = 1, n
do k = 1, n
b(k,j,i) = a(k,j,i)
enddo
enddo
enddo

Hint: The code for the body of the inner loop doesn't fit on one sheet.

--
Toon Moene (to...@moene.indiv.nluug.nl)
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
Phone: +31 346 214290; Fax: +31 346 214286
g77 Support: for...@gnu.org; egcs: egcs...@cygnus.com

Daniel Barker

unread,

Apr 15, 1999, 3:00:00 AM4/15/99

to

REPOST. Message posted over 2 days ago, but has not shown up on any news
groups yet. comp.compilers omitted from re-post.

*****

[Note: I can't find comp.compiler on my news server, so have sent to
comp.compilers instead. Apologies for any confusion.]

On 12 Apr 1999, Jonathan Stott wrote:

[snip]

> I have an idea that one of the optimizers is simply much better than
> the other. Perhaps your Fortran compiler discovered that the second
> loop always sets the matrix E[][][] to zero and so only needs to be
> computed once?
>
> Running your sample programs through gcc and g77 (current versions of
> each) on an old Ultrix workstation gives essentially the same
> result (within 1%):

[snip]

I used the you kindly provided, below, on a couple of systems.

Firstly, on a Sun SPARCcenter 2000, using GNU C as "gcc -O4", Sun C as
"cc -fast" and Sun FORTRAN 77 as "f77 -fast", I got:

GNU C: 16.4 seconds
Sun C: 8.5 seconds
Sun FORTRAN: 5.4 seconds

(The FORTRAN binary warned "Note: Nonstandard floating-point mode
enabled".)

Much more interestingly, I repeated it on a SGI PowerChallenge, using MIPS
C as "cc -Ofast" and MIPS FORTRAN 77 as "f77 -Ofast", and got:

C: 0.01 seconds
FORTRAN: 0.01 seconds

I assume the MIPS compilers correctly note that the results of the loop
are irrelevant to program output, and remove the loop entirely.

Such excellent optimization aside, the program is unusual, in that it
theoretically has no reason to run faster in either C or FORTRAN.

There is NO reason for greater inefficiency due to aliasing trouble in the
C version, since the arrays are defined and used in the same function. The
compiler knows, when it comes to the loop, that it is dealing with
non-overlapping arrays. This would NOT be so if the arrays were defined in
main(), but communicated to a different function containing the loop.

Note that expressions such as "A1[ ID[i][j][k] ]", in C, and
"A1( ID(i,j,k) )", in FORTRAN, CAN cause aliasing trouble, but it does not
matter in this case since the expressions are only read, not written. The
compiler would not generally not know, at the time of the loop, whether
each "A1( ID(i,j,k) )" occupies a different location in memory or not. (It
could do here, since it can see how "ID" was set up, but I would not blame
a compiler for failing to notice this probably unusual arrangement.)

In conclusion, the program is not well suited to showing up the
differences between C and FORTRAN, but IS well suited to showing up the
differences between different compilers. These differences are shown to be
extreme.

> The code I used for the two tests is below. I fixed a few things
> that looked like syntax errors and made the C look less like
> Fortran, but otherwise things are as originally posted.
>
> -JS
>
> /**********************************************************************/
>
> #include <stdio.h>
>
> #define imax 34
> #define jmax 20
> #define kmax 50
>
> int main(int argc, char *argv[])
> {
> float E[36][21][52], H1[36][21][52], H2[36][21][52];
> float A1[10], A2[10], A3[10];
> short ID[43][31][52];
> int i, j, k, nlp, nmax;

>
> /* ============== input nmax ================= */
>
> nmax = 1000;
>
> /* ----------- set initial conditions ---------- */
>

> for (i = 0; i < imax; i++)
> for (j = 0; j < jmax; j++)
> for (k = 0; k < kmax; k++)

> {
> ID[i][j][k] = 0;

> E[i][j][k] = 4.0;
> H1[i][j][k] = 5.0;
> H2[i][j][k] = 6.0;
> }
>
> A1[0] = 0.0;
> A2[0] = 2.0;
> A3[0] = 3.0;
>
> /* ############################## */
>
> for (nlp = 0; nlp < nmax; nlp++)
> for (i = 0; i < imax; i++)
> for (j = 0; j < jmax; j++)
> for (k = 0; k < kmax; k++)
> E[i][j][k] = A1[ ID[i][j][k] ] * E[i][j][k]
> + A2[ ID[i][j][k] ] * (H1[i][j][k] - H1[i][j - 1][k])

> + A3[ ID[i][j][k] ] * (H2[i][j][k] - H2[i][j][k - 1]);
>

> return(0);
> }
>
> C **********************************************************************
>
> PROGRAM sample
>
> REAL E(36,21,52), H1(36,21,52), H2(36,21,52)
> REAL A1(10), A2(10), A3(10)
> INTEGER*2 ID(43,31,52)
>
> INTEGER i, j, k, nlp, nmax
> INTEGER imax, jmax, kmax
>
> C /* ============= set parameters ============== */
>
> PARAMETER(imax = 35)
> PARAMETER(jmax = 20)
> PARAMETER(kmax = 50)
>
> C /* ============== input nmax ================= */
>
> nmax = 1000
>
> C /* ----------- set initial conditions ---------- */
>
> DO 100, k = 1, kmax
> DO 200, j = 1, jmax
> DO 300, i = 1, imax
> ID(i,j,k) = 1
> E(i,j,k) = 4.0
> H1(i,j,k) = 5.0
> H2(i,j,k) = 6.0
> 300 CONTINUE
> 200 CONTINUE
> 100 CONTINUE
>
> A1(1) = 0.0
> A2(1) = 2.0
> A3(1) = 3.0
>
> C /* ############################## */
>
> DO 400, nlp = 1, nmax
> DO 500, k = 1, kmax
> DO 600, j = 1, jmax
> DO 700, i = 1, imax
> E(i,j,k) = A1( ID(i,j,k) )*E(i,j,k)
> $ + A2( ID(i,j,k) )*(H1(i,j,k) - H1(i,j-1,k))
> $ + A3( ID(i,j,k) )*(H2(i,j,k) - H2(i,j,k-1))
> 700 CONTINUE
> 600 CONTINUE
> 500 CONTINUE
> 400 CONTINUE
>
> RETURN
> END

Daniel Barker.

Daniel Barker

unread,

Apr 15, 1999, 3:00:00 AM4/15/99

to

On Thu, 15 Apr 1999, Daniel Barker wrote:

> Firstly, on a Sun SPARCcenter 2000, using GNU C as "gcc -O4",

Actually, "gcc -O3". Apologies.

Daniel Barker.

Daniel Barker

unread,

Apr 18, 1999, 3:00:00 AM4/18/99

to

Thomas Elken

unread,

Apr 29, 1999, 3:00:00 AM4/29/99

to

Daniel Barker wrote:
> Firstly, on a Sun SPARCcenter 2000, using GNU C as "gcc -O4", Sun C as
> "cc -fast" and Sun FORTRAN 77 as "f77 -fast", I got:
>
> GNU C: 16.4 seconds
> Sun C: 8.5 seconds
> Sun FORTRAN: 5.4 seconds
>

> Much more interestingly, I repeated it on a SGI PowerChallenge, using MIPS
> C as "cc -Ofast" and MIPS FORTRAN 77 as "f77 -Ofast", and got:
>
> C: 0.01 seconds
> FORTRAN: 0.01 seconds
>
> I assume the MIPS compilers correctly note that the results of the loop
> are irrelevant to program output, and remove the loop entirely.

Correct you are. The SGI MIPSpro compiler is too smart.

I inserted a line of output at the end to get it to do some work:

C:
printf("final calcs: %d, %f, %f\n", nlp, E[1][jmax][kmax],
E[imax][jmax][kmax]);
FORTRAN:
write(6,*) 'final calcs: ', nlp, E(1,jmax,kmax), E(imax,jmax,kmax)

and compiled as you did ( -Ofast) and got these times:

C: 0.46 seconds (real time)
FORTRAN: 0.48 seconds

on an Octane, MIPS R10000, 195MHz

-Ofast is really a bundle of compiler options:
The important one for this discussion is:
-OPT:alias=typed

The OPT:alias options can make code with C arrays or pointers run as
fast as FORTRAN with arrays, if they use pointers correctly. We have
many examples of this being true. Some of SGI's math libraries (complib
or SCSL) are written in C. I am tuning one such C code for solving
sparse linear equation systems which achieves 520 Mflops on one
processsor out of a 600 Mflops peak (on a somewhat dense matrix).
Maximum on a BLAS matrix multiply kernel (written in assembler) is about
550 Mflops on the same R12000 processor.

The definition of *some* of the alias options for the MIPSpro C/C++
compiler are:

Suboptions Action

alias=name Specifies the pointer aliasing model to be used. By
specifiying one of the following for name, the compiler
is
able to make assumptions throughout the compilation:

name Assumption

TYPED or NO_TYPED TYPED specifies that pointers of
distinct base types are assumed to
point to distinct, non-overlapping
objects.

NO_TYPED specifies that pointers to
different base types may point to
the same object.

(C and C++ only)

RESTRICT or NO_RESTRICT RESTRICT specifies that distinct
pointers are assumed to point to
distinct, non-overlapping objects.

NO_RESTRICT specifies that distinct
pointers may point to overlapping
storage.

(C and C++ only)

> Such excellent optimization aside, the program is unusual, in that it
> theoretically has no reason to run faster in either C or FORTRAN.

The program is not that unusual. A good compiler and well written C
(but no need to avoid pointers) can match FORTRAN performance.

Regards,
Tom Elken

bg...@my-dejanews.com

unread,

May 3, 1999, 3:00:00 AM5/3/99

to

t...@cs.duke.edu (Thomas R. Truscott) writes:

> > -OPT:alias=typed

> >
> > TYPED or NO_TYPED TYPED specifies that pointers of
> > distinct base types are assumed to
> > point to distinct, non-overlapping
> > objects.

> This is a nice concept, but I think for this option to be usable
> in large-ish programs two more things are needed:
>
> (1) A variant (e.g -OPT:alias=typed_check) which generates
> a run-time check that the assumption above is correct.

Interesting, but if you are going to ask SGI to introduce this sort of
run-time check you should also encourage them to *correctly* implement
more mundane things like array bounds checking (in both Fortran and C;
their Fortran compilers claim to support it, but it is obvious to the
experienced user that a number of situations which should be checked
are not).

I'm also a little unclear as to what exactly you are proposing to
check. Surely you don't want to take *every* possible pair of
pointers of distinct base types and test it for overlap.

> With this, whenever the compiler takes advantage of alias=type it
> also adds code to trap if there is actually an overlap. The user
> can then run the program on various test inputs and gain some
> confidence that this optimization is safe.

Aha. I think such confidence would be misplaced. The optimisation may
be safe with one release of the compiler but break on the next because
of changes in how the compiler takes advantage of alias=typed.

> (2) A most conservative rule than what is stated above.

*More* conservative, surely?

> For starters, one should not assume that a "void *"
> can point only to an object of type void.

You forgot a smiley here.

> And similarly for "char *" and other pointers to objects of size 1,
> since people often use those rather than "void *".

Good C compilers tend to warn about such pointer type inconsistencies.

> Also, a pointer cast to another type should not be assumed
> to be distinct from the original pointer.

You are missing the point. If the code you are compiling relies on
pointer type casts, then -OPT:alias=typed is unsafe. Of course there
is C code out there for which this option is unsafe; that is why it
isn't turned on by default!

> Other vendors have similar "ANSI alias" options, and whenever I have
> tried them (on a large collection of software at the company where I
> work), problems have arisen. When I point out the specific problems
> to the vendors, they tell me to turn off the option. I have concluded
> from this that these options are intended to speed up benchmarks, not
> real programs.

I disagree. They are intended to speed up well-behaved programs. That
said, I think a source-level pragma to indicate the absence of
aliasing on a fine-grained, case-by case basis would be far more
useful than an all-or-nothing command line switch. One could use it to
speed up critical sections of the code. One could even use it safely,
by preceding every use with an explicit test that the target arrays
don't overlap. (One of the problems with C pointers is that one often
doesn't know where the object being pointed into begins or ends. One
could take a conservative stance and say that two pointers *might*
overlap if there is any overlap between the objects---be they named
targets or malloc()ed heap areas---they point into, but the programmer
usually has additional insight on how the pointers will be used and
can write a more precise test.) It is no accident that the C9X draft
standard is moving in a similar direction (a "restrict" keyword, or
something like that) and that one of Cray's first extensions to
Fortran was an "ignore vector dependencies" pragma (CDIR$ IVDEP).

Robert Harley

unread,

May 3, 1999, 3:00:00 AM5/3/99

to

Our moderator wrote:

> C lets you alias anything to anything, and that does indeed cause
> optimization problems. The C9X draft has a "restrict" keyword [...]

But ANSI C has a rule which disallows aliasing anything to anything!

The rule is that an object in memory can only be accessed through
lvalues of the same type, possibly in a struct or union, or of char
type (here the types are considered modulo signed/unsigned and
qualifiers).

Presumably the "alias=typed" flag uses this rule to optimise better.

The restrict keyword can be used to give extra aliasing information,
even between objects of the same type where ANSI aliasing does not
help.

Bye,
Rob.

Zalman Stern

unread,

May 7, 1999, 3:00:00 AM5/7/99

to

bg...@my-dejanews.com wrote:
: t...@cs.duke.edu (Thomas R. Truscott) writes:
: > Also, a pointer cast to another type should not be assumed

: > to be distinct from the original pointer.

: You are missing the point. If the code you are compiling relies on
: pointer type casts, then -OPT:alias=typed is unsafe. Of course there
: is C code out there for which this option is unsafe; that is why it
: isn't turned on by default!

Consider the following:

contrived_example(int *foo, float *bar, int count) {
float sum = 0;
void *temp = (void *)foo;

/* Suspend disbelief on this untyped interface for a moment... */
generate_random_permutation(temp, sizeof(int), count);

/* We want to schedule this loop knowing that foo and bar do not
* alias.
*/
while (count-- > 0) {
sum += bar[*foo];
*foo++ = 0;
}
}

If type based aliasing works the way restrict does, the compiler is
required to notice that temp aliases with foo in the above
example. (And hence cannot hoist loads from foo above the call to
generate_random_permutation .) The type based aliasing information is
used for initializing the interference information at the function
start.

(Yes, I know the above is bad code. That isn't the point...)

-Z-

Herman Rubin

unread,

May 9, 1999, 3:00:00 AM5/9/99

to

Robert Harley <har...@corton.inria.fr> wrote:
>Our moderator wrote:

>> C lets you alias anything to anything, and that does indeed cause
>> optimization problems. The C9X draft has a "restrict" keyword [...]

>But ANSI C has a rule which disallows aliasing anything to anything!

>The rule is that an object in memory can only be accessed through
>lvalues of the same type, possibly in a struct or union, or of char
>type (here the types are considered modulo signed/unsigned and
>qualifiers).

As someone who has deliberately written programs which violate all
sorts of restrictions of this type FOR SPEED, I must object to the
attitude of the compiler writers that these things must not be done.

>Presumably the "alias=typed" flag uses this rule to optimise better.

There are ways to speed up code which seem to be anathema to those
who design languages and produce compilers. Should we not let the
user inform the system what is going on, which seems not to have
been considered by those providing tools.

If the compiler questions an action of the programmer, it should
allow the programmer to decide. I believe that we need many kinds
of optimization to be done which are now prohibited, and some of
which are too difficult to be done "automatically", such as
allowing alternative macros, possibly nested, to accomplish those
actions which are desired, and optimizing the choice at the
lowest level.

>The restrict keyword can be used to give extra aliasing information,
>even between objects of the same type where ANSI aliasing does not
>help.

--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
hru...@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558
[I have to disagree. The language definition is a contract between the
programmer and the compiler writer. The programmer agrees to write in
the defined language, the compiler writer agrees to write it. If you want
to write in some language similar but not identical to the one that the
compiler accepts, you can hardly complain if the compiler doesn't do what
you want. But I'd still like to see a concrete examples of the macro
approach you've been advocating for many years. -John]

Terry Greyzck

unread,

May 16, 1999, 3:00:00 AM5/16/99

to

hru...@stat.purdue.edu (Herman Rubin) wrote:

Here's a simple ANSI C routine that illustrates the problem:

int i;

void
pointer_alias( int * const p, const int n )
{
for ( i = 0; i < n; i++ ) {
p[i] = p[i] + 1;
}
}

It looks harmless, but as 'i' is global, the write through 'p' can
change its value, changing the character of the loop. I've actually
seen this type of code appear in 'real' applications.

The new 'restrict' qualifier does help.

-- Terry Greyzck

>Robert Harley <har...@corton.inria.fr> wrote:
>>Our moderator wrote:
>
>>> C lets you alias anything to anything, and that does indeed cause
>>> optimization problems. The C9X draft has a "restrict" keyword [...]
>
>>But ANSI C has a rule which disallows aliasing anything to anything!
>
>>The rule is that an object in memory can only be accessed through
>>lvalues of the same type, possibly in a struct or union, or of char
>>type (here the types are considered modulo signed/unsigned and
>>qualifiers).

ter...@uswest.net

George Neuner

unread,

May 16, 1999, 3:00:00 AM5/16/99

to

On 9 May 1999 18:41:46 -0400, hru...@stat.purdue.edu (Herman Rubin)
wrote:

>
>As someone who has deliberately written programs which violate all
>sorts of restrictions of this type FOR SPEED, I must object to the
>attitude of the compiler writers that these things must not be done.
>

>If the compiler questions an action of the programmer, it should
>allow the programmer to decide. I believe that we need many kinds
>of optimization to be done which are now prohibited, and some of
>which are too difficult to be done "automatically", such as
>allowing alternative macros, possibly nested, to accomplish those
>actions which are desired, and optimizing the choice at the
>lowest level.

Compiler optimizations are meant to preserve the semantic meaning of
the program while transforming the syntactic form into an equivalent,
but faster form. When the programmer deliberately uses a form from
which the compiler cannot derive semantic meaning, the correct
optimizations that can be performed are limited.

The compiler MUST assume that the type you declared is indicative of
the way you intend to use the value. If, for example, you declare an
array of union types, the compiler has no way of knowing what type of
data will actually be present at run time and cannot be as aggressive
in optimizing access to the array. Then you the programmer get
disgusted with the performance, open code your own pointer arithmetic,
get a large performance boost and gripe about the quality of the
compiler. Huh? You were deliberately vague in your language and got
then angry when the compiler didn't understand !

Aliased values (the subject of this thread) cannot be cached in
registers across function calls because the *real* value in memory
might be changed without the cached value being changed. But aliasing
is a natural side effect of procedural code unless there is a
provision for global register optimization. Note that most compilers
do not really perform "global" register optimization - what they refer
to as global is really "global within the compilation module". True
inter-module optimization is rare.

You really have to write pure functional (mathematical) code to
guarantee no aliasing is present. Purely functional code (with no
side effects) can be optimized practically out of existence. OTOH,
code with no side effects generally isn't very useful ... but that's a
different issue.

>There are ways to speed up code which seem to be anathema to those
>who design languages and produce compilers. Should we not let the
>user inform the system what is going on, which seems not to have
>been considered by those providing tools.

Most compiler writers do understand how to accelerate code. Most
programmers do NOT understand that not all code can be accelerated.
The way to get the most out of your compiler is to write simple, clear
code and not use clever tricks.

As John said, the language specification is a contract. When you
breach the contract (deliberately or otherwise) by using language
outside the translator's domain, you can realistically only expect
"best effort" in the translation. People certainly don't expect more
from human translators. Why do they expect more from a robot?.

George Neuner
Dynamic Resolutions, Inc.

Reid Tatge

unread,

May 20, 1999, 3:00:00 AM5/20/99

to

I agree that C has some "difficult" aliasing behavior.

However, in this kind of context, "p" generally can't alias "i" if "n" is
greater than 1. (!!) Here's the deal: the compiler is allowed to assume
that the program doesn't write past the end of any declared object. So if
the loop iterates more than once, writing to p[i] can't modify any scalar
int. Generally, if it iterates "x" times, it can't modify any declared
object of size ((x - 1) * sizeof(p[i])).

So, in the below example, where we don't know the trip count of the
loop, p[i] can only alias "i" in the first iteration. So the obvious
thing to do is peel the first iteration, and then promote "i" to a
register for the remaining iterations:

i = 0;
p[0] = p[0] + 1;
for ( i++; i < n; i++ )

p[i] = p[i] + 1;

A useful piece of information when you're trying to get loop counters into
registers. More useful when you know "n" (like its a literal constant).

-Reid

Joseph H Allen

unread,

May 29, 1999, 3:00:00 AM5/29/99

to

We've been going over this argument for eons, with only two solutions
provided:

Fortran - generate fast, but possibly buggy code
C, C++ - generate correct, but slow code

The only true solution for the alias problem is going to come from hardware:

>Aliased values (the subject of this thread) cannot be cached in
>registers across function calls because the *real* value in memory
>might be changed without the cached value being changed.

This is where the problem needs to be fixed. The processor must know
the source address of the data in its registers. Each register should
snoop writes just as cache memory does. If your subfunction changes a
value which is aliased in a register, the register should be updated
or invalidated.

Unfortunately, this does not solve the most general alias problem (the
value in a register could be a derived value involving several source
memory addresses), and with pipelined superscaler processors with
register renaming this is not going to be easy to implement (just as
cache coherency issues in multi-processor systems are complex).

But what would be the speedup from this fix? If it's the same 50% to
100% speedup you get with using Fortran instead of C, I'd say that the
extra hardware would be worth it.
--
/* jha...@world.std.com (192.74.137.5) */ /* Joseph H. Allen */
[Maybe or maybe not. This idea of registers that snoop on memory has been
proposed for a long time, but except maybe for the new IA64 nobody's ever
done it. -John]

Daniel Barker

unread,

Jun 2, 1999, 3:00:00 AM6/2/99

to

On 30 Apr 1999, Daniel Barker wrote:

> void f(unsigned int *ip, int lim, size_t *used, unsigned char byte_val)
> /* for each element i of the lim-element array whose first element is
> * pointed to by ip, set the first used[i] bytes to byte_val and the
> * remaining bytes to zero; used[i] must lie in the interval
> * [0, sizeof(unsigned int)]; the resulting value of ip[i] is
> * implementation-defined */
> {

[snip]

> }

(As someone has kindly pointed out, the action of the function did not
match this comment. Apologies.)

[snip]

> [C lets you alias anything to anything, and that does indeed cause
> optimization problems. The C9X draft has a "restrict" keyword that lets
> you assure the compiler that a pointer doesn't have aliasing problems.
> I presume the MIPS hack says that only data of the same type can be
> aliased, sort of the same thing here. -John]

Might a mechanism to specify independence of loop iterations (for whatever
reason) have been more useful than a mechanism to restrict pointers?

For example, consider this code, using permutation vectors:

/* ... */

int a[LIM]; /* array of scalars */
int b[LIM]; /* array of scalars */

int ac[LIM]; /* array of cursors to a */
int bc[LIM]; /* array of cursors to b */

int i; /* loop counter */

/* ... set all elements of "a" and "b"; set each element of "ac" to a
* different value in the interval [0.."LIM"-1]; set each element of
* "bc" to a different value in the interval [0.."LIM"-1]; and then ...
*/

for (i = 0; i < LIM; i++)
a[ac[i]] = a[ac[i]] * b[bc[i]];

Array "a" is only accessed through a single (implied) pointer and array
"b" is only accessed through a single (implied) pointer. But this says
nothing about whether the iterations of this loop are independent. I do
not see how C9x's "restrict" could help much here. It could inform the
compiler that the arrays in question, "a", "ac", "b" and "bc", do not
overlap. However, if the array definitions are visible (e.g.,
function-scope within the function including the loop), the compiler could
already deduce this without "restrict".

Permutations also a present a problem for optimized FORTRAN 77
compilation, though there exist implementation-dependent ways around the
problem in either C or FORTRAN. (I do not know enough Fortran 90 to say
where that lies.)

Another area a more general "independence" directive would be useful is if
there is a function call in a loop:

int f(int i)
/* a "pure" function, relying on nothing external, static or
* I/O-related */
{
return i * i;
}

/* ... */

for (i = 0; i < LIM; i++)
a[i] = f(a[i]);

If f() is compiled separately from the function including the loop, it
might be useful for the programmer to tell the compiler that all calls of
f() are independent, and may be performed in parallel. It would be
possible to specify this in the interface to f(). However, specifying it
at the loop would allow the same mechanism to be used as with the above
code using permutations.

If neither C9x nor ANSI C++ has a standard "independence" directive, this
is a shame, and I hope one is added in the next standard along the line.

Daniel Barker.
[I believe that you could declare restricted pointers within the loop to
give the compiler the required hint. -John]

H.W. Stockman

unread,

Jun 2, 1999, 3:00:00 AM6/2/99

to

Joseph H Allen <jha...@world.std.com> wrote

> We've been going over this argument for eons, with only two solutions
> provided:
>
> Fortran - generate fast, but possibly buggy code
> C, C++ - generate correct, but slow code
>
> The only true solution for the alias problem is going to come from hardware:

Two simpler solutions, that don't work for all cases, but for a surprisingly
large number:

(1) use the restrict keyword, from the upcoming ansi C standard;
(2) upon entry to functions with ambiguous passed pointer references,
derefrence and copy heavily used data to locals.

Erik Corry

unread,

Jun 2, 1999, 3:00:00 AM6/2/99

to

In comp.arch Joseph H Allen <jha...@world.std.com> wrote:
> We've been going over this argument for eons, with only two solutions
> provided:

> Fortran - generate fast, but possibly buggy code
> C, C++ - generate correct, but slow code

> The only true solution for the alias problem is going to come from hardware:

I saw a rather nice solution on Usenet a few years ago.

All functions where optimisations are impossible due to C's liberal
aliasing rules are automatically compiled in two versions, one where
Fortran aliasing is assumed and one where it isn't. When you call the
function the compiler calls the right one according to whether it can
prove that arguments are unaliased.

Of course, while you are compiling the non-aliased version of the
function you can usually call the non-aliased versions of any
sub/functions you need, so very little, if any checking is needed. At
some point you created the arrays/poiners involved, and at that point
your compiler can probably tell that they are not aliased. The linker
can fix things up if a non-alias version of the function doesn't
exist.

Has this been tried and failed?

Another solution is the support in the new C9x standard for specifying
that parameters are unaliased. This also means that C can be made to
run as fast as Fortran, though not quite as automatically. On the
other hand, making programmers think about these issues isn't
altogether a bad thing, given things like the aliasing bug in SPECfp95
mentioned on comp.arch today.

--
Erik Corry er...@arbat.com

Greg Lindahl

unread,

Jun 2, 1999, 3:00:00 AM6/2/99

to

jha...@world.std.com (Joseph H Allen) writes:

> But what would be the speedup from this fix? If it's the same 50% to
> 100% speedup you get with using Fortran instead of C, I'd say that the
> extra hardware would be worth it.

You want to have everyone spend money on hardware that will never be
used? Right. I'd much rather have a software option to detect aliasing
mistakes. Aliasing mistakes are much less common than bounds mistakes;
how many hardware architectures have hardware bounds checking?

-- g

D. J. Bernstein

unread,

Jun 2, 1999, 3:00:00 AM6/2/99

to

George Neuner <gne...@dyn.com> wrote:
> The way to get the most out of your compiler is to write simple, clear
> code and not use clever tricks.

Can you say ``five times slower''?

That's gcc's Pentium performance on a simple, clear, straightforward
inner loop---

long double t0,t1,t2,t3,t4,t7;
int *buf;
...
for (i = -96;i < 0;++i) {
t7 = buf[i];

t0 += doublearray[0][i + 96] * t7;
t1 += doublearray[1][i + 96] * t7;
t2 += doublearray[2][i + 96] * t7;
t3 += doublearray[3][i + 96] * t7;
t4 += doublearray[4][i + 96] * t7;
}

---compared to what I get _for exactly the same computation_ by writing
more complicated code.

See http://pobox.com/~djb/hash127/install.html for the complete program.
Try compiling and running ./speed; then try again with x86-idea in
conf-opt and with your favorite Pentium compiler in conf-cc.

> As John said, the language specification is a contract.

Well, yes, we all expect the compiler to behave as documented. But the
documented behavior is inadequate. Saying ``yes, but it's documented''
is missing the point.

---Dan

Peter Mayne

unread,

Jun 3, 1999, 3:00:00 AM6/3/99

to

Greg Lindahl <lin...@pbm.com> wrote

> how many hardware architectures have hardware bounds checking?

VAX for one.

From the architecture handbok:

The FORTRAN statements:
INTEGER*4 A(L1:U1,L2:U2),I,J
A(I,J) = 1
are equivalent to:
INDEX J,#L2,#U2,#M1,#0,R0 ;M1=U1-L1+1
INDEX I,#L1,#U1,#1,R0,R0
MOVL #1,A-a[R0] ;a={{L2*M1}+L1}*4

If the subscript operand is less than the low operand or greater than the high
operand, a subscript range trap is taken.

PJDM
----
Peter Mayne, Compaq Computer Australia, Canberra, ACT
[Not to belabor the obvious, but the x86 has a bounds checking instruction
also, although I've never seen it used, -John]

Greg Lindahl

unread,

Jun 6, 1999, 3:00:00 AM6/6/99

to

"Peter Mayne" <Peter...@compaq.com> writes:
> Greg Lindahl <lin...@pbm.com> wrote
> > how many hardware architectures have hardware bounds checking?
>
> VAX for one.

And the MC68010 and x86. But, to belabor the obvious, didn't the 801
people write a paper about how you can do software bounds checking for
a rather small cost if you do some optimization of the bounds checks?

As a result, what *modern* hardware architecture has hardware bounds
checking? Zip zilch none.

Software alias analysis of subroutine arguments is even cheaper than
bounds checking. C compilers already have to figure out which
statements contain potential alias problems. In some cases, it's
obvious that it's an argument alias problem.

-- g

John Hascall

unread,

Jun 12, 1999, 3:00:00 AM6/12/99

to

Erik Corry <er...@arbat.com> wrote:
}> The only true solution for the alias problem is going to come from hardware:
}I saw a rather nice solution on Usenet a few years ago.
}All functions where optimisations are impossible due to C's liberal
}aliasing rules are automatically compiled in two versions, one where
}Fortran aliasing is assumed and one where it isn't. When you call the
}function the compiler calls the right one according to whether it can

}prove that arguments are unaliased. ...

}Has this been tried and failed?

I wonder would it be sophisticated enough to tell the
difference between something like this:

void func(int * a, int * b, int n) {
int i;
for (i = 0; i < n; ++i) a[i] += b[i];
}

called with:

a = malloc(100 * sizeof(int));
b = malloc(100 * sizeof(int));
...
func(a, b, 100);

and with:

b = malloc(150 * sizeof(int));
a = b + 50;
func(a, b, 100);

John
--
John Hascall, Software Engr, Acropolis Project Manager, ISU Computation Center
http://www.cc.iastate.edu/staff/systems/john/index.html <=- the usual crud