why is this faster?

David Liu

Feb 11, 2000, 3:00:00 AM

a and b are 2 pointers to 2 arrays, both with 300 elements.
If I delete the 3 lines:
"float *ai, *bj;"
"ai=a;"
"bj=b;"
and modify all ai to a, bj to b,
then the code runs slower. Why??
The code is listed below:

float INNER_PRODUCT(float *a, float *b)
{
    float sum=0;
    float *ai, *bj;
    int i;
    ai=a;
    bj=b;
    for(i=0 ; i<300 ; i++)
    {
        sum += (*ai) * (*bj);
        ai++;
        bj++;
    }
    return(sum);
}

Aidan Kehoe

Feb 11, 2000, 3:00:00 AM

Hmm .... that *is* strange. a and b are copied locally to
the function on calling, anyway, so unless it's a matter of
data alignment on memory addresses ...... How *much* faster
is it ?

James Hu

Feb 11, 2000, 3:00:00 AM

On Fri, 11 Feb 2000 17:02:13 +0800, David Liu <dawenliu@_hotmail.com> wrote:

>a and b are 2 pointers to 2 arrays, both with 300 elements.
>If I delete the 3 lines:
>"float *ai, *bj;"
>"ai=a;"
>"bj=b;"
>and modify all ai to a, bj to b,
>then the code runs slower. Why??

[SNIP]

This depends on the platform you are running on. One possible
explanation is that your platform keeps function arguments in a memory
location, your compiler does not apply any optimizations that would
move function arguments into registers, and your compiler knows how to
use registers for local variables.
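
That hypothesis can be imitated in source code, though only as a rough sketch. Declaring the pointer objects volatile obliges the compiler to read and write them in memory on every access, which is roughly what a compiler that left the parameters in memory would do. The function names and the n parameter below are illustrative assumptions, not part of the original code, and the volatile qualifier is a stand-in for the suspected compiler behaviour rather than a reproduction of it.

```c
#include <stddef.h>

/* Imitates pointers that live in memory: because p and q are volatile
   objects, each read and each increment must go through memory. */
float dot_memory_pointers(const float *a, const float *b, size_t n)
{
    const float *volatile p = a;
    const float *volatile q = b;
    float sum = 0.0f;
    size_t i;

    for (i = 0; i < n; i++) {
        const float x = *p;   /* loads p from memory, then *p */
        const float y = *q;   /* loads q from memory, then *q */
        sum += x * y;
        p++;                  /* read-modify-write of p in memory */
        q++;                  /* read-modify-write of q in memory */
    }
    return sum;
}

/* Same loop with ordinary locals, which the compiler is free to keep
   in registers for the whole loop. */
float dot_register_pointers(const float *a, const float *b, size_t n)
{
    const float *p = a;
    const float *q = b;
    float sum = 0.0f;
    size_t i;

    for (i = 0; i < n; i++) {
        sum += *p * *q;
        p++;
        q++;
    }
    return sum;
}
```

Both functions return the same sum; any speed difference between them depends entirely on the compiler and the optimisation level.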

-- James

Richard Bos

Feb 11, 2000, 3:00:00 AM

"David Liu" <dawenliu@_hotmail.com> wrote:

> a and b are 2 pointers to 2 arrays, both with 300 elements.
> If I delete the 3 lines:
> "float *ai, *bj;"
> "ai=a;"
> "bj=b;"
> and modify all ai to a, bj to b,
> then the code runs slower. Why??

> The code is listed below:
>
> float INNER_PRODUCT(float *a, float *b)
> {
> float sum=0;
> float *ai, *bj;
> int i;
> ai=a;
> bj=b;
> for(i=0 ; i<300 ; i++)
> {
> sum += (*ai) * (*bj);
> ai++;
> bj++;
> }
> return(sum);
> }

Maybe because your registers are big enough to hold a pointer, and your
compiler knows enough to use this for "local variables" (block-scope
whatsit) but not for function parameters?
In any case, we can speculate, but unless we can look inside your
compiler and its output we can't possibly say; it is unportable anyway;
and you'd better not rely on it remaining faster when you change your
optimisation level. IOW: you got lucky, but off-topically so.

Richard

Martin Ambuhl

Feb 11, 2000, 3:00:00 AM

Aidan Kehoe wrote:
>
> Hmm .... that *is* strange. a and b are copied locally to
> the function on calling, anyway, so unless it's a matter of
> data alignment on memory addresses ...... How *much* faster
> is it ?

Let's test this with arrays 200 times as large as the ones
you are using.

#include <stdio.h>
#include <time.h>

#define NELEM 60000LU

/* Walks local copies of the pointers, as in the original post. */
float func1(float *a, float *b)
{
    float sum = 0;
    float *ai = a, *bj = b;
    size_t i;
    for (i = 0; i < NELEM; i++, ai++, bj++)
        sum += *ai * *bj;
    return sum;
}

/* Walks the parameters themselves. */
float func2(float *a, float *b)
{
    float sum = 0;
    size_t i;
    for (i = 0; i < NELEM; i++, a++, b++)
        sum += *a * *b;
    return sum;
}

int main(void)
{
    /* static: zero-initialised, so the reads are well defined,
       and the two large arrays stay off the stack */
    static float a[NELEM], b[NELEM];
    float c;
    clock_t start, end;
    double diff;

    start = clock();
    c = func1(a, b);
    end = clock();
    diff = (double)end - start;
    printf("func1 took %g ticks (%g secs) for %lu array size\n"
           " ... %g ticks (%g secs)/array element\n",
           diff, diff / CLOCKS_PER_SEC, NELEM,
           diff / NELEM, (diff / CLOCKS_PER_SEC) / NELEM);

    start = clock();
    c = func2(a, b);
    end = clock();
    diff = (double)end - start;
    printf("func2 took %g ticks (%g secs) for %lu array size\n"
           " ... %g ticks (%g secs)/array element\n",
           diff, diff / CLOCKS_PER_SEC, NELEM,
           diff / NELEM, (diff / CLOCKS_PER_SEC) / NELEM);
    (void)c;    /* the sum itself is not used */
    fflush(stdout);
    return 0;
}

func1 took 0 ticks (0 secs) for 60000 array size
... 0 ticks (0 secs)/array element
func2 took 0 ticks (0 secs) for 60000 array size
... 0 ticks (0 secs)/array element

My, that certainly is a significant difference.
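
The zero readings presumably mean that a single pass finishes in less than one clock() tick, so a measurement needs many repetitions. What follows is a minimal sketch of such a harness, not a rigorous benchmark. The names dot_local, dot_param and seconds_per_call, and the reps parameter, are illustrative assumptions that do not appear elsewhere in this thread.

```c
#include <stddef.h>
#include <time.h>

typedef float (*dot_fn)(const float *, const float *, size_t);

/* Variant that walks local copies of the pointers. */
float dot_local(const float *a, const float *b, size_t n)
{
    const float *ai = a, *bj = b;
    float sum = 0.0f;
    size_t i;
    for (i = 0; i < n; i++, ai++, bj++)
        sum += *ai * *bj;
    return sum;
}

/* Variant that walks the parameters themselves. */
float dot_param(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    size_t i;
    for (i = 0; i < n; i++, a++, b++)
        sum += *a * *b;
    return sum;
}

/* Calls fn reps times and returns the average CPU seconds per call.
   The last result is stored in *out; the volatile sink keeps the
   compiler from discarding the calls as dead code. */
double seconds_per_call(dot_fn fn, const float *a, const float *b,
                        size_t n, unsigned long reps, float *out)
{
    volatile float sink = 0.0f;
    unsigned long r;
    clock_t start, end;

    if (reps == 0) {
        *out = 0.0f;
        return 0.0;
    }
    start = clock();
    for (r = 0; r < reps; r++)
        sink = fn(a, b, n);
    end = clock();
    *out = sink;
    return ((double)(end - start) / CLOCKS_PER_SEC) / (double)reps;
}
```

Calling seconds_per_call(dot_local, a, b, 300, reps, &result) and then the same with dot_param, with reps large enough that each run lasts a second or more, gives figures that can actually be compared. The outcome still depends on the compiler and its optimisation settings.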

--
Martin Ambuhl mam...@earthlink.net

What one knows is, in youth, of little moment; they know enough who
know how to learn. - Henry Adams

A thick skin is a gift from God. - Konrad Adenauer

Lawrence Kirby

Feb 11, 2000, 3:00:00 AM

In article <880jrb$7qt$1...@gemini.ntu.edu.tw>
dawenliu@_hotmail.com "David Liu" writes:

>a and b are 2 pointers to 2 arrays, both with 300 elements.
>If I delete the 3 lines:
>"float *ai, *bj;"
>"ai=a;"
>"bj=b;"
>and modify all ai to a, bj to b,
>then the code runs slower. Why??
>The code is listed below:
>
>float INNER_PRODUCT(float *a, float *b)
>{
> float sum=0;
> float *ai, *bj;
> int i;
> ai=a;
> bj=b;
> for(i=0 ; i<300 ; i++)
> {
> sum += (*ai) * (*bj);
> ai++;
> bj++;
> }
> return(sum);
>}

Compiler optimisers are complex beasts, and different code can tickle
them in different ways. For example, gcc produces the same code for both
versions, while MSVC6 does not. This is the MSVC6 output for the code
that uses ai and bj:

        mov     eax, DWORD PTR _a$[esp-4]
        mov     ecx, DWORD PTR _b$[esp-4]
        fld     DWORD PTR __real@4@00000000000000000000
        sub     ecx, eax
        mov     edx, 300                ; 0000012cH
$L97:
        fld     DWORD PTR [ecx+eax]
        fmul    DWORD PTR [eax]
        add     eax, 4
        dec     edx
        faddp   ST(1), ST(0)
        jne     SHORT $L97
        ret     0

This is for the code that doesn't:

        fld     DWORD PTR __real@4@00000000000000000000
        mov     ecx, DWORD PTR _b$[esp-4]
        mov     eax, DWORD PTR _a$[esp-4]
        mov     edx, 300                ; 0000012cH
$L108:
        fld     DWORD PTR [eax]
        fmul    DWORD PTR [ecx]
        add     eax, 4
        add     ecx, 4
        dec     edx
        faddp   ST(1), ST(0)
        jne     SHORT $L108
        ret     0

The difference is that the first case has a loop optimisation that
eliminates the add ecx, 4 instruction within the loop. Why the compiler
wasn't able to do this for the second case, only the compiler writer
can tell you. However, this does demonstrate that having more variables
in the source code doesn't always hurt efficiency and can even help it.
Finally, consider:

float INNER_PRODUCT3(float *a, float *b)
{
    float sum=0;
    int i;
    for(i=0 ; i<300 ; i++)
    {
        sum += a[i] * b[i];
    }
    return(sum);
}

This compiles to essentially the same code as the first (i.e. more
efficient) version above, and with gcc it compiles to more efficient code
than either pointer version. So, with modern compilers, don't go filling
the code with extra pointers unless you have good reason; they can just
slow things down. This is a good example of where the clearest code also
turns out to be the most efficient.
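
For the curious, the loop optimisation in the first listing can be written out by hand in C, as a rough sketch. The compiler computes the distance between the two pointers once (sub ecx, eax) and then advances only one pointer inside the loop. The function name dot_one_pointer and the n parameter are illustrative assumptions. Note also that standard C only defines b - a when both pointers point into the same array object, so this sketch assumes they do; the compiler is under no such restriction at the machine level.

```c
#include <stddef.h>

/* Assumption: a and b point into the SAME array object, so that the
   pointer difference b - a is well defined in standard C. */
float dot_one_pointer(const float *a, const float *b, size_t n)
{
    const ptrdiff_t gap = b - a;   /* computed once, like "sub ecx, eax" */
    const float *p = a;
    float sum = 0.0f;

    for (; n != 0; n--, p++)       /* "dec edx" and "add eax, 4" */
        sum += p[gap] * p[0];      /* [ecx+eax] times [eax] */
    return sum;
}
```

With a single buffer of 600 floats, dot_one_pointer(buf, buf + 300, 300) computes the same inner product as the versions above. There is no reason to write code like this yourself; it is only meant to show what the optimiser did.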
--
-----------------------------------------
Lawrence Kirby | fr...@genesis.demon.co.uk
Wilts, England | 7073...@compuserve.com
-----------------------------------------