
Code generator detail


Bartc

Apr 24, 2008, 9:00:44 PM
I think there might be a problem with code for indirect calls:

int *pcptr; /* global */ /* Details from much larger code */
stopped=0;

do {
((void(*)(void))*pcptr)();
} while (!stopped);

pcptr usually (99.9%) points to a location containing the address of one of
two functions, both identical to this:

void pushm(void) {
pcptr += 2;
}

Lccwin generates something like this for the indirect call (in NASM syntax):

mov eax,[pcptr]
call [eax]

But gcc generates longer code something like:

mov eax,[pcptr]
mov eax,[eax]
call eax

The problem is, the lccwin code takes about 3 times longer! For 50m
iterations on my slow machine, about 2400ms for lccwin and about 00ms for
gcc.

I've tried putting in the longer code as inline asm() instructions, and on a
real test where lccwin32 had been 60% slower than gcc, with this new call,
it was about the same speed as gcc!
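
For readers who want to experiment with the loop described above, here is a minimal, portable sketch of the same dispatch pattern. It is not the original program: the `cell` union, the `halt` handler, the `sum` accumulator and the operand values are all assumptions added for illustration, and a union replaces the post's `int *pcptr` plus cast so that the sketch does not depend on function addresses fitting in an `int`.

```c
#include <assert.h>

/* Hypothetical layout: each slot holds either a handler address or an operand. */
typedef void (*handler)(void);
typedef union cell {
    handler fn;
    int operand;
} cell;

static const cell *pc;   /* plays the role of pcptr in the post */
static int stopped;
static long sum;         /* illustration only, gives the handlers visible work */

/* Like the post's pushm: consume one handler slot plus one operand slot. */
static void pushm(void) { sum += pc[1].operand; pc += 2; }

/* Assumed terminating handler; the post does not show how stopped gets set. */
static void halt(void) { stopped = 1; }

long run_dispatch_demo(void)
{
    static const cell program[] = {
        { .fn = pushm }, { .operand = 10 },
        { .fn = pushm }, { .operand = 20 },
        { .fn = pushm }, { .operand = 30 },
        { .fn = halt }
    };

    pc = program;
    stopped = 0;
    sum = 0;
    do {
        pc->fn();                 /* the indirect call under discussion */
    } while (!stopped);

    assert(pc == program + 6);    /* dispatch should end on the halt slot */
    return sum;
}
```

Calling `run_dispatch_demo()` from a small `main` and compiling to assembly with each compiler should show which call sequence is emitted for the `pc->fn()` line.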

--
Bart

jacob navia

Apr 25, 2008, 1:58:08 AM

The code is not equivalent

In lccwin I read a function pointer then call that value

In the gcc code shown, you load a pointer value, then dereference that
and use THAT value as the call value


Please send me a compilable snippet and I will check whether there is a
problem with it

thanks


--
jacob navia
jacob at jacob point remcomp point fr
logiciels/informatique
http://www.cs.virginia.edu/~lcc-win32

Bartc

Apr 25, 2008, 5:48:17 AM

"jacob navia" <ja...@nospam.com> wrote in message
news:furrtk$p40$1...@aioe.org...
> Bartc wrote:

>> Lccwin generates something like this for the indirect call (in NASM
>> syntax):
>>
>> mov eax,[pcptr]
>> call [eax]
>>
>> But gcc generates longer code something like:
>>
>> mov eax,[pcptr]
>> mov eax,[eax]
>> call eax
>>
>> The problem is, the lccwin code takes about 3 times longer! For 50m

> The code is not equivalent


>
> In lccwin I read a function pointer then call that value
>
> In the gcc code shown, you load a pointer value, then dereference that
> and use THAT value as the call value

pcptr is effectively a pointer to a pointer to a function. Look at the code
more carefully: lccwin ends with CALL [EAX]; gcc ends with CALL EAX;
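
The equivalence claimed here can be written out at the C level. The sketch below is an illustration only, with hypothetical names (`target`, `call_through_memory`, `call_through_register`, `count_calls_both_forms`): the first helper calls through the pointed-to slot in one expression, the second loads the function address into a local first. A compiler is free to emit either instruction sequence for either helper, so the comments describe the intent, not guaranteed output.

```c
#include <assert.h>

typedef void (*fnptr)(void);

static int calls;
static void target(void) { calls++; }

/* One expression: corresponds in spirit to "call [eax]". */
static void call_through_memory(fnptr *pp)
{
    (*pp)();
}

/* Load first, then call: corresponds in spirit to "mov eax,[eax]" / "call eax". */
static void call_through_register(fnptr *pp)
{
    fnptr f = *pp;
    f();
}

int count_calls_both_forms(void)
{
    fnptr slot = target;   /* the location holding a function address */
    fnptr *pp = &slot;     /* pointer to that location, like pcptr */

    assert(*pp == target);
    calls = 0;
    call_through_memory(pp);
    call_through_register(pp);
    return calls;          /* both forms reach the same function */
}
```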

So they both do the same; anyway in the following, commenting out the C code
and inserting the asm made this code fragment much faster when compiled with
lccwin:

do {
//((void(*)(void))*pcptr)();
_asm ("mov _pcptr,%ebx");
_asm ("mov (%ebx),%ebx");
_asm ("call %ebx");
} while (!stopped);

Your code generator produces, with -O,

; 74 ((void(*)(void))*pcptr)();
.line 74
movl _pcptr,%ebx
call *(,%ebx)

On an actual test (not just calling empty functions), this change reduced
the lccwin runtime from 3000ms to 1900ms (gcc was 1700ms). On another test,
reduced lccwin runtime from 10,000ms to 6700ms (gcc was 7100ms).

BUT: I haven't been able to reproduce these differences in a smaller test
program. Just unrolling the loop a little removed any advantage of call reg
over call [reg]. So leave this alone for now; I will just use the asm() as
needed. Although there is clearly something odd going on in the CPU.

--
Bart


jacob navia

Apr 25, 2008, 6:31:47 AM
Bartc wrote:
OK

1: I changed the code generator to emit the code as you want
2: I wrote this program:

typedef void (*fnptr)(void);
fnptr pfnptr;
fnptr *ppfnptr;
void n(void)
{
}

int main(void)
{
int i;

pfnptr=n;
ppfnptr = &pfnptr;
for (i=0; i<100000000; i++)
(*ppfnptr)();
}

Then I compiled using the new code generator. Elapsed time 1.558 seconds
Then I compiled using the old code generator. Elapsed time 1.502 seconds

The difference is not significant
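
The post does not say how the elapsed times were measured. For anyone repeating the comparison, one possible harness is sketched below, assuming `clock()` from `<time.h>` is good enough for second-scale runs. The names `bump` and `time_indirect_calls` are hypothetical, and the `volatile` qualifiers are an added assumption intended to keep an optimizer from removing the calls or caching the pointer.

```c
#include <assert.h>
#include <time.h>

typedef void (*fnptr)(void);

static volatile long counter;            /* volatile so the calls have a visible effect */
static void bump(void) { counter++; }

/* Runs the indirect call `iterations` times, stores the number of calls
   actually made in *calls_made, and returns the time reported by clock()
   in seconds. */
double time_indirect_calls(long iterations, long *calls_made)
{
    fnptr pfn = bump;
    fnptr *volatile ppfn = &pfn;         /* reloaded on every iteration, like a global */
    clock_t start, end;
    long i;

    assert(iterations >= 0);
    counter = 0;
    start = clock();
    for (i = 0; i < iterations; i++)
        (*ppfn)();
    end = clock();

    *calls_made = counter;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_indirect_calls(100000000, &made)` matches the iteration count used in the program above; repeating the run several times and comparing the spread helps judge whether a gap such as 1.558 versus 1.502 seconds is within noise.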

Code generated by the old code generator:
[0000027] 8b1d00000000 mov 0x0,%ebx (_ppfnptr)
[0000033] ff141d00000000 call *0x0(,%ebx,1)

Code generated by the new code generator:
[0000027] 8b1d00000000 mov 0x0,%ebx (_ppfnptr)
[0000033] 8b1b mov (%ebx),%ebx
[0000035] ffd3 call *%ebx

I would like to improve the code generated, but I just do not understand
why you see such a difference.

Which CPU are you using?
Which OS?

Bartc

Apr 25, 2008, 7:32:17 AM

"jacob navia" <ja...@nospam.com> wrote in message
news:fusbuo$nbq$1...@aioe.org...

> Bartc wrote:
> OK
>
> 1: I changed the code generator to emit the code as you want
> 2: I wrote this program:

Like I said I couldn't reproduce the difference outside the larger program.

> I would like to improve the code generated, but I just do not understand
> why you see such a difference.
>
> Which CPU are you using?
> Which OS?

The OS is WinXP. The CPU with the big timing difference is Pentium M 1.1GHz
(a laptop).

But when I tried it on a Pentium 4 2.93GHz machine, the differences were
minimal; although the new CALL EBX form seemed to be 5% faster than the old
CALL (EBX) form. And on this machine, on my second test, lccwin was anyway
faster than gcc!

So I wouldn't worry about it too much; I will discover other things I'm sure
pretty soon. That code on the Pentium M must have been a strange combination
of different factors.

Maybe best to keep the new code though, if only because that's what gcc
seems to use.

--
Thanks,

Bart
