I remember reading on one MSDN blog that when NT code was changed from
cdecl to stdcall there was a 10% performance increase. Since I read
this I've been using stdcall in all my Win32 apps, but I'm honestly not
quite sure why it's "better" than cdecl. Can anyone help me out?
Thanks,
Gary
http://www.unixwiz.net/techtips/win32-callconv-asm.html
--
Regards,
Rodrigo Corral González [MVP]
FAQ for microsoft.public.es.vc++
http://rcorral.mvps.org
OTOH if you have situations like
foo(a,b,c);
bar(a,b,c);
with __stdcall the caller cannot reuse the arguments already on the stack
(the callee has popped them) and has to push them twice.
Whether you gain or lose depends on what you do in performance-critical
code.
--
Eugene
However, no C compiler has ever generated code to take advantage of that
"feature", and I don't believe I have ever seen it even in assembly code.
So, instead of "on the other hand", I think I'd say "on the tip of the
other little fingernail".
--
- Tim Roberts, ti...@probo.com
Providenza & Boekelheide, Inc
That's an interesting assertion. Have you made a survey of function call
optimizations on all C and C++ x86 compilers with all the optimization
options?
> and I don't believe I have ever seen it even in
> assembly code.
I have.
> So, instead of "on the other hand", I think I'd say "on the tip of the
> other little fingernail".
Could be. Still, the original assertion that __stdcall is inherently faster
is not necessarily true. Here is a better example. The following
#include <stdio.h>

void foo(int a, int b, int c)
{
    printf("%d %d %d\n", a, b, c);
}

void bar(int a, int b, int c)
{
    printf("%d %d %d\n", a, b, c);
}

int main()
{
    foo(1,2,3);
    bar(1,2,3);
}
compiled with flags /Ox /Og /Ob1 /Oi /Ot /Oy /GL /G7 /GA /FD /EHsc /ML /GS
/Zc:wchar_t /Zc:forScope /GR /FAcs /W3 /Wp64 /Zi /TP and linked with /LTCG
produces the following
PUBLIC _main
; Function compile flags: /Ogty
_TEXT SEGMENT
_main PROC NEAR
; 17 : foo(1,2,3);
00040 b8 03 00 00 00 mov eax, 3
00045 b9 02 00 00 00 mov ecx, 2
0004a ba 01 00 00 00 mov edx, 1
0004f e8 00 00 00 00 call ?foo@@YAXHHH@Z ; foo
; 18 : bar(1,2,3);
00054 b8 03 00 00 00 mov eax, 3
00059 b9 02 00 00 00 mov ecx, 2
0005e ba 01 00 00 00 mov edx, 1
00063 e8 00 00 00 00 call ?bar@@YAXHHH@Z ; bar
; 19 :
; 20 : }
00068 33 c0 xor eax, eax
0006a c3 ret 0
_main ENDP
Note the total *absence* of stack cleanup.
--
Eugene
Looks like __fastcall to me, but it's late and I'm too tired to go
look up all the compiler options!
--
Sev
Oops. I pasted the wrong .cod file, from another experiment. ;-) Here is the
intended one:
PUBLIC _main
; Function compile flags: /Ogsy
_TEXT SEGMENT
_main PROC NEAR
; 17 : foo(1,2,3);
push 3
push 2
push 1
call ?foo@@YAXHHH@Z ; foo
; 18 : bar(1,2,3);
push 3
push 2
push 1
call ?bar@@YAXHHH@Z ; bar
add esp, 24 ; 00000018H
; 19 :
; 20 : }
xor eax, eax
ret 0
_main ENDP
_TEXT ENDS
END
One cleanup per two calls. The other experiment was with the /Ot flag, which
resulted in the compiler using a __fastcall-like register convention even
though the functions were nominally __cdecl.
--
Eugene
Good advice, but this is not what this thread is about, is it?
--
Eugene
I don't know -- why did you write this then, if not
to make a claim of some sort of "optimization" by way
of how the stack is set up (I'm giving it more credence
than it deserves). There's no optimization in this
sort of thing, and no relevance to "performance critical
code" at all. This is not the stuff that speed comes
from.
EG- [Thu, 31 Mar 2005 03:07:31 -0800]:
>OTOH if you have situations like
>foo(a,b,c);
>bar(a,b,c);
>the caller cannot reuse the stack and has to push twice.
>Whether you gain or lose depends on what you do in performance critical
>code.
--
http://www.codeproject.com/cpp/calling_conventions_demystified.asp
If something takes 2ms and something else takes 1ms, then the 1ms version is
faster. Whether this will result in any meaningful "optimization" of the entire
application or some part of it is another matter. Whether developer time
would be better spent optimizing something else is another matter too. My
understanding is that this thread is about the inherent speed advantage of
__stdcall over __cdecl, not whether this is worth paying attention to in real
code.
>> Whether you gain or lose depends on what you do in performance
>> critical code.
> There's no optimization in this
> sort of thing; and no relevance to "performance critical
> code" at all.
Why not? If you do call functions in performance-critical code, you may very
well win that same 1ms. Whether there are better ways to speed the code up
is, again, another matter.
> This is not the stuff that speed comes
> from.
Very true.
--
Eugene
void foo(int x, int y, int z)
{
    x++;
}
This changes the value of x (which is stored in a cell on the stack) in
function foo() but it does not change the value of a in the calling
program. If you reuse the stack to call bar(), bar will see the modified
value x instead of a.
Norm
--
To reply, change domain to an adult feline.
And what prevents the compiler from looking inside foo and checking whether
it changes its parameters? This has nothing to do with the standard, only
with compiler optimizations.
--
Eugene
Sure. Just as it cannot in general perform many other optimizations.
--
Eugene