__cdecl vs __stdcall

Gary Nastrasio

Mar 29, 2005, 3:53:22 PM
I understand the basic difference between these two calling conventions,
such as stack cleaning, variable arguments, and naming conventions, but
how exactly is stdcall faster than cdecl? I've tried to go through my
code in assembly to investigate this, but unfortunately my x86 assembly
isn't very strong.

I remember reading on one MSDN blog that when NT code was changed from
cdecl to stdcall there was a 10% performance increase. Since I read
this I've been using stdcall in all my Win32 apps, but I'm honestly not
quite sure why it's "better" than cdecl. Can anyone help me out?

Thanks,

Gary

Rodrigo Corral [MVP]

Mar 29, 2005, 4:08:49 PM

With __stdcall, the size of the parameter block is fixed, so the burden of
cleaning these parameters off the stack can be shifted to the called function,
instead of being done by the calling function as in __cdecl. The main effect
is that the code is a tiny bit smaller, because the parameter-cleanup code is
found once -- in the called function itself -- rather than at every place the
function is called. This may be only a few bytes per call, but for
commonly-used functions it can add up. This presumably means that the code
may be a tiny bit faster as well.
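
A minimal sketch of the two declarations side by side. The macro names CALL_CDECL and CALL_STDCALL and the function names are made up for illustration. The keywords only change code generation on 32-bit x86, so the sketch assumes they should expand to nothing everywhere else. The instruction sequences in the comments are the usual textbook pattern for three int arguments, not output captured from a particular build.

```c
#include <assert.h>

/* Hypothetical portability macros: apply the keywords only for 32-bit x86 MSVC. */
#if defined(_MSC_VER) && defined(_M_IX86)
#define CALL_CDECL   __cdecl
#define CALL_STDCALL __stdcall
#else
#define CALL_CDECL
#define CALL_STDCALL
#endif

/* __cdecl: the caller removes the arguments, typically with
   "add esp, 12" after the call instruction at each call site. */
int CALL_CDECL sum_cdecl(int a, int b, int c)
{
    return a + b + c;
}

/* __stdcall: the callee removes the arguments, typically by
   returning with "ret 12", so call sites carry no cleanup code. */
int CALL_STDCALL sum_stdcall(int a, int b, int c)
{
    return a + b + c;
}

/* Both conventions compute the same result; only the cleanup site differs. */
void check_conventions(void)
{
    assert(sum_cdecl(1, 2, 3) == 6);
    assert(sum_stdcall(1, 2, 3) == 6);
}
```

Calling check_conventions() from main and building with /FAcs (as used later in this thread) should make the difference visible in the generated listing.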

http://www.unixwiz.net/techtips/win32-callconv-asm.html


--
Un saludo
Rodrigo Corral González [MVP]

FAQ de microsoft.public.es.vc++
http://rcorral.mvps.org


Eugene Gershnik

Mar 31, 2005, 6:07:31 AM
Rodrigo Corral [MVP] wrote:
> With __stdcall, the size of the parameter block is fixed, so the
> burden of cleaning these parameters off the stack can be shifted to
> the called function, instead of being done by the calling function
> as in __cdecl. The main effect is that the code is a tiny bit
> smaller, because the parameter-cleanup code is found once -- in the
> called function itself -- rather than at every place the function is
> called. This may be only a few bytes per call, but for commonly-used
> functions it can add up. This presumably means that the code may be
> a tiny bit faster as well.

OTOH if you have situations like

foo(a,b,c);
bar(a,b,c);

with __stdcall the caller cannot reuse the arguments already on the stack and
has to push them twice. Whether you gain or lose depends on what you do in
performance-critical code.

--
Eugene


Tim Roberts

Apr 1, 2005, 1:50:00 AM
"Eugene Gershnik" <gers...@hotmail.com> wrote:
>
>OTOH if you have situations like
>
>foo(a,b,c);
>bar(a,b,c);
>
>the caller cannot reuse the stack and has to push twice.
>Whether you gain or lose depends on what you do in performance critical
>code.

However, no C compiler has ever generated code to take advantage of that
"feature", and I don't believe I have ever seen it even in assembly code.

So, instead of "on the other hand", I think I'd say "on the tip of the
other little fingernail".
--
- Tim Roberts, ti...@probo.com
Providenza & Boekelheide, Inc

Eugene Gershnik

Apr 1, 2005, 3:43:30 AM
Tim Roberts wrote:
> "Eugene Gershnik" <gers...@hotmail.com> wrote:
>>
>> OTOH if you have situations like
>>
>> foo(a,b,c);
>> bar(a,b,c);
>>
>> the caller cannot reuse the stack and has to push twice.
>> Whether you gain or lose depends on what you do in performance
>> critical code.
>
> However, no C compiler has ever generated code to take advantage of
> that "feature",

That's an interesting assertion. Have you made a survey of function call
optimizations on all C and C++ x86 compilers with all the optimization
options?

> and I don't believe I have ever seen it even in
> assembly code.

I have.

> So, instead of "on the other hand", I think I'd say "on the tip of the
> other little fingernail".

Could be. Still the original assertion that __stdcall is inherently faster
is not necessarily true. Here is a better example. The following

#include <stdio.h>

void foo(int a, int b, int c)
{
    printf("%d %d %d\n", a, b, c);
}

void bar(int a, int b, int c)
{
    printf("%d %d %d\n", a, b, c);
}

int main()
{
    foo(1,2,3);
    bar(1,2,3);
}

compiled with flags /Ox /Og /Ob1 /Oi /Ot /Oy /GL /G7 /GA /FD /EHsc /ML /GS
/Zc:wchar_t /Zc:forScope /GR /FAcs /W3 /Wp64 /Zi /TP and linked with /LTCG
produces the following

PUBLIC _main
; Function compile flags: /Ogty
_TEXT SEGMENT
_main PROC NEAR

; 17 : foo(1,2,3);

00040 b8 03 00 00 00 mov eax, 3
00045 b9 02 00 00 00 mov ecx, 2
0004a ba 01 00 00 00 mov edx, 1
0004f e8 00 00 00 00 call ?foo@@YAXHHH@Z ; foo

; 18 : bar(1,2,3);

00054 b8 03 00 00 00 mov eax, 3
00059 b9 02 00 00 00 mov ecx, 2
0005e ba 01 00 00 00 mov edx, 1
00063 e8 00 00 00 00 call ?bar@@YAXHHH@Z ; bar

; 19 :
; 20 : }

00068 33 c0 xor eax, eax
0006a c3 ret 0
_main ENDP

Note the total *absence* of stack cleanup.

--
Eugene


Severian

Apr 1, 2005, 4:21:28 AM

Looks like __fastcall to me, but it's late and I'm too tired to go
look up all the compiler options!

--
Sev

h...@40th.com

Apr 1, 2005, 4:54:04 AM
If that's what you're "optimizing", you're
optimizing the wrong thing. At most you're
talking a few cycles, here and there. If
that's ever the majority of a call, and
a call made billions of times, don't
make a call in the first place.

--
40th Floor - Software @ http://40th.com/
iPlay : the ultimate audio player for PPCs
mp3,mp4,m4a,aac,ogg,flac,wav,play & record
parametric eq, xfeed, reverb: all on a ppc

Eugene Gershnik

Apr 1, 2005, 5:12:57 AM
Severian wrote:
> Looks like __fastcall to me, but it's late and I'm too tired to go
> look up all the compiler options!

Oops. I pasted the wrong .cod file, from another experiment. ;-) Here is the
intended one:

PUBLIC _main
; Function compile flags: /Ogsy


_TEXT SEGMENT
_main PROC NEAR

; 17 : foo(1,2,3);

push 3
push 2
push 1
call ?foo@@YAXHHH@Z ; foo

; 18 : bar(1,2,3);

push 3
push 2
push 1
call ?bar@@YAXHHH@Z ; bar
add esp, 24 ; 00000018H

; 19 :
; 20 : }

xor eax, eax
ret 0
_main ENDP
_TEXT ENDS
END

One cleanup per two calls. The other experiment was with the /Ot flag, which
resulted in the compiler passing the arguments in registers (a __fastcall-like
convention) even though the functions were nominally __cdecl.
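
The 24 in the `add esp, 24` of the listing is the combined size of both calls' arguments. A small sketch of that arithmetic, assuming 4-byte stack slots as on 32-bit x86; the names STACK_SLOT_BYTES, cdecl_cleanup_bytes and check_listing are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: each int argument occupies one 4-byte stack slot (32-bit x86). */
#define STACK_SLOT_BYTES 4u

/* Bytes the caller must pop after `calls` __cdecl calls taking `args`
   arguments each, when the cleanup is deferred and merged into one add. */
size_t cdecl_cleanup_bytes(size_t args, size_t calls)
{
    return args * calls * STACK_SLOT_BYTES;
}

/* Matches the listing: two calls with three int arguments each. */
void check_listing(void)
{
    assert(cdecl_cleanup_bytes(3, 2) == 24);
}
```

With a cleanup after every call, each call site would instead pop cdecl_cleanup_bytes(3, 1), that is 12 bytes.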

--
Eugene


Eugene Gershnik

Apr 1, 2005, 5:14:08 AM
h...@40th.com wrote:
> If that's what you're "optimizing", you're
> optimizing the wrong thing. At most you're
> talking a few cycles, here and there. If
> that's ever the majority of a call, and
> a call made billions of times, don't
> make a call in the first place.

Good advice, but this is not what this thread is about, is it?

--
Eugene


h...@40th.com

Apr 1, 2005, 5:49:56 AM
EG- [Fri, 1 Apr 2005 02:14:08 -0800]:

I don't know -- why did you write this then, if not
to make a claim of some sort of "optimization" by way
of how the stack is set up? (I'm giving it more credence
than it deserves.) There's no optimization in this
sort of thing, and no relevance to "performance critical
code" at all. This is not the stuff that speed comes
from.

EG- [Thu, 31 Mar 2005 03:07:31 -0800]:


>OTOH if you have situations like
>foo(a,b,c);
>bar(a,b,c);
>the caller cannot reuse the stack and has to push twice.
>Whether you gain or lose depends on what you do in performance critical
>code.

--

Nemanja Trifunovic

Apr 1, 2005, 9:11:40 AM

Eugene Gershnik

Apr 1, 2005, 5:11:57 PM
h...@40th.com wrote:
> EG- [Fri, 1 Apr 2005 02:14:08 -0800]:
>> h...@40th.com wrote:
>>> If that's what you're "optimizing", you're
>>> optimizing the wrong thing. At most you're
>>> talking a few cycles, here and there. If
>>> that's ever the majority of a call, and
>>> a call made billions of times, don't
>>> make a call in the first place.
>>
>> Good advice, but this is not what this thread is about, is it?
>
> I don't know -- why did you write this then, if not
> to make a claim of some sort of "optimization" by way
> of how the stack is set up

If something takes 2ms and something else takes 1ms then the 1ms version is
faster. Whether this will result in any meaningful "optimization" of the entire
application or some part of it is another matter. Whether developer time
would be better spent optimizing something else is another matter too. My
understanding is that this thread is about the inherent speed advantage of
__stdcall over __cdecl, not whether this is worth paying attention to in real
code.

>> Whether you gain or lose depends on what you do in performance
>> critical code.

> There's no optimization in this


> sort of thing; and no relevance to "performance critical
> code" at all.

Why not? If you do call functions in performance critical code you may very
well win that same 1ms. Whether there are better ways to speed the code up
is again another matter.

> This is not the stuff that speed comes
> from.

Very true.

--
Eugene

Norman Bullen

Apr 1, 2005, 8:10:37 PM
A standard-compliant compiler can't reuse the stack. Consider:

void foo(int x, int y, int z)
{ x++;
}

This changes the value of x (which is stored in a cell on the stack) in
function foo() but it does not change the value of a in the calling
program. If you reuse the stack to call bar(), bar will see the modified
value x instead of a.
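
The pass-by-value behaviour described here can be checked directly. This is only a sketch: foo and bar are given int return values (an assumption, the versions above return void) so that the effect is observable, and call_both is a hypothetical helper.

```c
#include <assert.h>

/* foo increments its own copy of the first argument. */
int foo(int x, int y, int z)
{
    x++;
    return x + y + z;
}

/* bar only reads its arguments. */
int bar(int x, int y, int z)
{
    return x + y + z;
}

/* Hypothetical helper: calls both functions with the same variables. */
int call_both(void)
{
    int a = 1, b = 2, c = 3;
    int from_foo = foo(a, b, c);  /* foo works on x == 2 after its increment */
    int from_bar = bar(a, b, c);  /* bar must still receive the original a == 1 */
    assert(a == 1);               /* the caller's variable is untouched */
    return from_foo - from_bar;   /* 7 - 6 */
}
```

If the argument slots modified by foo were reused for bar, bar would receive 2 as its first argument and the difference would come out as 0 rather than 1.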

Norm

--
--
To reply, change domain to an adult feline.

Eugene Gershnik

Apr 1, 2005, 10:30:04 PM
Norman Bullen wrote:
>
> A standard-compliant compiler can't reuse the stack. Consider:
>
> void foo(int x, int y, int z)
> { x++;
> }
>
> This changes the value of x (which is stored in a cell on the stack)
> in function foo() but it does not change the value of a in the calling
> program. If you reuse the stack to call bar(), bar will see the
> modified value x instead of a.

And what prevents this compiler from looking inside foo and checking whether
it changes its parameters? This has nothing to do with the standard, only
with compiler optimizations.

--
Eugene


Norman Bullen

Apr 2, 2005, 11:34:49 AM
The compiler may not have the source or object code for foo() available
when it compiles
foo(a,b,c);
bar(a,b,c);
so it cannot (in general) tell what foo() does with its arguments.

Eugene Gershnik

Apr 2, 2005, 11:40:20 AM
Norman Bullen wrote:
> The compiler may not have the source or object code for foo()
> available when it compiles
> foo(a,b,c);
> bar(a,b,c);
> so it cannot (in general) tell what foo() does with its arguments.

Sure. Just as it cannot in general perform many other optimizations.

--
Eugene

