Note that I first posted this to microsoft.public.vc.language and
after that to comp.lang.c++.moderated, where I was told that someone on
comp.compilers might be able to shed some light on this :-)
The Visual Studio compiler will never inline a function that returns an
unwindable object (e.g. std::string, CString, etc.)
This is also true for a simple getter function that only contains one
return statement.
Does anyone know why this is and what other compilers do in such a case?
(I tried to find out for GCC, but didn't find any docs.)
See details below.
cheers,
Martin
-------- Original Message --------
Subject: inlining of functions returning an unwindable object -- rationale?
Date: Thu, 19 Nov 2009 10:01:11 +0100
From: Martin B. <0xCDC...@gmx.at>
Greetings.
The Visual Studio compiler will never inline a function that returns an
unwindable object (e.g. std::string, CString, etc.)
(See documentation of the inline and __forceinline keyword and the doc
for C4714:
http://msdn.microsoft.com/en-us/library/a98sb923.aspx ,
http://msdn.microsoft.com/en-us/library/z8y1yy88.aspx
)
Can anyone provide a rationale for this? It seems quite weird to me.
Consider this example:
#include <string>

class Simple {
public:
    std::string s_;
public:
    Simple()
    : s_("test")
    { }
    std::string get() {
        return s_;
    }
...
void testsimple()
{
    Simple oSimple;
    std::string s1( oSimple.get() );
...
void testsimple2()
{
    Simple oSimple;
    std::string s1( oSimple.s_ );
...
This will always, no matter what, generate a call to get(). (If you
specify __forceinline and activate C4714, you'll get that warning.)
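
For anyone who wants to reproduce the warning, here is a minimal sketch of the getter marked for forced inlining. The FORCE_INLINE macro and the copy_via_getter function are hypothetical names introduced for illustration, not part of the original test case. The pragma assumes that C4714 has to be switched on explicitly, as described above; the GCC branch is only there so the same file compiles elsewhere.

```cpp
#include <string>

// FORCE_INLINE is a hypothetical portability macro, not from the original post.
#if defined(_MSC_VER)
#  pragma warning(1 : 4714)   // report __forceinline functions that were not inlined
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline
#endif

class Simple {
public:
    std::string s_;
    Simple() : s_("test") { }
    // The getter under discussion: a single return statement that
    // returns an object with a non-trivial destructor by value.
    FORCE_INLINE std::string get() const { return s_; }
};

// Hypothetical caller mirroring testsimple(), returning the copy so
// the result can be inspected.
std::string copy_via_getter() {
    Simple oSimple;
    std::string s1(oSimple.get());
    return s1;
}
```

Whether the warning actually fires will depend on the compiler version and the options in effect, so treat this as a starting point for experiments rather than a guaranteed reproduction.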
Find the assembly below, of which I do not claim to understand much, but
it certainly doesn't seem to me as if there's any reason for this.
Especially consider the case where the member is accessed directly. The
calls to the string related functions are exactly the same!
That is, both versions will call string functions in this order:
1) string::string (ctor of Simple)
2) string::string (ctor of s1)
3) string::~string
4) string::~string
So what's the deal with not inlining such a simple getter function?
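
The call order listed above can also be checked without reading assembly. The following is a rough sketch that swaps std::string for a hypothetical Traced type which logs its constructor, copy constructor, and destructor calls; Traced, g_log, trace_getter, and trace_direct are all names made up for this illustration. It assumes the compiler elides the temporary returned by get() (guaranteed since C++17, and commonly done by earlier compilers as well).

```cpp
#include <string>
#include <vector>

// Hypothetical log of special member function calls.
static std::vector<std::string> g_log;

// Hypothetical stand-in for std::string that records what happens to it.
struct Traced {
    explicit Traced(const char*) { g_log.push_back("ctor"); }
    Traced(const Traced&)        { g_log.push_back("copy"); }
    ~Traced()                    { g_log.push_back("dtor"); }
};

class Simple {
public:
    Traced s_;
    Simple() : s_("test") { }
    Traced get() const { return s_; }
};

// Mirrors testsimple(): copy the member through the getter.
std::vector<std::string> trace_getter() {
    g_log.clear();
    {
        Simple oSimple;
        Traced s1(oSimple.get());
    }
    return g_log;
}

// Mirrors testsimple2(): copy the member directly.
std::vector<std::string> trace_direct() {
    g_log.clear();
    {
        Simple oSimple;
        Traced s1(oSimple.s_);
    }
    return g_log;
}
```

If both functions return the same four-entry log (one construction, one copy, two destructions), the getter adds nothing observable at the source level beyond the call itself.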
Find the assembly (VS 2005 / VC8) below.
cheers,
Martin
***************
==>
void testsimple()
{
00401150 push ebp
00401151 mov ebp,esp
00401153 push 0FFFFFFFFh
00401155 push offset __ehhandler$?testsimple@@YAXXZ (402211h)
0040115A mov eax,dword ptr fs:[00000000h]
00401160 push eax
00401161 sub esp,3Ch
00401164 mov eax,dword ptr [___security_cookie (405004h)]
00401169 xor eax,ebp
0040116B mov dword ptr [ebp-10h],eax
0040116E push eax
0040116F lea eax,[ebp-0Ch]
00401172 mov dword ptr fs:[00000000h],eax
Simple oSimple;
00401178 push offset string "test" (403194h)
0040117D lea ecx,[ebp-2Ch]
00401180 call dword ptr
[__imp_std::basic_string<char,std::char_traits<char>,std::allocator<char>
>::basic_string<char,std::char_traits<char>,std::allocator<char> >
(403094h)]
00401186 mov dword ptr [ebp-4],0
std::string s1( oSimple.get() );
0040118D lea eax,[ebp-48h]
00401190 push eax
00401191 lea ecx,[ebp-2Ch]
00401194 call Foo::get (401380h)
}
00401199 lea ecx,[ebp-48h]
0040119C call dword ptr
[__imp_std::basic_string<char,std::char_traits<char>,std::allocator<char>
>::~basic_string<char,std::char_traits<char>,std::allocator<char> >
(40308Ch)]
004011A2 mov dword ptr [ebp-4],0FFFFFFFFh
004011A9 lea ecx,[ebp-2Ch]
004011AC call dword ptr
[__imp_std::basic_string<char,std::char_traits<char>,std::allocator<char>
>::~basic_string<char,std::char_traits<char>,std::allocator<char> >
(40308Ch)]
004011B2 mov ecx,dword ptr [ebp-0Ch]
004011B5 mov dword ptr fs:[0],ecx
004011BC pop ecx
004011BD mov ecx,dword ptr [ebp-10h]
004011C0 xor ecx,ebp
004011C2 call __security_check_cookie (4019D6h)
004011C7 mov esp,ebp
004011C9 pop ebp
004011CA ret
==>
std::string get() {
00401380 push ebp
00401381 mov ebp,esp
00401383 sub esp,8
00401386 mov dword ptr [ebp-8],ecx
00401389 mov dword ptr [ebp-4],0
return s_;
00401390 mov eax,dword ptr [this]
00401393 push eax
00401394 mov ecx,dword ptr [ebp+8]
00401397 call dword ptr
[__imp_std::basic_string<char,std::char_traits<char>,std::allocator<char>
>::basic_string<char,std::char_traits<char>,std::allocator<char> >
(403090h)]
0040139D mov ecx,dword ptr [ebp-4]
004013A0 or ecx,1
004013A3 mov dword ptr [ebp-4],ecx
004013A6 mov eax,dword ptr [ebp+8]
}
004013A9 mov esp,ebp
004013AB pop ebp
004013AC ret 4
**************************
==>
void testsimple2()
{
004011D0 push ebp
004011D1 mov ebp,esp
004011D3 push 0FFFFFFFFh
004011D5 push offset __ehhandler$?testsimple2@@YAXXZ (4022CEh)
004011DA mov eax,dword ptr fs:[00000000h]
004011E0 push eax
004011E1 sub esp,3Ch
004011E4 mov eax,dword ptr [___security_cookie (405004h)]
004011E9 xor eax,ebp
004011EB mov dword ptr [ebp-10h],eax
004011EE push eax
004011EF lea eax,[ebp-0Ch]
004011F2 mov dword ptr fs:[00000000h],eax
Simple oSimple;
004011F8 push offset string "test" (403194h)
004011FD lea ecx,[ebp-2Ch]
00401200 call dword ptr
[__imp_std::basic_string<char,std::char_traits<char>,std::allocator<char>
>::basic_string<char,std::char_traits<char>,std::allocator<char> >
(403094h)]
00401206 mov dword ptr [ebp-4],0
std::string s1( oSimple.s_ );
0040120D lea eax,[ebp-2Ch]
00401210 push eax
00401211 lea ecx,[ebp-48h]
00401214 call dword ptr
[__imp_std::basic_string<char,std::char_traits<char>,std::allocator<char>
>::basic_string<char,std::char_traits<char>,std::allocator<char> >
(403090h)]
}
0040121A lea ecx,[ebp-48h]
0040121D call dword ptr
[__imp_std::basic_string<char,std::char_traits<char>,std::allocator<char>
>::~basic_string<char,std::char_traits<char>,std::allocator<char> >
(40308Ch)]
00401223 mov dword ptr [ebp-4],0FFFFFFFFh
0040122A lea ecx,[ebp-2Ch]
0040122D call dword ptr
[__imp_std::basic_string<char,std::char_traits<char>,std::allocator<char>
>::~basic_string<char,std::char_traits<char>,std::allocator<char> >
(40308Ch)]
00401233 mov ecx,dword ptr [ebp-0Ch]
00401236 mov dword ptr fs:[0],ecx
0040123D pop ecx
0040123E mov ecx,dword ptr [ebp-10h]
00401241 xor ecx,ebp
00401243 call __security_check_cookie (401A66h)
00401248 mov esp,ebp
0040124A pop ebp
0040124B ret
************************
I tried your code with g++ 4.4.1 on amd64. g++ basically emits the
same code for testsimple and testsimple2 and does not call the getter.
I compiled with:
g++ test.cpp -c -O3 -S
Are you sure you turned on all optimizations? Perhaps declaring the
getter as
std::string get() const {
return s_;
}
helps.
Another important issue is the return value optimization of C++.
http://en.wikipedia.org/wiki/Return_value_optimization (though the
article claims msvc++ supports this feature)
Yet another thing one should not forget is that a language allowing
pointers which may point to literally everywhere, pointer arithmetic
and so on really makes headaches for compiler writers. C/C++
enthusiasts often claim how fast pointers are but this is only the
case if the compiler is _really_ smart. That's why Fortran is
sometimes faster than C. And since std::string uses pointers
internally this may be the reason why msvc++ is not able to inline the
function.
Hope this helped a bit.
- Roland Leissa
OK. So it's MSVC++ specific. Another interesting combination might
be MinGW/32 ... maybe I'll get something running and compare that.
> Are you sure you turned on all optimizations? Perhaps declaring the
> getter as
> std::string get() const {
> return s_;
> }
> helps.
Oh, I specifically *haven't* enabled all optimizations. I only have
enabled inlining, which should be sufficient for this testcase. (Well,
it appears to be with MSVC++)
> Another important issue is the return value optimization of C++.
> http://en.wikipedia.org/wiki/Return_value_optimization (though the
> article claims msvc++ supports this feature)
>
RVO is there with MSVC (it always is, no matter what optimization level).
This shouldn't have any bearing on inlining vs. not inlining I guess.
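
To illustrate what RVO does and does not remove here, a small sketch follows. Counted, Holder, make, copies_after_make, and copies_after_getter are hypothetical names used only for this example. It assumes the returned temporary is elided, which C++17 guarantees and which older compilers typically do anyway.

```cpp
// Hypothetical type that counts how often it is copy-constructed.
struct Counted {
    static int copies;
    Counted() { }
    Counted(const Counted&) { ++copies; }
};
int Counted::copies = 0;

// Returns an unnamed temporary: with RVO it is built directly in the caller's object.
Counted make() { return Counted(); }

// A getter like Simple::get(): the member has to be copied once,
// but no additional temporary is copied on top of that.
struct Holder {
    Counted m;
    Counted get() const { return m; }
};

int copies_after_make() {
    Counted::copies = 0;
    Counted c = make();
    (void)c;
    return Counted::copies;
}

int copies_after_getter() {
    Counted::copies = 0;
    Holder h;
    Counted c = h.get();
    (void)c;
    return Counted::copies;
}
```

With the temporary elided, make() costs no copies and the getter costs exactly one, the same single copy that direct member access needs. That count is independent of whether the call to get() is inlined.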
> Yet another thing one should not forget is that a language allowing
> pointers which may point to literally everywhere, pointer arithmetic
> and so on really makes headaches for compiler writers. C/C++
> enthusiasts often claim how fast pointers are but this is only the
> case if the compiler is _really_ smart. That's why Fortran is
> sometimes faster than C. And since std::string uses pointers
> internally this may be the reason why msvc++ is not able to inline the
> function.
>
Well. MSVC won't inline any function returning an unwindable object,
i.e. an object with a non-empty destructor, it doesn't have anything to
do with whether the object has pointer members.
The point I was making in my OP was that the generated assembly code
minus the call itself + semantics appear to be exactly the same with the
function inlined vs. not inlined, so it is strange that MSVC++ will never
inline such functions.
cheers,
Martin