// Forward declarations:
class Foo;
Foo GetFoo(void);
const Foo& constReference = GetFoo();
const Foo constValue = GetFoo();
I'm interested to know whether in some scenarios (possibly depending on the
implementation of Foo and GetFoo), the initialization of constReference
might outperform the initialization of constValue, by skipping a call to
Foo's copy-constructor.
Kind regards, Niels
--
Niels Dekker
http://www.xs4all.nl/~nd/dekkerware
Scientific programmer at LKEB, Leiden University Medical Center
In principle, the compiler is allowed to call Foo's copy-constructor one
more time in the second case, compared to the first. However, it is
highly likely the compiler would optimize this extra call away in a
release build. I can't think of any situation where it would be unable
to, off the top of my head.
--
With best wishes,
Igor Tandetnik
With sufficient thrust, pigs fly just fine. However, this is not
necessarily a good idea. It is hard to be sure where they are going to
land, and it could be dangerous sitting under them as they fly
overhead. -- RFC 1925
Thanks, Igor. So "in principle", it would be preferable to bind the returned
value to a const-reference, but in practice (at least for VC++),
copy-initialization performs equally well, right?
I did some tests, but I couldn't find any difference in behavior, between
the two ways to retrieve the return value. Here's my test program:
//////////////////////////////////////////////////
class Foo
{
public:
    unsigned copyCount;

    Foo(void)
    :
    copyCount(0)
    {
    }

    Foo(const Foo& arg)
    :
    copyCount(arg.copyCount + 1)
    {
    }

    Foo& operator=(const Foo& arg)
    {
        copyCount = arg.copyCount + 1;
        return *this;
    }
};

Foo GetFoo()
{
    // Hoping for Named Return Value Optimization (NRVO).
    Foo returnValue;
    return returnValue;
}

int main(void)
{
    const Foo& constReference = GetFoo();
    const Foo constValue = GetFoo();
    return constReference.copyCount + (constValue.copyCount << 2);
}
//////////////////////////////////////////////////
Indeed, in Release mode, the test returns zero, indicating that neither of the
Foo objects has been copied. In Debug mode, the test returns 5, indicating that a
Foo is copied once in both cases. BTW, I still wonder if it wouldn't be
preferable to skip the copying in Debug mode as well. I do appreciate being
able to step into every line of source code, while debugging. But it doesn't
seem very useful to me to step into a copy-constructor, if that function is
being skipped when the application is released. What do you think?
Kind regards, Niels
Yes - for VC++ and, I expect, any modern compiler. E.g. GCC has a very
sophisticated optimizer, in my experience.
> I did some tests, but I couldn't find any difference in behavior,
> between the two ways to retrieve the return value.
This seems to prove the point rather nicely, doesn't it?
> BTW, I still
> wonder if it wouldn't be preferable to skip the copying in Debug mode
> as well. I do appreciate being able to step into every line of source
> code, while debugging. But it doesn't seem very useful to me to step
> into a copy-constructor, if that function is being skipped when the
> application is released. What do you think?
I don't have any particular opinion on this issue.
Interestingly, GCC appears to skip the copy-constructor by default (whenever
allowed), even in debug mode. It does have an option to force the compiler
to call the copy constructor, "-fno-elide-constructors", but I guess this
option is rarely switched on :-)
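For anyone who wants to try that GCC option, here is a minimal sketch of the same copy-counting idea, wrapped in a function so the two counts can be inspected separately. The names CountedFoo, GetCountedFoo and CountCopies are made up for this illustration; they are not part of the original test program.

```cpp
#include <utility>

// Hypothetical counter type, modelled on the Foo from the test program.
struct CountedFoo
{
    unsigned copyCount;

    CountedFoo() : copyCount(0) {}

    // Each copy records one more copy than its source.
    CountedFoo(const CountedFoo& arg) : copyCount(arg.copyCount + 1) {}
};

CountedFoo GetCountedFoo()
{
    CountedFoo returnValue;  // Candidate for NRVO.
    return returnValue;
}

// Returns the copy count observed through the const reference (first)
// and through the copy-initialized const value (second).
std::pair<unsigned, unsigned> CountCopies()
{
    const CountedFoo& constReference = GetCountedFoo();
    const CountedFoo constValue = GetCountedFoo();
    return std::make_pair(constReference.copyCount, constValue.copyCount);
}
```

Calling CountCopies() from main and printing both members should show whether either form paid for a copy. The count seen through the reference can never exceed the count seen through the value, because the value form can only add a copy on top of what GetCountedFoo itself does. How far the numbers rise under "-fno-elide-constructors" presumably depends on the language standard selected, since newer standards require some of these copies to be omitted.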
>> I did some tests, but I couldn't find any difference in behavior,
>> between the two ways to retrieve the return value.
>
> This seems to prove the point rather nicely, doesn't it?
My little test certainly doesn't cover all possible scenarios, so I'm still
interested to see a scenario in which binding to const-reference outperforms
copy-initialization from a returned value, if there is any...
But I guess there might be another performance-related difference between
the two cases, when the type has a virtual function:
const Foo& constReference = GetFoo();
const Foo constValue = GetFoo();
constReference.VirtualFunc();
constValue.VirtualFunc();
When calling the virtual function on the reference, I guess it will use the
VTable, while it won't do so on the copy-initialized value. Or would VC++
skip the VTable for the reference? I'm not sure how to test this (unless by
digging into the assembler output).
Kind regards, Niels
Not necessarily. It is clear that constReference can only be bound to an
instance of Foo here, and not any derived class. Optimizer could figure
this out and generate a direct (non-virtual) call. I don't know if it's
smart enough though, and I'm too lazy to test.
> Or would VC++ skip the VTable for the reference? I'm not sure how to
> test this (unless by digging into the assembler output).
Yes, looking at assembly appears to be the only way.
Igor Tandetnik wrote:
> Not necessarily. It is clear that constReference can only be bound to an
> instance of Foo here, and not any derived class. Optimizer could figure
> this out and generate a direct (non-virtual) call. I don't know if it's
> smart enough though, and I'm too lazy to test.
I just looked into the ASM output. When VC++ 2008 SP1 compiles the
following test as "Release", the compiler inlines
constValue.VirtualFunc(), but unfortunately it doesn't inline
constReference.VirtualFunc().
////////////////////////////////////////
class Foo
{
    unsigned memberData;
public:
    virtual int VirtualFunc() const;
};

int Foo::VirtualFunc() const
{
    return 42;
}

Foo GetFoo()
{
    return Foo();
}

int main(int, char**)
{
    const Foo& constReference = GetFoo();
    const Foo constValue = GetFoo();
    int result1 = constReference.VirtualFunc();
    int result2 = constValue.VirtualFunc();
    return result1 + result2;
}
////////////////////////////////////////
As shown by the ASM output:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; 222 : const Foo& constReference = GetFoo();
; 223 : const Foo constValue = GetFoo();
; 224 : int result1 = constReference.VirtualFunc();
lea ecx, DWORD PTR _$S1$[esp+8]
mov DWORD PTR _$S1$[esp+8], OFFSET ??_7Foo@@6B@
call DWORD PTR ??_7Foo@@6B@
; 225 : int result2 = constValue.VirtualFunc();
; 226 : return result1 + result2;
add eax, 42 ; 0000002a
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
BTW, when I put a breakpoint inside VirtualFunc(), the debugger does indeed
get there, even in a Release build. In Debug mode, it gets there twice (for
both constReference and constValue). In Release mode, it gets there once
(only for constReference).
I guess there's an opportunity for further improvement of the optimizer of
Visual C++, right?
Such an optimization won't happen in the foreseeable future.
Otherwise it would defeat the whole purpose of virtual functions,
namely polymorphism. In order for polymorphism to work in C++, a
function call must be made via a polymorphic type. The only
polymorphic types are pointers and references. So, a call to a
virtual method of a class via a reference cannot be optimized
without breaking C++ rules.
Alex
Except that, in this particular code, the reference is guaranteed to be
bound to an instance of Foo and not to an instance of its derived class,
and thus polymorphism is not in the picture. So the optimizer could, in
principle, generate a non-virtual call to VirtualFunc without changing
the observable behavior of this program. Remember - a compiled program
doesn't have to follow rules, it just has to behave as if it does.
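To illustrate why the reference here can only ever see a plain Foo, below is a small sketch. The DerivedFoo class and the ReferenceSeesExactlyFoo helper are hypothetical additions, not part of the test posted earlier. Because GetFoo returns by value, anything derived is sliced down to a Foo before the caller gets it, so the dynamic type behind the reference is known at compile time.

```cpp
#include <typeinfo>

struct Foo
{
    virtual ~Foo() = default;
    virtual int VirtualFunc() const { return 42; }
};

// Hypothetical derived class, only here to show the slicing.
struct DerivedFoo : Foo
{
    int VirtualFunc() const override { return 7; }
};

// Returns by value: whatever is built inside, the caller receives a Foo.
Foo GetFoo()
{
    DerivedFoo derived;
    return Foo(derived);  // Explicit slicing copy of the Foo base part.
}

// True if the object bound to the reference has dynamic type Foo
// and the virtual call dispatches to Foo::VirtualFunc.
bool ReferenceSeesExactlyFoo()
{
    const Foo& constReference = GetFoo();
    return typeid(constReference) == typeid(Foo)
        && constReference.VirtualFunc() == 42;
}
```

This only shows that a direct call would be behavior-preserving in this situation. Whether a particular compiler actually takes advantage of it is a separate question that, as far as I can tell, still has to be checked in the generated assembly.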
Alex Blekhman wrote:
> Such an optimization won't happen in the foreseeable future.
> Otherwise it would defeat the whole purpose of virtual functions,
> namely polymorphism. In order for polymorphism to work in C++, a
> function call must be made via a polymorphic type. The only
> polymorphic types are pointers and references. So, a call to a
> virtual method of a class via a reference cannot be optimized
> without breaking C++ rules.
Igor Tandetnik wrote:
> Except that, in this particular code, the reference is guaranteed to
> be bound to an instance of Foo and not to an instance of its derived
> class, and thus polymorphism is not in the picture. So the optimizer
> could, in principle, generate a non-virtual call to VirtualFunc
> without changing the observable behavior of this program. Remember -
> a compiled program doesn't have to follow rules, it just has to
> behave as if it does.
Thanks. I guess such an optimization might be worth implementing if it's
a common case, right? So I just did a related posting at comp.lang.c++,
"The best way to retrieve a returned value... by const reference?"
http://groups.google.com/group/comp.lang.c++/browse_thread/thread/0f3ad790abe791fc
Kind regards, Niels