Return value optimization

Matthias Hofmann

May 12, 2003, 4:50:08 PM

Hello,

I have a question concerning return value optimization. Let's consider the
following example:

class X { };  // must be a complete type for f() and g() to compile

X f()
{
    X a;
    return a;
}

void g()
{
    X b = f();
}

Without return value optimization, the call to f() in g() will do the
following:

1. Create the X object named "a"
2. Construct a temporary from "a"
3. Construct "b" from the temporary

When return value optimization is used, there are two possible steps:

a.) The object named "b" can be constructed from the object named "a",
bypassing the temporary object
b.) The memory later occupied by the object named "b" can be used for the
object named "a"

I know that step a.) is compliant with the standard, but I am not sure about
step b.) Scott Meyers says in his book "More Effective C++" that named
objects may be optimized away and that the function f() from above then
yields the same results as if it were written like:

X f()
{
    return X();
}

Section 12.8/15, however, sounds to me like this is not true - so what
exactly is allowed for an implementation?

Best regards,

Matthias Hofmann

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]

John Potter

May 13, 2003, 2:48:56 PM

On 12 May 2003 16:50:08 -0400, "Matthias Hofmann" <hof...@anvil-soft.com>
wrote:

> I have a question concerning return value optimization. Let's consider the
> following example:

There is some confusion about what "return value optimization" means. There
are two distinct optimizations here. The one within the function is often
called the named return value optimization because it seems to remove the
named variable. The second is in the caller and is a result of the calling
convention for return by value.

> class X;

> X f()
> {
> X a;
> return a;
> }

> void g()
> {
> X b = f();
> }

> Without return value optimization, the call to f() in g() will do the
> following:

> 1. Create the X object named "a"

That happens in f.

> 2. Construct a temporary from "a"

That happens in f.

> 3. Construct "b" from the temporary

That happens in g.

> When return value optimization is used, there are two possible steps:

> a.) The object named "b" can be constructed from the object named "a",
> bypassing the temporary object

No. They are invisible to each other. Inlining may not change semantics.
Treat the two functions as being in two translation units. I guess you
could claim that it happens, but I don't see how to implement it. The object
named "a" expires before the return.

> b.) The memory later occupied by the object named "b" can be used for the
> object named "a"

That is what happens.

> I know that step a.) is compliant with the standard, but I am not sure about
> step b.) Scott Meyers says in his book "More Effective C++" that named
> objects may be optimized away and that the function f() from above then
> yields the same results as if it were written like:

> X f()
> {
> return X();
> }

> Section 12.8/15, however, sounds to me like this is not true - so what
> exactly is allowed for an implementation?

It's tricky wording. The first part allows the caller optimization and
the second part allows the function optimization. Both are worded to
remove a temporary; however, it gets removed twice.

Consider the calling convention. First the function.

X f () {
    X a;
    return a;
}

is implemented as

void f (void* p) {
    X a;
    // use a
    new (p) X(a);
}

The optimization gives

void f (void* p) {
    X& a(*new (p) X());
    // use a
}

Note that the words claim that the temporary was removed, while in reality
it was the local that got removed. It is just legalese. The space for the
temporary was used for the named local. Since the local was placed
in the space for the temporary, it is possible to say that the temporary
object was removed by using its space for something else.

Now the caller.

X b = f();

Because the types are the same, this is equivalent to

X b(f());

which is implemented as

RawXSpace stuff;   // raw, suitably aligned storage for an X
f(&stuff);
X& b(reinterpret_cast<X&>(stuff));

Here the temporary is removed by using the space for b.

The formalism above is not quite accurate, but I think you can see how
it all works. The standard was written to allow exactly this. It was
also a standardization of existing practice.

John

Matthias Hofmann

May 13, 2003, 7:36:29 PM

John Potter <jpo...@falcon.lhup.edu> wrote in news article
voc0cvsj4osb9cc0q...@4ax.com...

> On 12 May 2003 16:50:08 -0400, "Matthias Hofmann" <hof...@anvil-soft.com>
> wrote:
> >
> > a.) The object named "b" can be constructed from the object named "a",
> > bypassing the temporary object
>
> No. They are invisible to each other. Inlining may not change semantics.
> Treat the two functions as being in two translation units. I guess you
> could claim that happens, but I don't see how to implement it. The object
> named "a" expires before the return.

What I meant was the following: If f() is implemented the way you explained,
which is

void f (void* p)
{
    X a;
    // use a
    new (p) X(a);
}

then the object named "b" would be constructed from the object named "a" if
the address of "b" is passed, like

void g()
{
    X b;       // pseudocode: stands for the raw storage reserved for "b"
    f( &b );   // Leads to "new (&b) X(a);" within f().
}

In that case, it does not matter whether "a" expires before the return
because the copy of "a" will have been made by then.

>
> > b.) The memory later occupied by the object named "b" can be used for the
> > object named "a"
>
> That is what happens.
>

I see it this way: The (hidden) pointer passed to f() points to the memory
which is set up to be the returned object. Within f(), this memory may
directly be used for the local object, as an optimization, or the local
object may be copied there before the function returns. As for the caller,
it is possible to pass the address of "b" as an optimization, or to pass the
address of some memory on the stack and copy that later into "b". So there
are two means of optimization, and I think they are independent of each
other, which means the compiler might use just any one of them or both, the
result should be the same.

>
> [snip]

>
> The formalism above is not quite accurate, but I think you can see how
> it all works. The standard was written to allow exactly this. It was
> also a standardization of existing practice.
>

This is what I wanted to know, thanks!

Regards,

Matthias

John Potter

May 16, 2003, 12:18:46 PM

On 13 May 2003 19:36:29 -0400, "Matthias Hofmann" <hof...@anvil-soft.com>
wrote:

> I see it this way: The (hidden) pointer passed to f() points to the memory
> which is set up to be the returned object. Within f(), this memory may
> directly be used for the local object, as an optimization, or the local
> object may be copied there before the function returns. As for the caller,
> it is possible to pass the address of "b" as an optimization, or to pass the
> address of some memory on the stack and copy that later into "b". So there
> are two means of optimization, and I think they are independent of each
> other, which means the compiler might use just any one of them or both, the
> result should be the same.

Yes, of course. The double removal is the usual problem and I lost sight
of the simple case. Thanks for the correction.

Another interesting result of this is a reread of the second part of
12.8/15. Note that gcc uses the named return value optimization in
the code posted in the other thread when the templated ctor is used.
Simplified code repeated here.

#include <typeinfo>
extern "C" int printf(const char *, ...);
struct E {
    E () { }
    E (E volatile&) {
        printf("E::E(E volatile&)\n");
    }
    template<class T> E(T const&) {
        printf("E::E<>(%s const&)\n", typeid(T).name());
    }
    template <class T> E(T&) {
        printf("E::E<>(%s&)\n", typeid(T).name());
    }
};
E f (E const& p) {
    E e = p;
    return e;
}
int main() {
    E e1;
    E e2 = f(e1);
}

The copy from e1 into e uses the const template. The copy from e
to the return value uses the non-const template. The copy from the
return value to e2 uses the const template. Gcc removes the second
copy. Changing the statement to "return *&e;", whose operand is no longer
the name of a local, changes nothing. Changing it to
"return static_cast<E&>(e);" shows all three copies.

Before we argue about the wording of the second part, which does just
say that the implementation may omit creating the temporary, note that
this entire paragraph has been rewritten for TC1. The new wording
starts "... the implementation is allowed to omit the copy construction".
It is very clear that everything is about copy construction, which we can
all agree is done by a copy constructor.

It looks like gcc may conform to the prior wording, but I don't think
it conforms to the new wording. It kept the other copy, which is correct.
I think Rani reported that EDG removes both copies, which seems wrong.

If we have this right, Brian Parker finally knows how to prevent those
optimizations from destroying the side effects of his copy ctors. :)

John

Matthias Hofmann

May 17, 2003, 4:37:12 PM

This sounds to me like gcc eliminates the local object, not the temporary,
like in the following pseudo code:

void f( void* pv, E const& pe)
{
    E& e(*new (pv) E( pe )); // First copy: e1 into e.
}

int main()
{
    E e1;
    RawMemForE temp;
    f( &temp, e1 );
    E e2( reinterpret_cast<E const&>( temp ) ); // Third copy: return value into e2.
}

On the other hand, one might argue that the temporary is removed, not the
local object. Maybe it is more precise to say that the temporary and the
local object have been merged, as they share the same memory.

>
> Before we argue about the wording of the second part which does just
> say that the implementation may omit creating the temporary, note that
> this entire paragraph has been rewritten for TC1. The new wording
> starts "... the implementation is allowed to omit the copy construction".
> Very clear that everything is about copy construction which we can all
> agree is done by a copy constructor.
>
> It looks like gcc may conform to the prior wording, but I don't think
> it conforms to the new wording. It kept the other copy which is correct.
> I think Rani reported that EDG removes both copies which seems wrong.
>

Even the old wording states that the temporary can be optimized away "even
if the class copy constructor or destructor have side effects". That sounds
to me like the issue is about copy constructors. For that reason, I think
gcc is compliant with neither the old wording nor the new one.

Regards,

Matthias