Const Reference Return Efficiency

Joe Chen

Jul 5, 2004, 6:08:35 AM

Hello,

I have recently seen code written like the following:

class A {
public:
    const std::string& getValue() { return value_; }

protected:
    std::string value_;
};

const std::string& val = myA.getValue();

with the argument that returning by const reference
avoids the temporary construction/destruction of the returned
string and a copy constructor call for the return, if it
were returned by value - hence being more efficient.

Seeing how I cringe at premature optimization, I have a few
concerns/questions about this I was hoping the community
could help me with.

1. Since most modern compilers will do RVO on the getValue and
possibly eliminate the temporary, is avoiding the copy construction
on a return by value worth code whose encapsulation is messy? I
feel uncomfortable revealing the internals of my class, or
introducing confusing lifetime semantics of the member variables
inside.

2. Will this prevent other compiler optimizations due to aliasing?
My suspicion is that since the compiler can never know how many
references to the internal representation there are, it is possible
the code is missing out on optimizations greater than what the
missing copy constructor provides.

3. Unless this is required by being in a large, tight loop and a
profiler deems it necessary, I feel that the cost of a copy construction
of the string if returned by value would be completely swamped by
other calculations done within a function and probably wouldn't
affect the overall profile. Therefore, it loses out on solid OO
encapsulation and flexibility for the sake of mistaken efficiency.

What are your thoughts?
Thanks!

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Francis Glassborow

Jul 5, 2004, 11:01:59 AM

In article <6766c4c.04070...@posting.google.com>, Joe Chen
<joe_w...@hotmail.com> writes

>I have recently seen code written like the following:
>
>class A {
>public:
> const std::string& getValue() { return value_; }
>
>protected:
> std::string value_;
>};
>
>const std::string& val = myA.getValue();
>
>with the argument that returning by const reference
>avoids the temporary construction/destruction of the returned
>string and a copy constructor call for the return, if it
>were returned by value - hence being more efficient.

Yes that is a very common C++ idiom.

>
>Seeing how I cringe at premature optimization, I have a few
>concerns/questions about this I was hoping the community
>could help me with.

I wonder if this is really a case of premature optimisation.

>
>1. Since most modern compilers will do RVO on the getValue and
>possibly eliminate the temporary, is avoiding the copy construction
>on a return by value worth code whose encapsulation is messy? I
>feel uncomfortable revealing the internals of my class, or
>introducing confusing lifetime semantics of the member variables
>inside.

There is no conceivable way that a compiler can optimise away all copies
if you return by value. To see why, consider your example. If the string
were returned by value, that value would have to exist throughout the
lifetime of val. That means that it must have a lifetime that is not
dependent on the lifetime of myA.

Of course in the example above there is the potential for a dangling
reference, but the programmer knows that because of the signature of
A::getValue(). Change that return to a return by value and there is no
such implication.
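
The lifetime point can be sketched as follows. This is only an illustration: the two getter names, the setter and the helper function are hypothetical additions to the class from the original post, and std::make_unique assumes a C++14 compiler.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical variant of the class under discussion, with both getter styles.
class A {
public:
    explicit A(std::string v) : value_(std::move(v)) {}
    const std::string& getValueRef() const { return value_; }  // alias of the member
    std::string getValueCopy() const { return value_; }        // independent copy
    void setValue(const std::string& v) { value_ = v; }
private:
    std::string value_;
};

// Returns a copy that was taken from an A destroyed before the return.
std::string copyOutlivesObject() {
    auto myA = std::make_unique<A>("first");
    const std::string& ref = myA->getValueRef();
    const std::string copy = myA->getValueCopy();

    myA->setValue("second");
    assert(ref == "second");   // the reference tracks the member
    assert(copy == "first");   // the copy does not

    myA.reset();               // ref now dangles and must not be used again
    return copy;               // still valid: its lifetime never depended on myA
}
```

The copy has to exist somewhere independent of myA, which is why at least one copy construction cannot be removed when returning by value.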

>
>2. Will this prevent other compiler optimizations due to aliasing?
>My suspicion is that since the compiler can never know how many
>references to the internal representation there are, it is possible
>the code is missing out on optimizations greater than what the
>missing copy constructor provides.

Actually the real cost of const references is much subtler though still
related to aliasing. The problem occurs because C++ allows global
variables as well as separate TUs. This means that the compiler always
has to assume that the object behind a const reference may in fact
change unless it can prove otherwise.

int foo(mytype const & mt){
    // do things without function calls
    // optimiser can assume mt is actually immutable
    bar();
    // unless the compiler can see the implementation of bar()
    // or knows that mt cannot be a reference to a global
    // on return from bar() it must assume that mt may have changed
}
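
A minimal runnable sketch of that situation, assuming a hypothetical global string and a bar() that happens to modify it (all names here are illustrative, not from any real code base):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

std::string g_text = "abc";          // hypothetical global

void bar() { g_text += "!"; }        // imagine this is defined in another TU

// Returns how much s grew across the call to bar().
std::size_t growthAcrossCall(const std::string& s) {
    const std::size_t before = s.size();
    bar();                           // may modify whatever s refers to
    return s.size() - before;        // size has to be re-read after the call
}

void aliasingDemo() {
    assert(growthAcrossCall(g_text) == 1);   // s aliased the global
    std::string local = "xyz";
    assert(growthAcrossCall(local) == 0);    // s did not alias it
}
```

The const in the parameter only stops growthAcrossCall from modifying the string through s; it says nothing about what other code does to the same object.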

>
>3. Unless this is required by being in a large, tight loop and a
>profiler deems it necessary, I feel that the cost of a copy construction
>of the string if returned by value would be completely swamped by
>other calculations done within a function and probably wouldn't
>affect the overall profile. Therefore, it loses out on solid OO
>encapsulation and flexibility for the sake of mistaken efficiency.

Using the small string optimisation [premature? :-)], and assuming the string
is small, the cost of copying may be small, but in general std::string
uses dynamic memory and that is often expensive.

>
>What are your thoughts?

It is the job of a class designer to have sufficient understanding of
implementation to identify when return by value is appropriate. Of
course not all class designers exercise such care and attention to
detail, but those that do should not be open to criticism for premature
optimisation. Not all optimisation is premature. This is particularly
the case for third party libraries where the library designer has little
if any knowledge of how the library will be used.

--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects

Antoun Kanawati

Jul 5, 2004, 2:59:04 PM

Joe Chen wrote:

> Hello,
>
> I have recently seen code written like the following:
>
> class A {
> public:
> const std::string& getValue() { return value_; }
>
> protected:
> std::string value_;
> };
>
> const std::string& val = myA.getValue();
>
> with the argument that returning by const reference
> avoids the temporary construction/destruction of the returned
> string and a copy constructor call for the return, if it
> were returned by value - hence being more efficient.
>
> Seeing how I cringe at premature optimization, I have a few
> concerns/questions about this I was hoping the community
> could help me with.

The reservation on premature optimization is a rule of thumb, not
an article of faith.

When looking at such an issue, you have to temper your view with
the considerations of good design.

In this particular case, the usage style shown below

std::string val = myA.getValue();

addresses all your concerns by removing the "const &".

However, you still have *the option* of not copying the string,
if you feel strongly about that.

Hence, one feels "good" about this style because it does not
preclude options that may be important to the user in contexts
that the designer could not have thought of (a common situation
for reusable classes).

Also, when writing reusable libraries, one is often forced
into "absolute optimization", which is even worse than "premature
optimization". Because, unlike application specific libraries,
general purpose libraries must be as universal as possible.

So, in any general purpose library, you'll find more optimization
than any particular profiling session would dictate.

Finally, you should never depend on the optimizer to make the code
qualitatively better; the optimizer is only a fine tuning phase of code
generation, and most of what it does is "optional at the discretion
of the optimizer", that is: may not happen when you want it to.
--
Antoun Kanawati
antou...@comcast.dot.net
[remove .dot and .at before use]

Alex Vinokur

Jul 5, 2004, 3:06:20 PM

"Joe Chen" <joe_w...@hotmail.com> wrote in message news:6766c4c.04070...@posting.google.com...

> Hello,
>
> I have recently seen code written like the following:
>
> class A {
> public:
> const std::string& getValue() { return value_; }
>
> protected:
> std::string value_;
> };
>
> const std::string& val = myA.getValue();
>
> with the argument that returning by const reference
> avoids the temporary construction/destruction of the returned
> string and a copy constructor call for the return, if it
> were returned by value - hence being more efficient.
>

[snip]

A relevant issue was discussed in the thread "[Comparative performance] Argument passing (built-in types)" at
http://groups.google.com/groups?threadm=bjosu1%24lk214%241%40ID-79865.news.uni-berlin.de

--
Alex Vinokur
http://mathforum.org/library/view/10978.html
http://sourceforge.net/users/alexvn

Dave Harris

Jul 5, 2004, 6:33:02 PM

joe_w...@hotmail.com (Joe Chen) wrote (abridged):

> I have recently seen code written like the following:
>
> class A {
> public:
> const std::string& getValue() { return value_; }
>
> protected:
> std::string value_;
> };

I would consider that OK.

> const std::string& val = myA.getValue();

Whereas this is dangerous. Safer to use:

const std::string val = myA.getValue();

so we get our own copy of the string, with no aliases or lifetime
problems.

With a good compiler, in that case, the string will only be copied once
even if we don't return it by reference. However, the reference should be
a benefit in other cases, eg:

size_t len = myA.getValue().size();

To copy the string in this case would be an unnecessary pessimisation. It
would take time O(n) instead of O(1). Generally I prefer fast code, unless
there is a good reason for it to be slow.
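
One way to see the difference is to count copy constructions. The sketch below assumes a hypothetical CountedString wrapper in place of std::string, and hypothetical byRef()/byValue() getters, purely so the copies become observable:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical string wrapper that counts copy constructions.
struct CountedString {
    static int copies;
    std::string text;
    explicit CountedString(std::string t) : text(std::move(t)) {}
    CountedString(const CountedString& other) : text(other.text) { ++copies; }
    std::size_t size() const { return text.size(); }
};
int CountedString::copies = 0;

class A {
public:
    explicit A(std::string v) : value_(std::move(v)) {}
    const CountedString& byRef() const { return value_; }
    CountedString byValue() const { return value_; }
private:
    CountedString value_;
};

// Number of copies made while asking for the length through the reference getter.
int copiesForSizeByRef() {
    A myA("hello");
    CountedString::copies = 0;
    std::size_t len = myA.byRef().size();
    assert(len == 5);
    return CountedString::copies;
}

// The same question asked through the by-value getter.
int copiesForSizeByValue() {
    A myA("hello");
    CountedString::copies = 0;
    std::size_t len = myA.byValue().size();
    assert(len == 5);
    return CountedString::copies;
}
```

Here copiesForSizeByRef() reports no copies, while copiesForSizeByValue() reports one, because the member has to be copied into the returned temporary before size() can be called on it.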

-- Dave Harris, Nottingham, UK

Erik Max Francis

Jul 5, 2004, 6:38:31 PM

Alex Vinokur wrote:

> A relevant issue was discussed in the thread "[Comparative performance]
> Argument passing (built-in types)" at
> http://groups.google.com/groups?threadm=bjosu1%24lk214%241%40ID-79865.news.uni-berlin.de

And in that same thread, someone pointed out the numerous systematic
problems with your analysis.

--
__ Erik Max Francis && m...@alcyone.com && http://www.alcyone.com/max/
/ \ San Jose, CA, USA && 37 20 N 121 53 W && AIM erikmaxfrancis
\__/ In time of war the devil makes more room in hell.
-- (a German proverb)

Joe Chen

Jul 5, 2004, 6:41:01 PM

Francis Glassborow <fra...@robinton.demon.co.uk> wrote in message news:<jmU3XiOX...@robinton.demon.co.uk>...

> Yes that is a very common C++ idiom.

Is it typically a good idiom to use for optimization, even though it
returns a handle to internal data? (which I thought was bad).

> There is no conceivable way that a compiler can optimise away all copies
> if you return by value. To see why, consider your example. If the string
> were returned by value, that value would have to exist throughout the
> lifetime of val. That means that it must have a lifetime that is not
> dependent on the lifetime of myA.

Understood. I was concerned that dependent lifetimes and ownership
issues were a large design price to pay for a potentially small
optimization, or an optimization that could have been better.

> Actually the real cost of const references is much subtler though still
> related to aliasing. The problem occurs because C++ allows global
> variables as well as separate TUs. This means that the compiler always
> has to assume that the object behind a const reference may in fact
> change unless it can prove otherwise.

I guess this was my original concern. Does this code trade the
optimization of eliminating all copies of the string for possibly
better optimization that could have been done if it was returned by
value? Basically, could the code have been faster if the string was
returned by value?

Thanks,
Joe

Nicola Musatti

Jul 6, 2004, 7:38:57 AM

joe_w...@hotmail.com (Joe Chen) wrote in message news:<6766c4c.04070...@posting.google.com>...

> Hello,
>
> I have recently seen code written like the following:
>
> class A {
> public:
> const std::string& getValue() { return value_; }
>
> protected:
> std::string value_;
> };
>
> const std::string& val = myA.getValue();
>

[Very sensible analysis snipped]
>
> What are your thoughts?

I think we should concern ourselves more with writing safe code than
with writing supposedly fast code. In order to do so I have a very
simple guideline on the subject:

Never return by reference unless you are returning either *this or an
argument you received by reference.

Those who preach otherwise should remember that the days of few
megahertz clocks and no cache are long gone for the vast majority of
us.

Cheers,
Nicola Musatti

Francis Glassborow

Jul 6, 2004, 5:09:28 PM

In article <a327cf48.04070...@posting.google.com>, Nicola
Musatti <Nicola....@ObjectWay.it> writes

>I think we should concern ourselves more with writing safe code than
>with writing supposedly fast code.

Agreed.

> In order to do so I have a very
>simple guideline on the subject:
>
>Never return by reference unless you are returning either *this or an
>argument you received by reference.

I think that is a poor guideline. A few years ago people would advocate
the use of copy-on-write as a way to achieve good performance even when
passing by value. More recently that idiom has come under close scrutiny
and does not seem to hold up well in multi-threaded environments.

Library implementors have to make careful judgement calls, and the good
ones get it right more often than not and go on learning from
experience.

Now let me give you a direct counter example to your guideline. Consider
a class that has an std::vector member as a repository for a certain
type of data. Now how do I provide access to that data for searching
given that I do not know the general nature of the searches that a user
may wish to implement?

Copying a vector of unknown size can be so expensive as to completely
dominate the performance of an application. The class designer has no
way to know how expensive copying will be, only the user can know that.
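
As a rough sketch of that design (the Repository class, its int payload and the helper functions are all hypothetical), a const-reference accessor lets the user write whatever search is needed, and still allows an explicit copy when one is wanted:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical repository class; the name and the int payload are illustrative.
class Repository {
public:
    explicit Repository(std::vector<int> data) : data_(std::move(data)) {}
    const std::vector<int>& items() const { return data_; }  // read-only access
private:
    std::vector<int> data_;
};

// One search the class designer did not have to anticipate.
std::ptrdiff_t countEven(const Repository& repo) {
    const std::vector<int>& v = repo.items();   // no copy of the vector
    return std::count_if(v.begin(), v.end(), [](int x) { return x % 2 == 0; });
}

// A caller who wants an independent snapshot can still ask for one.
std::vector<int> snapshot(const Repository& repo) {
    return repo.items();                        // explicit, caller-chosen copy
}

void repositoryDemo() {
    Repository repo(std::vector<int>{1, 2, 3, 4, 6});
    assert(countEven(repo) == 3);
    assert(&repo.items() == &repo.items());     // the same object every time
    assert(snapshot(repo).size() == 5);
}
```

The cost of copying is paid only in snapshot(), that is, only by the caller who decides it is worth paying.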

Between the issue of how to make available the value of a member of a
builtin type (clearly by value?) and that of a possibly very large
container (clearly by const reference?) there lies a wide intermediate
range. At what point should we switch mechanism? Now throw in
multi-threading to the mix.

The final issue is that objects, as opposed to values, are often
non-copyable. If we wish to provide read access to an object that is part
of a larger object we often have no choice but to return a reference.

std::string is a special case because conceptually it is value based
even though its potential size often makes it more suitable to handle it
as an object.

>
>Those who preach otherwise should remember that the days of few
>megahertz clocks and no cache are long gone for the vast majority of
>us.

But we also deal with immensely larger objects, some of which may be
distributed across the World.

--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects

Antoun Kanawati

Jul 6, 2004, 5:13:50 PM

Nicola Musatti wrote:
> I think we should concern ourselves more with writing safe code than
> with writing supposedly fast code. In order to do so I have a very
> simple guideline on the subject:

[snip]

> Those who preach otherwise should remember that the days of few
> megahertz clocks and no cache are long gone for the vast majority of
> us.

I would love to believe that the days of CPU cycle squeezing are gone.
But, they're not.

My first PC was 4.77 MHz/640 KB; today, I have a P4/2.8 and 512 MB. I
recall distinctly that the old box was reasonably responsive, and became
more so when I boosted it to 8 MHz. Today, I still wait for things to
happen, some programs still feel sluggish, some programs still crash,
though my hardware is thousands of times faster and my memory is
hundreds of times larger than it used to be.

But that's just on the PC (and similar entities). When you're talking
embedded machinery, these assumptions about oodles of MHz and megabytes
are not valid. Today, for each PC, there are several embedded
processors: the digital watch, the microwave, the cell phone, the PDA,
the cable box, the printer, the wireless phone, the caller-id gizmo next
to it, the CD/MP3 player in my car, the car itself, etc... and that was
only a quick glance around the room, and a look at the driveway.

Regarding safety at the expense of speed: assume that the programmer
using your libraries is a professional capable of making professional
and responsible choices. For example, if you use the const-ref return
idiom, and this programmer has a concern with safety, it's very easy
for them to make a copy; but, if they happen to have a need for an
extra cycle or two [copying a std::string is not just a cycle or
two], they'll be happy that they don't have to rewrite your class
to squeeze those cycles out.

If you want a "guiding principle": Do not ever preclude safe (or
optimal) options.

This is quite different from: Provide only the safe option even if it
costs speed and/or memory. You'll never be able to compensate for
irresponsible programming practices, and you'll provide challenges to
the irresponsible macho types that'll create more headaches than the
ones you thought you were avoiding.

--
Antoun Kanawati
antou...@comcast.dot.net
[remove .dot and .at before use]


David Abrahams

Jul 6, 2004, 5:15:56 PM

Nicola....@ObjectWay.it (Nicola Musatti) writes:

> I think we should concern ourselves more with writing safe code than
> with writing supposedly fast code. In order to do so I have a very
> simple guideline on the subject:
>
> Never return by reference unless you are returning either *this or an
> argument you received by reference.
>
> Those who preach otherwise should remember that the days of few
> megahertz clocks and no cache are long gone for the vast majority of
> us.

Why is *this any safer than *this->member ?
--
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

Motti Lanzkron

Jul 6, 2004, 5:20:04 PM

Nicola Musatti wrote:
>
>Never return by reference unless you are returning either *this or an
>argument you received by reference.

I think that's a bit of an overstatement.

What's the difference between returning a member by ref and returning
*this? Both have the same life-span so the risk of dangling references
is the same. There are many legitimate reasons one would want to
return a reference to a member (even a const one), it could be
non-copyable for instance.
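
A small sketch of the non-copyable case, assuming hypothetical Engine and Car classes and C++11 deleted functions:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical non-copyable part, e.g. something that owns a unique resource.
class Engine {
public:
    explicit Engine(int power) : power_(power) {}
    Engine(const Engine&) = delete;
    Engine& operator=(const Engine&) = delete;
    int power() const { return power_; }
private:
    int power_;
};

class Car {
public:
    explicit Car(int power) : engine_(power) {}
    // Returning Engine by value would not compile, so read access to the
    // part itself has to go through a reference (or a pointer).
    const Engine& engine() const { return engine_; }
private:
    Engine engine_;
};

static_assert(!std::is_copy_constructible<Engine>::value,
              "Engine is deliberately non-copyable");

int enginePowerOf(const Car& car) {
    return car.engine().power();   // read through the reference, no copy
}

void carDemo() {
    Car car(90);
    assert(enginePowerOf(car) == 90);
    assert(&car.engine() == &car.engine());   // always the same member object
}
```

The reference returned by engine() is valid for exactly as long as the Car is, which is the same guarantee a returned *this would give.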

Joe Chen

Jul 7, 2004, 3:58:29 AM

My apologies to Francis for accidentally hitting "reply" instead of "reply
group". Francis, please disregard the personal reply!

"Francis Glassborow" <fra...@robinton.demon.co.uk> wrote in message

news:<pL3lsSQahp6AFwe$@robinton.demon.co.uk>...

> Library implementors have to make careful judgement calls, and the
> good ones get it right more often than not and go on learning from
> experience.

Does this mean that encapsulation and abstraction are at odds with
efficiency and generic programming?

> Now let me give you a direct counter example to your guideline.
> Consider a class that has an std::vector member as a repository for a
> certain type of data. Now how do I provide access to that data for
> searching given that I do not know the general nature of the searches
> that a user may wish to implement?

Isn't that the job that an iterator is designed to perform? What if the
implementor wanted to change the internal repository from a vector to a
list? I'm thinking of Meyers' EC++ Item 29, which advises not to return
handles to internal data because it may make your library less flexible to
maintain, or may force changes on users when changes internal to the API are
made.
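
A minimal sketch of the iterator-based interface being asked about, assuming a hypothetical Repository class in which the container type is named only by a private typedef:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical class; only const_iterator, begin() and end() are public.
class Repository {
    typedef std::vector<int> Storage;   // could become std::list<int> later
public:
    typedef Storage::const_iterator const_iterator;

    void add(int x) { data_.push_back(x); }
    const_iterator begin() const { return data_.begin(); }
    const_iterator end() const { return data_.end(); }
private:
    Storage data_;
};

// Client code names only Repository::const_iterator, never the container.
bool contains(const Repository& repo, int wanted) {
    Repository::const_iterator it = std::find(repo.begin(), repo.end(), wanted);
    return it != repo.end();
}

void iteratorDemo() {
    Repository repo;
    repo.add(3);
    repo.add(7);
    assert(contains(repo, 7));
    assert(!contains(repo, 8));
}
```

This hides the choice of container, but the iterators still point into the object, so the lifetime coupling is the same as with a returned reference, and a change of container can still change the iterator category and invalidation rules that clients see.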

> Copying a vector of unknown size can be so expensive as to completely
> dominate the performance of an application. The class designer has no
> way to know how expensive copying will be, only the user can know that.

Wouldn't that make it the user's decision whether to use the correct API
for his purposes? How can a library implementor possibly predict all of the
circumstances his library can be used under?
If someone writes a set of APIs that return references (const or not) to
internal data members, then those lifetime couplings can easily spread
throughout all code that uses that API. In some cases, I would end up having
to make copies of those objects anyway in order to clean up my design. In
this case, the library implementor hasn't foreseen that I don't need the
performance as much as I need the simplicity of design. So, again, it raises
the question of whether good OO design is at odds with efficiency.

And should efficiency always be considered before anything else?

Thanks!
Joe

Ben

Jul 7, 2004, 1:20:40 PM

>
> 1. Since most modern compilers will do RVO on the getValue and
> possibly eliminate the temporary, is avoiding the copy construction
> on a return by value worth code whose encapsulation is messy? I
> feel uncomfortable revealing the internals of my class, or
> introducing confusing lifetime semantics of the member variables
> inside.

I understand your concern and somewhat agree with you. Returning a
reference to some implementation-specific inner object is a pain, and
returning by value in this case is obviously neater semantics-wise.
However, reality is cruel. RVO is not guaranteed to happen on every
compiler.
I suspect it will never be guaranteed.
And when we don't want to rely on RVO, it becomes a dilemma: do
we want safety and neat encapsulation, or performance? I guess many
people prefer the latter. It is faster and simple, and the cost in safety
and encapsulation is quite affordable. After all, C++ is not an absolutely
safe language.

I'd say return-by-const-ref in this case is a less-than-ideal but practical
solution.

>
> 2. Will this prevent other compiler optimizations due to aliasing?
> My suspicion is that since the compiler can never know how many
> references to the internal representation there are, it is possible
> the code is missing out on optimizations greater than what the
> missing copy constructor provides.

Maybe not. Talking about aliasing, if getValue() is inlined,
obj.getValue() is no different from obj.value_ to the optimizer.
Storing a reference or a pointer in an object may cause aliasing
problems, but return-by-ref does not.

>
> 3. Unless this is required by being in a large, tight loop and a
> profiler deems it necessary, I feel that the cost of a copy construction
> of the string if returned by value would be completely swamped by
> other calculations done within a function and probably wouldn't
> affect the overall profile. Therefore, it loses out on solid OO
> encapsulation and flexibility for the sake of mistaken efficiency.
>

Well, it depends on how your class is going to be used. As Antoun pointed
out, if it is an open world (like a lib or a common module in a big
project), you really cannot predict whether return-by-val will eventually
cause a performance problem.

Francis Glassborow

Jul 7, 2004, 1:43:09 PM

In article <wPGGc.22651$vO1.1...@nnrp1.uunet.ca>, Joe Chen
<joe_w...@hotmail.com> writes

>My apologies to Francis for accidentally hitting "reply" instead of "reply
>group". Francis, please disregard the personal reply!

No problem.

>
>"Francis Glassborow" <fra...@robinton.demon.co.uk> wrote in message
>news:<pL3lsSQahp6AFwe$@robinton.demon.co.uk>...
> > Library implementors have to make careful judgement calls, and the
> > good ones get it right more often than not and go on learning from
> > experience.
>
>Does this mean that encapsulation and abstraction are at odds with
>efficiency and generic programming?

Sometimes, and again judgement is called for. That is one reason that
good library designers are paid (even if not always paid enough)

>
> > Now let me give you a direct counter example to your guideline.
> > Consider a class that has an std::vector member as a repository for a
> > certain type of data. Now how do I provide access to that data for
> > searching given that I do not know the general nature of the searches
> > that a user may wish to implement?
>
>Isn't that the job that an iterator is designed to perform? What if the
>implementor wanted to change the internal repository from a vector to a
>list? I'm thinking of Meyers' EC++ Item 29, which advises not to return
>handles to internal data because it may make your library less flexible to
>maintain, or may force changes on users when changes internal to the API are
>made.

Actually I look at it differently, once I have 'published' some internal
detail via a return type I have renounced the option to change that
detail unless I can do so without changing the public interface. To me
public interfaces are inviolate and may only be added to, never
subtracted from or modified (unless such modification continues to
provide all previous behaviour)

>
> > Copying a vector of unknown size can be so expensive as to completely
> > dominate the performance of an application. The class designer has no
> > way to know how expensive copying will be, only the user can know that.
>
>Wouldn't that make it the user's decision whether to use the correct API
>for his purposes? How can a library implementor possibly predict all of the
>circumstances his library can be used under?

Exactly, s/he cannot and therefore should not pre-empt decisions best
left to the user. Return by const reference leaves the user the option
to copy and pay the costs of that. Return by value removes that choice
from the user so the library designer better have good reasons for
making that choice.

>If someone writes a set of APIs that return references (const or not)

No, stick to the current discussion, returning a non-const reference is
conceptually a very different thing, one that is far more rarely
justifiable.

> to
>internal data members, then those lifetime couplings can easily spread
>throughout all code that uses that API. In some cases, I would end up having
>to make copies of those objects anyway in order to clean up my design. In
>this case, the library implementor hasn't foreseen that I don't need the
>performance as much as I need the simplicity of design. So, again, it raises
>the question of whether good OO design is at odds with efficiency.

Even with your assertions about good OO design, we are discussing C++
which allows (and even encourages) the use of other paradigms. Actually
it is very hard to see how return by value equates with good OO where
some would claim that there should be no such thing as a value
(interesting to contrast that with functional programming in which there
is, effectively, no such thing as an object)

>
>And should efficiency always be considered before anything else?

No, but nor should it not be considered. As always, this is a matter of
judgement. The problem with guidelines is that they are often used as a
substitute for judgement.

--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects

Nicola Musatti

Jul 7, 2004, 2:00:44 PM

Francis Glassborow <fra...@robinton.demon.co.uk> wrote in message news:<pL3lsSQahp6AFwe$@robinton.demon.co.uk>...

> In article <a327cf48.04070...@posting.google.com>, Nicola
> Musatti <Nicola....@ObjectWay.it> writes
> >I think we should concern ourselves more with writing safe code than
> >with writing supposedly fast code.
>
> Agreed.
>
> > In order to do so I have a very
> >simple guideline on the subject:
> >
> >Never return by reference unless you are returning either *this or an
> >argument you received by reference.
>
> I think that is a poor guideline. A few years ago people would advocate
> the use of copy-on-write as a way to achieve good performance even when
> passing by value. More recently that idiom has come under close scrutiny
> and does not seem to hold up well in multi-threaded environments.
>
> Library implementors have to make careful judgement calls, and the good
> ones get it right more often than not and go on learning from
> experience.

There still are better alternatives to returning by value than
returning a reference to a class member.

> Now let me give you a direct counter example to your guideline. Consider
> a class that has an std::vector member as a repository for a certain
> type of data. Now how do I provide access to that data for searching
> given that I do not know the general nature of the searches that a user
> may wish to implement?
>
> Copying a vector of unknown size can be so expensive as to completely
> dominate the performance of an application. The class designer has no
> way to know how expensive copying will be, only the user can know that.

You might provide iterators into the vector, possibly of a type that
embeds a reference count. In this way you avoid copying the object,
but you retain control on the lifetime of the vector.

> Between the issue of how to make available the value of a member of a
> builtin type (clearly by value?) and that of a possibly very large
> container (clearly by const reference?) there lies a wide intermediate
> range. At what point should we switch mechanism? Now throw in
> multi-threading to the mix.

My point is there are more than the two alternatives. In many cases
return by value is acceptable, in others you have to return what
amounts to the address of a data member. The problem with returning a
reference to it is that you are committing to guaranteeing its
existence for an indeterminate time, while losing track of the fact
that some external entity is holding onto it. A clear recipe for
catastrophic errors, unless you know for sure that your data member
will last till program termination (a case I should have considered in
my guideline).

I personally consider that even returning a plain pointer is better:
at least in this way you make no promise. The best solution however is
to explicitly handle the shared ownership.
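
As a sketch of what explicit shared ownership could look like, here is a hypothetical variant of class A that hands out a std::shared_ptr to an immutable string. This assumes C++11 or later; boost::shared_ptr would be the closest equivalent for older compilers.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical variant of A that shares ownership of its value.
class A {
public:
    explicit A(const std::string& v)
        : value_(std::make_shared<const std::string>(v)) {}
    // The caller gets a handle that keeps the string alive; no characters
    // are copied, only a reference count is incremented.
    std::shared_ptr<const std::string> getValue() const { return value_; }
    void setValue(const std::string& v) {
        // Existing holders keep the old string; A switches to a new one.
        value_ = std::make_shared<const std::string>(v);
    }
private:
    std::shared_ptr<const std::string> value_;
};

std::string valueAfterOwnerIsGone() {
    std::shared_ptr<const std::string> held;
    {
        A myA("hello");
        held = myA.getValue();
        assert(held.use_count() == 2);   // myA and held share the string
    }                                    // myA is destroyed here
    assert(held.use_count() == 1);       // held is now the sole owner
    return *held;
}
```

The price is a heap-allocated control block and reference-count updates on every getValue() call, so this trades some speed for the removal of the lifetime coupling.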

[...]

> std::string is a special case because conceptually it is value based
> even though its potential size often makes it more suitable to handle it
> as an object.

I suspect that the unfortunate choice was not to distinguish between
modifiable and non modifiable strings (copy on write works perfectly
when you never write! [well, not entirely true, but almost]).

> >Those who preach otherwise should remember that the days of few
> >megahertz clocks and no cache are long gone for the vast majority of
> >us.
>
> But we also deal with immensely larger objects, some of which may be
> distributed across the World.

That's why we invented handles and smart pointers (not to mention
caches).

Cheers,
Nicola Musatti

Nicola Musatti

Jul 7, 2004, 9:23:56 PM

David Abrahams <da...@boost-consulting.com> wrote in message news:<u8ydxg...@boost-consulting.com>...

> Nicola....@ObjectWay.it (Nicola Musatti) writes:
>
> > I think we should concern ourselves more with writing safe code than
> > with writing supposedly fast code. In order to do so I have a very
> > simple guideline on the subject:
> >
> > Never return by reference unless you are returning either *this or an
> > argument you received by reference.
> >
> > Those who preach otherwise should remember that the days of few
> > megahertz clocks and no cache are long gone for the vast majority of
> > us.
>
> Why is *this any safer than *this->member ?

Returning *this is not inherently safer than returning a reference to
a member. However once you have enough access to an object to be able
to call a member function on it all the problems related to that
object's lifetime are already exposed in the context of the call.

Cheers,
Nicola Musatti

Nicola Musatti

Jul 7, 2004, 9:33:36 PM

Antoun Kanawati <ant...@comcast.net> wrote in message news:<hyxGc.33395$Oq2.7709@attbi_s52>...

> Nicola Musatti wrote:
> > I think we should concern ourselves more with writing safe code than
> > with writing supposedly fast code. In order to do so I have a very
> > simple guideline on the subject:
> [snip]
> > Those who preach otherwise should remember that the days of few
> > megahertz clocks and no cache are long gone for the vaste majority of
> > us.
>
> I would love to believe that the days of CPU cycle squeezing are gone.
> But, they're not.

Certainly not, but while some twenty years ago this was a general
problem, today it isn't. Worse still, back then processors were
rather simple in architecture and we could be fairly certain that a
few basic assumptions about performance held almost always. Nowadays,
with caching and pipelining, most bets are off. Moreover, many
mainstream applications interface with external entities whose impact on
performance is much greater than that of the application itself. Take for
instance applications that query a relational database: adding a
well-chosen index to the database may have an impact that
simply cannot be matched by manipulating the application code.

[...]

> But that's just on the PC (and similar entities). When you're talking
> embedded machinery, these assumptions about oodles of mhz and megabytes
> are not valid. Today, for each PC, there are several embedded
> processors: the digital watch, the microwave, the cell phone, the PDA,
> the cable box, the printer, the wireless phone, the caller-id gizmo next
> to it, the CD/MP3 player in my car, the car itself, etc... and that was
> only a quick glance around the room, and a look at the driveway.

I'm aware of that. Yet, while very important in terms of size, this is
a specific market which should be targeted specifically; moreover, it
may be just a matter of time. After all, these devices are very
small, but they most often deal with tractable problems.

> Regarding safety at the expense of speed: assume that the programmer
> using your libraries is a professional capable of making professional
> and responsible choices. For example, if you use the const-ref return
> idiom, and this programmer has a concern with safety, it's very easy
> for them to make a copy; but, if they happen to have a need for an
> extra cycle or two [copying a std::string is not just a cycle or
> two], they'll be happy that they don't have to rewrite your class
> to squeeze those cycles out.

I guess that the assumptions you can make change with the level of
abstraction you're targeting. I'm mostly an application programmer and
it is part of my job to try and prevent mistakes from less experienced
members of our team. Yet I believe that the basic principle still
holds: make it correct first, and only make it fast afterwards. I'm
not advocating the removal of return by reference from C++; I just
think that it should almost never be used to provide access to a class
member.

> If you want a "guiding principle": Do not ever preclude safe (or
> optimal) options. This is quite different from: Provide only the safe option
> even if it costs speed and/or memory. You'll never be able to compensate for
> irresponsible programming practices, and you'll provide challenges to
> the irresponsible macho types that'll create more headaches than the
> ones you thought you were avoiding.

I'm saying a different thing: only provide the unsafe option when it
is necessary.

Cheers,
Nicola Musatti

Dave Harris

Jul 7, 2004, 9:51:22 PM

da...@boost-consulting.com (David Abrahams) wrote (abridged):

> Why is *this any safer than *this->member ?

Well, I've had cases where a subclass did not want to return a reference
to *this->member. It wanted something like:

std::vector<int> one;
one.push_back( 1 );
return one; // Return a temporary - want by-copy.

which is unsafe if we are returning a const reference. An alternative was:

static std::vector<int> one;
if (one.empty())
    one.push_back( 1 );
return one;

so that the vector is static and shared between all instances. However,
this doesn't work well if the vector's content is not constant (and there
are multi-threading issues to worry about too).

Basically, the interface was wrong and needed to be changed. The person
who designed it didn't anticipate how subclasses would need to implement
it. I think this kind of problem is much less likely with "return *this;",
by the nature of things.
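
A minimal sketch of the interface change being described, under the assumption that the virtual function is switched to return by value. The class names Source, StoredSource and ConstantOne and the helper firstValue() are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical interface: returning by value leaves each subclass free to
// return a stored member or a freshly built result.
class Source {
public:
    virtual ~Source() {}
    virtual std::vector<int> values() const = 0;
};

// Returns a copy of a stored member.
class StoredSource : public Source {
public:
    explicit StoredSource(const std::vector<int>& v) : values_(v) {}
    std::vector<int> values() const { return values_; }

private:
    std::vector<int> values_;
};

// Builds its result on demand; there is no member to refer to.
class ConstantOne : public Source {
public:
    std::vector<int> values() const {
        std::vector<int> one;
        one.push_back(1);
        return one;  // safe: the caller receives its own vector
    }
};

// Usage sketch: callers treat both subclasses the same way.
inline int firstValue(const Source& source) {
    const std::vector<int> v = source.values();
    assert(!v.empty());
    return v.front();
}
```

With a const-reference return type, ConstantOne would be pushed towards the static-local workaround shown above.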

-- Dave Harris, Nottingham, UK


Andrey Tarasevich

Jul 7, 2004, 9:56:38 PM

Joe Chen wrote:
> I have recently seen code written like the following:
>
> class A {
> public:
> const std::string& getValue() { return value_; }
>
> protected:
> std::string value_;
> };
>
> const std::string& val = myA.getValue();
>
> with the argument that returning by const reference
> avoids the temporary construction/destruction of the returned
> string and a copy constructor call for the return, if it
> were returned by value - hence being more efficient.
>
> Seeing how I cringe at premature optimization, I have a few
> concerns/questions about this I was hoping the community
> could help me with.

You seem to have convinced yourself that this is an instance of
"optimization". It doesn't look like one to me.

Even if we look at this as an optimization, it is more than just an
attempt to save CPU time. It could also be an attempt to reduce memory
fragmentation.

> 1. Since most modern compilers with do RVO on the getValue and
> possibly eliminate the temporary, is avoiding the copy construction
> on a return by value worth code that is encapsulation messy? I
> feel uncomfortable revealing the internals of my class, or
> introducting confusing lifetime symantics of the member variables
> inside.

Returning references to internal data members of the class _does_ _not_
reveal its internals unless the specification of the class explicitly
states that this reference is bound to an internal data member.

> 2. Will this prevent other compiler optimizations due to aliasing?
> My suspicion is that since the compiler can never know how many
> references to the internal representation there are, it is possible
> the code is missing out on optimizations greater than what the
> missing copy constructor provides.

With such relatively complex object as 'std::string' - very unlikely.

> 3. Unless this is required by being in a large, tight loop and a
> profiler deems it necessary, I feel that the cost of a copy construction
> of the string if returned by value would be completely swamped by
> other calculations done within a function and probably wouldn't
> effect the overal profile. Therefore, it loses out on solid OO
> encapsulation and flexibility at the cost of mistaken effciency.

There are no losses to OO encapsulation and/or flexibility here. I don't
understand why you even mention flexibility, since the
reference-returning version is more flexible for obvious reasons: it can
do all the value-returning version can do and then some.

--
Best regards,
Andrey Tarasevich

Nicola Musatti

Jul 8, 2004, 7:22:43 PM

Francis Glassborow <fra...@robinton.demon.co.uk> wrote in message news:<pL3lsSQahp6AFwe$@robinton.demon.co.uk>...

[...]

> Now let me give you a direct counter example to your guideline. Consider
> a class that has an std::vector member as a repository for a certain
> type of data. Now how do I provide access to that data for searching
> given that I do not know the general nature of the searches that a user
> may wish to implement?

Now that I think of it, I believe that a design where a class
encapsulates a data structure on which the class's users must be given
such liberty of action is very probably a wrong design.

A very elegant solution, however, is to provide a member function
which takes a predicate and an output iterator as parameters and
copies to the output iterator the vector elements that satisfy the
predicate.

If performance constraints allow it the output iterator might use
polymorphism to hide the underlying vector and the returned elements
might be wrapped in a smart pointer to handle lifetime/ownership
problems.
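
A minimal sketch of such a member function, assuming a vector of int for simplicity. The names Repository, select(), isEven() and evenItems() are made up for illustration.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical repository class; the vector itself is never exposed.
class Repository {
public:
    void add(int item) { items_.push_back(item); }

    // Copies every element satisfying pred to out and returns the advanced
    // output iterator, in the style of the standard algorithms.
    template <typename Predicate, typename OutputIterator>
    OutputIterator select(Predicate pred, OutputIterator out) const {
        for (std::vector<int>::const_iterator it = items_.begin();
             it != items_.end(); ++it) {
            if (pred(*it)) {
                *out = *it;
                ++out;
            }
        }
        return out;
    }

private:
    std::vector<int> items_;
};

// Example predicate, chosen only for illustration.
inline bool isEven(int n) { return n % 2 == 0; }

// Usage sketch: collect the matching items into a caller-owned vector.
inline std::vector<int> evenItems(const Repository& repo) {
    std::vector<int> result;
    repo.select(isEven, std::back_inserter(result));
    for (std::vector<int>::size_type i = 0; i < result.size(); ++i) {
        assert(isEven(result[i]));  // every copied element satisfies the predicate
    }
    return result;
}
```

The caller owns everything it receives, so nothing here depends on the lifetime of the repository; the cost is one copy per matching element.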

Cheers,
Nicola Musatti

Nicola Musatti

Jul 8, 2004, 7:24:02 PM

Andrey Tarasevich <andreyta...@hotmail.com> wrote in message news:<10eoep6...@news.supernews.com>...
[...]

> You seem to have convinced yourself that this is an instance of
> "optimization". It doesn't look like one to me.
>
> Even if we look at this as an optimization, it is more than just an
> attempt to save CPU time. It could also be an attempt to reduce memory
> fragmentation.

If it's not an optimization, the only other name I have for this is
"running unnecessary risks".

> > 1. Since most modern compilers with do RVO on the getValue and
> > possibly eliminate the temporary, is avoiding the copy construction
> > on a return by value worth code that is encapsulation messy? I
> > feel uncomfortable revealing the internals of my class, or
> > introducting confusing lifetime symantics of the member variables
> > inside.
>
> Returning references to internal data members of the class _does_ _not_
> reveal its internals unless the specification of the class explicitly
> states that this reference is bound an internal data member.

In terms of conceptual information hiding this is true, because from
the class interface the client has no way of knowing that the returned
object is in fact a member of the object on which s/he called the
member function.

Even when someone else goes and deletes the container object, the
client won't know what hit her/him. In my opinion this is exactly the
reason why every effort should be made to ensure that the container
honors the contract it is offering to its clients: the referred
object's lifetime shall extend at least as long as the lifetime of any
other object that might bind a reference to it.

If not there should be a valid *and* documented reason.

[...]

> > 3. Unless this is required by being in a large, tight loop and a
> > profiler deems it necessary, I feel that the cost of a copy construction
> > of the string if returned by value would be completely swamped by
> > other calculations done within a function and probably wouldn't
> > effect the overal profile. Therefore, it loses out on solid OO
> > encapsulation and flexibility at the cost of mistaken effciency.
>
> There are no losses to OO encapsulation here and/or flexibility. I don't
> understand why even even mention flexibility here, since the
> reference-returning version is more flexible for obvious reasons: it can
> do all the value-returning version can do and then some.

Of course there is a loss of flexibility: by returning one of
your members by reference you have forsaken the right to manage the
lifetime of that member, since you have no way of knowing when it
will be safe to delete it. If the data member is directly contained,
this constraint extends to the container object.

To me this is a breach of encapsulation at least as wide as the breach
of Porta Pia in Rome ;-) (see e.g.
http://www.livinginrome.com/rome.php?ae=972)

Cheers,
Nicola Musatti

Joe Chen

Jul 9, 2004, 10:01:48 AM

"Nicola Musatti" <Nicola....@ObjectWay.it> wrote in message
news:a327cf48.04070...@posting.google.com...

> > Even if we look at this as an optimization, it is more than just an
> > attempt to save CPU time. It could also be an attempt to reduce memory
> > fragmentation.
>
> If it's not an optimization, the only other name I have for this is
> "running unnecessary risks".

One could never say for certain whether memory would fragment or not, but it
may be an attempt to reduce the potential for fragmentation by reducing
the number of objects created.

> > There are no losses to OO encapsulation here and/or flexibility. I don't
> > understand why even even mention flexibility here, since the
> > reference-returning version is more flexible for obvious reasons: it can
> > do all the value-returning version can do and then some.
>
> Of course there is a loss of flexibility, because by returning one of
> your members by reference you have forsaken the right to handle the
> lifetime of your member, because you have no way of knowing when it
> will be safe to delete it. If the data member is directly contained,
> this costraint extends to the container object.

Here is what I have learned from this thread, and please correct me if I am
wrong! :)

There isn't much loss of flexibility because the user of the library can
always make a copy themselves if they have the freedom to clean up their own
designs. This will give the same performance as return by value, assuming
RVO would have been done, i.e., a single call to the copy constructor:

const string& referenced = bar.getValue();
string local( referenced );

It's still considered encapsulated, because the member is accessed via a
function call.

This works well, because in my original posting, I was returning a string,
which is essentially a representation of a single value, and unlikely to
change internally inside the class. However, I do agree with you when the
object returned is more complex, and doesn't represent a single value. In
that case, the risk of the internal representation being changed in the
library increases, and we do get more complicated behaviour:

const Foo& Bar::getFoo()
{
    // original way
    // return foo_;

    // new way
    // ... convert from new representation to old one
    Foo* foo = new Foo( newRepresentation_ );
    return *foo;  // nobody owns this heap object, so it leaks
}

Now the internal operation of the function getFoo() must be known by the
caller to avoid a memory leak in code like this:

const Foo& referenced = bar.getFoo();
Foo local( referenced );

Which I feel leads to poor designs.
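
For contrast, a minimal sketch of how the same change of representation could look if getFoo() had returned by value from the start. The shape of Foo, the vector used as the new representation and the helper totalOf() are assumptions made up for the example.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical stand-in for Foo: a simple value type.
struct Foo {
    explicit Foo(int t) : total(t) {}
    int total;
};

// Hypothetical Bar after the change: it no longer stores a Foo, only the
// parts from which one can be computed.
class Bar {
public:
    explicit Bar(const std::vector<int>& parts) : newRepresentation_(parts) {}

    // By-value return: the converted Foo belongs to the caller and no
    // new/delete is involved, so there is nothing to leak.
    Foo getFoo() const {
        return Foo(std::accumulate(newRepresentation_.begin(),
                                   newRepresentation_.end(), 0));
    }

private:
    std::vector<int> newRepresentation_;
};

// Usage sketch: the calling pattern from the post above keeps working.
inline int totalOf(const Bar& bar) {
    // Binding to const& extends the temporary's lifetime to this scope.
    const Foo& referenced = bar.getFoo();
    Foo local(referenced);
    assert(local.total == referenced.total);
    return local.total;
}
```

The caller's code is unchanged and needs no knowledge of how getFoo() produces its result.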

I hope that captures what this thread (eventually) was about.

Cheers!
Joe

Francis Glassborow

Jul 9, 2004, 10:07:04 AM

In article <a327cf48.04070...@posting.google.com>, Nicola
Musatti <Nicola....@ObjectWay.it> writes

>Francis Glassborow <fra...@robinton.demon.co.uk> wrote in message news:<pL3lsSQahp6AFwe$@robinton.demon.co.uk>...
>[...]
>> Now let me give you a direct counter example to your guideline. Consider
>> a class that has an std::vector member as a repository for a certain
>> type of data. Now how do I provide access to that data for searching
>> given that I do not know the general nature of the searches that a user
>> may wish to implement?
>
>Now that I think of it, I believe that a design where a class
>encapsulates a data structure on which the class users' must be given
>such liberty of action is very probably a wrong design.
>
>A very elegant solution, however, is to provide a member function
>which takes a predicate and an output iterator as parameters and
>copies to the output iterator the vector elements that satisfy the
>predicate.
>
>If performance constraints allow it the output iterator might use
>polymorphism to hide the underlying vector and the returned elements
>might be wrapped in a smart pointer to handle lifetime/ownership
>problems.

So std::vector is a wrong design? Perhaps in a perfect world, but it is
a very powerful and useful design in the real world.

--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects

Dave Harris

Jul 9, 2004, 10:31:14 PM

Nicola....@ObjectWay.it (Nicola Musatti) wrote (abridged):

> The problem with returning a reference to it is that you are
> committing to guaranteeing its existence for an indeterminate
> time, while loosing track of the fact that some external entity
> is holding onto it.

That would be one convention - that the caller may assume values live
indefinitely. It puts all the burden on callee.

An alternative coding convention is to say that values returned by const
reference are only valid until the next non-const member function is
called. If a caller needs them to live longer, they should make a copy
themselves. This shifts some of the burden from callee to caller.

There's still a chance that callee may want to evolve the class in such a
way that even that minimal lifetime guarantee can't be sustained, but in
practice that is pretty rare. Whether the risk is affordable depends on
who you are writing for. For example, I have access to all of the code
which uses the code I write, so if I have to change an interface I can;
it's nasty but not the end of the world.

There is also a chance that the caller may call a non-const function and
invalidate the reference without realising it. This becomes unlikely if
the caller follows the convention of never using a reference to store the
result. Never write:

const string &value = a.getValue();

That does leave cases like:

proc( a.getValue() );

where proc() takes a string by const reference and has access to an alias
of a and calls a non-const member function on it. Such cases need careful
design anyway. For example, if getValue() does return a copy, that copy
could be "out of date" and wrong from a domain point of view even if not
invalid from a language point of view.
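
A minimal sketch of that aliasing case, assuming getValue() returns a const reference to a std::string member. The setter setValue() and the two helper functions at the end are hypothetical additions for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical class modelled on the example that started the thread.
class A {
public:
    explicit A(const std::string& v) : value_(v) {}
    const std::string& getValue() const { return value_; }
    void setValue(const std::string& v) { value_ = v; }  // non-const member

private:
    std::string value_;
};

// proc() takes the string by const reference and also has access to the
// object it may have come from. Returns true if the parameter changed
// while proc() was running.
inline bool proc(const std::string& value, A& alias) {
    const std::string before = value;  // snapshot of what was passed in
    alias.setValue("changed");         // mutates the object behind the alias
    assert(alias.getValue() == "changed");
    return value != before;
}

// The argument refers directly to a's member, so proc() sees it change.
inline bool aliasedCallSeesChange() {
    A a("original");
    return proc(a.getValue(), a);
}

// The caller copies first, so proc() keeps seeing the old (stale) value.
inline bool copiedCallSeesChange() {
    A a("original");
    const std::string copy = a.getValue();
    return proc(copy, a);
}
```

Neither call is invalid as far as the language is concerned, because the member string object itself stays alive. Which behaviour is the right one is a question for the domain.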

-- Dave Harris, Nottingham, UK


Nicola Musatti

Aug 5, 2004, 10:13:47 AM

Sorry for the delay in answering. I've been away from computers for a
few weeks.

Francis Glassborow <fra...@robinton.demon.co.uk> wrote in message news:<ERgOa$DD3n7...@robinton.demon.co.uk>...

> In article <a327cf48.04070...@posting.google.com>, Nicola
> Musatti <Nicola....@ObjectWay.it> writes
> >Francis Glassborow <fra...@robinton.demon.co.uk> wrote in message news:<pL3lsSQahp6AFwe$@robinton.demon.co.uk>...
> >[...]
> >> Now let me give you a direct counter example to your guideline. Consider
> >> a class that has an std::vector member as a repository for a certain
> >> type of data. Now how do I provide access to that data for searching
> >> given that I do not know the general nature of the searches that a user
> >> may wish to implement?
> >
> >Now that I think of it, I believe that a design where a class
> >encapsulates a data structure on which the class users' must be given
> >such liberty of action is very probably a wrong design.

[...]

>
> So std::vector is a wrong design? Perhaps in a perfect world, but it is
> a very powerful and useful design in the real world.

Containers are actually a sort of exception to my rule, in that they
do not hide their elements; on the contrary, their purpose is to make
them as available as possible. Thus I see no problem with, say,
vector<T>::front() returning a reference to the vector's first
element. I'd still object to returning by reference the result of
front() from a class that held a vector as a private data member.
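
A minimal sketch of that distinction, with a hypothetical class JobQueue that keeps a vector as a private implementation detail and hands out its first element by value. The helper nextAfterGrowth() is likewise made up.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical class: the vector is an implementation detail, not part of
// the interface.
class JobQueue {
public:
    void push(const std::string& job) { jobs_.push_back(job); }

    // Returns a copy of the first job, so callers hold nothing that a later
    // push() (which may reallocate the vector) could invalidate.
    std::string next() const {
        assert(!jobs_.empty());
        return jobs_.front();
    }

private:
    std::vector<std::string> jobs_;
};

// Usage sketch: the value obtained earlier is unaffected by later growth.
inline std::string nextAfterGrowth() {
    JobQueue queue;
    queue.push("first");
    const std::string next = queue.next();
    for (int i = 0; i < 1000; ++i) {
        queue.push("more");  // forces the vector to reallocate along the way
    }
    return next;
}
```

Had next() returned the reference obtained from front(), the first reallocation would have left the caller holding a dangling reference.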

Cheers,
Nicola Musatti