Can anyone think of any other reasons to prefer returning a string by
value? I am encountering some opposition to this, mainly in the name
of performance. The performance has not been measured (of course) but
this is often the case with 'performance' arguments. Unfortunately,
"show me the figures" cuts no ice.
Regards,
Andrew Marlow
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
One advantage that passing it out by reference has is that the caller
can assign that value to either a string variable or a reference to a
string. The former gives you behavior as if you had returned the
variable by value, because the text of the referenced string will be
copied into the variable. The latter has the characteristic that if
the private member variable changes, the caller's reference will be
aware of the change, which may or may not be what you want. e.g.:
#include <string>
class String
{
public:
std::string getStrVal() const { return str; }
std::string const & getStrRef() const { return str; }
void setStr(std::string const & new_str) { str = new_str; }
private:
std::string str;
};
int main()
{
String * hi = new String;
hi->setStr(std::string("hi"));
std::string str1 = hi->getStrVal();
std::string str2 = hi->getStrRef();
std::string const & str3 = hi->getStrRef();
hi->setStr(std::string("hello"));
// str1 and str2 still say "hi" while str3 will say "hello"
delete hi;
// str1 and str2 still say "hi" while str3 is invalid
return 0;
}
Warning: I'm not entirely on your side.
The main reason to return a value
(as opposed to a reference to a value)
is, as you say, to avoid having to worry
about whether the value you are referring
to will go away or be changed before you
are done using it.
The main reason to return a reference
(n.b.: a pointer is a kind of reference)
is efficiency. In my experience (based
on profiling), the cost of copying std::string
values can be pretty high.
However, if you return a reference
(e.g., std::string & or const char *), you
do have to make sure the underlying object
doesn't change.
In many cases, it's a no-brainer -- for
example, if you are just returning a value
that's already in a container that
is never modified after program start-up.
On the other hand, returning by reference
might not give you _any_ increase in efficiency.
If the value you want to return is the result
of some calculation, the cost of managing the
storage space inside your object so you can
return a reference to it may be higher than
the cost of copying the value. Keep in
mind that if the compiler can use RVO, returning
by value might not be any more expensive than
returning a reference.
Another approach if you have reason to worry
about the inefficiency of copying std::string
is to "roll your own" replacement for std::string
that has the features you need and is cheaper
to copy; e.g., a struct containing a char array,
if that works in your application.
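A minimal sketch of the "struct containing a char array" idea, assuming every value fits in a fixed 15-character buffer. FixedName and make_fixed_name are hypothetical names invented for this illustration; nothing like them appears in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical fixed-capacity name. Copying it is a plain memberwise copy
// of 16 bytes, with no heap allocation.
struct FixedName {
    char text[16];
};

// Builds a FixedName from a C string, truncating anything past 15 characters.
inline FixedName make_fixed_name(const char* src) {
    assert(src != nullptr);
    FixedName out;
    std::size_t n = std::strlen(src);
    if (n > sizeof(out.text) - 1) {
        n = sizeof(out.text) - 1;
    }
    std::memcpy(out.text, src, n);
    out.text[n] = '\0';
    return out;
}
```

Whether the silent truncation is acceptable depends entirely on the application, which is the "if that works" caveat above.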
As for performance: As a recent thread of mine ('inlining of functions
returning an unwindable object') has shown the somewhat popular Visual
Studio compiler will refuse to inline even a simple getter function if
it returns std::string by value. But that's no proof of any performance
concern yet, because if you're doing something with "strings" and
performance is important, chances are that std::string is the wrong
class anyway.
I think the argument that holding a const reference to *any* class
internals is quite a bad idea should really be sufficient, because not
only do you run the risk of UB -- you also have the problem that
although you hold a const-ref the class member may well mutate in the
meantime and I don't trust no code holding a const reference to be
prepared for that const-ref mutating.
Safety first. :-)
br,
Martin
http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/
Basically, the argument for passing/returning by value is performance!
This is slightly misleading IMHO. Consider:
struct X
{
std::string m_s;
const std::string& s() const
{
return m_s;
}
};
....
X x;
const std::string& r = x.s(); // affected by subsequent x destruction
std::string s = x.s(); // not affected by subsequent x destruction
It's not the return type of the function which affects this per se. I'm
not sure this is a reason to prefer return by value.
> Can anyone think of any other reasons to prefer returning a string by
> value. I am encountering some opposition to this, mainly in the name
> of performance. The performance has not been measured (of course) but
> this is often the case with 'performance' arguments. Unfortunately,
> "show me the figures" cuts no ice.
Yes - if you return by value, you can return a string you constructed on
the fly. This can be an important consideration when you're designing an
interface - other people's subclasses may not want to have to store a
string to return here.
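A small sketch of that interface point. Shape, NamedShape and Square are hypothetical classes made up for illustration; std::to_string assumes a C++11 library.

```cpp
#include <cassert>
#include <string>

// Hypothetical interface: returning by value lets each subclass decide
// whether the string is stored or computed.
class Shape {
public:
    virtual ~Shape() {}
    virtual std::string description() const = 0;
};

// Stores its string; a const reference return would also have worked here.
class NamedShape : public Shape {
public:
    explicit NamedShape(const std::string& name) : name_(name) {}
    std::string description() const { return name_; }
private:
    std::string name_;
};

// Stores no string at all. The result is built on each call, which a
// "const std::string&" return type would not allow without extra storage.
class Square : public Shape {
public:
    explicit Square(int side) : side_(side) { assert(side >= 0); }
    std::string description() const {
        return "square of side " + std::to_string(side_);
    }
private:
    int side_;
};
```

Had description() been declared to return const std::string&, Square would need a member (or a static buffer) just to have something to refer to.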
FWIW, I'm not sure the performance either way is going to make a huge
difference with just a string. That being said, I generally do return by
const reference with things like this unless I can see a good reason to
do otherwise. Purely on the basis that I might be able to avoid a copy
here or there - it almost certainly won't affect performance in a
significant way, but aside from the interface issue I mentioned, there
aren't all that many downsides I've encountered. YMMV.
Regards,
Stu
Certainly, where performance is an issue, repeated copying of strings
can cause a measurable loss in performance. However where the library
uses the small string optimisation (no use of the heap) the loss is
pretty small. And where you have an implementation that uses move
semantics I would certainly default to using return by value and only
move to return by reference where measurement showed a performance gain
that was critical to the program (and with move semantics I think that
would be pretty uncommon).
Just because you're returning by const reference doesn't mean you have
to assign it *to* a const reference. You can just as easily write
std::string blah = foo.get_string();
even if get_string() returns a const reference, and it will still make
a copy. On the other hand, returning by const reference gives you the
ability to NOT make a copy if that is what's desired. So it opens up
new possibilities, while not sacrificing any functionality.
Maybe I'm overlooking something, but I see little reason to return by
value. What matters is whether you assign it to a const reference or
to a value.
Zach
Have your member be a shared_ptr<std::string>.
And return shared_ptr<const std::string> from your getter function.
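A rough sketch of this suggestion, with two assumptions: std::shared_ptr (C++11) stands in for boost::shared_ptr, and the member is a shared_ptr to a const string so that the setter must replace the string rather than mutate it. Widget and snapshot_after_destruction are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>

class Widget {
public:
    explicit Widget(const std::string& s)
        : str_(std::make_shared<const std::string>(s)) {}

    // The caller shares ownership, so the returned string outlives both
    // later set() calls and the Widget itself.
    std::shared_ptr<const std::string> get() const { return str_; }

    // Replaces the string instead of mutating it in place, so existing
    // holders keep seeing the old value.
    void set(const std::string& s) {
        str_ = std::make_shared<const std::string>(s);
    }

private:
    std::shared_ptr<const std::string> str_;
};

// Returns the value a caller still sees after the Widget has changed
// and then been destroyed.
inline std::string snapshot_after_destruction() {
    std::shared_ptr<const std::string> held;
    {
        Widget w("hi");
        held = w.get();
        w.set("hello");
    }  // w is destroyed here; held keeps the old string alive
    assert(held);
    return *held;
}
```

The getter copies a pointer and bumps a reference count instead of copying characters, at the price of one extra allocation per set().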
Andy.
> > Can anyone think of any other reasons to prefer returning a string by
> > value.
> I think the argument that holding a const reference to *any* class
> internals is quite a bad idea should really be sufficient, because not
> only do you run the risk of UB -- you also have the problem that
> although you hold a const-ref the class member may well mutate in the
> meantime and I don't trust no code holding a const reference to be
> prepared for that const-ref mutating.
>
> Safety first. :-)
That's a good answer. In the particular cases that caused me to raise
this issue the mutation happens not to be a possibility but that is
just lucky coincidence - the argument still holds. Cheers!
> Another approach if you have reason to worry
> about the inefficiency of copying std::string
> is to "roll your own"
I want to return by value and I am not worried about the performance
of this. The opposition is worried about performance but has no
figures or any other reason to back this up. It is just "more
efficient". I have run into this attitude many many times. It is quite
normal for people to argue for one construct over another in the name
of performance without even knowing if performance is an issue in that
area.
The view I am running into is that a function should return a const
ref to a string so that it can be assigned to a const ref to a string.
The argument is that doing this avoids string copying. I should have
made that clearer.
Whoa, stop, that's wrong! r is __not__ affected by x destruction,
because x gets destroyed __after__ r (so to speak). Try thinking
like this, instead (that's what the standard requires WRT variable
construction-destruction):
{ // x only available in this block
X x;
{ // r only available in this block
const std::string& r = x.s();
{ // s only available in this block
std::string s = x.s();
}
}
}
As you can see, it's not possible for r to be bad due to destroyed x.
What __is__ possible (and people make that mistake), is that you
somehow copy the reference (to a pointer) and keep that after
x is destroyed, e.g.
const std::string* p;
{
X x;
const std::string& r = x.s(); // affected by subsequent x destruction
p = &r; // Whoops! p now points into x
std::string s = x.s(); // not affected by subsequent x destruction
}
// x is gone: dereferencing p here is undefined behavior
Goran.
Code changes. Consider that in version 1 of the code, you have:
class x
{
const std::string& getString() { return _string; }
std::string _string;
};
Version 2:
class x
{
const std::string& getString() { return calculate_this_baby(); }
// Whoops! returning a reference to a temporary.
};
Consequence of this is that in version 2 you have to go through all
invocations and verify that you actually always assign to a string,
not to a reference, ever again. Innocuous calls like:
void f(const string& s);
...
x xInstance;
f(xInstance.getString());
are a bitch.
But compilers warn about returning a ref to temporary.
Also, depending on string implementation, copying might be inexpensive
(e.g. if it uses copy-on-write and code is single-threaded/mostly
single threaded, copying hit is very small).
All that said, I prefer to return by const ref. I am not very smart, I
guess :-).
Goran.
It is NEVER a good idea to work this way.
It is best to think of return by const reference as ALWAYS a
performance optimisation that should be treated as return by value.
If you follow this rule then nothing breaks when you switch the
implementation to return by value.
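As a sketch of that rule, here is a caller that compiles and behaves the same whichever way the getter is declared. ByRef, ByVal and upper_first are hypothetical names used only for this illustration.

```cpp
#include <string>

// Two hypothetical versions of the same class: one getter returns a
// const reference, the other a value.
struct ByRef {
    std::string label_;
    const std::string& label() const { return label_; }
};

struct ByVal {
    std::string label_;
    std::string label() const { return label_; }
};

// Caller written to the "treat it as a value" rule: it copies the result
// into its own std::string and never keeps a reference, so switching the
// getter between the two forms cannot break it.
template <typename T>
std::string upper_first(const T& obj) {
    std::string s = obj.label();
    if (!s.empty() && s[0] >= 'a' && s[0] <= 'z') {
        s[0] = static_cast<char>(s[0] - 'a' + 'A');
    }
    return s;
}
```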
In fact it is often best to fight the urge to read the source and to
document your function as just returning "something that behaves like
std::string". This is the sort of way that you have to think for
template programming anyway - there are lots of bits in the standard
that refer to implementation defined types that behave like such and
such. There was even supposed to be a formal way of saying this to the
compiler called Concepts but it was dropped last month because of some
awkward edge cases that I don't pretend to understand.
If you really really insist on breaking encapsulation by allowing an
external reference to internal data then make it explicit in some way
so that people can see what horrors you are performing: e.g.
getFoo( const string*& ) or getFoo( const string** ).
But why would you want to do that? You have the object and you have an
inlined accessor (presumably).
Consider the effect of multi-threading. The string could change while
you are copying from the reference, so you have to use a lock
everywhere you use this construct. However if you return a copy of
the string, rather than a reference, you only have to lock inside the
get_string function while you make the local copy.
The problem is that performance figures are highly dependent on
implementations. We know (not speculation) that copy constructing a
string with the kind of implementation that was the norm in the 90's is
expensive (so much so that COW was considered a strong contender despite
problems with multi-threading). More recently most implementations have
switched to using the small string optimisation. That makes copying for
small strings much less expensive.
The problem with references is ensuring that we do not create a hanging
reference. The problem with copying is the potential cost of doing so.
Now the real problem comes down to programmers who want to have
universal 'rules'. I.e. they do not want to have to think. The situation
of returning a string member from a class object is quite different to
returning a non-member from a function.
Interface design is not for the lazy :)
Note that we also need to consider whether the caller may want only the
current value or to be able to track the value. That leads to me
wondering whether we should not sometimes return a const volatile reference.
The multithreading argument is not really appropriate.
If your class is designed for multithreading, i.e. std::string getFoo() const
requires a mutex in its implementation, then you will
obviously not return by reference since lexical scoping will not be
enough to make it safe even if it is immediately assigned to a local
string.
If, however, you are just using the class in a multithreaded
environment then, provided the reference is only used in a local scope
where the object itself (not just the string) is protected by a mutex
then there is no problem.
It wasn't supposed to be a literal (or complete) example. Here's a
better one:
#include <iostream>
#include <string>
#include <boost/shared_ptr.hpp>
using boost::shared_ptr;
struct X
{
std::string m_s;
X() : m_s("Wibble") {}
const std::string& s() const
{
return m_s;
}
};
int main()
{
shared_ptr<X> x(new X);
const std::string& r = x->s();
x.reset();
// BOOM
std::cout << r << '\n';
return 0;
}
Regards,
Stu
I don't like that argument. If I need to hold a string I need to hold a
string and not some reference to some internals of some object.
Note also that:
class Foo {
string s_;
public:
string get_s() { return s_; }
string const& access_s() { return s_; }
};
...
string x = obj.get_s();
string y = obj.access_s();
// Because of RVO, these two calls will copy the string exactly one
// time, so ret-by-val does not decrease performance
string const& rz = obj.access_s();
// Generally saves one heap allocation, so is faster.
// However validity of rz is tightly coupled to state and lifetime of
// obj, so the likelihood of crashes increases.
br,
Martin
Heh, you are correct, but I wouldn't be too lax about speed issues. If
it's easy to elide a copy, why not elide it?
There are pro and counter arguments, and everything is most
often a question of balance. Same here.
Goran.
Right. If you return by value, it always makes a copy. If you return
by const reference, it makes a copy when you want it to make a copy,
and doesn't make a copy when you don't want it to make a copy.
Maybe I'm still not understanding something, but by returning const
reference you get a strict superset of the functionality possible when
returning by value, with no downsides. So I don't see why you would
not want to do it.
Zach
Apart from performance (by-value can be faster as pointed out in
http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/) the
simplest argument for by-value is elegance and clarity.
string process( string name, string content );
simply reads smoother than
string const& process( string const& name, string const& content );
There is a general best-practice statement in programming that one
should avoid premature optimization. You may consider your case a good
example of this practice. First pass by value (it is clearer, safer
(dangling refs, multithreading), shorter) and if you find that
performance is insufficient, and if you measure that substituting
by-ref increases performance significantly (and it may not), only then
should you change to by-ref.
Regards,
&rzej
shared_ptr itself adds a visible overhead, and may even be slower than
returning string by value if the string has embedded reference
counting.
Regards,
&rzej
If string can be modified by another thread, that just can't work at
all. But it's kinda orthogonal to the question, because:
1. that applies to any thread-shared data. Surely we can't say
that everything, no matter how expensive to copy, must be passed out
by value.
2. returning by value has exactly the same problem if not properly
synchronized:
thread 1:
std::string s(obj.get_string());
thread 2:
obj.set_string(new_value);
Now... Imagine that thread 1 starts copying from obj's member string
into s and gets pre-empted in the middle of that. Then thread 2 kicks
in, frees existing data (data used by currently pre-empted thread 1!)
and sets a new string. That's undefined behavior right there.
So yes, the const& situation is indeed worse with multi-threading, but
only because it can't possibly be made to work.
With multithreading, the only way is:
1. lock
2. make a copy
3. unlock.
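The three steps above can be sketched as follows, assuming a C++11 standard library (std::mutex, std::lock_guard). SharedName is a hypothetical class, not something from the thread.

```cpp
#include <mutex>
#include <string>

// Hypothetical thread-safe holder: both accessors take the same mutex,
// so a reader never copies while a writer is replacing the string.
class SharedName {
public:
    std::string get() const {
        std::lock_guard<std::mutex> lock(mutex_);  // 1. lock
        return name_;                              // 2. make a copy
    }                                              // 3. unlock, after the copy

    void set(const std::string& value) {
        std::lock_guard<std::mutex> lock(mutex_);
        name_ = value;
    }

private:
    mutable std::mutex mutex_;
    std::string name_;
};
```

The return value is constructed before the lock_guard is destroyed, so the copy happens entirely under the lock. A getter returning const std::string& could not give that guarantee, because the caller would read the string after the lock had been released.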
Goran.
To clarify my original post, what I was getting at was that r would be
affected were x destroyed somehow. I didn't mean to imply that all the
variables were within one function - my bad. I guess this is why
minimal, complete code examples are recommended (because what is
effectively pseudo-code can be misleading).
Stu
> Note that we also need to consider whether the caller may want only the
> current value or to be able to track the value. That leads to me
> wondering whether we should not sometimes return a const volatile reference.
Could you elaborate on why one would need volatile?
As you well know, all that const reference means is that the intention
is not to modify the object through this alias (mutable, const_cast,
etc. notwithstanding, of course).
--
Nevin ":-)" Liber <mailto:ne...@eviloverlord.com> 773 961-1620
The downside is aliasing, which can both interfere with an optimiser and
requires programmers that use it to be much more careful. It makes it
harder to reason about the code.
The bugs caused by aliasing tend to be subtle and hard to find.
For instance, think about the algorithm behind vector::push_back(T
const& x). The obvious (pseudocode) implementation of
if (notEnoughSpace)
{
AllocateBiggerBuffer();
CopyOrMoveElements();
DestroyOldBuffer();
}
AddElementToTheBack(x);
is incorrect because of possible aliasing; think about the case of
v.push_back(v[0]);
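One way to handle that case, shown as a sketch only: copy the argument before the old buffer is destroyed. TinyVec is a hypothetical toy container written for this illustration; it is not exception-safe and says nothing about how any real std::vector is implemented.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical growable array of strings, just large enough to show the
// aliasing fix.
class TinyVec {
public:
    TinyVec() : data_(nullptr), size_(0), capacity_(0) {}
    ~TinyVec() { delete[] data_; }
    TinyVec(const TinyVec&) = delete;
    TinyVec& operator=(const TinyVec&) = delete;

    void push_back(const std::string& x) {
        if (size_ < capacity_) {
            data_[size_++] = x;  // no reallocation, so x cannot dangle
            return;
        }
        // x may refer to an element of the old buffer (v.push_back(v[0])),
        // so take a copy before that buffer is touched.
        std::string saved(x);
        const std::size_t bigger_capacity = capacity_ ? capacity_ * 2 : 1;
        std::string* bigger = new std::string[bigger_capacity];
        for (std::size_t i = 0; i < size_; ++i) {
            bigger[i] = std::move(data_[i]);
        }
        delete[] data_;  // from here on x itself may be a dangling reference
        data_ = bigger;
        capacity_ = bigger_capacity;
        data_[size_++] = std::move(saved);
    }

    const std::string& operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return size_; }

private:
    std::string* data_;
    std::size_t size_;
    std::size_t capacity_;
};
```

With the "obvious" ordering, the element would be read through x after DestroyOldBuffer() had already run.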
> So I don't see why you would
> not want to do it.
If at all possible, I'd rather avoid constructs that lead to subtle bugs
and harder to reason about code. Premature, unmeasured optimisation is
not a good reason to adopt this construct.
I'll only use it if the semantics require that I refer to the same
object or it is a *measured* performance bottleneck in my application.
Return by value is my default behavior.
--
Nevin ":-)" Liber <mailto:ne...@eviloverlord.com> 773 961-1620
> I want to return by value and I am not worried about the performance
> of this. The opposition is worried about performance but has no
> figures or any other reason to back this up.
I am not sure what you are saying. Is it:
a. I don't care if return-by-value of std::string
is N times more expensive than return-by-reference,
I still think everybody should always use
return-by-value.
or
b. I am not convinced that return-by-value
is more expensive than return-by-reference
for std::string .
If it's (a), it is no longer a technical question,
but a question of whose idea of what is important
shall rule.
Generally, those who have to pay for the additional
hardware required by less efficient code (and those
employed by them) tend to disagree with you.
If it's (b), then the answer is that, in many
situations, you're wrong. And the cost can
be quite significant. I've gotten 20-30%
speed-ups just by avoiding copying of std::string,
and that alone made the difference between being
able to stick with current hardware and having
to buy a truckload of new CPU boards.
How much more expensive return-by-value is
and how much of a difference it makes will,
of course, depend upon your C++ implementation
and your application.
We can't help you with that.
That's true, but it's all very implementation-dependent - not all
std::strings are equal and some might not implement reference counting.
shared_ptr on the other hand is always reference-counted.
Having said that, I just remembered an item from Scott Meyers' "Effective
STL" where he described std::string implementations from four leading
libraries (which ones he did not mention) and I believe all of them were
reference counted.
Andy.
Sure, it's smoother, but if parameters you pass in really are read-
only to a function, it's a bad idea. In that case, there really is
__nothing__ to gain by using pass-by-value and much to lose.
Goran.
> int main()
> {
> shared_ptr<X> x(new X);
The following line is NEVER a good idea.
I know a lot of people seem to think that a class should be written to
be idiot proof but idiots are so inventive that that is difficult - I
prefer to just not employ idiots.
> const std::string& r = x->s();
> x.reset();
>
> // BOOM
> std::cout << r << '\n';
>
> return 0;
>
> }
It is not generally true that "If you return by value, it always makes a
copy." - the reason is RVO.
In the presence of RVO, initializing a string object with
return-by-value is exactly the same as with return-by-reference (assigning
is not, however), i.e.:
// e.g. with VC8 or later *without* optimizations
Foo f;
string s1("s1");
string s2("s2");
s1 = f.get_byval(); // Needs temporary!
s2 = f.get_byref(); // No need for temporary
string s3 = f.get_byval(); // Does NOT need temporary
string s4 = f.get_byref(); // Does NOT need temporary
br,
Martin
The other problem with shared_ptr is that you then have the
possibility of dereferencing a null pointer.
The general principle is sound but this is taking it too far.
Passing by const reference is idiomatic in C++ - particularly for
strings - so it is not really much clearer.
Also the optimisation is not premature if you are writing a general
purpose function/library - If your process function MIGHT be used by
someone else passing a 10Mb string then your failure to pass by const
reference could well mean that they have to write their own function
rather than use yours.
This is why almost all templates pass by const reference.
But if we hold a const reference the compiler believes that the
underlying object will not change. By flagging it as volatile as well we
warn the compiler that the underlying object may change out of our
sight. Of course in the case of single threaded code the compiler will
assume that a const reference might change if a function is called that
has independent access to the object but in the case of multi-threaded
code the compiler (for a single TU) has no reason to expect that the
object might be modified (in a different thread)
--
> But if we hold a const reference the compiler believes that the
> underlying object will not change. By flagging it as volatile as well we
> warn the compiler that the underlying object may change out of our
> sight. Of course in the case of single threaded code the compiler will
> assume that a const reference might change if a function is called that
> has independent access to the object but in the case of multi-threaded
> code the compiler (for a single TU) has no reason to expect that the
> object might be modified (in a different thread)
Volatile does not solve the multithreaded issue; you have to synchronize
access to the object. Are there any compilers out there which
incorrectly make assumptions about their own synchronization primitives
such that they assume the underlying object hasn't changed across a
synchronization boundary?
And const volatile references make things a whole lot messier. Since
copy constructors take their parameter by const non-volatile reference,
the following code won't compile:
std::string s("Hello, world!");
std::string const volatile & scvr(s);
std::string copys(scvr); // error
--
Nevin ":-)" Liber <mailto:ne...@eviloverlord.com> 773 961-1620
Why not have a named function approach?
Design the interface so that 2 choices are available: return by const reference
or return by value.
Within the class, both either call the same internal method or return the same internal data.
I only provide both if the return type is a heavyweight class; if it is
lightweight or a built-in type, only return by value is
provided.
You then leave it up to the programmer who uses your class to decide which
version to use.
I use the value version if the external assignee is long-lived or needs
independent existence from the class object.
I use the reference version if the external assignee is short-lived (e.g. I
might be writing the value to a log).
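A minimal sketch of the two-accessor idea. Account, owner_ref and owner_val are hypothetical names chosen for illustration.

```cpp
#include <string>

// Hypothetical class offering both named accessors over the same member.
class Account {
public:
    explicit Account(const std::string& owner) : owner_(owner) {}

    // For short-lived use (e.g. logging). Valid only while this Account
    // lives, and it reflects any later rename().
    const std::string& owner_ref() const { return owner_; }

    // For callers that need an independent, long-lived copy.
    std::string owner_val() const { return owner_; }

    void rename(const std::string& owner) { owner_ = owner; }

private:
    std::string owner_;
};
```

Spelling the choice out in the function name means the call site shows which lifetime the caller signed up for.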
Stephen Howe
--
Ok, I'll bite and risk looking like an idiot :) Why do you think it's
*never* a good idea?
As to class design - I agree with you (up to a point). A true idiot will
find a way to break anything :) I'd still program defensively to guard
against the most likely client mistakes, though.
Stu
>On 2 Dec, 22:00, Zachary Turner <divisorthe...@gmail.com> wrote:
>> On Dec 2, 7:06 am, Andrew <marlow.and...@googlemail.com> wrote:
>>
>> Maybe I'm overlooking something, but I see little reason to return by
>> value. What matters is whether you assign it to a const reference or
>> to a value.
>
>The view I am running into is that a function should return a const
>ref to a string so that it can be assigned to a const ref to a string.
>The argument is that doing this avoids string copying. I should have
>made that clearer.
Why can't you provide both and leave it up to the programmer who uses your class?
Stephen Howe
> The general principle is sound but this is taking it too far.
> Passing by const reference is idiomatic in C++ - particularly for
> strings - so it is not really much clearer.
>
> Also the optimisation is not premature if you are writing a general
> purpose function/library - If your process function MIGHT be used by
> someone else passing a 10Mb string then your failure to pass by const
> reference could well mean that they have to write their own function
> rather than use yours.
> This is why almost all templates pass by const reference.
Hi,
I tend to agree with you. Obviously, I also pass by-ref, but when the
objects are not meant to be values, like polymorphic classes, or when
the structure of the code is so complicated that I cannot tell what
the consequence of passing by-val would be.
However, I find it appropriate to provide a list of arguments that one
should consider before applying the otherwise good practice of passing
by-ref. This is mostly the arguments already given in this topic:
(1). There is a family of cases where passing by-value is faster than
passing by-ref. This is somewhat counter to intuition, but still true.
The link to the explanation has already been provided a couple of
times.
(2). With the addition of rvalue-references (some compilers implement
it, some libraries emulate it), the family of cases where by-val is
faster increases.
(3). Passing by-val automatically protects against concurrent access
in multi-threaded programs.
(4). Passing by-val automatically protects against advertent or
inadvertent aliasing (even in single-threaded programs).
(5). Passing "by value" doesn't mean passing "by copy". The compiler
simply guarantees that if the called function modifies the parameter,
it is not reflected in the argument provided by the caller. The
compiler may use the copy to provide this guarantee, but it may as
well not. After all the compilers are designed to provide as good
binaries as possible.
(6). Passing by const-ref has its original meaning, already forgotten
due to this optimization technique. It used to mean that there is an
external object that we want to observe, that may change while we are
observing it, and that we do not intend to modify ourselves; e.g
listening on the port:
void observe( Port const& port ) {
while( ... ) {
if( port.value() != prev_value ) {
prev_value = port.value();
process( prev_value );
}
}
}
(7). The following call std::max("me", "you"); doesn't compile due to
the arguments being references rather than values. I know it might
have little sense if it compiled, but just for the argument's sake.
(8). In generic templates we lose little if we pass arguments by value
as we can always use boost::ref (and now also std::tr1::ref, I think)
to force by-ref.
(9). Once I learnt that, I started passing strings by-val in the
commercial code I am working with. The strings in my application are
short: no longer than 100 chars, and most often around 12 chars. I
observed no slowdown whatsoever.
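Points (1) and (2) can be illustrated with a sketch that counts copies, assuming a compiler with C++11 rvalue references. Tracked, Holder and the two helper functions are hypothetical names invented here; Tracked stands in for an expensive-to-copy type such as a long string.

```cpp
#include <utility>

// Hypothetical instrumented type that counts how often it is copied.
struct Tracked {
    static int copies;
    Tracked() {}
    Tracked(const Tracked&) { ++copies; }
    Tracked(Tracked&&) noexcept {}
    Tracked& operator=(const Tracked&) { ++copies; return *this; }
    Tracked& operator=(Tracked&&) noexcept { return *this; }
};
int Tracked::copies = 0;

struct Holder {
    Tracked value;
    // Parameter taken by value, then moved into place.
    void set(Tracked v) { value = std::move(v); }
};

// Number of copies made when the argument is a named object (lvalue).
inline int copies_for_lvalue() {
    Holder h;
    Tracked t;
    Tracked::copies = 0;
    h.set(t);
    return Tracked::copies;
}

// Number of copies made when the argument is a temporary (rvalue).
inline int copies_for_rvalue() {
    Holder h;
    Tracked::copies = 0;
    h.set(Tracked());
    return Tracked::copies;
}
```

A set(const Tracked&) that assigns from its parameter would copy once in both cases. The by-value version copies once for a named argument and not at all for a temporary, which is the family of cases where by-val comes out ahead.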
Regards,
&rzej
Firstly because 90% of C++ programmers WILL return a member string by
const reference and it is bad to keep that reference for all the
reasons already covered.
Secondly because an implementation is most flexible if the only
assumptions that it makes about its inputs are those required to implement
its defined behaviour. This allows the maximum amount of change to code
that you use without breaking anything. This is particularly true of
template code where you want the template to be usable for any type
that behaves in a certain way.
In this case the most generally useful description of s() would be that
it returns something that is convertible to a std::string.
If we were working with a lot of legacy code then it might even
initially return const char* and only later be upgraded to string or
const string&. Indeed even for non-legacy code I have sometimes had
cause to write functions that can only return a fixed set of strings
and in such a case the simplest and, incidentally, most efficient
implementation is to return const char*.
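A sketch of the fixed-set case, with a hypothetical Colour enum and colour_name function. The describe helper is likewise made up; it only shows that the result is still convertible to a std::string for callers that want one.

```cpp
#include <string>

enum Colour { Red, Green, Blue };

// Every returned pointer refers to a string literal with static storage
// duration, so it never dangles and nothing is copied or allocated.
inline const char* colour_name(Colour c) {
    switch (c) {
        case Red:   return "red";
        case Green: return "green";
        case Blue:  return "blue";
    }
    return "unknown";
}

// A caller that wants a std::string can build one from the result.
inline std::string describe(Colour c) {
    return std::string("colour: ") + colour_name(c);
}
```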
P.S. When the next release of the std comes out and becomes widely
adopted pretty much everything will be returned by reference - either
lvalue or rvalue.
> As to class design - I agree with you (up to a point). A true idiot will
> find a way to break anything :) I'd still program defensively to guard
> against the most likely client mistakes, though.
P.S. I have just thought of one very narrow usage for references
outside of function parameters:
something like:
X& x = v[a][b];
x.foo();
x.bar();
P.P.S. When the next release of the std comes out and becomes widely
adopted pretty much everything will be returned by reference - either
lvalue or rvalue (std::string&&).
Look at the following code:
#include <string>
// Forward decl
//
void print_name(const std::string& name);
//------------------------
class Person {
public:
Person(const std::string& name);
const std::string& name_ref() const;
std::string name_val() const;
};
//------------------------
void print_person_ref(const Person& p) {
print_name( p.name_ref() );
}
//------------------------
void print_person_val(const Person& p) {
print_name( p.name_val() );
}
//------------------------
In the assembly listing after the compilation with all optimizations
(gcc 4.4.1), the value function "print_person_val()" is much larger
and has many more instructions than the reference function
"print_person_ref()". This alone is not proof that return-by-const-ref
is faster, but this is likely the case. Besides, code size is
a metric too.
No, you can't move from one to the other, it has to be all or nothing.
It is VERY dangerous to mix the two. Consider this:
string get_value() const;
const string& get_something() const
{
return get_value();
}
This will compile but will fail at runtime. get_something is returning
a reference to a temporary. The compiler will warn but many developers
routinely ignore compiler warnings. I just got bitten by this bug in
the project I am working on.
-Andrew Marlow
> I tend to agree with you. Obviously, I also pass by-ref, but when the
> objects are not meant to be values
This misses the point slightly. I am not talking about objects in
general but strings in particular. They are so common and almost a
fundamental type (in some languages they are part of the language
rather than part of a library). Perhaps I should have made that
clearer at the start.
> (1). There is a family of cases where passing by-value is faster than
> passing by-ref.
I think people are getting bogged down in proving (or not) that
returning a value by-ref is 'more efficient' than by-value. I always
assumed by-ref is more efficient. My argument is that returning
by-value, although technically slightly less efficient, is much better
generally for several reasons, some of which have come out on this
thread. My argument is that the benefits, which include code clarity,
outweigh any slight performance cost there might be. The cost would
have to be proved by performance figures to actually matter in a
particular application, for me to change those bits from return
by-value to return by-ref in order to get an important speed-up.
-Andrew M.
>(3). Passing by-val automatically protects against concurrent access
>in multi-threaded programs.
Be careful ... this isn't true in general unless the value can always
be accessed atomically. Values passed in registers are *usually*
safe, but multiple word objects passed in memory are not safe without
locks. Even register sized values may be a problem if they are larger
than the bus width (doubles on some 32-bit chips) or can straddle
cache lines (e.g., misaligned doubles on x86-32).
Where multiple threads are concerned you can never assume safety
unless the value can be transferred in a single bus operation.
Take for example, two 32-bit CPUs that each have double precision
floating point. One has a 64-bit bus, the other has a 32-bit bus.
Let's say for both, the double precision load is atomic - no other
thread on the CPU can interfere. But what about a thread on another
CPU?
The chip with the 32-bit bus can't lock a 64-bit value, it needs 2 bus
transfers. Might another CPU slip in? Depends on whether the CPU can
hold onto the bus for 2 consecutive transfers. What about the chip
with the 64-bit bus? It can lock the whole value if it is aligned
properly, but maybe it requires 2 transfers for a misaligned value (if
it allows such). And what happens if the value straddles 2 cache
lines? Are both lines locked simultaneously or are they locked
sequentially as they are accessed?
Also keep in mind that even if the value does fit in a register and
can be moved atomically - it might not be. Let's say that
hypothetical CPU with the 32-bit bus wants to pass a double on the
stack and the value isn't already in a register. The compiler might
choose to move it word by word using the ALU instead of doing an FPU
load/store (maybe to allow a concurrent FPU operation). Even on a
single CPU, that might allow another thread to sneak in and corrupt
the value.
And don't necessarily assume that things get better just for going to
64-bit chips. I don't know offhand of any 64-bit CPUs having 128-bit
wide memory buses (anyone?), but x86 already has 128-bit SIMD
registers and several manufacturers have already committed to support
for IEEE-754(2008) 128-bit quadruple precision floating point in the
near future.
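To make the point concrete for strings: a rough sketch, assuming a C++11 compiler with std::mutex (not available in the gcc 4.4.1 era without extra flags). The class SharedName and its members are hypothetical. Returning by value only helps if the copy itself is made under a lock, because copying a std::string is certainly not a single bus operation.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical class, for illustration only.
class SharedName {
public:
    explicit SharedName(const std::string& n) : name_(n) {}

    // The returned copy is constructed while the lock is still held, so a
    // concurrent set_name() cannot interleave with the copy.
    std::string name() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return name_;
    }

    void set_name(const std::string& n) {
        std::lock_guard<std::mutex> lock(mutex_);
        name_ = n;
    }

private:
    mutable std::mutex mutex_;  // mutable so const getters can lock it
    std::string name_;
};

// Single-threaded usage check: the caller's copy is a snapshot that later
// writes do not affect.
void demo_snapshot() {
    SharedName s("before");
    std::string snapshot = s.name();
    s.set_name("after");
    assert(snapshot == "before");
    assert(s.name() == "after");
}
```

Returning a const reference from name() could not be protected this way, since the lock would be released before the caller ever reads the string.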
>(5). Passing "by value" doesn't mean passing "by copy". The compiler
>simply guarantees that if the called function modifies the parameter,
>it is not reflected in the argument provided by the caller. The
>compiler may use the copy to provide this guarantee, but it may as
>well not. After all the compilers are designed to provide as good
>binaries as possible.
That's a mighty fine pin your angels are dancing on. Other than
copying there isn't any practical way of guaranteeing separation.
Unless the language specifically disallows it, a compiler could choose
to pass a read-only large object by reference instead of by value, but
to guarantee the object is read-only, the compiler needs to be able to
analyze the whole callee chain. If the object is passed into (or in a
language with nested functions, accessible by) any opaque code - such
as a library function - the compiler really has no choice but to make
a copy - though a good compiler will delay making the copy until
absolutely necessary.
George
> No, you can't move from one to the other, it has to be all or nothing.
> It is VERY dangerous to mix the two. Consider this:
>
> string get_value() const;
> const string& get_something() const
> {
> return get_value();
> }
>
> This will compile but will fail at runtime. get_something is returning
> a reference to a temporary. The compiler will warn but many developers
> routinely ignore compiler warnings. I just got bitten by this bug in
> the project I am working on.
I was not proposing that. What I was intending was:
std::string get_value() const {return mystring;}
std::string const & get_ref() const {return mystring;}
Or even:
std::string const & get_ref() const {return mystring;}
std::string get_value() const {return get_ref();}
I think this thread is beginning to lose sight of my original
statement: it is the pros and cons of RETURNING a const ref to a
string. The discussion seems to be degenerating into the pros and cons
of passing objects (not just strings) by value instead of by const-
ref. That's NOT what I am talking about. Please, let's keep to the
discussion of the pros and cons of RETURNING const-ref to STRING
instead of returning string by value. Thanks.
In the signature for process, above, the two alternatives relevant to
this discussion are:
string process (const string& name, const string& content);
versus
const string& process (const string& name, const string& content);
-Andrew Marlow
> Andrew wrote:
>> When I write a method that returns a string I always return the string
>> by value. This is as opposed to returning a const reference to the
>> string held as a private data member in the object. Doing it my way
>> means that when the object goes out of scope, my string is still
>> valid. Doing it the other way means you HAVE to keep the object around
>> for as long as you have a reference to the string.
>>
>> Can anyone think of any other reasons to prefer returning a string by
>> value. I am encountering some opposition to this, mainly in the name
>> of performance. The performance has not been measured (of course) but
>> this is often the case with 'performance' arguments. Unfortunately,
>> "show me the figures" cuts no ice.
>>
>> Regards,
>>
>> Andrew Marlow
>>
>
> ..Certainly where performance is an issue, repeated copying of strings
> can cause a measurable loss in performance. However where the library
> uses the small string optimisation (no use of the heap) the loss is
> pretty small. And where you have an implementation that uses move
> semantics I would certainly default to using return by value and only
> move to return by reference where measurement showed a performance gain
> that was critical to the program (and with move semantics think that
> would be pretty uncommon)
>
That sounds like bad advice to me. As others pointed out, returning by
reference to const is not bad and does not open any "worm-holes". I do
that by default, and only if it matters would I return by value
(see the ScopeGuard idiom).
I think the advice to generally return-by-value is *good*. Returning
by-const-ref *only* makes sense if you are directly returning a
member of the object. On the other hand, returning by-value Just Works,
regardless of what you are returning.
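A small sketch of what I mean, with a made-up class Contact (all names here are hypothetical). The first getter hands out a member, so a const reference is at least possible; the second computes its result, so by-value is the only option that works.

```cpp
#include <cassert>
#include <string>

// Hypothetical class, for illustration only.
class Contact {
public:
    Contact(const std::string& first, const std::string& last)
        : first_(first), last_(last) {}

    // Directly returns a member: a const reference is possible here.
    const std::string& first_name() const { return first_; }

    // Computed on the fly: there is no member to refer to, so this has to
    // return by value. Returning const std::string& here would dangle.
    std::string full_name() const { return first_ + " " + last_; }

private:
    std::string first_;
    std::string last_;
};

// Small usage check for both getters.
void demo_contact() {
    Contact c("Ada", "Lovelace");
    assert(c.first_name() == "Ada");
    assert(c.full_name() == "Ada Lovelace");
}
```

If full_name() later gets cached in a member, or first_name() later becomes computed, the by-value signatures keep working unchanged while the by-ref one has to change.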
I agree that ... string const& get_ref() { return m_str; } ... is pretty
harmless, especially if you assign the result to a string object as
opposed to a reference. (As I have pointed out in another reply though,
*if* you assign to an object, then often there isn't *any* performance
gain of ret-by-ref - see RVO)
cheers,
Martin
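The RVO remark can be checked with a copy-counting stand-in for std::string. This is only a sketch: Counted and Holder are made-up names, and the counts assume the compiler elides the copy of the returned temporary (guaranteed since C++17, and done by default in mainstream compilers before that unless elision is switched off).

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for std::string that counts copy constructions.
struct Counted {
    static int copies;
    std::string text;
    explicit Counted(const std::string& t) : text(t) {}
    Counted(const Counted& other) : text(other.text) { ++copies; }
};
int Counted::copies = 0;

// Hypothetical owner exposing the same member both ways.
class Holder {
public:
    explicit Holder(const std::string& t) : m_str(t) {}
    const Counted& get_ref() const { return m_str; }
    Counted get_val() const { return m_str; }
private:
    Counted m_str;
};

// Copies made when a new object is initialised from the by-ref getter.
int copies_from_ref(const Holder& h) {
    Counted::copies = 0;
    Counted local = h.get_ref();  // one copy: member -> local
    (void)local;
    return Counted::copies;
}

// Copies made when a new object is initialised from the by-value getter.
int copies_from_val(const Holder& h) {
    Counted::copies = 0;
    Counted local = h.get_val();  // one copy: member -> return value,
    (void)local;                  // which is constructed in place as local
    return Counted::copies;
}

void demo_copy_counts() {
    Holder h("hi");
    assert(copies_from_ref(h) == copies_from_val(h));
}
```

With elision in effect both functions report a single copy, so ret-by-ref only saves something when the caller binds the result to a reference instead of an object.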