
Returning a class instance by value and a naming question


DSF

Nov 13, 2013, 6:09:08 PM
Hello,

I have a class I've been writing where I frequently need to return
the class by value so that class members can be changed without
affecting the original. The problem is all of the copying. First,
the class has to be copied into a local instance of the class. Then,
upon return, the local instance is copied into an unnamed instance
that's created before the function is called, and therefore not
destroyed when the function ends. Finally, after the function ends,
the operator= of the target is called, yet again copying the class
(unnamed). Here is an example:

FStringW FStringW::Start(size_t end) const
{
    FStringW tfs(*this);
    if(end < length)
    {
        tfs.str[end] = 0;
        tfs.length = end;
    }
    tfs.StartIP(end);
    return tfs;
}

StartIP is a member function that alters the string in place.

Start returns the first 'end' characters of a wide string.

I tried creating a "global" temp that could be used in place of each
function that used 'tfs'. It was a pointer to FStringW that was
created in each constructor and deleted in the destructor. I managed
to avoid an infinite loop in the constructor, but didn't realize it
created one in the destructor, too. I wasn't able to get around that
one. All the functions that returned FStringW returned a reference to
the global object, so the reference stayed valid only until another
function used it. That was acceptable. It would have eliminated one
copy if it worked.

So the question is: Are there any techniques for minimizing the
number of copies that occur when returning by value?

The ideal situation would be if the returned object could become the
unnamed object that's passed to operator=. But I think the compiler
would have to do that. And there's always the chance the function
could return an expression i.e. return tfs1 + tfs2.

My second question relates to the naming of member functions. Most
string-altering functions I've written have two versions, one that
returns by value and leaves the original alone, and one that alters
the string in place. For now, I'm using:

FStringW FStringW::Start(size_t end);
void FStringW::StartIP(size_t end);

But I was thinking maybe something like:

FStringW FStringW::GetStringStart(size_t end);
void FStringW::StringStart(size_t end);

Something like Crop or Cut StringEnd is more descriptive, but uses
start and end as terms for functions that do very similar things. Any
ideas?

Thanks.
"'Later' is the beginning of what's not to be."
D.S. Fiscus

Jorgen Grahn

Nov 13, 2013, 7:17:25 PM
On Wed, 2013-11-13, DSF wrote:
> Hello,
>
> I have a class I've been writing where I frequently need to return
> the class by value so that class members can be changed without
> affecting the original. The problem is all of the copying.
[...]

Are you saying that doing it the naive, straightforward way causes
performance problems in your application? Because otherwise I'd stop
worrying and just do it that way.

(I'm asking because I myself have an unfortunate tendency to
micro-optimise things, so perhaps others do, too.)

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Paavo Helde

Nov 14, 2013, 1:49:11 AM
DSF <nota...@address.here> wrote in
news:g9g789tnbu5lfsjal...@4ax.com:

> Hello,
>
> I have a class I've been writing where I frequently need to return
> the class by value so that class members can be changed without
> affecting the original. The problem is all of the copying.

How do you know? Have you profiled the code and seen this is a
bottleneck?

> First,
> the class has to be copied into a local instance of the class. Then,
> upon return, the local instance is copied into an unnamed instance
> that's created before the function is called, and therefore not
> destroyed when the function ends. Finally, after the function ends,
> the operator= of the target is called, yet again copying the class
> (unnamed). Here is an example:
>
> FStringW FStringW::Start(size_t end) const
> {
> FStringW tfs(*this);
> if(end < length)
> {
> tfs.str[end] = 0;
> tfs.length = end;
> }
> tfs.StartIP(end);
> return tfs;
> }
>
> StartIP is a member function that alters the string in place.
>
> It returns 'end' number of characters of the start of a wide string.
>
> I tried creating a "global" temp that could be used in place of each
> function that used 'tfs'.

Whatever you do, don't do this. Global (non-const) variables are evil,
they complicate the data flow, make the optimizations harder and will
create lots of troubles in recursive functions or if you ever want to go
multithreaded.

> It was a pointer to FStringW that was
> created in each constructor and deleted in the destructor. I managed
> to avoid an infinite loop in the constructor, but didn't realize it
> created one in the destructor, too. I wasn't able to get around that
> one. All the functions that returned FStringW returned a reference to
> the global object, so the reference had the lifetime of until another
> function uses it. That was acceptable. It would have eliminated one
> copy if it worked.
>
> So the question is: Are there any techniques for minimizing the
> number of copies that occur when returning by value?

This is mostly a job for the optimizer. In your example, the return value
optimization (RVO) might be used by the compiler, eliminating one copy
without any work on your part.

Writing simple and straightforward code often helps the optimizer to do
its work better.

If the profiler still tells you that there is a bottleneck in copy, then
you have to do something about it. One way would be to keep unchanging
parts of the object in shared use, by adding some reference counters and
so on, but this may also cause troubles in multithreading and is actually
not guaranteed to speed things up.

A better way to increase performance is to review your algorithms. The
above method looks a lot like substr(), but still copies the whole string
even if only a tiny portion is going to be used later. It should allocate
space for and copy over only the needed amount of characters.

BTW, what is the reason you are using your own string class and not
something based on std::basic_string?

>
> The ideal situation would be if the returned object could become the
> unnamed object that's passed to operator=. But I think the compiler
> would have to do that. And there's always the chance the function
> could return an expression i.e. return tfs1 + tfs2.

> My second question relates to the naming of member functions. Most
> string-altering functions I've written have two versions, one that
> returns by value and leaves the original alone, and one that alters
> the string in place. For now, I'm using:
>
> FStringW FStringW::Start(size_t end);
> void FStringW::StartIP(size_t end);

The first one should be 'const', this also helps a bit to distinguish it.

For names, I have sometimes used a convention like Foo and ApplyFoo.

But in your example, naming the functions something like Left() and
Truncate() would be much better than Start() and StartIP() ;-) And even
better would be to get rid of your class and use std::wstring or
something else based on std::basic_string instead. Selecting good names
is important for shortening the learning curve of any reader (including
yourself in the future), with std::wstring the learning curve ought to be
zero.

hth
Paavo

Alf P. Steinbach

Nov 14, 2013, 8:44:18 AM
On 14.11.2013 00:09, DSF wrote:
>
> I have a class I've been writing where I frequently need to return
> the class by value so that class members can be changed without
> affecting the original. The problem is all of the copying. First,
> the class has to be copied into a local instance of the class. Then,
> upon return, the local instance is copied into an unnamed instance
> that's created before the function is called, and therefore not
> destroyed when the function ends. Finally, after the function ends,
> the operator= of the target is called, yet again copying the class
> (unnamed).

Most of that copying is eliminated by the compiler.

However, for a simple and naively implemented string or array the
compiler can't get rid of the underlying O(n) element copying, since for
such a class that's the definition of the operation.

The programmer, in control of the class design, can do far better.

In your case, returning and assigning a leftmost substring, the whole
thing -- substring operation, returning, assignment -- can be done
in constant time, O(k) (also known as O(1)), and a very efficient
constant time at that, mainly[1] at the cost of making conversion to
zero-terminated string for the worst case an O(n) operation.

However, I gather you're doing this exercise for learning, and in that
case the O(k) design and implementation may be beyond your current level
of expertise, too hard to tackle. So don't fret about it. But it's worth
knowing that it's THERE, that it's obtainable, and that it's therefore
something to aim for (later) and something to compare against.

What you CAN do right now is to MEASURE.

MEASURE.

And compare to measurement of roughly the same when using e.g. std::string.

If things are too slow you can then, among other things, try to
implement a C++ "move constructor". Or a C++03 equivalent. Just ask
here, but do measure first!


Cheers, & hth.,

- Alf

Notes:
[1] Another possible cost is that the strings then become immutable. But
I see that mainly as an advantage, not a cost. It certainly both speeds
up and simplifies things, in general.

DSF

Nov 14, 2013, 7:45:41 PM
On 14 Nov 2013 00:31:03 GMT, r...@zedat.fu-berlin.de (Stefan Ram)
wrote:

Please read *all* my answers before replying. (Or at least the last
one. It explains the context of my answers.)

>DSF <nota...@address.here> writes:
>> I have a class I've been writing where I frequently need to return
>>the class by value so that class members can be changed without
>>affecting the original.
>
> What you return is called "an instance of the class"
> or "an object of the class". It's not the class.

I used the term "instance" everywhere else, no need to nit pick.

>>upon return, the local instance is copied into an unnamed instance
>
> Return values are rvalues, they are moved, not copied.

That's an issue of semantics. A copy followed by a delete is often
called a move, but there's still a copy involved.
Return by value:
Copy local object to unnamed object created before function call.
Call local object's destructor.
Return from function.
Copy unnamed object to target.
Call unnamed object's destructor.

Call it two "moves," but there's still two copies made.

>
>> So the question is: Are there any techniques for minimizing the
>>number of copies that occur when returning by value?
>
> Move members (move constructors and move assignments) are
> already generated by the compiler, unless you defined one of
> them or copy members or destructors.

I've never heard of a "Move Constructor." A "Copy Constructor,"
yes. Same with assignments.

> I think, you just have to take care to mark non-obvious
> rvalues with ::std::move and to have handles to resources
> as members (not large members).

What would be a "non-obvious rvalue"?



I looked up std::move. I see it's new to C++11. This probably
applies to your two previous answers.

Usenet text can convey the wrong attitude, so please take this in
the friendly manner it's meant. If you are dealing with relatively new
terms, state their context. As in:

In C++11, return values are rvalues, they are moved, not copied.

And please do not assume that everyone is using the latest
technology.

DSF

Nov 14, 2013, 8:34:49 PM
On 14 Nov 2013 00:17:25 GMT, Jorgen Grahn <grahn...@snipabacken.se>
wrote:

>On Wed, 2013-11-13, DSF wrote:
>> Hello,
>>
>> I have a class I've been writing where I frequently need to return
>> the class by value so that class members can be changed without
>> affecting the original. The problem is all of the copying.
>[...]
>
>Are you saying that doing it the naive, straightforward way causes
>performance problems in your application? Because otherwise I'd stop
>worrying and just do it that way.

I assume you meant "native?" It fits into the sentence better than
"naive."
>
>(I'm asking because I myself have an unfortunate tendency to
>micro-optimise things, so perhaps others do, too.)
>
>/Jorgen

I was raised (programming-wise) at a time when every byte and every
clock cycle counted. So I tend to still operate that way, whether
it's necessary or not.

In this case, if I had an array of 100,000 FStringWs that needed one
of my string manipulations done to each member and saved in another
array, it would be faster to duplicate the entire array and then use
the "in place" version on each of the destination elements. Which
leads me to consider if I even need the versions that return by value.
One can always create an instance of a class if one doesn't want to
alter the original. Typing aloud here (Same as thinking aloud) we
would have:

FStringW fs1, fs2;
fs1 = L"I don't want to be changed";
fs2 = fs1;
fs2.StartIP(18);

Vs.

FStringW fs1, fs2;
fs1 = L"I don't want to be changed";
fs2 = fs1.Start(18);

Not counting the initial assignment to fs1, the first example copies
the string once, the second example copies the string three times.

I set up a timed loop for each of the two examples. The first
example averaged three times faster than the second.

I'm leaning toward eliminating the functions that return FStringW by
value. Creating a copy and altering it is much faster, even if it
does require a little more typing.

Ian Collins

Nov 14, 2013, 9:44:52 PM
DSF wrote:
> On 14 Nov 2013 00:31:03 GMT, r...@zedat.fu-berlin.de (Stefan Ram)
> wrote:
>>
>> Return values are rvalues, they are moved, not copied.
>
> That's an issue of semantics. A copy followed by a delete is often
> called a move, but there's still a copy involved.
> Return by value:
> Copy local object to unnamed object created before function call.
> Call local object's destructor.
> Return from function.
> Copy unnamed object to target.
> Call unnamed object's destructor.
>
> Call it two "moves," but there's still two copies made.

Are you familiar with the concept of Return Value Optimisation? If not,
look it up. In most situations, what you think may be an expensive
operation, such as returning a container by value, isn't.

--
Ian Collins

DSF

Nov 15, 2013, 12:27:41 AM
On Thu, 14 Nov 2013 00:49:11 -0600, Paavo Helde
<myfir...@osa.pri.ee> wrote:

>DSF <nota...@address.here> wrote in
>news:g9g789tnbu5lfsjal...@4ax.com:
>
>> Hello,
>>
>> I have a class I've been writing where I frequently need to return
>> the class by value so that class members can be changed without
>> affecting the original. The problem is all of the copying.
>
>How do you know? Have you profiled the code and seen this is a
>bottleneck?

Making a copy of the original string object and altering the copy
takes one third the time of the return by value method. The majority
of my coding involves string manipulation, so it will make a
difference. (See my response to Jorgen Grahn for details.)

{snipped my example}
>>
>> I tried creating a "global" temp that could be used in place of each
>> function that used 'tfs'.
>
>Whatever you do, don't do this. Global (non-const) variables are evil,
>they complicate the data flow, make the optimizations harder and will
>create lots of troubles in recursive functions or if you ever want to go
>multithreaded.

In the first place *nothing* in C/C++ programming is evil! Arguably,
the worst that C/C++ has(had) to offer was gets(). And I wouldn't
call it evil. Poor design? Yes. Written that way because perhaps
back at that time an 80-column terminal also had a one line input
limit? Maybe. Could be interesting research if I have some time
sometime.

Note that "global" is in quotations. It wasn't literally a global,
it was merely a class member so it would persist when member function
Start (or any other return by value function) returned. It would be
"global" storage for return by value functions for that particular
instance of the class that would persist until another RBV function is
called.

One could think of class member variables as local to the class but
global to all its members. One thing I like about C++ is that one can
have variables common to a group of functions without having to pass
them around all of the time or make them global.

I had written a database in C, and while there was nothing wrong
with it, I disliked the fact that I had to pass a pointer to a
structure to almost every function. So I re-wrote it in C++ and the
structure's members became members of the class. Quite a few of the
functions no longer required any parameters.

As far as recursive code goes, C++ shines with it as well. My
database had two recursive functions. Each of them required some data
to be available at any depth. In C, those had to be globals. In C++,
they became members of the class, allowing more than one instance of
the database in a program. I maintained both versions for a while,
but found myself "forgetting" to update the C version more and more
often until I stopped.

A directory-traversing class for Windows I've written uses the same
method and takes it one step further. If it encounters a mount point,
it creates an instance of the class and runs it with the mount point
as the starting directory, making the whole class recursive.

{More snipping}
>>
>> So the question is: Are there any techniques for minimizing the
>> number of copies that occur when returning by value?
>
>This is mostly a job for the optimizer. In your example, the return value
>optimization (RVO) might be used by the compiler, eliminating one copy
>without any work from your part.

My compiler is several thousand years old and does very little
optimization. (There's a post of mine in clc on just how literal it
is at producing assembly code that's very close to the C code that
generated it.)


>Writing simple and straightforward code often helps the optimizer to do
>its work better.

See above.

>If the profiler still tells you that there is a bottleneck in copy, then
>you have to do something about it. One way would be to keep unchanging
>parts of the object in shared use, by adding some reference counters and
>so on, but this may also cause troubles in multithreading and is actually
>not guaranteed to speed things up.

The bottleneck is not in the actual copying of the string, but
rather the number of times return by value needs to copy said string.

>A better way to increase performance is to review your algorithms. The
>above method looks a lot like substr(), but still copies the whole string
>even if only a tiny portion is going to be used later. It should allocate
>space for and copy over only the needed amount of characters.

Hmmmm. I don't seem to have a substr() function in my RTL. Must've
been added in the last 500 years or so. :o) The string class that
comes with the compiler (not part of the STL) does have a SubString
that takes a starting position and length.


>BTW, what is the reason you are using your own string class and not
>something based on std::basic_string?

I learn best by doing. I can read about how to do something over
and over and not get it, yet experiment with it and get the concept in
one or two attempts. My main interest is string manipulation, whilst
books concentrate on mathematical examples. So what better way to
learn the ins and outs of an object than to create a string class?
I've put a lot of time and energy into the string class and I'm
currently working on wrapping the 40 or so C string functions I've
written into the string class. As I mentioned in another posting in
this thread, making a copy of the string and performing the alteration
on the copy is ~three times faster than return by value. So I'm
seriously considering dropping the return by value versions and using
only the in place ones.

Oh! And when I started working on the string class, I hadn't heard
of the STL.
>>
>> The ideal situation would be if the returned object could become the
>> unnamed object that's passed to operator=. But I think the compiler
>> would have to do that. And there's always the chance the function
>> could return an expression i.e. return tfs1 + tfs2.
>
>> My second question relates to the naming of member functions. Most
>> string-altering functions I've written have two versions, one that
>> returns by value and leaves the original alone, and one that alters
>> the string in place. For now, I'm using:
>>
>> FStringW FStringW::Start(size_t end);
>> void FStringW::StartIP(size_t end);
>
>The first one should be 'const', this also helps a bit to distinguish it.

It actually is. No typing. Strictly C&P from the header file:

FStringW Start(size_t end) const;

>For names, I have sometimes used a convention like Foo and ApplyFoo.
>
>But in your example, naming the functions something like Left() and
>Truncate() would be much better than Start() and StartIP() ;-) And even
>better would be to get rid of your class and use std::wstring or
>something else based on std::basic_string instead. Selecting good names
>is important for shortening the learning curve of any reader (including
>yourself in the future), with std::wstring the learning curve ought to be
>zero.

Not much chance of me getting rid of my class. I've put too much
time and energy into it. Not to mention that doing so would require a
rewrite of every program I've written that I'm actively maintaining.

Interestingly, Start was originally called Left, but I changed it
because it sounded too "BASICy". Speaking of selecting good names, I
find quite a few of the names in the STL counterintuitive. You "add"
an item to a vector with "push_back." Why not "add"?

And speaking of "vector," is the definition of the word as it's used
here unique to C++ or programming languages in general? Every
definition I can find relates to the position of one object from
another based on the length of a line drawn between the two objects
and the angle of that line from a baseline drawn through the first
object.

Paavo Helde

Nov 15, 2013, 1:55:16 AM
DSF <nota...@address.here> wrote in
news:vv6b89p1ampg8sqgg...@4ax.com:

> On Thu, 14 Nov 2013 00:49:11 -0600, Paavo Helde
> <myfir...@osa.pri.ee> wrote:
>
> Note that "global" is in quotations. It wasn't literally a global,
> it was merely a class member so it would persist when member function
> Start (or any other return by value function) returned. It would be
> "global" storage for return by value functions for that particular
> instance of the class that would persist until another RBV function is
> called.

OK, maybe I misunderstood you. Still, this design seems a bit fragile and
complicated.

[snipping long praise to OOP]

> My compiler is several thousand years old and does very little
> optimization. (There's a post of mine in clc on just how literal it
> is at producing assembly code that's very close to the C code that
> generated it.)

So it seems it's high time to ditch it. I do not follow c.l.c so I don't
know the reasons you are sticking to it, but I hope they are good.

>>BTW, what is the reason you are using your own string class and not
>>something based on std::basic_string?
>
> I learn best by doing. I can read about how to do something over
> and over and not get it, yet experiment with it and get the concept in
> one or two attempts. My main interest is string manipulation, whilst
> books concentrate on mathematical examples. So what better way to
> learn the ins and outs of an object than to create a string class?
> I've put a lot of time and energy into the string class and I'm
> currently working on wrapping the 40 or so C string functions I've
> written into the string class. As I mentioned in another posting in
> this thread, making a copy of the string and performing the alteration
> on the copy is ~three times faster than return by value. So I'm
> seriously considering dropping the return by value versions and using
> only the in place ones.
>
> Oh! And when I started working on the string class, I hadn't heard
> of the STL.

So, are you learning how to write a string class, or how to write your
application? If the former, then you certainly should study the design of
std::basic_string. While the interface is monstrous, it actually provides
the functions you seem to need. An example: instead of your

FStringW a, b;
a = ...
b = a.Start(18); // many copies of the whole string and substring (?)

with standard C++ you could write:

std::wstring a, b;
a = ...
b.assign(a, 0, 18); // 1 copy of substring

This is a bit more verbose than return by value, but has the benefit that
no optimizing skills are required from the compiler.

The assign method is documented e.g. at
http://www.cplusplus.com/reference/string/basic_string/assign/

> Not much chance of me getting rid of my class. I've put too much
> time and energy into it. Not to mention that doing so would require a
> rewrite of every program I've written that I'm actively maintaining.

So add a proper assign() method to it and use that instead of assignment
(which requires temporary objects and a good optimizing compiler).

>
> Interestingly, Start was originally called Left, but I changed it
> because it sounded too "BASICy".

Oh, so your goal seems to be to obfuscate the code ;-) There are many better
ways to do that in C++, for example, you could overload some operators
like + or - to return substrings - much more fun! But "Start" is a good
start, considering that nothing seems to be started here.

> Speaking of selecting good names, I
> find quite a few of the names in the STL counterintuitive. You "add"
> an item to a vector with "push_back." Why not "add"?

Because "add" is too vague. It could be easily misinterpreted to add a
number to each element of the vector, for example.

>
> And speaking of "vector," is the definition of the word as it's used
> here unique to C++ or programming languages in general? Every

It's for programming in general.

> definition I can find relates to the position of one object from
> another based on the length of a line drawn between the two objects
> and the angle of that line from a baseline drawn through the first
> object.

This is the physics/math term. Basically, it's the same (once one has
invented Cartesian coordinates), only the number of dimensions is
typically much more in programming. But there are also e.g. Hilbert
spaces with infinite number of dimensions, meaning the vectors in such
space would have infinitely many elements.

There is also "vector of attack" which is a bit different again.

But are you aware that the word "string" has even more meanings? From
superstring theory in theoretical physics, to violins and underwear?

Cheers
Paavo

Jorgen Grahn

Nov 15, 2013, 9:58:56 AM
On Fri, 2013-11-15, DSF wrote:
> On 14 Nov 2013 00:17:25 GMT, Jorgen Grahn <grahn...@snipabacken.se>
> wrote:
>
>>On Wed, 2013-11-13, DSF wrote:
>>> Hello,
>>>
>>> I have a class I've been writing where I frequently need to return
>>> the class by value so that class members can be changed without
>>> affecting the original. The problem is all of the copying.
>>[...]
>>
>>Are you saying that doing it the naive, straightforward way causes
>>performance problems in your application? Because otherwise I'd stop
>>worrying and just do it that way.
>
> I assume you meant "native?" It fits into the sentence better than
> "naive."

No, I meant naive. Perhaps "obvious" is a better word. More
specifically, I meant if you need a modifiable copy of some object,
the obvious first choice is by copy constructor or operator= (), just
like for ints and plain old structs.

>>(I'm asking because I myself have an unfortunate tendency to
>>micro-optimise things, so perhaps others do, too.)
>>
>>/Jorgen
>
> I was raised (programming-wise) at a time when every byte and every
> clock cycle counted. So I tend to still operate that way, whether
> it's necessary or not.

Same here! But I try to improve. I find it helps if I mentally split
the code in time-critical and not critical parts. Then I can keep
obsessing over details in the smaller "critical" part, and just try to
be clear in the other one. It's like being an alcoholic, but only being
drunk one day a week.

(Not that clarity and speed are always mutually exclusive.)

> In this case, if I had an array of 100,000 FStringWs that needed one
> of my string manipulations done to each member and saved in another
> array, it would be faster to duplicate the entire array and then use
> the "in place" version on each of the destination elements. Which
> leads me to consider if I even need the versions that return by value.
> One can always create an instance of a class if one doesn't want to
> alter the original. Typing aloud here (Same as thinking aloud) we
> would have:
>
> FStringW fs1, fs2;
> fs1 = L"I don't want to be changed";
> fs2 = fs1;
> fs2.StartIP(18);
>
> Vs.
>
> FStringW fs1, fs2;
> fs1 = L"I don't want to be changed";
> fs2 = fs1.Start(18);
>
> Not counting the initial assignment to fs1, the first example copies
> the string once, the second example copies the string three times.
>
> I set up a timed loop for each of the two examples. The first
> example averaged three times faster than the second.

Yes, but does it make a difference in the program as a whole,
when it's in normal use? That's a more interesting question IMO.

> I'm leaning toward eliminating the functions that return FStringW by
> value. Creating a copy and altering it is much faster, even if it
> does require a little more typing.
