Techniques to avoid minimal scope inefficiency with complex objects in loops in C++?

Martin B.

unread,

Jun 4, 2012, 9:43:42 PM6/4/12

to

Hello,

I'm currently wondering whether there's some way to avoid a certain kind
of inefficiency with "clean" code, namely that declaring complex objects
with minimal scope, where the scope is a loop, will inevitably lead to
unneccessary resource re-allocation. (See example below.)

Maybe someone here has another idea in addition to the answers already
provided on SO.

cheers,
Martin

+ + + +

Reference: [Techniques to avoid minimal scope inefficiency with complex
objects in loops in C++?][http://stackoverflow.com/q/10769289/321013]

+ + + +

# Question First

Is there an elegant solution in C++ to prevent one from having to
declare complex object variables that are only used within a loop
outside of the loop for efficiency reasons?

# Detailed explanation

A colleague has raised an interesting point wrt. to our code policy,
which states (paraphrased): *always use minimal scope for variables and
declare the variable at the first initialization*.

Coding Guide Example:

// [A] DO THIS
void f() {
...
for (int i=0; i!=n; ++i) {
const double x = calculate_x(i);
set_squares(i, x*x);
}
...
}

// [B] DON'T do this:
void f() {
int i;
int n;
double x;
...
for (i=0; i!=n; ++i) {
x = calculate_x(i);
set_squares(i, x*x);
}
...
}

This is all nice and well, and there's certainly nothing wrong with
this, until you move from primitive types to objects. (for a *certain
kind of interface*)

Example:

// [C]
void fs() {
...
for (int i=0; i!=n; ++i) {
string s;
get_text(i, s); // void get_text(int, string&);
to_lower(s);
set_lower_text(i, s);
}
...
}

Here, the string s will be destructed, it's memory released every loop
cycle and then every cycle the `get_text` function will have to newly
allocate the memory for the s buffer.

It would be clearly more efficient to write:

// [D]
string s;
for (int i=0; i!=n; ++i) {
get_text(i, s); // void get_text(int, string&);
to_lower(s);
set_lower_text(i, s);
}

as now the allocated memory in the s buffer will be preserved between
loop runs and it is very likely that we'll save on allocations.

*Disclaimer:* **Please note:** Since this is loops and we're talking
memory allocations, I do **not** consider it **premature optimization**
to think about this problem generally. Certainly there are cases and
loops where the overhead wouldn't matter; but `n` has the nagging
tendency to be larger that the Dev initially expects and the code has
the nagging tendency to be run in contexts where performance *does* matter.

Anyway, so now the more efficient way for the "general" loop construct
is to violate code locality and declare complex objects out of place,
"just in case". This makes me rather uneasy.

Note that I consider writing it like this:

// [E]
void fs() {
...
{
string s;
for (int i=0; i!=n; ++i) {
get_text(i, s); // void get_text(int, string&);
to_lower(s);
set_lower_text(i, s);
}
}
...
}

is *no* solution as readability suffers even more!

**Thinking further**, the interface of the `get_text` function is
non-idiomatic anyway, as out params are *so* yesterday anyway and a
"good" interface would return by value:

// [F]
for (int i=0; i!=n; ++i) {
string s = get_text(i); // string get_text(int);
to_lower(s);
set_lower_text(i, s);
}

Here, we do *not pay double* for memory allocation, because it is
extremely likely that `s` will be constructed via RVO from the return
value, so for [F] we *pay the same* in allocation overhead as in [C].
*Unlike* the [C] case however, we can't optimize this interface variant.

So the **bottom line** seems to be that using minimal scope (can) hurt
performance and using clean interfaces -- I at least consider return by
value a lot cleaner than that out-ref-param stuff -- will prevent
optimization opportunities, at least in the general case.

The **problem** isn't so much that one would have to forgo clean code
for efficiency sometimes, the problem is that as soon as Devs start to
find such special cases, the whole Coding Guide (see [A], [B]) looses
authority.

The **question** now would be: see first paragraph

+ + + +

--
Good C++ code is better than good C code, but
bad C++ can be much, much worse than bad C code.

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Mathias Gaunard

unread,

Jun 5, 2012, 3:52:21 PM6/5/12

to

On Jun 5, 3:43 am, "Martin B." <0xCDCDC...@gmx.at> wrote:

> # Question First
>
> Is there an elegant solution in C++ to prevent one from having to
> declare complex object variables that are only used within a loop
> outside of the loop for efficiency reasons?

No. The only alternative I can think of is to not use loops, but
rather algorithms.

--

Ike Naar

unread,

Jun 5, 2012, 8:32:04 PM6/5/12

to

Have you measured both variants? Which one was more efficient?

--

Dave Abrahams

unread,

Jun 6, 2012, 12:31:55 AM6/6/12

to

on Tue Jun 05 2012, Mathias Gaunard <loufoque-AT-gmail.com> wrote:

> On Jun 5, 3:43 am, "Martin B." <0xCDCDC...@gmx.at> wrote:
>
>> # Question First
>>
>> Is there an elegant solution in C++ to prevent one from having to
>> declare complex object variables that are only used within a loop
>> outside of the loop for efficiency reasons?
>
> No. The only alternative I can think of is to not use loops, but
> rather algorithms.

By which I think you mean "yes" :-). Algorithms are an elegant
abstraction that can hide the lack-of-elegance that may be required to
achieve efficiency.

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

andre

unread,

Jun 6, 2012, 8:02:52 PM6/6/12

to

>
> No. The only alternative I can think of is to not use loops, but
> rather algorithms.
>

In which case you won't know if there is unwanted copying: The
algorithm could
be following a similar guideline and have unwanted copying. Or the
algorithm
could prevent it by violating the same guideline...

At least your code "seems" to satisfy your code policy...

Dave Harris

unread,

Jun 6, 2012, 8:05:58 PM6/6/12

to

0xCDC...@gmx.at (Martin B.) wrote (abridged):

> The **problem** isn't so much that one would have to forgo clean
> code for efficiency sometimes, the problem is that as soon as
> Devs start to find such special cases, the whole Coding Guide
> (see [A], [B]) looses authority.

I think the problem is interpreting coding guides too dogmatically.
Sometimes we have to trade elegance for performance. Even if we can
avoid it in this specific case, we can't always. Programmers should
be encouraged to use their own judgement (and educated so their
judgements are sound).

And although you tried to deny it early, premature optimisation is
also often a problem. Sometimes you will find get_text() written
"the fast way" just on the off-chance that someone, someday, will
need call it many times in a tight loop. If someone wants to
sacrifice clean code for performance, they should have measurements
to justify it.

The best performance often comes, not from micro-optimisations like
reusing a buffer, but from rethinking what you are doing from a
higher level of abstraction. If your to_lower loop is truly a
bottle-neck, it might be worth looking into what get_text() and
set_lower_text() are doing. Lambda functions are often a good tool
for separating concerns. Move semantics can help avoid unnecessary
copies. Perhaps using parallel_for() instead of the explicit loops
would pay. That kind of optimisation is made harder when a single
string is reused across iterations.

parallel_for( 0, n, [=]( int i ) {
set_lower_text( i, to_lower( get_text( i ) ) );
} );

-- Dave Harris, Nottingham, UK.

--

Martin B.

unread,

Jun 6, 2012, 8:06:14 PM6/6/12

to

On 06.06.2012 06:31, Dave Abrahams wrote:
> on Tue Jun 05 2012, Mathias Gaunard<loufoque-AT-gmail.com> wrote:
>
>> On Jun 5, 3:43 am, "Martin B."<0xCDCDC...@gmx.at> wrote:
>>
>>> # Question First
>>>
>>> Is there an elegant solution in C++ to prevent one from having to
>>> declare complex object variables that are only used within a loop
>>> outside of the loop for efficiency reasons?
>>
>> No. The only alternative I can think of is to not use loops, but
>> rather algorithms.
>
> By which I think you mean "yes" :-). Algorithms are an elegant
> abstraction that can hide the lack-of-elegance that may be required to
> achieve efficiency.
>

Mathias, Dave - I fail to see how an algorithm could help. I really
don't get it.

Can you provide an example where an algorithm based solution avoids the
inefficiencies associated with declaring the temporary complex object
inside the loop body?

cheers,
Martin

--
Good C++ code is better than good C code, but
bad C++ can be much, much worse than bad C code.

Martin B.

unread,

Jun 6, 2012, 8:07:00 PM6/6/12

to

[D] is more efficient. It will re-use the buffer from string s a number
of times (except for the pathological case where the string lengths
increase strictly in steps larger than the allocation granularity of
string).
[C] will - must - delete the buffer of the s object each loop step and
do a fresh allocation.

cheers,
Martin

--
Good C++ code is better than good C code, but
bad C++ can be much, much worse than bad C code.

woodb...@gmail.com

unread,

Jun 6, 2012, 10:05:08 PM6/6/12

to

On Wednesday, June 6, 2012 8:07:00 PM UTC-4, Martin B. wrote:
> On 06.06.2012 02:32, Ike Naar wrote:

> > Have you measured both variants? Which one was more efficient?
> >
>
> [D] is more efficient. It will re-use the buffer from string s a number
> of times (except for the pathological case where the string lengths
> increase strictly in steps larger than the allocation granularity of
> string).
> [C] will - must - delete the buffer of the s object each loop step and
> do a fresh allocation.
>
> cheers,
> Martin
>
>

I use static locals in some places to help with this:

void
cmwAmbassador::mediateResponse (transactions_t::iterator itr)
{
bool safe_sending = true;
try {
localsendbuf.sock_ = itr->sock;
bool reqResult;
remoteMessages.Receive(cmwBuf, reqResult);

if (reqResult) {
setDirectory(itr->path.c_str());
static ::std::vector<File> outputFiles;
outputFiles.clear();
remoteMessages.Receive(cmwBuf, outputFiles);

save_lastruntime(itr->filename, itr->time_in_seconds);
safe_sending = false;
localMessages.Marshal(localsendbuf, true);
} else {
static ::std::string errorMsg;
errorMsg.clear();
remoteMessages.Receive(cmwBuf, errorMsg);
safe_sending = false;
localMessages.Marshal(localsendbuf, false, errorMsg);
}
localsendbuf.Flush();
} catch(::std::exception const& ex) {
#ifdef SYSLOG_AVAILABLE
syslog(LOG_ERR, "mediateResponse: %s", ex.what());
#endif
if (safe_sending) {
localMessages.Marshal(localsendbuf, false, ex.what());
localsendbuf.Flush();
}
}
}

(That function is part of this file --
http://webEbenezer.net/misc/cmwAmbassador.cc
and that file is part of this archive --
http://webEbenezer.net/misc/direct.tar.bz2
.)

That keeps the scope small and avoids the heap.
Usually someone will point out that statics don't
play well with multithreading. The cmwAmbassador is
currently single-threaded. G-d willing, in the
future we'll make it multi-threaded. The design of
the multi-threaded version of the program will be
such that a specific thread is responsible for calling
mediateResponse. So the statics will pose no more
problem then than they do today.

Brian Wood
Ebenezer Enterprises
http://webEbenezer.net

--

Dave Abrahams

unread,

Jun 7, 2012, 3:02:44 PM6/7/12

to

on Wed Jun 06 2012, "Martin B." <0xCDCDCDCD-AT-gmx.at> wrote:

> On 06.06.2012 06:31, Dave Abrahams wrote:
>> on Tue Jun 05 2012, Mathias Gaunard<loufoque-AT-gmail.com> wrote:
>>
>>> On Jun 5, 3:43 am, "Martin B."<0xCDCDC...@gmx.at> wrote:
>>>
>>>> # Question First
>
>>>>
>>>> Is there an elegant solution in C++ to prevent one from having to
>>>> declare complex object variables that are only used within a loop
>>>> outside of the loop for efficiency reasons?
>>>
>>> No. The only alternative I can think of is to not use loops, but
>>> rather algorithms.
>>
>> By which I think you mean "yes" :-). Algorithms are an elegant
>> abstraction that can hide the lack-of-elegance that may be required to
>> achieve efficiency.
>>
>
> Mathias, Dave - I fail to see how an algorithm could help. I really
> don't get it.
>
> Can you provide an example where an algorithm based solution avoids the
> inefficiencies associated with declaring the temporary complex object
> inside the loop body?

The implementor of the algorithm simply avoids declaring temporary
objects in his loops. The algorithm implementor is not (or should not
be) subject to your guideline. In fact, the guideline should probably
be rewritten in the form, "avoid doing XXX, or if you must, encapsulate
it." That's pretty much the right form for most things one might want
to generally prohibit, like casts.

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

Seungbeom Kim

unread,

Jun 8, 2012, 12:21:43 AM6/8/12

to

On 2012-06-07 12:02, Dave Abrahams wrote:
> on Wed Jun 06 2012, "Martin B." <0xCDCDCDCD-AT-gmx.at> wrote:
>>
>> Mathias, Dave - I fail to see how an algorithm could help. I really
>> don't get it.
>>
>> Can you provide an example where an algorithm based solution avoids
>> the inefficiencies associated with declaring the temporary complex
>> object inside the loop body?
>
> The implementor of the algorithm simply avoids declaring temporary
> objects in his loops. The algorithm implementor is not (or should
> not be) subject to your guideline. In fact, the guideline should
> probably be rewritten in the form, "avoid doing XXX, or if you must,
> encapsulate it." That's pretty much the right form for most things
> one might want to generally prohibit, like casts.

What does an algorithm change with regard to the lifetime of
temporaries? Naturally, the functor may declare its own local
variable in the scope of the function to be called, or it may try to
use a variable that's supplied externally to the functor. In either
case, I don't see any fundamental differences from the situations with
the manually-written loops.

--
Seungbeom Kim

Dave Abrahams

unread,

Jun 8, 2012, 3:01:06 PM6/8/12

to

on Fri Jun 08 2012, Seungbeom Kim <musiphil-AT-bawi.org> wrote:

> On 2012-06-07 12:02, Dave Abrahams wrote:
>> on Wed Jun 06 2012, "Martin B." <0xCDCDCDCD-AT-gmx.at> wrote:
>>>
>>> Mathias, Dave - I fail to see how an algorithm could help. I really
>>> don't get it.
>>>
>>> Can you provide an example where an algorithm based solution avoids
>>> the inefficiencies associated with declaring the temporary complex
>>> object inside the loop body?
>>
>> The implementor of the algorithm simply avoids declaring temporary
>> objects in his loops. The algorithm implementor is not (or should
>> not be) subject to your guideline. In fact, the guideline should
>> probably be rewritten in the form, "avoid doing XXX, or if you must,
>> encapsulate it." That's pretty much the right form for most things
>> one might want to generally prohibit, like casts.
>
> What does an algorithm change with regard to the lifetime of
> temporaries? Naturally, the functor may declare its own local
> variable in the scope of the function to be called, or it may try to
> use a variable that's supplied externally to the functor. In either
> case, I don't see any fundamental differences from the situations with
> the manually-written loops.

The algorithm implementor creates whatever temporaries she wants,
wherever she wants them. The user of the algorithm can then stick to
the guideline. Does that help explain?

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

Martin B.

unread,

Jun 8, 2012, 3:10:25 PM6/8/12

to

Ah! Ok, I get it.

Of course, iff I'm going to write an algorithm, it's gonna be as
efficient as it gets without contorting myself.

BUT -- that's really missing the point I think: The examples I gave in
the OP were just that: examples. People are *going to* write loops
that don't make much sense as algorithms (-> used once). And summed up
over the codebase, there will be a lot of these (different) loops. And
some (few?) of these loops will mess up performance in one (minor) way
or another.

And then I will have to mess around with the guidelines because some
took them too literally and now think they have to micro-optimize
everywhere.

I really asked this question because I though there might be a
technical, elegant solution out of this, but since noone posted
anything that I found particularly enlightening(*), I guess there
really isn't a technical solution to this, but it has to be addressed
on the pedagogical and psychological level :-)

(*): Although I have to say I found the idea of declaring a struct in
the for loop header intriguing. (If a bit weird.) `for(struct{
std::string s, int i } vals = {..., 0}; ...)`

cheers,
Martin

--

Dave Abrahams

unread,

Jun 8, 2012, 9:25:11 PM6/8/12

to

on Fri Jun 08 2012, "Martin B." <0xCDCDCDCD-AT-gmx.at> wrote:

>> The implementor of the algorithm simply avoids declaring temporary
>> objects in his loops. The algorithm implementor is not (or should
>> not be) subject to your guideline. In fact, the guideline should
>> probably be rewritten in the form, "avoid doing XXX, or if you must,
>> encapsulate it." That's pretty much the right form for most things
>> one might want to generally prohibit, like casts.
>
> Ah! Ok, I get it.
>
> Of course, iff I'm going to write an algorithm, it's gonna be as
> efficient as it gets without contorting myself.
>
> BUT -- that's really missing the point I think: The examples I gave in
> the OP were just that: examples. People are *going to* write loops
> that don't make much sense as algorithms (-> used once).

The fact that they're used once doesn't mean they make no sense as
algorithms. In fact, I'd bet 99% of those loops could be profitably
rewritten to use algorithms that already exist.

> And summed up over the codebase, there will be a lot of these
> (different) loops. And some (few?) of these loops will mess up
> performance in one (minor) way or another.

If they're important enough to optimize, they're important enough to
encapsulate.

> And then I will have to mess around with the guidelines because some
> took them too literally and now think they have to micro-optimize
> everywhere.

What would lead people to that conclusion?

> I really asked this question because I though there might be a
> technical, elegant solution out of this, but since noone posted
> anything that I found particularly enlightening(*), I guess there
> really isn't a technical solution to this, but it has to be addressed
> on the pedagogical and psychological level :-)
>
> (*): Although I have to say I found the idea of declaring a struct in
> the for loop header intriguing. (If a bit weird.) `for(struct{
> std::string s, int i } vals = {..., 0}; ...)`

Intriguing, but you should make sure it doesn't defeat optimization by
forcing some of those struct members out of registers.

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

Martin B.

unread,

Jun 9, 2012, 10:09:07 AM6/9/12

to

On 09.06.2012 03:25, Dave Abrahams wrote:
> on Fri Jun 08 2012, "Martin B."<0xCDCDCDCD-AT-gmx.at> wrote:
>
>>> The implementor of the algorithm simply avoids declaring temporary
>>> objects in his loops. The algorithm implementor is not (or should
>>> not be) subject to your guideline. In fact, the guideline should
>>> probably be rewritten in the form, "avoid doing XXX, or if you
>>> must, encapsulate it." That's pretty much the right form for most
>>> things one might want to generally prohibit, like casts.
>>
>> Ah! Ok, I get it.
>>
>> Of course, iff I'm going to write an algorithm, it's gonna be as
>> efficient as it gets without contorting myself.
>>
>> BUT -- that's really missing the point I think: The examples I gave
>> in the OP were just that: examples. People are *going to* write
>> loops that don't make much sense as algorithms (-> used once).
>
> The fact that they're used once doesn't mean they make no sense as
> algorithms. In fact, I'd bet 99% of those loops could be profitably
> rewritten to use algorithms that already exist.

I guess that discussion is on another exhausting front.

*I* don't neccessarily disagree, but there seem to many people that do
disagree -- then again, those are probably the same people that find
algorithms weird and useless in general, and all I can say is that
it's hard and exhausting work to try to pr/t/each modern C++ to a team
not used to it.

>> And summed up over the codebase, there will be a lot of these
>> (different) loops. And some (few?) of these loops will mess up
>> performance in one (minor) way or another.
>
> If they're important enough to optimize, they're important enough to
> encapsulate.

Good point.

>> And then I will have to mess around with the guidelines because
>> some took them too literally and now think they have to
>> micro-optimize everywhere.
>
> What would lead people to that conclusion?

People already came to this conclusion, telling me we should extend
the guidelines so that devs are required to pre-declare complex
objects to avoid any inefficiencies in loops. Telling devs to do it
*always* certainly smells of premature optimization. That's why I
sought a elegant technical solution that "always" works. No silver
bullet I guess.

>> I really asked this question because I though there might be a
>> technical, elegant solution out of this, but since noone posted
>> anything that I found particularly enlightening(*), I guess there
>> really isn't a technical solution to this, but it has to be
>> addressed on the pedagogical and psychological level :-)
>>
>> (*): Although I have to say I found the idea of declaring a struct
>> in the for loop header intriguing. (If a bit weird.) `for(struct{
>> std::string s, int i } vals = {..., 0}; ...)`
>
> Intriguing, but you should make sure it doesn't defeat optimization
> by forcing some of those struct members out of registers.

Hmmm ... in the general case I'm not very concerned about registers
vs. no-registers, but "needless" heap de/allocations always make me
wary.

cheers,
Martin

--
Good C++ code is better than good C code, but
bad C++ can be much, much worse than bad C code.

Dave Harris

unread,

Jun 9, 2012, 2:20:34 PM6/9/12

to

0xCDC...@gmx.at (Martin B.) wrote (abridged):

> > Have you measured both variants? Which one was more efficient?
>
> [D] is more efficient. It will re-use the buffer from string s
> a number of times (except for the pathological case where the
> string lengths increase strictly in steps larger than the
> allocation granularity of string).
> [C] will - must - delete the buffer of the s object each loop step
> and do a fresh allocation.

True, but have you measured them? The difference may be smaller than
you think.

Much depends on the memory allocator. Nowadays some keep an array of
free lists, indexed by block size. In a case like this, where
essentially the same block is freed and reallocated over and over,
both allocation and deallocation could take O(1) time, operating only
at the head of the appropriate free list.

Where-as your string processing is O(N) in the length of the string.
First the characters have to be read from their source and written to
get_text's argument. Then to_lower has to read each character, figure
out its lower-case version, and write it. (And lower-casing can be
non-trivial if non-English languages are supported correctly.) Then
set_lower_text has to read each character and write it to its
destination.

Now let's think about cache coherency. Since it is the same block of
memory we are freeing and reallocating each time, it will likely
become resident in cache for the duration. Since we are dealing with
a large number of strings, the memory they take probably won't fit in
cache. You may find your execution time is so dominated by cache misses
that the difference between the optimised and unoptimised versions is
lost in the noise.

And that's without considering where the strings ultimately came from
or are going to. Were they read from disk or over a network? Are they
being rendered to the screen as pixels?

In a separate post I have tried to show how a more idiomatic version,
such as:
std::transform( begin(), end(), lower_begin(), to_lower );

by exploiting move semantics, might perform fewer string copies. If
your pattern of deallocation and allocation is cheaper than the cost
of copying strings, you may be optimising the wrong thing.

I appreciate your specific example was intended to make a more general
point. My picking at specific points is intended to make a point
equally general. Any optimisation that makes the code significantly
less elegant should be justified by measurement, or else it is, almost
by definition, premature optimisation.

-- Dave Harris, Nottingham, UK.

--

Dave Harris

unread,

Jun 9, 2012, 2:36:03 PM6/9/12

to

0xCDC...@gmx.at (Martin B.) wrote (abridged):

> I really asked this question because I though there might be a
> technical, elegant solution out of this, but since noone posted
> anything that I found particularly enlightening(*), I guess there
> really isn't a technical solution to this, but it has to be
> addressed on the pedagogical and psychological level :-)

Well, move semantics should help. Consider some straight-forward
code for expressing what you are doing:

for (int i = 0; i != n; ++i)

set_lower_text( i, to_lower( get_text( i ) ) );

I hope this passes your guidelines. Suppose the signatures are:

string get_text( int i );
string to_lower( string &&s );
void set_lower_text( int i, &&s );

So get_text() allocates and returns a copy. This is passed to to_lower() as an
r-value reference, so it can modify it in-place and return it without copying the
characters or doing an allocation. Set_lower_text() receives another r-value
reference, which it can move to its final destination, again without allocating or
copying. So we have one allocation and one copy, both in get_text(). If instead the
signatures are:

const string &get_text( int i );
string to_lower( const string &s );
void set_lower_text( int i, &&s );

then the count is the same but now the allocation and copy happens inside
to_lower(). We can provide both overloads of to_lower().

Now compare with your optimised code:

> string s;
> for (int i=0; i!=n; ++i) {
> get_text(i, s); // void get_text(int, string&);
> to_lower(s);
> set_lower_text(i, s);
> }

with signatures:

void get_text( int i, string &s );
void to_lower( string &s );
void set_lower_text( int i, const string &s );

Your version necessarily copies the string in get_text(), and again in
set_lower_text(). Set_lower_text() will probably have to do an allocation. (If
instead it takes an r-value reference and pilfers its argument's memory, s is left
empty and we gain nothing by moving it outside the loop.) So we have 1 allocation
and 2 copies. You've turned two lines of code into five and not really gained
anything over the plain version. You might have saved an allocation, but you've
definitely copied the characters an extra time, so if the allocator is efficient
(as I've argued elsewhere) you might even be slower.

Just write plain, simple code in C++11 and it will usually be fine.
If you can use iterators, so much the better:

std::transform( begin(), end(), lower_begin(), to_lower );

What's not to like?

If it really matters, I would look more at what get_text() and
set_lower_text() are actually doing. For example, supposing
set_lower_text() actually stores its argument in a vector called
lower_text:

void set_lower_text( int i, const string &s ) {
lower_text[i] = s;
// Enforce invariants here, if any.
}

You might consider using in-place modification explicitly. Add:

void set_lower_text( int i,
const std::function<void(string &)> fn ) {
fn( lower_text[i] );
// Enforce invariants here, if any.
}

so the caller code looks like:

for (int i = 0; i != n; ++i)
set_lower_text( i, [=]( string &lower_text ) {
get_text( i, lower_text );
to_lower( lower_text );
} );

This avoids the need for an intermediate buffer entirely. The work is
done directly in lower_text's own storage. If you can encapsulate the
loop, too, so much the better. You could present it as a kind-of
in-place transform:

template<class Iterator, class Func>
void set_all_lower_text( Iterator i, Func f ) {
for (auto j = lower_text.begin(); j != lower_text.end();
++j, ++i)
f( *j, *i );
// Enforce invariants here...
}
// ... or here.
}

Used like:

set_all_lower_text( text_cbegin(),
[]( string &lower_text, const string &text ) {
lower_text = text;
to_lower( lower_text );
} );

This version allows more control and economies of scale. For example,
if lower_text must be kept sorted, you can resort it once at the end,
instead of repeatedly after each string is changed. If you need to
notify listeners when the strings are changed, you can do so once
instead of once per string. If you need to lock the data against
multi-threaded access, you can do so with less overhead.

The key thing is for the lambda expression to express what we want to
do to each string in its purest and most efficient form, and then use
that with a custom algorithm that manages everything else. Separation
of concerns.

The reason I don't think this is a good solution for your particular
case is because, without measurements, it's premature optimisation
again. I don't advocate that every signature like:

void set_lower_text( int i, const string &s );

also have an in-place version like:

void set_lower_text( int i,
const std::function<void(string &)> fn );

However, where there is a proven need for performance, the in-place
version may be worth considering. And just adding the r-value
reference version:

void set_lower_text( int i, string &&s ) {
lower_text[i] = std::move( s );
// Enforce invariants here, if any.
}

gains efficiency without disrupting caller code at all.

(I have reordered these quotes for clarity.)

> Of course, iff I'm going to write an algorithm, it's gonna be
> as efficient as it gets without contorting myself.
>
> BUT -- that's really missing the point I think: The examples I
> gave in the OP were just that: examples. People are *going to*
> write loops that don't make much sense as algorithms (-> used
> once).

We may be saying the same thing here. Perhaps set_all_lower_text() is
what you had in mind for the efficient algorithm.

On the other hand, perhaps you are saying that anything used only
once shouldn't be an algorithm? If so, then I disagree. Even if
set_all_lower_text() is only used in one place, it's still providing
benefits like abstraction, information hiding, de-coupling and
separation of concerns. We have three components here - the source
of get_text(), the destination for set_lower_text(), and the code
that transfers between them - and none of them needs to know
implementation details of the other two.

I would further say that thinking in algorithms often leads to code
that is both elegant and efficient. Elegant because it is more
abstract and higher level. Efficient because modern C++ is all
about zero-cost abstractions, and clean code is easier to optimise.

Where-as a signature like:
void get_text( int i, string &result );

leads to inelegant code. It uses a side-effect to return its result,
so it can't be composed in a functional way. It forces the string to
be copied; a copy that can't be eliminated. It uses an integer index
instead of iterators so it doesn't place nicely with standard
algorithms. It forces you to use five or six lines of code where only
one or two should be needed.

> And summed up over the codebase, there will be a lot of these
> (different) loops. And some (few?) of these loops will mess up
> performance in one (minor) way or another.
>
> And then I will have to mess around with the guidelines

How does my code break your guidelines?

> because some took them too literally and now think they have to
> micro-optimize everywhere.

This does sound like pedagogical problem.

-- Dave Harris, Nottingham, UK.