Variables in for loop (style issue)

Maciej Sobczak

unread,

May 4, 2006, 6:05:20 AM5/4/06

to

Hi,

Consider this:

for (size_t i = 0; i != v.size(); ++i)
{
// do something with v[i]
}

The above is considered to be a bad practice, because it calls v.size()
every iteration and sugggests that the size of the vector might change,
which is actually not the intent.
The possible solution (and even stated as a rule in one coding standard) is:

const size_t vSize = v.size();
for (size_t i = 0; i != vSize; ++i)
{
// do something with v[i]
}

The above, however, breaks another coding standard rule saying that
variables used to control the loop should be declared in the scope of
the loop without polluting the enclosing scope (that's what the scoping
rules in for are about, right?).

The other solution might be this:

for (size_t i = 0, vSize = v.size(); i != vSize; ++i)
{
// do something with v[i]
}

But this, in turn, breaks another coding standard rule saying that
everything that could be const must be const (above, vSize cannot be
const, because it's declared together with i as non-const size_t).

The ultimate solution might be this:

{ // artificial enclosing scope

const size_t vSize = v.size();
for (size_t i = 0; i != vSize; ++i)
{
// do something with v[i]
}

}

But it smells unpleasantly because of this additional pair of braces and
the resulting indentation.

Conclusion [*] - there's no way to do it *right*.

The following might please me:

// not C++:
for ( {size_t i = 0; const size_t vSize = v.size(); }; i != vSize; ++i)
{
// do something with v[i]
}

What is your own style choice? Why?

[*] "Conclusion is the point where you got tired of thinking."

--
Maciej Sobczak : http://www.msobczak.com/
Programming : http://www.msobczak.com/prog/

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Ulrich Eckhardt

unread,

May 4, 2006, 5:49:48 PM5/4/06

to

Maciej Sobczak wrote:
> for (size_t i = 0; i != v.size(); ++i)

[..]

> The above is considered to be a bad practice, because it calls v.size()
> every iteration and sugggests that the size of the vector might change,
> which is actually not the intent.

I wouldn't interpret too much into it, for me it is just the most simple
form to write a loop.

> const size_t vSize = v.size();
> for (size_t i = 0; i != vSize; ++i)

[..]

> The above, however, breaks another coding standard rule saying that
> variables used to control the loop should be declared in the scope of

> the loop without polluting the enclosing scope [...].

True.

> for (size_t i = 0, vSize = v.size(); i != vSize; ++i)

[...]

> But this, in turn, breaks another coding standard rule saying that
> everything that could be const must be const (above, vSize cannot be
> const, because it's declared together with i as non-const size_t).

> { // artificial enclosing scope

> const size_t vSize = v.size();
> for (size_t i = 0; i != vSize; ++i)

[...]

> }
>
> But it smells unpleasantly because of this additional pair of braces and
> the resulting indentation.

This one and the second one are too cumbersome/confusing, IMHO. Too much to
read and understand.

> What is your own style choice? Why?

The third one. Making things constant (the argument for #2 and against #3)
is important because then you can skip following code and you _know_ that
the value doesn't change. #3 however allows to keep the scope of the
variable small, so the impact of it not being constant is small.

Another one for you:
#42
for( std::pair<size_t, size_t const> range(0u, v.size());
range.first!=range.second;
++range.first)
[...]

;^)

Uli

ThosRTanner

unread,

May 4, 2006, 5:49:26 PM5/4/06

to

Maciej Sobczak wrote:
> Hi,
>
> Consider this:
>
> for (size_t i = 0; i != v.size(); ++i)
> {
> // do something with v[i]
> }
>
> The above is considered to be a bad practice, because it calls v.size()
> every iteration and sugggests that the size of the vector might change,
> which is actually not the intent.

That's probably micro-optimisation. v.size() is likely to involve a
single fetch from memory, or 2 if v doesn't happen to be a register. It
doesn't 'suggest' that the size of v might change - but it is safe if
it doesn't.

> The possible solution (and even stated as a rule in one coding standard) is:
>
> const size_t vSize = v.size();
> for (size_t i = 0; i != vSize; ++i)
> {
> // do something with v[i]
> }
>
> The above, however, breaks another coding standard rule saying that
> variables used to control the loop should be declared in the scope of
> the loop without polluting the enclosing scope (that's what the scoping
> rules in for are about, right?).
>
> The other solution might be this:
>
> for (size_t i = 0, vSize = v.size(); i != vSize; ++i)
> {
> // do something with v[i]
> }
>
> But this, in turn, breaks another coding standard rule saying that
> everything that could be const must be const (above, vSize cannot be
> const, because it's declared together with i as non-const size_t).
>
> The ultimate solution might be this:
>
> { // artificial enclosing scope
>
> const size_t vSize = v.size();
> for (size_t i = 0; i != vSize; ++i)
> {
> // do something with v[i]
> }
>
> }
>
> But it smells unpleasantly because of this additional pair of braces and
> the resulting indentation.
>
> Conclusion [*] - there's no way to do it *right*.

Well, that's true!

> The following might please me:
>
> // not C++:
> for ( {size_t i = 0; const size_t vSize = v.size(); }; i != vSize; ++i)
> {
> // do something with v[i]
> }
>
> What is your own style choice? Why?

Style 1, because it is the only one that is safe if the size of v
changes.

IF the calculation of the end of the loop is demonstrably expensive
(i.e. something other than v.size()), then I will use one of the other
styles. The amount of braceing depends on the circumstances.

Having said which, I'd like something along the lines of your last
suggestion - I've occasionally had requirements to code a loop where it
is helpful to have 2 variables changing over the course of the loop,
and if they aren't the same type, you have 2 declare one outside the
loop.

Carlos Moreno

unread,

May 4, 2006, 5:56:25 PM5/4/06

to

Maciej Sobczak wrote:
> Hi,
>
> Consider this:
>
> for (size_t i = 0; i != v.size(); ++i)
> {
> // do something with v[i]
> }
>
> The above is considered to be a bad practice, because it calls v.size()
> every iteration and sugggests that the size of the vector might change,
> which is actually not the intent.

Huh?? This is really new (and shocking!!) to me...

(which I guess gives away what is going to be my answer to your
question later in the message -- I'll try to surprise you, though)

To me, the above does not suggest that v.size() is going to change;
nor am I worried about efficiency issues because of calling the
function -- first, because any difference would most definitely
have no effect on the overall efficiency of the application; and
second, and more importantly, because there really is no difference
whatsoever in runtime performance between the two alternatives
(I think it would have to be a really lousy compiler and/or a
lousy vector implementation so that we could see a noticeable
difference)

> The possible solution (and even stated as a rule in one coding standard) is:
>
> const size_t vSize = v.size();
> for (size_t i = 0; i != vSize; ++i)
> {
> // do something with v[i]
> }
>
> The above, however, breaks another coding standard rule saying that
> variables used to control the loop should be declared in the scope of
> the loop

Actually, it breaks a more important rule, IMHO -- it just makes
you wonder "what the hell is that and why is this guy overcomplicating
things when we should just put i < v.size()"

> The other solution might be this:
>
> for (size_t i = 0, vSize = v.size(); i != vSize; ++i)
> {
> // do something with v[i]
> }
>
> But this, in turn, breaks another coding standard rule saying that
> everything that could be const must be const

Again, IMHO, it breaks a more important rule -- the rule that says
that it should be possible for a normal human being to read the
code -- the above really borders the realm of write-only code.

> Conclusion [*] - there's no way to do it *right*.

I disagree -- maybe my threshold for deciding when I'm tired to
think is even before starting to think about it; but the way
I see it, the whole thing is unnecessarily overanalyzed -- using
i < v.size() is the "perfect" solution -- it can not say the
intent in a more obvious way -- how do you know the size of the
container? why, you ask the container, of course; going further
to observe that v.size() is being "called" muiltiple times falls
in the realm of "premature optimization" (well, or rather, the
typical analysis, the typical mentality that leads to premature
optimization issues)

> What is your own style choice? Why?

Use an STL algorithm!! (did I surprise you? :-))

Carlos
--

Greg Herlihy

unread,

May 4, 2006, 5:59:40 PM5/4/06

to

Maciej Sobczak wrote:
> Hi,
>
> Consider this:
>
> for (size_t i = 0; i != v.size(); ++i)
> {
> // do something with v[i]
> }
>
> The above is considered to be a bad practice, because it calls v.size()
> every iteration and sugggests that the size of the vector might change,
> which is actually not the intent.

[various alternatives considered]

> Conclusion [*] - there's no way to do it *right*.

You left out the obvious solution:

for (size_t i = v.size(); i > 0; --i)
{
// do something with v[i-1];
}

Iterating a generic container classes is probably best done with a
generic function. In that case, the program would call a function such
std::for_each and not have a for loop at all. The analogy that I use is
the difference between a mechanical and a solid state device. For loops
have all these "moving parts" to support the loop machinery; but they
have nothing else to do with the purpose of the loop - they are just a
distraction.

So encasing all of that loop logic in a single of code certainly makes
sense:

std::for_each( v.begin(), v.end(),/*do something with each item */);

Of course the difficulty is often devising the third parameter - the
one that does the work. Some programmers will go to any length in order
to use for_each. Personally, I'm not quite so doctrinaire - and I think
better iterators and better function objects are still needed before
the for loop can be declared obsolete.

Greg

Andrew Koenig

unread,

May 4, 2006, 5:57:26 PM5/4/06

to

"Maciej Sobczak" <no....@no.spam.com> wrote in message
news:e3c53n$6en$1...@sunnews.cern.ch...

> Consider this:
>
> for (size_t i = 0; i != v.size(); ++i)
> {
> // do something with v[i]
> }

> What is your own style choice? Why?

for (v_iter_type it = v.begin(); it != v.end(); ++it) {
// do something with *it
}

I prefer this form because it doesn't use v's random-access ability when it
doesn't need to do so.

Tomás

unread,

May 4, 2006, 5:56:02 PM5/4/06

to

Maciej Sobczak posted:

> Hi,
>
> Consider this:
>
> for (size_t i = 0; i != v.size(); ++i)
> {
> // do something with v[i]
> }
>
> The above is considered to be a bad practice, because it calls v.size()
> every iteration and sugggests that the size of the vector might change,
> which is actually not the intent.

I agree that the code is inefficient.

<snip>

> { // artificial enclosing scope
>
> const size_t vSize = v.size();
> for (size_t i = 0; i != vSize; ++i)
> {
> // do something with v[i]
> }
>
> }
>
> But it smells unpleasantly because of this additional pair of braces
> and the resulting indentation.

Your above sample is exactly how I do it. It's perfect in my opinion.

People who come up with "programming rules" tend to be narrow-minded.
Sure, there's things I avoid, but I'm always open to using every feature of
the language to achieve my objective. If you ever tell me that pointers are
bad, or that "goto" is bad, I'll just laugh in your face. I see it as a sign
of weakness that a programmer is hesitant to use the features placed before
them -- it conveys that they feel incompetant.

Why don't they have pointers in Java? Because programmers felt
incompetant.

> Conclusion [*] - there's no way to do it *right*.
>
> The following might please me:
>
> // not C++:
> for ( {size_t i = 0; const size_t vSize = v.size(); }; i != vSize; ++i)
> {
> // do something with v[i]
> }
>
> What is your own style choice? Why?

This is just my own opinion, but I think you need to use your own head
more. Can't you see that the code with the "artifical braces" achieves your
objective perfectly? It has the right scope, and you get to make your object
constant.

-Tomás

Bob Hairgrove

unread,

May 4, 2006, 5:51:47 PM5/4/06

to

On 4 May 2006 06:05:20 -0400, Maciej Sobczak <no....@no.spam.com>
wrote:

One big problem is the fact that not all containers are vectors, and
calling size() is very inefficient for almost all containers except
for std::vector.

How about this:

for (size_t i=0, std::vector<whatever_v_is>::iterator it = v.begin();
it != v.end(); ++i, ++it) {
// etc.
}

I think this solves all the above problems nicely.

--
Bob Hairgrove
NoSpam...@Home.com

Nicola Musatti

unread,

May 4, 2006, 6:00:45 PM5/4/06

to

Maciej Sobczak wrote:
> Hi,
>
> Consider this:
>
> for (size_t i = 0; i != v.size(); ++i)
> {
> // do something with v[i]
> }

[discussion and unsatisfactory alternatives snipped]

> What is your own style choice? Why?

Stick to your original example and only worry about it if I have
documented performance problems.

Because I share your analysis and the first example is at least the
simplest and most common.

Cheers,
Nicola Musatti

benben

unread,

May 4, 2006, 6:01:28 PM5/4/06

to

Maciej Sobczak wrote:
> Hi,
>
> Consider this:
>
> for (size_t i = 0; i != v.size(); ++i)
> {
> // do something with v[i]
> }
>
> The above is considered to be a bad practice, because it calls v.size()
> every iteration and sugggests that the size of the vector might change,
> which is actually not the intent.

I don't see it this way. The above iteration is common practice and
whether or not the iteration alters the size of the vector should be
made very clear in the loop body.

If for any reason the loop does change the size of the vector then
size() member is actually protecting you.

> The possible solution (and even stated as a rule in one coding standard) is:
>
> const size_t vSize = v.size();
> for (size_t i = 0; i != vSize; ++i)
> {
> // do something with v[i]
> }
>
> The above, however, breaks another coding standard rule saying that
> variables used to control the loop should be declared in the scope of
> the loop without polluting the enclosing scope (that's what the scoping
> rules in for are about, right?).

This case is worse than the previous one since it reduces the
readability considerably.

>
> The other solution might be this:
>
> for (size_t i = 0, vSize = v.size(); i != vSize; ++i)
> {
> // do something with v[i]
> }
>
> But this, in turn, breaks another coding standard rule saying that
> everything that could be const must be const (above, vSize cannot be
> const, because it's declared together with i as non-const size_t).

Same with case 2. Very poor readability. This can possibly confuse
people because they will scratch their head trying to find out why you
do so.

>
> The ultimate solution might be this:
>
> { // artificial enclosing scope
>
> const size_t vSize = v.size();
> for (size_t i = 0; i != vSize; ++i)
> {
> // do something with v[i]
> }
>
> }
>
> But it smells unpleasantly because of this additional pair of braces and
> the resulting indentation.
>
> Conclusion [*] - there's no way to do it *right*.

The right way is to stick to the first sample, because that's what we
all do and we all understand.

Ideally, the loop body should be concise enough and clear enough to show
that the size of the vector is not changed:

// The following loop clearly shows that the vector's size
// is not changed.
for (size_t i = 0; i < v.size(); ++i)
{
v[i] = 2*v[i];
cout << v[i] << ", ";
}

If what's in the loop occupies more than a page of code, I would
recommend you factor out the body to a functor. Then you can use
std::for_each family to do the loop. This is perhaps the cleanest way:

std::for_each(v.begin(), v.end(), multiply_and_print<int>(2, ", "));

Regards,
Ben

Pete Becker

unread,

May 4, 2006, 6:00:02 PM5/4/06

to

Maciej Sobczak wrote:

>
> What is your own style choice? Why?
>
>

Spending hours internally debating stylistic minutiae is not productive.
Write the loop however you want and get on with your real work.

--

Pete Becker
Roundhouse Consulting, Ltd.

Victor Bazarov

unread,

May 4, 2006, 6:00:24 PM5/4/06

to

I use the third. It seems to have the balance of performance and
coding safety I usually need.

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask

Martin Bonner

unread,

May 4, 2006, 6:01:50 PM5/4/06

to

My choice? One of:

for (size_t i = 0; i < v.size(); i++)

{
// do something with v[i]
}

or

for (v_t::iterator it = v.begin(); it != v.end(); ++it)
{
/// use *it or it->
}

Accessing v.size / v.end mutiple times doesn't bother me. I haven't
ever seen profile evidence suggesting it is too slow, and it just
doesn't seem likely to cost very much.

I think the two patterns are common enough (in my code anyway), that
people will just recognize them.

I do quite like the idea of:
BOOST_FOREACH( foo f, v )
{
// Use f instead of v[i] / *it.

Carl Barron

unread,

May 5, 2006, 4:46:40 AM5/5/06

to

In article <1146757031.5...@i39g2000cwa.googlegroups.com>,
Martin Bonner <martin...@yahoo.co.uk> wrote:

> Maciej Sobczak wrote:
> > Hi,
> >
> > Consider this:
> >
> > for (size_t i = 0; i != v.size(); ++i)
> > {
> > // do something with v[i]
> > }
> >
> > The above is considered to be a bad practice, because it calls v.size()
> > every iteration and sugggests that the size of the vector might change,
> > which is actually not the intent.
> > The possible solution (and even stated as a rule in one coding standard) is:

> > [snip]
1) if it works well enough don't fix it.:) In a real program there
are likely to be bigger problems...

2) use algorithms from <algorithm> If I wrote the original loop it
would probably done this way in the first place. Boost's iterator
library can be a great help.

3) construct multiple objects of the same type

for(std::vector<int>::iterator it(x.begin()),end(x.end());
it != end;++it)
{
}
or
for(int i(0),i_max(x.size());i!=i_max;++i)
{
}
anyone confused with this needs a refresher course in elementary C++:)
4) if the types are different then the 'block solution' is what I
use.
{
type_1 x(...);
type_2 y(...);
for(/* initialized above */; ...;...){}

kevin cline

unread,

May 5, 2006, 4:43:52 AM5/5/06

to

Maciej Sobczak wrote:
> Hi,
>
> Consider this:
>
> for (size_t i = 0; i != v.size(); ++i)
> {
> // do something with v[i]
> }
>
> The above is considered to be a bad practice,

By whom?

> because it calls v.size()
> every iteration and sugggests that the size of the vector might change,
> which is actually not the intent.

This is the my choice, because it is shortest and easiest to read.

> The possible solution (and even stated as a rule in one coding standard) is:
>
> const size_t vSize = v.size();
> for (size_t i = 0; i != vSize; ++i)
> {
> // do something with v[i]
> }

This sacrifices readability to possibly save one memory access per
iteration. I would do this only if this code were critical to
performance and the loop body were so tight that saving one memory
access would be noticable.

> The above, however, breaks another coding standard rule saying that
> variables used to control the loop should be declared in the scope of
> the loop without polluting the enclosing scope (that's what the scoping
> rules in for are about, right?).
>
> The other solution might be this:
>
> for (size_t i = 0, vSize = v.size(); i != vSize; ++i)
> {
> // do something with v[i]
> }
>
> But this, in turn, breaks another coding standard rule saying that
> everything that could be const must be const (above, vSize cannot be
> const, because it's declared together with i as non-const size_t).

You have too many rules. It should be obvious that vSize is only
assigned once. If it's not, then the function is too long.

>
> The ultimate solution might be this:
>
> { // artificial enclosing scope
>
> const size_t vSize = v.size();
> for (size_t i = 0; i != vSize; ++i)
> {
> // do something with v[i]
> }
>
> }
>

This style of coding is a symptom of obsessive-compulsive disorder.

John Fly

unread,

May 5, 2006, 4:44:46 AM5/5/06

to

I actually do use :

{ // artificial enclosing scope

const size_t vSize = v.size();
for (size_t i = 0; i != vSize; ++i)
{
// do something with v[i]
}

}

Andrei Alexandrescu (See Website For Email)

unread,

May 5, 2006, 4:46:12 AM5/5/06

to

Andrew Koenig wrote:
> for (v_iter_type it = v.begin(); it != v.end(); ++it) {
> // do something with *it
> }
>
> I prefer this form because it doesn't use v's random-access ability when it
> doesn't need to do so.

As it turns out, I had to shy away from this style and use indices in my
code as of late. Current compilers and processor guidelines all advice
preferring indices to pointers (especially when calling functions)
because the compiler has a lot less work to do to figure out there's no
aliasing. It's kinda sad - I do like the iterator style.

Andrei

Arndt Muehlenfeld

unread,

May 5, 2006, 4:57:30 AM5/5/06

to

Maciej Sobczak wrote:

> Hi,
>
> Consider this:
>
> for (size_t i = 0; i != v.size(); ++i)
> {
> // do something with v[i]
> }
>
> The above is considered to be a bad practice, because it calls v.size()
> every iteration and sugggests that the size of the vector might change,
> which is actually not the intent.

It is always funny to see people do "optimizations" like:

for (int i = max_i; i > 0; --i)
for (int j = max_j; j > 0; ++j)
... some code ...

"because it is faster" which today in most cases is not true,
when, for instance, the above encoded algorithm could be solved
in O(n log n) instead of O(n*n) time.

Since even that is irrelevant for small n and as long as you don't run into
well documented performance problems I like Pete Beckers answer most:

> Write the loop however you want and get on with your real work

Cheers,
Arndt

Alberto Ganesh Barbati

unread,

May 6, 2006, 10:17:40 AM5/6/06

to

Bob Hairgrove ha scritto:

>
> How about this:
>
> for (size_t i=0, std::vector<whatever_v_is>::iterator it = v.begin();
> it != v.end(); ++i, ++it) {
> // etc.
> }
>
> I think this solves all the above problems nicely.

It doesn't solve anything. First of all v.end() would be re-evaluated
each iteration as v.size() in the OP's original example. But, most
importantly, the code is ill-formed: you just can't declare two
variables with different types in a for clause!

Ganesh

Alberto Ganesh Barbati

unread,

May 6, 2006, 10:18:05 AM5/6/06

to

Greg Herlihy ha scritto:

>
> So encasing all of that loop logic in a single of code certainly makes
> sense:
>
> std::for_each( v.begin(), v.end(),/*do something with each item */);
>
> Of course the difficulty is often devising the third parameter - the
> one that does the work. Some programmers will go to any length in order
> to use for_each. Personally, I'm not quite so doctrinaire - and I think
> better iterators and better function objects are still needed before
> the for loop can be declared obsolete.
>

We now have an alternative that doesn't require function objects:

BOOST_FOREACH(Type e, v)
{
// do something with element e
}

Looks like magic, isn't it?

Ganesh

Maciej Sobczak

unread,

May 6, 2006, 10:13:36 AM5/6/06

to

OK, it looks that I got all possible responses. :D

I will try to address them all in this message, to avoid replying to
everybody.

Carlos Moreno wrote:

>>Consider this:
>>
>>for (size_t i = 0; i != v.size(); ++i)
>>{
>> // do something with v[i]
>>}
>>
>>The above is considered to be a bad practice, because it calls v.size()
>>every iteration and sugggests that the size of the vector might change,
>>which is actually not the intent.

> To me, the above does not suggest that v.size() is going to change;

> nor am I worried about efficiency issues because of calling the
> function

I'm not worried about performance *at all*, so all replies that took
performance as a guideline (or dismissed performance as a guideline) are
missing my point.

The point is that the only reason to call some function twice is to
express the fact that it might not return the same value twice.
If you know that the function is going to return the same value, then
calling it more than once is just sloppy. IMHO, of course.

*If* I'm coding a loop that might change the size of the vector, then
I'm defensively coding the way presented above. But when I know that the
size is not going to change, the I want to see exactly one call to size().

That's why I don't like the above.

>>The possible solution (and even stated as a rule in one coding standard) is:
>>
>>const size_t vSize = v.size();
>>for (size_t i = 0; i != vSize; ++i)
>>{
>> // do something with v[i]
>>}

> Actually, it breaks a more important rule, IMHO -- it just makes

> you wonder "what the hell is that and why is this guy overcomplicating
> things when we should just put i < v.size()"

Actually, it's a rule in one coding standard. Really.
And it expresses the fact that the size is not going to change.
The only problem is that it introduces in the outer scope a new name
that is needed only for the purpose of looping.

BTW - one of the responses mentioned that it should be clear from the
body of the loop whether the container is being resized or not.
Actually, I like the idea of being able to deduce from the "header" of
the loop only what will be the number of iterations. I should not need
to inspect the body of the loop for this, unless it's really the case.

>>for (size_t i = 0, vSize = v.size(); i != vSize; ++i)
>>{
>> // do something with v[i]
>>}

> Again, IMHO, it breaks a more important rule -- the rule that says

> that it should be possible for a normal human being to read the
> code -- the above really borders the realm of write-only code.

I agree with you that it's not what the beginners books are teaching and
therefore not what the "average" programmer will understand with the
first reading. This is not the optimal solution.

>>What is your own style choice? Why?
>
> Use an STL algorithm!! (did I surprise you? :-))

No, you didn't surprise me. It's just that the body of the loop does not
fit into the third argument of for_each. :)
And without *true* lambda there is no good solution for this.

Note also that the problem that I'm talking about is not really about
iterating over the vector. It's about looping. That's why the responses
that suggest the use of iterators do not solve anything.

Thanks to everybody for replying so far (and I hope it's not the end of
the discussion), your responses confirmed me that the subject is not
black and white.

--
Maciej Sobczak : http://www.msobczak.com/
Programming : http://www.msobczak.com/prog/

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

kanze

unread,

May 6, 2006, 10:22:05 AM5/6/06

to

Pete Becker wrote:
> Maciej Sobczak wrote:

> > What is your own style choice? Why?

> Spending hours internally debating stylistic minutiae is not
> productive. Write the loop however you want and get on with
> your real work.

Good advice. Except that for most of us, it would be "write the
loop however your bosses want".

More generally, do whatever the others in the project are doing.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

kanze

unread,

May 6, 2006, 10:20:22 AM5/6/06

to

Andrew Koenig wrote:
> "Maciej Sobczak" <no....@no.spam.com> wrote in message
> news:e3c53n$6en$1...@sunnews.cern.ch...

> > Consider this:

> > for (size_t i = 0; i != v.size(); ++i)
> > {
> > // do something with v[i]
> > }

> > What is your own style choice? Why?

> for (v_iter_type it = v.begin(); it != v.end(); ++it) {
> // do something with *it
> }

> I prefer this form because it doesn't use v's random-access
> ability when it doesn't need to do so.

Do we, or don't we? Maciej didn't say. And random-access isn't
the only reason we might prefer an index: suppose that the
something we do with "v[i]" is "v[i] += i".

My own interpretation of his posting is that it was just an
example. He could have used your example just as well, and
asked the same question: the actual benchmarks I've run (some
time ago) showed that accessing v.end() each time through the
loop DID make a difference. Of course, it was only a small
difference; for it to make a difference in an actual
application, it would have to be a very, very tight loop in a
very critical path. In other words, I write it like you do,
unless I have some need of the index in the loop, in which case,
I use Maciej's first solution.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

kanze

unread,

May 6, 2006, 10:22:26 AM5/6/06

to

Maciej Sobczak wrote:

> Consider this:

> for (size_t i = 0; i != v.size(); ++i)
> {
> // do something with v[i]
> }

> The above is considered to be a bad practice, because it calls
> v.size() every iteration and sugggests that the size of the
> vector might change, which is actually not the intent.

Are you kidding? The first and most important rule is the KISS
rule. Don't add unnecessary complication. The simpler the
better. I'd use an iterator rather than an index if I didn't
need the index in the loop, and I'd use an algorithm rather than
a loop if there was a suitable one available.

> The possible solution (and even stated as a rule in one coding
> standard) is:

> const size_t vSize = v.size();
> for (size_t i = 0; i != vSize; ++i)
> {
> // do something with v[i]
> }

> The above, however, breaks another coding standard rule saying
> that variables used to control the loop should be declared in
> the scope of the loop without polluting the enclosing scope
> (that's what the scoping rules in for are about, right?).

So you've got a broken rule. If (and only if) the profiler
shows that you cannot afford the call v.size() each time through
the loop, you use this idiom.

> The other solution might be this:

> for (size_t i = 0, vSize = v.size(); i != vSize; ++i)
> {
> // do something with v[i]
> }

> But this, in turn, breaks another coding standard rule saying
> that everything that could be const must be const (above,
> vSize cannot be const, because it's declared together with i
> as non-const size_t).

It also breaks the usual rule against declaring two variables in
the same statement. I know that I have trouble reading such
things, and I imagine that others do as well.

> The ultimate solution might be this:

> { // artificial enclosing scope

> const size_t vSize = v.size();
> for (size_t i = 0; i != vSize; ++i)
> {
> // do something with v[i]
> }

> }

> But it smells unpleasantly because of this additional pair of
> braces and the resulting indentation.

Unnecessary complication, for no gain.

There are special cases where an extra pair of braces to control
lifetime are justified. I do it a lot in unit tests, for
example (and every unit test framework I've seen seems to do
so), and I do it at times with older versions of Sun CC (which
don't support the standard lifetime of temporaries). But it
takes some justification, and there isn't any here.

> Conclusion [*] - there's no way to do it *right*.

I fail to see any problem with your first suggestion. It's what
I use, and it's really the only variant I've ever seen in actual
code. It's the easiest to write, and the easiest to get right;
anything else is just extra work -- even just thinking about the
issue is a waste of time which could be more productively spent.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

kanze

unread,

May 6, 2006, 10:21:43 AM5/6/06

to

Bob Hairgrove wrote:

[...]

> One big problem is the fact that not all containers are
> vectors, and calling size() is very inefficient for almost all
> containers except for std::vector.

Is it? His code in the loop said "do something with v[i]".
What container supports v[i], and has an inefficient size()
function.

> How about this:

> for (size_t i=0, std::vector<whatever_v_is>::iterator it = v.begin();
> it != v.end(); ++i, ++it) {
> // etc.
> }

> I think this solves all the above problems nicely.

But introduces a new one: it's not legal C++, and won't compile.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Victor Bazarov

unread,

May 6, 2006, 11:18:07 AM5/6/06

to

Arndt Muehlenfeld wrote:
> [..]

> It is always funny to see people do "optimizations" like:
>
> for (int i = max_i; i > 0; --i)
> for (int j = max_j; j > 0; ++j)

Really? *PLUS-PLUS* j?

> ... some code ...
>
> "because it is faster" which today in most cases is not true,
> when, for instance, the above encoded algorithm could be solved
> in O(n log n) instead of O(n*n) time.

Do tell more.

> [..]

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

Carlos Moreno

unread,

May 6, 2006, 11:20:06 AM5/6/06

to

Pete Becker wrote:

>>What is your own style choice? Why?
>
> Spending hours internally debating stylistic minutiae is not productive.
> Write the loop however you want and get on with your real work.

I kind of disagree with this view.

It is worth spending hours debating these kinds of issues whenever
you do it once and for all -- that is, after long debate, *if* there
is evidence to support one specific conclusion, then you stick to
that style afterwards, without spending again the hours of debate
on a per-case basis.

Kind of the "pass std::strings by reference to const" -- many books
(even books from high-reputation authors) spend pages and pages in
this "debate" (well, at this point is no longer a debate -- but the
authors present it in the form of the original debate, so that the
reader/newcomer can understand the rationale). But the conclusion
is: "understand the reasons, but then do it without even thinking
about it"

I get this same feeling from Maciej's original question -- even though
I was indeed surprised with his analysis: I thought this was an
issue simple enough and settled enough to avoid the debate at this
point in time.

Carlos
--

Andrei Alexandrescu (See Website For Email)

unread,

May 6, 2006, 11:31:57 AM5/6/06

to

Maciej Sobczak wrote:
> Hi,

>
> Consider this:
>
> for (size_t i = 0; i != v.size(); ++i)
> {

> // do something with v[i]
> }

I followed the thread with a lot of interest because I am curious about
people's experience and general attitude in such a small matter.

I've noticed a few basic patterns of behavior when it comes to small
style issues. On one hand, (1) there are people who can be highly
effective even though their preoccupation for small issues is absent.
Then, there are others (2) who are not very good programmers although
they preoccupy themselves with small matters quite a lot. Then there are
those who (3) have bad habits at the low level and high level alike, and
finally those who (4) try to develop good habits at all levels - small
and large.

I myself fall in (4), so my opinion is biased. Also, I believe opinions
in this group might be biased towards (1) and (4) because of self-selection.

I'd call (3) and (4) "fractal pattern of behavior" because the habits at
high level mirror habits at low level. The main observation that I
wanted to share is the distribution. An overwhelming majority of the
programmers I know do exhibit a fractal pattern of behavior. There are
notable exceptions, and I could definitely name at least a few
programmers who fall in (1) and (2). But the most likely behavior is,
sloppy lines of code come with sloppy designs, and clear lines of code
come with clear designs.

Within that context, I believe a basic answer "get a life" to the OP
misses both OP's inclination and the point of his question, which I take
was: given some coding guidelines that are all non-competitive (don't
contradict one another), how come a basic language construct doesn't
allow me to follow them all easily and cleanly? And I believe that's a
valid question, for both the philosophical implications (for lack of a
more snobbish word) and the telluric, teleological implications (for
lack of two more snobbish words, one of which is prone to creating
confusion). The concrete implications have support in the anecdotal
evidence that many programming errors happen due to loops, so it strikes
one as odd that such a basic construct is so hard to use orderly.

About the efficiency issue, it's been said that it shouldn't be a
concern most of the time. I would agree with such arguments when
efficiency is at odds with other considerations, but ideally we should
be on lookout for something structured, nice, elegant, _and_ efficient.
Then you can afford to write good code that's naturally efficient. And
in the case of a for loop with fixed bounds there's no law of physics
standing in the way of a safe and efficient looping construct.

To answer the "what do you do" question, I took some time and wrote my
own FOR_RANGE macro, akin to Eric Niebler's excellent FOREACH. I guess
this is testament to the fact that I fall straight within the (4)
category above. My macro is less portable (uses typeof), but provably
more efficient (and I happen to need that for the kind of code I write)
and simpler to implement. It goes like this:

FOR_RANGE(i, 0, v.size()) {
...
}

and guarantees that the bounds will be evaluated only once and that
client code can't mutate i throughout the loop.

Andrei

benben

unread,

May 6, 2006, 1:19:29 PM5/6/06

to

Tomás wrote:
> Maciej Sobczak posted:
>
>> Hi,
>>
>> Consider this:
>>
>> for (size_t i = 0; i != v.size(); ++i)
>> {
>> // do something with v[i]
>> }
>>
>> The above is considered to be a bad practice, because it calls v.size()
>> every iteration and sugggests that the size of the vector might change,
>> which is actually not the intent.
>
>
> I agree that the code is inefficient.

What makes you think the code is inefficient? A member function as
simple as size() would almost definitely be inlined.

Ben

[snip]

Robert Mabee

unread,

May 7, 2006, 8:43:05 AM5/7/06

to

Greg Herlihy wrote:
> You left out the obvious solution:
>
> for (size_t i = v.size(); i > 0; --i)
> {
> // do something with v[i-1];
> }

I first saw this as a micro-optimization, especially on CISC CPUs
that could use the condition code set by the decrement, or maybe
even use a decrement-and-branch instruction, as in:

for (...; --i >= 0; )

However, it also meets the OP's concerns about evaluating the end
value repeatedly. The lower limit is a constant rather more often
than the upper limit is. I think that makes this a good candidate
for the coding-style authorities to prefer, and FOR_EACH macros to
expand into. However, the way I wrote ir has a hidden trap for the
unwary, since it won't exit for unsigned types. That must be why
Greg used v[i - 1] and "> 0". I'd rather the style of the loop not
affect how the body is coded. I also don't care for "--i != (size_t)-1"
which I'm sure I've seen all over. What do you think of:

for (size_t i = v.size(); i > 0; )
{ --i;
...

Carlos Moreno

unread,

May 7, 2006, 8:43:49 AM5/7/06

to

Maciej Sobczak wrote:

>>>for (size_t i = 0; i != v.size(); ++i)
>>>{
>>> // do something with v[i]
>>>}
>>>
>>>The above is considered to be a bad practice, because it calls v.size()
>>>every iteration and sugggests that the size of the vector might change,
>>>which is actually not the intent.
>
>
>>To me, the above does not suggest that v.size() is going to change;
>>nor am I worried about efficiency issues because of calling the
>>function
>
>
> I'm not worried about performance *at all*, so all replies that took
> performance as a guideline (or dismissed performance as a guideline) are
> missing my point.
>
> The point is that the only reason to call some function twice is to
> express the fact that it might not return the same value twice.
> If you know that the function is going to return the same value, then
> calling it more than once is just sloppy. IMHO, of course.

We seem to disagree in there -- if you compute the square root twice,
do you expect to obtain a different result the second time?

The fact that if the function could return different values for
different calls means that *you must* call the function again does
not mean that if you call the function again is *because* the
result is going to change -- I know that this is not what you're
saying; my point is, I don't see a reason to *expect* that the
function could return a different size. Not in a simple and
straightforward situation like calling size() for a container.

Let's put it this way: if the setup line of a for loop is what
tips you about a container changing its contents, then there is
something severely wrong with that code (or with that coding
standard).

There *must* be much more important clues about that, since it
is an event of monumental importance (compared to calling v.size()
more than once -- such a trivial thing being the giveaway for
such a monumental event tells me that there's something wrong
with the code).

There's also the fact that if you "cache" the value of the size,
then I'd be really worried about the body of the loop, unless
the compiler enforced constness -- caching the value of size()
*relies* on the vector's size not changing; calling size()
many times work in all cases. Why would I try to avoid the
solution that works in all cases without sacrifice of efficiency
and arguably without sacrifice of readability?

Notice that the argument above (about worrying unless the
compiler enforces constness) could be countered by noticing
that the body of the loop should not be as long as to make it
unclear whether or not the vector changes -- well, then that
would be the same argument against being worried that calling
size() might indicate that the vector could change -- the
body of the loop should be simple enough to know if it changes.

Anyway, all that leaves us with the argument that usually the
only reason to avoid calling a function twice when it is known
that the function will return the same is efficiency. Sure,
if I already computed the arctan of the complex comjugate of
the square root of the sum of e to the power of a very
complicated expression, why would I waste time in computing
that result again?

Carlos
--

James Kanze

unread,

May 7, 2006, 8:48:27 AM5/7/06

to

benben wrote:
> Tomás wrote:
>> Maciej Sobczak posted:

>>> Consider this:

>>> for (size_t i = 0; i != v.size(); ++i)
>>> {
>>> // do something with v[i]
>>> }

>>> The above is considered to be a bad practice, because it
>>> calls v.size() every iteration and sugggests that the size
>>> of the vector might change, which is actually not the
>>> intent.

>> I agree that the code is inefficient.

> What makes you think the code is inefficient? A member
> function as simple as size() would almost definitely be
> inlined.

And even if it isn't, you'd have to be doing very little in the
loop for it to make a difference. But that could hardly be what
was meant -- without profiling, how can you know, and without
knowing what was to be done with v[i], you cannot meaningfully
profile.

What is obviously being refered to is programmer efficiency.
Not that of the person writing the code, since this is obviously
the most efficient solution for him, but of the person reading
it. Maciej's contention is that this loop misleads the reader
into thinking that something in the loop does modify the size of
the array. I don't agree with him; I use v.size() (or v.end(),
or whatever) a lot even when the vector's size won't be
modified, as do most of the people I've worked with, so we don't
read this into the call to v.size(). But apparently his
experience is different.

The real issue may be one of what it is important to say, when.
And how, of course. I tend to think that whether the size of
the vector gets modified is of relatively little importance.
And usually, I'm using iterators; changing the size would
invalidate the iterators, so the question is moot. And --
doubtlessly based on the code I've seen -- if the call to size()
is extracted from the loop, as a reader, I understand that this
loop has caused a performance problem. And I think this is a
rather common reading.

--
James Kanze kanze...@neuf.fr

Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung

9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34

James Kanze

unread,

May 7, 2006, 8:48:05 AM5/7/06

to

Maciej Sobczak wrote:

[...]

> The point is that the only reason to call some function twice
> is to express the fact that it might not return the same value
> twice.

I think this is where you're wrong. A function is just an
expression, like any other expression. You obviously don't go
out of your way to call a function more often than necessary,
any more than you would go out of your way to evaluate any
expression (including simply reading a variable) more often than
necessary. But unless the function is significant, the number
of times it gets called is not a parameter in the equation. The
rule is simple: keep it simple. If calling the function more
than once makes things simpler, then do it.

> If you know that the function is going to return the same
> value, then calling it more than once is just sloppy. IMHO, of
> course.

Writing a lot of extra code just to avoid calling it doesn't
help things either. As my high school English teacher pounded
into me: good writing is clear and concise. That goes for
writing code, too. Extra verbiage is just that -- extra
verbiage.

>>> The possible solution (and even stated as a rule in one
>>> coding standard) is:

>>> const size_t vSize = v.size();
>>> for (size_t i = 0; i != vSize; ++i)
>>> {
>>> // do something with v[i]
>>> }

>> Actually, it breaks a more important rule, IMHO -- it just
>> makes you wonder "what the hell is that and why is this guy
>> overcomplicating things when we should just put i < v.size()"

> Actually, it's a rule in one coding standard. Really. And it
> expresses the fact that the size is not going to change. The
> only problem is that it introduces in the outer scope a new
> name that is needed only for the purpose of looping.

Which isn't really a problem, either. If you really object to
the function call (or the profiler says that it is an issue),
this would seem to be the clearest and the concisest way of
writing it.

[...]

> Thanks to everybody for replying so far (and I hope it's not
> the end of the discussion), your responses confirmed me that
> the subject is not black and white.

Yes and no. Pete Becker really gave the definitive answer:-).

--
James Kanze kanze...@neuf.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34

Tomás

unread,

May 7, 2006, 8:46:05 AM5/7/06

to

benben posted:

> Tomás wrote:
>> Maciej Sobczak posted:
>>
>>> Hi,
>>>
>>> Consider this:
>>>
>>> for (size_t i = 0; i != v.size(); ++i)
>>> {
>>> // do something with v[i]
>>> }
>>>
>>> The above is considered to be a bad practice, because it calls v.size()
>>> every iteration and sugggests that the size of the vector might change,
>>> which is actually not the intent.
>>
>>
>> I agree that the code is inefficient.
>
> What makes you think the code is inefficient? A member function as
> simple as size() would almost definitely be inlined.
>
> Ben

I'll give an example. Let's say we're programming for Windows, and that
we're going through the system registry value by value. Let's say that
there's a system function we can invoke which is called "GetAmountValues".
Let's say that this system function opens up and analyses the registry
files on the hard disk, and takes approximately 3.4 seconds to calculate
how many values there are.

Let's say that there are 432 543 values. If we wrote the following loop:

for (unsigned long i = 0; i != GetAmountValues(); ++i)
{
ChangeValue( i, "New Value" );
}

Then we would have an efficiency problem. If you're running a loop whose
condition doesn't change, then I'd always advocating storing the value in a
variable rather than constantly calling a function. Here's the new and
improved loop:

{
unsigned long amount = GetAmountValues();

for (unsigned i = 0; i != amount; ++i )
{
ChangeValue( i, "New Value" );
}

}

-Tomás

Dave Harris

unread,

May 8, 2006, 8:19:12 AM5/8/06

to

benh...@yahoo.com.au (benben) wrote (abridged):

> What makes you think the code is inefficient? A member function as
> simple as size() would almost definitely be inlined.

With my local implementation of std::vector, if 'v' is an instance
variable, it will be inlined as:

for (size_t i = 0; i != this->v._end - this->v._begin; ++i)

Although in some cases this can be optimised, there is enough
indirection
that usually it can't.

-- Dave Harris, Nottingham, UK.

Pete Becker

unread,

May 8, 2006, 8:19:44 AM5/8/06

to

Robert Mabee wrote:

>
> for (size_t i = v.size(); i > 0; )
> { --i;
> ...
>

It doesn't meet the original (implied) requirements, since it goes
through the elements in the opposite order from the original code. You
can certainly speculate that that's not an actual requirement, but
before rewriting working code you'd better be sure that your rewrite is
actually correct.

--

Pete Becker
Roundhouse Consulting, Ltd.

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

Dave Harris

unread,

May 8, 2006, 8:18:26 AM5/8/06

to

no....@no.spam.com (Maciej Sobczak) wrote (abridged):
> for (size_t i = 0; i != v.size(); ++i)

>
> The above is considered to be a bad practice, because it calls
> v.size()
> every iteration and sugggests that the size of the vector might
> change,
> which is actually not the intent.

The alternatives create an extra local variable. If the expression were
complex the variable might clarify the code, but v.size() is so simple
that the code is clearer without it. In my view, simplicity and clarity
trump your other concerns so the above form is preferred.

(Obviously one might use iterators instead, but the same issues arise
with
v.end().)

> const size_t vSize = v.size();

> for (size_t i = 0; i != vSize; ++i)

>
> The above, however, breaks another coding standard rule saying that
> variables used to control the loop should be declared in the scope of
> the loop without polluting the enclosing scope (that's what the
> scoping
> rules in for are about, right?).

I wouldn't do that unless it was simpler.

> for (size_t i = 0, vSize = v.size(); i != vSize; ++i)
>

> But this, in turn, breaks another coding standard rule saying that
> everything that could be const must be const

I don't have that "const" rule, and I would write code like that. In
fact
I think it is idiomatic to use a shorter name for vSize:

for (size_t i = 0, s = v.size(); i != s; ++i)

for (iterator i = v.begin(), e = v.end(); i != e; ++i)

Using "s" or "e" here is no worse than using "i", and better in that it
will not be mentioned again in the body of the loop. However, this
version
is more complex than the original. The original is better style, and I'd
only use these if I wanted to sacrifice some clarity for performance.

I don't have the "const" rule because that keyword adds clutter and is
usually not helpful in situations like these. It only because helpful
when
the code is unclear as to which variables vary and which don't, and
then a
better solution is to rewrite the code to make it clearer. (It's a shame
C++ does not make variables const by default, though, so we could have
const without the clutter.)

> { // artificial enclosing scope

>
> const size_t vSize = v.size();

> for (size_t i = 0; i != vSize; ++i)

>
> But it smells unpleasantly because of this additional pair of
> braces and
> the resulting indentation.

Yep. Yet more clutter which obscures the code.

> Conclusion [*] - there's no way to do it *right*.

Indeed. C++ is not a perfect language. However, I also think your style
guidelines are not helping you write clear C++ code. I know some people
require simple rules universally enforced, but I am not one of them.

-- Dave Harris, Nottingham, UK.

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

Pete Becker

unread,

May 8, 2006, 8:17:02 AM5/8/06

to

Tomás wrote:

>>
>> What makes you think the code is inefficient? A member function as
>> simple as size() would almost definitely be inlined.
>>
>> Ben
>
>
>
> I'll give an example. Let's say we're programming for Windows, and
> that
> we're going through the system registry value by value. Let's say that
> there's a system function we can invoke which is called
> "GetAmountValues".
> Let's say that this system function opens up and analyses the registry
> files on the hard disk, and takes approximately 3.4 seconds to
> calculate
> how many values there are.

In other words, a different problem might have different performance
issues and might therefore require a different solution. But this
discussion is about calling vector::size, which does not take 3.4

seconds to calculate how many values there are.

--

Pete Becker
Roundhouse Consulting, Ltd.

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

Bob Hairgrove

unread,

May 8, 2006, 8:43:30 AM5/8/06

to

On 6 May 2006 10:21:43 -0400, "kanze" <ka...@gabi-soft.fr> wrote:

> Bob Hairgrove wrote:
>
> [...]
>> One big problem is the fact that not all containers are
>> vectors, and calling size() is very inefficient for almost all
>> containers except for std::vector.
>
> Is it? His code in the loop said "do something with v[i]".
> What container supports v[i], and has an inefficient size()
> function.

Good point. I suppose they go hand in hand. My concerns here were
guided by what Scott Meyers says in "Effective STL" where he
recommends testing "if (v.empty())" instead of "if (v.size())" where v
can be a variable of just about any STL container type.

>> How about this:
>
>> for (size_t i=0, std::vector<whatever_v_is>::iterator it = v.begin();
>> it != v.end(); ++i, ++it) {
>> // etc.
>> }
>
>> I think this solves all the above problems nicely.
>
> But introduces a new one: it's not legal C++, and won't compile.

Oops ... so we declare i outside of the loop, then it should work (or
did you see anything else wrong with it?)

--
Bob Hairgrove
NoSpam...@Home.com

Greg Herlihy

unread,

May 8, 2006, 3:26:29 PM5/8/06

to

Robert Mabee wrote:
> Greg Herlihy wrote:
>> You left out the obvious solution:
>>
>> for (size_t i = v.size(); i > 0; --i)
>> {
>> // do something with v[i-1];
>> }
>
> I first saw this as a micro-optimization, especially on CISC CPUs
> that could use the condition code set by the decrement, or maybe
> even use a decrement-and-branch instruction, as in:
>
> for (...; --i >= 0; )
>
> However, it also meets the OP's concerns about evaluating the end
> value repeatedly. The lower limit is a constant rather more often
> than the upper limit is. I think that makes this a good candidate
> for the coding-style authorities to prefer, and FOR_EACH macros to
> expand into. However, the way I wrote ir has a hidden trap for the
> unwary, since it won't exit for unsigned types. That must be why
> Greg used v[i - 1] and "> 0". I'd rather the style of the loop not
> affect how the body is coded. I also don't care for "--i !=
> (size_t)-1"
> which I'm sure I've seen all over. What do you think of:
>
> for (size_t i = v.size(); i > 0; )
> { --i;
> ...

I agree that i-1 would be, at the very least, an error-prone way
represent the current index in the body of a loop. And as noted, the
i-1 index ties the loop's implementation to the operations of the
loop's body. Ideally, the two should remain separate.

One could make the observation that - although placing --i in the body
of the loop is certainly an improvement over an i-1 current index - the
goal of keeping the body and the loop apart has not been met. In fact
some of the loop's implementation now resides in the body.

Therefore I think your original idea for the loop is very close to
optimal. So why not revise it like so:

for (size_t i = v.size(); i-- > 0; )
{
// do something with v[i]

I can't see a problem with post-decrementing i. After all, if there
really is an issue with a postdecrement expression involving a built-in
integral type, then I cannot see when it would ever be appropriate to
use a postdecrement expression. And at that point the postfix ++ and --
operators should simply be retired from C++ altogether.

Greg

Andrei Alexandrescu (See Website For Email)

unread,

May 8, 2006, 3:21:23 PM5/8/06

to

James Kanze wrote:
> Maciej Sobczak wrote:
>
> [...]
>
>> The point is that the only reason to call some function twice
>> is to express the fact that it might not return the same value
>> twice.
>
>
> I think this is where you're wrong. A function is just an
> expression, like any other expression. You obviously don't go
> out of your way to call a function more often than necessary,
> any more than you would go out of your way to evaluate any
> expression (including simply reading a variable) more often than
> necessary. But unless the function is significant, the number
> of times it gets called is not a parameter in the equation. The
> rule is simple: keep it simple. If calling the function more
> than once makes things simpler, then do it.

The answer misses his point. Reading a constant is an expression, but
that's irrelevant. I could just as well answer that all programs are
composed of lexemes. A constant has a very clear meaning: it's a
snapshot of another expression and won't change. That has a different
semantics than invoking the function, which can (and in the example
given, is) be impure.

Besides, I think the reasoning "unless the function is significant"
isn't properly greased. I don't know which functions are significant and
which aren't; and however you define it, I'd rather have a method that
works just as well with "significant" functions and "insignificant"
functions alike, freeing my mind from figuring out which are which
through profiling, which might not be too good at figuring "the death of
a thousand blows" caused by incremental inefficiencies caused by a
pervasive style.

>>>> The possible solution (and even stated as a rule in one
>>>> coding standard) is:
>
>
>>>> const size_t vSize = v.size();
>>>> for (size_t i = 0; i != vSize; ++i)
>>>> {
>>>> // do something with v[i]
>>>> }
>
>
>>> Actually, it breaks a more important rule, IMHO -- it just
>>> makes you wonder "what the hell is that and why is this guy
>>> overcomplicating things when we should just put i < v.size()"

I never wondered about such code. A snapshot of the vector's size is
taken, and looped up to. I find it a clear statement of the assumption
that v's size is a loop-invariant, something that the terser version
fails to convey.

>> Thanks to everybody for replying so far (and I hope it's not
>> the end of the discussion), your responses confirmed me that
>> the subject is not black and white.
>
>
> Yes and no. Pete Becker really gave the definitive answer:-).

I disagree. As many simplistic answers, it misses most dimensions of the
question - in this case, a certain metaphysical restlessness concerning
an incumbent disharmony engendered by halcyon juxtaposition :o).

Andrei

Arndt Muehlenfeld

unread,

May 8, 2006, 3:39:26 PM5/8/06

to

Victor Bazarov wrote:

> Arndt Muehlenfeld wrote:
>> [..]
>> It is always funny to see people do "optimizations" like:
>>
>> for (int i = max_i; i > 0; --i)
>> for (int j = max_j; j > 0; ++j)
>
> Really? *PLUS-PLUS* j?

Oops, typo ;-)

>
>> ... some code ...
>>
>> "because it is faster" which today in most cases is not true,
>> when, for instance, the above encoded algorithm could be solved
>> in O(n log n) instead of O(n*n) time.
>
> Do tell more.
>

I just wanted to say, that I have encountered people trying to go great
length for low-level optimisation, while it would have been better to chose
another algorithm. And sometimes these "optimisations" do not achieve
anything but making the code more unreadable.

This probably does not apply to all situations, but in my opinion low-level
optimisation usually is the job of the compiler. I would try to concentrate
on correctness and efficiency of the algorithm.

Cheers,
Arndt

Mikael 'Zayenz' Lagerkvist

unread,

May 8, 2006, 3:44:42 PM5/8/06

to

Robert Mabee wrote:
What do you think of:
>
> for (size_t i = v.size(); i > 0; )
> { --i;
> ...

My suggestion would be

for (int i = v.size(); i--; ) { ... }

which has the nice properties that it is very short, and that all the
looping-logic is written in the loop-header. As for readability, it
works well if it is an idiom for the project.

Tomás

unread,

May 8, 2006, 3:49:25 PM5/8/06

to

> In other words, a different problem might have different performance
> issues and might therefore require a different solution. But this
> discussion is about calling vector::size, which does not take 3.4
> seconds to calculate how many values there are.

Agreed, but it's the principle that I'm trying to advocate.

Let's say that on a particular platform, it takes "size" 537 nanoseconds to
process, and let's assume that overall, the entire program will take an
extra 17 microseconds to execute.

You strike me as the kind of person who would say, "17 microseconds, that's
nothing! Don't bother fixing it". If you are of that disposition, you might
even be in the majority on a C++ newsgroup.

I however am in with the efficiency crowd; it makes programming more
enjoyable for me.

-Tomás

Maciej Sobczak

unread,

May 8, 2006, 3:41:18 PM5/8/06

to

James Kanze wrote:

> What is obviously being refered to is programmer efficiency.
> Not that of the person writing the code, since this is obviously
> the most efficient solution for him, but of the person reading
> it.

Exactly, this is the source of the discussion. Or, putting it another
way, how many precise things can we say about the code by just reading
it without resorting to larger-scale code analysis.

There are two things that provoked me to post my questions. First is
that there are really coding standards that force programmers making the
kind of distinctions I've tried to describe. Second is that there are
languages which actually make such distinctions more natural than C++.
Some say that those languages are more appropriate for safety-critical
domains, exactly becauese they don't blur the distinctions that may
prove important.

> Maciej's contention is that this loop misleads the reader
> into thinking that something in the loop does modify the size of
> the array. I don't agree with him; I use v.size() (or v.end(),
> or whatever) a lot even when the vector's size won't be
> modified, as do most of the people I've worked with, so we don't
> read this into the call to v.size().

Which is a very good argument. My experience with other programmers is
exactly the same - writing v.size() in the second caluse of the for loop
is a common and well understood idiom.
The ojective is, however, to build a better understanding of *why*
things are done the way they're done, even when it comes to things like
for loops (kind of Andrei's "improving at all levels" idea - note also
that respected authors spent pages in CUJ discussing how basic stuff
like for loops should be written, so I'm in a good company ;-) ).

Taking this all into account, the question is how to write code in the
way that is self-descriptive to the extent that cannot be possibly improved.
And even when it comes to such simple things I can see contradicting
recommendations. Some suggest calling v.size() every time, because it's
more defensive (always "works"). Others suggest to use for_each. And
it's already where we have contradiction, because for_each is *not*
defensive in the above sense - it evaluates the loop's limiting
conditions only once.

Anyway - if I have in my program two loops, one where the limiting
condition changes and one where it doesn't, I consider that they express
different designs and therefore I want them to *look* differently.
Calling v.size() all the time blurs this distinction and some
fundamental questions about the loop itself cannot be answered locally.
Whether this is a problem or not largerly depends on what is the target
audience of the code. There are cases where I would agree with Pete
Becker and just go on with other stuff, but there are also cases where I
wouldn't.
This is also why coding standards are not universal and very often are
selected on the per-project basis - but that's a different story.

--
Maciej Sobczak : http://www.msobczak.com/
Programming : http://www.msobczak.com/prog/

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

James Kanze

unread,

May 8, 2006, 3:47:17 PM5/8/06

to

Carlos Moreno wrote:
> Maciej Sobczak wrote:

[...]

>> The point is that the only reason to call some function twice
>> is to express the fact that it might not return the same
>> value twice. If you know that the function is going to
>> return the same value, then calling it more than once is just
>> sloppy. IMHO, of course.

> We seem to disagree in there -- if you compute the square root
> twice, do you expect to obtain a different result the second
> time?

Presumably, Maciej wouldn't call sqrt twice with the same
argument. Also, I think we should accept that he is a
reasonable person, and is only considering a limited locality; I
don't believe that he is suggesting that we should avoid calling
sqrt(2.0) twice in a program of a million or so lines -- simply
that in his opinion, within a single (small) function, it makes
more sense to call it once and cache the return value, rather
than to call it twice. I don't agree with him, but I'm not
going to try to force him into an exagerated position just to
win my argument.

> The fact that if the function could return different values
> for different calls means that *you must* call the function
> again does not mean that if you call the function again is
> *because* the result is going to change -- I know that this is
> not what you're saying; my point is, I don't see a reason to
> *expect* that the function could return a different size. Not
> in a simple and straightforward situation like calling size()
> for a container.

> Let's put it this way: if the setup line of a for loop is
> what tips you about a container changing its contents, then
> there is something severely wrong with that code (or with that
> coding standard).

Let's put it this way: when you're dealing with STL containers,
you simply have to know. And there can be lots of conflicting
signals: the fact that he is using indices, and not iterators,
for example, could in some contexts suggest that the container
is modified (in a way which would invalidate the iterators).

> There *must* be much more important clues about that, since it
> is an event of monumental importance (compared to calling
> v.size() more than once -- such a trivial thing being the
> giveaway for such a monumental event tells me that there's
> something wrong with the code).

> There's also the fact that if you "cache" the value of the
> size, then I'd be really worried about the body of the loop,
> unless the compiler enforced constness -- caching the value of
> size() *relies* on the vector's size not changing; calling
> size() many times work in all cases. Why would I try to avoid
> the solution that works in all cases without sacrifice of
> efficiency and arguably without sacrifice of readability?

So you don't use iterators. vector<>::iterator counts on the
vector's size not increasing.

I agree with you that leaving the call to size() in the for
isn't a sufficient enough signal that the value might change.
On the other hand, I don't really buy the argument about leaving
it there for reasons of robustness -- the same reasoning would
have us ban the use of iterators.

In the end, it is a style issue, and opinions may vary. I
prefer his first version because I find it more concise, and at
least as clear as, if not clearer than, the other alternatives.
He finds it less clear (I don't think you can argue the
conciseness), because it leaves open the question of whether the
size of the vector changes or not. I disagree, because I think
that as signals go, this one is too subtle, and also, because it
only works in a few, very special cases and because I imagine
that the information is generally fairly evedent anyway, if it
is needed. I might add that I don't think it is a generally
recognized signal -- but local coding standards could affect
that.

--
James Kanze kanze...@neuf.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

James Kanze

unread,

May 8, 2006, 3:46:56 PM5/8/06

to

Tomás wrote:
> benben posted:

>> Tomás wrote:
>>> Maciej Sobczak posted:

>>>> Consider this:

>>>> for (size_t i = 0; i != v.size(); ++i)
>>>> {
>>>> // do something with v[i]
>>>> }

>>>> The above is considered to be a bad practice, because it
>>>> calls v.size() every iteration and sugggests that the size
>>>> of the vector might change, which is actually not the
>>>> intent.

>>> I agree that the code is inefficient.
>> What makes you think the code is inefficient? A member
>> function as simple as size() would almost definitely be
>> inlined.

> I'll give an example. Let's say we're programming for Windows,

> and that we're going through the system registry value by
> value. Let's say that there's a system function we can invoke
> which is called "GetAmountValues". Let's say that this system
> function opens up and analyses the registry files on the hard
> disk, and takes approximately 3.4 seconds to calculate how
> many values there are.

> Let's say that there are 432 543 values. If we wrote the
> following loop:

> for (unsigned long i = 0; i != GetAmountValues(); ++i)
> {
> ChangeValue( i, "New Value" );
> }

> Then we would have an efficiency problem.

Maybe, maybe not. The profiler (or actually executing the
program) would tell. If the program is too slow, and the
profiler says that the calls to GetAmountValues() are the
bottleneck, then of course, you'd modify the code.

In the case of Windows or Unix API's, there's also an argument
that using a separate variable allows giving the value a
significant name -- the names in the API are almost always too
verbose (Windows) or too succinct (Unix) to be really
understandable:-).

Note that this argument holds anytime the expression is non
trivial. Where it breaks down is when the expression is a
trivial function with the same name as the variable.

> If you're running a loop whose condition doesn't change, then
> I'd always advocating storing the value in a variable rather
> than constantly calling a function. Here's the new and
> improved loop:

> {
> unsigned long amount = GetAmountValues();

> for (unsigned i = 0; i != amount; ++i )
> {
> ChangeValue( i, "New Value" );
> }
> }

To verbose. If the profiler says you have to, by all means.
But until then, it's just extra crud -- more work for the
author, and more work for whoever has to read his code.

--
James Kanze kanze...@neuf.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

Maciej Sobczak

unread,

May 8, 2006, 3:42:31 PM5/8/06

to

Carlos Moreno wrote:

>>The point is that the only reason to call some function twice is to
>>express the fact that it might not return the same value twice.
>>If you know that the function is going to return the same value, then
>>calling it more than once is just sloppy. IMHO, of course.
>
> We seem to disagree in there -- if you compute the square root twice,
> do you expect to obtain a different result the second time?

No, you took it backwards. It should be: if I know that the square root
of something does not change, then I don't compute it twice.

Most of the time that's the question of refactoring:

double x = ...;
double y = 1 + sqrt(x);
double z = 3 * sqrt(x);

The above code is just sloppy. Better:

double x = ...;
const double sqrt_x = sqrt(x);
double y = 1 + sqrt_x;
double z = 3 * sqrt_x;

even if they give the same results. And that's *not* about performance.

Above, it's the source code repetition that makes you spot the
opportunity for some refactoring. This is not present directly in the
for loop, where v.size() is written only once (in the sense of source
code), but is present in the design and control flow. Calling v.size()
over and over again either shows that I'm *not sure* whether it's
constant in the given context or that I even want to explicitly express
that it's not.

> I know that this is not what you're
> saying; my point is, I don't see a reason to *expect* that the
> function could return a different size. Not in a simple and
> straightforward situation like calling size() for a container.

Which refers to the fact that it's an established idiom, well understood
by those who write and read it. That's perfectly OK. But I'm asking
whether this idiom itself is the optimal one.

> Anyway, all that leaves us with the argument that usually the
> only reason to avoid calling a function twice when it is known
> that the function will return the same is efficiency.

No. For me the reason is to translate your design and intentions into
source code in a way that does not leave absolutely any room for
misinterpretations. If you assume some fact (like the vector having
constant size), express this assumption in the code.

--
Maciej Sobczak : http://www.msobczak.com/
Programming : http://www.msobczak.com/prog/

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

benben

unread,

May 8, 2006, 3:45:28 PM5/8/06

to

> I'll give an example. Let's say we're programming for Windows, and that
> we're going through the system registry value by value. Let's say that
> there's a system function we can invoke which is called "GetAmountValues".
> Let's say that this system function opens up and analyses the registry
> files on the hard disk, and takes approximately 3.4 seconds to calculate
> how many values there are.
>
> Let's say that there are 432 543 values. If we wrote the following loop:
>
> for (unsigned long i = 0; i != GetAmountValues(); ++i)
> {
> ChangeValue( i, "New Value" );
> }
>
>
> Then we would have an efficiency problem. If you're running a loop whose
> condition doesn't change, then I'd always advocating storing the value in a
> variable rather than constantly calling a function. Here's the new and
> improved loop:
>
> {
> unsigned long amount = GetAmountValues();
>
> for (unsigned i = 0; i != amount; ++i )
> {
> ChangeValue( i, "New Value" );
> }
>
> }

I am sorry for any potential confusion i might have added. But I think
the OP made it very clear he was looping over a vector and the problem
was a style issue. I'm sure he was talking about std::vector because
that's the only vector topical here anyway.

What you just said, of course, still holds true. But that is a little
beyond style issues.

Regards,
Ben

Carlos Moreno

unread,

May 8, 2006, 3:50:32 PM5/8/06

to

Victor Bazarov wrote:
> Arndt Muehlenfeld wrote:
>
>>[..]
>>It is always funny to see people do "optimizations" like:
>>
>>for (int i = max_i; i > 0; --i)
>> for (int j = max_j; j > 0; ++j)
>
> Really? *PLUS-PLUS* j?

C'mon... Be a little nicer when nitpicking :-) (it is
clear that it was a typo/oversight -- go easy on the guy)

>>"because it is faster" which today in most cases is not true,
>>when, for instance, the above encoded algorithm could be solved
>>in O(n log n) instead of O(n*n) time.
>
> Do tell more.

I'm sure he was referring to the fact that, quite ironically,
lots of people often go so obsessively optimizing every single
line of code in their way, convinced that their code is now
so much faster (which *very often* is false), when the obvious
optimization would be, if anything, at the level of the
algorithm, and not at the level of individual lines of code.

The above could have been a bubble-sort (which for small
sets, I guess it is actually more efficient than most NlogN
sorting algorithm) being used when quick-sort or some other
more efficient algorithms would be available and could
represent a 100-fold improvement in speed (for that section
of the code), instead of -- *possibly*, not even surely --
a few microseconds saved because of the presumably faster
idiom of looping downwards.

Carlos
--

James Kanze

unread,

May 9, 2006, 8:16:29 AM5/9/06

to

Tomás wrote:

[...]

> I however am in with the efficiency crowd; it makes
> programming more enjoyable for me.

If you are programming for your own personal pleasure, this is a
killer argument. If you are paid by someone to program,
however, it is unacceptable -- you're being paid to be
productive.

--
James Kanze kanze...@neuf.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

James Kanze

unread,

May 9, 2006, 8:16:54 AM5/9/06

to

Bob Hairgrove wrote:
> On 6 May 2006 10:21:43 -0400, "kanze" <ka...@gabi-soft.fr> wrote:

>> Bob Hairgrove wrote:
>>
>> [...]
>>> One big problem is the fact that not all containers are
>>> vectors, and calling size() is very inefficient for almost
>>> all containers except for std::vector.

>> Is it? His code in the loop said "do something with v[i]".
>> What container supports v[i], and has an inefficient size()
>> function.

> Good point. I suppose they go hand in hand. My concerns here
> were guided by what Scott Meyers says in "Effective STL" where
> he recommends testing "if (v.empty())" instead of "if
> (v.size())" where v can be a variable of just about any STL
> container type.

Well, a perhaps more telling argument for using v.empty()
instead of v.size() != 0 is that it says more explicitly what
you are testing for.

>>> How about this:
>>> for (size_t i=0, std::vector<whatever_v_is>::iterator it = v.begin();
>>> it != v.end(); ++i, ++it) {
>>> // etc.
>>> }
>>> I think this solves all the above problems nicely.

>> But introduces a new one: it's not legal C++, and won't
>> compile.

> Oops ... so we declare i outside of the loop, then it should
> work (or did you see anything else wrong with it?)

Why i, and not it? For that matter, why use two variables, when
one will do the trick?

--
James Kanze kanze...@neuf.fr
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France +33 (0)1 30 23 00 34

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

Jerry Coffin

unread,

May 9, 2006, 8:22:30 AM5/9/06

to

In article <e3n6mp$36s$1...@emma.aioe.org>,
kanze...@neuf.fr says...

[ ... ]

> > If you're running a loop whose condition doesn't change, then
> > I'd always advocating storing the value in a variable rather
> > than constantly calling a function. Here's the new and
> > improved loop:
>
> > {
> > unsigned long amount = GetAmountValues();
>
> > for (unsigned i = 0; i != amount; ++i )
> > {
> > ChangeValue( i, "New Value" );
> > }
> > }
>
> To verbose. If the profiler says you have to, by all means.
> But until then, it's just extra crud -- more work for the
> author, and more work for whoever has to read his code.

IMO, this goes beyond just being too verbose. Minmizing
the scope of a variable has two benefits. The first is to
minimize the possibility of collision between
identifiers. The second is to make the scope of its use
apparent and easy to ascertain.

If your loop is in such a large scope that one variable
outside the loop instead of inside it makes a substantial
difference in either respect, then what's really needed
almost certainly isn't an extra scope here, but a serious
look at the problems in the surrounding code to fix
whatever's leading to the problem in the first place.

--
Later,
Jerry.

The universe is a figment of its own imagination.

Tomás

unread,

May 9, 2006, 8:28:03 AM5/9/06

to

James Kanze posted:

> > {
> > unsigned long amount = GetAmountValues();
>
> > for (unsigned i = 0; i != amount; ++i )
> > {
> > ChangeValue( i, "New Value" );
> > }
> > }
>

> Too verbose. If the profiler says you have to, by all means.

> But until then, it's just extra crud -- more work for the
> author, and more work for whoever has to read his code.

We'll have to agree to disagree, because I find the code to be quite nice.

-Tomás

We'll have

Andrei Polushin

unread,

May 9, 2006, 8:32:23 AM5/9/06

to

Maciej Sobczak wrote:
> The ultimate solution might be this:
>
> { // artificial enclosing scope

>
> const size_t vSize = v.size();

> for (size_t i = 0; i != vSize; ++i)
> {
> // do something with v[i]

> }
> }
>
> But it smells unpleasantly because of this additional pair of braces
> and the resulting indentation.

>From the crazy point of view, the code above might be restated with
"storage class", something like "static":

for (size_t i = 0; ; ++i)
{
static const size_t vSize = v.size(); // initialized once
if (i == vSize) {
break;
}
// do something with v[i]
}

The next is to eliminate the name of the local variable, leaving the
storage class keyword in place (from now on, the code is not C++):

for (size_t i = 0; i != (static v.size()); ++i)
{
// do something with v[i]
}

There might be another keyword, not "static", to state that the
lifetime of the temporary variable is loop-wide. Probably, "auto":

for (size_t i = 0; i != (auto v.size()); ++i)
{
// do something with v[i]
}

The "auto" keyword above means "the result of v.size() is stored in
the local variable with the lifetime of the loop itself".

Note the parenthneses are optional, so the syntax is quite nice:

for (size_t i = 0; i != auto v.size(); ++i)
{
// do something with v[i]
}

Reviewing it again, we may discover that the anonymous syntax is the
terse form of "definitions in expressions", where the definition should
return the result for evaluation (see D&E 3.11.5.2 for discussion):

for (size_t i = 0; i != (size_t vSize = v.size()); ++i)
{
// do something with v[i]
}

> Conclusion [*] - there's no way to do it *right*.

That is the direct way to make it right-doable, isn't it?

--
Andrei Polushin

kanze

unread,

May 9, 2006, 4:02:06 PM5/9/06

to

Maciej Sobczak wrote:
> James Kanze wrote:

> > What is obviously being refered to is programmer efficiency.
> > Not that of the person writing the code, since this is
> > obviously the most efficient solution for him, but of the
> > person reading it.

> Exactly, this is the source of the discussion. Or, putting it
> another way, how many precise things can we say about the code
> by just reading it without resorting to larger-scale code
> analysis.

I keep referring back to what my high school English teacher
said: "Good writing is clear and concise." The more you say
with the less words, the more the code is concise. At some
point, however, the cost in clarity outweighs the additional
conciseness. Up until there, I think most of us would agree.
Trying to find a concensus as to where that last point is,
however, is going to be difficult.

> There are two things that provoked me to post my questions.
> First is that there are really coding standards that force
> programmers making the kind of distinctions I've tried to
> describe.

Nothing new there. It's not rare that there is no really good
solution, and you have to choose the least bad.

> Second is that there are languages which actually make such
> distinctions more natural than C++. Some say that those
> languages are more appropriate for safety-critical domains,
> exactly becauese they don't blur the distinctions that may
> prove important.

> > Maciej's contention is that this loop misleads the reader
> > into thinking that something in the loop does modify the
> > size of the array. I don't agree with him; I use v.size()
> > (or v.end(), or whatever) a lot even when the vector's size
> > won't be modified, as do most of the people I've worked
> > with, so we don't read this into the call to v.size().

> Which is a very good argument. My experience with other
> programmers is exactly the same - writing v.size() in the
> second caluse of the for loop is a common and well understood
> idiom. The ojective is, however, to build a better
> understanding of *why* things are done the way they're done,
> even when it comes to things like for loops (kind of Andrei's
> "improving at all levels" idea - note also that respected
> authors spent pages in CUJ discussing how basic stuff like for
> loops should be written, so I'm in a good company ;-) ).

Well, CUJ had to have something to put into its articles:-).

Seriously: if you're actually writing code, Pete had the only
correct answer. Stopping to even think about it costs
productivity, and may result in delays in the delivery of the
product. But of course, you're not actually writing code when
you post an article in this forum, and some meta-discussion
concerning what should be in the programming guidelines is
always necessary, in any project.

My personal opinion is that you're taking things a little too
far; that it isn't that important, that the additional
information you want to impart isn't important enough (in the
general case, anyway) to add complexity to the written form of
the program. Nor to invest the time necessary to ensure that
all of the programmers on the project are aware of the
convention, and profit from it -- if most of the readers of your
code see the altnerances as random variations, rather than a
signal "size changes or not", then all you've done is add noise.

> Taking this all into account, the question is how to write
> code in the way that is self-descriptive to the extent that
> cannot be possibly improved. And even when it comes to such
> simple things I can see contradicting recommendations. Some
> suggest calling v.size() every time, because it's more
> defensive (always "works"). Others suggest to use for_each.
> And it's already where we have contradiction, because for_each
> is *not* defensive in the above sense - it evaluates the
> loop's limiting conditions only once.

But for_each says that you don't modify the size... or do
anything else which might invalidate its iterators. That's part
of the message of using for_each -- the loop is executed exactly
n times, where n is determined before entering the loop. In
fact, that's its only message: that there are no break's, no
continue's and no other fiddling around which will change this.

I think that this is generally known and recognized by
programmers familiar with the STL. That is: the message will be
understood immediately, and I don't need any particular
"training" for the programmers for it to pass.

> Anyway - if I have in my program two loops, one where the
> limiting condition changes and one where it doesn't, I
> consider that they express different designs and therefore I
> want them to *look* differently.

I can understand that, although I'm not sure that everyone
considers it so important. There is a logical difference
between a loop which executes exactly n time, n being determined
once and for all before the start of the loop, and more general
loops. The questions are 1) how much does it cost (in
conciseness) to impart this information by means of the loop
structure, and 2) how much extra work, if any, is necessary to
ensure that the information will be understood.

> Calling v.size() all the time blurs this distinction

Only if you suppose that the position of the call is distinctive
in this regard. I would generally expect the loop not to change
the size of the vector, regardless of where the call is. But it
occurs to me that I have at least a couple of exceptions --
cases where I switched from iterators to an index because I was
doing push_back's (and invalidating the iterators) in the loop.
So I don't know.

In the end, I think that the question only really becomes
relevant when you know why the loop is there -- what its role in
the algorithm is. And once you know that, you probably do know
whether the vector changes size or not, independantly of where
the call to v.size() is.

--
James Kanze GABI Software

Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung

9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

James Dennett

unread,

May 9, 2006, 4:06:46 PM5/9/06

to

Tomás wrote:
>> In other words, a different problem might have different performance
>> issues and might therefore require a different solution. But this
>> discussion is about calling vector::size, which does not take 3.4
>> seconds to calculate how many values there are.
>
> Agreed, but it's the principle that I'm trying to advocate.
>
> Let's say that on a particular platform, it takes "size" 537 nanoseconds to
> process, and let's assume that overall, the entire program will take an
> extra 17 microseconds to execute.
>
> You strike me as the kind of person who would say, "17 microseconds, that's
> nothing! Don't bother fixing it". If you are of that disposition, you might
> even be in the majority on a C++ newsgroup.
>
> I however am in with the efficiency crowd; it makes programming more
> enjoyable for me.

Efficiency comes in many forms; I think Pete Becker is
acting here as an advocate for another efficiency crowd,
one which supports the least expenditure (of time, of
money, etc.) to build a program which meets its
requirements.

I suspect Pete's also spent more time shaving cycles
than many people, as some of his code is rather widely
deployed and one effective requirement for it would
often be that customers not whine too much about
performance problems. But optimizing when the business
needs it is not the same as spending time optimizing
for the sake of intellectual challenge. Sometimes the
challenge of shaving microseconds can be stimulating,
and sometimes it's what a business needs, but it's
good to be clear about why you're doing it, and whether
it's the appropriate thing to do in the circumstances.

-- James

Francis Glassborow

unread,

May 9, 2006, 7:15:58 PM5/9/06

to

In article <e3n6mp$36s$1...@emma.aioe.org>, James Kanze
<kanze...@neuf.fr> writes

> > If you're running a loop whose condition doesn't change, then
> > I'd always advocating storing the value in a variable rather
> > than constantly calling a function. Here's the new and
> > improved loop:
>
> > {
> > unsigned long amount = GetAmountValues();
>
> > for (unsigned i = 0; i != amount; ++i )
> > {
> > ChangeValue( i, "New Value" );
> > }
> > }
>
>To verbose. If the profiler says you have to, by all means.
>But until then, it's just extra crud -- more work for the
>author, and more work for whoever has to read his code.

And if the programmer is convinced that they want to go that way they
should declare amount to be const. However, often the compiler can work
out that it can hoist the evaluation of GetAmountValues() outside the
loop.

--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects

Francis Glassborow

unread,

May 9, 2006, 7:16:41 PM5/9/06

to

In article <445ef977$0$10578$3b21...@aconews.univie.ac.at>, Arndt
Muehlenfeld <mueh...@sbox.tugraz.at> writes

>I just wanted to say, that I have encountered people trying to go great
>length for low-level optimisation, while it would have been better to chose
>another algorithm. And sometimes these "optimisations" do not achieve
>anything but making the code more unreadable.

Actually they sometimes achieve pessimisation:-) And I can remember one
piece of code where the programmer had optimised everything to such an
extent that he had to put a delay loop in to slow the program down for
the benefit of the human users :-)

--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects

Carlos Moreno

unread,

May 9, 2006, 7:26:17 PM5/9/06

to

Maciej Sobczak wrote:

> There are two things that provoked me to post my questions. First is
> that there are really coding standards that force programmers making the
> kind of distinctions I've tried to describe. Second is that there are
> languages which actually make such distinctions more natural than C++.
> Some say that those languages are more appropriate for safety-critical
> domains, exactly becauese they don't blur the distinctions that may
> prove important.

I guess one of my big question marks over your reasons to analyze
this issue is: why would I want to make the distinction of whether
or not the vector changes size? The idiom "for ( .... )" tells me
that I want to go through each element, and end the loop after I
processed the *last one* -- if the size changes, I still want the
loop to end after I processed the last one.

The above argument, in a sense, goes with the "defensive" argument
you talk about.

> Taking this all into account, the question is how to write code in the
> way that is self-descriptive to the extent that cannot be possibly improved.

Call an STL or STL-like algorithm -- when you see something like
count_if (v.begin(), v.end(), ...), you get the best of *all* worlds;
the call tells you that the size will not change (it strongly suggests
it, at the very least, even if it doesn't enforce it); it tells you
about the intent to count things (as opposed to something else, and
without the need to resort to high-level algorithm analysis to see
what's going on); and last, but not least, you do not evaluate
v.end() or v.size() with each pass of the loop. Talk about best
of all worlds.

> And even when it comes to such simple things I can see contradicting
> recommendations. Some suggest calling v.size() every time, because it's
> more defensive (always "works"). Others suggest to use for_each. And
> it's already where we have contradiction, because for_each is *not*
> defensive in the above sense - it evaluates the loop's limiting
> conditions only once.

It is defensive, in that it almost guarantees that the range will
not change (unless the programmer does it almost intentionally to
make things wrong)

> Anyway - if I have in my program two loops, one where the limiting
> condition changes and one where it doesn't, I consider that they express
> different designs and therefore I want them to *look* differently.

I would go as far as to suggest that if that is the case, then a
for loop is *not* the right tool for it.

Conceptually, the for loop is something that tells you, up front
and directly, how many times the loop will execute. If the vector's
size can change, then we do not know how many times it will execute
from the header of the loop, and hence it should probably be a
while loop.

Anyway, how many *reasonable* examples do you have of well-written
code with a loop written like "for (int i = 0; i < v.size(); i++)"
where the vector's size changes?

> Calling v.size() all the time blurs this distinction and some
> fundamental questions about the loop itself cannot be answered locally.
> Whether this is a problem or not largerly depends on what is the target
> audience of the code. There are cases where I would agree with Pete
> Becker and just go on with other stuff, but there are also cases where I
> wouldn't.

I disagree with the first half of his point (when he says that
spending time on these kinds of things is not adviseable). But
I do agree with the second part, that *for this* particular case,
it's been settled enough as to accept that either way is "good
enough" and that perhaps your first form is more than good enough
as to move on and spend time on more critical things.

Carlos
--

Andrei Alexandrescu (See Website For Email)

unread,

May 9, 2006, 7:27:39 PM5/9/06

to

Tomás wrote:
> James Kanze posted:
>
>
>> > {
>> > unsigned long amount = GetAmountValues();
>>
>> > for (unsigned i = 0; i != amount; ++i )
>> > {
>> > ChangeValue( i, "New Value" );
>> > }
>> > }
>>
>>Too verbose. If the profiler says you have to, by all means.
>>But until then, it's just extra crud -- more work for the
>>author, and more work for whoever has to read his code.
>
>
>
> We'll have to agree to disagree, because I find the code to be quite nice.

If only amount were const...

Andrei

Francis Glassborow

unread,

May 9, 2006, 7:23:17 PM5/9/06

to

In article <xOH7g.8862$j7.3...@news.indigo.ie>, Tomás <NU...@NULL.NULL>
writes

>You strike me as the kind of person who would say, "17 microseconds, that's
>nothing! Don't bother fixing it". If you are of that disposition, you might
>even be in the majority on a C++ newsgroup.

Yes, 17 micro-seconds is pretty small beer compared with the time it
would take to 'fix' it. Worse still, often the apparent fixes actually
take longer (programmers are often working on high end machines with
oodles of resources but the product will be used on a much more
constricted machine where code size can adversely affect speed -- think
cache and paging)

--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects

Tomás

unread,

May 9, 2006, 7:28:00 PM5/9/06

to

James Kanze posted:

> Tomás wrote:
>
> [...]
> > I however am in with the efficiency crowd; it makes programming more
> > enjoyable for me.
>
> If you are programming for your own personal pleasure, this is a
> killer argument. If you are paid by someone to program,
> however, it is unacceptable -- you're being paid to be
> productive.

Agreed :)

Just yesterday I was running profiling to see if:

typedef numeric_limits<unsigned long> numL;

for (unsigned long i = 0; i < numL.max(); ++i) { ; }

was faster/slower than:

for (unsigned long i = 0; i != numL.max(); ++i) { ; }

-Tomás

Pete Becker

unread,

May 9, 2006, 7:25:45 PM5/9/06

to

Tomás wrote:

>>In other words, a different problem might have different performance
>>issues and might therefore require a different solution. But this
>>discussion is about calling vector::size, which does not take 3.4
>>seconds to calculate how many values there are.
>
>
> Agreed, but it's the principle that I'm trying to advocate.
>
> Let's say that on a particular platform, it takes "size" 537 nanoseconds to
> process, and let's assume that overall, the entire program will take an
> extra 17 microseconds to execute.
>
> You strike me as the kind of person who would say, "17 microseconds, that's
> nothing! Don't bother fixing it".

There isn't enough information here to make that judgment.

>
> I however am in with the efficiency crowd; it makes programming more
> enjoyable for me.
>

Efficiency of what? If you miss a project deadline because you were busy
fine tuning code that already meets its speed requirements, nobody's
going to care that you were making it "more efficient."

--

Pete Becker
Roundhouse Consulting, Ltd.

Walter Bright

unread,

May 10, 2006, 8:38:44 AM5/10/06

to

Andrei Alexandrescu (See Website For Email) wrote:
> Andrew Koenig wrote:
>> for (v_iter_type it = v.begin(); it != v.end(); ++it) {
>> // do something with *it
>> }
>>
>> I prefer this form because it doesn't use v's random-access ability when it
>> doesn't need to do so.
>
> As it turns out, I had to shy away from this style and use indices in my
> code as of late. Current compilers and processor guidelines all advice
> preferring indices to pointers (especially when calling functions)
> because the compiler has a lot less work to do to figure out there's no
> aliasing. It's kinda sad - I do like the iterator style.

It's interesting that what makes for the fastest way to write a for loop
varies depending on the:

1) compiler

2) CPU

3) register pressure

etc. I'd see people go to considerable effort tuning this, only to have
the result become suboptimal because of some other factor. It's one of
the motivations to properly support a foreach construct:

foreach (e; v)
{
... do something with e ...
}

instead of:

for (int i = 0; i < v.size(); i++)

{ T e = v[i];
... do something with e ...
}

or the tedious and typo-prone:

for (v_iter_type it = v.begin(); it != v.end(); ++it)
{ v_value_type e = *it;
... do something with e ...
}

After all, we let the compiler do register allocation because such is
non-portable and nearly everyone gets it wrong anyway. The same
reasoning applies to loops. Let the compiler author, who presumably
knows what he's doing, pick the right way to implement the loop.

foreach, of course, is implemented this way in the D programming
language. After using it for a while, it's clearly a big win over the
old way of doing loops.

-Walter Bright
www.digitalmars.com C, C++, D programming language compilers

Carlos Moreno

unread,

May 10, 2006, 8:34:25 AM5/10/06

to

Maciej Sobczak wrote:
> Carlos Moreno wrote:
>
>
>>>The point is that the only reason to call some function twice is to
>>>express the fact that it might not return the same value twice.
>>>If you know that the function is going to return the same value, then
>>>calling it more than once is just sloppy. IMHO, of course.
>>
>>We seem to disagree in there -- if you compute the square root twice,
>>do you expect to obtain a different result the second time?
>
> No, you took it backwards. It should be: if I know that the square root
> of something does not change, then I don't compute it twice.

We still disagree -- I actually made a slight mistake in my phrasing
above: I meant to say "the square root of 2 twice"

My complaint aboout what you say is that you're now going beyond the
obvious semantic, and coding with your mind spending effort on what's
going on behind the scenes.

If I need the square root of two in my program, I think the least
sloppy thing is to just say that -- say *exactly* what you want;
you want the square root of 2. Granted, this is not an argument
that goes directly against your alternative: you can name your
variable square_root_of_2 for that matter, and the code is *also*
saying exactly what you want.

But both options transmitting equally the intent of the code, the
difference is only the programmer being concerned with efficiency
issues. Unless it is obvious (and I mean, immediately obvious for
anyone with no requirement more than merely having a clue) that
the option of computing it twice is excessively expensive, then
I wouldn't be excessively compelled to "cache" the result of the
function.

My point is: when I need the square root of two, I shouldn't be
concerned with it being a function, being computed right there,
inlined and optimized away by the compiler, etc. I just code it
the straightforward way (which is writing sqrt(2) twice), and
worry about it later if I need to.

Again, I'm not strongly opposing to your argument of writing it
the other way, in that the "optimization" is extremely
straightforward and anyone would understand it. What I'm
opposing to is your claim that not doing so (not "caching" the
value) is sloppy. I strongly disagree with that part.

> Most of the time that's the question of refactoring:
>
> double x = ...;
> double y = 1 + sqrt(x);
> double z = 3 * sqrt(x);
>
> The above code is just sloppy. Better:
>
> double x = ...;
> const double sqrt_x = sqrt(x);
> double y = 1 + sqrt_x;
> double z = 3 * sqrt_x;

There is one extra line of code... So, I'd argue: better?
Really?

> Which refers to the fact that it's an established idiom, well understood
> by those who write and read it. That's perfectly OK. But I'm asking
> whether this idiom itself is the optimal one.

How about this new construct:

for <const v> (int i = 0; i < v.size(); i++)

The const v modifier in a for loop guarantees and enforces
constness (including absence of aliases?) of the given
expression (the container, in this case), and thus allows
the optimizer to do one call to v.size() and cache the
result.

That would make me happy and I would adopt it. In the mean
time, the "standard" way to write it (your option #1) is
more than good enough for my taste, pickyness included.

Carlos
--

Tomás

unread,

May 10, 2006, 8:45:26 AM5/10/06

to

>> I however am in with the efficiency crowd; it makes programming more
>> enjoyable for me.
>>
>
> Efficiency of what? If you miss a project deadline because you were
> busy fine tuning code that already meets its speed requirements,
> nobody's going to care that you were making it "more efficient."

I don't get paid to program, and I don't have a boss -- programming is a
hobby of mine; therefore, I can take as much time as I like to fine-tune
my code. Not only that, but I enjoy the time spent fine-tuning the code,
so it's a win-win situation.

-Tomás

Vidar Hasfjord

unread,

May 10, 2006, 8:46:49 AM5/10/06

to

> Maciej Sobczak wrote:
> for (size_t i = 0, vSize = v.size(); i != vSize; ++i)

> {
> // do something with v[i]
> }

> But this, in turn, breaks another coding standard rule saying that
> everything that could be const must be const (above, vSize cannot be
> const, because it's declared together with i as non-const size_t).

As you say the limitation here is that the C++ for-loop allows only one
declaration (though multi-object) in its initialization part. But this
language limitation is easily overcome, at least in this instance, by
defining and using an aggregate. For example:

for (index i (0, v.size ()); i.in_range (); i.next ())
{
// do something with v [i.value ()]
}

Here the index constructor takes two constants defining the half-open
range. Members provide test, access and advance. All very clear and
concise. If the member functions used above feel too verbose an
implicit conversion to size_t and operators can be defined for
convenience, of course.

With this looping construct being so common, the index class could be
candidate for inclusion in a library and a coding standard. It can be
parameterized and improved upon, I'm sure.

Regards,
---
Vidar Hasfjord

Pete Becker

unread,

May 10, 2006, 8:46:07 AM5/10/06

to

Francis Glassborow wrote:
> In article <xOH7g.8862$j7.3...@news.indigo.ie>, Tomás <NU...@NULL.NULL>
> writes
>
>>You strike me as the kind of person who would say, "17 microseconds, that's
>>nothing! Don't bother fixing it". If you are of that disposition, you might
>>even be in the majority on a C++ newsgroup.
>
>
> Yes, 17 micro-seconds is pretty small beer compared with the time it
> would take to 'fix' it.

Unless the requirement is something like 15 microseconds response time.
There simply is not enough information here to make that decision.

--

Pete Becker
Roundhouse Consulting, Ltd.

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

Dave Harris

unread,

May 10, 2006, 8:40:40 AM5/10/06

to

no....@no.spam.com (Maciej Sobczak) wrote (abridged):

> The above code is just sloppy. Better:
>
> double x = ...;
> const double sqrt_x = sqrt(x);
> double y = 1 + sqrt_x;
> double z = 3 * sqrt_x;
>
> even if they give the same results. And that's *not* about performance.

That's debatable. The old code was more self-contained. The line:

double z = 3 * sqrt(x);

does not require us to search for the definition of another variable
before we know what it means. It was easier to refactor, eg to reorder the
code or even to move z into a different function, because it didn't have
the spurious dependency. We didn't have to worry about updating the
variable if the value of x changed. We didn't have to update the type of
the variable if the type of x changed. The old code was shorter and
simpler.

-- Dave Harris, Nottingham, UK.

Tomás

unread,

May 10, 2006, 8:45:05 AM5/10/06

to

Andrei Alexandrescu (See Website For Email) posted:

> Tomás wrote:
<snip>

>>> > unsigned long amount = GetAmountValues();

<snip>

>> We'll have to agree to disagree, because I find the code to be quite
>> nice.
>
> If only amount were const...

You're right; I didn't catch that before I posted.

I'm usually quite particular with my use of "const"; for instance when I'm
writing a function, I'll make all of its parameters non-const, e.g.:

bool SomeFunc( int i, unsigned c, char* d )
{
/* Do some stuff */

return true;
}

and then when I'm finished, I'll read back over the function body, and if
I didn't alter values, I'll stick "const" in the parameter list, e.g.:

bool SomeFunc( int const i, unsigned c, const char* d )
{
c = 7;

++d;

return true;
}

-Tomás

Carlos Moreno

unread,

May 10, 2006, 8:40:19 AM5/10/06

to

Tomás wrote:

>> > {
>> > unsigned long amount = GetAmountValues();
>>
>> > for (unsigned i = 0; i != amount; ++i )
>> > {
>> > ChangeValue( i, "New Value" );
>> > }
>> > }
>>
>>Too verbose. If the profiler says you have to, by all means.
>>But until then, it's just extra crud -- more work for the
>>author, and more work for whoever has to read his code.
>
> We'll have to agree to disagree, because I find the code to be quite nice.

I have to question your taste! :-)

For one, there's a const missing -- which is kind of shocking;
someone is willing to go so much out of their way to enforce
tight locality of the data, and you do nothing to enfore/flag
constness?

Carlos
--

Bob Hairgrove

unread,

May 10, 2006, 8:42:20 AM5/10/06

to

On 9 May 2006 08:16:54 -0400, James Kanze <kanze...@neuf.fr> wrote:

>Why i, and not it? For that matter, why use two variables, when
>one will do the trick?

I am simply assuming that the OP needs to have the index, for whatever
reason ... e.g., when you might need to keep a list and a vector in
synch?

--
Bob Hairgrove
NoSpam...@Home.com

Andrei Alexandrescu (See Website For Email)

unread,

May 10, 2006, 4:52:41 PM5/10/06

to

kanze wrote:
> Maciej Sobczak wrote:
>
>>James Kanze wrote:
>
>
>>>What is obviously being refered to is programmer efficiency.
>>>Not that of the person writing the code, since this is
>>>obviously the most efficient solution for him, but of the
>>>person reading it.
>
>
>>Exactly, this is the source of the discussion. Or, putting it
>>another way, how many precise things can we say about the code
>>by just reading it without resorting to larger-scale code
>>analysis.
>
>
> I keep referring back to what my high school English teacher
> said: "Good writing is clear and concise."

You may want to stop referring to that. The advice matches, but by
chance. Prose writing (write-once-read-many) has a different charter
than code writing (write-many-read-many). Prose should also respect many
rules that code shouldn't, and vice versa.

>>There are two things that provoked me to post my questions.
>>First is that there are really coding standards that force
>>programmers making the kind of distinctions I've tried to
>>describe.
>
>
> Nothing new there. It's not rare that there is no really good
> solution, and you have to choose the least bad.

It is intriguing that a good solution does not contradict any "law".
That is the interesting point.

>>Which is a very good argument. My experience with other
>>programmers is exactly the same - writing v.size() in the
>>second caluse of the for loop is a common and well understood
>>idiom. The ojective is, however, to build a better
>>understanding of *why* things are done the way they're done,
>>even when it comes to things like for loops (kind of Andrei's
>>"improving at all levels" idea - note also that respected
>>authors spent pages in CUJ discussing how basic stuff like for
>>loops should be written, so I'm in a good company ;-) ).
>
>
> Well, CUJ had to have something to put into its articles:-).

Apparently not well enough since they're now no more.

> Seriously: if you're actually writing code, Pete had the only
> correct answer. Stopping to even think about it costs
> productivity, and may result in delays in the delivery of the
> product. But of course, you're not actually writing code when
> you post an article in this forum, and some meta-discussion
> concerning what should be in the programming guidelines is
> always necessary, in any project.

The paragraph above is contentless. "You shouldn't chat on the phone
while driving. But now you're not driving. So you can chat on the
phone." Obviously the OP meant to devise one clear solution that then
can be applied without fuss. To me, such things are very productive. I
acquire some idiom (sometimes as simple as a naming convention) and then
I don't need to stop a few seconds every so often to rethink things on a
per-case basis. Furthermore, I don't like Damocles' sword over me: "Is
this function important? What is important? Oh, wait, I shouldn't care
about that, I'm supposed to spend some unbounded amount of time later to
profile the code so I clean up the inefficiencies that I am
systematically entering now."

Second, I believe it's good to have an inclination towards thinking of
such small issues. As I mention in other posts, generally programmers
who don't care about that are sloppy at all levels. I conjecture that
the vast majority of programmers I've ever worked with would improve a
lot if only they only found the motivation to put some order in the way
they write code. Posts like yours demotivate such people and reinforces
their decision to dismiss such things as rubbish, when in fact small
coding issues are avatars of a larger sense of order.

> But for_each says that you don't modify the size... or do
> anything else which might invalidate its iterators. That's part
> of the message of using for_each -- the loop is executed exactly
> n times, where n is determined before entering the loop. In
> fact, that's its only message: that there are no break's, no
> continue's and no other fiddling around which will change this.
>
> I think that this is generally known and recognized by
> programmers familiar with the STL. That is: the message will be
> understood immediately, and I don't need any particular
> "training" for the programmers for it to pass.

Yet the for_each loop is severely limited in what it can effect. That it
evaluates its boundaries only once is about the only good thing about
for_each.

Andrei

Maciej Sobczak

unread,

May 10, 2006, 4:53:55 PM5/10/06

to

Carlos Moreno wrote:

> Call an STL or STL-like algorithm -- when you see something like
> count_if (v.begin(), v.end(), ...), you get the best of *all* worlds;
> the call tells you that the size will not change (it strongly suggests
> it, at the very least, even if it doesn't enforce it); it tells you
> about the intent to count things (as opposed to something else, and
> without the need to resort to high-level algorithm analysis to see
> what's going on); and last, but not least, you do not evaluate
> v.end() or v.size() with each pass of the loop. Talk about best
> of all worlds.

Of course, this is good approach. At least in those cases where the
"doing" part of the loop can be provided in a way that does not qualify
for the obfuscated C++ contest. :)

> Conceptually, the for loop is something that tells you, up front
> and directly, how many times the loop will execute. If the vector's
> size can change, then we do not know how many times it will execute
> from the header of the loop, and hence it should probably be a
> while loop.

This is reasonable. In fact, it could be part of yet another coding
standard. :)

> Anyway, how many *reasonable* examples do you have of well-written
> code with a loop written like "for (int i = 0; i < v.size(); i++)"
> where the vector's size changes?

I have it in one of my projects. The body of the loop is extremely
simple, just one-liner that calls the member function of the object
pointed by pointer stored in the vector:

for (...)
{
v[i]->foo();
}

The fun is that the referenced objects are polymorphic and the function
being called is virtual. This means that the actual work that is
performed is not stated locally (in the source code sense) and actually
one of the derived types adds more "job elements" to the vector by means
of push-back. The result is that the vector *can* change its size and
it's not immediately clear from the body of the loop. I have therefore
expressed this fact by calling v.size() every time. Contrary to this,
other loops express different (more strict) intents and are coded
differently (in more strict way).

So - this is not a pure theoretical stuff, at least in this project.

In the sibling post James wrote:

> In the end, I think that the question only really becomes
> relevant when you know why the loop is there -- what its role in
> the algorithm is. And once you know that, you probably do know
> whether the vector changes size or not, independantly of where
> the call to v.size() is.

This is very important point. On the other hand, just knowing what is
the purpose of some function/object/etc. in the program does not relieve
me from the obligation to give it a name that best communicates its
purpose. Function named "f" might not be the best choice even if
everybody in the room knows what this function does - similarly with
control constructs and other parts of the program: the more they
communicate, the easier they are to interpret, by both programmers and
automated tools.
And in the context of automated tools: note that making the idiomatic
distinction between different forms of loops makes them easier for
static analysis. The fact that programmers use a lot of knowledge that
is not formalised in the code and is only "flying in the air" does not
make it any better for the static analysers.

--
Maciej Sobczak : http://www.msobczak.com/
Programming : http://www.msobczak.com/prog/

Ivan Vecerina

unread,

May 10, 2006, 4:54:54 PM5/10/06

to

"Francis Glassborow" <fra...@robinton.demon.co.uk> wrote in message

news:9h3AmuES...@robinton.demon.co.uk...

: In article <e3n6mp$36s$1...@emma.aioe.org>, James Kanze

: <kanze...@neuf.fr> writes

: > > If you're running a loop whose condition doesn't change, then

: > > I'd always advocating storing the value in a variable rather

: > > than constantly calling a function. Here's the new and

: > > improved loop:

: >

: > > {

: > > unsigned long amount = GetAmountValues();

: >

: > > for (unsigned i = 0; i != amount; ++i )

: > > {

: > > ChangeValue( i, "New Value" );

: > > }

: >

: >To verbose. If the profiler says you have to, by all means.

: >But until then, it's just extra crud -- more work for the

: >author, and more work for whoever has to read his code.

:

: And if the programmer is convinced that they want to go that way they

: should declare amount to be const. However, often the compiler can work

: out that it can hoist the evaluation of GetAmountValues() outside the

: loop.

I'm afraid that this compiler optimization is actually a pipe dream.

In practice this optimization won't be possible if:

- GetAmountValues() ( or contnr.size() ) is not an inlined function,

because it will then need to be called in case it has side-effects.

- More dramatically, if the loop calls *any* non-inlined function,

the compiler has to assume that this non-inlined function might

change the result of GetAmountValues() ( or contnr.size() ).

And that, even if the container is locally seen as const.

[ there can be exceptions to this for local vars, but they are rare ].

So maybe this will happen in the age of whole program optimizers and

compiled code refactorers, but we're not very far down that road...

Regards,

Ivan

--

http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form

Ivan Vecerina

unread,

May 10, 2006, 4:57:29 PM5/10/06

to

"James Dennett" <jden...@cox.net> wrote in message

news:HSU7g.46058$iU2.3599@fed1read01...

: Tomás wrote:

[...]

: > I however am in with the efficiency crowd; it makes programming more

: > enjoyable for me.

:

: Efficiency comes in many forms; I think Pete Becker is

: acting here as an advocate for another efficiency crowd,

: one which supports the least expenditure (of time, of

: money, etc.) to build a program which meets its

: requirements.

:

: I suspect Pete's also spent more time shaving cycles

: than many people, as some of his code is rather widely

: deployed and one effective requirement for it would

: often be that customers not whine too much about

: performance problems.

[...]

This took me to somewhat of a digression:

Last month I was tuning an image processing step in a vision-guided

manufacturing platform I am working on. That step was taking up to

12 seconds. Happens in a background thread, but still might delay

the human operator(s) who supervise the process. They care about

time in manufacturing. So this code was worth taking a look into

with a profiler...

As it turns out, almost half of that time was spent in a call to...

std::vector<pixel*>::push_back()

Damn, I thought I had called reserve(), and I was swap-ing and

reusing my vector instances that store "rings" of processed pixels.

Turns out that, in this library implementation, vector::push_back()

was in the best case performing a dozen of function calls (not

including debug-mode iterator checking, or situations where

a reallocation would be needed).

How I wished that the most common path of vector::push_back had

been optimized... I believe that something like the following

would have been safe and standards-conformant:

void push_back(const _T& _Val)

{

if( _Mylast < _Myend ) { // 'end()' has not reached end of alloc

new (_Mylast) _T(_Val); // construct the new element in place

++ _Mylast;

return;

}

// ...else handle the general case with reallocation...

}

Well, no luck for me this time. This library implementation is

written very cleanly by calling functions like size(), capacity(),

uninitialized_fill(), etc. Very consistent coding conventions,

good for quality and maintenance even by non-experts... but more

expensive at runtime -- noticeably so, for me at least.

Of course, this is a specific case. And things wouldn't be as bad

in an optimized build with inlining -- I happen to release my app

compiled in debug mode for safety reasons, and only few of its

modules are performance-critical.

I can't blame the writer(s) of this library. It is a high quality

implementation, the most popular one on the most "popular" OS.

And I find it a much more serious issue when the library does not

behave as it should and has real bugs (which happens too btw).

Yet it is annoying that I can't just "fix" the library code

(well, I could, but you don't want to mess with the release

version of a commercial product).

I ended up writing a lightweight re-implementation of std::vector.

It reduced the execution time of this processing step by >70%.

Several seconds that will matter to human operators thousands

and thousands of times ( I hope they'll appreciate my few

extra hours of effort ;) ).

I understand the constraints of commercial library writers,

and why they are (very reasonably) more likely to be in

a certain kind of "performance" crowd.

The scenario I described might be quite specific.

And correctness matters more than execution speed.

But when you write library code that will be used by thousands,

someone will end up wishing that you had put more care into

squeezing out performance out of some key functions.

Especially the most common path of a key function

like vector::push_back() ...

Sometimes the line between being productively passionate

and being unprofessional is SO thin... But I'm happy that

I sometimes still find myself having to watch my step.

Cheers,

Ivan

--

http://ivan.vecerina.com/contact/?subject=NG_POST <- email contact form

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

Carl Barron

unread,

May 10, 2006, 5:25:27 PM5/10/06

to

In article <1147221534.4...@u72g2000cwu.googlegroups.com>,
Vidar Hasfjord <vattila...@yahoo.co.uk> wrote:

> As you say the limitation here is that the C++ for-loop allows only one
> declaration (though multi-object) in its initialization part. But this
> language limitation is easily overcome, at least in this instance, by
> defining and using an aggregate. For example:

if the variables are of the same type then something like
`for(int i(0),max(10);i!=max;++i) { /* */}
is legal and intializes two ints in the for loop without adding
complexity of member function calls.

Pete Becker

unread,

May 10, 2006, 5:21:41 PM5/10/06

to

Tomás wrote:
>
> I don't get paid to program, and I don't have a boss -- programming is a
> hobby of mine; therefore, I can take as much time as I like to fine-tune
> my code. Not only that, but I enjoy the time spent fine-tuning the code,
> so it's a win-win situation.
>

Nothing wrong with that. But in this sort of discussion, everyone has to
be aware that various people have different goals, and those goals
determine, in part, what the best approach is. There is no universal
best way of doing something.

--

Pete Becker
Roundhouse Consulting, Ltd.

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

Andrei Alexandrescu (See Website For Email)

unread,

May 10, 2006, 5:27:03 PM5/10/06

to

Walter Bright wrote:
> It's one of
> the motivations to properly support a foreach construct:
>
> foreach (e; v)
> {
> ... do something with e ...
> }

[...]

> foreach, of course, is implemented this way in the D programming
> language. After using it for a while, it's clearly a big win over the
> old way of doing loops.

Yah, on the other hand D doesn't have const, which makes it impossible
to express any concept of a loop invariant properly.

Testis unus, testis nullus :oD.

Andrei

Vidar Hasfjord

unread,

May 11, 2006, 1:48:55 PM5/11/06

to

Carl Barron wrote:
> if the variables are of the same type then something like
> `for(int i(0),max(10);i!=max;++i) { /* */}
> is legal and intializes two ints in the for loop without adding
> complexity of member function calls.

The original poster included and expressed displeasure about this very
option. He wants a construct that clearly expresses that the end value
(max) is a constant. I was responding to that, and for that the only
solution is an aggregate, unless you accept declarations outside the
initializer of the for-loop.

Personally, I don't use "index". I use any and all of the idioms
presented by the original poster without dwelling to much on it.
Instead I look forward to C++0x with "foreach" --- as well as concepts
(which will allow containers as arguments to algorithms, i.e. no more
of the "begin" and "end" clutter when you want to process the whole
container) and lamdas (for use with for_each and other concise
iterative constructs). I imagine the ideal loop for iterating the
indexes of a container would be:

foreach (size_t i : index_range (v)) //...

Regards,
---
Vidar Hasfjord

Walter Bright

unread,

May 11, 2006, 1:50:04 PM5/11/06

to

Andrei Alexandrescu (See Website For Email) wrote:
> Second, I believe it's good to have an inclination towards thinking of
> such small issues. As I mention in other posts, generally programmers
> who don't care about that are sloppy at all levels. I conjecture that
> the vast majority of programmers I've ever worked with would improve a
> lot if only they only found the motivation to put some order in the
> way
> they write code. Posts like yours demotivate such people and
> reinforces
> their decision to dismiss such things as rubbish, when in fact small
> coding issues are avatars of a larger sense of order.

That agrees with my experience. I can't recall ever seeing any code
where the design was sloppy but the low level coding was great, or the
low level coding was sloppy and the design great.

It's also true that disorganized, sloppy prose goes along with
disorganized, sloppy design and disorganized, sloppy low level coding
practices.

I'll reserve judgment, however, on whether or not a sloppy desk and
sloppy office has any correlation with coding quality <g>.

-Walter Bright
www.digitalmars.com C, C++, D programming language compilers

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

Walter Bright

unread,

May 11, 2006, 2:02:28 PM5/11/06

to

Andrei Alexandrescu (See Website For Email) wrote:

> Walter Bright wrote:
>> It's one of
>> the motivations to properly support a foreach construct:
>>
>> foreach (e; v)
>> {
>> ... do something with e ...
>> }
> [...]
>> foreach, of course, is implemented this way in the D programming
>> language. After using it for a while, it's clearly a big win over the
>> old way of doing loops.
>
> Yah, on the other hand D doesn't have const, which makes it impossible
> to express any concept of a loop invariant properly.

Const (as implemented in C++) is quite useless for expressing loop
invariants (I know, I've implemented it, and had to back it out). The
trouble is that pointer aliasing, multiple threads, const casting, and
mutable all conspire to undermine any useful loop invariant conclusions
based on 'const'.

The way loop invariants are done in a decent C++ optimizer is to ignore
const, and instead collect information about where expression values are
changed, where they may be changed, and where they are guaranteed not to
change based on the most lax interpretation of the language rules
possible.

For example, given the C++ loop:
const size_t *pend = ...;
for (size_t i = 0; i < *pend; i++)
...
can the optimizer cache *pend as a loop invariant? Nope - even if it's
const. You'll have to cache manually:
const size_t end = *pend;
for (size_t i = 0; i < end; i++)
...
Does the 'const' here help? Nope. Conventional data flow analysis will
tell us if 'end' can change within the loop or not, so const adds
nothing.

The pointer aliasing problem is very significant, as it's the barrier to
doing Fortran-style loop optimizations. Const clearly adds nothing to
this, otherwise there never would have been a 'noalias' and a 'restrict'
type modifier.

The D foreach semantics require that v (the aggregate) be loop
invariant, although the contents of the aggregate can change. This means
that, for example, an array can't change size or location during the
foreach loop, a hash map can't be rehashed, etc. That enables the
compiler to automatically cache information about the aggregate and
treat it as loop invariant.

To say something is 'loop invariant' is to say its value does not change
during the loop, no exceptions. C++ const offers nothing resembling that
guarantee, and so is not useful in determining loop invariants.

-Walter Bright
www.digitalmars.com C, C++, D programming language compilers

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

Francis Glassborow

unread,

May 11, 2006, 7:16:50 PM5/11/06

to

In article <137ec$44618bc7$55da16d7$50...@news.hispeed.ch>, Ivan Vecerina
<INVALID_u...@ivan.vecerina.com> writes

>So maybe this will happen in the age of whole program optimizers and
>
>compiled code refactorers, but we're not very far down that road...
>
>

I think we are further than you think. Visual Studio tries to delay code
generation till link time. At that stage there is often enough
information to do hoisting out of loops.

--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects

Maciej Sobczak

unread,

May 11, 2006, 7:10:30 PM5/11/06

to

Carlos Moreno wrote:

>>No, you took it backwards. It should be: if I know that the square root
>>of something does not change, then I don't compute it twice.
>
> We still disagree -- I actually made a slight mistake in my phrasing
> above: I meant to say "the square root of 2 twice"
>
> My complaint aboout what you say is that you're now going beyond the
> obvious semantic, and coding with your mind spending effort on what's
> going on behind the scenes.
>
> If I need the square root of two in my program, I think the least
> sloppy thing is to just say that -- say *exactly* what you want;

> you want the square root of 2. [...]

Sure. The problem is that programming differs from school math in *how*
things are made up. In math, you can write arbitrary expression and
treat it as kind of "literal" that just has the value you want.
log(sqrt(pi) + sqrt(2 * e))? Fine, it is as good on paper as any other
"literal", just like 1 or 1/2. In programming, though (unless you use
some substitution-based system like Mathematica), what you write is a
"recipe" for something and this fact stresses the "doing" factor, not
present in math on paper. In the case of square root of 2, irrational
numbers are not part of C++ and therefore there is no possibility to
write them as literals. You have to *make* them.

I agree that expressing the domain concepts in their most natural way is
a good idea. Just as changing programming languages, if the one you're
using currently does not fit the given domain perfectly. In the case of
square root of 2 in C++, it's the flow chart and stepping in the
debugger that reveal what's really going on.
So - I'm not convinced, at least not in C++. :)

> My point is: when I need the square root of two, I shouldn't be
> concerned with it being a function, being computed right there,
> inlined and optimized away by the compiler, etc.

It's a valid argument and I understand it. But I try not to forget about
the "medium" which is used for communicating ideas in code.

>>double x = ...;
>>double y = 1 + sqrt(x);
>>double z = 3 * sqrt(x);
>>
>>The above code is just sloppy. Better:
>>
>>double x = ...;
>>const double sqrt_x = sqrt(x);
>>double y = 1 + sqrt_x;
>>double z = 3 * sqrt_x;
>
> There is one extra line of code...

There is one less function call. :)

> How about this new construct:
>
> for <const v> (int i = 0; i < v.size(); i++)

Easy.

v[i] = 7; // bang! v is supposed to be const

> The const v modifier in a for loop guarantees and enforces
> constness

And it's not what is needed. It's the *size* (or, more generally, the
loop's limiting condition) that is supposed to be const, not the whole
vector.
You would need to express it this way:

for <const v.size()> (int i = 0; i != v.size(); ++i)

and I ask you: in what way is the above better than my

const size_t vSize = v.size();
for (int i = 0; i != vSize; ++i)

? :)

--
Maciej Sobczak : http://www.msobczak.com/
Programming : http://www.msobczak.com/prog/

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

kanze

unread,

May 11, 2006, 7:13:10 PM5/11/06

to

Andrei Alexandrescu (See Website For Email) wrote:

> kanze wrote:
> > Maciej Sobczak wrote:

> >>James Kanze wrote:

> >>>What is obviously being refered to is programmer efficiency.
> >>>Not that of the person writing the code, since this is
> >>>obviously the most efficient solution for him, but of the
> >>>person reading it.

> >>Exactly, this is the source of the discussion. Or, putting it
> >>another way, how many precise things can we say about the code
> >>by just reading it without resorting to larger-scale code
> >>analysis.

> > I keep referring back to what my high school English teacher
> > said: "Good writing is clear and concise."

> You may want to stop referring to that. The advice matches,
> but by chance. Prose writing (write-once-read-many) has a
> different charter than code writing (write-many-read-many).
> Prose should also respect many rules that code shouldn't, and
> vice versa.

I'm not sure. The analogy obviously isn't 100% -- there are
differences. On the other hand, she said "good writing", not
"good prose", and the differences are more along the lines of
the differences between prose fiction, poetry, and a technical
treatise. A C++ program obviously has different constraints
than a novel, but so does a poem. Are the differences greater?
I'm not sure about it.

I'd also disagree with the write-many-read-many categorization,
but I think that may be just a simplification on your part.
Write-few-read-many would be closer to correct, but even then...
there is a difference between being the original author, and
making a modification. And a lot of prose gets reworked by
people other than the original author as well. (The program
documentation, for starters:-). Still, I agree with what I
think was your import: that we should write code with regards to
making it easy to rework or to modify, and that such issues are
rarely present when writing normal English text. Note however
that in order to modify code, we must first understand it --
which brings us back to "good writing". This aspect of code may
introduce additional constraints, but it doesn't remove any that
are present in writing.

[...]

> > But for_each says that you don't modify the size... or do
> > anything else which might invalidate its iterators. That's
> > part of the message of using for_each -- the loop is
> > executed exactly n times, where n is determined before
> > entering the loop. In fact, that's its only message: that
> > there are no break's, no continue's and no other fiddling
> > around which will change this.

> > I think that this is generally known and recognized by
> > programmers familiar with the STL. That is: the message
> > will be understood immediately, and I don't need any
> > particular "training" for the programmers for it to pass.

> Yet the for_each loop is severely limited in what it can
> effect. That it evaluates its boundaries only once is about
> the only good thing about for_each.

It's limitations are in fact the good thing. When you see
for_each, you know that what is being done conforms to these
limitations -- for_each is not find_if, for example. It's value
is not in the function itself -- it is in its opposition with
other functions (and explicit loops).

If, of course, your programming style always visits all of the
elements in a sequence, for_each doesn't tell you anything more
than a simple for loop would. But I don't think that this is
your case:-).

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

kanze

unread,

May 11, 2006, 7:14:15 PM5/11/06

to

Carlos Moreno wrote:
> Maciej Sobczak wrote:

[...]
> > Anyway - if I have in my program two loops, one where the
> > limiting condition changes and one where it doesn't, I
> > consider that they express different designs and therefore I
> > want them to *look* differently.

> I would go as far as to suggest that if that is the case, then
> a for loop is *not* the right tool for it.

> Conceptually, the for loop is something that tells you, up
> front and directly, how many times the loop will execute. If
> the vector's size can change, then we do not know how many
> times it will execute from the header of the loop, and hence
> it should probably be a while loop.

That probably depends on the language. What you say certainly
holds for Fortran and Pascal. My impression is that it is not
part of the usual conventions for C and C++. Think of the
number of times the for loop is terminated by a sentinal value,
for example:
for ( Node* p = start ; p != NULL ; p = p->next ) ...
for ( char const* s = str ; *s != '\0' ; ++ s ) ...
I would say that both of the above are very idiomatic C, and
also C++, although we use them less often in C++ because we also
have other tools. I would also consider something like:
for ( int i = 0 ; i < sizeV && someCondition( v[i] ) ; ++ i) {
}
as the "standard" implementation of a linear search in C.

Note that in the first two cases above, the end doesn't "move"
(can't think of a good word for what I want to say), even if it
isn't evaluated at the start of the loop. Still, I actually
have something along the lines of:

for ( Transition state = 0 ; state < myTable.size() ; ++ state ) {
DFAState& stateDesc = *myTable[ state ] ;
for ( unsigned c = 0 ; c < charCount ; ++ c ) {
stateDesc.transitionTable[ c ] = nextState( stateDesc, c )
;
}
}

in my regular expression parser -- and newState does just that:
it creates a new DFA state, and inserts it into myTable (which
causes myTable.size() to return a different value). I don't see
how using while instead of for would make this any clearer.

> Anyway, how many *reasonable* examples do you have of

> well-written code with a loop written like "for (int i = 0; i

> < v.size(); i++)" where the vector's size changes?

I find the above example reasonable:-). That's one.

Carlos Moreno

unread,

May 11, 2006, 7:26:00 PM5/11/06

to

Pete Becker wrote:
> Tomás wrote:
>
>>I don't get paid to program, and I don't have a boss -- programming is a
>>hobby of mine; therefore, I can take as much time as I like to fine-tune
>>my code. Not only that, but I enjoy the time spent fine-tuning the code,
>>so it's a win-win situation.
>
> Nothing wrong with that. But in this sort of discussion, everyone has to
> be aware that various people have different goals, and those goals
> determine, in part, what the best approach is. There is no universal
> best way of doing something.

Just keeping in mind that in virtually 100% of the cases, when working
with code for a real application, that type of argument in favor of
efficiency ends up being wrong and making the development process
inefficient *for no good reason*.

Not to mention the fact that for anyone but the true gurus that know
the internals of many compilers and many CPUs on many operating
systems, the "obvious" optimization at the coding level is not really
an optimization, and often even ends up being an "unoptimization".
(not sure if that's the right word). In other words, maybe even
doing it for fun, it's not really a win-win ... It could be a
win in the part of fun, and a draw in the other part, since the
overall efficiency of the application was not affected in the
least. Or a win-lose, if the presumed optimization ended up being
an unoptimization. Of course, maybe part of the fun was about
finding out that the effect was the opposite than one expected.

Carlos
--

vande...@gmail.com

unread,

May 11, 2006, 7:44:42 PM5/11/06

to

Walter Bright wrote:
[...]

> Const (as implemented in C++) is quite useless for expressing loop
> invariants (I know, I've implemented it, and had to back it out). The
> trouble is that pointer aliasing, multiple threads, const casting, and
> mutable all conspire to undermine any useful loop invariant conclusions
> based on 'const'.
>
> The way loop invariants are done in a decent C++ optimizer is to ignore
> const, and instead collect information about where expression values are
> changed, where they may be changed, and where they are guaranteed not to
> change based on the most lax interpretation of the language rules
> possible.
>
> For example, given the C++ loop:
> const size_t *pend = ...;
> for (size_t i = 0; i < *pend; i++)
> ...
> can the optimizer cache *pend as a loop invariant? Nope - even if it's
> const. You'll have to cache manually:
> const size_t end = *pend;
> for (size_t i = 0; i < end; i++)
> ...
> Does the 'const' here help? Nope. Conventional data flow analysis will
> tell us if 'end' can change within the loop or not, so const adds
> nothing.

Note quite. If you happened to pass "end" by reference (to const) in
the
body of that last loop, your const-ignoring optimizer would have to go
to
a lot more trouble to determine that "end" hasn't changed (if indeed it

were able to do so at all). An optimizer that takes the const
annotation
into consideration would trivially know that it can assume the
invariance of
"end".

I do agree that "const" is not very strong in C++, and as a type
qualifier it's
usually more trouble than it's worth IMO (in part because it's not
exactly
what you want to overload for; rvalue references will mitigate that
problem).
However, as a "read-only storage" annotation it could be useful.

Daveed

Victor Bazarov

unread,

May 11, 2006, 7:43:08 PM5/11/06

to

Vidar Hasfjord wrote:
> Carl Barron wrote:
>> if the variables are of the same type then something like
>> `for(int i(0),max(10);i!=max;++i) { /* */}
>> is legal and intializes two ints in the for loop without adding
>> complexity of member function calls.
>
> The original poster included and expressed displeasure about this very
> option. He wants a construct that clearly expresses that the end value
> (max) is a constant. I was responding to that, and for that the only
> solution is an aggregate, unless you accept declarations outside the
> initializer of the for-loop.
>
> Personally, I don't use "index". I use any and all of the idioms
> presented by the original poster without dwelling to much on it.
> Instead I look forward to C++0x with "foreach" --- as well as concepts
> (which will allow containers as arguments to algorithms, i.e. no more
> of the "begin" and "end" clutter when you want to process the whole
> container) and lamdas (for use with for_each and other concise
> iterative constructs). I imagine the ideal loop for iterating the
> indexes of a container would be:
>
> foreach (size_t i : index_range (v)) //...

template<class T> class iterate {
T i;
const T max;
public:
iterate(T i, T max) : i(i), max(i) {}
bool not_yet() const { return i < max; }
void operator++() { ++i; }
T var() const { return i; }
T invar() const { return max; }
// or even
operator T() const { return i; }
};
...
for (iterate<int> v(0,10); v.not_yet(); ++v) {
... v.var() ...
// or even
... [v] ...
}

:-)))

[Disclaimer: code untested]

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask

Andrei Alexandrescu (See Website For Email)

unread,

May 11, 2006, 7:54:21 PM5/11/06

to

Walter Bright wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>
>>Walter Bright wrote:
>>
>>>It's one of
>>>the motivations to properly support a foreach construct:
>>>
>>> foreach (e; v)
>>> {
>>> ... do something with e ...
>>> }
>>
>>[...]
>>
>>>foreach, of course, is implemented this way in the D programming
>>>language. After using it for a while, it's clearly a big win over the
>>>old way of doing loops.
>>
>>Yah, on the other hand D doesn't have const, which makes it impossible
>>to express any concept of a loop invariant properly.
>
>
> Const (as implemented in C++) is quite useless for expressing loop
> invariants (I know, I've implemented it, and had to back it out).

[snip]

That's known fact. But I wasn't comparing D with C++, but instead with
what it aims to be, a _better_ C++. So it naturally would embed a better
const :o).

Andrei

Walter Bright

unread,

May 11, 2006, 11:18:32 PM5/11/06

to

Andrei Alexandrescu (See Website For Email) wrote:
> Walter Bright wrote:
>> Const (as implemented in C++) is quite useless for expressing loop
>> invariants (I know, I've implemented it, and had to back it out).

> That's known fact. But I wasn't comparing D with C++, but instead with
> what it aims to be, a _better_ C++.

But isn't that, by definition, comparing it? <g>

> So it naturally would embed a better const :o).

My post about foreach was in the context of loop idioms, loop
invariants, and loop techniques, and I don't believe that C++ const
plays a useful role in such (as explained), so C++ isn't better in that
regard. In fact, it's arguable that const as a type modifier may be
worse than no const at all, since nearly all I talk to about it are
surprised to find out that const offers no guarantee of constancy. It
leads people into a false sense of how their programs work.

That said, D will probably get some notion of const. But it will be one
that works, i.e. can be relied upon by both the programmer and the
optimizer.

-Walter Bright
www.digitalmars.com C, C++, D programming language compilers

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

Walter Bright

unread,

May 11, 2006, 11:24:48 PM5/11/06

to

Consider:

int x;
const int& end = x;

Now, inside the loop, I change x. const reference end's value has now
changed, even though it's supposedly "const". An optimizer that assumed
end didn't change is going to break.

You could say "don't write loops like that", and I agree, but I can't
implement an optimizer like that. People can legally write such code
(and they do!), and the optimizer had better not break it.

> I do agree that "const" is not very strong in C++, and as a type
> qualifier it's
> usually more trouble than it's worth IMO (in part because it's not
> exactly
> what you want to overload for; rvalue references will mitigate that
> problem).
> However, as a "read-only storage" annotation it could be useful.

I should add that C++ actually has two different meanings for const -
one as a type modifier, the other as a storage class. The storage class
one is actually constant, and can be put in read-only storage. The type
modifier meaning, though, cannot be, as it isn't constant.

D does have const, but as a storage class, not as a type modifier.

-Walter Bright
www.digitalmars.com C, C++, D programming language compilers

P.S. I've never understood why one would want to overload const and
non-const functions with otherwise the same argument types. (Setting
aside template type traits tricks for the moment.)

Vidar Hasfjord

unread,

May 12, 2006, 7:58:33 AM5/12/06

to

Thank for the implementation, but I'm not too keen on the naming.

Now use "size_t" as the default template type, and add a constructor
that accepts a container:

template <class C>
explicit index (const C& c) : i (0), max (c.size ()) {}

And we can shorten the OP's original loop header to:

for (index i (v); i.good (); ++i)

Now imagine the reverse loop:

for (reverse_index i (v); i.good (); ++i)

and we've eliminated common off-by-one errors.

OP wrote:
> there's no way to do it *right*.

Still? (:-))

vande...@gmail.com

unread,

May 12, 2006, 8:20:23 PM5/12/06

to

Walter Bright wrote:
> vande...@gmail.com wrote:
> > Walter Bright wrote:
> >> Nope - even if it's
> >> const. You'll have to cache manually:
> >> const size_t end = *pend;
> >> for (size_t i = 0; i < end; i++)
> >> ...
> >> Does the 'const' here help? Nope. Conventional data flow analysis
> >> will
> >> tell us if 'end' can change within the loop or not, so const adds
> >> nothing.
> >
> > Note quite. If you happened to pass "end" by reference (to const)
> > in the body of that last loop, your const-ignoring optimizer would
> > have to go to a lot more trouble to determine that "end" hasn't
> > changed (if indeed it were able to do so at all). An optimizer that
> > takes the const annotation into consideration would trivially know
> > that it can assume the invariance of "end".
>
> Consider:
>
> int x;
> const int& end = x;
>
> Now, inside the loop, I change x. const reference end's value has now
> changed, even though it's supposedly "const". An optimizer that assumed
> end didn't change is going to break.

I'm not sure why you bring that example up: It's unrelated to the
one you mentioned earlier and which I was commenting on.
In this latter example, the const is not top level, and hence it is
indeed not useful to an optimizer.

> You could say "don't write loops like that", and I agree, but I can't
> implement an optimizer like that. People can legally write such code
> (and they do!), and the optimizer had better not break it.
>
> > I do agree that "const" is not very strong in C++, and as a type
> > qualifier it's
> > usually more trouble than it's worth IMO (in part because it's not
> > exactly
> > what you want to overload for; rvalue references will mitigate that
> > problem).
> > However, as a "read-only storage" annotation it could be useful.
>
> I should add that C++ actually has two different meanings for const -
> one as a type modifier, the other as a storage class.

It's technically only one meaning (with two grammatical locations,
although both location can be useful for optimization purposes):
A type qualifier. From an optimization perspective, it's only useful
when appearing at the top level of a variable type. (That's probably
what you mean by the "storage class meaning", but the language
is consistent in treating is as a type qualifier.)

> The storage class
> one is actually constant, and can be put in read-only storage. The type
> modifier meaning, though, cannot be, as it isn't constant.
>
> D does have const, but as a storage class, not as a type modifier.

>From an optimization perspective that makes sense.

> P.S. I've never understood why one would want to overload const and
> non-const functions with otherwise the same argument types. (Setting
> aside template type traits tricks for the moment.)

I understood it back when I made the same mistake ;-)

Daveed

Andrei Alexandrescu (See Website For Email)

unread,

May 13, 2006, 7:15:19 AM5/13/06

to

{Moderator, Please watch the heat. Were I not certain that neither party
would take offense I might have rejected this for a little too much.
-mod/fwg}

kanze wrote:
> Still, I agree with what I
> think was your import: that we should write code with regards to
> making it easy to rework or to modify, and that such issues are
> rarely present when writing normal English text. Note however
> that in order to modify code, we must first understand it --
> which brings us back to "good writing". This aspect of code may
> introduce additional constraints, but it doesn't remove any that
> are present in writing.

Ehm. Yah, that was my "import". The discussion could become quite long,
and you'll beat me to it as usual because I lack the resources to reply
to your posts :o). But IMHO code writing must fulfill two important
requirements that literature doesn't have to: (1) precision; (2) ease of
modification. One kind of literature also striving for precision is
legal text, and to the joy of lawyers, there's never a way to write
something precise enough in human language so as to leave no room for
interpretation. I don't know of literature that must satisfy (2).

I don't think (1) and (2) above are "additional constraints", they are
organic constraints that interplay with the others. For example, "good
writing" becomes writing that, among other things, makes intent clear in
order to facilitate (1) and is modular and easy to understand to
facilitate (2).

>>Yet the for_each loop is severely limited in what it can
>>effect. That it evaluates its boundaries only once is about
>>the only good thing about for_each.
>
> It's limitations are in fact the good thing. When you see
> for_each, you know that what is being done conforms to these
> limitations -- for_each is not find_if, for example. It's value
> is not in the function itself -- it is in its opposition with
> other functions (and explicit loops).

This argument is misplaced. Yes, there's a lot to say about using the
right tool for the job, but for_each has a big problem: it's not
scalable. Adding one more operation to a for_each explodes the cost into
writing an entire new type. In contrast, for loops are one <Enter> away
from accepting more code, which is what often happens. for_each is just
the wrong way to cut the diamond.

As an example of what's scalable and what's not, a coding guideline I
personally adhere to is to brace my code "profilactically".

for (this_; that; the_other) {
one_expression;
}

instead of:

for (this_; that; the_other)
one_expression;

This is to impart liquidity to code: I add and subtract expressions
later without messing with the braces. Some people don't like the
gratuitous braces, and I understand them, but I personally prefer it
this way because it frees my mind from the permanent effort of
distinguishing loops with one expression from loops with several
expressions. I believe that that distinction is irrelevant. To wit, code
changes a lot back and forth between the two cases. I enjoy not having
additional friction (adding and removing braces) when changing code.

Now take a for_each loop. for_each is the biggest emperor's clothes
event since the tulip craze in Holland. So somebody wants to use
for_each and first tries to get lucky by reusing some little functor in
the standard lib.

for_each(v.begin(), v.end(), stl_functor(args));

If now we need to do stl_functor and something more, there are a few
options. First is to obstinate in using STL amenities and delve into the
unreadability of binders and other good things.

Second is write a your own functor - a hecatomb of code that has to go
outside the scope it's used in, with constructors, destructor, bugs, the
works. The usual soothing comment is that maybe that loop body raised to
the status of a class "could be reused elsewhere" - something that,
let's face it, never ends up happening. Abstractions get reused, not
random loop bodies.

Third is to accept that the emperor is butt-naked and switch to a
regular for loop (or Boost's FOREACH which is so useful, it would be
nice to also have in separation from Boost).

Fourth is to delve into a lambda library, which looks pretty good when
you come from (1) and (2), but not so attractive when you come from (3),
and which further slowly forces on you an entire new language as you try
to scale things.

In any case, my point is that for_each won't scale. You want to do
something little, you got to change things big time.

for_each is a tool good for so few jobs, one could just as well avoid it
entirely instead of sitting down and thinking of the minority of cases
in which it's justifiable - just like goto.

And sorry James for picking on you, but you are just lifting the ball
for a dunk there. for_each could be just as well the epitome of code
that is not "clear and concise" - and gratuitously! Yet I distinctly
remember a post that rewrote a one-liner in oh so many ugly lines just
to accommodate for_each. Guess who wrote it? :o)

Ok, I found it:

http://groups.google.com/group/comp.lang.c++.moderated/msg/609bdccf9665c2d9

I also add the response I gave back then, just because I feel my
lavaliere joke went unnoticed:

http://groups.google.com/group/comp.lang.c++.moderated/msg/e8e8242718116417

You gotta admit I'm at least consistent :o). But your viewpoint seems
quite dissonant to me. How can you internally reconcile the "clear and
terse" ideal with the running faucet in your post that I point to? (Just
like with the "C++ is safer than Java" trial Kanze vs. Alexandrescu, I
am getting ready for a 1800-lines post that I'll take the weekend off to
read and won't be able to answer...)

Andrei

Carlos Moreno

unread,

May 13, 2006, 7:18:05 AM5/13/06

to

Maciej Sobczak wrote:

>>>double x = ...;
>>>double y = 1 + sqrt(x);
>>>double z = 3 * sqrt(x);
>>>
>>>The above code is just sloppy. Better:
>>>
>>>double x = ...;
>>>const double sqrt_x = sqrt(x);
>>>double y = 1 + sqrt_x;
>>>double z = 3 * sqrt_x;
>>
>>There is one extra line of code...
>
> There is one less function call. :)

Well... But... there is one extra line of code!! :-)

>>How about this new construct:
>>
>>for <const v> (int i = 0; i < v.size(); i++)
>
> Easy.
>
> v[i] = 7; // bang! v is supposed to be const

Yup. Thought about it a minute after posting! (sorry!)

> You would need to express it this way:
>
> for <const v.size()> (int i = 0; i != v.size(); ++i)

Indeed -- that was going to be my argument; in fact, wouldn't
it be nice (yes, I know it's practically impossible) that we
could specify arbitrarily, at any scope of the code that a
certain expression is to be const? (such that the compiler
*enforces* that nothing in the code is allowed if it has the
effect or side-effect of modifying the expression)

The above would be but one of the many uses of such feature.

> and I ask you: in what way is the above better than my
>
> const size_t vSize = v.size();
> for (int i = 0; i != vSize; ++i)
>
> ? :)

Oh come one, Maciej!!! :-)

*That* one is easy! <const v.size()> *enforces* (that is,
causes the compiler to *enforce*) the fact that v.size()
is not changing -- your version above says nothing about
it, and it fact fails if the vector's size accidentally
changes (yes, sure, you might catch it right away during
your systematic debugging tests; but the other version
wouldn't even make it to the debugging tests, as the
compiler would flag the problem).

Plus... (yes! you guessed!!) your version has one extra
line of code!! :-)

Carlos
--

Walter Bright

unread,

May 13, 2006, 7:20:05 AM5/13/06

to

Walter Bright wrote:
> Consider:
>
> int x;
> const int& end = x;

I want to expand on that a bit. Given:

int x = 10;

const int& end = x;

for (int i = 0; i < end; i++)
{
x = 7;
printf("i = %d\n", i);
}

Note that this is legal, standards conforming code. It doesn't use
pointers, casts, unions, mutables, and it's single threaded. There's no
attempt to subvert the type system. It's even const-correct.

But 'end', despite being 'const', has its value change during the loop.
It isn't loop invariant.

Furthermore, this is a trivial loop. Suppose it was written in the more
modern style, with iterators, FOREACH macros (of which there are many
implementations), other macros, typedefs, and templates, dependent on an
arbitrary number of #include's. Remember, the language cannot help you,
as it's const-correct. What are the chances of spotting such a problem
with a visual code review?

This *does* happen in real code. It is not at all unusual for a loop
body to attempt to change the aggregate it is looping over (such as by
adding or removing elements).

So what has const as type modifier, as well as const-correctness,
brought to the table to justify all the effort and hype spent on it?
There's got to be a better way.

-Walter Bright
www.digitalmars.com C, C++, D programming language compilers

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

Walter Bright

unread,

May 13, 2006, 7:24:55 AM5/13/06

to

vande...@gmail.com wrote:
> I'm not sure why you bring that example up: It's unrelated to the
> one you mentioned earlier and which I was commenting on.
> In this latter example, the const is not top level, and hence it is
> indeed not useful to an optimizer.

I'm sorry, I misunderstood you. I infer you're talking about:

void foo(const int& end) { ... }
...
for (int i = 0; i < end; i++)
{
foo(end);
}

The Standard 5.2.2-5:
"[Note: a function can change the values of its nonconst
parameters, but these changes cannot affect the values
of the arguments except where a parameter is of a reference type
(8.3.2); if the reference is to a constqualified
type, const_cast is required to be used to cast away the constness in
order to modify the
argument’s value. Where a parameter is of const reference type a
temporary object is introduced if
needed (7.1.5, 2.13, 2.13.4, 8.3.4, 12.2). In addition, it is possible
to modify the values of nonconstant
objects through pointer parameters. ]"

I read that as foo() may modify 'end' by using a const_cast, and
therefore the optimizer must assume that it might.

> It's technically only one meaning (with two grammatical locations,
> although both location can be useful for optimization purposes):
> A type qualifier. From an optimization perspective, it's only useful
> when appearing at the top level of a variable type. (That's probably
> what you mean by the "storage class meaning", but the language
> is consistent in treating is as a type qualifier.)

The Standard does treat the two cases differently, for example, see
7.1.5.1-2:

"A variable of const qualified integral or enumeration type initialized
by an integral
constant expression can be used in integral constant expressions (5.19)."

-Walter Bright
www.digitalmars.com C, C++, D programming language compilers

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

Dave Harris

unread,

May 14, 2006, 7:46:35 AM5/14/06

to

wal...@digitalmars-nospamm.com (Walter Bright) wrote (abridged):

> Const (as implemented in C++) is quite useless for expressing loop
> invariants (I know, I've implemented it, and had to back it out).

> [...]

> The way loop invariants are done in a decent C++ optimizer is to ignore
> const, and instead collect information about where expression values are
> changed, where they may be changed, and where they are guaranteed not to
> change based on the most lax interpretation of the language rules
> possible.

I think the previous posters were more focused on helping human readers
than on helping the compiler. The optimiser will happily scan the code and
figure out for itself which values may change. Human readers sometimes
appreciate being given the information up-front.

Also, the very fact that the keyword "const" is redundant is what makes it
a useful consistency check. If I declare a variable const and then later
try to change it, the compiler won't be worried, but /I/ should be.

-- Dave Harris, Nottingham, UK.