C++ FAQ

matt

unread,

Jun 28, 2009, 4:30:58 PM6/28/09

to

Hi all,

I have put together (yet another) C++ FAQ and the source code for all
the examples contained within it.

I thought it might be good to post the link here, as it answers many
questions that are frequently posted to this newgroup.

In addition, I was interested if anyone has comments regarding the FAQ
and advice. I would welcome any / all feedback. My email is listed
in the FAQ document itself.

http://www.givemefish.com/Downloads/Download_cppFAQ/Download_cppFAQ.php

Thanks,
Matt.

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Seungbeom Kim

unread,

Jun 29, 2009, 1:17:00 AM6/29/09

to

matt wrote:
>
> In addition, I was interested if anyone has comments regarding the FAQ
> and advice. I would welcome any / all feedback. My email is listed
> in the FAQ document itself.
>
> http://www.givemefish.com/Downloads/Download_cppFAQ/Download_cppFAQ.php

1.8 says:
> The class initialization list is a list that is defined within a header
> definition before the entry point of the function.

"within a header definition"? What is a header definition?
The initialization list has nothing to do with a header.

If you want to say where the list is used, the most important to mention
is "in (the definition of) a constructor", which you didn't mention.

> It is used to initialize some or all of the class member variables.

True, but it's also used to initialize the base subobjects.
And you cannot use it to initialize static data members.

> The initialization may be based on parameters passed into
> the constructor, but this is not a requirement.

Well, the list can use whatever is available at that point,
including parameters, other variables, and functions.
The above seems to be a fairly convoluted way to say the same thing.

--
Seungbeom Kim

Seungbeom Kim

unread,

Jun 29, 2009, 1:15:47 AM6/29/09

to

matt wrote:
>
> In addition, I was interested if anyone has comments regarding the FAQ
> and advice. I would welcome any / all feedback. My email is listed
> in the FAQ document itself.
>
> http://www.givemefish.com/Downloads/Download_cppFAQ/Download_cppFAQ.php

1.5 says, "Prefix returns a non-const reference whereas postfix returns
a const object." However, you cannot simply say that "postfix returns
a const object" as a truth.

Opinions differ regarding whether the postfix operator should return
a const object or a non-const object, but there are good reasons why
it should return a non-const object. (For example, see the thread
"Overload operators" here in May 2009.)

Moreover, postfix operators already defined in the language for built-in
types return non-const objects. 13.6[over.built]/3,4 lists:

VQ T& operator++(VQ T&);
T operator++(VQ T&, int);

VQ T& operator--(VQ T&);
T operator--(VQ T&, int);

The example for non-built-in types in the standard, 13.5.7[over.inc],
also shows postfix operators returning non-const objects:

class X {
public:
X& operator++(); // prefix ++a
X operator++(int); // postfix a++
};

class Y { };
Y& operator++(Y&); // prefix ++b
Y operator++(Y&, int); // postfix b++

though this is not to be taken as a mandate.

With these counterexamples, I think you should not say "postfix returns
a const object" in the name of a FAQ.

--
Seungbeom Kim

Seungbeom Kim

unread,

Jun 29, 2009, 1:17:32 AM6/29/09

to

matt wrote:
>
> In addition, I was interested if anyone has comments regarding the FAQ
> and advice. I would welcome any / all feedback. My email is listed
> in the FAQ document itself.
>
> http://www.givemefish.com/Downloads/Download_cppFAQ/Download_cppFAQ.php

1.9 says

C++ does not allow for the modification of temporaries

but this is not true. Objects, including temporaries, are modifiable
if they are not const-qualified (or if they have a mutable component).

struct S { S& operator++(); };
S f();
f() = S(); // assignment is fine
++f(); // increment is fine

What is relevant here is that, a temporary, being an rvalue, cannot be
modified _if it's of non-class type_. 3.10[basic.lval]/10 says,

| An lvalue for an object is necessary in order to modify the object
| except that an rvalue of class type can also be used to modify its
| referent under certain circumstances. [Example: a member function
| called for an object (9.3) can modify the object.]

Therefore, in order for an rvalue to be modifiable, it has to be of
class type.

--
Seungbeom Kim

Seungbeom Kim

unread,

Jun 29, 2009, 1:15:57 AM6/29/09

to

matt wrote:
>
> In addition, I was interested if anyone has comments regarding the FAQ
> and advice. I would welcome any / all feedback. My email is listed
> in the FAQ document itself.
>
> http://www.givemefish.com/Downloads/Download_cppFAQ/Download_cppFAQ.php

1.6 mentions null_ptr, but it's named nullptr. See 2.14.7[lex.nullptr]
at <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2914.pdf>,
for example.

--
Seungbeom Kim

Seungbeom Kim

unread,

Jun 29, 2009, 1:15:09 AM6/29/09

to

matt wrote:
>
> In addition, I was interested if anyone has comments regarding the FAQ
> and advice. I would welcome any / all feedback. My email is listed
> in the FAQ document itself.
>
> http://www.givemefish.com/Downloads/Download_cppFAQ/Download_cppFAQ.php

Item 1.4 says:

> The use of the namespace keyword followed by an open curly bracket
> (rather than by a name and then a curly bracket) creates an
> anonymous or unnamed namespace. The functions and definitions
> that are declared within this namespace have ï¿½internal linkageï¿½
> within this translation.

This is wrong. They still have external linkage. It's just that they are
declared in a namespace whose name is uniquely chosen by the compiler
and that they are automatically imported to the current scope.

7.3.1.1[namespace.unnamed]/1 says:
| An unnamed-namespace-definition behaves as if it were replaced by
|
| namespace unique { /* empty body */ }
| using namespace unique;
| namespace unique { namespace-body }
|
| where all occurrences of unique in a translation unit are replaced
| by the same identifier and this identifier differs from all other
| identifiers in the entire program.

In addition, "the functions and definitions" is not a very good
classification; functions also have definitions, and declarations
that are not definitions are also members of the namespace. Since
a namespace is a declarative region and a declarative region deals
with names, I would just say "the names": "The names that are
declared within this namespace..."

--
Seungbeom Kim

Seungbeom Kim

unread,

Jun 29, 2009, 1:19:46 AM6/29/09

to

matt wrote:
>
> In addition, I was interested if anyone has comments regarding the FAQ
> and advice. I would welcome any / all feedback. My email is listed
> in the FAQ document itself.
>
> http://www.givemefish.com/Downloads/Download_cppFAQ/Download_cppFAQ.php

3.7 says

There is wide spread agreement in the programming community
that goto should be completely avoided in all cases.

which doesn't leave any room for exceptions. I don't think there's
such a universal consensus. For example, breaking out of multi-level
loops is most easily done with a goto. It's not impossible to write
a goto-free version, but the point is that you should weigh its benefit
versus its cost. It's not certainly that goto should be *completely*
avoided *in all cases*. You may argue that goto shouldn't be used,
but you shouldn't try to state it as a truth in the name of a FAQ.

Goto also allows a nice trick such as:

template <typename Iter>
void join_print(Iter begin, Iter end, std::ostream& os, const char* delim)
{
if (begin == end) return; else goto print_element;
for (; begin != end; ++begin) {
os << delim;
print_element:
os << *begin;
}
}

where a goto-free version would incur a check at each iteration of
the loop.

--
Seungbeom Kim

Seungbeom Kim

unread,

Jun 29, 2009, 1:22:30 AM6/29/09

to

matt wrote:
>
> In addition, I was interested if anyone has comments regarding the FAQ
> and advice. I would welcome any / all feedback. My email is listed
> in the FAQ document itself.
>
> http://www.givemefish.com/Downloads/Download_cppFAQ/Download_cppFAQ.php

4.12 and 4.13 use

stream << setiosflags(ios::fixed)

to set the fixed-point notation, which effectively calls

stream.setf(ios::fixed)

but it's not correct, because it doesn't clear the existing flags
for floating-point notations. (Specifically, the stream may already
have ios_base::scientific set, and having both flags set is not
recommended.) The correct way is to call

stream.setf(ios_base::fixed, ios_base::floatfield)

which first clears the existing flags and then sets the given flag.
It can also be done simply by:

stream << fixed

--
Seungbeom Kim

Seungbeom Kim

unread,

Jun 29, 2009, 1:21:38 AM6/29/09

to

matt wrote:
>
> In addition, I was interested if anyone has comments regarding the FAQ
> and advice. I would welcome any / all feedback. My email is listed
> in the FAQ document itself.
>
> http://www.givemefish.com/Downloads/Download_cppFAQ/Download_cppFAQ.php

4.3 introduces a custom-made isDigit function. It's neither very long
nor complicated, but why would you want to reinvent the wheel instead
of just using std::isdigit in the standard library?

In addition, the classification functions in 4.4 are not very well-
written, and I don't consider their quality adequate for a FAQ.
For example, isInteger doesn't recognize a negative integer, and
isNumber doesn't recognize a negative number nor a real number
written in the scientific notation (e.g. "6.02e-23").
Again, it's much advisable to use strtol()/strtod() functions or
boost::lexical_cast than to reinvent the wheel.

I appreciate your effort to compile a collection of FAQs, but a
good one should have answers that are in wide acceptance and use.
Your own inventions may start good Usenet discussions, but without
such prior verification, those may have errors and lead its readers
to confusion, and may not gain a wide recognition as a good FAQ.

--
Seungbeom Kim

Francis Glassborow

unread,

Jun 29, 2009, 1:29:32 PM6/29/09

to

Seungbeom Kim wrote:
truth in the name of a FAQ.
>
> Goto also allows a nice trick such as:
>
> template <typename Iter>
> void join_print(Iter begin, Iter end, std::ostream& os, const char* delim)
> {
> if (begin == end) return; else goto print_element;
> for (; begin != end; ++begin) {
> os << delim;
> print_element:
> os << *begin;
> }
> }
>
> where a goto-free version would incur a check at each iteration of
> the loop.
>

Sorry but I do not consider that a nice trick but a truly ugly piece of
code that should not see the light of day. It is a good example why many
of us are so suspicious of uses of goto. Granted that the author of the
FAQ needs to modify his statement about goto, but this is not an example
I would want to see in a FAQ.

--

litb

unread,

Jun 29, 2009, 1:28:41 PM6/29/09

to

On 28 Jun., 22:30, matt <li...@givemefish.com> wrote:
> Hi all,
>
> I have put together (yet another) C++ FAQ and the source code for all
> the examples contained within it.
>
> I thought it might be good to post the link here, as it answers many
> questions that are frequently posted to this newgroup.
>
> In addition, I was interested if anyone has comments regarding the FAQ
> and advice. I would welcome any / all feedback. My email is listed
> in the FAQ document itself.
>

In 1.3, you describe using directive, but you call it "using
declaration". While a using declaration actually is a declaration for
the thing you name in the scope the using declaration appears in, a
using directive does declare nothing in the scope it appears in. I
would recommend you to explain the danger of using-directives too: in
func, cout and endl will be found at global scope if looked up
unqualified, and may cause ambiguity with names declared there, even
though the using-directive was placed into a local scope. This danger
doesn't happen with using declarations, of course.

In 1.8, you say that you *must* use a "class initialization list".
This isn't true always:

struct X { int &r; }; // fine
X x = { someInt };

Last but not least, please note that not all temporaries are rvalues
(i didn't find this in your paper, i just want to tell you about
this). Often believed otherwise, some temporaries deliberately can
bind to nonconst references, because they are lvalues. Exception
objects are the ones that i know off-head: catch(E &e) { } is allowed.

Alf P. Steinbach

unread,

Jun 29, 2009, 1:28:15 PM6/29/09

to

* Seungbeom Kim:

>
> Goto also allows a nice trick such as:
>
> template <typename Iter>
> void join_print(Iter begin, Iter end, std::ostream& os, const char* delim)
> {
> if (begin == end) return; else goto print_element;
> for (; begin != end; ++begin) {
> os << delim;
> print_element:
> os << *begin;
> }
> }
>
> where a goto-free version would incur a check at each iteration of
> the loop.

*Hark*.

I wouldn't exactly call that a "nice" trick, rather the opposite...

It's a conventional loop-and-a-half. Any programmer who doesn't recognize a
loop-and-a-half on sight (yes Kim, that's you! he he[1]) should train himself or
herself in recognizing such loops and in expressing them properly. Off the cuff:

template< typename Iter >
void joinPrintOn(
std::ostream& os,
Iter begin,
Iter const end,
char const* const delim = " "
)
{
if( begin == end ) { return; }

for( ;; )
{
os << *begin;
++begin;
if( begin == end ) { break; }
os << delim;
}
}

Cheers & hth.,

- Alf

Notes:
[1] Of course you couldn't have written that monstrosity without recognizing the
loop-and-a-half for what it is. But, seeing that code, some ridicule, like some
punishment rather than applause, was called for. ;-) That code is just bad[1].

--
Due to hosting requirements I need visits to <url: http://alfps.izfree.com/>.
No ads, and there is some C++ stuff! :-) Just going there is good. Linking
to it is even better! Thanks in advance!

Seungbeom Kim

unread,

Jun 29, 2009, 5:27:57 PM6/29/09

to

Francis Glassborow wrote:

> Seungbeom Kim wrote:
>>
>> Goto also allows a nice trick such as:
>>
>> template <typename Iter>
>> void join_print(Iter begin, Iter end, std::ostream& os, const char*
>> delim)
>> {
>> if (begin == end) return; else goto print_element;
>> for (; begin != end; ++begin) {
>> os << delim;
>> print_element:
>> os << *begin;
>> }
>> }
>>
>> where a goto-free version would incur a check at each iteration of
>> the loop.
>
> Sorry but I do not consider that a nice trick but a truly ugly piece of
> code that should not see the light of day. It is a good example why many
> of us are so suspicious of uses of goto. Granted that the author of the
> FAQ needs to modify his statement about goto, but this is not an example
> I would want to see in a FAQ.

It's a "trick", and I don't want it published in a FAQ, either. :)

However, I don't see anything so bad about it. Can you elaborate?

--
Seungbeom Kim

Louis Lavery

unread,

Jun 29, 2009, 5:26:02 PM6/29/09

to

Seungbeom Kim wrote:
> matt wrote:
>>
>> In addition, I was interested if anyone has comments regarding the FAQ
>> and advice. I would welcome any / all feedback. My email is listed
>> in the FAQ document itself.
>>
>> http://www.givemefish.com/Downloads/Download_cppFAQ/Download_cppFAQ.php
>
> 3.7 says
>
> There is wide spread agreement in the programming community
> that goto should be completely avoided in all cases.
>
> which doesn't leave any room for exceptions. I don't think there's
> such a universal consensus. For example, breaking out of multi-level
> loops is most easily done with a goto. It's not impossible to write
> a goto-free version, but the point is that you should weigh its benefit
> versus its cost. It's not certainly that goto should be *completely*
> avoided *in all cases*. You may argue that goto shouldn't be used,
> but you shouldn't try to state it as a truth in the name of a FAQ.
>
> Goto also allows a nice trick such as:
>
> template <typename Iter>
> void join_print(Iter begin, Iter end, std::ostream& os, const char* delim)
> {
> if (begin == end) return; else goto print_element;
> for (; begin != end; ++begin) {
> os << delim;
> print_element:
> os << *begin;
> }
> }

Yuk!

template <typename Iter>
void join_print(Iter begin, Iter end, std::ostream& os, const char* delim)
{

if (begin != end)
{
os << delim;

while(++begin != end)
{
os << delim << *begin;
}
}
}

Louis.

--

Francis Glassborow

unread,

Jun 30, 2009, 1:11:15 AM6/30/09

to

Seungbeom Kim wrote:
> Francis Glassborow wrote:
>> Seungbeom Kim wrote:
>>>
>>> Goto also allows a nice trick such as:
>>>
>>> template <typename Iter>
>>> void join_print(Iter begin, Iter end, std::ostream& os, const char*
>>> delim)
>>> {
>>> if (begin == end) return; else goto print_element;
>>> for (; begin != end; ++begin) {
>>> os << delim;
>>> print_element:
>>> os << *begin;
>>> }
>>> }
>>>
>>> where a goto-free version would incur a check at each iteration of
>>> the loop.
>>
>> Sorry but I do not consider that a nice trick but a truly ugly piece of
>> code that should not see the light of day. It is a good example why many
>> of us are so suspicious of uses of goto. Granted that the author of the
>> FAQ needs to modify his statement about goto, but this is not an example
>> I would want to see in a FAQ.
>
> It's a "trick", and I don't want it published in a FAQ, either. :)
>
> However, I don't see anything so bad about it. Can you elaborate?
>

Like many uses of goto there are cleaner alternatives (as other have
shown with their posts). I am not alone in disliking jumps into loops
(out of them is OK, that is what break is for)

goto is all too often a symptom of lazy coding or lack of familiarity
with the idioms of the language.

Yes It can be used to get out of a nest of loops, but generally a nest
of loops is symptomatic ....

I am not of the school that says 'never use goto' but I cannot remember
when I last found the need for one.

--

Igor Mikushkin

unread,

Jun 30, 2009, 1:10:37 AM6/30/09

to

On Jun 29, 1:30 am, matt wrote:
>
> In addition, I was interested if anyone has comments regarding the FAQ
> and advice. I would welcome any / all feedback. My email is listed
> in the FAQ document itself.
>
> http://www.givemefish.com/Downloads/Download_cppFAQ/Download_cppFAQ.php
>

Hello Matt!

std::vector<std::string> parseCommandLine(int argc, char** argv)
{
//Parse out command-line args. Ignore the first arg, which is
// the application name. If you don't want to ignore this arg,
// loop from i=00 to i<argc rather than from i=1 to i<argc.
vector<string> commandLineArgs;
for(int i=1; i<argc; ++i)
commandLineArgs.push_back(argv[i]);
return commandLineArgs;
}

I suppose it is better to write it that way:

std::vector<std::string> parseCommandLine(int argc, char** argv)
{
return std::vector<std::string>(argv + 1, argv + argc);
}

Thanks,
Igor

Dragan Milenkovic

unread,

Jun 30, 2009, 1:11:49 AM6/30/09

to

Seungbeom Kim wrote:
> Francis Glassborow wrote:
>> Seungbeom Kim wrote:
>>>
>>> Goto also allows a nice trick such as:
>>>
>>> template <typename Iter>
>>> void join_print(Iter begin, Iter end, std::ostream& os, const char*
>>> delim)
>>> {
>>> if (begin == end) return; else goto print_element;
>>> for (; begin != end; ++begin) {
>>> os << delim;
>>> print_element:
>>> os << *begin;
>>> }
>>> }
>>>
>>> where a goto-free version would incur a check at each iteration of
>>> the loop.
>>
>> Sorry but I do not consider that a nice trick but a truly ugly piece of
>> code that should not see the light of day. It is a good example why many
>> of us are so suspicious of uses of goto. Granted that the author of the
>> FAQ needs to modify his statement about goto, but this is not an example
>> I would want to see in a FAQ.
>
> It's a "trick", and I don't want it published in a FAQ, either. :)
>
> However, I don't see anything so bad about it. Can you elaborate?

It's very ugly. I'd rather write something like this...

// If you don't like "*begin++", just add a line with "++begin".

if (begin == end) return;

os << *begin++;
while (begin != end) {
os << delim << *begin++;
}

... which reads "write the first element and then all the other elements
separated by the delimiter"... instead of being expressed in terms
of labels and jumps to the middle of something. You see, in order to
understand what the loop itself does, we need the starting state and
the iteration step. But in your case, we jumped to the middle of
the loop, so we have to analyze the function as a whole - expression
by expression, instead of breaking it into chunks (the way my function
can be broken). Your example is simple enough so it can be easily
understood, but what if it was more complex?

--
Dragan

Jonathan Thornburg [remove -animal to reply]

unread,

Jun 30, 2009, 2:48:25 PM6/30/09

to

Seungbeom Kim wrote:
> Goto also allows a nice trick such as:
>
> template <typename Iter>
> void join_print(Iter begin, Iter end, std::ostream& os, const char*
> delim)
> {
> if (begin == end) return; else goto print_element;
> for (; begin != end; ++begin) {
> os << delim;
> print_element:
> os << *begin;
> }
> }
>
> where a goto-free version would incur a check at each iteration of
> the loop.

Two comments:

First, this is an example of Duff's device
http://en.wikipedia.org/wiki/Duff%27s_device
and should be credited as such.

Second, there are other (IMHO more persuasive) arguments for keeping
goto in the language (albeit labelled with Knuth's "double dangerous
bend" signs). One that comes to mind is machine-generated code. For
example, if you consider compiling a finite state machine (e.g., for
regular-expression pattern matching) into C++, you'll see that goto
can be quite useful. For a fancier example, consider open-coding a
yacc/bison parser as in
http://compilers.iecc.com/comparch/article/91-10-075
This machine-generated code needs gotos to function properly.

--
-- "Jonathan Thornburg [remove -animal to reply]" <jth...@astro.indiana-zebra.edu>
Dept of Astronomy, Indiana University, Bloomington, Indiana, USA
"C++ is to programming as sex is to reproduction. Better ways might
technically exist but they're not nearly as much fun." -- Nikolai Irgens

Alan McKenney

unread,

Jun 30, 2009, 2:47:35 PM6/30/09

to

On Jun 30, 1:11 am, Francis Glassborow

<francis.glassbo...@btinternet.com> wrote:
> Seungbeom Kim wrote:
> > Francis Glassborow wrote:
> >> Seungbeom Kim wrote:
>
> >>> Goto also allows a nice trick such as:

> Like many uses of goto there are cleaner alternatives (as other have
> shown with their posts). I am not alone in disliking jumps into loops
> (out of them is OK, that is what break is for)

The downside of "break" is that it leaves you at the
end of the loop. Sometimes you want to do some
processing that's only done if you fall through the
end of the loop. For example:

char line[81];
int j = 0;
for ( int i = 0; i < 81 ; ++i )
{
line[i] = in[j];
if ( !in[j] ) goto done;
}
line[79] = '>';
line[80] = 0;
done:
printf( "%s\n", line );
....

BTW, "break" and "return" are themselves a kind of "goto".

> Yes It can be used to get out of a nest of loops, but generally a nest
> of loops is symptomatic ....

In some lines of work, nested loops are inevitable. E.g.,
numerical calculations.

Francis Glassborow

unread,

Jun 30, 2009, 7:53:15 PM6/30/09

to

Alan McKenney wrote:
> On Jun 30, 1:11 am, Francis Glassborow
> <francis.glassbo...@btinternet.com> wrote:

>
>> Like many uses of goto there are cleaner alternatives (as other have
>> shown with their posts). I am not alone in disliking jumps into loops
>> (out of them is OK, that is what break is for)
>
> The downside of "break" is that it leaves you at the
> end of the loop. Sometimes you want to do some
> processing that's only done if you fall through the
> end of the loop. For example:
>
>
> char line[81];
> int j = 0;
> for ( int i = 0; i < 81 ; ++i )
> {
> line[i] = in[j];

How does j change?

> if ( !in[j] ) goto done;
> }
> line[79] = '>';
> line[80] = 0;
> done:
> printf( "%s\n", line );
> ....
>

char line[81];
int j = 0;

line[79] = '>'; // set provisional line end
line[80] = 0;

for ( int i = 0; i < 81 ; ++i )
{
line[i] = in[j];

if(! in[j]) break
j++; // or whatever
}
printf("%s\n", line);

> BTW, "break" and "return" are themselves a kind of "goto".

Well yes and no. They are constrained in ways that make it hard if not
impossible to create spaghetti code.

>
>> Yes It can be used to get out of a nest of loops, but generally a nest
>> of loops is symptomatic ....
>
> In some lines of work, nested loops are inevitable. E.g.,
> numerical calculations.

Not true. Just use function calls. They will make your code more
readable and less prone to errors during maintenance

Martin Eisenberg

unread,

Jul 1, 2009, 4:44:47 AM7/1/09

to

Francis Glassborow wrote:
> Alan McKenney wrote:
>> On Jun 30, 1:11 am, Francis Glassborow

>>> Yes It can be used to get out of a nest of loops, but

>>> generally a nest of loops is symptomatic ....
>>
>> In some lines of work, nested loops are inevitable. E.g.,
>> numerical calculations.
>
> Not true. Just use function calls. They will make your code more
> readable and less prone to errors during maintenance

The question is, what are bare nested loops are symptomatic of?
Hiding inner loops in functions obviously makes multilevel break
harder, but positing that as an argument against using goto that way
would be circular.

Martin

--
Quidquid latine scriptum est, altum videtur.

matt

unread,

Jul 1, 2009, 10:23:26 PM7/1/09

to

>
<4A4837F...@bawi.org>
Content-Type: text/plain; charset=ISO-8859-1
X-Original-Date: Wed, 1 Jul 2009 02:25:16 -0700 (PDT)
X-Submission-Address: c++-s...@netlab.cs.rpi.edu

On Jun 29, 7:21 am, Seungbeom Kim <musip...@bawi.org> wrote:
> matt wrote:
>
>> In addition, I was interested if anyone has comments regarding the
>> FAQ
>> and advice. I would welcome any / all feedback. My email is listed
>> in the FAQ document itself.
>
>> http://www.givemefish.com/Downloads/Download_cppFAQ/Download_cppFAQ.php
>

> 4.3 introduces a custom-made isDigit function. It's neither very long
> nor complicated, but why would you want to reinvent the wheel instead
> of just using std::isdigit in the standard library?
>
> In addition, the classification functions in 4.4 are not very well-
> written, and I don't consider their quality adequate for a FAQ.
> For example, isInteger doesn't recognize a negative integer, and
> isNumber doesn't recognize a negative number nor a real number
> written in the scientific notation (e.g. "6.02e-23").
> Again, it's much advisable to use strtol()/strtod() functions or
> boost::lexical_cast than to reinvent the wheel.
>
> I appreciate your effort to compile a collection of FAQs, but a
> good one should have answers that are in wide acceptance and use.
> Your own inventions may start good Usenet discussions, but without
> such prior verification, those may have errors and lead its readers
> to confusion, and may not gain a wide recognition as a good FAQ.

Yes, this is a good point. I updated the section on digits to suggest
using isdigit rather than the custom version, and the other functions
to use either strtol / strtod or boost::lexical_cast.

Does this new implementation look correctly coded?

Thanks,
Matt.

--

Ulrich Eckhardt

unread,

Jul 1, 2009, 10:21:45 PM7/1/09

to

Dragan Milenkovic wrote:
> Seungbeom Kim wrote:
>> Goto also allows a nice trick such as:
>>
>> template <typename Iter>
>> void join_print(Iter begin, Iter end, std::ostream& os, const char*
>> delim)
>> {
>> if (begin == end) return; else goto print_element;
>> for (; begin != end; ++begin) {
>> os << delim;
>> print_element:
>> os << *begin;
>> }
>> }

[...]

> It's very ugly. I'd rather write something like this...
>
> // If you don't like "*begin++", just add a line with "++begin".
>
> if (begin == end) return;
>
> os << *begin++;
> while (begin != end) {
> os << delim << *begin++;
> }
>
> ... which reads "write the first element and then all the other elements
> separated by the delimiter"... instead of being expressed in terms
> of labels and jumps to the middle of something.

Replace the "os << *it" with something more complex, possibly a few lines of
code. If you do that, it will become more and more tempting to use Kim's
variant in order to avoid this duplication. OTOH, it then becomes equally
tempting to just do that (unnecessary) comparison to skip the delimiter
inside the loop, because its overhead will be dwarfed by the rest anyway.

Suggestion:

if(begin==end)
return;
while(true) {
os << *begin;
++begin;
if(begin==end)
break;
os << delim;
}

That said, using a goto with a label (which itself can be like a well-named
identifier!) to leave two nested loops is one place where I used this and
where it is the cleanest method I could find. Actually, if there were local
functions, this could change though.

bool search_it() {
for(...)
for(...)
if(...)
goto found_it;

not_found:
cleanup();
return false;

found_it:
cheer();
return true;
}

Uli

--
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932

matt

unread,

Jul 1, 2009, 10:22:33 PM7/1/09

to

>
<075452b5-8855-41ab...@t21g2000yqi.googlegroups.com>
Content-Type: text/plain; charset=ISO-8859-1
X-Original-Date: Wed, 1 Jul 2009 02:21:17 -0700 (PDT)
X-Submission-Address: c++-s...@netlab.cs.rpi.edu

On Jun 30, 7:10 am, Igor Mikushkin <igor.mikush...@gmail.com> wrote:

> On Jun 29, 1:30 am, matt wrote:
>
>
>
>> In addition, I was interested if anyone has comments regarding the
>> FAQ
>> and advice. I would welcome any / all feedback. My email is listed
>> in the FAQ document itself.
>
>> http://www.givemefish.com/Downloads/Download_cppFAQ/Download_cppFAQ.php
>

> Hello Matt!
>
> std::vector<std::string> parseCommandLine(int argc, char** argv)
> {
> //Parse out command-line args. Ignore the first arg, which is
> // the application name. If you don't want to ignore this arg,
> // loop from i=00 to i<argc rather than from i=1 to i<argc.
> vector<string> commandLineArgs;
> for(int i=1; i<argc; ++i)
> commandLineArgs.push_back(argv[i]);
> return commandLineArgs;
>
> }
>
> I suppose it is better to write it that way:
>
> std::vector<std::string> parseCommandLine(int argc, char** argv)
> {
> return std::vector<std::string>(argv + 1, argv + argc);
>
> }
>

Igor,

thanks for that suggestion. I included it, while leaving the original
as is. I think the original function is easier to understand for many
readers, but the latter might be preferred for actual implementation.

Matt.

--

Francis Glassborow

unread,

Jul 1, 2009, 10:21:37 PM7/1/09

to

>
<4A482EDC...@bawi.org>
<i7SdnT4BLM95-tXX...@bt.com>
<h2b1pm$fp$1...@news.stanford.edu>
<qYudnTXATecgo9TX...@bt.com>
<8e0d4e83-3a5f-4ee8...@z28g2000vbl.googlegroups.com>
<g8idnXMOcZWnEdfX...@bt.com>
<h2e64n$mc0$1...@news.motzarella.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
X-Original-Date: Wed, 01 Jul 2009 09:56:37 +0100
X-Submission-Address: c++-s...@netlab.cs.rpi.edu

Martin Eisenberg wrote:
> Francis Glassborow wrote:
>> Alan McKenney wrote:
>>> On Jun 30, 1:11 am, Francis Glassborow
>>>> Yes It can be used to get out of a nest of loops, but
>>>> generally a nest of loops is symptomatic ....
>>> In some lines of work, nested loops are inevitable. E.g.,
>>> numerical calculations.
>> Not true. Just use function calls. They will make your code more
>> readable and less prone to errors during maintenance
> The question is, what are bare nested loops are symptomatic of?
> Hiding inner loops in functions obviously makes multilevel break
> harder, but positing that as an argument against using goto that way
> would be circular.

Actually there is no problem with the multi-level break using
functions because functions used this way return bool:

bool level2(/* parameter list */);
then we can write
for(int i(0); i!=limit; ++i){
if(not level2(/*whatever*/)) break;
}

However this only starts to scratch the potential of C++, using a
class with a function operator allows me to package all the data. And
one of the curious things about function operators is that they often
outperform the alternatives using plain functions. One reason seems to
me the locality of the member variables works well with cache.

The point is that using goto stops programmers exploring the
alternatives, often in the belief that those will be inferior to the
goto solution.

I accept that goto can be useful for machine generated code (as long
as you have no intention of trying to read and understand such code --
i.e. you are going to treat such code as some form of opaque
assembler. I also am not dogmatic that goto is necessarily bad but I
issued a challenge 18 years ago for an example of goto (in C) that
provided real benefit and have still to see one.

goto makes it a trifle easier to write QUAD (quick and dirty) code
that will be thrown away after it has been used, but IME its use
otherwise is symptomatic of a programmer who is unfamiliar with the
alternatives in C and C++.

Frank Birbacher

unread,

Jul 1, 2009, 10:41:34 PM7/1/09

to

Hi!

This is an interesting discussion where I like to express my opinion.

> * Seungbeom Kim:
>>
>> Goto also allows a nice trick such as:

Yes, fairly easy.

Alf P. Steinbach schrieb:
> *Hark*.
[...]

> That code is just bad[1].

Louis Lavery schrieb:
> Yuk!

Dragan Milenkovic schrieb:
> It's very ugly.

Francis Glassborow schrieb:

> Sorry but I do not consider that a nice trick but a truly ugly piece
> of code that should not see the light of day. It is a good example why
> many of us are so suspicious of uses of goto.

I see you intutionally you strongly oppose. I cannot object as much to
Kim's example than the four of you do. I also think "goto" will not
automatically lead to understandable code, but I do think that any
experienced C++ programmer will fairly easily recognize what Kim's code
does.

Alf P. Steinbach schrieb:

> It's a conventional loop-and-a-half. Any programmer who doesn't recognize a
> loop-and-a-half on sight (yes Kim, that's you! he he[1]) should train
> himself or
> herself in recognizing such loops and in expressing them properly. Off
> the cuff:

[snipped Alf's version using "for(;;)" and "break"]

Jumping in or out of such a simple loop is the same in terms of
complexity. Might be one is more used to jumping out of loops than into
them. I like most the examples using neither of these jumps, like
Dragan's. If there weren't "return" nor "break" nor "continue" you would
express the same thing using "goto" and it would be more obvious that
those statements are nothing more than simple jumps.

I think jumping into a loop is easy to understand. Using it is a matter
of taste. Objecting to it results from different habits, like opposing
to the formatting of some source code. Using few jumps is a good thing
nevertheless. "goto"s , whether disguised as "break" or not, always add
to complexity. Complexity is always hard to understand. That's why we
need professionals after all.

I use "goto" rarely.

Frank

--

Nevin :-] Liber

unread,

Jul 2, 2009, 4:51:47 AM7/2/09

to

In article
<fc071297-1580-415b...@a36g2000yqc.googlegroups.com>,
matt <li...@givemefish.com> wrote:

> > std::vector<std::string> parseCommandLine(int argc, char** argv)
> > {
> > return std::vector<std::string>(argv + 1, argv + argc);
> >
> > }
> >
>
> Igor,
>
> thanks for that suggestion. I included it, while leaving the original
> as is. I think the original function is easier to understand for many
> readers, but the latter might be preferred for actual implementation.

Why copy it into a vector at all? That should be a decision the caller,
not the callee, makes.

Better to return the a range of iterators, as in:

return std::pair<char const*, char const*>(argv +1, argv + argc);

Actually, in non-generic code, I'd use a custom struct instead of
std::pair, as I find it far better when my member variables have useful
names.

--
Nevin ":-)" Liber <mailto:ne...@eviloverlord.com> 773 961-1620

Francis Glassborow

unread,

Jul 2, 2009, 5:07:18 AM7/2/09

to

Frank Birbacher wrote:

> Jumping in or out of such a simple loop is the same in terms of
> complexity. Might be one is more used to jumping out of loops than into
> them. I like most the examples using neither of these jumps, like
> Dragan's. If there weren't "return" nor "break" nor "continue" you would
> express the same thing using "goto" and it would be more obvious that
> those statements are nothing more than simple jumps.

In C++ jumping into a loop can meet issues when variables are declared
in the loop. This is not an issue for jumping out of a loop; the
compiler simply recognises that you are leaving the loop (even better
when an explicit exit is used with break) and creates code to clear up.

Of course goto is perfectly safe for simple clean code, however its use
then gains relatively little. However once you allow goto it becomes
increasingly used as code becomes more complex. Good, experienced
programmers realise that the temptation to use goto is a symptom of code
getting overly complex.

It is also interesting to note that no one in this thread has yet
mentioned that forward goto are relatively safe (even though they may
inhibit programmers from learning better ways) but the truly evil
version is the backward goto (jumping to something earlier)

Dragan Milenkovic

unread,

Jul 2, 2009, 9:47:31 PM7/2/09

to

Francis Glassborow wrote:
[snip]

> It is also interesting to note that no one in this thread has yet
> mentioned that forward goto are relatively safe (even though they may
> inhibit programmers from learning better ways) but the truly evil
> version is the backward goto (jumping to something earlier)

I was hoping it wouldn't come to this... but you have uttered
the name of an ancient evil. No programmer will ever be safe again.

--
Dragan

Jonathan Lee

unread,

Jul 2, 2009, 9:58:33 PM7/2/09

to

> I issued a challenge 18 years ago for an example of goto (in C)
> that provided real benefit and have still to see one.

Maybe you've seen this one before (below). Took it from the bottom of
http://david.tribble.com/text/goto.html

I find the goto version much easier to read. I don't know if other
people do, but because I do I think that's a (real) benefit. The
example furthermore highlights an important fact to me: "while(true)
{}" code is pretty silly when the programmer likely means "go to
there". Yet it occurs often enough...

--Jonathan

// Goto version ------------------------------------------------
int parse()
{
Token tok;

reading:
tok = gettoken();
if (tok == END)
return ACCEPT;
shifting:
if (shift(tok))
goto reading;
reducing:
if (reduce(tok))
goto shifting;
return ERROR;
}

// Structured version -------------------------------------
int parse() {
Token tok;

while (true) {
tok = gettoken();
if (tok == END) break;
while (true) {
if (shift(tok))
break;
if (!reduce(tok))
return ERROR
}
}
return ACCEPT;

Alan McKenney

unread,

Jul 2, 2009, 9:57:28 PM7/2/09

to

I think all this discussion goes to show that there
is hardly unanimity that "goto" should never be used.

On Jun 30, 7:53 pm, Francis Glassborow

<francis.glassbo...@btinternet.com> wrote:
> Alan McKenney wrote:
> > On Jun 30, 1:11 am, Francis Glassborow
> > <francis.glassbo...@btinternet.com> wrote:

...

> > The downside of "break" is that it leaves you at the
> > end of the loop. Sometimes you want to do some
> > processing that's only done if you fall through the
> > end of the loop. For example:
>

> > [ code w/ a loop with an exit via "goto" ]
>
> [ rearranged code that doesn't use "goto" ]

I agree, in most, perhaps all cases, it is
possible to avoid using goto. The question is
whether the code w/o goto is clearer than the
code with goto. IMHO, there are situations
where "goto" is cleaner. "Clearer" and "cleaner"
are in the eye of the beholder, though.

> > BTW, "break" and "return" are themselves a kind of "goto".
>
> Well yes and no. They are constrained in ways that make it hard if not
> impossible to create spaghetti code.

Trust me, it is not that hard to write
spaghetti code in any language. I have
spent the past few weeks trying to untangle
some spaghetti C++, and there's not a goto
in it.

> >> Yes It can be used to get out of a nest of loops, but generally a nest
> >> of loops is symptomatic ....
>
> > In some lines of work, nested loops are inevitable. E.g.,
> > numerical calculations.
>
> Not true. Just use function calls. They will make your code more
> readable and less prone to errors during maintenance

I flatly disagree.

I do not see how, for instance, the following code can
be made "more readable" by encapsulating pieces in
functions:

for ( int i = 0; i < a_rows; ++i ) {
for ( int j = 0; j < b_cols; ++j ) {
double temp = 0.0;
for ( int k = 0; k < a_cols; ++k ) {
temp += A[i][k]*B[k][j];
}
C[i][j] = temp;
}
}

Anyone who has done matrix operations would instantly
recognize that this supposed to be a matrix multiply,
which they would not (instantly) if you put each loop
inside a separate function. Indeed, a lot of _compilers_
would recognize this as a matrix multiply.

Indeed, it is by putting little bits of code
in functions, so that you have to flip back and
forth to follow the flow of control, that you
end up with spaghetti code in C. Not to mention
that it makes it much harder for the compiler
to rearrange the code to improve performance
(to the extent that that is possible in C.)

The potential for spaghetti code is even greater
in C++, because now you have inheritance, virtual
functions, overloads with default parameters, etc.,
so you not only have to find function "f()", you
have to figure out which f() is the one you are
actually using at any particular time you hit that
line of code.

Been there, done that, still looking for the T-shirt.

Jerry Coffin

unread,

Jul 3, 2009, 12:28:18 AM7/3/09

to

In article <4A482EDC...@bawi.org>, musi...@bawi.org says...

[ ... ]

> template <typename Iter>
> void join_print(Iter begin, Iter end, std::ostream& os, const char* delim)
> {
> if (begin == end) return; else goto print_element;
> for (; begin != end; ++begin) {
> os << delim;
> print_element:
> os << *begin;
> }
> }
>
> where a goto-free version would incur a check at each iteration of
> the loop.

First of all, given that each iteration of the loop does an output
operation, there's essentially no chance at all that the conditional
in the loop is going to make a measurable difference in speed -- just
in really round numbers, conditional tests take on the order of
nanoseconds and I/O operations take on the order of microseconds to
milliseconds.

Second, it's not really true anyway. This basic type of example (a
loop and a half) is well known, and it's equally well known that
avoiding the goto involves nothing more than duplicating a bit of
code. If the code you have to duplicate is sufficiently substantial,
it can be worthwhile, but in this case it's quite trivial:

template <typename Iter>
void join_print(

Iter begin,
Iter end,
std::ostream& os,
const char* delim)
{

if (begin!=end) {
os << *begin++;
for (;begin!=end; ++begin)
os << delim << *begin;
}
}

Of course, I'd prefer to separate concerns, and use std::copy with
the infix_ostream_iterator that was recent posted to comp.lang.c++:

// Warning: only minimally tested.
// infix_iterator.h
// Lifted from Jerry Coffin's 's prefix_ostream_iterator
#if !defined(INFIX_ITERATOR_H_)
#define INFIX_ITERATOR_H_
#include <ostream>
#include <iterator>

template <class T,
class charT=char,
class traits=std::char_traits<charT> >

class infix_ostream_iterator :
public std::iterator
<std::output_iterator_tag,void,void,void,void>
{
std::basic_ostream<charT,traits> *os;
charT const* delimiter;
bool first_elem;
public:
typedef charT char_type;
typedef traits traits_type;
typedef std::basic_ostream<charT,traits> ostream_type;

infix_ostream_iterator(ostream_type& s)
: os(&s),delimiter(0), first_elem(true)
{}
infix_ostream_iterator(ostream_type& s, charT const *d)
: os(&s),delimiter(d), first_elem(true)
{}
infix_ostream_iterator<T,charT,traits>& operator=(T const &item)
{
if (!first_elem && delimiter != 0)
*os << delimiter;
*os << item;
first_elem = false;
return *this;
}

infix_ostream_iterator<T,charT,traits> &operator*() {
return *this;
}
infix_ostream_iterator<T,charT,traits> &operator++() {
return *this;
}
infix_ostream_iterator<T,charT,traits> &operator++(int) {
return *this;
}
};
#endif

Using this, a call to join_print like this:

join_print(vect.begin(), vect.end(), std::cout, "|");

would be replaced with a call to copy, like this:

std::copy(vect.begin(), vect.end(),
std::infix_ostream_iterator<whatever>(std::cout, "|"));

While that's slightly longer, it's also considerably more flexible:
it's an ostream_iterator that can be used with any algorith that
would work with another ostream_iterator.

--
Later,
Jerry.

LR

unread,

Jul 3, 2009, 12:29:09 AM7/3/09

to

I'm not 100% sure this is correct, but IMHO this might be more readable
and doesn't use goto. Requires a bit of a rewrite of Token.

int parse() {
Token tok;

while(tok.gettoken() != END) {
while(!tok.shift()) {
while(!tok.reduce()) {
return ERROR;
}
}
}
return ACCEPT;
}

LR

Keith H Duggar

unread,

Jul 3, 2009, 12:43:54 AM7/3/09

to

On Jul 1, 10:21 pm, Francis Glassborow

<francis.glassbo...@btinternet.com> wrote:
> The point is that using goto stops programmers exploring the
> alternatives, often in the belief that those will be inferior to the
> goto solution.

And dogmatically preaching it not be used also stops programmers
from exploring alternative goto solutions that might be superior.

> I accept that goto can be useful for machine generated code (as long
> as you have no intention of trying to read and understand such code --
> i.e. you are going to treat such code as some form of opaque
> assembler. I also am not dogmatic that goto is necessarily bad but I
> issued a challenge 18 years ago for an example of goto (in C) that
> provided real benefit and have still to see one.

I provided an answer to your challenge more than 2 years ago. It
seems you ignored it? (Though I could have sworn you did respond
at some point.) The Kuzmin goto version has fewer operations and
variables than any goto-less version proposed so far.

http://groups.google.com/group/comp.lang.c++.moderated/msg/8d4661d9a4e74147

Consider Yevgeni Kuzmin's efficient cicrle-drawing algorithm (this
version below evaluates f on a radius r octant centered at (0,0)):

template < class F > inline
void circle_octant ( int r, F f )
{
int x = r ;
int y = 0 ;
int e = -r / 2 ;
if ( r & 1 ) --e , goto odd ;
even :
f(x,y) ;
if ( y >= x ) return ;
e += y++ ;
if ( e >= 0 ) e -= --x ;
odd :
f(x,y) ;
if ( y >= x ) return ;
e += ++y ;
if ( e >= 0 ) e -= --x ;
goto even ;
}

Now, I hope your "real benefit" requirement does not turn into a
"No true Scotsman" fallacy. In other words, fewer operations and
variables is a *real* benefit.

KHD

Francis Glassborow

unread,

Jul 3, 2009, 12:19:33 PM7/3/09

to

Jonathan Lee wrote:
>> I issued a challenge 18 years ago for an example of goto (in C)
>> that provided real benefit and have still to see one.
>
> Maybe you've seen this one before (below). Took it from the bottom of
> http://david.tribble.com/text/goto.html

Yes, and it does not convince me not least because it involves a
backward goto. If you already know what it does it is readable but if
you do not it isn't.

>
> I find the goto version much easier to read. I don't know if other
> people do, but because I do I think that's a (real) benefit. The
> example furthermore highlights an important fact to me: "while(true)
> {}" code is pretty silly when the programmer likely means "go to
> there". Yet it occurs often enough...

Depends how you read it. I am quite happy with the usage. It tells me at
once that the way out of this loop is going to be a break or a return
and I can look for it.

>
> --Jonathan
>
> // Goto version ------------------------------------------------
> int parse()
> {
> Token tok;
>
> reading:
> tok = gettoken();
> if (tok == END)
> return ACCEPT;
> shifting:
> if (shift(tok))
> goto reading;
> reducing:
> if (reduce(tok))
> goto shifting;
> return ERROR;
> }
>
> // Structured version -------------------------------------

Well this still has multiple return statements to irritate the SESE
aficionados :-)

> int parse() {
> Token tok;
>
> while (true) {
> tok = gettoken();
> if (tok == END) break;
> while (true) {
> if (shift(tok))
> break;
> if (!reduce(tok))
> return ERROR
> }
> }
> return ACCEPT;
> }
>

My version:

int parse() {
Token tok;
while (true) {
tok = gettoken();

if (tok == END) return ACCEPT;
while (true) {
if (not shift(tok))
if (not reduce(tok)) return ERROR
}
}
}

Now the SESE experts will hate that (but they will hate the other two
versions as well.) But to me, even if I did not know the problem domain
I can quickly recognise what is going on. Check if the next token marks
the end. If so stop and return that the data was acceptable. Otherwise
see if it has a valid meaning, possibly after reducing it in some way.
If not stop processing and return an error.

And my point is that you do not need to know anything about parsing in
order to read that code.

Frank Birbacher

unread,

Jul 3, 2009, 12:21:55 PM7/3/09

to

Hi!

LR schrieb:
> Jonathan Lee wrote:
>> // Goto version ------------------------------------------------
[snip]

Nice one. I like it. :)
It is easy to follow despite the goto.

> int parse() {
> Token tok;
>
> while(tok.gettoken() != END) {
> while(!tok.shift()) {
> while(!tok.reduce()) {
> return ERROR;
> }
> }
> }
> return ACCEPT;
> }

Yeah, also readable. Notice the innermost loop could as well be an
"if(!tok.reduce) return ERROR;"

Both versions are clear. I cannot say, the goto version is bad.

Frank

Francis Glassborow

unread,

Jul 3, 2009, 12:25:49 PM7/3/09

to

Keith H Duggar wrote:

> I provided an answer to your challenge more than 2 years ago. It
> seems you ignored it? (Though I could have sworn you did respond
> at some point.) The Kuzmin goto version has fewer operations and
> variables than any goto-less version proposed so far.

I do not remember it and had I replied I think you might have remembered
it, However let us not have a discussion about what happened then. And I
am not sure what metric you are using for 'fewer operations'.

>
> http://groups.google.com/group/comp.lang.c++.moderated/msg/8d4661d9a4e74147
>
> Consider Yevgeni Kuzmin's efficient cicrle-drawing algorithm (this
> version below evaluates f on a radius r octant centered at (0,0)):
>
> template < class F > inline
> void circle_octant ( int r, F f )
> {
> int x = r ;
> int y = 0 ;
> int e = -r / 2 ;
> if ( r & 1 ) --e , goto odd ;
> even :
> f(x,y) ;
> if ( y >= x ) return ;
> e += y++ ;
> if ( e >= 0 ) e -= --x ;
> odd :
> f(x,y) ;
> if ( y >= x ) return ;
> e += ++y ;
> if ( e >= 0 ) e -= --x ;
> goto even ;
> }
>
> Now, I hope your "real benefit" requirement does not turn into a
> "No true Scotsman" fallacy. In other words, fewer operations and
> variables is a *real* benefit.

My version:

template < class F > inline
void circle_octant ( int r, F f ){

int x = r ; // note that r needs to be larger than 0
int y = 0 ; // I am not sure whether both x and y
// should be unsigned
bool isodd = r & 1;
int e = -r/2 - isodd;
while (true);

f(x,y) ;
if ( y >= x ) return ;

e += y++ + isodd;
isodd = !isodd;

if ( e >= 0 ) e -= --x ;
}
}

In practice r will be fairly large (for some meaning of 'fairly' and
'large' :-) ) Now repeat process until y is greater than or equal to x.
The algorithm almost demands code repetition for efficiency but the
above code shows how we can avoid that if we wish. Of course if we do
not want to pay with a bool variable we can pay by writing the almost
repeat code.

Of course the SESE aficionados might like my code (some hope :-) )
because it uses a single return statement.

Frank Birbacher

unread,

Jul 3, 2009, 12:22:08 PM7/3/09

to

Hi!

Keith H Duggar schrieb:

> The Kuzmin goto version has fewer operations and
> variables than any goto-less version proposed so far.

> template < class F > inline

inline ?? This function is a little too big to inline it.

Here is my "goto"-less version. It is essentially the same, as the gotos
are just disguised behind the switch. I swapped the odd and even parts
in the loop.

template<typename F>
void circle_octant(const int r, F f)
{
int x=r, y=0, e=-r/2;

switch(r&1)
{
case 1:
--e;
while(true)
{
f(x,y);
if(y >= x) return;
e += y++;
if(e >= 0) e -= --x;
case 0:
f(x,y);
if(y >= x) return;
e += ++y;
if(e >= 0) e -= --x;
}
}
}

Frank

Francis Glassborow

unread,

Jul 3, 2009, 12:21:09 PM7/3/09

to

LR wrote:

> I'm not 100% sure this is correct, but IMHO this might be more readable
> and doesn't use goto. Requires a bit of a rewrite of Token.
>
> int parse() {
> Token tok;
>
> while(tok.gettoken() != END) {
> while(!tok.shift()) {
> while(!tok.reduce()) {
> return ERROR;
> }
> }
> }
> return ACCEPT;
> }
>

Interesting that this is almost the code I wrote. However my style is to
identify special cases (in this case the END token) and return directly.
The programmer does not then have to scan forward for the end of the loop.

Frank Birbacher

unread,

Jul 3, 2009, 12:22:06 PM7/3/09

to

Hi!

Nevin :-] Liber schrieb:

> Why copy it into a vector at all? That should be a decision the caller,
> not the callee, makes.

Well, the function fullfills just that purpose: copy array of char* to
vector of string. Not generic though, but beginners don't learn
"generic" first. Thus a good example for a FAQ.

But well, yes, copying an array to a vector doesn't make much of a
difference. A vector is a managed array. The argument array is also
"managed", you don't need to free it manually. Thus no gain.

> Better to return the a range of iterators, as in:
>
> return std::pair<char const*, char const*>(argv +1, argv + argc);

Ok, that code is more generic, but it looks like a no-op. There is not
much this function does. It neither does useful computation nor does it
encapsulate something. I think this function is useless. Plus: you
cannot even give its return value to the constructor of a std container
(yet) leading to more code at the caller.

Frank

--

Jerry Coffin

unread,

Jul 3, 2009, 12:25:52 PM7/3/09

to

In article <a338a0f9-2038-4bde-8973-
60cf35...@g19g2000yql.googlegroups.com>, dug...@alum.mit.edu
says...

[ ... ]

> I provided an answer to your challenge more than 2 years ago. It
> seems you ignored it? (Though I could have sworn you did respond
> at some point.) The Kuzmin goto version has fewer operations and
> variables than any goto-less version proposed so far.

As it stands right now, this produces precisely zero operations,
because this is not well formed C++.

[ ... ]

> template < class F > inline
> void circle_octant ( int r, F f )
> {
> int x = r ;
> int y = 0 ;
> int e = -r / 2 ;
> if ( r & 1 ) --e , goto odd ;

It appears that this was _intended_ to be equivalent to:
if (r&1) {
--e;
goto odd;
}

But, since 'goto' is a statement, not an expression, it can't be an
operand of the comma operator.

> even :
> f(x,y) ;
> if ( y >= x ) return ;
> e += y++ ;
> if ( e >= 0 ) e -= --x ;
> odd :
> f(x,y) ;
> if ( y >= x ) return ;
> e += ++y ;
> if ( e >= 0 ) e -= --x ;
> goto even ;
> }

You mention the number of variables, but at least to me, that seems a
relatively unimportant measurement, in itself. For msot people, speed
is what matters most. If there's a serious difference in memory
usage, that can make a difference, but it has to be fairly
substantial before it matters. Other than that, using an extra
varaible mostly becomes a concern when/if the variables used in an
inner loop won't all fit in registers, so the code ends up referring
to memory in each iteration -- but this is really only a problem
because it'll generally have a (strong) adverse effect on speed.

With that in mind, I'd consider the following code:

template <class F>
inline void circle_octant(int r, F &f) {

int x = r;
int y = 0;

int odd = r&1;
int e = (-r/2)-odd;
do {
f(x,y);
e += y++ + odd;
if (e >= 0)
e -= --x;
odd ^= 1;
} while (y < x);
f(x,y);
}

This does have one extra variable ('odd') but I have a hard time
believing that anybody would honestly consider that a major problem
in itself. In terms of speed, the two seem quite competitive.
Depending on the compiler, compiler options, and radius, either one
might come out faster than the other.

IMO, your code is harder to read. I'd expect it to produce larger
object code as a rule as well (and it does so in my testing). It
appears to provide little or no advantage to overcome those
shortcomings.

--
Later,
Jerry.

Keith H Duggar

unread,

Jul 4, 2009, 8:55:31 AM7/4/09

to

Francis Glassborow wrote:
> Keith H Duggar wrote:
> >http://groups.google.com/group/comp.lang.c++.moderated/msg/8d4661d9a4...

The inner loop of your solution is identical (not token by token
but in net operations) to Alf Steinbach's in the original thread
linked above.

This solution adds an additional variable 'isodd' and increases
the operation count by two: '!isodd' and '+ isodd'.

In fact it was exactly the elimination of a variable and the en-
coding of some operations by static code position rather than by
calculation that formed the crux of Kuzmin's optimizations.

> Of course if we do
> not want to pay with a bool variable we can pay by writing the almost
> repeat code.

Can you please show the code for what you are taking about here?

> Of course the SESE aficionados might like my code (some hope :-) )
> because it uses a single return statement.

Yes, and, your solution is also more compact. So one can imagine
scenarios (tiny cache, many inlines, etc) where it might even be
more optimal. However, the point I'm making here is that we have
a goto solution that gives tangible differences (fewer variables
and fewer operations) than any goto-less version so far.

KHD

Keith H Duggar

unread,

Jul 4, 2009, 8:55:32 AM7/4/09

to

Jerry Coffin wrote:

> Keith Duggar wrote:
> > I provided an answer to your challenge more than 2 years ago. It
> > seems you ignored it? (Though I could have sworn you did respond
> > at some point.) The Kuzmin goto version has fewer operations and
> > variables than any goto-less version proposed so far.
>
> As it stands right now, this produces precisely zero operations,
> because this is not well formed C++.
>

> > template < class F > inline
> > void circle_octant ( int r, F f )
> > {
> > int x = r ;
> > int y = 0 ;
> > int e = -r / 2 ;
> > if ( r & 1 ) --e , goto odd ;
>
> It appears that this was _intended_ to be equivalent to:
> if (r&1) {
> --e;
> goto odd;
> }
>
> But, since 'goto' is a statement, not an expression, it can't be an
> operand of the comma operator.

Yes, of course you are right. I don't know how that got in.

> > even :
> > f(x,y) ;
> > if ( y >= x ) return ;
> > e += y++ ;
> > if ( e >= 0 ) e -= --x ;
> > odd :
> > f(x,y) ;
> > if ( y >= x ) return ;
> > e += ++y ;
> > if ( e >= 0 ) e -= --x ;
> > goto even ;
> > }
>
> You mention the number of variables, but at least to me, that seems a
> relatively unimportant measurement, in itself. For msot people, speed
> is what matters most. If there's a serious difference in memory
> usage, that can make a difference, but it has to be fairly
> substantial before it matters. Other than that, using an extra
> varaible mostly becomes a concern when/if the variables used in an
> inner loop won't all fit in registers, so the code ends up referring
> to memory in each iteration -- but this is really only a problem
> because it'll generally have a (strong) adverse effect on speed.

First, all this speculation about what is or is not important in
scenario xyz on platform abc is, in my view, irrelevant. To meet
the challenge I need only provide a goto implementation that has
some tangible differences that may be advantageous in some cases
and which cannot be reproduced without goto. Fewer variables and
fewer operations are examples of such differences.

That said, I see that above you do recognize how fewer variables
can be an advantage even when considering only speed. In my view
those possibilities meet any reasonable notion of "real benefit"
given in the challenge.

> With that in mind, I'd consider the following code:
>
> template <class F>
> inline void circle_octant(int r, F &f) {
> int x = r;
> int y = 0;
> int odd = r&1;
> int e = (-r/2)-odd;
> do {
> f(x,y);
> e += y++ + odd;
> if (e >= 0)
> e -= --x;
> odd ^= 1;
> } while (y < x);
> f(x,y);
>
> }
>
> This does have one extra variable ('odd') but I have a hard time
> believing that anybody would honestly consider that a major problem
> in itself. In terms of speed, the two seem quite competitive.
> Depending on the compiler, compiler options, and radius, either one
> might come out faster than the other.
>
> IMO, your code is harder to read. I'd expect it to produce larger
> object code as a rule as well (and it does so in my testing). It
> appears to provide little or no advantage to overcome those
> shortcomings.

"believing ... honestly ... major ... seem ... might ... opinion
... appears" etc subjective points do not change the *objective*
fact that your solution, just like Francis' and Alf's before it,
has an additional variable and two extra operations in the inner
loop: '+ odd' and 'odd ^= 1'. Agreed? Which is more optimal in a
particular scenario/platform is a different matter.

KHD

--

Seungbeom Kim

unread,

Jul 4, 2009, 8:55:33 AM7/4/09

to

Francis Glassborow wrote:
>
> The point is that using goto stops programmers exploring the
> alternatives, often in the belief that those will be inferior to the
> goto solution.

IMO, they should explore all the options, including goto (with its
drawbacks, of course). We're not in a world where there are just
"use goto wherever possible" and "avoid goto completely" options.

> I accept that goto can be useful for machine generated code (as long
> as you have no intention of trying to read and understand such code --
> i.e. you are going to treat such code as some form of opaque
> assembler. I also am not dogmatic that goto is necessarily bad but I
> issued a challenge 18 years ago for an example of goto (in C) that
> provided real benefit and have still to see one.

I don't know the details of that challenge, but I find it surprised
that you haven't seen an example, especially in C. How do you do
cleanup in case of an error in the middle of successive allocation
of resources, when there are no destructors available?

s = socket(...);
if (s < 0) goto cleanup;

r = connect(...);
if (r < 0) goto cleanupS;

f = open(...);
if (f < 0) goto cleanupS;

while (...) {
if (some error) goto cleanupF;
}

cleanupF:
close(f);

cleanupS:
close(s);

cleanup:
/* other common stuff */
return result;

Certainly you could write a goto-less version (with nested ifs or some
code duplication), but I don't see a better a better alternative here.

I don't enjoy using goto in general, but I feel no guilt as I use it
if it clearly conveys my intention and the alternatives are not as
appealing.

--
Seungbeom Kim

Francis Glassborow

unread,

Jul 4, 2009, 4:55:30 PM7/4/09

to

Keith H Duggar wrote:

>
> The inner loop of your solution is identical (not token by token
> but in net operations) to Alf Steinbach's in the original thread
> linked above.
>
> This solution adds an additional variable 'isodd' and increases
> the operation count by two: '!isodd' and '+ isodd'.
>
> In fact it was exactly the elimination of a variable and the en-
> coding of some operations by static code position rather than by
> calculation that formed the crux of Kuzmin's optimizations.
>
>> Of course if we do
>> not want to pay with a bool variable we can pay by writing the almost
>> repeat code.
>
> Can you please show the code for what you are taking about here?

Here it is

template < class F > inline
void circle_octant ( int r, F f ){
int x = r ; // note that r needs to be larger than 0
int y = 0 ; // I am not sure whether both x and y
// should be unsigned

if(r & 1){
--e;
f(x,y);
e+= ++y;

}
while (true);
f(x,y) ;
if ( y >= x ) return ;

e += y++;

if ( e >= 0 ) e -= --x ;

f(x,y) ;
if ( y >= x ) return ;
e += ++y;
if ( e >= 0 ) e -= --x ;
}
}

Now if it is speed that we are interested in the above will certainly
run no slower because it requires two fewer if statements if r is odd
and the same number of executable statements when r is even. That is
because it uses the implied pre-conditions to the lead in case.

Now I have demonstrated that by avoiding use of goto I can produce code
that is either significantly more compact (and faster where limited
cache is available) or potentially marginally faster.

BTW on most systems the bool variable (isodd or odd) would be held in a
register and so would not result in any memory overhead. And I would not
bet that a really good optimising compiler would not eliminate it
altogether and generate the equivalent assembler to the version using goto.

I also noted that the code you wrote involved one goto to jump into a
loop and a second one to jump back to the start of the loop. That second
one makes the code that much harder to read and demonstrates that once a
programmer starts using goto its use tends to proliferate.

Francis Glassborow

unread,

Jul 4, 2009, 4:55:58 PM7/4/09

to

Seungbeom Kim wrote:
> Francis Glassborow wrote:

Yes and we are in the subjective area. What appeals to one person has
the reverse effect on another

s = socket(...);
if (s >= 0){
r = connect(...);
if (r >= 0){
f = open(...);
if (f >= 0){
while (...) {
if (some error) {
// process error;
break;
}
}
close(f);
}
}
close(s);

}
/* other common stuff */
return result;
}

And though the above code avoids using goto it does have that horrible
nest of if statements. And as you say, with destructors we can avoid
those. Of course the above code seems to assume that there is some
acceptable return value that can be preset before we start.

--

Message has been deleted

LR

unread,

Jul 4, 2009, 5:01:14 PM7/4/09

to

In which case, we might mention that speed is the thing that we're
optimizing for. Optimizations for space of the executable code and
speed may conflict. Optimizing for readability will likely conflict
with optimizations for space and speed.

> That said, I see that above you do recognize how fewer variables
> can be an advantage even when considering only speed. In my view
> those possibilities meet any reasonable notion of "real benefit"
> given in the challenge.

I agree. But would you agree that readability and space of the
executable are therefore less important in this particular case?

[snip counter example with an added variable, but seemed to me to...
...be more readable and have a smaller executable code]

> "believing ... honestly ... major ... seem ... might ... opinion
> ... appears" etc subjective points do not change the *objective*
> fact that your solution, just like Francis' and Alf's before it,
> has an additional variable and two extra operations in the inner
> loop: '+ odd' and 'odd ^= 1'. Agreed? Which is more optimal in a
> particular scenario/platform is a different matter.

So by "fewer operations" you mean at runtime, rather than the number of
compiled operations?

Given the constraints that we are facing, if we wish to eliminate gotos,
and that is the only goal, not to increase readability of the code, and
not to make the size of the exe smaller, then would the code below meet
your conditions for eliminating the goto without adding more variables?

// IMO less readable,
// likely takes more space,
// probably runs in the same time as circle_octant
//
// we don't have to use #define
//
#define UGLY(z)\
f(x,y);\
if(y>=x) return;\
e += z;\
if(e>=0) e -= --x
#define UGLY_ODD UGLY(++y)
#define UGLY_EVEN UGLY(y++)

template< class F > inline
void ugly_circle_octant(int r, F f)

{
int x = r ;
int y = 0 ;
int e = -r / 2 ;

if (r&1) {
--e;
UGLY_ODD;
}
while(true) {
UGLY_EVEN;
UGLY_ODD;
}

}

LR

unread,

Jul 5, 2009, 5:01:02 AM7/5/09

to

Francis Glassborow wrote:
> Keith H Duggar wrote:
>

[snippage]

>>
>>> Of course if we do
>>> not want to pay with a bool variable we can pay by writing the almost
>>> repeat code.
>> Can you please show the code for what you are taking about here?
> Here it is
> template < class F > inline
> void circle_octant ( int r, F f ){
> int x = r ; // note that r needs to be larger than 0
> int y = 0 ; // I am not sure whether both x and y
> // should be unsigned

int e = -r / 2 ;

> if(r & 1){
> --e;
> f(x,y);
> e+= ++y;

if ( e >= 0 ) e -= --x ; // I think required for r == 1
> }
//> while (true);
while(true) {

> f(x,y) ;
> if ( y >= x ) return ;
> e += y++;
> if ( e >= 0 ) e -= --x ;
> f(x,y) ;
> if ( y >= x ) return ;
> e += ++y;
> if ( e >= 0 ) e -= --x ;
> }
> }

This time it's for me to note that there is some similarity in our code.
I think yours was more readable. I took the liberty of making a few
minor corrections.

I'm wondering, since as you note, there is a requirement for r >= 1,
wouldn't it be a good idea to add if(r < 1) return;

LR

Keith H Duggar

unread,

Jul 5, 2009, 11:13:53 PM7/5/09

to

LR wrote:
> Francis Glassborow wrote:
> > Keith H Duggar wrote:
> > > Can you please show the code for what you are taking about here?
> > Here it is
>
> > template < class F > inline
> > void circle_octant ( int r, F f )
> > {
> > int x = r ;

> > int y = 0 ;

> > int e = -r / 2 ;
> > if(r & 1) {
> > --e ;
> > f(x,y) ;
> > e+= ++y ;

> > if ( e >= 0 ) e -= --x ; // required for r == 1
> > }
> > while (true) {

> > f(x,y) ;
> > if ( y >= x ) return ;
> > e += y++;
> > if ( e >= 0 ) e -= --x ;
> > f(x,y) ;
> > if ( y >= x ) return ;
> > e += ++y;
> > if ( e >= 0 ) e -= --x ;
> > }
> > }
>
> This time it's for me to note that there is some similarity in our code.
> I think yours was more readable. I took the liberty of making a few
> minor corrections.
>
> I'm wondering, since as you note, there is a requirement for r >= 1,
> wouldn't it be a good idea to add if(r < 1) return;

r >= 1 is not a requirement. r == 0 is, in fact, a valid radius
with the correct function call set for r == 0 being only {(0,0)}.
Furthermore, all the versions given so far (except for Jerry C's
do while) give the correct call set. So I'm not sure why Francis
added the "note that r needs to be larger than 0" comment?

KHD

Jerry Coffin

unread,

Jul 5, 2009, 11:13:24 PM7/5/09

to

In article <4456ce03-9338-45a2-a3b3-
2c2cee...@o6g2000yqj.googlegroups.com>, dug...@alum.mit.edu
says...

[ ... ]

> To meet
> the challenge I need only provide a goto implementation that has
> some tangible differences that may be advantageous in some cases
> and which cannot be reproduced without goto. Fewer variables and
> fewer operations are examples of such differences.

In other words, according to your "rules", you win by postulating a
mere possibility, and nobody else can win except by proving a
negative -- testing code with every _possible_ implementation of C++
that ever has or could be imagined, and showing superiority for their
code with every one.

That proves exactly one thing: that you refuse to admit it, no matter
how thoroughly you're proved wrong.

--
Later,
Jerry.

Keith H Duggar

unread,

Jul 6, 2009, 1:50:34 AM7/6/09

to

On Jul 4, 4:55 pm, Francis Glassborow

First, as LR has already pointed out in his response, one of the
if statements you eliminated is required for r == 1. Without it,
(ignoring the minor bugs LR also fixed) the code gives the wrong
function call set for r == 1. The correct set is {(1,0), (0,1)}.
After necessary minor corrections:

template < class F > inline
void circle_octant ( int r, F f )
{
int x = r ;

int y = 0 ;
int e = -r / 2 ;

if(r & 1) {
--e ;
f(x,y) ;
e+= ++y ;

// if ( e >= 0 ) e -= --x ; // required for r == 1

}
while (true) {
f(x,y) ;
if ( y >= x ) return ;
e += y++;
if ( e >= 0 ) e -= --x ;
f(x,y) ;
if ( y >= x ) return ;
e += ++y;
if ( e >= 0 ) e -= --x ;
}
}

gives {(1,0), (1,1)} instead. Uncommenting the line fixes it and
produces the exact same runtime operations except for the single
'if ( y >= x )'; but, it does so by adding additional statements
and the additional bloat buys us the smallest possible gain of 1
or 2 ops. Compare that to the goto version compared to the first
isodd solution where the goto bloat buys us 2*r reduction in run
time operations (ie a linear improvement).

> Now I have demonstrated that by avoiding use of goto I can produce code
> that is either significantly more compact (and faster where limited
> cache is available) or potentially marginally faster.

See above. The "marginally" is in reality "tiniest possible bit"
of at most 2 ops. So converting to a "structured" version gained
effectively nothing for the bloat (compared to say linear gains).

So I think it is accurate to say that to convert the goto version
to a "structured" version, we either have to pay with additional
runtime ops or code bloat, at least with the alternatives so far
for this goto example.

In fact, Knuth and Floyd proved Jacopini's conjecture that gotos
in a program of the form

L1 : if ( C1 ) goto L2 ;
S1 ;
if ( C2 ) goto L2 ;
S2 ;
goto L1 ;
L2 : S3 ;

cannot always be removed without additional runtime computation.
In fact, this circle algorithm is one example of another classic
goto flow called the "loop-and-a-half" since as Dijkstra pointed
out it loops "n and and half times":

L1 : S1 ;
if ( C1 ) goto L2 ;
S2 ;
goto L1 ;
L2 : S3 ;

that in general cannot be eliminated without code duplication as
your second version does or additional computation as your first
version did. Note that the interested reader should refer to the
classic Knuth paper:

"Structured Programming with go to Statements"
Computing Surveys, VOL 6, No. 4, December 1974
Donald E. Knuth

for his detailed response to the goto elimination camp including
several analyzed examples and many useful references.

> I also noted that the code you wrote involved one goto to jump into a
> loop and a second one to jump back to the start of the loop. That second
> one makes the code that much harder to read and demonstrates that once a
> programmer starts using goto its use tends to proliferate.

First, I disagree that the jump back makes the code any "harder"
to read. When I first read the paper that provides this code, it
was immediately clear and obvious what it does. Second, I do not
know what was in Yevgeni Kuzmin's brain when he decided to use a
goto for the jump back; perhaps it was consistency since one was
used for the half-jump; perhaps he thought utilizing well named
labels (EVEN and ODD) was clearer than misleading the reader by
using some infinite loop structure ( while(true) for(;;) ) for a
control that was in fact not infinite. I think concluding that a
small, carefully crafted piece of code demonstrates a "tendency"
on the part of the designer is a *huge* leap.

KHD

Keith H Duggar

unread,

Jul 6, 2009, 10:29:36 AM7/6/09

to

Jerry Coffin wrote:
> That proves exactly one thing: that you refuse to admit it, no matter
> how thoroughly you're proved wrong.

I hate to break it to you, Jerry, but Knuth already *proved* the
*fact* that certain goto structures cannot be transformed to the
"structured" form without introducing additional calculations or
code duplication. And, those great debates between such icons as
Dijkstra, Knuth, Hoare, Cooper, etc had *nothing to do* with any
platform or specific implementation or compiler etc. They argued
in terms of math, algorithms, arithmetic, etc.

That is why I'm trying to tell you that this implementation yada
is a strawman distraction. For my part it is entirely sufficient
and appropriate and more accurate to discuss this on the grounds
of operation count, code statements, etc.

Jerry Coffin wrote:
> Keith H Duggar wrote:

> > To meet
> > the challenge I need only provide a goto implementation that has
> > some tangible differences that may be advantageous in some cases
> > and which cannot be reproduced without goto. Fewer variables and
> > fewer operations are examples of such differences.
>
> In other words, according to your "rules", you win by postulating a
> mere possibility, and nobody else can win except by proving a
> negative -- testing code with every _possible_ implementation of C++
> that ever has or could be imagined, and showing superiority for their
> code with every one.

No, see above, you have entirely failed to understand my points.
I'm trying to tell you that whatever your favorite platforms and
compilers do is irrelevant. Instead, it is the *abstracted* code
properties that are relevant. Things such as operation count and
code size. Ie, the things that both common-sense and theory tell
you matter for general consideration.

For the specific case of the circle_octant implementations we've
seen here, the current comparison in abstract terms follows. The
runtime operation count for all implementations so far is linear
(roughly) in r:

rops(r) = R0 + R1 * r

while the static operation count (code size) is constant

cops(r) = C0

The rankings of the best algorithms so far (where 1 is best) is

constant R0 R1 C0
goto (switch) = 2 1 2
while compact = 2 2 1
while duplicate = 1 1 3

So it is objectively true that the goto version gives a mix that
cannot be replicated with "structured" code. Also, note that the
mix goto offers in English is as follows: "the goto solution and
the duplicate code while solution both offer optimal linear, R1,
constant; of these two solutions the goto is the most compact."

(Note, I'm treating Frank Birbacher's Duff's Device solution and
the goto as the same solution. It's always fun to see the Duff's
Device; but, I am hoping we don't have to argue about whether it
is "structured" or not ;-)

In my opinion, that mix (minimum linear op count, nearly minimum
constant op count, most compact of minimum R1) is very likely to
be the optimal choice in many real world applications. The while
compact version is likely the optimal choice when goto is not. I
do not see the while duplicate as a likely choice. However, this
paragraph is *irrelevant* to the objective arguments; it is just
here for opinion and color.

KHD

--

Francis Glassborow

unread,

Jul 6, 2009, 1:17:06 PM7/6/09

to

Keith H Duggar wrote:
> Jerry Coffin wrote:
>> That proves exactly one thing: that you refuse to admit it, no matter
>> how thoroughly you're proved wrong.
>
> I hate to break it to you, Jerry, but Knuth already *proved* the
> *fact* that certain goto structures cannot be transformed to the
> "structured" form without introducing additional calculations or
> code duplication. And, those great debates between such icons as
> Dijkstra, Knuth, Hoare, Cooper, etc had *nothing to do* with any
> platform or specific implementation or compiler etc. They argued
> in terms of math, algorithms, arithmetic, etc.

OK, in a sense you have 'proved' your point, well Knuth et al did so.
However there is an assumption built into the proof, it completely
ignores the fact that practical computers (as opposed to idealised
Turing Machines) use multi-level memory systems where performance is
drastically reduced when it becomes necessary to drop down a level.
Compact code maybe marginally slower when the expanded form does not
suffer from a cache miss but will be dramatically faster when it will
fit into cache and the expanded form won't. Now whether or not the code
fits in cache depends on two things, the size of the cache and the size
of the code.

That brings to mind a discussion I was involved with ...(no it does not
matter so I will not appeal to authority) where we were considering the
efficiency of various sorting and searching algorithms.
Suppose you have a 100 ints stored in a structure and need to determine
as fast as possible whether some int value is present. Which container
would you place them in and does it matter if they are sorted :-)

Where the data cache is large enough a linear search on data stored
unsorted in a std::vector will likely outperform the alternatives
because the data is simply loaded into cache and searched requiring an
average of 50 comparisons (which might even be carried out
simultaneously if you have enough cores/cpus) A binary search requires
fewer comparisons but requires that the data be sorted and also has
extra computational overheads (+ it cannot be done in parallel). Using
other data structures such as std::set seem attractive in theory but
fall over because of the lack of locality in the data resulting in
repeated cache misses.

Bob Lied

unread,

Jul 6, 2009, 6:36:34 PM7/6/09

to

The deep indents and tortured condition flags of traditional
structured programming drove me to an idiom for procedural
functions many years ago. Here's how I would pretend not to
use gotos:

int parse()
{
Token tok;

enum { reading, shifting, reducing, error, done } howToParse = reading;

do switch ( howToParse )
{
case reading:
tok = gettoken();
howToParse = (tok == END ? done : shifting);
break;

case shifting:
howToParse = ( shift(tok) ? reading : reducing );
break;

case reducing:
howToParse = ( reduce(tok) ? shifting : error );
break;

case error:
return ERROR;

} while ( howToParse != done );

Jerry Coffin

unread,

Jul 6, 2009, 8:47:03 PM7/6/09

to

In article <fbd7cc74-97eb-455a-ae8c-
ae432c...@y7g2000yqa.googlegroups.com>, dug...@alum.mit.edu
says...

> Jerry Coffin wrote:
> > That proves exactly one thing: that you refuse to admit it, no matter
> > how thoroughly you're proved wrong.
>
> I hate to break it to you, Jerry, but Knuth already *proved* the
> *fact* that certain goto structures cannot be transformed to the
> "structured" form without introducing additional calculations or
> code duplication.

First of all, the fact that you think you're "breaking it" to anybody
demonstrates little more than your own ignorance. His proof has been
well known for decades now.

The problem is with the conclusion you're trying to draw. You're
taking a proof of a very specific fact, and attempting to draw a MUCH
more general conclusion from it.

You have a couple of basic problems. First of all, the code you
posted really wasn't a particularly good example of the structure for
the Jacopini conjecture -- in particular, while it does fit the
requirements, it _already_ duplicates nearly all the code in the two
legs. As such, even though it's proven that the transformation to
structured code has to duplicate some code, in the SPECIFIC case you
posted, the structured version can actually duplicate LESS code than
what you posted!

Second, rather than claiming a theoretical difference (which is what
Knuth proved) you claimed "real benefit". That claim, however, relies
on the intersection of the theoretical difference with an arbitary,
quite possibly nonexistent, platform.

The intersection between "theoretical" and "uiqte possibly
nonexistent" doesn't necessarily lie on the real plane.

To qualify as a "real benefit" you have to do a great deal more than
show a difference that could theoreitcally lead to a benefit, and
then postulate that some platform could provide a benefit from that
difference. Quite the contrary, that is nearly a defining example of
a benefit that is only theoretical, NOT real!

> And, those great debates between such icons as
> Dijkstra, Knuth, Hoare, Cooper, etc had *nothing to do* with any
> platform or specific implementation or compiler etc. They argued
> in terms of math, algorithms, arithmetic, etc.

This is completely irrelevant. His claim was a purely theoretical
one. Your claim was of a "real benefit". Until a benefit can be
measured and demonstrated on a real platform, it's not real -- it's
theoretical.

> That is why I'm trying to tell you that this implementation yada
> is a strawman distraction. For my part it is entirely sufficient
> and appropriate and more accurate to discuss this on the grounds
> of operation count, code statements, etc.

While there are certainly cases where it's entirely appropriate to
deal purely in such abstract terms, discussion of a "real benefit" is
NOT such a case.

The rest of your analysis (using the word extremely loosely) is of no
relevance, even at best, so while it contains some mistakes, I won't
bother pointing them out because they're irrelevant.

--
Later,
Jerry.

Keith H Duggar

unread,

Jul 7, 2009, 3:19:07 PM7/7/09

to

On Jul 6, 1:17 pm, Francis Glassborow

<francis.glassbo...@btinternet.com> wrote:
> Keith H Duggar wrote:
> > Jerry Coffin wrote:
> >> That proves exactly one thing: that you refuse to admit it, no matter
> >> how thoroughly you're proved wrong.
>
> > I hate to break it to you, Jerry, but Knuth already *proved* the
> > *fact* that certain goto structures cannot be transformed to the
> > "structured" form without introducing additional calculations or
> > code duplication. And, those great debates between such icons as
> > Dijkstra, Knuth, Hoare, Cooper, etc had *nothing to do* with any
> > platform or specific implementation or compiler etc. They argued
> > in terms of math, algorithms, arithmetic, etc.
>
> OK, in a sense you have 'proved' your point, well Knuth et al did so.
> However there is an assumption built into the proof, it completely
> ignores the fact that practical computers (as opposed to idealised
> Turing Machines) use multi-level memory systems where performance is
> drastically reduced when it becomes necessary to drop down a level.

Sure. However it is also entirely possible and in fact common in
algorithmic analysis (at least these days) to posit a reasonable
model for cache behavior and analyze the algorithm against that.

> Compact code maybe marginally slower when the expanded form does not
> suffer from a cache miss but will be dramatically faster when it will
> fit into cache and the expanded form won't. Now whether or not the code
> fits in cache depends on two things, the size of the cache and the size
> of the code.

And the amortized cost of the cache fault compared to the faster
inner loop depends on how often the function is called for those
faults and the average value of r etc. Of course, these are real
trade-offs that people do analyze and measure probably everyday.

And, I believe that you and I both know there are real scenarios
in use today (in addition to reasonable modeled scenarios) where
the goto version is optimal. Do you agree? Does this satisfy the
"real benefit" criteria of your challenge? Or do you believe the
the goto version is sub-optimal in every (real) scenario?

> That brings to mind a discussion I was involved with ...(no it
> does not matter so I will not appeal to authority) where we were

Darn. I would like to have known and for my part I wouldn't have
taken it as appeal to authority since it's more anecdotal there.

> considering the efficiency of various sorting and searching algorithms.
> Suppose you have a 100 ints stored in a structure and need to determine
> as fast as possible whether some int value is present. Which container
> would you place them in and does it matter if they are sorted :-)
>
> Where the data cache is large enough a linear search on data stored
> unsorted in a std::vector will likely outperform the alternatives
> because the data is simply loaded into cache and searched requiring an
> average of 50 comparisons (which might even be carried out
> simultaneously if you have enough cores/cpus) A binary search requires
> fewer comparisons but requires that the data be sorted and also has
> extra computational overheads (+ it cannot be done in parallel). Using
> other data structures such as std::set seem attractive in theory but
> fall over because of the lack of locality in the data resulting in
> repeated cache misses.

Agreed, and very interesting. Do you happen to have a write-up
of the analysis available?

KHD

Jerry Coffin

unread,

Jul 8, 2009, 8:43:27 AM7/8/09

to

In article <fa95f4fd-979a-4b1d-bc87-4f434916d588
@j32g2000yqh.googlegroups.com>, dug...@alum.mit.edu says...

[ ... ]

> And, I believe that you and I both know there are real scenarios
> in use today (in addition to reasonable modeled scenarios) where
> the goto version is optimal. Do you agree?

Quite honestly, I rather doubt it.

The problem is that for it to be optimal, you need a CPU with a
rather strange combination of characteristics.

For the extra variable to mean anything, you need a CPU with fewer
than four registers. About the only thing that fits that is the
ancient 6502/6510/65xx series -- but I'm fairly sure there's no such
thing as a reasonably modern C++ compiler (e.g. that includes
templates) for such targets, so that pretty much eliminates your code
from consideration for that target.

For the 'odd ^= 1' to mean anything, you pretty much need a scalar
CPU. Neither the preceding and following instructions have any
dependency on this one, so on any sort of superscalar CPU, this one
will execute for free.

Except in a rather unusual case, the '+odd' will require a single
extra instruction. In the unusual case, you might be able to treat
'odd' as a 'carry in', so you just use an add-with-carry instruction
instead of a plain add instruction, eliminating any extra time or
instruction. The reason this is unusual is that the carry is normally
a flag that's affected by other instructions, so you can't normally
plan on using it for long-term storage. A few CPUs (e.g. the Alpha)
store flags in a user-specified register instead, so you might be
able to (I haven't written any assembly code for an Alpha in quite a
while though, so I don't remember the details well enough to say that
this _would_ work out, only that it seems like it might be possible).

In the other direction, however, your goto-based code also has both
an unconditional jump (to create the loop) AND a conditional jump (in
each branch of the code) for exiting from the loop. My do-loop based
code, on the other hand, has only one conditional branch to form the
loop. Your odd/even jump at the beginning won't be amenable to
accurate branch prediction unless the odd/even radius of the circles
you draw is fairly predictable.

Inside the loop, on a scalar CPU, we get two extra logic instructions
versus one extra branch instructions -- which will usually end up
about even, but if there's a difference it will usually be to favor
the logic, not the jump.

On a superscalar CPU, the execution of at least one and possibly both
logic instructions will be "hidden". At very best, the goto version
might come out about even here, but it'll usually be somewhat behind.

Outside the loop, the goto version has an unpredictable jump at the
beginning versus fall-through code in the do-while versions. With a
really short pipeline, that'll favor the fall-through code by
something like 2:1, with a longer pipeline and branch prediction, it
could favor the fall-through code by up to 15:1 or so.

> Does this satisfy the
> "real benefit" criteria of your challenge? Or do you believe the
> the goto version is sub-optimal in every (real) scenario?

I have a hard time coming up with a scenario where it's _likely_ to
be optimal. For it to gain any real advantage, the CPU seems to need
at least two of the following three characteristics:
1) Has a branch predictor
2) Does only scalar execution
3) Is _extremely_ register starved
With a sufficiently bad mismatch between the speed of registers and
memory, being sufficiently register starved _could_ give it an
advantage in the absence of the other two, but even that's would
require rather special circumstances. With such slow memory, the
larger code of the goto version would slow it down a great deal as
well -- so we'd also have to postulate something like a CPU with a
code cache, but no data cache. There are also some DSPs with
completely separate memory and busses for code and data, so if we
postulated the code memory being a lot faster than the data memory,
this could theoretically work.

All of those conditions are about the opposite of what generally
happens in reality though. Most designs add more registers long
BEFORE they add anything like superscalar execution or branch
prediction. Likewise, they usually add caching first. DSPs that have
separate memory for code and data typically devote MORE (often a lot
more) bandwidth to data than code, not the reverse.

One other possibility I considered was that the routine could be used
in a relatively special-purpose fashion. A reasonable CPU _could_
execute the loops themselves at about the same speed, so if we could
make that initial jump predictable, the goto version would stand a
reasonable chance of coming out about even. One way to do that would
be to draw the same size of circle (or at least sizes with the same
low-bit parity) in a row, before we did a different size.

Unfortunately, a bit of testing seems to show that, at least with the
compilers and CPUs I have handy, the goto version loses by around 10%
even in a scenario designed specifically to help it be as competitive
as possible. OTOH, that's close enough that I can pretty easily
believe combined with a different CPU, it could tie or come so close
nobody would care about the difference.

--
Later,
Jerry.

Dietmar Kuhl

unread,

Jul 9, 2009, 1:41:40 AM7/9/09

to

We're closing down this part of the conversation because it has become

too personal. We apologize for having let it get this far.

- The Moderators

--

Dietmar Kuhl

unread,

Jul 9, 2009, 1:41:48 AM7/9/09

to

We're closing down this part of the conversation because it has become

too personal. We apologize for having let it get this far.

- The Moderators

--

Keith H Duggar

unread,

Jul 14, 2009, 10:18:33 AM7/14/09

to

Jerry Coffin wrote:

> dug...@alum.mit.edu wrote:
> > And, I believe that you and I both know there are real scenarios
> > in use today (in addition to reasonable modeled scenarios) where
> > the goto version is optimal. Do you agree?
>
> Quite honestly, I rather doubt it.

[snip architecture assumptions and speculations]

If one insists on sticking to concrete architectures rather than
abstract C++, what is far more useful is measurement, instead of
incomplete speculation. For example profiling this code

http://www.duggar.org/pub/code/circle/circle.tgz

on two platforms gives these results

================================================================
relative time per call +/- 2% - higher is slower
radius range [08,32)
radius calls 2^23 x 8 runs
----------------------------------------------------------------
Intel Core Duo, g++ 4.0.1, shark profiler
................................................................
circle5 100% : Duggar modified Kuzmin removing uc jump
circle0 110% : Kuzmin

circle4 120% : Glassborrow triple duplicate forever

circle1 140% : Duggar while
circle6 145% : Duggar while + goto to remove f call

circle2 140% : Steinbach/Glassborrow forever
circle3 140% : Coffin do/while (broken for r==0)
----------------------------------------------------------------
PPC G4, g++ 3.3, gprof profiler
................................................................
circle5 100% : Duggar modified Kuzmin removing uc jump
circle0 105% : Kuzmin

circle4 100% : Glassborrow triple duplicate forever

circle1 115% : Duggar while
circle6 115% : Duggar while + goto to remove f call

circle2 130% : Steinbach/Glassborrow forever
circle3 130% : Coffin do/while (broken)
================================================================

Kuzmin goto is about 25% to 30% faster across the board than the
compact loops given in this thread and 10% - 30% faster than the
fastest compact loops I know of. Glassborrow triple duplicate is
the only goto-free solution to slip past Kuzmin and tie a faster
modified Kuzmin but only on PPC.

If anyone is interested, and decides to profile the code, please
share your results with us. Also, if you find problems with that
test code or have ideas for improving it for profiling purposes,
please let me know.

> Unfortunately, a bit of testing seems to show that, at least with the
> compilers and CPUs I have handy, the goto version loses by around 10%
> even in a scenario designed specifically to help it be as competitive
> as possible. OTOH, that's close enough that I can pretty easily
> believe combined with a different CPU, it could tie or come so close
> nobody would care about the difference.

Fortunately the profiling I have done periodically over the last
five years on multiple platforms using various profiling methods
(gprof, Quantify, shark, etc) consistently contradicts the claim
above.

Above I've given some examples collected today on two platforms.
I have also sampled these and similar codes on other systems for
many hours and for a variety of parameters. The Kuzmin goto code
consistently outperformed structured versions by upwards of 15%.

Can you provide the code you used in your "bit of testing"? Also
can you reconcile your speculations with these results? I've put
the assembler code .s files generated by g++ for both the tested
platforms in the same directory to help refine your assumptions.

http://www.duggar.org/pub/code/circle/circle.ppc.s
http://www.duggar.org/pub/code/circle/circle.itc.s

KHD

--

Jerry Coffin

unread,

Jul 14, 2009, 6:26:36 PM7/14/09

to

In article <6a10abb9-928d-4a7e-a7cd-
e422b8...@f30g2000vbf.googlegroups.com>, dug...@alum.mit.edu
says...

[ ... ]

> Can you provide the code you used in your "bit of testing"? Also
> can you reconcile your speculations with these results? I've put
> the assembler code .s files generated by g++ for both the tested
> platforms in the same directory to help refine your assumptions.

Sure:

#include <iostream>
#include <vector>
#include <time.h>

struct noshow {
mutable unsigned int x_, y_;
noshow() : x_(0), y_(0) {}
void operator()(int x, int y) const {
x_ += x;
y_ += y;
}
friend std::ostream &operator<<(std::ostream &os, noshow const
&n) {
return os << n.x_ << " " << n.y_ <<"\n";
}
};

template < class F >
inline void circle_octant ( int r, F &f )

{
int x = r;
int y = 0;
int e = -r / 2 ;

if ( r & 1 ) { --e; goto odd ; }

even :
f(x,y) ;
if ( y >= x ) return ;
e += y++ ;
if ( e >= 0 ) e -= --x ;
odd :
f(x,y) ;
if ( y >= x ) return ;
e += ++y ;
if ( e >= 0 ) e -= --x ;
goto even ;
}

template <class F>
inline void circle_octant2(int r, F &f) {

int x = r;
int y = 0;
int odd = r&1;
int e = (-r/2)-odd;
do {
f(x,y);
e += y++ + odd;
if (e >= 0)
e -= --x;
odd ^= 1;
} while (y < x);
f(x,y);
}

int main() {
int const max_radius = 1024;
int const min_radius = 8;
int const limit = 1000;

noshow noshow1, noshow2;
clock_t time1=0, time2=0;

for (int radius=min_radius; radius<max_radius; ++radius) {
clock_t start1 = clock();
for (int i=0; i<limit; ++i)
circle_octant(radius, noshow1);
clock_t stop1 = clock();
time1 += stop1 - start1;

clock_t start2 = clock();
for (int i=0; i<limit; ++i)
circle_octant2(radius, noshow2);
clock_t stop2 = clock();
time2 += stop2-start2;
}

std::cout << noshow1 << noshow2;
std::cout << "Time1: " << time1 << "\n";
std::cout << "Time2: " << time2 << "\n";

return 0;
}

Here are the results with MS VC++ 9, AMD 5200+:

2600493312 3726143080
2600493312 3726143080
Time1: 1365
Time2: 1135
2600493312 3726143080
2600493312 3726143080
Time1: 1261
Time2: 1207
2600493312 3726143080
2600493312 3726143080
Time1: 1285
Time2: 1183

Here are the results with Comeau, AMD 5200+:

D:\c\source>aout
2600493312 3726143080
2600493312 3726143080
Time1: 1374
Time2: 969

D:\c\source>aout
2600493312 3726143080
2600493312 3726143080
Time1: 1200
Time2: 1112

D:\c\source>aout
2600493312 3726143080
2600493312 3726143080
Time1: 1283
Time2: 1029

Here are the results with gcc, AMD 5200+:

Administrator@fc2 /cygdrive/d/c/source
$ ./circle1
2600493312 3726143080
2600493312 3726143080
Time1: 1283
Time2: 1170

Administrator@fc2 /cygdrive/d/c/source
$ ./circle1
2600493312 3726143080
2600493312 3726143080
Time1: 1370
Time2: 1083

Administrator@fc2 /cygdrive/d/c/source
$ ./circle1
2600493312 3726143080
2600493312 3726143080
Time1: 1315
Time2: 1138

I can't easily cut 'n paste the results from the other computers, but
I also tested on my a couple of Intel-based machines (my kid's Dell,
my Wife's iMac), my Wife's PowerPC G4-based ibook, and (just for
grins), on my wife's Treo and my old iPaq. The latter two required
different UI code (they don't support a command line at all), but the
test code itself remained identical.

As for reconciling your results, given the real facts of the
situation, I see no reason to even attempt to do so. What you tested
was NOT what either of us actually posted!

Since I prefer to have some built-in timing code, so you don't need
profilers and such to get results, I reworked things a bit more to
consolidate the timing code. In the process, I did change the code
from template functions to template functor classes. This seems to
have no effect on speed compared to the code above that uses them
exactly as posted, but makes it a bit simpler to deal with multiple
versions of the drawing code. Since it was easy, I added another of
your attempts along with Francis Glassborrow's code. Along with that,
since you seem to consider it important, I added a bit of code to
mine to deal with r==0:

#include <iostream>
#include <vector>
#include <time.h>

struct noshow {
mutable unsigned int x_, y_;
noshow() : x_(0), y_(0) {}
void operator()(int x, int y) const {
x_ += x;
y_ += y;
}
friend std::ostream &operator<<(std::ostream &os, noshow const
&n) {
return os << n.x_ << " " << n.y_ <<"\n";
}
};

template < class F >
struct circle0 {
void operator()(int r, F &f )

{
int x = r;
int y = 0;
int e = -r / 2 ;

if ( r & 1 ) { --e; goto odd ; }

even :
f(x,y) ;
if ( y >= x ) return ;
e += y++ ;
if ( e >= 0 ) e -= --x ;
odd :
f(x,y) ;
if ( y >= x ) return ;
e += ++y ;
if ( e >= 0 ) e -= --x ;
goto even ;
}

};

template <class F>
struct circle1 {
void operator()(int r, F &f) {
if (r==0)
return;

int x = r;
int y = 0;
int odd = r&1;
int e = (-r/2)-odd;
do {
f(x,y);
e += y++ + odd;
if (e >= 0)
e -= --x;
odd ^= 1;
} while (y < x);
f(x,y);
}

};

template <class F>
struct circle2 {
void operator()(int r, F &f)

{
int x = r ;
int y = 0 ;

bool o = r & 1 ;
int e = -r / 2 - o ;

for ( ; ; ) {

f(x,y) ;
if ( y >= x ) return ;

e += y + o ;
++y ;
o = !o ;

if ( e >= 0 ) e -= --x ;
}
}

};

template <class F>
struct circle3 {

void operator()(int r, F &f) {

int x = r ;
int y = 0 ;

int e = -r / 2 ;

if(r & 1) {
--e ;
f(x,y) ;

e+= ++y ;

if ( e >= 0 ) e -= --x ; // required for r == 1
}
while (true) {

f(x,y) ;
if ( y >= x ) return ;
e += y++;
if ( e >= 0 ) e -= --x ;

f(x,y) ;
if ( y >= x ) return ;
e += ++y;
if ( e >= 0 ) e -= --x ;
}
}

};

template <template <class D> class octant, class F>
clock_t timer(octant<F> &o, int radius, int reps, F &f) {
clock_t start = clock();
for (int i=0; i<reps; i++)
o(radius, f);
clock_t stop = clock();
return stop - start;
}

int main() {
int const max_radius = 1024;
int const min_radius = 8;
int const limit = 1000;

noshow noshow1, noshow2, noshow3, noshow4;

clock_t time1=0, time2=0, time3=0, time4 = 0;

for (int radius=min_radius; radius<max_radius; ++radius) {
time1 += timer(circle0<noshow>(), radius, limit, noshow1);
time2 += timer(circle1<noshow>(), radius, limit, noshow2);
time3 += timer(circle2<noshow>(), radius, limit, noshow3);
time4 += timer(circle3<noshow>(), radius, limit, noshow4);
}

std::cout << noshow1 << noshow2 << noshow3 << noshow4;
std::cout << "\nKeith's original:\t" << time1;
std::cout << "\nJerry's do/while:\t" << time2;
std::cout << "\nKeith's \"improved\":\t" << time3;
std::cout << "\nFrancis's while loop:\t" << time4;
return 0;
}

Strangely, adding the test for r==0 to my code actually makes it run
a bit faster (even though I'm never giving it a radius of 0, so that
branch is never taken -- apparently that test lets the oompiler
simplify some other part of the code, though I haven't looked at the
assembly output to figure out what). In terms of relative speed, this
doesn't make a whole lot of difference though:

c:\c\source>circle1test
2600493312 3726143080
2600493312 3726143080
2600493312 3726143080
2600493312 3726143080

Keith's original: 1120
Jerry's do/while: 961
Keith's "improved": 1159
Francis's while loop: 1196
c:\c\source>circle1test
2600493312 3726143080
2600493312 3726143080
2600493312 3726143080
2600493312 3726143080

Keith's original: 1084
Jerry's do/while: 965
Keith's "improved": 1158
Francis's while loop: 1194
c:\c\source>circle1test
2600493312 3726143080
2600493312 3726143080
2600493312 3726143080
2600493312 3726143080

Keith's original: 1084
Jerry's do/while: 950
Keith's "improved": 1150
Francis's while loop: 1170

Changing the compiler options can change some of the rankings. For
example, if we don't use the SSE2 instruction set, but do use link
time code generation, Francis's code moves up to second place:

c:\c\source>circle1test
2600493312 3726143080
2600493312 3726143080
2600493312 3726143080
2600493312 3726143080

Keith's original: 1161
Jerry's do/while: 938
Keith's "improved": 1198
Francis's while loop: 1007
c:\c\source>circle1test
2600493312 3726143080
2600493312 3726143080
2600493312 3726143080
2600493312 3726143080

Keith's original: 1160
Jerry's do/while: 943
Keith's "improved": 1188
Francis's while loop: 1018
c:\c\source>circle1test
2600493312 3726143080
2600493312 3726143080
2600493312 3726143080
2600493312 3726143080

Keith's original: 1187
Jerry's do/while: 948
Keith's "improved": 1165
Francis's while loop: 1051

I won't bore you more with pages of results from other compilers and
platforms, but the summary is pretty simple: my code is consistently
the fastest. Francis's code is the least consistent, varying all the
way from last place to a close second place.

If we only look at the best set of options for each piece of code, on
each platform, the results are more consistent: my code is the
fastest, Francis's is a close second place, and your two attempts are
nearly tied for last place, though the original is probably slightly
faster than your modified version (the modified version sometimes
wins, but the original wins quite a bit more often).

--
Later,
Jerry.

James Hopkin

unread,

Jul 15, 2009, 8:42:55 AM7/15/09

to

On Jul 14, 11:26 pm, Jerry Coffin <jerryvcof...@yahoo.com> wrote:
>
> As for reconciling your results, given the real facts of the
> situation, I see no reason to even attempt to do so. What you tested
> was NOT what either of us actually posted!
>

I'm probably missing something obvious, but could you explain what the
difference is between the your tests and Keith's tests? I can only see
the use of a profiler vs. embedded timing, yet each set of results is
calling in favour of who wrote them!

James

--

Jerry Coffin

unread,

Jul 15, 2009, 4:53:41 PM7/15/09

to

In article <466d0172-4011-4a70-9b77-96de0660fc62
@n11g2000yqb.googlegroups.com>, tasj...@gmail.com says...

>
> On Jul 14, 11:26 pm, Jerry Coffin <jerryvcof...@yahoo.com> wrote:
> >
> > As for reconciling your results, given the real facts of the
> > situation, I see no reason to even attempt to do so. What you tested
> > was NOT what either of us actually posted!
> >
>
> I'm probably missing something obvious, but could you explain what the
> difference is between the your tests and Keith's tests? I can only see
> the use of a profiler vs. embedded timing, yet each set of results is
> calling in favour of who wrote them!

I can't give a conclusive answer -- partly because he hasn't given
enough information that I could duplicate his testing. Partly also
because even if I could duplicate his testing, I'm not entirely sure
I'd trust the results anyway. Profilers are great for finding where
your code is spending most of its time. They're not quite so good for
measuring fairly small differences in tiny pieces of code like we're
dealing with here.

Among other things, profilers have something a bit like a Heisenberg
uncertainty principle, in that the act of observation always has some
effect on the results, but we never know exactly what that effect
might be. In typical cases it's small enough to give useful results
-- but this is nearly the worst possible kind of case. Specifically,
we're dealing with a truly minuscule routine (around 10 machine
instructions for the inner loop). The profiler adds a roughly fixed
amount of "noise" to the signal, but in this case the signal is so
small that the noise may well be larger than the signal we're trying
to measure.

As noted above, he also modified the code before testing it.
Specifying the "draw pixel" function in a template parameter often
improves the chances that the function will be produced inline
instead of producing an actual function call. For some completely
unknown reason, he chose to modify all the routines in question to
instead call a global function directly -- substantially increasing
the chance that we'll end up with a function call inside the inner
loop (slowing all of the routines down, but not necessarily all
equally).

Those are a few of the obvious possibilities. There are less obvious
ones, including some that stem from those, especially the lack of
information about compiler and options used. Even a seemingly minor
difference in something like code alignment could account for quite a
bit when we're dealing with such a minuscule loop that has to be
called extremely often to take long enough to time reasonably well.

--
Later,
Jerry.

Keith H Duggar

unread,

Jul 16, 2009, 3:48:09 AM7/16/09

to

Jerry Coffin <jerryvcof...@yahoo.com> wrote:
> Keith H Duggar wrote:

> > Can you provide the code you used in your "bit of testing"? Also
> > can you reconcile your speculations with these results? I've put
> > the assembler code .s files generated by g++ for both the tested
> > platforms in the same directory to help refine your assumptions.
>
> Sure:

Thanks for the effort. Unfortunately, I'm sorry to say that your
results are fundamentally flawed and useless, for the purpose of
comparing these functions at least. The critical problem is that
you did not disable inlining when you compiled the code.

When profiling toy programs, such as this, where the entire test
code (including multiple versions of very similar functions with
simple driver loops and stress functions) resides in one and the
same translation unit, inlining triggers destructive (for timing
purposes) optimizations that convolve the code beyond repair and
dash any hopes we have of measuring the algorithms of interest.

One can see the problems by looking at the assembly code. In the
particular case of your code some examples are 1) since the time
function loops limit times for a *constant* radius the optimizer
can lift code that only depends on radius (such as the if (r==0)
check you added to your do/while) outside of the limit test loop
2) since the stress function, f, is so simple specifically a few
adds, the optimizer can remove the adds for the f(x,y) call when
either x or y is zero (for example if one tries to fix your loop
by adding an f(0,0) call to the if (r==0) it is simply optimized
away) 3) the optimizer's heuristics can cause different inlining
decisions for equivalent code in different functions 4) code can
be shared across functions causing jumps across hundreds or more
bytes of code memory 5) etc.

Finally, there is another problem with your timings. You iterate
time about a thousand times and yet the total clock_t is roughly
only a 1000 or so. This means that the CLOCKS_PER_SEC on systems
you tested is too small (perhaps 100). Thus the entire time loop
runs in single clock tick or less on average. This can introduce
a systematic bias in the timings that depends on code structure.
This is another reason to use profilers as they address this.

I added the full set of functions and changed the numbering (and
descriptions) to match my earlier post. I also fixed some issues
with using clock() on systems with low resolution ticks and also
added sample normalization code to make sure large radius values
do not dominate the time sums. The clock() time results averaged
over eight runs (without inlining!) gives

================================================================
relative times - lower is faster
radius range [08,32) and [08,1024)
radius calls 1000 x 8 runs
----------------------------------------------------------------
Intel Core Duo, g++ 4.0.1, clock() timing
................................................................
circle0 100% 100% : Kuzmin
circle5 100% 98% : Duggar modified Kuzmin removing uc jump

circle4 102% 100% : Glassborow triple duplicate forever

circle1 119% 108% : Duggar while
circle6 109% 108% : Duggar while + goto to remove f call

circle2 109% 107% : Steinbach/Glassborow forever
circle3 153% 122% : Coffin do/while (broken for r==0)
----------------------------------------------------------------
PPC G4, g++ 3.3, clock() timing (may be inaccurate; slow clock)
................................................................
circle0 100% 100% : Kuzmin
circle5 102% 101% : Duggar modified Kuzmin removing uc jump

circle4 100% 106% : Glassborow triple duplicate forever

circle1 108% 105% : Duggar while
circle6 105% 105% : Duggar while + goto to remove f call

circle2 105% 113% : Steinbach/Glassborow forever
circle3 106% 102% : Coffin do/while (broken for r==0)
================================================================

Giving the nearly same ordering as my profiler measurements. The
narrowing of the gaps results from the classic problems of using
coarse clock() calls. For one it cannot distinguish between self
and children time so the times now include timer loop iteration,
f body time, etc.

> As for reconciling your results, given the real facts of the
> situation, I see no reason to even attempt to do so. What you
> tested was NOT what either of us actually posted!

What ever are you talking about?? Here is the code you posted:

template <class F>
inline void circle_octant(int r, F &f) {

int x = r;
int y = 0;
int odd = r&1;
int e = (-r/2)-odd;
do {
f(x,y);
e += y++ + odd;
if (e >= 0)
e -= --x;
odd ^= 1;
} while (y < x);
f(x,y);
}

and here is the code take from the file I posted

void circle3 ( int r )

{
int x = r;
int y = 0;

int odd = r&1;
int e = (-r/2)-odd;
do {
f(x,y);
e += y++ + odd;
if (e >= 0)
e -= --x;
odd ^= 1;
} while (y < x);
f(x,y);
}

and that is identical except for the function header! As pointed
out before inlining must be disabled to obtain accurate results.
And, surely you realize the extra f argument, and especially the
missing template keyword are trivial differences in this context
here of comparing the function body performance? Although before

you were hung up on the template vs non-template when you wrote:

> ancient 6502/6510/65xx series -- but I'm fairly sure there's no such
> thing as a reasonably modern C++ compiler (e.g. that includes
> templates) for such targets, so that pretty much eliminates your code
> from consideration for that target.

which of course was just a trivial nit-pick so maybe that's it?

As for Francis' duplicate version, it is also identical to what
he posted except for necessary bug fixes! And his first compact
while version is only cosmetically different from the Steinbach
version posted two years ago (ex while(true) vs for(;;)). Since
I already had Steinbach's version I just left as is.

So this:

> What you tested was NOT what either of us actually posted!

has no basis that I can see.

> Since I prefer to have some built-in timing code, so you don't need
> profilers and such to get results, I reworked things a bit more to

That's fine. We can use either. Doesn't Visual C++ have profiler
tools that might be decent? However, as mentioned earlier, there
are a number issues to consider when using clock() and your code
did not address them.

As to your new code, it doesn't compile because it tries to bind
non-const references to temporaries in the lines such as

> time1 += timer(circle0<noshow>(), radius, limit, noshow1);

I fixed this by making the operator() const and changing time to
take them by const &. I also changed the function names to match
my initial file (to avoid future confusion such as the confusion
you have about which version is who's), changed the descriptions
because they are wrong, and added the variants that are actually
mine (circle1, 5, and 6). This file is located at

http://www.duggar.org/pub/code/circle/coffin.tgz

> Keith's original: 1120
> Jerry's do/while: 961
> Keith's "improved": 1159
> Francis's while loop: 1196

You have confused the owners. Your time3/circle2 is NOT "Keith's
improved" anything. It is the Steinbach/Glassborow forever loop,
despite cosmetic differences between the two. timer4/circle3 is
Francis' *second* triple duplicate forever loop. Finally, time1
circle0 is Kuzmin's original, not mine. Please be more careful.

> Strangely, adding the test for r==0 to my code actually makes it run
> a bit faster (even though I'm never giving it a radius of 0, so that

That should have been a red flag that something was amiss. See
the comments about inlining at the begging of this post.

> branch is never taken -- apparently that test lets the oompiler
> simplify some other part of the code, though I haven't looked at the
> assembly output to figure out what). In terms of relative speed, this
> doesn't make a whole lot of difference though:

It makes a big difference on Intel. On the PPC the compiler gets
the static branch prediction right reducing the cost to zero for
the radius > 0 cases.

> I won't bore you more with pages of results from other compilers and
> platforms, but the summary is pretty simple: my code is consistently
> the fastest. Francis's code is the least consistent, varying all the
> way from last place to a close second place.
>
> If we only look at the best set of options for each piece of code, on
> each platform, the results are more consistent: my code is the
> fastest, Francis's is a close second place, and your two attempts are
> nearly tied for last place, though the original is probably slightly

> faster than your modified...

Why do you keep referring to Kuzmin's algorithm as my attempt?
You have included none of my attempts (circle1, 5, 6 in the web
file) in your test code. Perhaps if we keep the numbering fixed
to match my file we will have less confusion.

Anyhow, the results were bogus because of inlining convolutions.
Once that is turned off the timings with your code (after fixes)
are consistent with the profiling results posted previously. The
Kuzmin and modified Kuzmin (1,5) outperform all others excepting
Glassborow duplicate (4) in certain scenarios.

The new Coffin do/while is really falling behind here. Probably
because of the new extra branch (that wasn't actually tested in
your trials because the inlining allowed the optimizers to lift
the check out of the time loop into the outer radius loops) and
it still produces the wrong f call set. As said in other posts,
it should call f(r,0) when r == 0. If does fine on PPC; but who
knows how it will do once it actually gives the correct f calls.

KHD

--

Keith H Duggar

unread,

Jul 16, 2009, 3:49:44 AM7/16/09

to

Jerry Coffin wrote:

> James Hopkin wrote:
> > On Jul 14, 11:26 pm, Jerry Coffin <jerryvcof...@yahoo.com> wrote:
>
> > > As for reconciling your results, given the real facts of the
> > > situation, I see no reason to even attempt to do so. What you tested
> > > was NOT what either of us actually posted!
>
> > I'm probably missing something obvious, but could you explain what the
> > difference is between the your tests and Keith's tests? I can only see
> > the use of a profiler vs. embedded timing, yet each set of results is
> > calling in favour of who wrote them!
>
> I can't give a conclusive answer -- partly because he hasn't given
> enough information that I could duplicate his testing. Partly also

Um, I gave you the code, the platforms, the Makefile, the params
used for testing and even some shell scripts that I used.

> because even if I could duplicate his testing, I'm not entirely sure
> I'd trust the results anyway. Profilers are great for finding where
> your code is spending most of its time. They're not quite so good for
> measuring fairly small differences in tiny pieces of code like we're
> dealing with here.

Using clock() the way you used it on the systems you measured is
much worse. See my comments regarding this in my other response.

> Among other things, profilers have something a bit like a Heisenberg
> uncertainty principle, in that the act of observation always has some
> effect on the results, but we never know exactly what that effect
> might be.

As if inserting clock() calls does not have this same problem?

> In typical cases it's small enough to give useful results
> -- but this is nearly the worst possible kind of case. Specifically,
> we're dealing with a truly minuscule routine (around 10 machine
> instructions for the inner loop). The profiler adds a roughly fixed
> amount of "noise" to the signal, but in this case the signal is so
> small that the noise may well be larger than the signal we're trying
> to measure.

Using clock() as coarsely as your code does adds the looping and
children times into the mix drowning out part of the signal.

> As noted above, he also modified the code before testing it.
> Specifying the "draw pixel" function in a template parameter often
> improves the chances that the function will be produced inline

Why do you believe a template parameter has any better chance of
being inlined?

> instead of producing an actual function call. For some completely
> unknown reason, he chose to modify all the routines in question to
> instead call a global function directly -- substantially increasing
> the chance that we'll end up with a function call inside the inner
> loop (slowing all of the routines down, but not necessarily all
> equally).

Given that inlining was disabled, the chance of call inside inner
loops (in fact everywhere f(x,y) appears) is 100%. Thus, template
makes no difference. The global function was chosen to reduce the
overhead as much as possible (short of invasive changes) in order
to make the code conveniently faster for testing. (Since I used a
profiler, the execution time of the f(x,y) function itself is not
relevant since the profiler can break out children times. This is
not the case for the Jerry's clock() based which convolves timing
of the circle functions with both their children and calling loop
executions times.)

We must realize that inlining is our enemy not friend when trying
to profile nearly identical algorithms in a combined toy program.
Inlining together with the optimizations it allows convolutes the
code beyond recovery (even for every profiler I've tried).

It's fine to inline when you are trying to optimize a piece of an
actual program since that is the actual context in which the code
will ultimately run. But not for these toy scenarios.

By the way, the purpose of the template in the original post was
just for clarity of presentation. I noticed for example that you
changed it from my original F f, to F & f. Ultimately, these are
cosmetic differences that don't matter for our purpose.

> Those are a few of the obvious possibilities. There are less obvious
> ones, including some that stem from those, especially the lack of
> information about compiler and options used. Even a seemingly minor

The Makefile was included in the download. I guess you didn't see
it? The Makefile also indicates that inlining should be disabled.

> difference in something like code alignment could account for quite
> a bit when we're dealing with such a minuscule loop that has to be
> called extremely often to take long enough to time reasonably well.

Well the clock() resolution on the systems you tested was far too
low for the 1000 sample limit you hardcoded. Which reminds me, in
my other post the PPC samples are 4000 x 8 not 1000 x 8 since the
clock there was also low resolution requiring more samples.

KHD

--

Jerry Coffin

unread,

Jul 17, 2009, 10:52:20 AM7/17/09

to

In article <86058662-2dfd-4be1-9052-093f28fd5083
@t13g2000yqt.googlegroups.com>, dug...@alum.mit.edu says...

[ ... ]

> Thanks for the effort. Unfortunately, I'm sorry to say that your
> results are fundamentally flawed and useless, for the purpose of
> comparing these functions at least. The critical problem is that
> you did not disable inlining when you compiled the code.

You seem to have things exactly backwards. Disabling inlining renders
your effort entirely useless.

The _first_ and most crucial step in any sort of benchmark is for
that benchmark to imitate real usage as closely as possible.
Disabling inlining violates that, and renders the entirety of the
benchmark meaningless.

Doing a "better" job of timing code that's irrelevant doesn't change
the fact that what you're timing is irrelevant.

--
Later,
Jerry.

Keith H Duggar

unread,

Jul 17, 2009, 11:03:18 AM7/17/09

to

On Jul 16, 3:48 am, Keith H Duggar <dug...@alum.mit.edu> wrote:
> Jerry Coffin <jerryvcof...@yahoo.com> wrote:
> > Keith H Duggar wrote:
> > > Can you provide the code you used in your "bit of testing"? Also
> > > can you reconcile your speculations with these results? I've put
> > > the assembler code .s files generated by g++ for both the tested
> > > platforms in the same directory to help refine your assumptions.
>
> > Sure:
>
> Thanks for the effort. Unfortunately, I'm sorry to say that your
> results are fundamentally flawed and useless, for the purpose of
> comparing these functions at least. The critical problem is that
> you did not disable inlining when you compiled the code.

Ok. So I have little doubt that there are possibly many skeptics
out there thinking that my pervious points that inline confounds
profiling (unless one is very careful) are just me blowing smoke
because I "refuse to admit [I'm wrong], no matter how thoroughly
[I'm] proved wrong" as Jerry said.

With this post I hope to demonstrate empirically that I am right
with respect to inlining by providing two codes designed to give
accurate results with inlining and by showing additional results
with and without inlining using these new programs that are both
consistent with each other and consistent with my previous tests.

The source codes, Makefiles, generated assembly, reports, report
generating scripts, driver scripts, and this post are available
in these two files

http://www.duggar.org/pub/code/circle/circus.tgz
http://www.duggar.org/pub/code/circle/cirper.tgz

Below are the results. The clock on the PPC I have is only 100Hz
so sampling was increased to 5000 to get passable (but not great)
reliability, and radius reduced to 256 (to make the testing time
bearable).

================================================================
Legend
----------------------------------------------------------------
alg : the circle octant algorithm

c0 : Kuzmin goto
c1 : Duggar while
c2 : Steinbach/Glassborow forever
c3 : Coffin do/while (fixed for r==0)
c4 : Glassborow triple duplicate forever
c5 : Duggar modified Kuzmin goto to remove uc jump
c6 : Duggar while + goto

circus.cpp :
a much stripped down version of a real usage of the algorithm;
in the real program it is used to accumulate along the circular
path of a game board. Note the change from Jerry's "nowshow" to
this is because noshow was a little too simple so it was being
optimized away under some conditions for example f(0,0). This
new function is almost as simple yet is not optimized away and
it's stripped from a actual and fun usage of the algorithm. For
similar reasons, a random salt was added to the radius in both
circus and cirper to prevent the optimizer from taking advantage
of the constant radius during the inner timing loop.

cirper.cpp :
the most minimal version of the algorithm; it simply gives the
digital perimeter of the octant which is the final y + 1 (that
is for a Moore neighborhood).

no-inline : compiler inlining disabled

inlining : compiler inlining enabled and verified.

splt :
each circle algorithm was compiled separately into a different
executable for each.

comb :
all circle algorithms were combined into a single monolithic
executable like the original Coffin test code.

================================================================

============================================

Intel Core Duo, g++ 4.0.1, clock() timing

radius range [08,1024)

radius calls 1000 x 8 runs
--------------------------------------------

circus.cpp cirper.cpp
------------------- -------------------
no-inline inlining no-inline inlining
--------- --------- --------- ---------
alg splt comb splt comb splt comb splt comb
--------------------------------------------
absolute clocks/1000 +/- 2%, lower is faster
--------------------------------------------
c0 806 776 777 814 1214 1215 1184 1070
c1 1205 1175 851 900 1252 1249 1248 1258
c2 1036 1023 877 1035 1621 1620 1602 1647
c3 1698 1697 1196 987 1261 1261 1262 1264
c4 720 708 1389 1374 1107 1105 1128 1127
c5 741 738 931 609 1066 1068 1140 1307
c6 1201 1172 863 883 1252 1249 1248 1251
--------------------------------------------
relative clocks(%c0) +/- 2%, lower is faster
--------------------------------------------
c0 100 100 100 100 100 100 100 100
c1 149 151 109 111 103 103 105 118
c2 128 132 113 127 133 133 135 154
c3 211 219 154 121 104 104 107 118
c4 89 91 179 169 91 91 95 105
c5 92 95 120 75 88 88 96 122
c6 149 151 111 108 103 103 105 117
============================================

============================================

PPC G4, g++ 3.3, clock() timing

radius range [08,256)
radius calls 5000 x 8 runs
--------------------------------------------
circus.cpp cirper.cpp
------------------- -------------------
no-inline inlining no-inline inlining
--------- --------- --------- ---------
alg splt comb splt comb splt comb splt comb
--------------------------------------------
absolute clocks/1000 +/- 8%, lower is faster
--------------------------------------------
c0 41 41 26 29 90 90 91 97
c1 48 48 30 29 119 114 116 114
c2 52 52 32 32 104 112 100 101
c3 51 52 26 27 121 111 111 118
c4 40 40 10 10 94 99 99 94
c5 39 40 36 41 96 95 95 93
c6 47 48 43 30 118 114 130 128
--------------------------------------------
relative clocks(%c0) +/- 8%, lower is faster
--------------------------------------------
c0 100 100 100 100 100 100 100 100
c1 118 117 114 102 132 126 127 118
c2 128 126 124 114 116 124 110 104
c3 126 127 101 94 135 123 123 122
c4 99 99 40 35 105 109 109 97
c5 97 99 138 143 106 106 105 96
c6 115 118 168 103 132 127 143 132
============================================

We can see several important points.

1) With no inlining the results are consistent across both split
executables and the combined monolithic executable demonstrating
my point that without inlining we do not have to worry nearly as
much about the context the code is running in.

2) With inlining, the split and combined executable produce very
different results. For example, see the large ( >= 20% ) changes
on Intel circus c3 and c5 and cirper c2 and c5 and on PPC circus
c6. These large changes are solely from being combined with many
circle functions into one big executable! This demonstrates that
inlining combined with optimizations can contort toy programs in
strange ways to produce convolved timings having no relationship
to how the functions will behave in a real context.

3) Comparing no-inline vs inlining we see the absolute clock are
are very close across the board. This indicates that the circleB
correction code for baseline overhead (such as myrandom and salt
computation, circle call overhead, f(x,y) body and call overhead
etc) is doing a good job of recovering circle self-time. This is
a kind of poor man's profiler to help correct for the deficiency
of using coarse sampling of clock. Note, this is just a measured
offset that is fixed for each category. Thus, it does not change
relative order it only increases the spread. Without this offset
the no-inline vs inlining absolute clocks would have been off by
a factor of three for circus.cpp runs reducing profiling signal.

4) Comparing cirper no-inline splt and comb to inlining splt, we
see that inlining has very little effect on the absolute clocks.
This confirms that cirper circle functions are minimal functions
with little to no internal call overhead.

Finally, we again see that only c4 and c5 outperform c0, in some
scenarios. Likewise c1, c2, c6 are all excellent compact choices
depending on the specific scenario. c2 seems to unduly suffer in
the cirper scenario on Intel; but, I have not had time to figure
that out yet. c3 continues to suffer on the Intel when an actual
f(x,y) is present in the new if(r==0) branch; but, it does ok on
PPC, where static branch prediction helps as long as r!=0.

At this point, the only remaining thing to say is that "all your
base are belong to Goto" ;-)

Keith H Duggar

unread,

Jul 17, 2009, 5:26:43 PM7/17/09

to

On Jul 17, 10:52 am, Jerry Coffin <jerryvcof...@yahoo.com> wrote:
> In article <86058662-2dfd-4be1-9052-093f28fd5083
> @t13g2000yqt.googlegroups.com>, dug...@alum.mit.edu says...
>
> [ ... ]
>
> > Thanks for the effort. Unfortunately, I'm sorry to say that your
> > results are fundamentally flawed and useless, for the purpose of
> > comparing these functions at least. The critical problem is that
> > you did not disable inlining when you compiled the code.
>
> You seem to have things exactly backwards. Disabling inlining renders
> your effort entirely useless.

See the recent results I posted that empirically show you are
wrong. Also verify my previous analysis against the assembly.
It also shows clearly and objectively that you are wrong.

> The _first_ and most crucial step in any sort of benchmark is for
> that benchmark to imitate real usage as closely as possible.

If you have a real program then sometimes, yes. You did not have
a real program but rather a big toy monolith of many variants of
the algorithms being tested all compiled together. Not only that
but your stress function was too simple and was at times getting
unrealistically optimized away. Similar things were happening to
the timing loop because of the constant r etc. Just look at the
assembly and/or the data I just posted. You will be able to see
the problems with your inlined toy monolith.

> Disabling inlining violates that, and renders the entirety of the
> benchmark meaningless.

Not when you are careful and design the benchmarks correctly and
examine the assembly to understand what is happening. Also using
a profiler can help avoid some of these problems.

> Doing a "better" job of timing code that's irrelevant doesn't change
> the fact that what you're timing is irrelevant.

You have it backwards there. It was your monolithic toy that
was/is irrelevant. I gave both detailed explanations and now
empirical data to show this. If you still do not accept this
take a look at the assembly yourself; there it is written in
black and white (or whatever colors your editor renders) the
reasons why your toy monolith was broken and irrelevant.

Or do you believe that your monolith is more realistic than
separate compilations of each algorithm? And do you believe
that your noshow with simple addition only (s+=x+y) is more
realistic than my BoardCheck that at least has to look some
random data up (s+=data[x][y]) and cannot be optimized away?
And do you think it's realistic to optimize the timing loop
as having constant r (an arbitrary choice made to make time
sampling easier and more useful)?

KHD

--

Jerry Coffin

unread,

Jul 18, 2009, 10:53:50 PM7/18/09

to

In article <3c3d2187-e415-41e6-a50c-ab8b533accf0
@f10g2000vbf.googlegroups.com>, dug...@alum.mit.edu says...

[ ... ]

> See the recent results I posted that empirically show you are
> wrong. Also verify my previous analysis against the assembly.
> It also shows clearly and objectively that you are wrong.

I already looked at them, and see nothing substantially new or
different from them. They don't support your conclusions.

[ ... ]

> Not when you are careful and design the benchmarks correctly and
> examine the assembly to understand what is happening.

Apparently you didn't do that. Just for one example, I recompiled the
code with the minimum and maximum radii coming from argv instead of
hard-coded. Contrary to your claims about the compiler lifting
(something) out of the loop because the limits were constants, the
code for my routine didn't change at all -- not even one instruction
was affected in any way, shape, form, or fashion.

[ ... ]

> Or do you believe that your monolith is more realistic than
> separate compilations of each algorithm?

I think separate compilation of each algorithm makes no difference
whatsoever. I also tested that, and proved precisely that.

> And do you believe
> that your noshow with simple addition only (s+=x+y) is more
> realistic than my BoardCheck that at least has to look some
> random data up (s+=data[x][y]) and cannot be optimized away?

Looking up random data in such a case makes absolutely no sense at
all. Keep in mind that this is a routine for rasterizing a circle.
The instances of looking up some random data for points that lie on a
circle are sufficiently rare that doing so makes no sense at all. The
primary, and for all practical purposes sole, purpose for rasterizing
a circle is to draw things at those points.

In any case, it makes no difference: the only time it could get
optimized away would be for the r==0 case, in which case it SHOULD be
optimized away. Doing otherwise takes a circle that really has a
diameter of 0 and draws it as if it had a diameter of 1 instead.
While it may be _excusable_ to do such a thing as a cheap and dirty
optimization, it's clearly not the _right_ thing at all.

> And do you think it's realistic to optimize the timing loop
> as having constant r (an arbitrary choice made to make time
> sampling easier and more useful)?

What in the world do you think you're talking about? In my code, r
varied from 8 to 1024. The limits were hard-coded as constants in
that code, but a quick check shows that 1) the test for r==0 was
intact in the generated code, and 2) changing main to receive the
minimum and maximum radii from an external source (argv) did not
affect a single instruction of the code generated for the circle
drawing routine.

At the risk of it sounding like a personal attack, it appears to me
that my original assessment was correct: you've clearly been proved
wrong, and you're grasping at any available straw to avoid admitting
it (and now you've run out of straws to grasp, and started to post
about purely imaginary ones).

--
Later,
Jerry.

Keith H Duggar

unread,

Jul 20, 2009, 5:55:58 AM7/20/09

to

{ Can we turn the heat down a little here, thanks. -mod }

On Jul 18, 10:53 pm, Jerry Coffin <jerryvcof...@yahoo.com> wrote:
> In article <3c3d2187-e415-41e6-a50c-ab8b533accf0
> @f10g2000vbf.googlegroups.com>, dug...@alum.mit.edu says...
>
> [ ... ]
>
> > See the recent results I posted that empirically show you are
> > wrong. Also verify my previous analysis against the assembly.
> > It also shows clearly and objectively that you are wrong.
>
> I already looked at them, and see nothing substantially new or
> different from them. They don't support your conclusions.

Which of the conclusions? I'm asking because it seems to me that
the data clearly supports many, perhaps all, of the claims I've
made in this entire topic.

> [ ... ]
>
> > Not when you are careful and design the benchmarks correctly and
> > examine the assembly to understand what is happening.
>
> Apparently you didn't do that. Just for one example, I recompiled the
> code with the minimum and maximum radii coming from argv instead of
> hard-coded. Contrary to your claims about the compiler lifting
> (something) out of the loop because the limits were constants, the
> code for my routine didn't change at all -- not even one instruction
> was affected in any way, shape, form, or fashion.

Apparently you completely misunderstood what I said. The problem
wasn't that rmin/rmax was constant, it is that r is held constant
inside your timer loop. Add the random salt and then you will see
what it fixes. So try again by looking in the right place.

I think the only comment I made about rmin/rmax atoi etc, is that
hardcoded constants *can* cause problems (and are less convenient
for testing) and that was in some source code comment I think.

> [ ... ]
>
> > Or do you believe that your monolith is more realistic than
> > separate compilations of each algorithm?
>
> I think separate compilation of each algorithm makes no difference
> whatsoever.

You claim you looked at my recent results

http://groups.google.com/group/comp.lang.c++.moderated/msg/3ac2368e485e740d

where I specifically wrote

Keith H Duggar wrote:
> 2) With inlining, the split and combined executable produce very
> different results. For example, see the large ( >= 20% ) changes
> on Intel circus c3 and c5 and cirper c2 and c5 and on PPC circus
> c6. These large changes are solely from being combined with many
> circle functions into one big executable! This demonstrates that
> inlining combined with optimizations can contort toy programs in
> strange ways to produce convolved timings having no relationship
> to how the functions will behave in a real context.

So how can you possibly claim it makes no difference? The numbers
clearly show the monolithic compilation changes the results. What
is your explanation for the inlined separate vs monolith results?
What is your explanation for the non-inlined separate vs monolith
not showing the same large changes?

> I also tested that, and proved precisely that.

How about you (or anyone else with AMD chips pretty please) down-
load and run the circus (and optionally cirper) tests suites that
I provided, and upload the results for others to verify publicly,
just as I did. That will prove far more than such unsubstantiated
anecdotes. Maybe it is difficult to run the test suite in Windows
world? (Ie the convenient scripted compilation, testing, analysis
etc. Maybe someone that has Cygwin installed will give it a try?)

> > And do you believe
> > that your noshow with simple addition only (s+=x+y) is more
> > realistic than my BoardCheck that at least has to look some
> > random data up (s+=data[x][y]) and cannot be optimized away?
>
> Looking up random data in such a case makes absolutely no sense at
> all. Keep in mind that this is a routine for rasterizing a circle.

Keep in mind this is an algorithm for calling an arbitrary function
on a digital circle's digital locus. Rasterization is *one* of the
common applications, but not, by a long shot, the only common one.
You are pulling assumptions out of thin hot air.

> The instances of looking up some random data for points that lie on a
> circle are sufficiently rare that doing so makes no sense at all. The
> primary, and for all practical purposes sole, purpose for rasterizing
> a circle is to draw things at those points.

I'm sorry, but either you have no idea what you are talking about
or you have tunnel vision. And all these wild speculations are to
justify using a trivial stress function? I don't care what stress
function you choose, as long as it resembles realistic scenarios,
or at least isn't optimized away like your noshow was.

> In any case, it makes no difference: the only time it could get
> optimized away would be for the r==0 case, in which case it SHOULD be
> optimized away. Doing otherwise takes a circle that really has a
> diameter of 0 and draws it as if it had a diameter of 1 instead.
> While it may be _excusable_ to do such a thing as a cheap and dirty
> optimization, it's clearly not the _right_ thing at all.

Either you are right and the whole digital geometry community is
wrong, or you are once again pulling "facts" out of the thinnest
possible hot air. Regardless, the requirements for the algorithm
under question are that it calls f(0,0) for r==0 because {(0,0)}
is the digital locus of a digital circle with r==0 as defined in
digital geometry. Your own personal geometry that defines { } as
the locus of circle(0) is irrelevant.

> > And do you think it's realistic to optimize the timing loop
> > as having constant r (an arbitrary choice made to make time
> > sampling easier and more useful)?
>
> What in the world do you think you're talking about? In my code, r
> varied from 8 to 1024. The limits were hard-coded as constants in
> that code, but a quick check shows that 1) the test for r==0 was
> intact in the generated code, and 2) changing main to receive the
> minimum and maximum radii from an external source (argv) did not
> affect a single instruction of the code generated for the circle
> drawing routine.

You *completely* missed and misunderstood the point I made about
constant radius. Here it is again:

Keith H Duggar wrote:
> One can see the problems by looking at the assembly code. In the
> particular case of your code some examples are 1) since the time
> function loops limit times for a *constant* radius the optimizer
> can lift code that only depends on radius (such as the if (r==0)
> check you added to your do/while) outside of the limit test loop

Do you see *anything* in there about the outer rmin/rmax ranges?
The point about command line arguments was a different one that I
made in some code comment I think. The problem of constant radius
in the inner timing loop is dealt with by adding the random salt.

> At the risk of it sounding like a personal attack, it appears to me
> that my original assessment was correct: you've clearly been proved
> wrong, and you're grasping at any available straw to avoid admitting
> it (and now you've run out of straws to grasp, and started to post
> about purely imaginary ones).

First, as far as proof goes, you didn't refute even a single data
point in the numbers I recently published that *prove* monolithic
compilation affected the results. Nor did you even provide *data*
to substantiate your claim. Second you made a spurious claim that
the circle(0) locus should be { } at odds with both the published
circle algorithms and digital geometry. Third, you misunderstood
my argument about constant r and attacked something else instead.

Finally, and most importantly, you didn't run the new test suite,
you didn't provide the data from your separate compilation tests,
you didn't provide code and scripts that would allow us to repeat
those tests (ie separate vs monolithic compilations), you didn't
provide the generated assembly for verification, you didn't offer
alternative explanations for the recent circus + cirper data, and
you didn't rebut my observations of the data.

The most reasonable and useful thing would be for you to run the
new circus and cirper suites provided and upload the results. Or
you could also provide your own test suite that corrects for the
artifacts such as constant r in the timing loop, monolithic, etc
or demonstrates they have no effect.

Barring that, I wish some others would run the test suite and/or
weigh in on the debate because to me your position at this point
seems very weak. I'm looking at the timing data and the assembly
and simply cannot see how one can deny my points. Least of all I
can't see how that objective data proves *me wrong*. So truly, I
could use some help either from new data or fresh reasoning.

KHD

--

SG

unread,

Jul 20, 2009, 7:14:32 PM7/20/09

to

On 6 Jul., 07:50, Keith H Duggar wrote:
>
> In fact, Knuth and Floyd proved Jacopini's conjecture that gotos
> in a program of the form
>
> L1 : if ( C1 ) goto L2 ;
> S1 ;
> if ( C2 ) goto L2 ;
> S2 ;
> goto L1 ;
> L2 : S3 ;
>
> cannot always be removed without additional runtime computation.
> In fact, this circle algorithm is one example of another classic
> goto flow called the "loop-and-a-half" since as Dijkstra pointed
> out it loops "n and and half times":
>
> L1 : S1 ;
> if ( C1 ) goto L2 ;
> S2 ;
> goto L1 ;
> L2 : S3 ;
>
> that in general cannot be eliminated without code duplication as
> your second version does or additional computation as your first
> version did.

Occasionally, I use these kinds of loops. I can do so without goto and
without code duplication. Since in C and C++ the parts A, B, and C in
for(A;B;C) are optional I can write:

for (;C1;) {
S1;
if (C2) break;
S2;
}
S3;

for the first example and

for (;;) {
S1;
if (C1) break;
S2;
}
S3;

for the second example. It's quite convenient sometimes and IMHO still
preferable to gotos and labels.

Cheers!
SG

Seungbeom Kim

unread,

Jul 21, 2009, 1:07:57 PM7/21/09

to

SG wrote:
> On 6 Jul., 07:50, Keith H Duggar wrote:
>> In fact, Knuth and Floyd proved Jacopini's conjecture that gotos
>> in a program of the form
>>
>> L1 : if ( C1 ) goto L2 ;
>> S1 ;
>> if ( C2 ) goto L2 ;
>> S2 ;
>> goto L1 ;
>> L2 : S3 ;
>>
>> cannot always be removed without additional runtime computation.

[...]

>
> Occasionally, I use these kinds of loops. I can do so without goto and
> without code duplication. Since in C and C++ the parts A, B, and C in
> for(A;B;C) are optional I can write:
>
> for (;C1;) {

^ should be !C1

> S1;
> if (C2) break;
> S2;
> }
> S3;
>
> for the first example

[...]

This works, but breaks the symmetry between C1 and C2, in case
it is meaningful. You would not have made the mistake above with
goto, probably. :)

--
Seungbeom Kim

Keith H Duggar

unread,

Jul 21, 2009, 1:10:31 PM7/21/09

to

In some cases it's possible to do without the goto, but not in
the general case. Keep in mind that S1 and S2 can be arbitrary
code so, for example, if S1 and S2 construct a structured loop

for (;;) {
S1';
while ( C2 ) {
if ( C1 ) break ;
}
S2';
}

then the second example you gave is now broken in C/C++. Also,
in those debates they usually considered only a limited subset
of flow structures as "structured". In particular, that subset
was sequence (;), selection (if{}else{}), and repetition (for,
while, etc). Flow structure such as break, continue, the C and
C++ switch when used for interleaving code as in Duff's device,
exceptions, etc were not considered.

Now in a sense, even those goto-like structures such as break,
are "structured" because they restrict the possibilities a bit.
Indeed, in Knuth's goto paper he discusses some new structures
that he thought might be good additions to "structured" design
because they might allow one to efficiently eliminate goto for
even larger classes of problems. Sadly those useful structures
he proposed never became common.

Anyhow, it's easy to break 'break' solutions with nesting just
as I did. That said I, like you, often find the forms you gave
useful for all but the lowest level needs.

> It's quite convenient sometimes and IMHO still preferable to
> gotos and labels.

In some, but not all, scenarios I agree with you. I also agree
we should always first consider goto-less solutions first (and
thoroughly) before considering goto solutions for general code.
But, goto should remain in the toolbox not in the crucible.

KHD