Bug/Gross InEfficiency in HeathField's fgetline program

Antoninus Twink

unread,

Oct 7, 2007, 6:16:07 PM10/7/07

to

The function below is from Richard HeathField's fgetline program. For
some reason, it makes three passes through the string (a strlen(), a
strcpy() then another pass to change dots) when two would clearly be
sufficient. This could lead to unnecessarily bad performance on very
long strings. It is also written in a hard-to-read and clunky style.

char *dot_to_underscore(const char *s)
{
char *t = malloc(strlen(s) + 1);
if(t != NULL)
{
char *u;
strcpy(t, s);
u = t;
while(*u)
{
if(*u == '.')
{
*u = '_';
}
++u;
}
}
return
t;
}

Proposed solution:

char *dot_to_underscore(const char *s)
{
char *t, *u;
if(t=u=malloc(strlen(s)+1))
while(*u++=(*s=='.' ? s++, '_' : *s++));
return t;
}

Richard Heathfield

unread,

Oct 7, 2007, 6:55:39 PM10/7/07

to

Antoninus Twink said:

> The function below is from Richard HeathField's fgetline program.

It appears to be from my emgen utility, in fact.

> For
> some reason, it makes three passes through the string (a strlen(), a
> strcpy() then another pass to change dots) when two would clearly be
> sufficient. This could lead to unnecessarily bad performance on very
> long strings.

If it becomes a problem, I'll fix it. So far, it has not been a problem.

> It is also written in a hard-to-read and clunky style.

A matter of opinion. Which bit did you find hard to read?

> char *dot_to_underscore(const char *s)
> {
> char *t = malloc(strlen(s) + 1);
> if(t != NULL)
> {
> char *u;
> strcpy(t, s);
> u = t;
> while(*u)
> {
> if(*u == '.')
> {
> *u = '_';
> }
> ++u;
> }
> }
> return
> t;
> }
>
> Proposed solution:
>
> char *dot_to_underscore(const char *s)
> {
> char *t, *u;
> if(t=u=malloc(strlen(s)+1))
> while(*u++=(*s=='.' ? s++, '_' : *s++));
> return t;
> }

It is not obvious to me that this code correctly replaces the code I wrote.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

Mark McIntyre

unread,

Oct 7, 2007, 7:05:52 PM10/7/07

to

On Mon, 8 Oct 2007 00:16:07 +0200 (CEST), in comp.lang.c , Antoninus
Twink <nos...@nospam.com> wrote:

>The function below is from Richard HeathField's fgetline program.

Troll alert.

>it makes three passes through the string (a strlen(), a
>strcpy() then another pass to change dots) when two would clearly be
>sufficient. This could lead to unnecessarily bad performance on very
>long strings.

Possibly but consider the three laws of optimisation, and the data
typically being processed.

>It is also written in a hard-to-read and clunky style.

I disagree.

> char *t, *u;
> if(t=u=malloc(strlen(s)+1))

hilarious !

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan

Tor Rustad

unread,

Oct 7, 2007, 8:03:48 PM10/7/07

to

Antoninus Twink wrote:

> The function below is from Richard HeathField's fgetline program. For
> some reason, it makes three passes through the string (a strlen(), a
> strcpy() then another pass to change dots) when two would clearly be
> sufficient.

What new, R.H. has written slow code for years. ;-)

> This could lead to unnecessarily bad performance on very
> long strings. It is also written in a hard-to-read and clunky style.

ROTFL

> Proposed solution:
>
> char *dot_to_underscore(const char *s)
> {
> char *t, *u;
> if(t=u=malloc(strlen(s)+1))
> while(*u++=(*s=='.' ? s++, '_' : *s++));
> return t;
> }

Hmm.. who's code was hard-to-read and clunky?

Anyway, you are missing the point. Strings are usually rather short, and
when located in L1 cache, is doesn't matter much, scanning it 2 or 3 times.

However, a real spead-up here, would be to drop malloc() and use fixed
sized strings. That's the way, to beat the crap out of OO code.

--
Tor <torust [at] online [dot] no>

C-FAQ: http://c-faq.com/

Richard Heathfield

unread,

Oct 7, 2007, 8:10:28 PM10/7/07

to

Tor Rustad said:

<snip>

>
> Anyway, you are missing the point. Strings are usually rather short, and
> when located in L1 cache, is doesn't matter much, scanning it 2 or 3
> times.

The important thing is to avoid reading it n times.

> However, a real spead-up here, would be to drop malloc() and use fixed
> sized strings. That's the way, to beat the crap out of OO code.

If performance were the primary consideration, that's exactly what I'd have
done. Since it wasn't, it isn't. I considered robustness in the face of
long strings to be more important.

Peter Nilsson

unread,

Oct 7, 2007, 10:59:08 PM10/7/07

to

Antoninus Twink <nos...@nospam.com> wrote:
> The function below is from Richard HeathField's fgetline
> program. For some reason, it makes three passes through
> the string (a strlen(), a strcpy() then another pass to

> change dots) when two would clearly be sufficient. ...

For some reason you've capitalised three letters in his
name, when two would clearly be sufficient.

--
Peter

CBFalconer

unread,

Oct 7, 2007, 11:15:21 PM10/7/07

to

Mark McIntyre wrote:
> Antoninus Twink <nos...@nospam.com> wrote:
>
... snip ...

>
>> char *t, *u;
>> if(t=u=malloc(strlen(s)+1))
>
> hilarious !

What is hilarious? It should detect the failure of malloc quite
reliably. Of course the lack of blanks is rather foul.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Chris Thomasson

unread,

Oct 8, 2007, 1:25:03 AM10/8/07

to

"Peter Nilsson" <ai...@acay.com.au> wrote in message
news:1191812348....@g4g2000hsf.googlegroups.com...

Perhaps when he says his name out loud with a nervous tinge, it sound as if
there should be to capital letters... Speaking out load:

"That darn Heath, possible minor/stutter, Field!"

Na... Nobody is that serious about the pronouncement of somebody's name...
Ahh, it could be a simple typo...

Richard Heathfield

unread,

Oct 8, 2007, 1:50:08 AM10/8/07

to

Peter Nilsson said:

The mis-capitalisation is consistent with that used by a troll named "Paul"
(with an email address containing "paulcr"), who once threatened to break
my nose, apparently because he didn't know C very well.

Antoninus Twink

unread,

Oct 8, 2007, 4:38:03 AM10/8/07

to

On 7 Oct 2007 at 22:55, Richard Heathfield wrote:
> Antoninus Twink said:
>
>> The function below is from Richard HeathField's fgetline program.
>
> It appears to be from my emgen utility, in fact.
>
>> For
>> some reason, it makes three passes through the string (a strlen(), a
>> strcpy() then another pass to change dots) when two would clearly be
>> sufficient. This could lead to unnecessarily bad performance on very
>> long strings.
>
> If it becomes a problem, I'll fix it. So far, it has not been a problem.

But what's frustrating is that it's an inefficiency that's completely
gratuitous! We all know that micro-optimization is bad, but this is a
micro-anti-optimization, which is surely worse!

The most natural way to look at this is "copy the characters from one
string to another, replacing . by _ when we see it". This has the
benefit of being a 1-pass algorithm. Instead, you split this up "first
copy one string to another; then go back to the beginning and swap . for
_". This makes a simple single operation into two, at the same time
introducing an extra pass through the string! It's not as if there's so
much fiendish complexity here that there's any benefit in breaking it up
into two separate operations.

>
>> It is also written in a hard-to-read and clunky style.
>
> A matter of opinion. Which bit did you find hard to read?

The function is a completely trivial one, yet I can't see it all at once
in my editor without scrolling! Whitespace can help readability, but
excessive whitespace can reduce it, and at the same time give too much
weight to things that aren't important.

>
>> char *dot_to_underscore(const char *s)
>> {
>> char *t = malloc(strlen(s) + 1);
>> if(t != NULL)
>> {
>> char *u;
>> strcpy(t, s);
>> u = t;
>> while(*u)
>> {
>> if(*u == '.')
>> {
>> *u = '_';
>> }
>> ++u;
>> }
>> }
>> return
>> t;
>> }
>>
>> Proposed solution:
>>
>> char *dot_to_underscore(const char *s)
>> {
>> char *t, *u;
>> if(t=u=malloc(strlen(s)+1))
>> while(*u++=(*s=='.' ? s++, '_' : *s++));
>> return t;
>> }
>
> It is not obvious to me that this code correctly replaces the code I wrote.

If you believe that it doesn't correctly replace the code you wrote, it
would be easy to demonstrate that by pointing out a specific input s for
which it gives a different result, or an error (syntax error or
undefined behavior or whatever).

Ian Collins

unread,

Oct 8, 2007, 4:48:17 AM10/8/07

to

Antoninus Twink wrote:
> On 7 Oct 2007 at 22:55, Richard Heathfield wrote:
>> Antoninus Twink said:
>>> It is also written in a hard-to-read and clunky style.
>> A matter of opinion. Which bit did you find hard to read?
>
> The function is a completely trivial one, yet I can't see it all at once
> in my editor without scrolling! Whitespace can help readability, but
> excessive whitespace can reduce it, and at the same time give too much
> weight to things that aren't important.
>

You must have a very small screen.

--
Ian Collins.

Philip Potter

unread,

Oct 8, 2007, 5:04:14 AM10/8/07

to

He hasn't said that this is what he believes. He is stating that your
code does not *obviously* replace his correctly. It is up to you to
prove it does, if you're going to say his is defective and you have come
up with a replacement.

--
Philip Potter pgp <at> doc.ic.ac.uk

Joachim Schmitz

unread,

Oct 8, 2007, 5:14:27 AM10/8/07

to

"Antoninus Twink" <nos...@nospam.com> schrieb im Newsbeitrag
news:slrnfgjr36...@nospam.com...

> On 7 Oct 2007 at 22:55, Richard Heathfield wrote:
>> Antoninus Twink said:
>>> It is also written in a hard-to-read and clunky style.
>>
>> A matter of opinion. Which bit did you find hard to read?
>
> The function is a completely trivial one, yet I can't see it all at once
> in my editor without scrolling!

20 lines don't fit your screen???

Bye, Jojo

Army1987

unread,

Oct 8, 2007, 6:02:44 AM10/8/07

to

And two letters in InEfficiency, when zero would clearly be
sufficient.

--
Army1987 (Replace "NOSPAM" with "email")
A hamburger is better than nothing.
Nothing is better than eternal happiness.
Therefore, a hamburger is better than eternal happiness.

Richard Heathfield

unread,

Oct 8, 2007, 6:35:59 AM10/8/07

to

Philip Potter said:

> Antoninus Twink wrote:
>> On 7 Oct 2007 at 22:55, Richard Heathfield wrote:
>>> It is not obvious to me that this code correctly replaces the code I
>>> wrote.
>>
>> If you believe that it doesn't correctly replace the code you wrote
>
> He hasn't said that this is what he believes.

I always knew you could read, Philip. :-) Some other people, I'm not so
sure about.

Richard Bos

unread,

Oct 8, 2007, 6:50:30 AM10/8/07

to

Peter Nilsson <ai...@acay.com.au> wrote:

Well, yes. Now look at OP's /nom de Usenet/. Surprised?

Richard

Chris Hills

unread,

Oct 8, 2007, 7:25:21 AM10/8/07

to

In article <slrnfgiml5...@nospam.com>, Antoninus Twink
<nos...@nospam.com> writes

>The function below is from Richard HeathField's fgetline program.

I know Richard has a thing about JN but can we not all sink to the same
level.

We should be discussing C, as it is used here, not all trying to be
pedants and petty point scoring.

--
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
\/\/\/\/\ Chris Hills Staffs England /\/\/\/\/
/\/\/ ch...@phaedsys.org www.phaedsys.org \/\/\
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/

Richard

unread,

Oct 8, 2007, 8:22:12 AM10/8/07

to

Antoninus Twink <nos...@nospam.com> writes:

I would move the ++ part

,----
| while(*u++=(*s=='.' ? '_' : *s))
| s++;
`----

But yes, much nicer, easier to read and understand and possibly
faster. And hopefully working :-;

Thad Smith

unread,

Oct 8, 2007, 9:08:57 AM10/8/07

to

What happens when malloc returns a null pointer?

--
Thad

santosh

unread,

Oct 8, 2007, 9:13:10 AM10/8/07

to

Thad Smith wrote:

AFAICT it returns a null pointer value, same as Richard's version.

Richard

unread,

Oct 8, 2007, 11:53:25 AM10/8/07

to

Thad Smith <Thad...@acm.org> writes:

>>>>
>>>> char *dot_to_underscore(const char *s)
>>>> {
>>>> char *t, *u;
>>>> if(t=u=malloc(strlen(s)+1))
>>>> while(*u++=(*s=='.' ? s++, '_' : *s++));
>>>> return t;
>>>> }
>

> What happens when malloc returns a null pointer?

It returns null obviously , why?

Richard

unread,

Oct 8, 2007, 11:52:33 AM10/8/07

to

Richard Heathfield <r...@see.sig.invalid> writes:

> Antoninus Twink said:
>
>> The function below is from Richard HeathField's fgetline program.
>
> It appears to be from my emgen utility, in fact.
>
>> For
>> some reason, it makes three passes through the string (a strlen(), a
>> strcpy() then another pass to change dots) when two would clearly be
>> sufficient. This could lead to unnecessarily bad performance on very
>> long strings.
>
> If it becomes a problem, I'll fix it. So far, it has not been a
> problem.

Why write it to be slower for the same functionality. It's basic level 1
pointer stuff.

>
>> It is also written in a hard-to-read and clunky style.
>
> A matter of opinion. Which bit did you find hard to read?

It covers about 20 lines - this is harder to read when the other is
about 4 and easy to follow. In my experience excessive white space is rarely welcome in
the real world especially when using a debugger.

>
>> char *dot_to_underscore(const char *s)
>> {
>> char *t = malloc(strlen(s) + 1);
>> if(t != NULL)
>> {
>> char *u;
>> strcpy(t, s);
>> u = t;
>> while(*u)
>> {
>> if(*u == '.')
>> {
>> *u = '_';
>> }
>> ++u;
>> }
>> }
>> return
>> t;
>> }
>>
>> Proposed solution:
>>
>> char *dot_to_underscore(const char *s)
>> {
>> char *t, *u;
>> if(t=u=malloc(strlen(s)+1))
>> while(*u++=(*s=='.' ? s++, '_' : *s++));
>> return t;
>> }
>
> It is not obvious to me that this code correctly replaces the code I
> wrote.

Which part do you think doesn't "correctly" replace the code? Always
better to highlight the errors since we are all prone to making them. I
guess there is some feature in your code that you are alluding too that
is not obvious to others.

Antoninus Twink

unread,

Oct 8, 2007, 12:47:11 PM10/8/07

to

On the contrary: the code was four lines and used common C idioms. If it
isn't completely obvious to someone (or at the very least if they can't
mentally check in 10 seconds) what it does, then I don't believe they
know the language well enough to write it professionally.

(And if in turn you read my words to the letter, you'll notice that I
make no claim that Mr HeathField falls into this class of people: it is
my belief that he is being deliberately obtuse.)

Antoninus Twink

unread,

Oct 8, 2007, 12:48:26 PM10/8/07

to

On 8 Oct 2007 at 12:22, Richard wrote:
>> char *dot_to_underscore(const char *s)
>> {
>> char *t, *u;
>> if(t=u=malloc(strlen(s)+1))
>> while(*u++=(*s=='.' ? s++, '_' : *s++));
>> return t;
>> }
>
> I would move the ++ part
>
> ,----
>| while(*u++=(*s=='.' ? '_' : *s))
>| s++;
> `----
>
> But yes, much nicer, easier to read and understand and possibly
> faster. And hopefully working :-;

That's indeed easier to read, though it's always satisfying to write a
loop with no body :)

Richard Heathfield

unread,

Oct 8, 2007, 1:11:15 PM10/8/07

to

Richard said:

> Richard Heathfield <r...@see.sig.invalid> writes:
>
>> Antoninus Twink said:
>>

<snip>

>>>
>>> char *dot_to_underscore(const char *s)
>>> {
>>> char *t, *u;
>>> if(t=u=malloc(strlen(s)+1))
>>> while(*u++=(*s=='.' ? s++, '_' : *s++));
>>> return t;
>>> }
>>
>> It is not obvious to me that this code correctly replaces the code I
>> wrote.
>
> Which part do you think doesn't "correctly" replace the code?

That's not what I said.

Richard

unread,

Oct 8, 2007, 1:19:28 PM10/8/07

to

Philip Potter <p...@see.sig.invalid> writes:

> Antoninus Twink wrote:
>> On 7 Oct 2007 at 22:55, Richard Heathfield wrote:
>>> It is not obvious to me that this code correctly replaces the code I wrote.
>>
>> If you believe that it doesn't correctly replace the code you wrote
>
> He hasn't said that this is what he believes. He is stating that your
> code does not *obviously* replace his correctly. It is up to you to

Don't be ridiculous. It was quite clear what RH meant. The shorter code
is FAR more obvious to me anyway - especially since it removes 2
unnecessary string operations which may or may not have caused hard to
spot behaviour. The shorter version is far, far more obvious in every
way. This is, of course, IMO abnd I am sure that a nOOb who cant read
the "?" operator and has no idea about post increment and pointer
dereferncing might have difficulties with the compact nature of the
second version. Personally I think it is easy enough to read at first
glance and the reduction in library calls makes it *easier* to
understand. having said that, I haven't tested or scrutinised the code
to ensure it reproduces in all cases exactly what the original code
produces.

> prove it does, if you're going to say his is defective and you have
> come up with a replacement.

No one said it was deceptive. heatjfield suggested it didn't do the same
thing "obviously". The original was, however long winded, overly white
spaced and unnecessarily inefficient IMO. Real code review if you
like. No one needs 20 lines for such a simple piece of code. This is C :
elegance and efficiency is everything. I suspect that it was overly
wordy for a reason since I would be surprised to see production quality
code like that.

Richard Heathfield

unread,

Oct 8, 2007, 1:27:15 PM10/8/07

to

Richard said:

> Philip Potter <p...@see.sig.invalid> writes:
>
>> Antoninus Twink wrote:
>>> On 7 Oct 2007 at 22:55, Richard Heathfield wrote:
>>>> It is not obvious to me that this code correctly replaces the code I
>>>> wrote.
>>>
>>> If you believe that it doesn't correctly replace the code you wrote
>>
>> He hasn't said that this is what he believes. He is stating that your
>> code does not *obviously* replace his correctly. It is up to you to
>
> Don't be ridiculous. It was quite clear what RH meant.

Well, I agree, but you appear to have misunderstood it nonetheless. I said
*precisely* what I meant to say, which is that it was not obvious to me

that this code correctly replaces the code I wrote.

<snip>

>> prove it does, if you're going to say his is defective and you have
>> come up with a replacement.
>
> No one said it was deceptive.

Nor did anyone say it was defective.

> heatjfield suggested it didn't do the same thing "obviously".

Well, I said it wasn't obvious to me that it does the same thing. Your
paraphrase of what I said is susceptible to interpretations that cannot
reasonably be drawn from my original phrasing.

<snip>

Richard

unread,

Oct 8, 2007, 1:24:12 PM10/8/07

to

Richard Heathfield <r...@see.sig.invalid> writes:

> Richard said:
>
>> Richard Heathfield <r...@see.sig.invalid> writes:
>>
>>> Antoninus Twink said:
>>>
> <snip>
>>>>
>>>> char *dot_to_underscore(const char *s)
>>>> {
>>>> char *t, *u;
>>>> if(t=u=malloc(strlen(s)+1))
>>>> while(*u++=(*s=='.' ? s++, '_' : *s++));
>>>> return t;
>>>> }
>>>
>>> It is not obvious to me that this code correctly replaces the code I
>>> wrote.
>>
>> Which part do you think doesn't "correctly" replace the code?
>
> That's not what I said.

Yes it is. If its not "obvious" then there is an issue. It is obvious to
me. Which part is not obvious to you?

Richard

unread,

Oct 8, 2007, 1:22:20 PM10/8/07

to

"Joachim Schmitz" <nospa...@schmitz-digital.de> writes:

What a ridiculous argument.

It doesn't fit my source window in gdb for a start.

Why 20 lines when it can be 4 perfectly eligible, efficient lines?

Don't defend code just because of the source. The second version is
superior for production level code in at least two ways:

1) screen real estate (this IS important in the real world).
2) less library calls. less to worry about IMO.

>
> Bye, Jojo

Richard

unread,

Oct 8, 2007, 1:22:58 PM10/8/07

to

Richard Heathfield <r...@see.sig.invalid> writes:

> Philip Potter said:
>
>> Antoninus Twink wrote:
>>> On 7 Oct 2007 at 22:55, Richard Heathfield wrote:
>>>> It is not obvious to me that this code correctly replaces the code I
>>>> wrote.
>>>
>>> If you believe that it doesn't correctly replace the code you wrote
>>
>> He hasn't said that this is what he believes.
>
> I always knew you could read, Philip. :-) Some other people, I'm not so
> sure about.

It was perfectly obvious what you meant. Word games don't even begin to
cover it.

santosh

unread,

Oct 8, 2007, 1:31:52 PM10/8/07

to

Richard wrote:

> [ ... ] heatjfield suggested [ ... ]

What is it with people mangling Richard's surname? :)

Richard Heathfield

unread,

Oct 8, 2007, 1:38:23 PM10/8/07

to

Richard said:
> Richard Heathfield <r...@see.sig.invalid> writes:
>> Richard said:
>>> Richard Heathfield <r...@see.sig.invalid> writes:
>>>
>>>> It is not obvious to me that this code correctly replaces the code I
>>>> wrote.
>>>
>>> Which part do you think doesn't "correctly" replace the code?
>>
>> That's not what I said.
>
> Yes it is.

No, it isn't. Read it again.

> If its not "obvious" then there is an issue.

Yes. The issue is that the suggested replacement code is overly terse and
difficult to read.

> It is obvious to me. Which part is not obvious to you?

Not least the need to compress functionality into the fewest possible
source characters at the expense of readability and maintainability. I
recognise that a terse style is considered "cool" by some, but in my
experience it is merely expensive.

I still haven't bothered to attempt to /read/ the suggested replacement
code, partly since its style discourages reading, and partly because of
its source. If, as I suspect, the OP is just a sock-puppet for the guy who
threatened to break my nose (see elsethread), I'm not terribly interested
in working out whether the functionality of his code does or does not
accurately reflect that of my own.

Richard Heathfield

unread,

Oct 8, 2007, 1:49:29 PM10/8/07

to

santosh said:

It's sheer ignorance. Getting someone's name right is a basic mark of
respect, which is why I *always* take care with names - I hate getting
them wrong, and have done so only rarely. "Kylheku" and "Dijkstra" are two
that I've got wrong in the past, but I always get them right now.

If someone continually gets a name wrong, it's a reasonable sign that they
consider that person to be not worth the bother of taking trouble for.
Richard Riley's inability or unwillingness to take out a second or two to
get my name right is therefore suggestive. And if he is so short of time
and care when composing Usenet articles that he *doesn't even have time to
get my name right*, I see no reason why I or anyone should accord his
hurried, careless views any weight. Furthermore, I would venture to
suggest that there *may* be a correlation between those who write hurried,
careless Usenet articles and those who write hurried, careless C.

And if someone habitually gets a particular name wrong in precisely the
same way that a violence-threatening troll habitually does (not Richard
Riley, I hasten to add, but this Antoninus Twink character that everyone
is treating so seriously), then it's hard to avoid the conclusion that
we're dealing with a sock puppet.

santosh

unread,

Oct 8, 2007, 1:48:24 PM10/8/07

to

Richard wrote:

<snip>

> Why 20 lines when it can be 4 perfectly eligible, efficient lines?
>
> Don't defend code just because of the source. The second version is
> superior for production level code in at least two ways:
>
> 1) screen real estate (this IS important in the real world).
> 2) less library calls. less to worry about IMO.

As far as the second point applies to this specific example, it could be
possible that the strcpy implementation is more efficient than a manual
coded loop. This could become noticeable for large string numbers or
sizes.

Minor point, but since this whole thread is nit-picking, I though I'd
mention it.

Joachim Schmitz

unread,

Oct 8, 2007, 1:51:06 PM10/8/07

to

"Richard" <rgr...@gmail.com> schrieb im Newsbeitrag
news:dj5qt4-...@news.individual.net...

> "Joachim Schmitz" <nospa...@schmitz-digital.de> writes:
>
>> "Antoninus Twink" <nos...@nospam.com> schrieb im Newsbeitrag
>> news:slrnfgjr36...@nospam.com...
>>> On 7 Oct 2007 at 22:55, Richard Heathfield wrote:
>>>> Antoninus Twink said:
>>>>> It is also written in a hard-to-read and clunky style.
>>>>
>>>> A matter of opinion. Which bit did you find hard to read?
>>>
>>> The function is a completely trivial one, yet I can't see it all at once
>>> in my editor without scrolling!
>> 20 lines don't fit your screen???
>
> What a ridiculous argument.

Indeed. Not mine though but Antonius'...
It is definitly easier to read. Yes it could have been written more compact,
I esp. disliked the "return t;" in 2 lines and with a different brace style
one could save another couple of lines and still keep it simple.

> It doesn't fit my source window in gdb for a start.

esp. in a debugger it is much easier to follow the flow if not everything is
done in one line

> Why 20 lines when it can be 4 perfectly eligible, efficient lines?

see above, because it is easier to debug.
For that reason I liked your improvement to Antonius' version (moving the ++
part into the otherwise empty body of the loop).

> Don't defend code just because of the source.

?? You mean RH being the source and I defend it because of that? Or what
else are youn trying to say here?

> The second version is
> superior for production level code in at least two ways:
>
> 1) screen real estate (this IS important in the real world).
> 2) less library calls. less to worry about IMO.

We're talking about one call to strcpy, which in a decent implementation is
lightning fast and higly optimized, so saving it doesn't really save much.

Bye, Jojo

Joachim Schmitz

unread,

Oct 8, 2007, 1:53:48 PM10/8/07

to

"Antoninus Twink" <nos...@nospam.com> schrieb im Newsbeitrag

news:slrnfgknqi...@nospam.com...

Try to resist the temptation, it's hard, I know...

Bye, Jojo

santosh

unread,

Oct 8, 2007, 2:07:23 PM10/8/07

to

Richard Heathfield wrote:

> santosh said:
>
>> Richard wrote:
>>
>>> [ ... ] heatjfield suggested [ ... ]
>>
>> What is it with people mangling Richard's surname? :)
>
> It's sheer ignorance. Getting someone's name right is a basic mark of
> respect, which is why I *always* take care with names - I hate getting
> them wrong, and have done so only rarely. "Kylheku" and "Dijkstra" are
> two that I've got wrong in the past, but I always get them right now.
>
> If someone continually gets a name wrong, it's a reasonable sign that
> they consider that person to be not worth the bother of taking trouble
> for. Richard Riley's inability or unwillingness to take out a second
> or two to get my name right is therefore suggestive. And if he is so
> short of time and care when composing Usenet articles that he *doesn't
> even have time to get my name right*, I see no reason why I or anyone
> should accord his hurried, careless views any weight. Furthermore, I
> would venture to suggest that there *may* be a correlation between
> those who write hurried, careless Usenet articles and those who write
> hurried, careless C.
>
> And if someone habitually gets a particular name wrong in precisely
> the same way that a violence-threatening troll habitually does (not
> Richard Riley, I hasten to add, but this Antoninus Twink character
> that everyone is treating so seriously), then it's hard to avoid the
> conclusion that we're dealing with a sock puppet.

Regarding your last paragraph, there is strong evidence that
this "Antoninus Twink" and "Paul", (having an email beginning
with "paulcr"), who posted a few months ago are the same.

<http://groups.google.com/group/comp.lang.c/msg/f2e171fc9691cd7d?dmode=source>
<http://groups.google.com/group/comp.lang.c/msg/b571a9a155facb48?dmode=source>

As you can see, the "User-Agent" and "NNTP-Posting-Host" fields in the
header are identical. Further the timezone is also the same.

Whether the "Paul" in the second message above is indeed the one who
threatened you is not so easily verifiable, but it seems very likely.

Antoninus Twink

unread,

Oct 8, 2007, 2:12:27 PM10/8/07

to

On 8 Oct 2007 at 17:38, Richard Heathfield wrote:
> Not least the need to compress functionality into the fewest possible
> source characters at the expense of readability and maintainability. I
> recognise that a terse style is considered "cool" by some, but in my
> experience it is merely expensive.

Wading through 20 lines of code that do a 4-line job in a roundabout way
can also be expensive.

> I still haven't bothered to attempt to /read/ the suggested replacement
> code, partly since its style discourages reading, and partly because of
> its source. If, as I suspect, the OP is just a sock-puppet for the guy who
> threatened to break my nose (see elsethread), I'm not terribly interested
> in working out whether the functionality of his code does or does not
> accurately reflect that of my own.

It's a pair of declarations, a simple return statement, and two lines of
moderately dense but completely idiomatic C. It's not like you need to
set aside a rainy afternoon to dedicate to reading it.

Actually I suspect that you've read it closely, because you'd like
nothing more than to humiliate me by pointing out a bug in it. As it's
manifestly correct, you've turned to word games instead of admitting
that your own code might, just might, admit some improvement.

And who on earth has said anything about breaking your nose? It sounds
to me like you're suffering from some form of paranoia.

santosh

unread,

Oct 8, 2007, 2:24:29 PM10/8/07

to

Antoninus Twink wrote:

> On 8 Oct 2007 at 17:38, Richard Heathfield wrote:
>> Not least the need to compress functionality into the fewest possible
>> source characters at the expense of readability and maintainability.
>> I recognise that a terse style is considered "cool" by some, but in
>> my experience it is merely expensive.
>
> Wading through 20 lines of code that do a 4-line job in a roundabout
> way can also be expensive.
>
>> I still haven't bothered to attempt to /read/ the suggested
>> replacement code, partly since its style discourages reading, and
>> partly because of its source. If, as I suspect, the OP is just a
>> sock-puppet for the guy who threatened to break my nose (see
>> elsethread), I'm not terribly interested in working out whether the
>> functionality of his code does or does not accurately reflect that of
>> my own.

<snip>

> And who on earth has said anything about breaking your nose? It sounds
> to me like you're suffering from some form of paranoia.

The email address of the poster who wrote this threatening message:
<http://groups.google.com/group/alt.comp.lang.learn.c-c++/msg/82c5c4b2e59984e0?dmode=source>

And this message:
<http://groups.google.com/group/comp.lang.c/msg/b571a9a155facb48?dmode=source
&utoken=0qjyGSsAAAAoZoYpti9uhwqsYFEYUqETYYZ-k6tkIFJmSBA4M7hURISd8ZKZA3rNsODuE9WiCO4>

begins identically.

Your messages in this thread share with the second message linked above,
identical "User-Agent", "NNTP-Posting-Host" headers as well as the
timezone.

A is related to B.
B is realted to C.

So

A is related to C.

:)

Richard Heathfield

unread,

Oct 8, 2007, 2:27:11 PM10/8/07

to

santosh said:

> Richard wrote:
>
> <snip>
>
>> Why 20 lines when it can be 4 perfectly eligible, efficient lines?
>>
>> Don't defend code just because of the source. The second version is
>> superior for production level code in at least two ways:
>>
>> 1) screen real estate (this IS important in the real world).
>> 2) less library calls. less to worry about IMO.
>
> As far as the second point applies to this specific example, it could be
> possible that the strcpy implementation is more efficient than a manual
> coded loop. This could become noticeable for large string numbers or
> sizes.

Yes, it's certainly possible. So is the reverse, of course. The only proper
course is to measure, realising that the measurement will be specific to a
particular implementation on a specific machine.

I have measured the performance of the code - my code - on an Athlon 1.4
(no slouch, but hardly a state-of-the-art mean machine) under gcc 2.95.3,
using an input file over 10 Megabytes in size (a typical real world input
would be a handful of kilobytes). Exact input file size: 11057780 bytes.
Number of lines: 120,000 (compared to a typical real world input of
perhaps a few dozen, or maybe three or four thousand for a fairly large
system).

The profiler reports that the program took 0.1 seconds to run (on inputs
that are orders of magnitude larger than would be expected in production).
The code whose performance is in question is called ONCE, by the way, and
the profiler (which claims to measure in microseconds) reports that the
function takes zero time to run. Obviously that can't be literally true,
but it's certainly too small for my gprof implementation to measure. It
might be reasonably argued that it takes almost a microsecond.

The purpose of the program is to take as input a list of error messages and
identifiers, and convert these into a .h file with #defines for the error
identifiers, and a .c file with a function that converts a number into an
error message.

It's a programmer's tool, and requires as input a file that is used by a
programmer to store an intelligent identifier/message pair. To edit such a
file, for a superhuman programmer like Chris Torek, might take as little
as - what, five seconds? (Wow, watch those fingers fly) And the next step
would be to run the program (perhaps automatically, on saving the file).
If this superhuman programmer did nothing but updates to the input file
all day every day, he would cause 86400/5 = 17280 program runs per day. If
the file size is as large as in my test (deeply unlikely in the real
world, but just about possible for a really, really, really large
project), the total program time taken would be a little less than half an
hour (29 minutes 28 seconds, in fact) - remember that this is for over
seventeen THOUSAND runs.

If, for the sake of argument, the OP's code is correct and if it reduces
the cost of converting dots to underscores from 1 microsecond to *zero*,
the total time saving per day will be 0.01728 seconds. Over a thousand
years of 24hrs/day of running this program 17280 times a day, the total
time saved will be about an hour and three quarters.

Compare this to the time spent on the discussion so far.

Tor Rustad

unread,

Oct 8, 2007, 2:36:13 PM10/8/07

to

Richard Heathfield wrote:
> Tor Rustad said:
>
> <snip>
>> Anyway, you are missing the point. Strings are usually rather short, and
>> when located in L1 cache, is doesn't matter much, scanning it 2 or 3
>> times.
>
> The important thing is to avoid reading it n times.
>
>> However, a real spead-up here, would be to drop malloc() and use fixed
>> sized strings. That's the way, to beat the crap out of OO code.
>
> If performance were the primary consideration, that's exactly what I'd have
> done. Since it wasn't, it isn't. I considered robustness in the face of
> long strings to be more important.

I didn't look at the original code.

In particular, if the function in question, was supposed to interact
with a human providing input, OP's objections was nonsense.

--
Tor <torust [at] online [dot] no>

C-FAQ: http://c-faq.com/

santosh

unread,

Oct 8, 2007, 2:44:24 PM10/8/07

to

Richard Heathfield wrote:

Until recently I used, where possible, a handwritten assembler wrapper
that used the Pentium's RDTSC instruction. I've switched to Valgrind.

> The purpose of the program is to take as input a list of error
> messages and identifiers, and convert these into a .h file with
> #defines for the error identifiers, and a .c file with a function that
> converts a number into an error message.
>
> It's a programmer's tool, and requires as input a file that is used by
> a programmer to store an intelligent identifier/message pair. To edit
> such a file, for a superhuman programmer like Chris Torek, might take
> as little as - what, five seconds? (Wow, watch those fingers fly) And
> the next step would be to run the program (perhaps automatically, on
> saving the file). If this superhuman programmer did nothing but
> updates to the input file all day every day, he would cause 86400/5 =
> 17280 program runs per day. If the file size is as large as in my test
> (deeply unlikely in the real world, but just about possible for a
> really, really, really large project), the total program time taken
> would be a little less than half an hour (29 minutes 28 seconds, in
> fact) - remember that this is for over seventeen THOUSAND runs.
>
> If, for the sake of argument, the OP's code is correct and if it
> reduces the cost of converting dots to underscores from 1 microsecond
> to *zero*, the total time saving per day will be 0.01728 seconds. Over
> a thousand years of 24hrs/day of running this program 17280 times a
> day, the total time saved will be about an hour and three quarters.
>
> Compare this to the time spent on the discussion so far.

Good point. By the way, I'd have thought something like Perl would be
better suited to your task. But of course, it's best to code it
whatever language you are most familiar with.

Richard

unread,

Oct 8, 2007, 2:49:54 PM10/8/07

to

"Joachim Schmitz" <nospa...@schmitz-digital.de> writes:

> "Richard" <rgr...@gmail.com> schrieb im Newsbeitrag
> news:dj5qt4-...@news.individual.net...
>> "Joachim Schmitz" <nospa...@schmitz-digital.de> writes:
>>
>>> "Antoninus Twink" <nos...@nospam.com> schrieb im Newsbeitrag
>>> news:slrnfgjr36...@nospam.com...
>>>> On 7 Oct 2007 at 22:55, Richard Heathfield wrote:
>>>>> Antoninus Twink said:
>>>>>> It is also written in a hard-to-read and clunky style.
>>>>>
>>>>> A matter of opinion. Which bit did you find hard to read?
>>>>
>>>> The function is a completely trivial one, yet I can't see it all at once
>>>> in my editor without scrolling!
>>> 20 lines don't fit your screen???
>>
>> What a ridiculous argument.
> Indeed. Not mine though but Antonius'...
> It is definitly easier to read. Yes it could have been written more compact,
> I esp. disliked the "return t;" in 2 lines and with a different brace style
> one could save another couple of lines and still keep it simple.
>
>> It doesn't fit my source window in gdb for a start.
> esp. in a debugger it is much easier to follow the flow if not everything is
> done in one line

You are being purposely obstructive for some reason. No one mentioned 1
line. It is however important, in my experience, to keep tight code
close together so it can be examined without excessive scrolling or
white space cluttering up the display.

>
>> Why 20 lines when it can be 4 perfectly eligible, efficient lines?
> see above, because it is easier to debug.

No, it isn't. by adding the s++ line I suggested you have ability to
examine data on every iteration. Set your watched variables, step. Done.

> For that reason I liked your improvement to Antonius' version (moving the ++
> part into the otherwise empty body of the loop).

Two reasons : because it meant you didnt have the same s++ twice. Less
code as it happens. In addition a suitable break point to examine the
flow. Contrary to a lot of opinion here I belieb debuggers are very
useful and incredibly handy for teaching new programmers how a system
works. As well, of course, as being able to set watch points and HW
breakpoints.

>
>> Don't defend code just because of the source.
> ?? You mean RH being the source and I defend it because of that? Or what
> else are youn trying to say here?

Just that. I can see no advantages whatsoever to the original code. It
is wordy, spacey and inefficient.

>
>> The second version is
>> superior for production level code in at least two ways:
>>
>> 1) screen real estate (this IS important in the real world).
>> 2) less library calls. less to worry about IMO.
> We're talking about one call to strcpy, which in a decent implementation is
> lightning fast and higly optimized, so saving it doesn't really save
> much.

So, which, as a C programmer, do you prefer?

To me there is only one contender. Sorry and all that but I feel
strongly about it. Really.

>
> Bye, Jojo

Richard

unread,

Oct 8, 2007, 2:59:11 PM10/8/07

to

Antoninus Twink <nos...@nospam.com> writes:

> On 8 Oct 2007 at 17:38, Richard Heathfield wrote:
>> Not least the need to compress functionality into the fewest possible
>> source characters at the expense of readability and maintainability. I
>> recognise that a terse style is considered "cool" by some, but in my
>> experience it is merely expensive.
>
> Wading through 20 lines of code that do a 4-line job in a roundabout way
> can also be expensive.
>
>> I still haven't bothered to attempt to /read/ the suggested replacement
>> code, partly since its style discourages reading, and partly because
>> of

Rubbish. I sometimes wonder if you only program in a class room. Even
K&R have this type of thing by page 20.

The code is as simple as it can get. A malloc and a typical if then else
duplication of a string using ?: It can hardly get any simpler. It seems
you are not as open to code corrections and/or improvements as you
frequently state. Why am I not surprised?

>> its source. If, as I suspect, the OP is just a sock-puppet for the guy who
>> threatened to break my nose (see elsethread), I'm not terribly interested
>> in working out whether the functionality of his code does or does not
>> accurately reflect that of my own.

Working out?!?!?! A glance should be enough.

So know we go from "not obvious" to "I am too self important to even
bother looking at it". Not much new there then.

>
> It's a pair of declarations, a simple return statement, and two lines of
> moderately dense but completely idiomatic C. It's not like you need to
> set aside a rainy afternoon to dedicate to reading it.
>
> Actually I suspect that you've read it closely, because you'd like
> nothing more than to humiliate me by pointing out a bug in it. As it's
> manifestly correct, you've turned to word games instead of admitting
> that your own code might, just might, admit some improvement.

I hate to agree. But I agree.

Richard

unread,

Oct 8, 2007, 2:53:37 PM10/8/07

to

santosh <santo...@gmail.com> writes:

> Richard wrote:
>
> <snip>
>
>> Why 20 lines when it can be 4 perfectly eligible, efficient lines?
>>
>> Don't defend code just because of the source. The second version is
>> superior for production level code in at least two ways:
>>
>> 1) screen real estate (this IS important in the real world).
>> 2) less library calls. less to worry about IMO.
>
> As far as the second point applies to this specific example, it could be
> possible that the strcpy implementation is more efficient than a manual
> coded loop. This could become noticeable for large string numbers or
> sizes.

That would be a point if it had any validity at all here. As it is, I'm
not sure it does. There was simply no need to copy the entire string
using strcpy and then loop through it again doing the replacement
IMO. However, I would be interested to hear than I am wrong. I just like
the "do it once" mantra.

>
> Minor point, but since this whole thread is nit-picking, I though I'd
> mention it.

It is not nit picking in the slightest.

Reducing a 20 line lump of code to 3 or 4 lines can does have big
ramifications on

code reads
maintenance
documentation
efficiency
debugging

Richard

unread,

Oct 8, 2007, 2:54:27 PM10/8/07

to

santosh <santo...@gmail.com> writes:

Why would you post that?

Off Topic for a start and NOTHING to do with C.

The poster was quite correct in his improvements of RHs code. Regardless
of who or what he is.

Richard Heathfield

unread,

Oct 8, 2007, 2:59:16 PM10/8/07

to

santosh said:

> By the way, I'd have thought something like Perl would be
> better suited to your task.

Perl is available in lots of places, it's true - but it has not yet been
ported *everywhere*.

> But of course, it's best to code it
> whatever language you are most familiar with.

Or indeed a language for which an implementation exists on the target
machine.

"Oh?" I hear you say. "What machine /is/ he targeting?" My answer is
simple: "how on earth should I know? Suddenly we need to know what machine
people run our code on? When did /that/ rule come in?"

Richard Heathfield

unread,

Oct 8, 2007, 3:08:39 PM10/8/07

to

Richard said:

> Antoninus Twink <nos...@nospam.com> writes:
>
>> On 8 Oct 2007 at 17:38, Richard Heathfield wrote:
>>> Not least the need to compress functionality into the fewest possible
>>> source characters at the expense of readability and maintainability. I
>>> recognise that a terse style is considered "cool" by some, but in my
>>> experience it is merely expensive.
>>
>> Wading through 20 lines of code that do a 4-line job in a roundabout way
>> can also be expensive.
>>
>>> I still haven't bothered to attempt to /read/ the suggested replacement
>>> code, partly since its style discourages reading, and partly because
>>> of
>
> Rubbish.

Precisely. And I don't intend to waste my time looking at rubbish posted by
one troll and lauded by another.

Richard

unread,

Oct 8, 2007, 3:06:24 PM10/8/07

to

"Joachim Schmitz" <nospa...@schmitz-digital.de> writes:

To reiterate why : it makes it nigh on impossible to debug. And people
do use debuggers in the real world. A lot.

Richard

unread,

Oct 8, 2007, 3:00:38 PM10/8/07

to

santosh <santo...@gmail.com> writes:

Santosh, you are not Heathfield's lapdog. No one cares. Can we stick to
the issue at hand - namely efficient coding in C and the possible
ramifications of wordy, overly engineered solution compared to small
precise, elegant solutions which utilise the features of the language.

Joachim Schmitz

unread,

Oct 8, 2007, 3:10:01 PM10/8/07

to

"Richard" <rgr...@gmail.com> schrieb im Newsbeitrag

news:mnaqt4-...@news.individual.net...

The real work in Antonius' version is done in one line:

while(*u++=(*s=='.' ? s++, '_' : *s++));

> It is however important, in my experience, to keep tight code

> close together so it can be examined without excessive scrolling or
> white space cluttering up the display.

Excessive scrolling and 20 lines (that could easily be deduced to some 15)
is not the same

>>> Why 20 lines when it can be 4 perfectly eligible, efficient lines?
>> see above, because it is easier to debug.
>
> No, it isn't. by adding the s++ line I suggested you have ability to
> examine data on every iteration. Set your watched variables, step. Done.

Which is exactly my point. Yes, with your change this is possibnle, with
Antonius' it is not.

>> For that reason I liked your improvement to Antonius' version (moving the
>> ++
>> part into the otherwise empty body of the loop).
>
> Two reasons : because it meant you didnt have the same s++ twice. Less
> code as it happens. In addition a suitable break point to examine the
> flow.

Agreed

> Contrary to a lot of opinion here I belieb debuggers are very
> useful and incredibly handy for teaching new programmers how a system
> works. As well, of course, as being able to set watch points and HW
> breakpoints.
>>> Don't defend code just because of the source.
>> ?? You mean RH being the source and I defend it because of that? Or what
>> else are youn trying to say here?
>
> Just that. I can see no advantages whatsoever to the original code. It
> is wordy, spacey and inefficient.

It could be condensed by using a different bracing style, without making it
more difficult to read.
The OP claimed a Bug and/or gross inefficiency. This is nonsense. There's no
bug and the inefficiency is minor and outwheight by the simplicity of the
code. Even if I see room for improvment.

>>> The second version is
>>> superior for production level code in at least two ways:
>>>
>>> 1) screen real estate (this IS important in the real world).
>>> 2) less library calls. less to worry about IMO.
>> We're talking about one call to strcpy, which in a decent implementation
>> is
>> lightning fast and higly optimized, so saving it doesn't really save
>> much.
>
> So, which, as a C programmer, do you prefer?

I liked your version, for it's elegance and Richard's for it's simplicity.
Antonius' version is too terse, good for teaching C maybe, but not good for
production code.

Bye, Jojo

Richard Heathfield

unread,

Oct 8, 2007, 3:22:26 PM10/8/07

to

Richard said:

<snip>

> Can we stick to
> the issue at hand - namely efficient coding in C and the possible
> ramifications of wordy, overly engineered solution compared to small
> precise, elegant solutions which utilise the features of the language.

The code as written took about a minute to write, tops. EVEN IF the
suggested replacement (which, no matter what you may choose to believe, I
haven't bothered to analyse or even read) were correct and even if it were
able to do the same job in zero time, the total time it is likely to save
me over the rest of my expected lifetime is significantly lower than the
time it would take to do the necessary edit, compilation, and testing of
the replacement code.

The OP is suggesting micro-optimisation with a vengeance. He is suggesting
that I replace working code that is called *once* per program invocation
and which does its job so fast that gprof can't keep up. This is silly
beyond words. That you cannot understand this does not surprise me - after
all, you even struggle to get people's names right. Mind you, I've noticed
that you're coming along nicely on that front - another ten or fifteen
years at this rate of learning and you might just be ready to take your
place in this group as someone whose articles are actually worth reading
other than for mere rebuttal.

Keith Thompson

unread,

Oct 8, 2007, 3:31:39 PM10/8/07

to

Yes, it's perfectly obvious what he meant. It's also perfectly
obvious that if he had meant that he believes that it doesn't
correctly replace the code he wrote, the could and would have said so.
He didn't.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

John Bode

unread,

Oct 8, 2007, 3:37:35 PM10/8/07

to

On Oct 8, 1:49 pm, Richard <rgr...@gmail.com> wrote:

> "Joachim Schmitz" <nospam.j...@schmitz-digital.de> writes:
> > "Richard" <rgr...@gmail.com> schrieb im Newsbeitrag
> >news:dj5qt4-...@news.individual.net...

[snip]

>
> >> Don't defend code just because of the source.
> > ?? You mean RH being the source and I defend it because of that? Or what
> > else are youn trying to say here?
>
> Just that. I can see no advantages whatsoever to the original code. It
> is wordy,

Translation: easy to read and understand.

> spacey

Translation: easy to follow.

> and inefficient.

Facts not in evidence. As others have pointed out, strcpy() may be
implemented in such a way that is faster than copying individual
characters in a loop. And elsethread RH has pointed out that this
code gets called *once*, that he has benchmarked it for obnoxiously
long inputs (tens of megabytes), and that the performance is perfectly
acceptable (< 1 us according to his profiler).

Antonius' replacement is terse to the point of being painful to read
(at least for these 42-year-old eyes). Some judicious use of
whitespace would help a *lot* (I almost missed the multiple assignment
the first time through); even so, the conditional expression is
especially jarring, and it took me a couple of minutes to convince
myself it was doing what I thought it was (as opposed to taking almost
no time to understand the original). Yes, it uses common C idioms,
but not all of us use all idioms equally (I almost never use a
conditional expression in my day-to-day work, nor do I commonly use
the comma operator, and I've used the two in combination maybe twice
in the last 15 years). Until someone profiles that code vs.
Richard's, it's an open question as to whether it's really more
efficient, or if that gain in efficiency is worth the cost of
readability.

>
>
>
> >> The second version is
> >> superior for production level code in at least two ways:
>
> >> 1) screen real estate (this IS important in the real world).

Less so than readability and maintainability*. I will happily
sacrifice screen real estate for code that's easy to understand.

> >> 2) less library calls. less to worry about IMO.

This makes no sense to me; if anything in your code is going to work
reasonably well, I would expect it to be the library calls.

> > We're talking about one call to strcpy, which in a decent implementation is
> > lightning fast and higly optimized, so saving it doesn't really save
> > much.
>
> So, which, as a C programmer, do you prefer?

RH's. It's easier to grok at first glance. It would be easier to fix
if a bug was found. It's easier to explain how it works. It's easier
to verify that it's correct.

>
> To me there is only one contender. Sorry and all that but I feel
> strongly about it. Really.
>

As do I. I've maintained code that read like RH's and code that read
like Antonius'. I strongly prefer working with the former, if for no
other reason than obvious flaws are easier to spot and fix. I *used*
to write code like Antonius' version, and it *always* bit me in the
ass six to nine months later when I'd forgotten about it and had to
figure out how it worked again.

* Production code must be, in order: 1. Correct (it doesn't matter
how fast your code is if it does the wrong thing); 2. Maintainable (it
doesn't matter how fast your code is if it can't be fixed or
upgraded); 3. Robust (it doesn't matter how fast your code is if it
dies at the first hint of bad input); 4. Efficient, but not at the
cost of 1, 2, or 3.

Richard

unread,

Oct 8, 2007, 3:48:33 PM10/8/07

to

Keith Thompson <ks...@mib.org> writes:

> Richard <rgr...@gmail.com> writes:
>> Richard Heathfield <r...@see.sig.invalid> writes:
>>> Philip Potter said:
>>>> Antoninus Twink wrote:
>>>>> On 7 Oct 2007 at 22:55, Richard Heathfield wrote:
>>>>>> It is not obvious to me that this code correctly replaces the code I
>>>>>> wrote.
>>>>>
>>>>> If you believe that it doesn't correctly replace the code you wrote
>>>>
>>>> He hasn't said that this is what he believes.
>>>
>>> I always knew you could read, Philip. :-) Some other people, I'm not so
>>> sure about.
>>
>> It was perfectly obvious what you meant. Word games don't even begin to
>> cover it.
>
> Yes, it's perfectly obvious what he meant. It's also perfectly
> obvious that if he had meant that he believes that it doesn't
> correctly replace the code he wrote, the could and would have said so.
> He didn't.

You don't have to defend Heathfield. His fairly unique style makes it
very clear, as did his followup, exactly what he meant.

He didn't even bother to read the code as he made clear in a later post.

Richard

unread,

Oct 8, 2007, 3:47:20 PM10/8/07

to

"Joachim Schmitz" <nospa...@schmitz-digital.de> writes:

>>
>> So, which, as a C programmer, do you prefer?
> I liked your version, for it's elegance and Richard's for it's
> simplicity.

Richard's was more complex. It used the same reference/dereference with
additional library calls.

Richard

unread,

Oct 8, 2007, 3:54:05 PM10/8/07

to

John Bode <john...@my-deja.com> writes:

> On Oct 8, 1:49 pm, Richard <rgr...@gmail.com> wrote:
>> "Joachim Schmitz" <nospam.j...@schmitz-digital.de> writes:
>> > "Richard" <rgr...@gmail.com> schrieb im Newsbeitrag
>> >news:dj5qt4-...@news.individual.net...
>
> [snip]
>
>>
>> >> Don't defend code just because of the source.
>> > ?? You mean RH being the source and I defend it because of that? Or what
>> > else are youn trying to say here?
>>
>> Just that. I can see no advantages whatsoever to the original code. It
>> is wordy,
>
> Translation: easy to read and understand.

Nope. Not at all. Long winded and overly wordy. Far more difficult to
determine its use "at a glance.

>
>> spacey
>
> Translation: easy to follow.

No. Spacey and using unnecessary real estate.

>
>> and inefficient.
>
> Facts not in evidence. As others have pointed out, strcpy() may be
> implemented in such a way that is faster than copying individual
> characters in a loop. And elsethread RH has pointed out that this
> code gets called *once*, that he has benchmarked it for obnoxiously
> long inputs (tens of megabytes), and that the performance is perfectly
> acceptable (< 1 us according to his profiler).

Regardless, unnecessary call. Sorry.

>
> Antonius' replacement is terse to the point of being painful to read
> (at least for these 42-year-old eyes). Some judicious use of
> whitespace would help a *lot* (I almost missed the multiple assignment
> the first time through); even so, the conditional expression is

It was obvious to me and I suspect any C programmer who works daily with
C.

> especially jarring, and it took me a couple of minutes to convince
> myself it was doing what I thought it was (as opposed to taking almost
> no time to understand the original). Yes, it uses common C idioms,
> but not all of us use all idioms equally (I almost never use a
> conditional expression in my day-to-day work, nor do I commonly use
> the comma operator, and I've used the two in combination maybe twice
> in the last 15 years). Until someone profiles that code vs.
> Richard's, it's an open question as to whether it's really more
> efficient, or if that gain in efficiency is worth the cost of
> readability.

I agree about the comma. Hence my "improvement" by moving the s++ to the
body. Not only for reduced code, but for a suitable debugger point.

>
>>
>>
>>
>> >> The second version is
>> >> superior for production level code in at least two ways:
>>
>> >> 1) screen real estate (this IS important in the real world).
>
> Less so than readability and maintainability*. I will happily
> sacrifice screen real estate for code that's easy to understand.

those 4 lines are MUCH more maintainable then Heathfields.

>
>> >> 2) less library calls. less to worry about IMO.
>
> This makes no sense to me; if anything in your code is going to work
> reasonably well, I would expect it to be the library calls.

I dont know. I had to take a double look as to WHY he called
strcpy. There was simply no need.

>
>> > We're talking about one call to strcpy, which in a decent implementation is
>> > lightning fast and higly optimized, so saving it doesn't really save
>> > much.
>>
>> So, which, as a C programmer, do you prefer?
>
> RH's. It's easier to grok at first glance. It would be easier to fix
> if a bug was found. It's easier to explain how it works. It's easier
> to verify that it's correct.

I disagree on almost every point you have made. As is our right :-)

>
>>
>> To me there is only one contender. Sorry and all that but I feel
>> strongly about it. Really.
>>
>
> As do I. I've maintained code that read like RH's and code that read
> like Antonius'. I strongly prefer working with the former, if for no
> other reason than obvious flaws are easier to spot and fix. I *used*
> to write code like Antonius' version, and it *always* bit me in the
> ass six to nine months later when I'd forgotten about it and had to
> figure out how it worked again.

If you found those 4 lines hard to understand than, to be honest, I
wonder about your C experience. it was a basic string transfer with
character replacement. I dont mean this as an insult, but I have never
experienced code like RHs getting through code reviews on systems I have
worked on. It would be called "inefficient" and "beginner like".

>
> * Production code must be, in order: 1. Correct (it doesn't matter
> how fast your code is if it does the wrong thing); 2. Maintainable (it

Are there bugs in the replacement?

> doesn't matter how fast your code is if it can't be fixed or
> upgraded); 3. Robust (it doesn't matter how fast your code is if it
> dies at the first hint of bad input); 4. Efficient, but not at the
> cost of 1, 2, or 3.

Agreed. All of which are met by the replacement.

Less lines also tend to translate to less errors.

Antoninus Twink

unread,

Oct 8, 2007, 4:06:31 PM10/8/07

to

On 8 Oct 2007 at 18:24, santosh wrote:
>> And who on earth has said anything about breaking your nose? It sounds
>> to me like you're suffering from some form of paranoia.
>
> The email address of the poster who wrote this threatening message:
><http://groups.google.com/group/alt.comp.lang.learn.c-c++/msg/82c5c4b2e59984e0?dmode=source>
>
> And this message:
><http://groups.google.com/group/comp.lang.c/msg/b571a9a155facb48?dmode=source
> &utoken=0qjyGSsAAAAoZoYpti9uhwqsYFEYUqETYYZ-k6tkIFJmSBA4M7hURISd8ZKZA3rNsODuE9WiCO4>
>
> begins identically.
>
> Your messages in this thread share with the second message linked above,
> identical "User-Agent", "NNTP-Posting-Host" headers as well as the
> timezone.
>
> A is related to B.
> B is realted to C.
>
> So
>
> A is related to C.

Except that both connections are, to say the least, tenuous. I mean,
people with slightly similar email addresses *must* be the same! And
after all, there's a *unique* slrn user who posts through aioe!

Put the two together and you get something completely far-fetched. Do
you honestly believe that I, Paulcr, or anyone else would disappear for
5 years then suddently reappear, still nursing a grudge against Mr
Heathfield?

On the subject of names, I've been criticized for writing HeathField as
two words (hardly unreasonable, given that Heath and Field are... well,
both words), but I notice that Joachim and John Bode have both mistaken
my pseudonym for Antonius in this thread. What's sauce for the goose...

But really, it's just amazing how ready people are to shy away from
actual technical discussion and resort to mud-slinging instead.

Richard

unread,

Oct 8, 2007, 4:37:34 PM10/8/07

to

Antoninus Twink <nos...@nospam.com> writes:

I am astonished that people claiming to be professional programmers
could be in any way "confused" or "unsure" about your concise
replacement for Heathfield's version.

,----
| while(*u++=(*s=='.' ? '_' : *s))
| s++;
`----

I simply can not see the complication or the need to "analyse" this. Yes
if we were training in lesson 2 we might expand it out a little. but
only to the extent of removing the ?: usage to an if then else.

Willem

unread,

Oct 8, 2007, 4:42:31 PM10/8/07

to

Richard wrote:
) Nope. Not at all. Long winded and overly wordy. Far more difficult to
) determine its use "at a glance.

And the proposed improvement is overly terse and compressed.
It might be that someone who writes C every day can easily understand what
it does, but you can't assume that the maintenance programmer that comes
across your code in a year or two is that well-versed in C.
This is one possible implementation:

char *dot_to_underscore(const char *src)
{
char *ret;
size_t i;

ret = malloc(strlen(src) + 1);
if (ret == NULL)
return NULL;

for (i = 0; src[i] != '\0'; i++) {
if (src[i] == '.') {
ret[i] = '_';
} else {
ret[i] = src[i];
}
}
ret[i] = '\0';

return ret;
}

This is another, much more suited for reading by non-C programmers:
char *dot_to_underscore(const char *src)
{
return substitute(copy_string(src), '.', '_');
}

(Of course, you need to have suitable 'substitute' and 'copy_string'
functions.)

What I'm trying to say is: it's entirely possible that your code
will be read by someone who's not a C guru. And I can tell you, lots
of people out there maintaining C code only have passing familiarity.

By the way, the perl equivalent would be:
(ret = str) =~ s/\./_/g;
But that's off-topic.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

John Bode

unread,

Oct 8, 2007, 5:28:44 PM10/8/07

to

On Oct 8, 2:54 pm, Richard <rgr...@gmail.com> wrote:

> John Bode <john_b...@my-deja.com> writes:
> > On Oct 8, 1:49 pm, Richard <rgr...@gmail.com> wrote:
> >> "Joachim Schmitz" <nospam.j...@schmitz-digital.de> writes:
> >> > "Richard" <rgr...@gmail.com> schrieb im Newsbeitrag
> >> >news:dj5qt4-...@news.individual.net...
>

[snip]

> >> and inefficient.

>
> > Facts not in evidence. As others have pointed out, strcpy() may be
> > implemented in such a way that is faster than copying individual
> > characters in a loop. And elsethread RH has pointed out that this
> > code gets called *once*, that he has benchmarked it for obnoxiously
> > long inputs (tens of megabytes), and that the performance is perfectly
> > acceptable (< 1 us according to his profiler).
>
> Regardless, unnecessary call. Sorry.
>

FWIW, after four runs on a 200 MB input string, RH's code is on
average 4% faster than AT's code.

Keith Thompson

unread,

Oct 8, 2007, 5:29:13 PM10/8/07

to

I think I've misunderstood what *you* meant. I've been assuming that
you thought RH was claiming that the code doesn't work, and that your
"Word games don't even begin to cover it" remark was aimed at RH.
Looking back at this subthread, I see that this may have been a
misjudgement on my part.

In my defense, your defending Richard Heathfield is so out of
character for you that it didn't occur to me that you were doing so.
And I still don't know what "word games" is supposed to refer to.

> He didn't even bother to read the code as he made clear in a later post.

Well, he didn't read it in enough depth to determine whether it works
(I don't recall his exact words). And that's why it's not obvious to
him that ... well, see above.

Mark McIntyre

unread,

Oct 8, 2007, 5:32:22 PM10/8/07

to

On Sun, 07 Oct 2007 23:15:21 -0400, in comp.lang.c , CBFalconer
<cbfal...@yahoo.com> wrote:

>Mark McIntyre wrote:
>> Antoninus Twink <nos...@nospam.com> wrote:
>>
>... snip ...
>>
>>> char *t, *u;
>>> if(t=u=malloc(strlen(s)+1))
>>
>> hilarious !
>
>What is hilarious? It should detect the failure of malloc quite
>reliably. Of course the lack of blanks is rather foul.

in a post complaining about lack of clarity...

--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan

santosh

unread,

Oct 8, 2007, 5:33:25 PM10/8/07

to

John Bode wrote:

Interesting! This could be a case of simpler constructions being more
easily understood by the optimiser.

Mark McIntyre

unread,

Oct 8, 2007, 5:33:44 PM10/8/07

to

On Mon, 08 Oct 2007 17:27:15 +0000, in comp.lang.c , Richard
Heathfield <r...@see.sig.invalid> wrote:

>Well, I agree, but you appear to have misunderstood it nonetheless. I said
>*precisely* what I meant to say

You are the Cheshire Cat and I claim my five pounds.
gd&r

Mark McIntyre

unread,

Oct 8, 2007, 5:35:04 PM10/8/07

to

On Mon, 8 Oct 2007 20:12:27 +0200 (CEST), in comp.lang.c , Antoninus
Twink <nos...@nospam.com> wrote:

>On 8 Oct 2007 at 17:38, Richard Heathfield wrote:
>> Not least the need to compress functionality into the fewest possible
>> source characters at the expense of readability and maintainability. I
>> recognise that a terse style is considered "cool" by some, but in my
>> experience it is merely expensive.
>
>Wading through 20 lines of code

"Wading"?

>that do a 4-line job in a roundabout way can also be expensive.

This is why we invented computers - to do tricky things like reading
20 lines of code....

Malcolm McLean

unread,

Oct 8, 2007, 5:47:19 PM10/8/07

to

"Willem" <wil...@stack.nl> wrote in message

> This is another, much more suited for reading by non-C programmers:
> char *dot_to_underscore(const char *src)
> {
> return substitute(copy_string(src), '.', '_');
> }
>

My thought too. Here's my enhancement to the code.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/*
replace all instances of target string with destination
Params: src - the input string
tar - target string to search for
rep - replacement string
Returns: malloced pointer to substituted string
Notes: greedy replacement, overlapping targets replaced only once.
*/
char *replace(const char *src, const char *tar, const char *rep)
{
char *sub;
char *answer;
const char *ptr;
char *out;
size_t srclen;
size_t tarlen;
size_t replen;
int Nmatches = 0;

srclen = strlen(src);
tarlen = strlen(tar);
replen = strlen(rep);

assert(tarlen > 0);

ptr = src;
while(sub = strstr(ptr, tar))
{
Nmatches++;
ptr = sub + tarlen;
}
answer = malloc(srclen + Nmatches * replen - Nmatches * tarlen + 1);
if(!answer)
return 0;

ptr = src;
out = answer;
while(sub = strstr(ptr, tar))
{
memcpy(out, ptr, sub - ptr);
out += sub - ptr;
memcpy(out, rep, replen);
out += replen;
ptr = sub + tarlen;
}
strcpy(out, ptr);

return answer;
}

/*
Let's unit test it.
*/
int main(int argc, char **argv)
{
char *result = 0;

printf("Replace all instances of arg2 with arg2 in arg1\n");
if(argc != 4)
exit(EXIT_FAILURE);

result = replace(argv[1], argv[2], argv[3]);
if(result)
printf("%s\n", result);
else
{
fprintf(stderr, "Out of memory\n");
exit(EXIT_FAILURE);
}
free(result);
return 0;
}

The OP's solution was a hacker's approach to the problem. Sometimes that is
what you need, for instance if the replacement function were a bottleneck.
But generally you don't. It is hard to read, hard to translate to other
programming languages.

But is my function actually an improvement?

result = replace(str, ".", "_");

is certainly as readable as the original call, in fact more so, because in
replace() we have a reusable function that can be called many times, and its
behaviour memorised.

However it is substantially more code. It is quite probably over-engineering
the answer. The there are certian problems - what is meant to happen if
target strings overlap, for instance, or if caller passes the empty string
as a target? If it is in a time critical portion of the code, it is
unacceptably slow. Whilst Richard Heathfield's code was lackadisical, it was
bug free and did work.

At the end of the day, it depends on your project. If code quality is
paramount, use my approach, if it is just knock-up code use one of the
others.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Richard Heathfield

unread,

Oct 8, 2007, 6:09:01 PM10/8/07

to

John Bode said:

<snip>

> FWIW, after four runs on a 200 MB input string, RH's code is on
> average 4% faster than AT's code.

I couldn't possibly comment. :-)

Richard Heathfield

unread,

Oct 8, 2007, 6:10:49 PM10/8/07

to

Keith Thompson said:

> Richard <rgr...@gmail.com> writes:

<snip>

>> He didn't even bother to read the code as he made clear in a later post.
>
> Well, he didn't read it in enough depth to determine whether it works

Quite so. I'm not sure why Richard Riley finds this hard to understand.

Old Wolf

unread,

Oct 8, 2007, 6:14:29 PM10/8/07

to

On Oct 9, 9:37 am, Richard <rgr...@gmail.com> wrote:
> I am astonished that people claiming to be professional programmers
> could be in any way "confused" or "unsure" about your concise
> replacement for Heathfield's version.
>
> ,----
> | while(*u++=(*s=='.' ? '_' : *s))
> | s++;
> `----

That is NOT the replacement code suggested by
"Antoninus Twink". Perhaps you should get one of
those threaded newsreaders you keep whingeing
that everybody else should have, so that you can
check your facts before posting.

For the actual code, see:
http://groups.google.co.nz/group/comp.lang.c/msg/f2e171fc9691cd7d

Antoninus Twink

unread,

Oct 8, 2007, 6:27:20 PM10/8/07

to

On 8 Oct 2007 at 22:14, Old Wolf wrote:
> On Oct 9, 9:37 am, Richard <rgr...@gmail.com> wrote:
>> I am astonished that people claiming to be professional programmers
>> could be in any way "confused" or "unsure" about your concise
>> replacement for Heathfield's version.
>>
>> ,----
>> | while(*u++=(*s=='.' ? '_' : *s))
>> | s++;
>> `----
>
> That is NOT the replacement code suggested by
> "Antoninus Twink". Perhaps you should get one of
> those threaded newsreaders you keep whingeing
> that everybody else should have, so that you can
> check your facts before posting.

You are splitting hares. It is functionally equivalent - its advantage
is a slightly shorter total length, but in exchange the loop doesn't
have an empty body.

And I find it amusing that you object to me posting under a Usenet
handle, when your own posts are from "Old Wolf".

Tor Rustad

unread,

Oct 8, 2007, 6:37:32 PM10/8/07

to

John Bode wrote:

[...]

> FWIW, after four runs on a 200 MB input string, RH's code is on
> average 4% faster than AT's code.

ROFL!

However, the optimizer can be playing tricks with you here. Anyway, RH
code was 96% more readable and maintainable, so OP was Trolling.

--
Tor <torust [at] online [dot] no>

CBFalconer

unread,

Oct 8, 2007, 5:32:58 PM10/8/07

to

Thad Smith wrote:

> Antoninus Twink wrote:
>> Richard Heathfield wrote:
>>> Antoninus Twink said:
>
>> The function is a completely trivial one, yet I can't see it all

>> at once in my editor without scrolling! Whitespace can help
>> readability, but excessive whitespace can reduce it, and at the
>> same time give too much weight to things that aren't important.
>>
>>>> char *dot_to_underscore(const char *s)
>>>> {
>>>> char *t = malloc(strlen(s) + 1);
>>>> if(t != NULL)
>>>> {
>>>> char *u;
>>>> strcpy(t, s);
>>>> u = t;
>>>> while(*u)
>>>> {
>>>> if(*u == '.')
>>>> {
>>>> *u = '_';
>>>> }
>>>> ++u;
>>>> }
>>>> }
>>>> return
>>>> t;
>>>> }
>>>>
>>>> Proposed solution:

>>>>
>>>> char *dot_to_underscore(const char *s)
>>>> {
>>>> char *t, *u;
>>>> if(t=u=malloc(strlen(s)+1))

>>>> while(*u++=(*s=='.' ? s++, '_' : *s++));

>>>> return t;

>>>> }
>>> It is not obvious to me that this code correctly replaces the
>>> code I wrote.
>>
>> If you believe that it doesn't correctly replace the code you

>> wrote, it would be easy to demonstrate that by pointing out a
>> specific input s for which it gives a different result, or an
>> error (syntax error or undefined behavior or whatever).
>
> What happens when malloc returns a null pointer?

While I am basically in favor of the 'tauter code' group, my
rewrite would have been different. For one thing, I don't like the
? coding. My solution:

char *dot_to_underscore(const char *s) {
char *t, *u;

if (!(t = u = malloc(strlen(s) + 1))) {
while (*u = *s++) {
if (*u == '.') *u = '_';
++u;
}
}
return t;
}

However, I dislike taking multiple scans of the same strings, so I
would probably have arranged for the routine to return strlen, and
a negative length if malloc fails. This decision depends heavily
on the use to which the routine is put.

All this is a non-factor, and basically tests individual styles and
preferances. While worthy of a discussion, it is not worth an
argument.

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

santosh

unread,

Oct 8, 2007, 6:48:33 PM10/8/07

to

CBFalconer wrote:

You mean you commence copy when malloc fails?

Tor Rustad

unread,

Oct 8, 2007, 6:51:05 PM10/8/07

to

santosh wrote:
> John Bode wrote:

[...]

>> FWIW, after four runs on a 200 MB input string, RH's code is on
>> average 4% faster than AT's code.
>
> Interesting! This could be a case of simpler constructions being more
> easily understood by the optimiser.

I don't think so. My guess would be, either the measurement is
misleading, or strcpy() help alignment and execute some code in parallel.

--
Tor <torust [at] online [dot] no>

C-FAQ: http://c-faq.com/

Tor Rustad

unread,

Oct 8, 2007, 6:55:33 PM10/8/07

to

CBFalconer wrote:
> Thad Smith wrote:

[...]

>> What happens when malloc returns a null pointer?
>
> While I am basically in favor of the 'tauter code' group, my
> rewrite would have been different. For one thing, I don't like the
> ? coding. My solution:
>
> char *dot_to_underscore(const char *s) {
> char *t, *u;
>
> if (!(t = u = malloc(strlen(s) + 1))) {

That should run very fast! :-)

santosh

unread,

Oct 8, 2007, 7:10:19 PM10/8/07

to

Tor Rustad wrote:

> santosh wrote:
>> John Bode wrote:
>
> [...]
>
>>> FWIW, after four runs on a 200 MB input string, RH's code is on
>>> average 4% faster than AT's code.
>>
>> Interesting! This could be a case of simpler constructions being more
>> easily understood by the optimiser.
>
> I don't think so. My guess would be, either the measurement is
> misleading, or strcpy() help alignment and execute some code in
> parallel.

Well I did a small comparison test between the two version on a string
of length 209,715,200 bytes, with the '.' character just before the
terminating null. I used clock to time the functions. For four runs for
each version here are the averages:

RH's version = 1.480000s
AT's version = 1.212500s

[system is Pentium Dual Core 1.6 GHz with 1 Gb RAM]
So for this system at least, AT's version is significantly faster.

If anyone wants, I'll post the driver code.

Keith Thompson

unread,

Oct 8, 2007, 7:15:19 PM10/8/07

to

Willem <wil...@stack.nl> writes:
[...]

> By the way, the perl equivalent would be:
> (ret = str) =~ s/\./_/g;
> But that's off-topic.

Indeed, which is why I won't mention that the *correct* perl
equivalent is:

($ret = $str) =~ s/\./_/g;

Richard Tobin

unread,

Oct 8, 2007, 7:07:11 PM10/8/07

to

In article <feedci$1hk$1...@aioe.org>, santosh <santo...@gmail.com> wrote:

>Well I did a small comparison test between the two version on a string
>of length 209,715,200 bytes, with the '.' character just before the
>terminating null. I used clock to time the functions. For four runs for
>each version here are the averages:
>
>RH's version = 1.480000s
>AT's version = 1.212500s

If you can be bothered, what happens if you replace the strcpy() in
RH's code with a memcpy() (the length being already known)?

-- Richard
--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.

Richard Heathfield

unread,

Oct 8, 2007, 7:31:47 PM10/8/07

to

santosh said:

<snip>

> RH's version = 1.480000s
> AT's version = 1.212500s
>
> [system is Pentium Dual Core 1.6 GHz with 1 Gb RAM]
> So for this system at least, AT's version is significantly faster.

No, it isn't significantly faster. The function is called *once* by the
program of which it is a part. On my system, it takes less than a
microsecond to run. Let's round up and call it one microsecond. According
to your timings, AT's version saves (1.48 - 1.2125)/1.48 = approximately
0.180743 of the time. This equates to 181 nanoseconds per program run.

To save as much as a single second, you'd have to run the program well over
five *million* times. If it only took me one minute to verify the change,
update the source, recompile, and re-test, I'd still have to run the code
three hundred million times just to break even. I type at 40wpm, so I can
probably manage to add one error message identifier and meaningful message
text in about five seconds. 300,000,000 * 5 = 1,500,000,000 seconds. So
breaking even would take 47 years (assuming I did nothing else for the
next 47 years, such as eating and sleeping and actual programming and
watching LOTR and so on).

So in fact it would be counter-productive to adopt the change, even if I
thought it were an improvement, which I don't.

santosh

unread,

Oct 8, 2007, 7:30:54 PM10/8/07

to

Richard Tobin wrote:

> In article <feedci$1hk$1...@aioe.org>, santosh <santo...@gmail.com>
> wrote:
>
>>Well I did a small comparison test between the two version on a string
>>of length 209,715,200 bytes, with the '.' character just before the
>>terminating null. I used clock to time the functions. For four runs
>>for each version here are the averages:
>>
>>RH's version = 1.480000s
>>AT's version = 1.212500s
>
> If you can be bothered, what happens if you replace the strcpy() in
> RH's code with a memcpy() (the length being already known)?

The average time drops to 1.320000s. Not much.

santosh

unread,

Oct 8, 2007, 7:34:47 PM10/8/07

to

Richard Heathfield wrote:

> santosh said:
>
> <snip>
>
>> RH's version = 1.480000s
>> AT's version = 1.212500s
>>
>> [system is Pentium Dual Core 1.6 GHz with 1 Gb RAM]
>> So for this system at least, AT's version is significantly faster.
>
> No, it isn't significantly faster. The function is called *once* by
> the program of which it is a part. On my system, it takes less than a
> microsecond to run. Let's round up and call it one microsecond.
> According to your timings, AT's version saves (1.48 - 1.2125)/1.48 =
> approximately 0.180743 of the time. This equates to 181 nanoseconds
> per program run.
>
> To save as much as a single second, you'd have to run the program well
> over five *million* times. If it only took me one minute to verify the
> change, update the source, recompile, and re-test, I'd still have to
> run the code three hundred million times just to break even. I type at
> 40wpm, so I can probably manage to add one error message identifier
> and meaningful message text in about five seconds. 300,000,000 * 5 =
> 1,500,000,000 seconds. So breaking even would take 47 years (assuming
> I did nothing else for the next 47 years, such as eating and sleeping
> and actual programming and watching LOTR and so on).
>
> So in fact it would be counter-productive to adopt the change, even if
> I thought it were an improvement, which I don't.

You are right. I got carried away by the literal numbers and failed to
put them perspective with the real world.

Tor Rustad

unread,

Oct 8, 2007, 7:55:22 PM10/8/07

to

santosh wrote:
> Tor Rustad wrote:
>
>> santosh wrote:
>>> John Bode wrote:
>> [...]
>>
>>>> FWIW, after four runs on a 200 MB input string, RH's code is on
>>>> average 4% faster than AT's code.
>>> Interesting! This could be a case of simpler constructions being more
>>> easily understood by the optimiser.
>> I don't think so. My guess would be, either the measurement is
>> misleading, or strcpy() help alignment and execute some code in
>> parallel.
>
> Well I did a small comparison test between the two version on a string
> of length 209,715,200 bytes, with the '.' character just before the
> terminating null. I used clock to time the functions. For four runs for
> each version here are the averages:
>
> RH's version = 1.480000s
> AT's version = 1.212500s

Yeah, that's more like it.

> [system is Pentium Dual Core 1.6 GHz with 1 Gb RAM]
> So for this system at least, AT's version is significantly faster.

I have not seen the fgetline() code, but if the function in question is
called only *once*, then OP "optimized" the *outer* loop, a professional
would go for the *inner* loop.

Your measurement, showed that OP's posted nonsense, an insignificant
micro-optimization.

CBFalconer

unread,

Oct 8, 2007, 6:51:36 PM10/8/07

to

Antoninus Twink wrote:
>
... snip ...

>
> You are splitting hares. It is functionally equivalent - its
> advantage is a slightly shorter total length, but in exchange the
> loop doesn't have an empty body.

Splitting hares will not reduce the poor beasts length, only its
width. The emptiness of the body depends (at least in part) on how
much of the interior spills after the split. :-)

You might want to consider 'splitting hairs'.

John Bode

unread,

Oct 8, 2007, 8:21:38 PM10/8/07

to

On Oct 8, 5:37 pm, Tor Rustad <tor_rus...@hotmail.com> wrote:
> John Bode wrote:
>
> [...]
>
> > FWIW, after four runs on a 200 MB input string, RH's code is on
> > average 4% faster than AT's code.
>
> ROFL!
>
> However, the optimizer can be playing tricks with you here.

Oh, no doubt. But that kind of reinforces the idea that writing code
in a clear, straightforward way (that is, clear to anyone who *isn't*
an expert in C) is better than trying to save screen real estate.

I'm rewriting the test harness and running it on my home box (Gentoo);
the box I ran it on earlier was RHEL 3 running in a VMware session on
an XP Pro box that's running eleventy billion server processes. I'm
also going to try it with various string lengths and optimization
options.

> Anyway, RH
> code was 96% more readable and maintainable, so OP was Trolling.
>

Again, no doubt.

Richard

unread,

Oct 8, 2007, 8:39:06 PM10/8/07

to

John Bode <john...@my-deja.com> writes:

If you prefer RHs code I am astonished. But it takes all sorts I
support. it's not "bad code" by any stretch of the imagination.

As was pointed out earlier that kind of speed increase/decrease is
generally immaterial however, I know for a fact that the time saved in
debugging and reading it is worth its weight in gold if you can not
understand such a bog standard 2 lines then there are issues in your
understanding of C.

while(*u++=(*s=='.' ? '_' : *s))
s++;

It couldn't be easier.

Old Wolf

unread,

Oct 8, 2007, 10:22:19 PM10/8/07

to

On Oct 9, 11:27 am, Antoninus Twink <nos...@nospam.com> wrote:
> On 8 Oct 2007 at 22:14, Old Wolf wrote:
> > On Oct 9, 9:37 am, Richard <rgr...@gmail.com> wrote:
> >> ,----
> >> | while(*u++=(*s=='.' ? '_' : *s))
> >> | s++;
> >> `----
>
> > That is NOT the replacement code suggested by
> > "Antoninus Twink".
>

> You are splitting hares. It is functionally equivalent - its advantage
> is a slightly shorter total length, but in exchange the loop doesn't
> have an empty body.

The difference is more than you realise; here we are
talking about code readability and maintainability and
the modified version has two major improvements to your
original:
* The s++ only occurs once
* It has a loop body

> And I find it amusing that you object to me posting under a Usenet
> handle, when your own posts are from "Old Wolf".

I'm not objecting to you posting under a handle. Also,
I find it amusing that you think it improves code if
you can make a loop body empty.

CBFalconer

unread,

Oct 8, 2007, 8:22:09 PM10/8/07

to

santosh wrote:
> CBFalconer wrote:
>
... snip ...

>
>> While I am basically in favor of the 'tauter code' group, my
>> rewrite would have been different. For one thing, I don't like
>> the ? coding. My solution:
>>
>> char *dot_to_underscore(const char *s) {
>> char *t, *u;
>>
>> if (!(t = u = malloc(strlen(s) + 1))) {
>
> You mean you commence copy when malloc fails?
>
>> while (*u = *s++) {
>> if (*u == '.') *u = '_';
>> ++u;
>> }
>> }
>> return t;
>> }
>>
>> However, I dislike taking multiple scans of the same strings, so I
>> would probably have arranged for the routine to return strlen, and
>> a negative length if malloc fails. This decision depends heavily
>> on the use to which the routine is put.
>>
>> All this is a non-factor, and basically tests individual styles and
>> preferances. While worthy of a discussion, it is not worth an
>> argument.

Yup. Slight overexuberance with the bang.

柑\/b

unread,

Oct 9, 2007, 3:31:52 AM10/9/07

to

In data Tue, 09 Oct 2007 02:39:06 +0200, Richard scrisse:

this is easier
goto l2;
l1: ++s; ++u;
l2: *u=(*s=='.'? '_': *s);
if(*s) goto l1;

in 2 lines:
goto l2;
l1: ++s; ++u; l2: *u=(*s=='.'? '_': *s); if(*s) goto l1;

in one line with #==goto

#l2; l1: ++s; ++u; l2: *u=(*s=='.'? '_': *s); if(*s)#l1;

Philip Potter

unread,

Oct 9, 2007, 3:50:19 AM10/9/07

to

Richard wrote:
> Why would you post that?
>
> Off Topic for a start and NOTHING to do with C.
>
> The poster was quite correct in his improvements of RHs code. Regardless
> of who or what he is.

Really? I see neither bug nor gross inefficiency in RH's original, which
was his main claim. His code posted is not grossly more efficient - it's
only a constant factor faster, not even a different big-O efficiency class.

--
Philip Potter pgp <at> doc.ic.ac.uk

Keith Thompson

unread,

Oct 9, 2007, 3:59:03 AM10/9/07

to

"柑\\/b" <al@f.g> writes:
> In data Tue, 09 Oct 2007 02:39:06 +0200, Richard scrisse:

[...]

>> while(*u++=(*s=='.' ? '_' : *s))
>> s++;
>
>>It couldn't be easier.
>
> this is easier
> goto l2;
> l1: ++s; ++u;
> l2: *u=(*s=='.'? '_': *s);
> if(*s) goto l1;
>
>
> in 2 lines:
> goto l2;
> l1: ++s; ++u; l2: *u=(*s=='.'? '_': *s); if(*s) goto l1;
>
> in one line with #==goto
>
> #l2; l1: ++s; ++u; l2: *u=(*s=='.'? '_': *s); if(*s)#l1;

Stripped of whitespace, compressed with gzip, and base64 encoded:

H4sIAIUzC0cAA1POMbJWyDG0UtDWLrYGEqVAnpGVglaprYZWsa2tnr1CPJBXrGmt
kJkGFNFUzjG05gIAN60ghDUAAAA=

Ahh, much better.

柑\/b

unread,

Oct 9, 2007, 4:06:00 AM10/9/07

to

In data Mon, 8 Oct 2007 00:16:07 +0200 (CEST), Antoninus Twink
scrisse:

>The function below is from Richard HeathField's fgetline program. For
>some reason, it makes three passes through the string (a strlen(), a
>strcpy() then another pass to change dots) when two would clearly be
>sufficient. This could lead to unnecessarily bad performance on very
>long strings. It is also written in a hard-to-read and clunky style.
>
>char *dot_to_underscore(const char *s)
>{

> char *t = malloc(strlen(s) + 1);
> if(t != NULL)
> {
> char *u;
> strcpy(t, s);
> u = t;
> while(*u)
> {

> if(*u == '.')

> {
> *u = '_';
> }
> ++u;
> }
> }
> return
> t;
>}

this is a big waste of vertical spaces

>Proposed solution:

>
>char *dot_to_underscore(const char *s)
>{
> char *t, *u;

柑\/b

unread,

Oct 9, 2007, 4:06:04 AM10/9/07

to

In data Tue, 09 Oct 2007 09:31:52 +0200, 柑\/b scrisse:

#.2
.0: B*i='_'; #.1;
.1: ++i,j; .2: al=*j; al=='.'#.0; *i=al; al#.1

柑\/b

unread,

Oct 9, 2007, 4:34:17 AM10/9/07

to

In data Tue, 09 Oct 2007 10:06:04 +0200, 柑\/b scrisse:

>#.2
>.0: B*i='_'; #.1;
>.1: ++i,j; .2: al=*j; al=='.'#.0; *i=al; al#.1

there is jmp more

#.2
.0: B*i='_'

.1: ++i,j; .2: al=*j; al=='.'#.0; *i=al; al#.1

that should be something like
jmp short .2
.0: mov byte [esi], '.'
.1: inc esi
inc edi
.2: mov al, [edi]
cmp al, '.'
je .0
mov [esi], al
cmp al, 0
jne .1;

Antoninus Twink

unread,

Oct 9, 2007, 4:35:46 AM10/9/07

to

On 8 Oct 2007 at 23:55, Tor Rustad wrote:
> I have not seen the fgetline() code, but if the function in question is
> called only *once*, then OP "optimized" the *outer* loop, a professional
> would go for the *inner* loop.
>
> Your measurement, showed that OP's posted nonsense, an insignificant
> micro-optimization.

The point was that Mr Heathfield's code was a micro-anti-optimization.
If making the code harder to read in exchange for a small increase in
speed is bad, how much worse is it to make the code harder to read by
implementing a convoluted two-pass algorithm that's more complex and
slower than the natural, idiomatic one?

Or, if someone makes the small decisions badly, does that give us any
faith that they'll do better on the big decisions?

santosh

unread,

Oct 9, 2007, 4:47:59 AM10/9/07

to

Źa\/b wrote:

> In data Tue, 09 Oct 2007 10:06:04 +0200, Źa\/b scrisse:

>
>
>>#.2
>>.0: B*i='_'; #.1;
>>.1: ++i,j; .2: al=*j; al=='.'#.0; *i=al; al#.1
>
> there is jmp more
>
> #.2
> .0: B*i='_'
> .1: ++i,j; .2: al=*j; al=='.'#.0; *i=al; al#.1

Your programs would be more readable if you decided to use either one of
l33t or brainf?uk.

To get you started:
<http://esoteric.voxelperfect.net/wiki/Brainf?ck>
<http://www.geocities.com/electrodruiduk/l33t.htm>

Richard Heathfield

unread,

Oct 9, 2007, 5:34:51 AM10/9/07

to

Antoninus Twink said:

> On 8 Oct 2007 at 23:55, Tor Rustad wrote:
>> I have not seen the fgetline() code, but if the function in question is
>> called only *once*, then OP "optimized" the *outer* loop, a professional
>> would go for the *inner* loop.
>>
>> Your measurement, showed that OP's posted nonsense, an insignificant
>> micro-optimization.
>
> The point was that Mr Heathfield's code was a micro-anti-optimization.

No, Mr Heathfield's code was written in about a minute, and worked
perfectly first time, and is easy for even a very inexperienced C
programmer to understand. What's more, it is called once per program
invocation, and does its job in less than a microsecond. You can call it
what you like, but I call it a win.

> If making the code harder to read in exchange for a small increase in
> speed is bad,

Whether it is bad depends on the relative merits of performance and
clarity. I have already shown how I would have to use the program 24/7 for
almost half a century before your suggested change could save me so much
as a nanosecond (once the time cost of making that change is factored in),
and quite frankly I have better things to do with my life. So in this
case, the performance increase is meaningless, whereas the loss of clarity
is significant.

> how much worse is it to make the code harder to read by
> implementing a convoluted two-pass algorithm that's more complex and
> slower than the natural, idiomatic one?

We have already demonstrated why "slower" is unimportant. If you seriously
think the original code is hard to read, then words fail me. It's
terribly, terribly simple C. It's astoundingly easy for most C people. I
simply cannot understand why you would find it difficult or complex.

> Or, if someone makes the small decisions badly, does that give us any
> faith that they'll do better on the big decisions?

We have different criteria. You appear to judge code by how "tight" it is.
I prefer to judge it by how readable and maintainable it is.

It is certainly true that I could have merged the copy and the replace
operation, but it is also true that doing so makes practically no
difference to the performance of the code. What you have called "gross
inefficiency" manages to outperform your code on at least one platform (as
reported elsethread), and doesn't perform significantly worse on others.

What's more, my code is easily understood and easily maintained by even
very inexperienced C programmers. For me, that's a key goal, because the
code I write is very often used and perhaps modified in environments over
which I have no control. So I like to keep things simple. If someone
discovers a way that they can shave 180 nanoseconds (that's 0.00000018
seconds!) off a function that is called once per program invocation, I'm
pleased for them, but that doesn't mean I have to incorporate their
changes into my code.

John Bode

unread,

Oct 9, 2007, 8:08:43 AM10/9/07

to

On Oct 7, 5:16 pm, Antoninus Twink <nos...@nospam.com> wrote:
> The function below is from Richard HeathField's fgetline program. For
> some reason, it makes three passes through the string (a strlen(), a
> strcpy() then another pass to change dots) when two would clearly be
> sufficient. This could lead to unnecessarily bad performance on very
> long strings.

It could, except that on my system (Gentoo Linux, 2.6.22 kernel, gcc
4.1.2), it doesn't. In fact, it leads to overall *better* performance
on very long strings. I wrote a test harness that generates random
strings and calls both Richard's and Antonius' versions of
dot_to_underscore, as well as a "control" version (basically a copy of
Richard's, except that it copies characters to the target buffer
manually instead of using strcpy()).

I ran three rounds of tests. Each round worked on a 2 MB string
(2097152 bytes), and for each run of the program, each version of the
function was called 100 times (basically so I could get some usable
times). For the first round, I compiled the code without any
optimization. For the second round, I compiled with the -O1 flag.
For the third round, I compiled with the -O2 flag. I used gprof to
get the time spent in each function.

For each round, I ran the program 10 times, logging the times from
each, and computed the average.

So, calling each function 100 times on a 2 MB string, I got the
following results (average time over 10 runs):

Function RH AT JB
-------- -- -- --
No optimization 2.72 s 3.49 s 4.01 s
Level 1 optimization 1.33 s 1.93 s 1.74 s
Level 2 optimization 0.60 s 1.11 s 1.13 s

Hmmmm. Two things stand out. One, on my system, calling strcpy() is
*definitely* faster than copying each individual character in a loop
(as that is the only difference between RH and JB). Secondly, writing
code in a straightforward, "clunky" style seems to make it easier to
optimize.

Of course, these results are valid *only* for my particular system.
I'd be interested to see the results from different hardware/OS/
compiler combinations.

> It is also written in a hard-to-read and clunky style.
>
> char *dot_to_underscore(const char *s)
> {
> char *t = malloc(strlen(s) + 1);
> if(t != NULL)
> {
> char *u;
> strcpy(t, s);
> u = t;
> while(*u)
> {
> if(*u == '.')
> {
> *u = '_';
> }
> ++u;
> }
> }
> return
> t;
>
> }
>

> Proposed solution:
>
> char *dot_to_underscore(const char *s)
> {
> char *t, *u;
> if(t=u=malloc(strlen(s)+1))
> while(*u++=(*s=='.' ? s++, '_' : *s++));
> return t;
>
> }

It's time to re-learn those two important lessons:

1. Terse code does not necessarily equate to faster code
2. The only way to know what version of code will be more efficient
is to profile all versions under consideration.

More bad code is written in the name of "efficiency" than anything
else.

Richard

unread,

Oct 9, 2007, 8:15:54 AM10/9/07

to

John Bode <john...@my-deja.com> writes:

> On Oct 7, 5:16 pm, Antoninus Twink <nos...@nospam.com> wrote:
>> The function below is from Richard HeathField's fgetline program. For
>> some reason, it makes three passes through the string (a strlen(), a
>> strcpy() then another pass to change dots) when two would clearly be
>> sufficient. This could lead to unnecessarily bad performance on very
>> long strings.
>
> It could, except that on my system (Gentoo Linux, 2.6.22 kernel, gcc
> 4.1.2), it doesn't. In fact, it leads to overall *better* performance
> on very long strings. I wrote a test harness that generates random

On very long strings yes. Why? because there is no comparison blocking
every substitution/transfer. Clearly the strcpy will be faster.

The number of "." will be a big input as well.

>
> It's time to re-learn those two important lessons:
>
> 1. Terse code does not necessarily equate to faster code
> 2. The only way to know what version of code will be more efficient
> is to profile all versions under consideration.
>
> More bad code is written in the name of "efficiency" than anything
> else.

There are other lessons to learn:

1) Anyone can engineer any results when they feel the need. 2 meg
strings? for a 4% increase? No mention of the data involved? Come off it.
2) The code above is not to do with pure speed at runtime

The code above is not "terse". It is bog standard C. Sorry.

It is most certainly not "bad code" - it is a damn site easier to look
at and maintain than RHs in my humber opinion.

2 lines of core code beats 15 any day of the week when the functionality
is so trivial. And it IS trivial. AND we are not taling 2k long lines
here either. Basic pointer dereference and assignments.

Antoninus Twink

unread,

Oct 9, 2007, 8:20:41 AM10/9/07

to

On 9 Oct 2007 at 9:34, Richard Heathfield wrote:

> Antoninus Twink said:
>> If making the code harder to read in exchange for a small increase in
>> speed is bad,
>
> Whether it is bad depends on the relative merits of performance and
> clarity. I have already shown how I would have to use the program 24/7 for
> almost half a century before your suggested change could save me so much
> as a nanosecond (once the time cost of making that change is factored in),
> and quite frankly I have better things to do with my life. So in this
> case, the performance increase is meaningless, whereas the loss of clarity
> is significant.

But exactly the opposite is true - clarity is lost in *your* version, by
taking something simple and making a meal of it. The average programmer
looking at your code is going to do a double-take, wonder why on earth
that strcpy() is there when you immediately make another pass through
the string, and have to convince himself that no, there really isn't any
funny business going on.

Compare that to two lines of idiomatic code.

>> how much worse is it to make the code harder to read by
>> implementing a convoluted two-pass algorithm that's more complex and
>> slower than the natural, idiomatic one?
>
> We have already demonstrated why "slower" is unimportant. If you seriously
> think the original code is hard to read, then words fail me. It's
> terribly, terribly simple C. It's astoundingly easy for most C people. I
> simply cannot understand why you would find it difficult or complex.

Yes, it's terribly, terribly simple. Almost babyish even. I did not mean
that your *code* is complex, but that the *algorithm* is complex. And
you will say, "But it's a simple algorithm!" Right, it isn't complex by
any objective standard of complexity, but it's *more complex than it
needs to be* - why swap a simple single-pass algorithm for a 2-pass
algorithm?

And if you make simple things over-complicated, we might not
unreasonably suspect that you might make complicated things into a
complete mess.