Return value for strcat and similar functions

David Brown

May 6, 2015, 4:02:59 AM
Does anyone know why the return value for strcat and similar library
functions is defined in such an unhelpful (IMHO) way?

The return value of strcat(s1, s2) is always s1, which is a value you
already have available. Returning a pointer to the end of s1 (i.e., s1
+ strlen(s1)) would be much more efficient when building up a string
from parts.

Xavier Roche

May 6, 2015, 4:35:06 AM
On 06/05/2015 10:02, David Brown wrote:
> Does anyone know why the return value for strcat and similar library
> functions is defined in such an unhelpful (IMHO) way?

To allow things such as printf("foo=%s", strcat(foo, bar))?

My feeling is that there is no rationale behind it, and that this was an early historical choice. The [BSD] strlcat() function probably has a much better choice of return value in this regard, IMHO.
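For readers who have not met it, here is a minimal sketch of the return convention strlcat() uses, assuming the BSD semantics (the return value is the length of the string it tried to build). The name sketch_strlcat is hypothetical, chosen so it cannot clash with a system strlcat(); it is an illustration, not the BSD source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for BSD strlcat(): append src to dst without
   ever using more than size bytes of dst in total. Returns the length
   the full result would have had, so a return value >= size signals
   truncation. */
size_t sketch_strlcat(char *dst, const char *src, size_t size)
{
    size_t dlen = 0;
    size_t slen = strlen(src);

    /* Find the end of dst, but never read past size bytes. */
    while (dlen < size && dst[dlen] != '\0')
        dlen++;

    if (dlen == size)               /* no terminator within size bytes */
        return size + slen;

    size_t room = size - dlen - 1;  /* bytes free before the terminator */
    size_t ncopy = slen < room ? slen : room;
    memcpy(dst + dlen, src, ncopy);
    dst[dlen + ncopy] = '\0';

    return dlen + slen;
}
```

The caller gets the resulting length for free and can detect truncation with a single comparison against the buffer size.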

Lőrinczy Zsigmond

May 6, 2015, 10:52:30 AM
This thing is called 'standardization before thinking'.
It's very popular and common. Random examples:
The return value of fgets.
Storing dates in 6 digits.
Defining NULL as integer zero.

Ambiguous standards are worse, though;
for example after fifty years,
you still don't know what BackSpace key will (or should) generate:
^? or ^H

Keith Thompson

May 6, 2015, 11:58:56 AM
Speculation: The early design of C did not go out of its way to allow
for the possibility of buffer overflows. If you assume that your target
array is always going to be big enough, it's reasonable to write:

char s[BIG_ENOUGH];
strcat(strcat(strcpy(s, this), that), the_other_thing);

Personally, even if I were willing to assume the target array is big
enough, I'd probably still break up the calls:

strcpy(s, this);
strcat(s, that);
strcat(s, the_other_thing);

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Richard Heathfield

May 6, 2015, 12:07:58 PM
On 06/05/15 16:58, Keith Thompson wrote:

<snip>

> Speculation: The early design of C did not go out of its way to allow
> for the possibility of buffer overflows. If you assume that your target
> array is always going to be big enough, it's reasonable to write:
>
> char s[BIG_ENOUGH];
> strcat(strcat(strcpy(s, this), that), the_other_thing);
>
> Personally, even if I were willing to assume the target array is big
> enough, I'd probably still break up the calls:
>
> strcpy(s, this);
> strcat(s, that);
> strcat(s, the_other_thing);

And I'd probably use sprintf(s, "%s%s%s", this, that, the_other_thing)
instead. Not because it's quicker (which it might or might not be, since
strcat has to find the end of the string each time, but sprintf has to
read and interpret the format string), but because to my eyes it's a
tiny little bit clearer.
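
A bounded variant of the same idea, as a minimal sketch: C99 snprintf() returns the number of characters the complete result needs, so the length comes back from the call and truncation is detectable. The helper name join3 is hypothetical.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: join three strings into dst (capacity cap).
   Returns the length the full result needs, or a negative value on
   an output error; a return value >= cap means the output was
   truncated. */
int join3(char *dst, size_t cap,
          const char *a, const char *b, const char *c)
{
    return snprintf(dst, cap, "%s%s%s", a, b, c);
}
```

With a 16-byte buffer, join3(s, sizeof s, "ab", "cd", "ef") stores "abcdef" and returns 6.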

--
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within

Morris Dovey

May 6, 2015, 1:01:31 PM
On 5/6/15 11:07 AM, Richard Heathfield wrote:
> On 06/05/15 16:58, Keith Thompson wrote:
>
> <snip>
>
>> Speculation: The early design of C did not go out of its way to allow
>> for the possibility of buffer overflows. If you assume that your target
>> array is always going to be big enough, it's reasonable to write:
>>
>> char s[BIG_ENOUGH];
>> strcat(strcat(strcpy(s, this), that), the_other_thing);
>>
>> Personally, even if I were willing to assume the target array is big
>> enough, I'd probably still break up the calls:
>>
>> strcpy(s, this);
>> strcat(s, that);
>> strcat(s, the_other_thing);
>
> And I'd probably use sprintf(s, "%s%s%s", this, that, the_other_thing)
> instead. Not because it's quicker (which it might or might not be, since
> strcat has to find the end of the string each time, but sprintf has to
> read and interpret the format string), but because to my eyes it's a
> tiny little bit clearer.

Another option is to write a variadic function that takes a
NULL-terminated list of string pointers, and returns a pointer to a
dynamically-allocated exact-size buffer containing the concatenation:

rtss(this,that,the_other_thing,NULL);

It's convenient, clear, smaller, and faster than sprintf() - but
requires remembering to free() the buffer.

A note in the prolog reminds me that I got help from Chris Torek and
posted the code here sometime around August 29, 1999.
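
The rtss() source is not reproduced in this thread, so the following is only a sketch of how such a function might look, not Morris's code. The name concat_all and the two-pass structure are assumptions.

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not rtss(): concatenate a NULL-terminated list
   of strings into a freshly malloc'd buffer of exactly the right
   size. Returns NULL if allocation fails; the caller must free() the
   result. */
char *concat_all(const char *first, ...)
{
    va_list ap;
    size_t total = 0;
    const char *p;

    /* Pass 1: measure every argument. */
    va_start(ap, first);
    for (p = first; p != NULL; p = va_arg(ap, const char *))
        total += strlen(p);
    va_end(ap);

    char *out = malloc(total + 1);
    if (out == NULL)
        return NULL;

    /* Pass 2: copy, keeping a pointer to the current end. */
    char *end = out;
    va_start(ap, first);
    for (p = first; p != NULL; p = va_arg(ap, const char *)) {
        size_t n = strlen(p);
        memcpy(end, p, n);
        end += n;
    }
    va_end(ap);
    *end = '\0';

    return out;
}
```

In a variadic call the sentinel is best written as (char *)NULL, since a bare NULL may expand to a plain integer 0 on some implementations.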

--
Morris Dovey
http://www.iedu.com/Solar

Tim Rentsch

May 6, 2015, 9:26:08 PM
You're assuming that the value of the first argument is readily
available otherwise, which isn't always true - it could be the
result of a function call, for example.

David Brown

May 7, 2015, 4:20:01 AM
On 06/05/15 17:58, Keith Thompson wrote:
> David Brown <david...@hesbynett.no> writes:
>> Does anyone know why the return value for strcat and similar library
>> functions is defined in such an unhelpful (IMHO) way?
>>
>> The return value of strcat(s1, s2) is always s1, which is a value you
>> already have available. Returning a pointer to the end of s1 (i.e., s1
>> + strlen(s1)) would be much more efficient when building up a string
>> from parts.
>
> Speculation: The early design of C did not go out of its way to allow
> for the possibility of buffer overflows. If you assume that your target
> array is always going to be big enough, it's reasonable to write:
>
> char s[BIG_ENOUGH];
> strcat(strcat(strcpy(s, this), that), the_other_thing);
>
> Personally, even if I were willing to assume the target array is big
> enough, I'd probably still break up the calls:
>
> strcpy(s, this);
> strcat(s, that);
> strcat(s, the_other_thing);
>

That's how I would write it. But if strcat and strcpy returned a
pointer to the end of the string, you could write:

char s[BIG_ENOUGH];
char* t = s;
t = strcpy(t, this);
t = strcat(t, that);
t = strcat(t, the_other_thing);

That just seems so much more sensible to me - you are putting "that"
onto the end of the string, rather than running through the whole string
several times. Even though run-time efficiency is not always a
priority in my code, it is usually important enough that glaring
inefficiencies always irritate me.

(Of course, since these are standard functions and the compiler knows
exactly what they do, it could optimise the series of strcat operations
to eliminate the extra search.)
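
Since the standard strcpy() and strcat() do not behave this way, the fragment above is hypothetical. A minimal runnable sketch of the same pattern, using a helper with the end-pointer convention, might look like this. The names copy_end and build3 are made up for illustration; POSIX.1-2008 provides stpcpy() with this return convention.

```c
#include <stddef.h>

/* Hypothetical helper with the return convention described above:
   copy src (including its terminator) to dst and return a pointer to
   the terminator just written. dst must be large enough. */
char *copy_end(char *dst, const char *src)
{
    while ((*dst = *src) != '\0') {
        dst++;
        src++;
    }
    return dst;
}

/* Build a + b + c in s with one pass over each input.
   Returns the length of the result. */
size_t build3(char *s, const char *a, const char *b, const char *c)
{
    char *t = s;
    t = copy_end(t, a);
    t = copy_end(t, b);   /* starts at the end: no rescan of s */
    t = copy_end(t, c);
    return (size_t)(t - s);
}
```

Because every call starts where the previous one stopped, the length of the result also falls out as a pointer subtraction.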

David Brown

May 7, 2015, 4:22:55 AM
It /is/ readily available - if you as a programmer need it, you have
access to it. If your "s1" is the result of calling "foo()", then all
you need to write is "strcat(s1 = foo(), s2)".

Bartc

May 7, 2015, 5:12:08 AM
It sounds a little like a deliberate decision to keep string lengths
under wraps and not to let the programmer know what they were without
explicitly calling strlen().

This might have sounded like an elegant concept back in 1970.

After all, if string lengths were available at nearly every stage, then
it would tear apart the string library: you'd use memcpy instead of
strcpy and strcat; you wouldn't need strcmp half the time (because the
lengths were unequal, and for equal lengths you'd use memcmp); it would
even obviate the need for a zero terminator in many cases!
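
A minimal sketch of what that might look like, assuming a hypothetical counted-string type (struct lstr and both function names are invented for illustration):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string type: the length travels with the data,
   so no call ever has to search for a terminator. */
struct lstr {
    char  *data;   /* not necessarily NUL-terminated */
    size_t len;    /* bytes in use */
    size_t cap;    /* bytes available in data */
};

/* Append n bytes; returns false (and changes nothing) if they do not fit. */
bool lstr_append(struct lstr *s, const char *bytes, size_t n)
{
    if (n > s->cap - s->len)
        return false;
    memcpy(s->data + s->len, bytes, n);
    s->len += n;
    return true;
}

/* Equality: unequal lengths decide immediately, otherwise memcmp. */
bool lstr_equal(const struct lstr *a, const struct lstr *b)
{
    return a->len == b->len && memcmp(a->data, b->data, a->len) == 0;
}
```

Appending costs time proportional to the new bytes only, and comparing strings of different lengths costs a single integer comparison.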

--
Bartc

glen herrmannsfeldt

May 7, 2015, 9:21:39 AM
David Brown <david...@hesbynett.no> wrote:
> On 06/05/15 17:58, Keith Thompson wrote:
>> David Brown <david...@hesbynett.no> writes:
>>> Does anyone know why the return value for strcat and similar
>>> library functions is defined in such an unhelpful (IMHO) way?

(snip)
>> Speculation: The early design of C did not go out of its way to allow
>> for the possibility of buffer overflows. If you assume that your target
>> array is always going to be big enough, it's reasonable to write:

(snip)
> That's how I would write it. But if strcat and strcpy returned a
> pointer to the end of the string, you could write:

> char s[BIG_ENOUGH];
> char* t = s;
> t = strcpy(t, this);
> t = strcat(t, that);
> t = strcat(t, the_other_thing);

> That just seems so much more sensible to me - you are putting "that"
> onto the end of the string, rather than running through the whole string
> several times. Even though run-time efficiency is not always a
> priority in my code, it is usually important enough that glaring
> inefficiencies always irritate me.

C originated on small machines. Seems to me that they didn't think
about how it would scale to really large machines. Pretty much no-one
at the time thought about what you would do with gigabytes of RAM,
or even gigabytes of disk.

A big disk at the time was 100MB, but they might have had a 5MB disk.

-- glen

David Brown

May 7, 2015, 10:24:50 AM
I work mostly with small machines, with fewer resources than the
earliest C targets. Re-scanning strings unnecessarily to find their
ends is a waste of resources no matter what size of system you have, and
it would not have been difficult to avoid when strcat, etc., were first
specified.


Morris Dovey

May 7, 2015, 11:17:57 AM
On 5/7/15 3:19 AM, David Brown wrote:

> That's how I would write it. But if strcat and strcpy returned a
> pointer to the end of the string, you could write:
>
> char s[BIG_ENOUGH];
> char* t = s;
> t = strcpy(t, this);
> t = strcat(t, that);
> t = strcat(t, the_other_thing);
>
> That just seems so much more sensible to me - you are putting "that"
> onto the end of the string, rather than running through the whole string
> several times. Even though run-time efficiency is not always a
> priority in my code, it is usually important enough that glaring
> inefficiencies always irritate me.
>
> (Of course, since these are standard functions and the compiler knows
> exactly what they do, it could optimise the series of strcat operations
> to eliminate the extra search.)

Is it worth pointing out that it’s easier to implement the desired
behavior from scratch than to complain about the lack of it? :-P

char *cate(char *d,char *s)
{ if (!s) *(s = d) = '\0';
while (*d++ = *s++);
return --d;
}

Morris Dovey

May 7, 2015, 11:57:08 AM
On 5/7/15 10:17 AM, Morris Dovey wrote:
> Is it worth pointing out that it’s easier to implement the desired
> behavior from scratch than to complain about the lack of it? :-P
>
> char *cate(char *d,char *s)
> { if (!s) *(s = d) = '\0';
> while (*d++ = *s++);
> return --d;
> }

On second thought, I think I prefer

char *cate(char *d,char *s)
{ if (!s) *d++ = '\0';
else while (*d++ = *s++);
return --d;
}

--
Morris Dovey
http://www.iedu.com/Solar/
http://www.facebook.com/MorrisDovey

asetof...@gmail.com

May 7, 2015, 2:00:09 PM
Morris wrote:"
char *cate(char *d,char *s)
{ if (!s) *d++ = '\0';
else while (*d++ = *s++);

return --d;
}"

char* cate(char *d,char *s)
{ if(d==0) return 0;
else if(s==0) *d++ = '\0';
else while(*d++ = *s++);
return --d;
}

glen herrmannsfeldt

May 7, 2015, 2:02:48 PM
David Brown <david...@hesbynett.no> wrote:

(snip, regarding strcat, then I wrote)

>> C originated on small machines. Seems to me that they didn't think
>> about how it would scale to really large machines. Pretty much no-one
>> at the time thought about what you would do with gigabytes of RAM,
>> or even gigabytes of disk.

>> A big disk at the time was 100MB, but they might have had a 5MB disk.

> I work mostly with small machines, with fewer resources than the
> earliest C targets. Re-scanning strings unnecessarily to find their
> ends is a waste of resources no matter what size of system you have, and
> it would not have been difficult to avoid when strcat, etc., were first
> specified.

Yes. But note that a strcat() loop is O(N**2) in time, and the
reasonable size for N increases with memory size.

I once ran into this with real data. Someone had written a program
to read in DNA sequence data, with strcat() in a loop. Sometime later,
we noticed that part was unusually slow. It was reading in megabytes
of data 60 bytes per line. On a PDP-11, you might read in kilobytes
that way, but not megabytes.

I changed the loop to remember where the previous read ended, and
strcpy() the new data in. Not much harder, but O(N).
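
The original program is not shown, so this is only a sketch of the kind of fix described, with hypothetical names. It keeps a pointer to the current end of the buffer and copies each new line there, using memcpy() because the length has just been measured:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: append each line at a remembered end pointer
   instead of calling strcat() and rescanning everything read so far.
   Returns the total length, or (size_t)-1 if buf is too small. */
size_t join_lines(char *buf, size_t cap,
                  const char *const *lines, size_t nlines)
{
    char *end = buf;                  /* where the next line goes */
    if (cap == 0)
        return (size_t)-1;
    *end = '\0';

    for (size_t i = 0; i < nlines; i++) {
        size_t n = strlen(lines[i]);
        if (n >= cap - (size_t)(end - buf))
            return (size_t)-1;        /* no room for line + terminator */
        memcpy(end, lines[i], n + 1); /* copies the terminator too */
        end += n;                     /* remember the new end */
    }
    return (size_t)(end - buf);
}
```

Each input character is touched a constant number of times, so the total work grows linearly with the amount of data rather than quadratically.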

-- glen

Morris Dovey

May 7, 2015, 3:28:47 PM
On 5/7/15 1:00 PM, asetof...@gmail.com wrote:
> char* cate(char *d,char *s)
> { if(d==0) return 0;
> else if(s==0) *d++ = '\0';
> else while(*d++ = *s++);
> return --d;
> }

I thought about that, then decided that while s==NULL could conceivably
be valid, d==NULL would always indicate an error. Silently returning a
NULL pointer would mask the error, and my preference is to fail early.

When I got done tinkering, this is what I settled on:

char *cate(char *d,char *s)
{ if (s) while (*d++ = *s++);
else *d++ = '\0';
return --d;
}

Whatever, I like that the one function can fill the roles of both
strcpy() and strcat() and eliminate the pointer initialization in
David's sample scenario:

char s[BIG_ENOUGH], *t;
t = cate(s,this);
t = cate(t,that);
t = cate(t,the_other_thing);

asetof...@gmail.com

May 7, 2015, 3:47:19 PM
Returning 0 as a pointer means "return an error". No library function
should have to seg fault; I say it is better to keep the seg fault out
of the library.

Tim Rentsch

May 7, 2015, 6:39:33 PM
David Brown <david...@hesbynett.no> writes:

> On 07/05/15 03:25, Tim Rentsch wrote:
>> David Brown <david...@hesbynett.no> writes:
>>
>>> Does anyone know why the return value for strcat and similar library
>>> functions is defined in such an unhelpful (IMHO) way?
>>>
>>> The return value of strcat(s1, s2) is always s1, which is a value you
>>> already have available. Returning a pointer to the end of s1 (i.e., s1
>>> + strlen(s1)) would be much more efficient when building up a string
>>> from parts.
>>
>> You're assuming that the value of the first argument is readily
>> available otherwise, which isn't always true - it could be the
>> result of a function call, for example.
>
> It /is/ readily available -

Not exactly. What you mean is not that it is readily available
but that it can be made available, by declaring a variable and
assigning the returned value.

> if you as a programmer need it, you have
> access to it. If your "s1" is the result of calling "foo()", then all
> you need to write is "strcat(s1 = foo(), s2)".

My point is either choice has its own advantages. Personally I think
the choice made for strcat() is the right one for what it is. If
something different or more elaborate is wanted, it's easy enough
to program that.

Tim Rentsch

May 7, 2015, 6:55:31 PM
Morris Dovey <mrd...@iedu.com> writes:

> On 5/7/15 3:19 AM, David Brown wrote:
>
>> That's how I would write it. But if strcat and strcpy returned a
>> pointer to the end of the string, you could write:
>>
>> char s[BIG_ENOUGH];
>> char* t = s;
>> t = strcpy(t, this);
>> t = strcat(t, that);
>> t = strcat(t, the_other_thing);
>>
>> That just seems so much more sensible to me - you are putting "that"
>> onto the end of the string, rather than running through the whole string
>> several times. Even though run-time efficiency is not always a
>> priority in my code, it is usually important enough that glaring
>> inefficiencies always irritate me.
>>
>> (Of course, since these are standard functions and the compiler knows
>> exactly what they do, it could optimise the series of strcat operations
>> to eliminate the extra search.)
>
> Is it worth pointing out that it's easier to implement the desired
> behavior from scratch than to complain about the lack of it? :-P

+1

> char *cate(char *d,char *s)
> { if (!s) *(s = d) = '\0';
> while (*d++ = *s++);
> return --d;
> }

char *
add_more( char *d, char *s ){
return (*d = *s) ? add_more( d+1, s+1 ) : d;
}

(I'm not wild about the name add_more, maybe someone has a better
suggestion.)

David Brown

May 8, 2015, 5:03:52 AM
On 07/05/15 17:17, Morris Dovey wrote:
> On 5/7/15 3:19 AM, David Brown wrote:
>
>> That's how I would write it. But if strcat and strcpy returned a
>> pointer to the end of the string, you could write:
>>
>> char s[BIG_ENOUGH];
>> char* t = s;
>> t = strcpy(t, this);
>> t = strcat(t, that);
>> t = strcat(t, the_other_thing);
>>
>> That just seems so much more sensible to me - you are putting "that"
>> onto the end of the string, rather than running through the whole string
>> several times. Even though run-time efficiency is not always a
>> priority in my code, it is usually important enough that glaring
>> inefficiencies always irritate me.
>>
>> (Of course, since these are standard functions and the compiler knows
>> exactly what they do, it could optimise the series of strcat operations
>> to eliminate the extra search.)
>
> Is it worth pointing out that it’s easier to implement the desired
> behavior from scratch than to complain about the lack of it? :-P
>

Yes, in the case of strcat the functionality is particularly easy to
duplicate (often it is not even worth making it a separate function). I
am not complaining as such - I am more curious to know if there is any
good reason why strcat was specified the way it is, when it could so
easily have been more efficient.

David Brown

May 8, 2015, 5:09:35 AM
On 08/05/15 00:39, Tim Rentsch wrote:
> David Brown <david...@hesbynett.no> writes:
>
>> On 07/05/15 03:25, Tim Rentsch wrote:
>>> David Brown <david...@hesbynett.no> writes:
>>>
>>>> Does anyone know why the return value for strcat and similar library
>>>> functions is defined in such an unhelpful (IMHO) way?
>>>>
>>>> The return value of strcat(s1, s2) is always s1, which is a value you
>>>> already have available. Returning a pointer to the end of s1 (i.e., s1
>>>> + strlen(s1)) would be much more efficient when building up a string
>>>> from parts.
>>>
>>> You're assuming that the value of the first argument is readily
>>> available otherwise, which isn't always true - it could be the
>>> result of a function call, for example.
>>
>> It /is/ readily available -
>
> Not exactly. What you mean is not that it is readily available
> but that it can be made available, by declaring a variable and
> assigning the returned value.

That is certainly what /I/ mean by "readily available". You can get the
value by simply attaching a name to it - that's good enough.

>
>> if you as a programmer need it, you have
>> access to it. If your "s1" is the result of calling "foo()", then all
>> you need to write is "strcat(s1 = foo(), s2)".
>
> My point is either choice has its own advantages. Personally I think
> the choice made for strcat() is the right one for what it is. If
> something different or more elaborate is wanted, it's easy enough
> to program that.
>

I agree that it's easy to write a replacement for strcat() that returns
a pointer to the end terminator (or the length of the string, if
preferred). And of course your personal opinion is up to you. But I
can't see any good reason why the standard library version is specified
the way it is, rather than the much more useful alternative of returning
a pointer to the end of the string. (And if you want a version
returning the first argument, that's easy to write yourself.)


Morris Dovey

May 8, 2015, 6:07:22 AM
On 5/8/15 4:03 AM, David Brown wrote:
> Yes, in the case of strcat the functionality is particularly easy to
> duplicate (often it is not even worth making it a separate function).
> I am not complaining as such - I am more curious to know if there is
> any good reason why strcat was specified the way it is, when it could
> so easily have been more efficient.

I can't do much more than speculate that there might have been a lot of
'get it working, then we can polish it up later' going on, along with a
certain amount of pressure for a working version of unix ASAP. I suspect
a lot of that 'polishing' never actually got done and a lot of good
ideas (multiple entry points, for example) never saw the light of day.
The original syntax has always struck me as 'cobbled' and lacking
elegance - and I'm inclined to think polish wasn't of much concern.

Of course, once there was a working compiler and operating system, then
there was likely considerable managerial resistance to change on the 'if
it ain't broke, don't fix it!' principle.

Richard Bos

May 8, 2015, 5:57:35 PM
The value of the first argument is always either readily available or
readily _made_ available. The desired return value, the _end_ of the
string rather than the already given start of it, is not, except from
within the function.

I, too, would have preferred to get the latter value from strfoo(), and
if I ever get around to drawing up the specifications for my very perfect
C-with-hindsight (a.k.a. C 2020), they will - but in C, they don't. For
this, as far as the Standard is concerned, I fear the blame lies
squarely at the feet of that old monster, Previous Art; why Art did it
this way, Paul may know, but I don't. Possibly laziness.

Richard

Nathan Wagner

May 8, 2015, 8:12:06 PM
On 2015-05-07, Morris Dovey <mrd...@iedu.com> wrote:
> On 5/7/15 10:17 AM, Morris Dovey wrote:
>> Is it worth pointing out that it's easier to implement the desired
>> behavior from scratch than to complain about the lack of it? :-P

[snip]

Probably, since I was about to post the same thing. The only downside
of course is that you don't get the benefit of any assembly optimized
implementation.

--
nw

Tim Rentsch

May 10, 2015, 11:44:28 AM
David Brown <david...@hesbynett.no> writes:

> On 08/05/15 00:39, Tim Rentsch wrote:
>> David Brown <david...@hesbynett.no> writes:
>>
>>> On 07/05/15 03:25, Tim Rentsch wrote:
>>>> David Brown <david...@hesbynett.no> writes:
>>>>
>>>>> Does anyone know why the return value for strcat and similar
>>>>> library functions is defined in such an unhelpful (IMHO) way?
>>>>>
>>>>> The return value of strcat(s1, s2) is always s1, which is a
>>>>> value you already have available. Returning a pointer to the
>>>>> end of s1 (i.e., s1 + strlen(s1)) would be much more efficient
>>>>> when building up a string from parts.
>>>>
>>>> You're assuming that the value of the first argument is readily
>>>> available otherwise, which isn't always true - it could be the
>>>> result of a function call, for example.
>>>
>>> It /is/ readily available -
>>
>> Not exactly. What you mean is not that it is readily available
>> but that it can be made available, by declaring a variable and
>> assigning the returned value.
>
> That is certainly what /I/ mean by "readily available". You can
> get the value by simply attaching a name to it - that's good
> enough.

I don't want to get into a quibbling match about definitions;
the key thing is your description is wrong. You cannot simply
attach a name to the needed value - sometimes a name must be
declared before being assigned to. That is a non-negligible
cost.

>>> if you as a programmer need it, you have access to it. If your
>>> "s1" is the result of calling "foo()", then all you need to write
>>> is "strcat(s1 = foo(), s2)".
>>
>> My point is either choice has its own advantages. Personally I
>> think the choice made for strcat() is the right one for what it
>> is. If something different or more elaborate is wanted, it's
>> easy enough to program that.
>
> I agree that it's easy to write a replacement for strcat() that
> returns a pointer to the end terminator (or the length of the
> string, if preferred). And of course your personal opinion is up
> to you. But I can't see any good reason why the standard library
> version is specified the way it is, rather than the much more
> useful alternative of returning a pointer to the end of the
> string.

What's wrong with the idea that someone thought it valuable to
make the resulting (entire) string available so it could be
given, eg, as an argument to another function (or perhaps one
branch of a ?: operator, or return expression, etc), taking
advantage of a functional style? Is it so hard for you to
imagine that someone else would have different priorities than
you do?

> (And if you want a version returning the first argument,
> that's easy to write yourself.)

I don't have to, I can just use strcat(). :)
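
As an illustration of the functional style mentioned above, here is a small sketch with hypothetical names. Because strcpy() and strcat() return their first argument, the finished string can be built and handed on in a single return expression. Whether this is significantly clearer than the multi-statement form is exactly the point under debate.

```c
#include <string.h>

/* Hypothetical example: build name plus an optional "/" suffix in buf
   and return the finished string in one expression. buf must be large
   enough for the result. */
const char *with_suffix(char *buf, const char *name, int is_dir)
{
    return strcat(strcpy(buf, name), is_dir ? "/" : "");
}
```

A caller can then write puts(with_suffix(buf, "src", 1)) without naming any intermediate value.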

Tim Rentsch

May 10, 2015, 12:07:59 PM
ral...@xs4all.nl (Richard Bos) writes:

> Tim Rentsch <t...@alumni.caltech.edu> wrote:
>
>> David Brown <david...@hesbynett.no> writes:
>>
>>> Does anyone know why the return value for strcat and similar
>>> library functions is defined in such an unhelpful (IMHO) way?
>>>
>>> The return value of strcat(s1, s2) is always s1, which is a value
>>> you already have available. Returning a pointer to the end of s1
>>> (i.e., s1 + strlen(s1)) would be much more efficient when building
>>> up a string from parts.
>>
>> You're assuming that the value of the first argument is readily
>> available otherwise, which isn't always true - it could be the
>> result of a function call, for example.
>
> The value of the first argument is always either readily available
> or readily _made_ available. The desired return value, the _end_
> of the string rather than the already given start of it, is not,
> except from within the function.

I agree the value is always either readily available or can be
made available by declaring a name to hold the value. Either
way, the result can be clunky if what you want to do is pass
the value on to another expression in a more functional style,
which the current semantics supports.

> I, too, would have preferred to get the latter value from
> strfoo(), and if I ever get around to drawing up the specifications
> for my very perfect C-with-hindsight (a.k.a. C 2020), they will -
> but in C, they don't. For this, as far as the Standard is
> concerned, I fear the blame lies squarely at the feet of that old
> monster, Previous Art; why Art did it this way, Paul may know,
> but I don't. Possibly laziness.

Another possibility is that whoever made the decision thought
supporting a more functional style has value, and made a
conscious choice favoring that over an "end of string" version.

Incidentally, if you want these, both can be provided with
less writing than you did in your followup:

char *
bstrcpy( char *d, const char *s ){
return (*d = *s) ? bstrcpy( d+1, s+1 ) : d;
}

char *
bstrcat( char *d, const char *s ){
return *d ? bstrcat( d+1, s ) : bstrcpy( d, s );
}

David Brown

May 10, 2015, 3:30:43 PM
If you think that this cost is non-negligible (assuming a half-decent
compiler), then I suppose that explains your problem here. I, on the
other hand, have no doubts that it is negligible, and have great
difficulty in imagining a situation where it is significantly easier,
clearer, faster, or more efficient to use "s1 = strcat(foo(), s2)"
rather than "strcat(s1 = foo(), s2)", or my own preferred choice of two
lines "s1 = foo(); strcat(s1, s2);".

I think that means we have approximately the same definition of "readily
available", but different ideas about the cost of attaching a name to a
value that the compiler has already calculated and will need later on.

>
>>>> if you as a programmer need it, you have access to it. If your
>>>> "s1" is the result of calling "foo()", then all you need to write
>>>> is "strcat(s1 = foo(), s2)".
>>>
>>> My point is either choice has its own advantages. Personally I
>>> think the choice made for strcat() is the right one for what it
>>> is. If something different or more elaborate is wanted, it's
>>> easy enough to program that.
>>
>> I agree that it's easy to write a replacement for strcat() that
>> returns a pointer to the end terminator (or the length of the
>> string, if preferred). And of course your personal opinion is up
>> to you. But I can't see any good reason why the standard library
>> version is specified the way it is, rather than the much more
>> useful alternative of returning a pointer to the end of the
>> string.
>
> What's wrong with the idea that someone thought it valuable to
> make the resulting (entire) string available so it could be
> given, eg, as an argument to another function (or perhaps one
> branch of a ?: operator, or return expression, etc), taking
> advantage of a functional style? Is it so hard for you to
> imagine that someone else would have different priorities than
> you do?

The point of this thread was to ask people with different experiences
and styles if they could think of an advantage of the way strcat is
defined - precisely because while I could not think of one myself, I
thought that someone else might. So far no one (including you) has
given any suggestions other than "it seemed a good idea at the time" or
"they standardised the function they were using, without thinking about
other use-cases". This is, of course, entirely normal - most projects
of any size have decisions that are odd, inefficient, or unfortunate
when looked at later with hindsight. I merely wondered if the early C
authors had a good reason that I didn't know of, and it seems not to be
the case.

Tim Rentsch

May 11, 2015, 11:42:47 PM
> here. [snip]

The cost I'm talking about is a coding and maintenance cost,
not a compilation or performance cost. I don't mind agreeing to
the possibility that your sense of the size of that cost may be
different from mine.
> were using, without thinking about other use-cases". [snip]

My answer may not have been stated very clearly, but I did
suggest another possible explanation, namely, for reasons you
don't consider important, namely coding convenience in certain
situations, those making the choice thought, and still think (or
would think, if they were still alive), that this consideration
outweighs those of the alternate semantics, even after conscious
evaluation.

David Brown

May 12, 2015, 3:52:50 AM
On 12/05/15 05:42, Tim Rentsch wrote:
> David Brown <david...@hesbynett.no> writes:
>

<snip>

> The cost I'm talking about is a coding and maintenance cost,
> not a compilation or performance cost. I don't mind agreeing to
> the possibility that your sense of the size of that cost may be
> different from mine.

Fair enough. I don't think we need a discussion about the differences
in such costs in this context - at least, not in this thread.

>>
>> The point of this thread was to ask people with different
>> experiences and styles if they could think of an advantage of the
>> way strcat is defined - precisely because while I could not think
>> of one myself, I thought that someone else might. So far no one
>> (including you) has given any suggestions other than "it seemed a
>> good idea at the time" or "they standardised the function they
>> were using, without thinking about other use-cases". [snip]
>
> My answer may not have been stated very clearly, but I did
> suggest another possible explanation, namely, for reasons you
> don't consider important, namely coding convenience in certain
> situations, those making the choice thought, and still think (or
> would think, if they were still alive), that this consideration
> outweighs those of the alternate semantics, even after conscious
> evaluation.
>

Can you give me:

1. An example where strcat, as defined today, leads to /significantly/
clearer, neater, more maintainable, or more efficient code than a
"strcat2" function that returns a pointer to the end of the string?

2. A justification why you think such coding style was likely in the
early days of C, thus making it a reasonable explanation for the choice
of strcat?




It's very easy to find situations where a strcat2 (returning a pointer
to the end of the string) is clearer and more efficient than strcat.

// Combine three strings and return the total length
int combineStrings(char* s, const char* t1, const char* t2) {
    strcat(s, t1);              // Time len(s) + len(t1)
    strcat(s, t2);              // Time len(s) + len(t1) + len(t2)
    return strlen(s);           // Time len(s) + len(t1) + len(t2)
}
// Total time 3*len(s) + 3*len(t1) + 2*len(t2)


int combineStrings(char* s, const char* t1, const char* t2) {
    char* end = strcat2(s, t1); // Time len(s) + len(t1)
    end = strcat(end, t2);      // Time len(t2)
    return end - s;             // Time 1
}
// Total time len(s) + len(t1) + len(t2) + 1

(Note that if you really don't like the extra "char* end" variable, you
can use strcat2 exactly like strcat in the first version.)
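
For anyone who wants to try the end-pointer idea, here is a minimal sketch under stated assumptions. strcat2 is hypothetical and not part of the standard library; combine_strings2 is likewise a made-up name, and the caller is assumed to supply a buffer that is large enough. (Names starting with "str" plus a lowercase letter are reserved for future library use, so real code would probably pick a different name.)

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a standard function: append t to s and
   return a pointer to the terminating '\0' of the result. */
static char *strcat2(char *s, const char *t)
{
    s += strlen(s);                 /* find the current end once */
    while ((*s = *t++) != '\0')     /* copy t, including its terminator */
        s++;
    return s;                       /* points at the terminating '\0' */
}

/* Append t1 and t2 to s and return the total length of the result.
   Assumes s has room for everything. */
static size_t combine_strings2(char *s, const char *t1, const char *t2)
{
    char *end = strcat2(s, t1);     /* scans s once, then copies t1 */
    end = strcat2(end, t2);         /* starts at the end, so only copies t2 */
    return (size_t)(end - s);       /* pointer difference, no rescan */
}
```

With a buffer holding "ab", appending "cd" and "efg" this way leaves "abcdefg" in the buffer and returns 7 without rescanning the result.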


It is also simpler and more efficient to implement strcat in terms of
strcat2 rather than the reverse:

char * strcat(char* s, const char* t) {
    strcat2(s, t);          // Time len(s) + len(t)
    return s;
}
// Total time len(s) + len(t)

char * strcat2(char* s, const char* t) {
    strcat(s, t);           // Time len(s) + len(t)
    return s + strlen(s);   // Time len(s) + len(t)
}
// Total time 2*len(s) + 2*len(t)
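
The same pair, as a compilable sketch. The names my_strcat and strcat2_slow are hypothetical, chosen only to avoid colliding with the standard strcat, and strcat2 is again an assumed helper rather than a real library function. All three assume the destination buffer is big enough.

```c
#include <string.h>

/* Hypothetical single-pass primitive: append t to s and return a
   pointer to the terminating '\0' of the result. */
static char *strcat2(char *s, const char *t)
{
    s += strlen(s);
    while ((*s = *t++) != '\0')
        s++;
    return s;
}

/* strcat-style wrapper: no extra pass, it just discards the end pointer. */
static char *my_strcat(char *s, const char *t)
{
    strcat2(s, t);
    return s;
}

/* The reverse direction: end-pointer semantics built on the standard
   strcat need a second scan of the whole result. */
static char *strcat2_slow(char *s, const char *t)
{
    strcat(s, t);
    return s + strlen(s);           /* rescan to find the end */
}
```

Both strcat2 and strcat2_slow return the same pointer; the only difference is the extra pass over the combined string in the second one.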



So it is clear that there are common cases where a strcat2 returning the
end pointer is more flexible and significantly more efficient than the
standard version. There are no cases in which the standard version is
more efficient. And the standard version is easily and efficiently
created from the strcat2 version.

The question is, does the standard version lead to significantly neater
or clearer code in common cases, and was it common enough to lead to a
conscious design decision when the library was standardised? Or was it,
as most people here seem to think, a bad decision - probably the result
of standardising someone's existing usage rather than the result of a
thoughtful design?



Bartc

unread,
May 12, 2015, 8:34:02 AM5/12/15
to
On 12/05/2015 08:52, David Brown wrote:
> It's very easy to find situations where a strcat2 (returning a pointer
> to the end of the string) is clearer and more efficient than strcat.
>
> // Combine three strings and return the total length
> int combineStrings(char* s, const char* t1, const char* t2) {
> strcat(s, t1); // Time len(s) + len(t1)
> strcat(s, t2); // Time len(s) + len(t1) + len(t2)
> return strlen(s); // Time len(s) + len(t1) + len(t2)
> }
> // Total time 3*len(s) + 3*len(t1) + 2*len(t2)
>
>
> int combineStrings(char* s, const char* t1, const char* t2) {
> char* end = strcat2(s, t1); // Time len(s) + len(t1)
> end = strcat(end, t2); // Time len(t2)
> return end - s; // Time 1
> }
> // Total time len(s) + len(t1) + len(t2) + 1

In this case you might as well use strlen() on each string, and use
memcpy functions instead.

In fact, if you're going to work with string lengths, the chances are
that these are already available from prior operations on the three
strings, so you might only need to call strlen 0, 1 or 2 times instead of 3.

Perhaps that was the original idea: either use explicit lengths, or work
more elegantly with pure zero-terminated strings.
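
A rough sketch of the explicit-length approach, assuming the caller already knows all three lengths. The function name and its parameters are hypothetical, and the buffer is assumed to be large enough for the result plus the terminator.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append t1 and t2 to s using lengths the caller
   already has. s_len is the current length of s. Returns the new
   total length. No call to strlen is needed here. */
static size_t combine_with_lengths(char *s, size_t s_len,
                                   const char *t1, size_t t1_len,
                                   const char *t2, size_t t2_len)
{
    memcpy(s + s_len, t1, t1_len);              /* copy t1 without its '\0' */
    memcpy(s + s_len + t1_len, t2, t2_len);     /* copy t2 without its '\0' */
    s[s_len + t1_len + t2_len] = '\0';          /* terminate once at the end */
    return s_len + t1_len + t2_len;
}
```

If a length is not known in advance, the caller pays one strlen for that string and nothing more.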

--
Bartc

Ike Naar

unread,
May 14, 2015, 4:17:10 PM5/14/15
to
On 2015-05-12, David Brown <david...@hesbynett.no> wrote:
> It's very easy to find situations where a strcat2 (returning a pointer
> to the end of the string) is clearer and more efficient than strcat.
>
> // Combine three strings and return the total length
> int combineStrings(char* s, const char* t1, const char* t2) {
> strcat(s, t1); // Time len(s) + len(t1)
> strcat(s, t2); // Time len(s) + len(t1) + len(t2)
> return strlen(s); // Time len(s) + len(t1) + len(t2)
> }
> // Total time 3*len(s) + 3*len(t1) + 2*len(t2)
>
>
> int combineStrings(char* s, const char* t1, const char* t2) {
> char* end = strcat2(s, t1); // Time len(s) + len(t1)
> end = strcat(end, t2); // Time len(t2)

Shouldn't that be strcat2(end, t2) ?

David Brown

unread,
May 14, 2015, 5:07:03 PM5/14/15
to
Of course. Sorry.

Tim Rentsch

unread,
May 20, 2015, 2:08:34 PM5/20/15
to
No, I'm not offering either of those. I'm not saying the reasons
mentioned are ones you would find convincing, only that it is
plausible that the original designers found them convincing.

> It's very easy to find situations where a strcat2 (returning a pointer
> to the end of the string) is clearer and more efficient than strcat.
>
> // Combine three strings and return the total length
> int combineStrings(char* s, const char* t1, const char* t2) {
> strcat(s, t1); // Time len(s) + len(t1)
> strcat(s, t2); // Time len(s) + len(t1) + len(t2)
> return strlen(s); // Time len(s) + len(t1) + len(t2)
> }
> // Total time 3*len(s) + 3*len(t1) + 2*len(t2)
>
>
> int combineStrings(char* s, const char* t1, const char* t2) {
> char* end = strcat2(s, t1); // Time len(s) + len(t1)
> end = strcat(end, t2); // Time len(t2)
> return end - s; // Time 1
> }
> // Total time len(s) + len(t1) + len(t2) + 1
>
> (Note that if you really don't like the extra "char* end" variable, you
> can use strcat2 exactly like strcat in the first version.)

These examples don't seem very compelling to me, because what the
functions do seems somewhat contrived, and because they don't
match how I expect they would be written, especially if
efficiency concerns were paramount. (I assume you meant strcat2
rather than strcat in the second definition.)

> It is also simpler and more efficient to implement strcat in terms of
> strcat2 rather than the reverse:
>
> char * strcat(char* s, const char* t) {
> strcat2(s, t); // Time len(s) + len(t)
> return s;
> }
> // Total time len(s) + len(t)
>
> char * strcat2(char* s, const char* t) {
> strcat(s, t); // Time len(s) + len(t)
> return strlen(s); // Time len(s) + len(t)
> }
> // Total time 2*len(s) + 2*len(t)

I might agree on the "more efficient" part, but not on the "simpler"
part.

> So it is clear that there are common cases where a strcat2 returning the
> end pointer is more flexible and significantly more efficient than the
> standard version. There are no cases in which the standard version is
> more efficient. And the standard version is easily and efficiently
> created from the strcat2 version.

I understand that you think this reasoning is convincing. Not
everyone does.

> The question is, does the standard version lead to significantly neater
> or clearer code in common cases, and was it common enough to lead to a
> conscious design decision when the library was standardised? Or was it,
> as most people here seem to think, a bad decision - probably the result
> of standardising someone's existing usage rather than the result of a
> thoughtful design.

Why do you find it so hard to believe that other people might
reasonably reach different conclusions than you do?

David Brown

unread,
May 21, 2015, 4:56:05 AM5/21/15