
Return value for strcat and similar functions


David Brown

May 6, 2015, 4:02:59 AM
Does anyone know why the return value for strcat and similar library
functions is defined in such an unhelpful (IMHO) way?

The return value of strcat(s1, s2) is always s1, which is a value you
already have available. Returning a pointer to the end of s1 (i.e., s1
+ strlen(s1)) would be much more efficient when building up a string
from parts.
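For concreteness, here is a minimal sketch of the variant being proposed; the name "strcat2" is invented here for illustration and is not a standard function:

```c
#include <string.h>

/* Sketch of the proposed variant ("strcat2" is an invented name):
   append s2 to s1 as strcat does, but return a pointer to the new
   terminating NUL instead of returning s1 again. */
char *strcat2(char *s1, const char *s2)
{
    char *d = s1 + strlen(s1);      /* locate the end of s1 once */
    while ((*d = *s2) != '\0') {    /* copy s2, including its NUL */
        d++;
        s2++;
    }
    return d;                       /* points at the terminating '\0' */
}
```

With this shape, chained appends can resume at the returned pointer instead of rescanning the part of the string already built.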

Xavier Roche

May 6, 2015, 4:35:06 AM
On 06/05/2015 10:02, David Brown wrote:
> Does anyone know why the return value for strcat and similar library
> functions is defined in such an unhelpful (IMHO) way?

To allow things such as printf("foo=%s", strcat(foo, bar))?

My feeling is that there is no rationale behind it; this was an early historical choice. The [BSD] strlcat() function probably has a much better return value in this regard, IMHO.
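For reference, the strlcat() contract (from OpenBSD) takes the destination's total size and returns the length the full result would have had, so truncation is detectable. A minimal sketch of that contract (not the actual OpenBSD source):

```c
#include <stddef.h>
#include <string.h>

/* Minimal sketch of the BSD strlcat() contract (not the OpenBSD
   source): append src to dst without writing more than size bytes
   total, always NUL-terminating, and return the length the full
   result would have had.  Truncation occurred iff the return
   value >= size.  Assumes dst is already NUL-terminated. */
size_t my_strlcat(char *dst, const char *src, size_t size)
{
    size_t dlen = strlen(dst);
    size_t slen = strlen(src);

    if (size > dlen + 1) {                  /* any room to append? */
        size_t room = size - dlen - 1;
        size_t n = slen < room ? slen : room;
        memcpy(dst + dlen, src, n);
        dst[dlen + n] = '\0';
    }
    return dlen + slen;
}
```

The caller checks `ret >= size` to detect truncation, instead of discovering it by overflow.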

Lőrinczy Zsigmond

May 6, 2015, 10:52:30 AM
This thing is called 'standardization before thinking'.
It's very popular and common. Random examples:
The return value of fgets.
Storing dates on 6 digits.
Defining NULL as integer zero.

Ambiguous standards are worse, though;
for example after fifty years,
you still don't know what the BackSpace key will (or should) generate:
^? or ^H
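The fgets complaint is concrete: fgets necessarily counts the characters it stores, yet returns only the buffer pointer (or NULL), so the caller typically pays for a second scan with strlen. A sketch (read_line is a hypothetical helper, not a library function):

```c
#include <stdio.h>
#include <string.h>

/* fgets() returns its first argument (or NULL on EOF/error) and
   discards the length it just computed, so stripping the trailing
   newline costs an extra pass with strlen(): */
size_t read_line(FILE *fp, char *buf, size_t size)
{
    if (fgets(buf, (int)size, fp) == NULL)
        return 0;
    size_t len = strlen(buf);        /* length fgets already knew */
    if (len > 0 && buf[len - 1] == '\n')
        buf[--len] = '\0';
    return len;
}
```

Note the 0 return is ambiguous between end-of-input and an empty line; a real interface would separate the two.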

Keith Thompson

May 6, 2015, 11:58:56 AM
Speculation: The early design of C did not go out of its way to allow
for the possibility of buffer overflows. If you assume that your target
array is always going to be big enough, it's reasonable to write:

char s[BIG_ENOUGH];
strcat(strcat(strcpy(s, this), that), the_other_thing);

Personally, even if I were willing to assume the target array is big
enough, I'd probably still break up the calls:

strcpy(s, this);
strcat(s, that);
strcat(s, the_other_thing);

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Richard Heathfield

May 6, 2015, 12:07:58 PM
On 06/05/15 16:58, Keith Thompson wrote:

<snip>

> Speculation: The early design of C did not go out of its way to allow
> for the possibility of buffer overflows. If you assume that your target
> array is always going to be big enough, it's reasonable to write:
>
> char s[BIG_ENOUGH];
> strcat(strcat(strcpy(s, this), that), the_other_thing);
>
> Personally, even if I were willing to assume the target array is big
> enough, I'd probably still break up the calls:
>
> strcpy(s, this);
> strcat(s, that);
> strcat(s, the_other_thing);

And I'd probably use sprintf(s, "%s%s%s", this, that, the_other_thing)
instead. Not because it's quicker (which it might or might not be, since
strcat has to find the end of the string each time, but sprintf has to
read and interpret the format string), but because to my eyes it's a
tiny little bit clearer.

--
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within

Morris Dovey

May 6, 2015, 1:01:31 PM
On 5/6/15 11:07 AM, Richard Heathfield wrote:
> On 06/05/15 16:58, Keith Thompson wrote:
>
> <snip>
>
>> Speculation: The early design of C did not go out of its way to allow
>> for the possibility of buffer overflows. If you assume that your target
>> array is always going to be big enough, it's reasonable to write:
>>
>> char s[BIG_ENOUGH];
>> strcat(strcat(strcpy(s, this), that), the_other_thing);
>>
>> Personally, even if I were willing to assume the target array is big
>> enough, I'd probably still break up the calls:
>>
>> strcpy(s, this);
>> strcat(s, that);
>> strcat(s, the_other_thing);
>
> And I'd probably use sprintf(s, "%s%s%s", this, that, the_other_thing)
> instead. Not because it's quicker (which it might or might not be, since
> strcat has to find the end of the string each time, but sprintf has to
> read and interpret the format string), but because to my eyes it's a
> tiny little bit clearer.

Another option is to write a variadic function that takes a
NULL-terminated list of string pointers, and returns a pointer to a
dynamically-allocated exact-size buffer containing the concatenation:

rtss(this,that,the_other_thing,NULL);

It's convenient, clear, smaller, and faster than sprintf() - but
requires remembering to free() the buffer.
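The rtss() code itself isn't reproduced in the thread; a sketch of that shape (two passes: measure, then copy into an exact-size buffer) might look like this, with concat_all as a stand-in name:

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Sketch in the spirit of the rtss() described above (the real code
   isn't shown in the thread): take a NULL-terminated list of strings
   and return an exact-size malloc'd buffer holding their
   concatenation.  The caller must free() the result. */
char *concat_all(const char *first, ...)
{
    va_list ap;
    size_t total = 0;
    const char *p;

    va_start(ap, first);                 /* pass 1: total length */
    for (p = first; p != NULL; p = va_arg(ap, const char *))
        total += strlen(p);
    va_end(ap);

    char *buf = malloc(total + 1);
    if (buf == NULL)
        return NULL;

    char *out = buf;
    va_start(ap, first);                 /* pass 2: copy each piece */
    for (p = first; p != NULL; p = va_arg(ap, const char *)) {
        size_t n = strlen(p);
        memcpy(out, p, n);
        out += n;
    }
    va_end(ap);
    *out = '\0';
    return buf;
}
```

Measuring first avoids both the repeated end-scans of strcat and any realloc growth.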

A note in the prolog reminds me that I got help from Chris Torek and
posted the code here sometime around August 29, 1999.

--
Morris Dovey
http://www.iedu.com/Solar

Tim Rentsch

May 6, 2015, 9:26:08 PM
You're assuming that the value of the first argument is readily
available otherwise, which isn't always true - it could be the
result of a function call, for example.

David Brown

May 7, 2015, 4:20:01 AM
On 06/05/15 17:58, Keith Thompson wrote:
> David Brown <david...@hesbynett.no> writes:
>> Does anyone know why the return value for strcat and similar library
>> functions is defined in such an unhelpful (IMHO) way?
>>
>> The return value of strcat(s1, s2) is always s1, which is a value you
>> already have available. Returning a pointer to the end of s1 (i.e., s1
>> + strlen(s1)) would be much more efficient when building up a string
>> from parts.
>
> Speculation: The early design of C did not go out of its way to allow
> for the possibility of buffer overflows. If you assume that your target
> array is always going to be big enough, it's reasonable to write:
>
> char s[BIG_ENOUGH];
> strcat(strcat(strcpy(s, this), that), the_other_thing);
>
> Personally, even if I were willing to assume the target array is big
> enough, I'd probably still break up the calls:
>
> strcpy(s, this);
> strcat(s, that);
> strcat(s, the_other_thing);
>

That's how I would write it. But if strcat and strcpy returned a
pointer to the end of the string, you could write:

char s[BIG_ENOUGH];
char* t = s;
t = strcpy(t, this);
t = strcat(t, that);
t = strcat(t, the_other_thing);

That just seems so much more sensible to me - you are putting "that"
onto the end of the string, rather than running through the whole string
several times. Even though run-time efficiency is not always a
priority in my code, it is usually important enough that glaring
inefficiencies always irritate me.

(Of course, since these are standard functions and the compiler knows
exactly what they do, it could optimise the series of strcat operations
to eliminate the extra search.)

David Brown

May 7, 2015, 4:22:55 AM
It /is/ readily available - if you as a programmer need it, you have
access to it. If your "s1" is the result of calling "foo()", then all
you need to write is "strcat(s1 = foo(), s2)".

Bartc

May 7, 2015, 5:12:08 AM
It sounds a little like a deliberate decision to keep string lengths
under wraps and not to let the programmer know what they were without
explicitly calling strlen().

This might have sounded like an elegant concept back in 1970.

After all, if string lengths were available at nearly every stage, then
it would tear apart the string library: you'd use memcpy instead of
strcpy and strcat; you wouldn't need strcmp half the time (because the
lengths were unequal, and for equal lengths you'd use memcmp); it would
even obviate the need for a zero terminator in many cases!
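That point can be made concrete with a minimal length-carrying string (a hypothetical "lstr", not a real library): once the length travels with the data, append reduces to memcpy and equality starts with a length check, with no terminator scan anywhere:

```c
#include <stdbool.h>
#include <string.h>

/* Minimal sketch of a length-carrying string ("lstr" is invented):
   with the length always at hand, append is one memcpy and equality
   is a length compare plus memcmp -- no scanning required. */
struct lstr { char *data; size_t len; };

void lstr_append(struct lstr *d, const struct lstr *s)
{
    memcpy(d->data + d->len, s->data, s->len);  /* caller ensures room */
    d->len += s->len;
}

bool lstr_eq(const struct lstr *a, const struct lstr *b)
{
    return a->len == b->len && memcmp(a->data, b->data, a->len) == 0;
}
```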

--
Bartc

glen herrmannsfeldt

May 7, 2015, 9:21:39 AM
David Brown <david...@hesbynett.no> wrote:
> On 06/05/15 17:58, Keith Thompson wrote:
>> David Brown <david...@hesbynett.no> writes:
>>> Does anyone know why the return value for strcat and similar
>>> library functions is defined in such an unhelpful (IMHO) way?

(snip)
>> Speculation: The early design of C did not go out of its way to allow
>> for the possibility of buffer overflows. If you assume that your target
>> array is always going to be big enough, it's reasonable to write:

(snip)
> That's how I would write it. But if strcat and strcpy returned a
> pointer to the end of the string, you could write:

> char s[BIG_ENOUGH];
> char* t = s;
> t = strcpy(t, this);
> t = strcat(t, that);
> t = strcat(t, the_other_thing);

> That just seems so much more sensible to me - you are putting "that"
> onto the end of the string, rather than running through the whole string
> several times. Even though it run-time efficiency is not always a
> priority in my code, it is usually important enough that glaring
> inefficiencies always irritate me.

C originated on small machines. Seems to me that they didn't think
about how it would scale to really large machines. Pretty much no-one
at the time thought about what you would do with gigabytes of RAM,
or even gigabytes of disk.

A big disk at the time was 100MB, but they might have had a 5MB disk.

-- glen

David Brown

May 7, 2015, 10:24:50 AM
I work mostly with small machines, with fewer resources than the
earliest C targets. Re-scanning strings unnecessarily to find their
ends is a waste of resources no matter what size of system you have, and
it would not have been difficult to avoid when strcat, etc., were first
specified.


Morris Dovey

May 7, 2015, 11:17:57 AM
On 5/7/15 3:19 AM, David Brown wrote:

> That's how I would write it. But if strcat and strcpy returned a
> pointer to the end of the string, you could write:
>
> char s[BIG_ENOUGH];
> char* t = s;
> t = strcpy(t, this);
> t = strcat(t, that);
> t = strcat(t, the_other_thing);
>
> That just seems so much more sensible to me - you are putting "that"
> onto the end of the string, rather than running through the whole string
> several times. Even though it run-time efficiency is not always a
> priority in my code, it is usually important enough that glaring
> inefficiencies always irritate me.
>
> (Of course, since these are standard functions and the compiler knows
> exactly what they do, it could optimise the series of strcat operations
> to eliminate the extra search.)

Is it worth pointing out that it’s easier to implement the desired
behavior from scratch than to complain about the lack of it? :-P

char *cate(char *d,char *s)
{ if (!s) *(s = d) = '\0';
while (*d++ = *s++);
return --d;
}

Morris Dovey

May 7, 2015, 11:57:08 AM
On 5/7/15 10:17 AM, Morris Dovey wrote:
> Is it worth pointing out that it’s easier to implement the desired
> behavior from scratch than to complain about the lack of it? :-P
>
> char *cate(char *d,char *s)
> { if (!s) *(s = d) = '\0';
> while (*d++ = *s++);
> return --d;
> }

On second thought, I think I prefer

char *cate(char *d,char *s)
{ if (!s) *d++ = '\0';
else while (*d++ = *s++);
return --d;
}

--
Morris Dovey
http://www.iedu.com/Solar/
http://www.facebook.com/MorrisDovey

asetof...@gmail.com

May 7, 2015, 2:00:09 PM
Morris wrote:"
char *cate(char *d,char *s)
{ if (!s) *d++ = '\0';
else while (*d++ = *s++);

return --d;
}"

char* cate(char *d,char *s)
{ if(d==0) return 0;
else if(s==0) *d++ = '\0';
else while(*d++ = *s++);
return --d;
}

glen herrmannsfeldt

May 7, 2015, 2:02:48 PM
David Brown <david...@hesbynett.no> wrote:

(snip, regarding strcat, then I wrote)

>> C originated on small machines. Seems to me that they didn't think
>> about how it would scale to really large machines. Pretty much no-one
>> at the time thought about what you would do with gigabytes of RAM,
>> or even gigabytes of disk.

>> A big disk at the time was 100MB, but they might have had a 5MB disk.

> I work mostly with small machines, with fewer resources than the
> earliest C targets. Re-scanning strings unnecessarily to find their
> ends is a waste of resources no matter what size of system you have, and
> it would not have been difficult to avoid when strcat, etc., were first
> specified.

Yes. But note that a strcat() loop is O(N**2) in time, and the
reasonable size for N increases with memory size.

I once ran into this with real data. Someone had written a program
to read in DNA sequence data, with strcat() in a loop. Sometime later,
we noticed that part was unusually slow. It was reading in megabytes
of data 60 bytes per line. On a PDP-11, you might read in kilobytes
that way, but not megabytes.

I changed the loop to remember where the previous read ended, and
strcpy() the new data in. Not much harder, but O(N).
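The repair glen describes, sketched (slurp_lines is a hypothetical helper): keep a running end pointer and copy each new line there, instead of letting strcat() rescan the whole buffer on every line.

```c
#include <stdio.h>
#include <string.h>

/* Sketch of the fix described above: instead of strcat(buf, line)
   in a loop (O(N**2), since each call rescans everything read so
   far), remember where the previous line ended and copy there
   (O(N) overall). */
size_t slurp_lines(FILE *fp, char *buf, size_t size)
{
    char line[128];
    char *end = buf;
    *end = '\0';
    while (fgets(line, sizeof line, fp) != NULL) {
        size_t n = strlen(line);
        if ((size_t)(end - buf) + n >= size)
            break;                     /* out of room */
        memcpy(end, line, n + 1);      /* copy including the NUL */
        end += n;
    }
    return (size_t)(end - buf);
}
```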

-- glen

Morris Dovey

May 7, 2015, 3:28:47 PM
On 5/7/15 1:00 PM, asetof...@gmail.com wrote:
> char* cate(char *d,char *s)
> { if(d==0) return 0;
> else if(s==0) *d++ = '\0';
> else while(*d++ = *s++);
> return --d;
> }

I thought about that, then decided that while s==NULL could conceivably
be valid, d==NULL would always indicate an error. Silently returning a
NULL pointer would mask the error, and my preference is to fail early.

When I got done tinkering, this is what I settled on:

char *cate(char *d,char *s)
{ if (s) while (*d++ = *s++);
else *d++ = '\0';
return --d;
}

Whatever, I like that the one function can fill the roles of both
strcpy() and strcat() and eliminate the pointer initialization in
David's sample scenario:

char s[BIG_ENOUGH], *t;
t = cate(s,this);
t = cate(t,that);
t = cate(t,the_other_thing);

asetof...@gmail.com

May 7, 2015, 3:47:19 PM
Returning 0 as the pointer means "return an
error"... No library function
has to seg fault; the seg fault,
I say, is better kept out.

Tim Rentsch

May 7, 2015, 6:39:33 PM
David Brown <david...@hesbynett.no> writes:

> On 07/05/15 03:25, Tim Rentsch wrote:
>> David Brown <david...@hesbynett.no> writes:
>>
>>> Does anyone know why the return value for strcat and similar library
>>> functions is defined in such an unhelpful (IMHO) way?
>>>
>>> The return value of strcat(s1, s2) is always s1, which is a value you
>>> already have available. Returning a pointer to the end of s1 (i.e., s1
>>> + strlen(s1)) would be much more efficient when building up a string
>>> from parts.
>>
>> You're assuming that the value of the first argument is readily
>> available otherwise, which isn't always true - it could be the
>> result of a function call, for example.
>
> It /is/ readily available -

Not exactly. What you mean is not that it is readily available
but that it can be made available, by declaring a variable and
assigning the returned value.

> if you as a programmer need it, you have
> access to it. If your "s1" is the result of calling "foo()", then all
> you need to write is "strcat(s1 = foo(), s2)".

My point is either choice has its own advantages. Personally I think
the choice made for strcat() is the right one for what it is. If
something different or more elaborate is wanted, it's easy enough
to program that.

Tim Rentsch

May 7, 2015, 6:55:31 PM
Morris Dovey <mrd...@iedu.com> writes:

> On 5/7/15 3:19 AM, David Brown wrote:
>
>> That's how I would write it. But if strcat and strcpy returned a
>> pointer to the end of the string, you could write:
>>
>> char s[BIG_ENOUGH];
>> char* t = s;
>> t = strcpy(t, this);
>> t = strcat(t, that);
>> t = strcat(t, the_other_thing);
>>
>> That just seems so much more sensible to me - you are putting "that"
>> onto the end of the string, rather than running through the whole string
>> several times. Even though it run-time efficiency is not always a
>> priority in my code, it is usually important enough that glaring
>> inefficiencies always irritate me.
>>
>> (Of course, since these are standard functions and the compiler knows
>> exactly what they do, it could optimise the series of strcat operations
>> to eliminate the extra search.)
>
> Is it worth pointing out that it's easier to implement the desired
> behavior from scratch than to complain about the lack of it? :-P

+1

> char *cate(char *d,char *s)
> { if (!s) *(s = d) = '\0';
> while (*d++ = *s++);
> return --d;
> }

char *
add_more( char *d, char *s ){
return (*d = *s) ? add_more( d+1, s+1 ) : d;
}

(I'm not wild about the name add_more, maybe someone has a better
suggestion.)

David Brown

May 8, 2015, 5:03:52 AM
On 07/05/15 17:17, Morris Dovey wrote:
> On 5/7/15 3:19 AM, David Brown wrote:
>
>> That's how I would write it. But if strcat and strcpy returned a
>> pointer to the end of the string, you could write:
>>
>> char s[BIG_ENOUGH];
>> char* t = s;
>> t = strcpy(t, this);
>> t = strcat(t, that);
>> t = strcat(t, the_other_thing);
>>
>> That just seems so much more sensible to me - you are putting "that"
>> onto the end of the string, rather than running through the whole string
>> several times. Even though it run-time efficiency is not always a
>> priority in my code, it is usually important enough that glaring
>> inefficiencies always irritate me.
>>
>> (Of course, since these are standard functions and the compiler knows
>> exactly what they do, it could optimise the series of strcat operations
>> to eliminate the extra search.)
>
> Is it worth pointing out that it’s easier to implement the desired
> behavior from scratch than to complain about the lack of it? :-P
>

Yes, in the case of strcat the functionality is particularly easy to
duplicate (often it is not even worth making it a separate function). I
am not complaining as such - I am more curious to know if there is any
good reason why strcat was specified the way it is, when it could so
easily have been more efficient.

David Brown

May 8, 2015, 5:09:35 AM
On 08/05/15 00:39, Tim Rentsch wrote:
> David Brown <david...@hesbynett.no> writes:
>
>> On 07/05/15 03:25, Tim Rentsch wrote:
>>> David Brown <david...@hesbynett.no> writes:
>>>
>>>> Does anyone know why the return value for strcat and similar library
>>>> functions is defined in such an unhelpful (IMHO) way?
>>>>
>>>> The return value of strcat(s1, s2) is always s1, which is a value you
>>>> already have available. Returning a pointer to the end of s1 (i.e., s1
>>>> + strlen(s1)) would be much more efficient when building up a string
>>>> from parts.
>>>
>>> You're assuming that the value of the first argument is readily
>>> available otherwise, which isn't always true - it could be the
>>> result of a function call, for example.
>>
>> It /is/ readily available -
>
> Not exactly. What you mean is not that it is readily available
> but that it can be made available, by declaring a variable and
> assigning the returned value.

That is certainly what /I/ mean by "readily available". You can get the
value by simply attaching a name to it - that's good enough.

>
>> if you as a programmer need it, you have
>> access to it. If your "s1" is the result of calling "foo()", then all
>> you need to write is "strcat(s1 = foo(), s2)".
>
> My point is either choice has its own advantages. Personally I think
> the choice made for strcat() is the right one for what it is. If
> something different or more elaborate is wanted, it's easy enough
> to program that.
>

I agree that it's easy to write a replacement for strcat() that returns
a pointer to the end terminator (or the length of the string, if
preferred). And of course your personal opinion is up to you. But I
can't see any good reason why the standard library version is specified
the way it is, rather than the much more useful alternative of returning
a pointer to the end of the string. (And if you want a version
returning the first argument, that's easy to write yourself.)


Morris Dovey

May 8, 2015, 6:07:22 AM
On 5/8/15 4:03 AM, David Brown wrote:
> Yes, in the case of strcat the functionality is particularly easy to
> duplicate (often it is not even worth making it a separate function).
> I am not complaining as such - I am more curious to know if there is
> any good reason why strcat was specified the way it is, when it could
> so easily have been more efficient.

I can't do much more than speculate that there might have been a lot of
'get it working, then can polish it up later' going on, along with a
certain amount of pressure for a working version of unix ASAP. I suspect
a lot of that 'polishing' never actually got done and a lot of good
ideas (multiple entry points, for example) never saw the light of day.
The original syntax has always struck me as 'cobbled' and lacking
elegance - and I'm inclined to think polish wasn't of much concern.

Of course, once there was a working compiler and operating system, then
there was likely considerable managerial resistance to change on the 'if
it ain't broke, don't fix it!' principle.

Richard Bos

May 8, 2015, 5:57:35 PM
The value of the first argument is always either readily available or
readily _made_ available. The desired return value, the _end_ of the
string rather than the already given start of it, is not, except from
within the function.

I, too, would have preferred to get the latter value from strfoo(), and
if I ever get around to drawing up the specifications for my very
perfect C-with-hindsight (a.k.a. C 2020), they will - but in C, they
don't. For this, as far as the Standard is concerned, I fear the blame
lies squarely at the feet of that old monster, Previous Art; why Art did
it this way, Paul may know, but I don't. Possibly laziness.

Richard

Nathan Wagner

May 8, 2015, 8:12:06 PM
On 2015-05-07, Morris Dovey <mrd...@iedu.com> wrote:
> On 5/7/15 10:17 AM, Morris Dovey wrote:
>> Is it worth pointing out that it's easier to implement the desired
>> behavior from scratch than to complain about the lack of it? :-P

[snip]

Probably, since I was about to post the same thing. The only downside
of course is that you don't get the benefit of any assembly optimized
implementation.

--
nw

Tim Rentsch

May 10, 2015, 11:44:28 AM
David Brown <david...@hesbynett.no> writes:

> On 08/05/15 00:39, Tim Rentsch wrote:
>> David Brown <david...@hesbynett.no> writes:
>>
>>> On 07/05/15 03:25, Tim Rentsch wrote:
>>>> David Brown <david...@hesbynett.no> writes:
>>>>
>>>>> Does anyone know why the return value for strcat and similar
>>>>> library functions is defined in such an unhelpful (IMHO) way?
>>>>>
>>>>> The return value of strcat(s1, s2) is always s1, which is a
>>>>> value you already have available. Returning a pointer to the
>>>>> end of s1 (i.e., s1 + strlen(s1)) would be much more efficient
>>>>> when building up a string from parts.
>>>>
>>>> You're assuming that the value of the first argument is readily
>>>> available otherwise, which isn't always true - it could be the
>>>> result of a function call, for example.
>>>
>>> It /is/ readily available -
>>
>> Not exactly. What you mean is not that it is readily available
>> but that it can be made available, by declaring a variable and
>> assigning the returned value.
>
> That is certainly what /I/ mean be "readily available". You can
> get the value by simply attaching a name to it - that's good
> enough.

I don't want to get into a quibbling match about definitions;
the key thing is your description is wrong. You cannot simply
attach a name to the needed value - sometimes a name must be
declared before being assigned to. That is a non-negligible
cost.

>>> if you as a programmer need it, you have access to it. If your
>>> "s1" is the result of calling "foo()", then all you need to write
>>> is "strcat(s1 = foo(), s2)".
>>
>> My point is either choice has its own advantages. Personally I
>> think the choice made for strcat() is the right one for what it
>> is. If something different or more elaborate is wanted, it's
>> easy enough to program that.
>
> I agree that it's easy to write a replacement for strcat() that
> returns a pointer to the end terminator (or the length of the
> string, if preferred). And of course your personal opinion is up
> to you. But I can't see any good reason why the standard library
> version is specified the way it is, rather than the much more
> useful alternative of returning a pointer to the end of the
> string.

What's wrong with the idea that someone thought it valuable to
make the resulting (entire) string available so it could be
given, eg, as an argument to another function (or perhaps one
branch of a ?: operator, or return expression, etc), taking
advantage of a functional style? Is it so hard for you to
imagine that someone else would have different priorities than
you do?

> (And if you want a version returning the first argument,
> that's easy to write yourself.)

I don't have to, I can just use strcat(). :)

Tim Rentsch

May 10, 2015, 12:07:59 PM
ral...@xs4all.nl (Richard Bos) writes:

> Tim Rentsch <t...@alumni.caltech.edu> wrote:
>
>> David Brown <david...@hesbynett.no> writes:
>>
>>> Does anyone know why the return value for strcat and similar
>>> library functions is defined in such an unhelpful (IMHO) way?
>>>
>>> The return value of strcat(s1, s2) is always s1, which is a value
>>> you already have available. Returning a pointer to the end of s1
>>> (i.e., s1 + strlen(s1)) would be much more efficient when building
>>> up a string from parts.
>>
>> You're assuming that the value of the first argument is readily
>> available otherwise, which isn't always true - it could be the
>> result of a function call, for example.
>
> The value of the first argument is always either readily available
> or readily _made_ available. The desired return value, the _end_
> of the string rather than the already given start of it, is not,
> except from within the function.

I agree the value is always either readily available or can be
made available by declaring a name to hold the value. Either
way, the result can be clunky if what you want to do is pass
the value on to another expression in a more functional style,
which the current semantics supports.

> I, too, would have preferred to get the latter value from
> strfoo(), and if I ever get around to draw up the specifications
> for my very perfect C-with-hindsight (a.k.a. C 2020), they will -
> but in C, they don't. For this, as far as the Standard is
> concerned, I fear the blame lays squarely at the feet of that old
> monster, Previous Art; why Art did it this way, Paul may know,
> but I don't. Possibly laziness.

Another possibility is that whoever made the decision thought
supporting a more functional style has value, and made a
conscious choice favoring that over an "end of string" version.

Incidentally, if you want these, both can be provided with
less writing than you did in your followup:

char *
bstrcpy( char *d, const char *s ){
return (*d = *s) ? bstrcpy( d+1, s+1 ) : d;
}

char *
bstrcat( char *d, const char *s ){
return *d ? bstrcat( d+1, s ) : bstrcpy( d, s );
}

David Brown

May 10, 2015, 3:30:43 PM
If you think that this cost is non-negligible (assuming a half-decent
compiler), then I suppose that explains your problem here. I, on the
other hand, have no doubts that it is negligible, and have great
difficulty in imagining a situation where it is significantly easier,
clearer, faster, or more efficient to use "s1 = strcat(foo(), s2)"
rather than "strcat(s1 = foo(), s2)", or my own preferred choice of two
lines "s1 = foo(); strcat(s1, s2);".

I think that means we have approximately the same definition of "readily
available", but different ideas about the cost of attaching a name to a
value that the compiler has already calculated and will need later on.

>
>>>> if you as a programmer need it, you have access to it. If your
>>>> "s1" is the result of calling "foo()", then all you need to write
>>>> is "strcat(s1 = foo(), s2)".
>>>
>>> My point is either choice has its own advantages. Personally I
>>> think the choice made for strcat() is the right one for what it
>>> is. If something different or more elaborate is wanted, it's
>>> easy enough to program that.
>>
>> I agree that it's easy to write a replacement for strcat() that
>> returns a pointer to the end terminator (or the length of the
>> string, if preferred). And of course your personal opinion is up
>> to you. But I can't see any good reason why the standard library
>> version is specified the way it is, rather than the much more
>> useful alternative of returning a pointer to the end of the
>> string.
>
> What's wrong with the idea that someone thought it valuable to
> make the resulting (entire) string available so it could be
> given, eg, as an argument to another function (or perhaps one
> branch of a ?: operator, or return expression, etc), taking
> advantage of a functional style? Is it so hard for you to
> imagine that someone else would have different priorities than
> you do?

The point of this thread was to ask people with different experiences
and styles if they could think of an advantage of the way strcat is
defined - precisely because while I could not think of one myself, I
thought that someone else might. So far no one (including you) has
given any suggestions other than "it seemed a good idea at the time" or
"they standardised the function they were using, without thinking about
other use-cases". This is, of course, entirely normal - most projects
of any size have decisions that look odd, inefficient, or unfortunate
when viewed later with hindsight. I merely wondered if the early C
authors had a good reason that I didn't know of, and it seems not to be
the case.

Tim Rentsch

May 11, 2015, 11:42:47 PM
> here. [snip]

The cost I'm talking about is a coding and maintenance cost,
not a compilation or performance cost. I don't mind agreeing to
the possibility that your sense of the size of that cost may be
different from mine.
> were using, without thinking about other use-cases". [snip]

My answer may not have been stated very clearly, but I did
suggest another possible explanation: that for reasons you
don't consider important, namely coding convenience in certain
situations, those making the choice thought, and still think (or
would think, if they were still alive), that this consideration
outweighs those of the alternate semantics, even after conscious
evaluation.

David Brown

May 12, 2015, 3:52:50 AM
On 12/05/15 05:42, Tim Rentsch wrote:
> David Brown <david...@hesbynett.no> writes:
>

<snip>

> The cost I'm talking about is a coding and maintenance cost,
> not a compilation or performance cost. I don't mind agreeing to
> the possibility that your sense of the size of that cost may be
> different from mine.

Fair enough. I don't think we need a discussion about the differences
in such costs in this context - at least, not in this thread.

>>
>> The point of this thread was to ask people with different
>> experiences and styles if they could think of an advantage of the
>> way strcat is defined - precisely because while I could not think
>> of one myself, I thought that someone else might. So far no one
>> (including you) has given any suggestions other than "it seemed a
>> good idea at the time" or "they standardised the function they
>> were using, without thinking about other use-cases". [snip]
>
> My answer may not have been stated very clearly, but I did
> suggest another possible explanation: that for reasons you
> don't consider important, namely coding convenience in certain
> situations, those making the choice thought, and still think (or
> would think, if they were still alive), that this consideration
> outweighs those of the alternate semantics, even after conscious
> evaluation.
>

Can you give me:

1. An example where strcat, as defined today, leads to /significantly/
clearer, neater, more maintainable, or more efficient code than a
"strcat2" function that returns a pointer to the end of the string?

2. A justification why you think such coding style was likely in the
early days of C, thus making it a reasonable explanation for the choice
of strcat?




It's very easy to find situations where a strcat2 (returning a pointer
to the end of the string) is clearer and more efficient than strcat.

// Combine three strings and return the total length
int combineStrings(char* s, const char* t1, const char* t2) {
strcat(s, t1); // Time len(s) + len(t1)
strcat(s, t2); // Time len(s) + len(t1) + len(t2)
return strlen(s); // Time len(s) + len(t1) + len(t2)
}
// Total time 3*len(s) + 3*len(t1) + 2*len(t2)


int combineStrings(char* s, const char* t1, const char* t2) {
char* end = strcat2(s, t1); // Time len(s) + len(t1)
end = strcat(end, t2); // Time len(t2)
return end - s; // Time 1
}
// Total time len(s) + len(t1) + len(t2) + 1

(Note that if you really don't like the extra "char* end" variable, you
can use strcat2 exactly like strcat in the first version.)


It is also simpler and more efficient to implement strcat in terms of
strcat2 rather than the reverse:

char * strcat(char* s, const char* t) {
strcat2(s, t); // Time len(s) + len(t)
return s;
}
// Total time len(s) + len(t)

char * strcat2(char* s, const char* t) {
strcat(s, t); // Time len(s) + len(t)
return strlen(s); // Time len(s) + len(t)
}
// Total time 2*len(s) + 2*len(t)



So it is clear that there are common cases where a strcat2 returning the
end pointer is more flexible and significantly more efficient than the
standard version. There are no cases in which the standard version is
more efficient. And the standard version is easily and efficiently
created from the strcat2 version.

The question is, does the standard version lead to significantly neater
or clearer code in common cases, and was it common enough to lead to a
conscious design decision when the library was standardised? Or was it,
as most people here seem to think, a bad decision - probably the result
of standardising someone's existing usage rather than the result of a
thoughtful design?



Bartc

May 12, 2015, 8:34:02 AM
On 12/05/2015 08:52, David Brown wrote:
> It's very easy to find situations where a strcat2 (returning a pointer
> to the end of the string) is clearer and more efficient than strcat.
>
> // Combine three strings and return the total length
> int combineStrings(char* s, const char* t1, const char* t2) {
> strcat(s, t1); // Time len(s) + len(t1)
> strcat(s, t2); // Time len(s) + len(t1) + len(t2)
> return strlen(s); // Time len(s) + len(t1) + len(t2)
> }
> // Total time 3*len(s) + 3*len(t1) + 2*len(t2)
>
>
> int combineStrings(char* s, const char* t1, const char* t2) {
> char* end = strcat2(s, t1); // Time len(s) + len(t1)
> end = strcat(end, t2); // Time len(t2)
> return end - s; // Time 1
> }
> // Total time len(s) + len(t1) + len(t2) + 1

In this case you might as well use strlen() on each string, and use
memcpy functions instead.

In fact, if you're going to work with string lengths, the chances are
that these are already available from prior operations on the three
strings, so you might only need to call strlen 0, 1 or 2 times instead of 3.

Perhaps that was the original idea: either use explicit lengths, or work
more elegantly with pure zero-terminated strings.

--
Bartc

Ike Naar

May 14, 2015, 4:17:10 PM
On 2015-05-12, David Brown <david...@hesbynett.no> wrote:
> It's very easy to find situations where a strcat2 (returning a pointer
> to the end of the string) is clearer and more efficient than strcat.
>
> // Combine three strings and return the total length
> int combineStrings(char* s, const char* t1, const char* t2) {
> strcat(s, t1); // Time len(s) + len(t1)
> strcat(s, t2); // Time len(s) + len(t1) + len(t2)
> return strlen(s); // Time len(s) + len(t1) + len(t2)
> }
> // Total time 3*len(s) + 3*len(t1) + 2*len(t2)
>
>
> int combineStrings(char* s, const char* t1, const char* t2) {
> char* end = strcat2(s, t1); // Time len(s) + len(t1)
> end = strcat(end, t2); // Time len(t2)

Shouldn't that be strcat2(end, t2) ?

David Brown

May 14, 2015, 5:07:03 PM
Of course. Sorry.

Tim Rentsch

May 20, 2015, 2:08:34 PM
No, I'm not offering either of those. I'm not saying the reasons
mentioned are ones you would find convincing, only that it is
plausible that the original designers found them convincing.

> It's very easy to find situations where a strcat2 (returning a pointer
> to the end of the string) is clearer and more efficient than strcat.
>
> // Combine three strings and return the total length
> int combineStrings(char* s, const char* t1, const char* t2) {
> strcat(s, t1); // Time len(s) + len(t1)
> strcat(s, t2); // Time len(s) + len(t1) + len(t2)
> return strlen(s); // Time len(s) + len(t1) + len(t2)
> }
> // Total time 3*len(s) + 3*len(t1) + 2*len(t2)
>
>
> int combineStrings(char* s, const char* t1, const char* t2) {
> char* end = strcat2(s, t1); // Time len(s) + len(t1)
> end = strcat(end, t2); // Time len(t2)
> return end - s; // Time 1
> }
> // Total time len(s) + len(t1) + len(t2) + 1
>
> (Note that if you really don't like the extra "char* end" variable, you
> can use strcat2 exactly like strcat in the first version.)

These examples don't seem very compelling to me, because what the
functions do seems somewhat contrived, and because they don't
match how I expect they would be written, especially if
efficiency concerns were paramount. (I assume you meant strcat2
rather than strcat in the second definition.)

> It is also simpler and more efficient to implement strcat in terms of
> strcat2 rather than the reverse:
>
> char * strcat(char* s, const char* t) {
> strcat2(s, t); // Time len(s) + len(t)
> return s;
> }
> // Total time len(s) + len(t)
>
> char * strcat2(char* s, const char* t) {
> strcat(s, t); // Time len(s) + len(t)
> return strlen(s); // Time len(s) + len(t)
> }
> // Total time 2*len(s) + 2*len(t)

I might agree on the "more efficient" part, but not on "simpler"
part.

> So it is clear that there are common cases where a strcat2 returning the
> end pointer is more flexible and significantly more efficient than the
> standard version. There are no cases in which the standard version is
> more efficient. And the standard version is easily and efficiently
> created from the strcat2 version.

I understand that you think this reasoning is convincing. Not
everyone does.

> The question is, does the standard version lead to significantly neater
> or clearer code in common cases, and was it common enough to lead to a
> concious design decision when the library was standardised? Or was it,
> as most people here seem to think, a bad decision - probably the result
> of standardising someone's existing usage rather than the result of a
> thoughtful design.

Why do you find it so hard to believe that other people might
reasonably reach different conclusions than you do?

David Brown

May 21, 2015, 4:56:05 AM
I realise that - that's why I asked you for a justification for why you
think the original designers did it this way. But I am not yet
convinced that the original designers thought this through properly. I
can accept that they might have had different balances between
efficiency and simplicity, or compilers with different abilities, or
different styles. However, I am still lacking /any/ justification for
suggesting the original library designers thought "strcat" was a better
choice than "strcat2". Currently, it appears that they simply didn't
consider the issues. (And that's fair enough - we can't expect them to
have had perfect hindsight when starting out. We must simply live with
it as another of C's sub-optimal design decisions.)

>
>> It's very easy to find situations where a strcat2 (returning a pointer
>> to the end of the string) is clearer and more efficient than strcat.
>>
>> // Combine three strings and return the total length
>> int combineStrings(char* s, const char* t1, const char* t2) {
>> strcat(s, t1); // Time len(s) + len(t1)
>> strcat(s, t2); // Time len(s) + len(t1) + len(t2)
>> return strlen(s); // Time len(s) + len(t1) + len(t2)
>> }
>> // Total time 3*len(s) + 3*len(t1) + 2*len(t2)
>>
>>
>> int combineStrings(char* s, const char* t1, const char* t2) {
>> char* end = strcat2(s, t1); // Time len(s) + len(t1)
>> end = strcat(end, t2); // Time len(t2)
>> return end - s; // Time 1
>> }
>> // Total time len(s) + len(t1) + len(t2) + 1
>>
>> (Note that if you really don't like the extra "char* end" variable, you
>> can use strcat2 exactly like strcat in the first version.)
>
> These examples don't seem very compelling to me, because what the
> functions do seems somewhat contrived, and because they don't
> match how I expect they would be written, especially if
> efficiency concerns were paramount. (I assume you meant strcat2
> rather than strcat in the second definition.)

They are simply example functions that demonstrate the difference
between strcat and strcat2 when combining several strings. Feel free to
write them in a more efficient form - once using the standard strcat,
strcpy, strlen functions, and once using strcat2. No loops to manually
copy the strings are allowed (because that would be missing the point of
having the library functions). I am curious to see how you would write
them with greater efficiency.

>
>> It is also simpler and more efficient to implement strcat in terms of
>> strcat2 rather than the reverse:
>>
>> char * strcat(char* s, const char* t) {
>> strcat2(s, t); // Time len(s) + len(t)
>> return s;
>> }
>> // Total time len(s) + len(t)
>>
>> char * strcat2(char* s, const char* t) {
>> strcat(s, t); // Time len(s) + len(t)
>> return strlen(s); // Time len(s) + len(t)
>> }
>> // Total time 2*len(s) + 2*len(t)
>
> I might agree on the "more efficient" part, but not on "simpler"
> part.

You cannot possibly /disagree/ on the "more efficient" part - strcat can
be implemented optimally from strcat2, while the opposite direction
requires double the time. (In theory, a smart enough compiler could
remove the function calls and turn the strcat + strlen into an inlined
loop.)

And how can you justify contending on the "simpler" part? Implementing
strcat from strcat2 requires one function call, while implementing
strcat2 from strcat requires two function calls. I won't claim it's a
complicated function, but there can be no doubt as to which is simpler.

>
>> So it is clear that there are common cases where a strcat2 returning the
>> end pointer is more flexible and significantly more efficient than the
>> standard version. There are no cases in which the standard version is
>> more efficient. And the standard version is easily and efficiently
>> created from the strcat2 version.
>
> I understand that you think this reasoning is convincing. Not
> everyone does.

I have given a fair amount of justification and evidence for why my
reasoning makes sense. I still have heard absolutely no technical
justification for the choice of strcat as we have it today, compared to
strcat2 (or a version that returned the length of the new string, which
would be another reasonable alternative). You have done nothing but
claim that the original library authors might have had good reason for
the decision.

>
>> The question is, does the standard version lead to significantly neater
>> or clearer code in common cases, and was it common enough to lead to a
> conscious design decision when the library was standardised? Or was it,
>> as most people here seem to think, a bad decision - probably the result
>> of standardising someone's existing usage rather than the result of a
>> thoughtful design.
>
> Why do you find it so hard to believe that other people might
> reasonably reach different conclusions than you do?
>

I like to base my conclusions on evidence and technical justifications,
if I can. I can find plenty of evidence for the viewpoint that strcat2
would have been a better choice for the standard library than strcat. I
can find no significant evidence to the contrary, and no one else here
seems to have done so either.

The nearest I have seen is the fact that strcat allows certain chained
expression syntaxes that cannot be written directly with strcat2 - you
need to use multiple expressions. But even in such cases, the strcat2
version is not more difficult, and it is much more efficient. And no
one has argued that such chained expressions were common in old C code,
or that they were a reason for picking the strcat form.

So the question is, why do you find it so hard to believe that the
original design decision was not a good one, and was perhaps made in
haste or for "historical" reasons?


glen herrmannsfeldt

May 21, 2015, 9:05:37 AM
David Brown <david...@hesbynett.no> wrote:
> On 20/05/15 20:08, Tim Rentsch wrote:
>> David Brown <david...@hesbynett.no> writes:
(snip, someone wrote)
>>> Can you give me:

>>> 1. An example where strcat, as defined today, leads to /significantly/
>>> clearer, neater, more maintainable, or more efficient code than a
>>> "strcat2" function that returns a pointer to the end of the string?

An important word above is "today". strcat wasn't designed recently,
and hardware was a lot different when it was. Decisions that are
right today, may not have been right 40 years ago.

>>> 2. A justification why you think such coding style was likely in the
>>> early days of C, thus making it a reasonable explanation for the
>>> choice of strcat?

>> No, I'm not offering either of those. I'm not saying the reasons
> mentioned are ones you would find convincing, only that it is
>> plausible that the original designers found them convincing.

That was just at the transitions from writing operating systems,
and even compilers, in the appropriate assembly language, possibly
with various aids to make it easier, to writing them in high level
languages. Multics is, as I understand it, mostly PL/I.

C and unix were for smaller computers. It is mostly that we, today,
decided to take what was optimal for small computers and assume it
optimal for large ones.

> I realise that - that's why I asked you for a justification for why you
> think the original designers did it this way. But I am not yet
> convinced that the original designers thought this through properly. I
> can accept that they might have had different balances between
> efficiency and simplicity, or compilers with different abilities, or
> different styles. However, I am still lacking /any/ justification for
> suggesting the original library designers thought "strcat" was a better
> choice than "strcat2". Currently, it appears that they simply didn't
> consider the issues. (And that's fair enough - we can't expect them to
> have had perfect hindsight when starting out. We must simply live with
> it as another of C's sub-optimal design decisions.)

The inefficiency of strcat() is O(n*n). (O(n**2) in some newsgroups.)

(I had to actually debug a program once where someone had used
strcat() in a loop, where actual O(n**2) timing was visible.)

When n=10, you don't notice, and the existing strcat() may be
optimal.

>>> It's very easy to find situations where a strcat2 (returning a pointer
>>> to the end of the string) is clearer and more efficient than strcat.

(snip)

>> These examples don't seem very compelling to me, because what the
>> functions do seems somewhat contrived, and because they don't
>> match how I expect they would be written, especially if
>> efficiency concerns were paramount. (I assume you meant strcat2
>> rather than strcat in the second definition.)

> They are simply example functions that demonstrate the difference
> between strcat and strcat2 when combining several strings. Feel free to
> write them in a more efficient form - once using the standard strcat,
> strcpy, strlen functions, and once using strcat2. No loops to manually
> copy the strings are allowed (because that would be missing the point of
> having the library functions). I am curious to see how you would write
> them with greater efficiency.

In the actual case mentioned above, I used strlen() and strcpy().
Using strlen() allowed me to verify that the new string would fit.

>>> It is also simpler and more efficient to implement strcat in terms of
>>> strcat2 rather than the reverse:

(snip)
>> I might agree on the "more efficient" part, but not on "simpler"
>> part.

In many cases, strcat() or strcat2() aren't good choices.

They work well when you can be reasonably (enough) sure that you
won't outrun the array. If you can't, there are easier ways than
to test first and then strcat().

> You cannot possibly /disagree/ on the "more efficient" part - strcat can
> be implemented optimally from strcat2, while the opposite direction
> requires double the time. (In theory, a smart enough compiler could
> remove the function calls and turn the strcat + strlen into an inlined
> loop.)

As above, I suspect original strcat() was used closer to n=10.
It is nice when you want to add an extension onto a file name.
You might:

x=fopen(strcat(file,".xyz"),"w");

It is not nice when you want to read in megabytes of data, one
screen-width line at a time.

> And how can you justify contending on the "simpler" part? Implementing
> strcat from strcat2 requires one function call, while implementing
> strcat2 from strcat requires two function calls. I won't claim it's a
> complicated function, but there can be no doubts as to which is simpler.
(snip)

>> I understand that you think this reasoning is convincing. Not
>> everyone does.

> I have given a fair amount of justification and evidence for why my
> reasoning makes sense. I still have heard absolutely no technical
> justification for the choice of strcat as we have it today, compared to
> strcat2 (or a version that returned the length of the new string, which
> would be another reasonable alternative). You have done nothing but
> claim that the original library authors might have had good reason for
> the decision.

Your reasoning might make sense today, but maybe not 40 years ago.

I don't know - though by now maybe someone does - whether the original
strcat() was written in C, or in assembly.

>>> The question is, does the standard version lead to significantly neater
>>> or clearer code in common cases, and was it common enough to lead to a
>>> conscious design decision when the library was standardised? Or was it,
>>> as most people here seem to think, a bad decision - probably the result
>>> of standardising someone's existing usage rather than the result of a
>>> thoughtful design.

>> Why do you find it so hard to believe that other people might
>> reasonably reach different conclusions than you do?

If you haven't read K&R1, you should do that. That should give
you a better idea of what the original designers of C expected
it to do. How they thought it might be used.

> I like to base my conclusions on evidence and technical justifications,
> if I can. I can find plenty of evidence for the viewpoint that strcat2
> would have been a better choice for the standard library than strcat. I
> can find no significant evidence to the contrary, and no one else here
> seems to have done so either.

Well, for one, many C library string routines return a pointer to the
first argument. It is nice to have a little consistency.

> The nearest I have seen is the fact that strcat allows certain chained
> expression syntaxes that cannot be written directly with strcat2 - you
> need to use multiple expressions. But even in such cases, the strcat2
> version is not more difficult, and it is much more efficient. And no
> one has argued that such chained expressions were common in old C code,
> or that they were a reason for picking the strcat form.

> So the question is, why do you find it so hard to believe that the
> original design decision was not a good one, and was perhaps made in
> haste or for "historical" reasons?

I suspect that if you asked the original C designers, at the time,
if they expected C to be popular 40 years later, they would have
said "no". Not that they didn't believe in their design, but there
are many people out there with good ideas. Statistically, it wasn't
so likely to happen.

-- glen

David Brown

May 21, 2015, 2:42:39 PM
On 21/05/15 15:05, glen herrmannsfeldt wrote:
> David Brown <david...@hesbynett.no> wrote:
>> On 20/05/15 20:08, Tim Rentsch wrote:
>>> David Brown <david...@hesbynett.no> writes:
> (snip, someone wrote)
>>>> Can you give me:
>
>>>> 1. An example where strcat, as defined today, leads to /significantly/
>>>> clearer, neater, more maintainable, or more efficient code than a
>>>> "strcat2" function that returns a pointer to the end of the string?
>
> An important word above is "today". strcat wasn't designed recently,
> and hardware was a lot different when it was. Decisions that are
> right today, may not have been right 40 years ago.

I agree - and that is part of the reason for this thread in the first
place. Was there something about the hardware, or the way compilers
worked, or the style used by early C programmers, that made the standard
strcat definition a better choice than the strcat2 version returning a
pointer to the end of the string (or alternatively, a version returning
the length of the string)? The same arguments about the efficiency of
strcat2 compared to strcat applied 40 years ago.

Saying that it is /possible/ that strcat was a better choice than
strcat2 40 years ago does not mean it was actually the case.

>
>>>> 2. A justification why you think such coding style was likely in the
>>>> early days of C, thus making it a reasonable explanation for the
>>>> choice of strcat?
>
>>> No, I'm not offering either of those. I'm not saying the reasons
>>> mentioned are ones you would find convincing, only that it is
>>> plausible that the original designers found them convincing.
>
> That was just at the transitions from writing operating systems,
> and even compilers, in the appropriate assembly language, possibly
> with various aids to make it easier, to writing them in high level
> languages. Multics is, as I understand it, mostly PL/I.
>
> C and unix were for smaller computers. It is mostly that we, today,
> decided to take what was optimal for small computers and assume it
> optimal for large ones.

strcat2 is a much more efficient choice than strcat for small systems as
well as big ones. So that argument does not hold much water (although
it might possibly apply to other C design decisions that are not ideal
when viewed in a modern light).

>
>> I realise that - that's why I asked you for a justification for why you
>> think the original designers did it this way. But I am not yet
>> convinced that the original designers thought this through properly. I
>> can accept that they might have had different balances between
>> efficiency and simplicity, or compilers with different abilities, or
>> different styles. However, I am still lacking /any/ justification for
>> suggesting the original library designers thought "strcat" was a better
>> choice than "strcat2". Currently, it appears that they simply didn't
>> consider the issues. (And that's fair enough - we can't expect them to
>> have had perfect hindsight when starting out. We must simply live with
>> it as another of C's sub-optimal design decisions.)
>
> The inefficiency of strcat() is O(n*n). (O(n**2) in some newsgroups.)
>
> (I had to actually debug a program once where someone had used
> strcat() in a loop, where actual O(n**2) timing was visible.)
>
> When n=10, you don't notice, and the existing strcat() may be
> optimal.

You might not notice it when n = 10, and for small sizes on old
processors (with larger function-call overheads), the wasted time gets
lost in the "noise" of all the other time spent in the code.

If you have the length in advance, then of course you can use something
like memcpy() rather than strcat(). But sometimes you know your limits
are safe even though you don't know exact lengths (or, like many early C
programmers and a few modern ones, you simply ignore the risks of buffer
overflow!).

I have read it, but it was a /long/ time ago - perhaps 25 years.

>
>> I like to base my conclusions on evidence and technical justifications,
>> if I can. I can find plenty of evidence for the viewpoint that strcat2
>> would have been a better choice for the standard library than strcat. I
>> can find no significant evidence to the contrary, and no one else here
>> seems to have done so either.
>
> Well, for one, many C library string routines return a pointer to the
> first argument. It is nice to have a little consistency.
>
>> The nearest I have seen is the fact that strcat allows certain chained
>> expression syntaxes that cannot be written directly with strcat2 - you
>> need to use multiple expressions. But even in such cases, the strcat2
>> version is not more difficult, and it is much more efficient. And no
>> one has argued that such chained expressions were common in old C code,
>> or that they were a reason for picking the strcat form.
>
>> So the question is, why do you find it so hard to believe that the
>> original design decision was not a good one, and was perhaps made in
>> haste or for "historical" reasons?
>
> I suspect that if you asked the original C designers, at the time,
> if they expected C to be popular 40 years later, they would have
> said "no". Not that they didn't believe in their design, but there
> are many people out there with good ideas. Statistically, it wasn't
> so likely to happen.
>

True enough.


Tim Rentsch

Jul 10, 2015, 1:40:46 PM
What you mean is you're lacking a justification you find convincing.
That's not quite the same thing.

I don't see the point of (re-)writing them, because I don't think
the problem they are solving illustrates anything useful.

>>> It is also simpler and more efficient to implement strcat in terms of
>>> strcat2 rather than the reverse:
>>>
>>> char * strcat(char* s, const char* t) {
>>> strcat2(s, t); // Time len(s) + len(t)
>>> return s;
>>> }
>>> // Total time len(s) + len(t)
>>>
>>> char * strcat2(char* s, const char* t) {
>>> strcat(s, t); // Time len(s) + len(t)
>>> return strlen(s); // Time len(s) + len(t)
>>> }
>>> // Total time 2*len(s) + 2*len(t)
>>
>> I might agree on the "more efficient" part, but not on "simpler"
>> part.
>
> You cannot possibly /disagree/ on the "more efficient" part - strcat can
> be implemented optimally from strcat2, while the opposite direction
> requires double the time. (In theory, a smart enough compiler could
> remove the function calls and turn the strcat + strlen into an inlined
> loop.)

Now that's funny. You point out a line of reasoning that pokes
a hole in your assertion, immediately after saying I can't
possibly have such a line of reasoning. Good one. :)

> And how can you justify contending on the "simpler" part? Implementing
> strcat from strcat2 requires one function call, while implementing
> strcat2 from strcat requires two function calls. I won't claim it's a
> complicated function, but there can be no doubts as to which is simpler.

Obviously we have different "simpleness" metrics; in my metric
they are (roughly) equally simple. Also it doesn't help your
case that one of the definitions has a (deliberate?) bug.

>>> So it is clear that there are common cases where a strcat2 returning the
>>> end pointer is more flexible and significantly more efficient than the
>>> standard version. There are no cases in which the standard version is
>>> more efficient. And the standard version is easily and efficiently
>>> created from the strcat2 version.
>>
>> I understand that you think this reasoning is convincing. Not
>> everyone does.
>
> I have given a fair amount of justification and evidence for why my
> reasoning makes sense. I still have heard absolutely no technical
> justification for the choice of strcat as we have it today, compared to
> strcat2 (or a version that returned the length of the new string, which
> would be another reasonable alternative). You have done nothing but
> claim that the original library authors might have had good reason for
> the decision.

Again, you haven't heard a justification that convinces you.
That isn't the same as saying no justification has been offered.

>>> The question is, does the standard version lead to significantly neater
>>> or clearer code in common cases, and was it common enough to lead to a
>>> conscious design decision when the library was standardised? Or was it,
>>> as most people here seem to think, a bad decision - probably the result
>>> of standardising someone's existing usage rather than the result of a
>>> thoughtful design.
>>
>> Why do you find it so hard to believe that other people might
>> reasonably reach different conclusions than you do?
>
> I like to base my conclusions on evidence and technical justifications,
> if I can. I can find plenty of evidence for the viewpoint that strcat2
> would have been a better choice for the standard library than strcat. I
> can find no significant evidence to the contrary, and no one else here
> seems to have done so either.
>
> The nearest I have seen is the fact that strcat allows certain chained
> expression syntaxes that cannot be written directly with strcat2 - you
> need to use multiple expressions. But even in such cases, the strcat2
> version is not more difficult, and it is much more efficient. And no
> one has argued that such chained expressions were common in old C code,
> or that they were a reason for picking the strcat form.

I'm sorry I wasn't better able to communicate the point I was
trying to make.

> So the question is, why do you find it so hard to believe that the
> original design decision was not a good one, and was perhaps made in
> haste or for "historical" reasons?

It's important to distinguish two different situations. I don't
find it hard to believe the original design process for strcat()
etc _might_ not be so good. But that is very different from
concluding that the original design process was _not_ good.
My complaint is that you appear not to differentiate between
these two circumstances.

highl...@gmail.com

Feb 9, 2016, 9:05:13 PM
On Wednesday, May 6, 2015 at 9:02:59 AM UTC+1, David Brown wrote:
> Does anyone know why the return value for strcat and similar library
> functions is defined in such an unhelpful (IMHO) way?
>
> The return value of strcat(s1, s2) is always s1, which is a value you
> already have available. Returning a pointer to the end of s1 (i.e., s1
> + strlen(s1)) would be much more efficient when building up a string
> from parts.

Funny, I made this exact same point here nearly 20 years ago.
https://groups.google.com/d/msg/comp.lang.c/lz7tjJWzj3o/acBOFilgQWEJ

I hope I don't still have to make it again 20 years from now.
https://symas.com/the-sad-state-of-c-strings/

James Kuyper

Feb 9, 2016, 9:38:07 PM
If C and strcat() both still exist at that time, I'm reasonably sure
strcat will have the same return value. It's not a particularly useful
value, but because it is the value returned, it does get used. As a
result, there's already too much existing code that would break if it
changed. If this issue ever gets properly addressed, it will have to be in a
backwardly compatible way - most likely by new functions.
--
James Kuyper

Geoff

Feb 10, 2016, 12:24:29 AM
On Tue, 9 Feb 2016 21:37:58 -0500, James Kuyper
<james...@verizon.net> wrote:

>On 02/09/2016 09:05 PM, highl...@gmail.com wrote:
>> On Wednesday, May 6, 2015 at 9:02:59 AM UTC+1, David Brown wrote:
>>> Does anyone know why the return value for strcat and similar library
>>> functions is defined in such an unhelpful (IMHO) way?
>>>
>>> The return value of strcat(s1, s2) is always s1, which is a value you
>>> already have available. Returning a pointer to the end of s1 (i.e., s1
>>> + strlen(s1)) would be much more efficient when building up a string
>>> from parts.
>>
>> Funny, I made this exact same point here nearly 20 years ago.
>> https://groups.google.com/d/msg/comp.lang.c/lz7tjJWzj3o/acBOFilgQWEJ
>>
>> I hope I don't still have to make it again 20 years from now.
>> https://symas.com/the-sad-state-of-c-strings/
>

char *strcopy(register char *dst, register char *src)
{
while((*dst++ = *src++))
;
return dst-1;
}

Looks handy, I'm stealing it. :)

>If C and strcat() both still exist at that time, I'm reasonably sure
>strcat will have the same return value. It's not a particularly useful
>value, but because it is the value returned, it does get used. As a
>result, there's already too much existing code that would be broken by
>it. If this issue ever gets properly addressed, it will have to be in a
>backwardly compatible way - most likely by new functions.

C won't exist at that time except as a very niche market with a very
small number of programmers on very focused devices. Compiler
technology will have advanced to a state where the disadvantages of
the C libraries and its syntax outweigh their advantages. C's main
advantage is small footprint and attendant speed but these are rapidly
being supplanted by safe and efficient class libraries. C will be
superseded by languages that can handle string concatenation as easily
as c = a + b.

Even C++'s handling will be archaic.
std::string a = "Hello ";
std::string b = "World ";
std::string c = a + b;
std::cout << c << std::endl;

It will begin to look like, and surpass, C#:
string a = "Hello ";
string b = "World ";
string c = a + b;
Console.WriteLine(c);

Python:
a ="Hello ";
b ="World ";
c = a + b;
print c;

Swift:
let a = "Hello "
let b = "World "
var c = a + b
print (c)

... and we will have come full circle to Microsoft BASIC ca. 1975:
a$ = "Hello "
b$ = "World "
c$ = a$ + b$
PRINT c$

Richard Damon

unread,
Feb 10, 2016, 8:05:47 AM2/10/16
to
As James says, there is essentially ZERO chance that the existing
functions will change their return value. It might be possible that a
new function will be added to return the pointer to the final nul at the
end.

Returning the start of string actually does have some use when chaining
functions if you are calling strcat to update a string you want to give
to another function in one line. This is one reason in C++ there are a
lot of member functions that are defined to return a pointer (or
reference) to the object they were called on (it is probably more useful
in C++ though, as in C the chaining is a bit uglier).

It could be argued that this type of chaining would be better done by
splitting the calls onto separate lines, but styles have somewhat
changed (when terminals were 24x80, you tended to want to get as much on
the screen as possible; with larger displays we are less pressed for
terse code).

Keith Thompson

unread,
Feb 10, 2016, 10:58:20 AM2/10/16
to
James Kuyper <james...@verizon.net> writes:
[...]
> If C and strcat() both still exist at that time, I'm reasonably sure
> strcat will have the same return value. It's not a particularly useful
> value, but because it is the value returned, it does get used. As a
> result, there's already too much existing code that would be broken by
> it. If this issue ever gets properly addressed, it will have to be in a
> backwardly compatible way - most likely by new functions.

Like strlcat(), which returns the length of the updated string.

size_t strlcat(char *dst, const char *src, size_t size);

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

supe...@casperkitty.com

unread,
Feb 10, 2016, 11:53:17 AM2/10/16
to
On Wednesday, February 10, 2016 at 9:58:20 AM UTC-6, Keith Thompson wrote:
> Like strlcat(), which returns the length of the updated string.
>
> size_t strlcat(char *dst, const char *src, size_t size);

For repeated usage, I'd like to see forms that accepted a pointer to the end
of the destination buffer, and returned a pointer to the end of the destination
string; one form should "insist" upon a trailing zero, while the other should
allow the full use of the buffer [both patterns have their uses]. That would
allow many things to be concatenated without having to adjust the amount of
space remaining after each.

Randy Howard

unread,
Feb 10, 2016, 12:04:53 PM2/10/16
to
On 2/10/16 7:05 AM, Richard Damon wrote:
> Returning the start of string actually does have some use when chaining
> functions if you are calling strcat to update a string you want to give
> to another function in one line.

It's pretty obvious people will want slightly different behavior for
various purposes over decades of use. What I don't get is why the
standard library needs to change for those needs to be met.

It's not something that is difficult to implement portably (in any
flavor yet mentioned). Last I checked, rolling your own tiny function is
not quite yet illegal. ;-)



--
Randy Howard
(replace the obvious text in the obvious way if you wish to contact me
directly)

Siri Cruz

unread,
Feb 10, 2016, 12:23:08 PM2/10/16
to
In article <michpc$2b0$1...@dont-email.me>, David Brown <david...@hesbynett.no>
wrote:

> Does anyone know why the return value for strcat and similar library
> functions is defined in such an unhelpful (IMHO) way?

#define strdup(s) (strcpy(malloc(strlen(s)+1), s))

Some implementations have stpcpy and stpncpy.

--
:-<> Siri Seal of Disavowal #000-001. Disavowed. Denied. Deleted.
'I desire mercy, not sacrifice.'
God exists since mathematics is consistent, and the devil exists since we
cannot prove the consistency. ~~ Morris Kline

highl...@gmail.com

unread,
Feb 10, 2016, 1:30:57 PM2/10/16
to
On Wednesday, February 10, 2016 at 5:24:29 AM UTC, Geoff wrote:
> On Tue, 9 Feb 2016 21:37:58 -0500, James Kuyper
> <james...@verizon.net> wrote:
>
> >On 02/09/2016 09:05 PM, highl...@gmail.com wrote:
> >> On Wednesday, May 6, 2015 at 9:02:59 AM UTC+1, David Brown wrote:
> >>> Does anyone know why the return value for strcat and similar library
> >>> functions is defined in such an unhelpful (IMHO) way?
> >>>
> >>> The return value of strcat(s1, s2) is always s1, which is a value you
> >>> already have available. Returning a pointer to the end of s1 (i.e., s1
> >>> + strlen(s1)) would be much more efficient when building up a string
> >>> from parts.
> >>
> >> Funny, I made this exact same point here nearly 20 years ago.
> >> https://groups.google.com/d/msg/comp.lang.c/lz7tjJWzj3o/acBOFilgQWEJ
> >>
> >> I hope I don't still have to make it again 20 years from now.
> >> https://symas.com/the-sad-state-of-c-strings/

> char *strcopy(register char *dst, register char *src)
> {
> while((*dst++ = *src++))
> ;
> return dst-1;
> }
>
> Looks handy, I'm stealing it. :)

And the companion
char *strecopy(char *dst, char *src, const char *end)
{
while(dst < end && (*dst++ = *src++))
;
if (dst < end)
dst--;
return dst;
}

> >If C and strcat() both still exist at that time, I'm reasonably sure
> >strcat will have the same return value. It's not a particularly useful
> >value, but because it is the value returned, it does get used. As a
> >result, there's already too much existing code that would be broken by
> >it. If this issue ever gets properly addressed, it will have to be in a
> >backwardly compatible way - most likely by new functions.

Sure, let's add new functions. It wasn't acceptable to kill strcat in 1986, and it obviously won't fly today. But add new functions that aren't ridiculously sub-optimal (as strlcpy/strlcat obviously are).

> C won't exist at that time except as a very niche market with a very
> small number of programmers on very focused devices. Compiler
> technology will have advanced to a state where the disadvantages of
> the C libraries and its syntax outweigh their advantages. C's main
> advantage is small footprint and attendant speed but these are rapidly
> being supplanted by safe and efficient class libraries. C will be
> superceded by languages that can handle string concatenation as easily
> as c = a + b.

I don't buy that. Betting on compilers to improve so much is the same mistake HP & Intel made with Itanium. The fact is that CPUs aren't getting faster (in fact they're going to get slower: https://thestack.com/iot/2016/02/05/intel-william-holt-moores-law-slower-energy-efficient-chips/ ), therefore the only way forward is to make our software more efficient. Every other language (including C++) has a measurable overhead cost that will only become more unacceptable as software complexity increases.

Yeah, I think we're already there. And BASIC is perfectly acceptable, if all you want out of it is a vehicle for introducing people to the basics of programming. None of those systems are suitable for production code though, not in a brave new world where CPU cycles aren't infinitely fast and free.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/

supe...@casperkitty.com

unread,
Feb 10, 2016, 3:12:59 PM2/10/16
to
On Wednesday, February 10, 2016 at 11:04:53 AM UTC-6, Randy Howard wrote:
> It's not something that is difficult to implement portably (in any
> flavor yet mentioned). Last I checked, rolling your own tiny function is
> not quite yet illegal. ;-)

Many platforms have some specialized instructions or other features that
would enable string operations to be implemented as compiler intrinsics
that are much more efficient than anything which could be accomplished in
C code, especially if the code in question needs to be portable. For
example, some processors have an instruction which, given a 32-bit word,
will report whether any of the four 8-bit bytes within that word is zero.
A strcpy or similar operation which is optimized for such a processor could
very likely run 2-4 times as fast as anything that could be written in
portable C.

Charles Richmond

unread,
Feb 10, 2016, 8:02:32 PM2/10/16
to
<highl...@gmail.com> wrote in message
news:2109b5bd-ee69-4a65...@googlegroups.com...
char *new_strcat(s1,s2)
{
strcat(s1,s2);

return s1 + strlen(s1);
}

--

numerist at aquaporin4 dot com


highl...@gmail.com

unread,
Feb 10, 2016, 8:41:42 PM2/10/16
to
You're trolling, right? I almost fell for it.

highl...@gmail.com

unread,
Feb 10, 2016, 8:42:58 PM2/10/16
to
On Wednesday, February 10, 2016 at 5:23:08 PM UTC, Siri Cruz wrote:
> In article <michpc$2b0$1...@dont-email.me>, David Brown <david...@hesbynett.no>
> wrote:
>
> > Does anyone know why the return value for strcat and similar library
> > functions is defined in such an unhelpful (IMHO) way?
>
> #define strdup(s) (strcpy(malloc(strlen(s)+1), s))

Brilliant. I hope you're fond of SEGVs, which is what your suggestion will most likely produce if malloc fails. A proper strdup() would return NULL in that case.

James Kuyper

unread,
Feb 10, 2016, 9:09:55 PM2/10/16
to
You left out the types for s1 and s2. My first thought was that you were
writing an obsolescent K&R-style function definition, but that just
changes the location where the types must be declared, and there's
nothing in either location.

> {
> strcat(s1,s2);
>
> return s1 + strlen(s1);
> }

That version unnecessarily traverses the entire length of the output
string twice. I would prefer a single pass algorithm:

char *str_cat(
char * restrict s1,
const char * restrict s2)
{
while(*s1)
s1++;

while((*s1++ = *s2++));

return s1-1;
}

--
James Kuyper

Siri Cruz

unread,
Feb 11, 2016, 12:31:25 AM2/11/16
to
In article <cbe4f640-883a-4848...@googlegroups.com>,
My VM is 2^64 bytes. When swap space is full, the kernel grabs more disk blocks
for swap until the disk, with about 40 GB free, is completely full. Only then
does malloc fail.

If malloc fails, I could catch the SIGSEGV and deal with impossible events off
to the side somewhere instead of devoting half the code to recover from
impossible situations. Or since my response to a malloc failure is to print a
message (write(2, "memory dead\n", strlen("memory dead\n")) and exit, why not
just let the default signal hander do that.

In other words, bugger off. Just because you have to run on 6502s doesn't mean
rest of the world hasn't moved on.

David Brown

unread,
Feb 11, 2016, 2:55:13 AM2/11/16
to
A compiler could legally optimise this into the version James gave. I
believe I have seen gcc doing that sort of thing, but I can't remember
the details and could not replicate it with a brief test. But certainly
the compiler is allowed to see something like "strcat" followed by
"strlen" and realise that a single combined function (either
compiler-provided function call, or inlined code) is more efficient.


James Kuyper

unread,
Feb 11, 2016, 11:05:41 AM2/11/16
to
On 02/11/2016 02:55 AM, David Brown wrote:
> On 11/02/16 02:41, highl...@gmail.com wrote:
>> On Thursday, February 11, 2016 at 1:02:32 AM UTC, Charles Richmond wrote:
...
>>> char *new_strcat(s1,s2)
>>> {
>>> strcat(s1,s2);
>>>
>>> return s1 + strlen(s1);
>>> }
>>
>> You're trolling, right? I almost fell for it.
>>
>
> A compiler could legally optimise this into the version James gave. I
> believe I have seen gcc doing that sort of thing, but I can't remember
> the details and could not replicate it with a brief test.

That's more likely to happen if the implementations of strcat() and
strlen() are in the same translation unit as new_strcat(), and it would
be even more likely if both are declared inline.
Still, my version is no more complex, and doesn't rely on compiler
optimization.
--
James Kuyper

supe...@casperkitty.com

unread,
Feb 11, 2016, 11:52:00 AM2/11/16
to
On Thursday, February 11, 2016 at 10:05:41 AM UTC-6, James Kuyper wrote:
> That's more likely to happen if the implementation of strcat() and
> strlen() are in the same translation unit as new_strcat(), and it would
> be even more likely if both are declared inline.
> Still, my version is no more complex, and doesn't rely on compiler
> optimization.

Either version could end up being more efficient. On platforms which
process strcpy and strlen a byte at a time, handwritten code which performs
the operation "manually" and then returns the end pointer could easily be
more efficient than doing a separate strcpy/strlen operation. On the other
hand, some platforms include instructions that can process strcpy and/or
strlen a word at a time, and the built-in strcpy/strlen functions take
advantage of those instructions. Even if a compiler doesn't optimize the
combination, the combined cost of strcpy+strlen might still come out cheaper
than the cost of the combined-purpose loop. And if the compiler does
optimize the combination it could come out cheaper yet.

My personal inclination would be to use strlen+memcpy; actually, strlen is
just about the only one of the str* functions which I would consider well
designed; it's too bad there's no clean way to create string literals with
a variable-length prefix except by creating named identifiers. It's possible
to write a macro which, given:

SHORT_STRING(hello, "Hello");
MEDIUM_STRING(longer, "Pretend this string is 1000 bytes long");

would yield

struct { unsigned char len; char dat[5];} hello[1] = {{5,"Hello"}};
struct { unsigned char len,len2; char dat[5];} longer[1] =
{{0x83,0xE8, "Pretend..."}};

but there's no way to create such things with static duration within an
expression. The lack of such ability is IMHO the biggest factor that is
compelling the continued use of zero-terminated strings.

Keith Thompson

unread,
Feb 11, 2016, 12:03:10 PM2/11/16
to
highl...@gmail.com writes:
[...]
> Sure, let's add new functions. It wasn't acceptable to kill strcat in
> 1986, it obviously won't fly today. But add new functions that aren't
> ridiculously sub-optimal (as strlcpy/strlcat obviously is).
[...]

How are strlcpy and strcat "ridiculously sub-optimal"?

highl...@gmail.com

unread,
Feb 11, 2016, 12:23:36 PM2/11/16
to
On Thursday, February 11, 2016 at 5:03:10 PM UTC, Keith Thompson wrote:
> highl...@gmail.com writes:
> [...]
> > Sure, let's add new functions. It wasn't acceptable to kill strcat in
> > 1986, it obviously won't fly today. But add new functions that aren't
> > ridiculously sub-optimal (as strlcpy/strlcat obviously is).
> [...]
>
> How are strlcpy and strcat "ridiculously sub-optimal"?

Full context was already linked in my previous post.
https://symas.com/the-sad-state-of-c-strings/

Malcolm McLean

unread,
Feb 11, 2016, 1:39:51 PM2/11/16
to
On Thursday, February 11, 2016 at 5:03:10 PM UTC, Keith Thompson wrote:
>
> How are strlcpy and strcat "ridiculously sub-optimal"?
>
If you build up a string through repeated calls to strcat, as is
natural, you scan through the string from start to end on each
call. So pretty everyone with more than a month's experience
of C has thought "could we not save the length somewhere".

Actually, most programs don't spend much time on that sort of
string manipulation. And if the strings are long enough for the
optimisation to be worthwhile, they'll almost certainly be in
dynamic memory, so you won't use strcat anyway.

James Kuyper

unread,
Feb 11, 2016, 1:43:27 PM2/11/16
to
When I try to use that link, a new Firefox window pops up and a regular
sequence of symbols displays near the center top of the screen:
1. A small gray rectangle, taller than it is wide, with a darker
rectangle the same shape but slightly taller to the right.
2. A dark rectangle the same shape and size as in the first symbol, but
appearing to the right of it's original position.
3. An isosceles triangle, with the longer side vertical on the left,
about the same size and position as the left edge of the dark rectangle
it replaced. The triangle is about twice as wide as the rectangle was.
4. Blank screen.

If that's an explanation of "the sad state of C strings", it's a little
too obscure for me to understand. :-) It looks like a "Loading"
indicator, but it's not Firefox's own usual loading indicator. That's
all I see after letting it run for a half hour.
I note that that's an https:// link - that usually means that a password
is required. If the page ever finishes loading, will a password be needed?

While I'm waiting for that page to load, could you summarized the
discussion on that page? If you do, I'd guess that your summary will
appear before that page finishes loading.

--
James Kuyper

Ian Collins

unread,
Feb 11, 2016, 2:15:33 PM2/11/16
to
Malcolm McLean wrote:
>
> Actually, most programs don't spend much time on that sort of
> string manipulation. And if the strings are long enough for the
> optimisation to be worthwhile, they'll almost certainly be in
> dynamic memory, so you won't use strcat anyway.

You really should stop posting such sweeping generalisations. How do
you know what most programs do?

The last couple of stand alone applications I've written (generating
makefiles from visual studio projects and tracing changes to files and
directories) spent most of their time manipulating and searching
strings. In the latter example, lots of strings!

--
Ian Collins

supe...@casperkitty.com

unread,
Feb 11, 2016, 3:37:02 PM2/11/16
to
On Thursday, February 11, 2016 at 11:03:10 AM UTC-6, Keith Thompson wrote:
> How are strlcpy and strcat "ridiculously sub-optimal"?

The strlcpy function is specified as returning the length of the source string,
and strlcat is specified as returning the combined length of the source string
and destination strings. Computation of the return value thus requires that
they scan the source argument until they find a zero byte, rather than
stopping when they've processed enough data to fill the destination buffer or
perhaps enough to ascertain that the string is too big to fit.

Keith Thompson

unread,
Feb 11, 2016, 4:15:09 PM2/11/16
to
James Kuyper <james...@verizon.net> writes:
> On 02/11/2016 12:22 PM, highl...@gmail.com wrote:
>> On Thursday, February 11, 2016 at 5:03:10 PM UTC, Keith Thompson wrote:
[...]
>>> How are strlcpy and strcat "ridiculously sub-optimal"?
>>
>> Full context was already linked in the my previous post.
>> https://symas.com/the-sad-state-of-c-strings/
>
> When I try to use that link, a new Firefox window pops up and a regular
> sequence of symbols displays near the center top of the screen:
[snip]

That's odd. It displays fine for me in both Firefox and Chrome, as well
as in lynx.

[...]

> I note that that's an https:// link - that usually means that a password
> is required. If the page ever finishes loading, will a password be needed?

No, https:// doesn't usually mean a password is required. It merely
means that the content is encrypted in transit. In particular, this
page doesn't require a password.

[...]

Keith Thompson

unread,
Feb 11, 2016, 4:16:47 PM2/11/16
to
Malcolm McLean <malcolm...@btinternet.com> writes:
> On Thursday, February 11, 2016 at 5:03:10 PM UTC, Keith Thompson wrote:
>> How are strlcpy and strcat "ridiculously sub-optimal"?
>>
> If you build up a string through repeated calls to strcat, as is
> natural, you scan through the string from start to end on each
> call. So pretty everyone with more than a month's experience
> of C has thought "could we not save the length somewhere".

I meant to write "strlcpy and strlcat" rather than "strlcpy and strcat".
strlcat, unlike strcat, does return the length of the string it tried to
create.

[...]

David Brown

unread,
Feb 11, 2016, 4:37:10 PM2/11/16
to
On 11/02/16 17:05, James Kuyper wrote:
> On 02/11/2016 02:55 AM, David Brown wrote:
>> On 11/02/16 02:41, highl...@gmail.com wrote:
>>> On Thursday, February 11, 2016 at 1:02:32 AM UTC, Charles Richmond wrote:
> ...
>>>> char *new_strcat(s1,s2)
>>>> {
>>>> strcat(s1,s2);
>>>>
>>>> return s1 + strlen(s1);
>>>> }
>>>
>>> You're trolling, right? I almost fell for it.
>>>
>>
>> A compiler could legally optimise this into the version James gave. I
>> believe I have seen gcc doing that sort of thing, but I can't remember
>> the details and could not replicate it with a brief test.
>
> That's more likely to happen if the implementation of strcat() and
> strlen() are in the same translation unit as new_strcat(), and it would
> be even more likely if both are declared inline.

It is perhaps more likely, but it is not necessary. For
standards-defined functions like strcat and strlen, the compiler knows
what they do - it does not need an implementation on hand, because it
can use its own internal implementation. (In the same way, memcpy calls
can be changed into inline copies - in some cases, the call could turn
into register moves or be removed entirely.) gcc has a
"-foptimize-strlen" flag, enabled at -O2, to control strlen optimisation
- though I suspect it is new to the very latest version of the compiler.

> Still, my version is no more complex, and doesn't rely on compiler
> optimization.
>

True - but it requires writing the function rather than letting the
compiler do all the work.

James Kuyper

unread,
Feb 11, 2016, 4:51:22 PM2/11/16
to
On 02/11/2016 04:14 PM, Keith Thompson wrote:
> James Kuyper <james...@verizon.net> writes:
>> On 02/11/2016 12:22 PM, highl...@gmail.com wrote:
...
>>> Full context was already linked in the my previous post.
>>> https://symas.com/the-sad-state-of-c-strings/
>>
>> When I try to use that link, a new Firefox window pops up and a regular
>> sequence of symbols displays near the center top of the screen:
> [snip]
>
> That's odd. It displays fine for me in both Firefox and Chrome, as well
> as in lynx.

I agree that it's odd. I've no idea how to investigate the problem.
Suggestions appreciated.

>> I note that that's an https:// link - that usually means that a password
>> is required. If the page ever finishes loading, will a password be needed?
>
> No, https:// doesn't usually mean a password is required. It merely
> means that the content is encrypted in transit. In particular, this
> page doesn't require a password.

I know that https:// merely specifies encryption - but it has been my
experience that most sites that use https:// also require some kind of
authentication, usually a password (except when it's a credit card
number, for commercial web sites). YMMV.
--
James Kuyper

supe...@casperkitty.com

unread,
Feb 11, 2016, 4:55:57 PM2/11/16
to
On Thursday, February 11, 2016 at 3:15:09 PM UTC-6, Keith Thompson wrote:
> No, https:// doesn't usually mean a password is required. It merely
> means that the content is encrypted in transit. In particular, this
> page doesn't require a password.

Many kinds of gateways, such as coffee-shop WiFi servers, will attempt to
redirect traffic in certain cases. For example, some will redirect page
requests from a user to a "terms-of-service" page and then redirect to the
original requested page when the user agrees to the terms. They may also
redirect attempts to access certain objectionable web sites to an "Access
forbidden" page. Such redirects are not allowed with https:// accesses,
however, since they would allow a crook on e.g. a public WiFi connection
to watch for attempts to log into https://www.paypal.com and respond to
them with phony redirects to https://www.paypa1.com.

James Kuyper

unread,
Feb 11, 2016, 4:56:19 PM2/11/16
to
On 02/11/2016 04:36 PM, David Brown wrote:
> On 11/02/16 17:05, James Kuyper wrote:
>> On 02/11/2016 02:55 AM, David Brown wrote:
>>> On 11/02/16 02:41, highl...@gmail.com wrote:
>>>> On Thursday, February 11, 2016 at 1:02:32 AM UTC, Charles Richmond wrote:
>> ...
>>>>> char *new_strcat(s1,s2)
>>>>> {
>>>>> strcat(s1,s2);
>>>>>
>>>>> return s1 + strlen(s1);
>>>>> }
>>>>
>>>> You're trolling, right? I almost fell for it.
>>>>
>>>
>>> A compiler could legally optimise this into the version James gave. I
>>> believe I have seen gcc doing that sort of thing, but I can't remember
>>> the details and could not replicate it with a brief test.
>>
>> That's more likely to happen if the implementation of strcat() and
>> strlen() are in the same translation unit as new_strcat(), and it would
>> be even more likely if both are declared inline.
>
> It is perhaps more likely, but it is not necessary. For
> standards-defined functions like strcat and strlen, the compiler knows
> what they do - it does not need an implementation on hand, because it
> can use its own internal implementation. (In the same way, memcpy calls

Agreed - it's precisely for those reasons that I wrote "more likely"
rather than "necessary".

> can be changed into inline copies - in some cases, the call could turn
> into register moves or be removed entirely.) gcc has a
> "-foptimize-strlen" flag, enabled at -O2, to control strlen optimisation
> - though I suspect it is new to the very latest version of the compiler.
>
>> Still, my version is no more complex, and doesn't rely on compiler
>> optimization.
>>
>
> True - but it requires writing the function rather than letting the
> compiler do all the work.

Writing the function my way was no harder for me than it would have been
to write it his way. In actual code, I will usually manually inline such
code rather than put it in a separate function.
--
James Kuyper

Keith Thompson

unread,
Feb 11, 2016, 5:01:29 PM2/11/16
to
Which is an issue only when the target is too small (as specified by the
"size" parameter) to hold the result. I presume that's not the most
common case, and it's hardly reason enough to say that they're
"ridiculously sub-optimal".

Charles Richmond

unread,
Feb 11, 2016, 7:00:45 PM2/11/16
to
"James Kuyper" <james...@verizon.net> wrote in message
news:n9ibal$f0p$1...@dont-email.me...
Yeah, I know... I did *not* declare "s1" and "s2" as parameters of the
function. It's classic "me"... I get in a hurry and forget little details.
:-) Don't worry... the C compiler will whack me across the knuckles until I
fix it!!!

Geoff

unread,
Feb 11, 2016, 8:23:28 PM2/11/16
to
https:// means a secure channel will be established. The channel is
encrypted with a session key exchanged via PKI between client and
server, long before any user password would come into play - and this
site doesn't require one. This would seem to be a problem with Firefox.

James Kuyper

unread,
Feb 11, 2016, 11:15:17 PM2/11/16
to
It wasn't a problem for Keith when he used Firefox to visit that site.
It hasn't been a problem for me when I've visited many other https://
sites. Are you aware of any special characteristic of this particular
site that might be relevant?
I often run into problems because I require that Firefox get my
permission before letting web sites set up cookies. I don't understand
why - anything problematic should cause a window to pop up asking for my
permission, and if I say "Yes", it should run just the same as if
permission had been given automatically - but as a simple matter of
observed fact, that policy frequently does cause web sites to silently
malfunction. Could something like that be relevant here?
--
James Kuyper

Paul

unread,
Feb 12, 2016, 12:10:45 AM2/12/16
to
This is the protocol set symas.com uses.

https://www.ssllabs.com/ssltest/analyze.html?d=symas.com

Configuration

Protocols
TLS 1.2 Yes
TLS 1.1 Yes
TLS 1.0 Yes
SSL 3 No
SSL 2 No

It doesn't fall back to insecure protocols. Generally,
only the most recent protocol is actually trusted.
TLS 1.0 is being offered in this case, to cover
older browsers, not because it's a good idea.

You can test your web browser, by visiting this page.
If your web browser has no matching method, then
you'll get some sort of error or warning.

https://www.ssllabs.com/ssltest/viewMyClient.html

For example, this is Firefox 3. In this example,
SSL3 was disabled on purpose, so it could not be used.
By using TLS 1.0, I can interact with the symas.com web page.
TLS 1.0 is the highest protocol version this browser
example supports.

Protocol Features
Protocols
TLS 1.2 No
TLS 1.1 No
TLS 1.0 Yes
SSL 3 No
SSL 2 No

*******

A second browser roadblock is here.

"completely distrust connections to sites using SHA-1 signed certs"

https://blog.cloudflare.com/sha-1-deprecation-no-browser-left-behind/

"Symantec SHA256 SSL Page

If your browser is able to display this page,
then your browser supports the SHA-256 algorithm."

https://ssltest39.ssl.symclab.com/

So those are some tests you can carry out. I thought the
Firefox 3 browser was broken for SHA-2, but apparently the
Symantec test page for SHA-2 (SHA256) is working.

The viewmyClient page above, also gives information on other
vulnerabilities. Not just the ones that prevent web pages
from appearing for you.

Web sites can also refuse to work for more trivial reasons:

1) Javascript disabled. This is used by advertising materials
embedded in the web page, tracking by Facebook or Google
and so on. Some pages "erase" the visual rendering on the
page, when they realize they're not getting their own way.
Perhaps NoScript caused your page to not render.

2) SuperCookie/Evercookie capability. Many sites complain you're not letting
them set a cookie, when your actual cookie setting is perfectly
open. What they're really complaining about, is your HTML5 DOM
storage isn't available, so they can store information in places
you might not normally clean out. So this is more of a
"you wouldn't let me abuse you" warning dialog, rather than
being technically accurate. For more info on alternative cookie
storage techniques, including test cases, try here. If they used
the original cookie storage method, you would only erase it, which
reduces its value as a tracking mechanism. By storing a cookie in
nooks and crannies, that makes the cookie more persistent.

http://samy.pl/evercookie/

And if you look in your Firefox SQLite databases, and you
use GMail, you may see a Google link with a unique number on
the end. Which I presume is part of some sort of cookie-like
mechanism. Helping Google to figure out if you have more
than one Gmail account perhaps. You can never keep a browser
too clean. Software is available for cleaning, such as
CCleaner or BleachBit. In the case of BleachBit, I prefer
to merely understand what each script is doing rather than
actually use it, as BleachBit should have methods you can
read about for the various abusive tracking mechanisms.

HTH,
Paul

Malcolm McLean

unread,
Feb 12, 2016, 4:16:09 AM2/12/16
to
On Thursday, February 11, 2016 at 5:31:25 AM UTC, Siri Cruz wrote:
>
> My VM is 2^64 bytes. When swap space is full, the kernel grabs more
> disk blocks for swap until the disk, with about 40 GB free, is
> completely full. Only then does malloc fail.
>
Yes, it's often more likely that the computer will break than that
it will fail to honour a request for a small amount of memory.

Richard Heathfield

unread,
Feb 12, 2016, 5:04:31 AM2/12/16
to
The problem with thinking that way is that, nowadays, a computer has to
run a great many programs at the same time.

Right now, my laptop isn't terribly busy. It's /only/ running 209 tasks.
When I'm hammering it (which I sometimes do), that number can climb a
lot higher. If every single program is written in line with the
philosophy that you needn't check for malloc, sooner or later things are
going to start falling over. And they /do/.

Some time ago, I tried telling Linux never to over-commit memory (i.e.
to return NULL from malloc, so that applications could fail gracefully
instead of crashing). It *should* have helped, but it didn't, because
nowadays applications don't know /how/ to fail gracefully. It's sloppy
programming.

--
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within

Siri Cruz

unread,
Feb 12, 2016, 5:19:49 AM2/12/16
to
In article <e0bcdf54-cda6-4cc3...@googlegroups.com>,
Actually it does a kernel panic before malloc fails.

James Kuyper

unread,
Feb 12, 2016, 2:13:42 PM2/12/16
to
On 02/12/2016 12:10 AM, Paul wrote:
> James Kuyper wrote:
>> On 02/11/2016 04:14 PM, Keith Thompson wrote:
>>> James Kuyper <james...@verizon.net> writes:
>>>> On 02/11/2016 12:22 PM, highl...@gmail.com wrote:
>> ...
>>>>> Full context was already linked in the my previous post.
>>>>> https://symas.com/the-sad-state-of-c-strings/
>>>> When I try to use that link, a new Firefox window pops up and a regular
>>>> sequence of symbols displays near the center top of the screen:
>>> [snip]
>>>
>>> That's odd. It displays fine for me in both Firefox and Chrome, as well
>>> as in lynx.
>>
>> I agree that it's odd. I've no idea how to investigate the problem.
>> Suggestions appreciated.
...
> This is the protocol set symas.com uses.
>
> https://www.ssllabs.com/ssltest/analyze.html?d=symas.com
>
> Configuration
>
> Protocols
> TLS 1.2 Yes
> TLS 1.1 Yes
> TLS 1.0 Yes
> SSL 3 No
> SSL 2 No
>
> It doesn't fall back to insecure protocols. Generally,
> only the most recent protocol is actually trusted.
> TLS 1.0 is being offered in this case, to cover
> older browsers, not because it's a good idea.
>
> You can test your web browser, by visiting this page.
> If your web browser has no matching method, then
> you'll get some sort of error or warning.
>
> https://www.ssllabs.com/ssltest/viewMyClient.html

My browser shows the same configuration as you showed above. The only
clearly negative issues are:

TLS compression No
SSL 2 handshake compatibility No

I'm not sure how to interpret the "Mixed Content Handling" results. For
Images it shows "Passive Yes", for all other tests it shows "Active No".
Upgrade Insecure Requests is also "No".

> For example, this is Firefox 3. In this example,
> SSL3 was disabled on purpose, so it could not be used.
> By using TLS 1.0, I can interact with the symas.com web page.
> TLS 1.0 is the highest protocol version this browser
> example supports.
>
> Protocol Features
> Protocols
> TLS 1.2 No
> TLS 1.1 No
> TLS 1.0 Yes
> SSL 3 No
> SSL 2 No
>
> *******
>
> A second browser roadblock is here.
>
> "completely distrust connections to sites using SHA-1 signed certs"
>
> https://blog.cloudflare.com/sha-1-deprecation-no-browser-left-behind/
>
> "Symantec SHA256 SSL Page
>
> If your browser is able to display this page,
> then your browser supports the SHA-256 algorithm."
>
> https://ssltest39.ssl.symclab.com/

It was able to display that page.

> So those are some tests you can carry out. I thought the
> Firefox 3 browser was broken for SHA-2, but apparently the
> Symantec test page for SHA-2 (SHA256) is working.
>
> The viewmyClient page above, also gives information on other
> vulnerabilities. Not just the ones that prevent web pages
> from appearing for you.
>
> Web sites can also refuse to work for more trivial reasons:
>
> 1) Javascript disabled. This is used by advertising materials
> embedded in the web page, tracking by Facebook or Google
> and so on. Some pages "erase" the visual rendering on the
> page, when they realize they're not getting their own way.
> Perhaps NoScript caused your page to not render.
>
> 2) SuperCookie/Evercookie capability. Many sites complain you're not letting
> them set a cookie, when your actual cookie setting is perfectly
> open. What they're really complaining about, is your HTML5 DOM
> storage isn't available, so they can store information in places
> you might not normally clean out. So this is more of an
> "you wouldn't let me abuse you" warning dialog, rather than
> being technically accurate. For more info on alternative cookie
> storage techniques, including test cases, try here. If they used
> the original cookie storage method, you would only erase it, which
> reduces its value as a tracking mechanism. By storing a cookie in
> nooks and crannies, that makes the cookie more persistent.
>
> http://samy.pl/evercookie/

That's a rather annoying concept. I'd like to know how I can find and
remove every evercookie currently stored on my system.

Richard Bos

unread,
Feb 14, 2016, 9:03:14 AM2/14/16
to
No longer true - even Pikiwedia requires https:// now, FFS! For editing,
sure, that'd be perfectly reasonable, indeed laudable. But for reading?
Mere silliness and security posturing.

Richard

Richard Bos

unread,
Feb 14, 2016, 9:05:44 AM2/14/16
to
Siri Cruz <chine...@yahoo.com> wrote:

> In other words, bugger off. Just because you have to run on 6502s doesn't mean
> rest of the world hasn't moved on.

Stop dicksizing. Just because you have the money for a latest-generation
computer and don't mind if your programs randomly crash on an overcommit
rather than bail out gracefully doesn't mean the rest of the world can
afford to be that slap-dash.

Richard

Siri Cruz

unread,
Feb 14, 2016, 9:59:49 AM2/14/16
to
In article <56c08948...@news.xs4all.nl>, ral...@xs4all.nl (Richard Bos)
wrote:
There is no bailout on a kernel panic. It seems brk and mmap, and
therefore malloc, don't fail on MacOSX. It would require running a
second process polling and deciphering ps to programmatically detect a
failing program ere panic.

Is that what you call graceful programming?

Malcolm McLean

unread,
Feb 14, 2016, 10:20:39 AM2/14/16
to
On Friday, February 12, 2016 at 10:04:31 AM UTC, Richard Heathfield wrote:
> On 12/02/16 09:15, Malcolm McLean wrote:
>
> > Yes, it's often more likely that the computer will break than that
> > it will fail to honour a request for a small amount of memory.
>
> The problem with thinking that way is that, nowadays, a computer has to
> run a great many programs at the same time.
>
The real issue is that you can get stuck in a loop. When I make a
programming error on my Apple Mac, such that it's allocating a small
document node repeatedly in a non-terminating loop, it doesn't
actually run out of memory and report allocation failure. It runs
for a minute or so, and you think it has stuck forever. You kill it
and the debugger shows the state of the stack at kill time, which
will be in the non-terminated, allocating loop. Eventually I guess it
will run out of memory.

Richard Heathfield

unread,
Feb 14, 2016, 1:54:25 PM2/14/16
to
On 14/02/16 14:59, Siri Cruz wrote:
> In article <56c08948...@news.xs4all.nl>, ral...@xs4all.nl (Richard Bos)
> wrote:
>
>> Siri Cruz <chine...@yahoo.com> wrote:
>>
>>> In other words, bugger off. Just because you have to run on 6502s doesn't
>>> mean
>>> rest of the world hasn't moved on.
>>
>> Stop dicksizing. Just because you have the money for a latest-generation
>> computer and don't mind if your programs randomly crash on an overcommit
>> rather than bail out gracefully doesn't mean the rest of the world can
>> afford to be that slap-dash.
>
> There is no bailout on a kernel panic. It seems brk and mmap therefore malloc
> doesn't fail on MacOSX. It would require running a second process polling and
> decipherring ps to programmatically detect a failing program ere panic.
>
> Is that what you call graceful programming?

Not on the part of the MacOSX folks, no. They need to fix that.

Richard Heathfield

unread,
Feb 14, 2016, 1:59:36 PM2/14/16
to
On 14/02/16 15:20, Malcolm McLean wrote:
> On Friday, February 12, 2016 at 10:04:31 AM UTC, Richard Heathfield wrote:
>> On 12/02/16 09:15, Malcolm McLean wrote:
>>
>>> Yes, it's often more likely that the computer will break than that
>>> it will fail to honour a request for a small amount of memory.
>>
>> The problem with thinking that way is that, nowadays, a computer has to
>> run a great many programs at the same time.
>>
> The real issue is that you can get stuck in a loop.

Is it? I thought the real issue was that we ought to be writing the
programs properly in the first place.

> When I make a
> programming error on my Apple Mac, such that it's allocating a small
> document node repeatedly in a non-terminating loop, it doesn't
> actually run out of memory and report allocation failure.

If it hasn't actually run out of memory, that's not the OS's fault. But
if it's running out of memory but not reporting the fact, that /is/ the
OS's fault. Fortunately, Linux (and perhaps MacOSX?) has a way you can
fix that -- turn off overcommit. Unfortunately, it doesn't help, because
programs still crash at random.

Malcolm McLean

unread,
Feb 14, 2016, 2:17:44 PM2/14/16
to
On Sunday, February 14, 2016 at 6:59:36 PM UTC, Richard Heathfield wrote:
> On 14/02/16 15:20, Malcolm McLean wrote:
>
> > The real issue is that you can get stuck in a loop.
>
> Is it? I thought the real issue was that we ought to be writing the
> programs properly in the first place.
>
Normally a properly written, small program can't run out of memory.
It's about as likely as withdrawing some money from the cashpoint and
HSBC going bankrupt at that very moment, as a consequence.
However unlike a bank account, if we have a loop whereby you are
continually withdrawing money unnecessarily for the same purchase,
the OS will advance you the request, until it runs out. So that's how
you get out of memory conditions.
>
> > When I make a
> > programming error on my Apple Mac, such that it's allocating a small
> > document node repeatedly in a non-terminating loop, it doesn't
> > actually run out of memory and report allocation failure.
>
> If it hasn't actually run out of memory, that's not the OS's fault. But
> if it's running out of memory but not reporting the fact, that /is/ the
> OS's fault. Fortunately, Linux (and perhaps MacOSX?) has a way you can
> fix that -- turn off overcommit. Unfortunately, it doesn't help, because
> programs still crash at random.
>
It's not clear to me whether it's over-committing or if there's just
so much memory in the system that it never runs out. Maybe I'll do
some experiments.

supe...@casperkitty.com

unread,
Feb 14, 2016, 2:44:19 PM2/14/16
to
On Sunday, February 14, 2016 at 1:17:44 PM UTC-6, Malcolm McLean wrote:
> Normally a properly written, small program can't run out of memory.
> It's as likely as if you withdraw some money from the cashpoint, and
> HSBC goes bankrupt at that very moment and in consequence.
> However unlike a bank account, if we have a loop whereby you are
> continually withdrawing money unnecessarily for the same purchase,
> the OS will advance you the request, until it runs out. So that's how
> you get out of memory conditions.

I would think that a reasonably-designed OS should by default limit the
total amount of memory that could be used by an individual non-Kernel
process and its subprocesses to a level low enough to prevent disruption
of other processes except in cases where multiple entirely-independent
processes were requesting excessive allocations. There are many situations
were file-parsing code won't have a fixed upper bound on the amount of
memory it needs, but will instead require an amount of memory which is
determined by the content of the file. It would see cleaner to allow
the operator to configure how much memory a process is allowed to use, and
then be able to safely use the process to try to view untrusted content,
than to require that the file-parsing code be configured as to how much
memory it's allowed to use.

I really dislike the idea that there's no way a program can run out of
memory except when the entire system is out of memory. Indeed, I wish that
languages would provide a way for code to say "I need a memory pool which
will be able to supply at least 100MB of allocations", and "Start using pool
Y for future allocations on this thread, but give me something I can use
to switch back to whatever the thread was using". If a function won't have
any chance of working unless it can make 100MB worth of allocations, it would
be better to have the function find out that it's not going to work before
it even starts, than for it to gobble up all the memory that the program
would have available (leaving none for other things) before it hits an out-
of-memory limit.

fir

unread,
Feb 14, 2016, 2:56:00 PM2/14/16
to
Today's 'system', by which I mean the situation where the app is not hit with an out-of-memory error but the system instead swaps other apps to disk, is not so bad at all (maybe except for some details).

Can you (or someone ) invent something better?

fir

unread,
Feb 14, 2016, 3:03:16 PM2/14/16
to
They could probably compress pages when swapping; I'm not sure if they do that.

Malcolm McLean

unread,
Feb 14, 2016, 3:09:23 PM2/14/16
to
On Sunday, February 14, 2016 at 7:56:00 PM UTC, fir wrote:
>
> todays 'system', I mean the situation when app is not hit by out
> of memory communicate but system swaps other apps to disk is at
> all not so bad.. (maybe except some details)
>
> Can you (or someone ) invent something better?
>
Depends what you are doing.
Say you're running a scientific program that analyses whole genomes.
If it runs short of memory, you can leave it running overnight and
collect the results in the morning, and that's a lot better than
no results at all.
However if the program needs to be interactive or semi-interactive,
and it slows to a crawl, then it just becomes intolerable for the
user. Much better to abort that operation.

fir

unread,
Feb 14, 2016, 3:13:20 PM2/14/16
to
I read they do that in Win 10: "According to the Windows team, “In practice, compressed memory takes up about 40% of the uncompressed size, and as a result of a typical device running a typical workload, Windows 10 writes pages out to disk only 50% as often as previous versions of the OS.” If all goes according to plan, Windows users could be experiencing reduced waiting times for all devices as well as extended lifespans on systems that have flash-based hard drives."

Well that's good; maybe this is the reason people were saying Win 10 runs faster.

fir

unread,
Feb 14, 2016, 3:27:36 PM2/14/16
to
Unresponsiveness (I also hit this crawl mode sometimes, though not often) is not good, but I'm not sure whether a system-interface hang can be avoided even when a client app is stuck swapping. It probably can be avoided, but I don't remember the details of how, or why it would be hard to avoid. IF it can be avoided, then this argument doesn't apply, because you then get a choice to kill the app or wait (better than it being killed automatically). But I agree a system interface hang is sometimes still a problem (at least on XP, which I mostly use), though that is really a problem of OS code that may already have been fixed, I don't know:

1) in Windows, when you don't call the message pump for a few seconds, you get an app interface hang
2) sometimes (not often; I see it maybe once a month or two?) you get a whole Windows interface hang. I don't know the full reason; running out of system memory can probably sometimes cause it and sometimes not. This could mean the Windows interface should probably be coded differently and not rely on the swapping mechanism the way normal apps do; they probably do that out of laziness, and that's bad.

crankypuss

unread,
Feb 14, 2016, 3:50:42 PM2/14/16
to
Siri Cruz wrote:

> In article <56c08948...@news.xs4all.nl>, ral...@xs4all.nl
> (Richard Bos) wrote:
>
>> Siri Cruz <chine...@yahoo.com> wrote:
>>
>> > In other words, bugger off. Just because you have to run on 6502s
>> > doesn't mean
>> > rest of the world hasn't moved on.
>>
>> Stop dicksizing. Just because you have the money for a
>> latest-generation computer and don't mind if your programs randomly
>> crash on an overcommit rather than bail out gracefully doesn't mean
>> the rest of the world can afford to be that slap-dash.
>
> There is no bailout on a kernel panic. It seems brk and mmap therefore
> malloc doesn't fail on MacOSX. It would require running a second
> process polling and decipherring ps to programmatically detect a
> failing program ere panic.
>
> Is that what you call graceful programming?

Sorry to expose my ignorance by jumping in like this, but I've just
started following the ng and this subthread sounds like a "linux vs mac"
argument, is that the case? Who's on first?

--
http://totally-portable-software.blogspot.com
[Sun Nov 22: "Total Portability is not binary"]

crankypuss

unread,
Feb 14, 2016, 4:52:36 PM2/14/16
to
Please post the results. I think I have 8-GB main-store on this laptop.
I remember writing assembler code for a system with a 128-kb hard drive.
It would be interesting to know just how much storage is wasted these
days; if it isn't a lot, you'll get some real big numbers I think. I'm
amazed at some of the footprints people consider "reasonable" these
days. Actually I'm amazed at a lot of things these days, like how
mortally slow iceweasel is on an old 32bit laptop. It just reinforces
the fact that i/o device speeds are the main bottleneck imo. Running a
core-i7 I'm not seeing massive speed differences between it and the old
32bit machine, most of what's going on seems to be waiting on
peripherals. Except in whatever graphics library supports iceweasel, I
think it's gtk. Anyway please do post what you learn if you do those
experiments. thx.

fir

unread,
Feb 14, 2016, 6:02:29 PM2/14/16
to
In the old-school days people worked on somewhat compressed data at low color depth and resolution (I started on the C64), and the hardware was used very straightforwardly (so everything felt snappy).

Today you put in tons of bloated assets, so it's no great mystery that it consumes RAM and CPU/GPU time. Small apps in C can fortunately still work fast, but all the half-hidden layers (DLL loading, maybe relocation, some 'higher level' library layers, and layers upon layers) cost you and slow things down:

1. big data
2. layers *

* sometimes shoddy; not all 'vendors'/people optimize code very well

fir

unread,
Feb 14, 2016, 6:27:28 PM2/14/16
to
On Monday, 15 February 2016 at 00:02:29 UTC+1, fir wrote:

> Today you put tons of stupid assets
> so not much mystery it consumes ram
> and cpu/gpu time.. small aps in c fortunatelly still can work fast.. but all the half-hidden layers (like dll-loading,
> maybe relocation, some 'higher level' library layers, and layers and layers costs and slug thing down
>
> 1. big data
> 2. layers *
>
> * sometimes shitty, not all 'vendors'/people
> optimize code very well [yawn, god im sleepy]

I wonder what those evil half-hidden layers are in the case of my own
programming; I'm not sure:

1) the msvcrt dependency
2) crt0 and potential C++ compiler stubs
3) some people say graphics/OpenGL drivers are often evil
4) some parts of internal Win32? (which ones, and why?)
5) more?
6) more?

Sadly this is a bit too advanced a topic to get a valuable answer.

luser droog

unread,
Feb 14, 2016, 6:57:45 PM2/14/16
to
On Sunday, February 14, 2016 at 2:50:42 PM UTC-6, crankypuss wrote:
> Siri Cruz wrote:
>
> > There is no bailout on a kernel panic. It seems brk and mmap therefore
> > malloc doesn't fail on MacOSX. It would require running a second
> > process polling and decipherring ps to programmatically detect a
> > failing program ere panic.
> >
> > Is that what you call graceful programming?
>
> Sorry to expose my ignorance by jumping in like this, but I've just
> started following the ng and this subthread sounds like a "linux vs mac"
> argument, is that the case? Who's on first?
>

It's often hard to discern: humans (including me) get
upset about so many silly things. But I think the argument
is between "works on my system" and "follow the goddam
standard".

Often, we eventually tease out the useful info: what the
standard says, what that actually means, how do most systems
do it, what wiggle room did the weird ones find in the
standard to justify being so weird.

But we're not there yet in this thread. So add the onions
and let it simmer.

supe...@casperkitty.com

unread,
Feb 14, 2016, 7:19:23 PM2/14/16
to
On Sunday, February 14, 2016 at 5:57:45 PM UTC-6, luser droog wrote:
> It's often hard to discern: humans (including me) get
> upset about so many silly things. But I think the argument
> is between "works on my system" and "follow the goddam
> standard".

Unfortunately, I think there's a lack of clarity on what purpose the Standard
is meant to serve, and absent agreement on that there can be little hope
for agreement over what it should say or how it should be interpreted.

There are many features and guarantees which many implementations of C
could provide at essentially zero cost, but which some implenmentations of
C cannot. For many such features and guarantees, there will be some kinds
of programs which could benefit enormously from being able to use them and
others that cannot.

A decision to require a feature in the language means that the language will
be unusable on systems that can't support the feature. A decision to leave
a feature out of the Standard will mean that the standardized language will
not be anywhere optimal for the kinds of programs that would have benefited
from the features.

Unfortunately, because C99's attempts to create new "optional"
language features like VLAs out of thin air weren't very successful, many
people are opposed to the idea of optional features. In practice, I would
suggest that the language would never have become particularly popular had
it not been for the fact that compiler vendors acknowledged many of each
other's features as de-facto standards. Since today's compiler philosophy
suggests that compilers shouldn't try to ensure consistent behaviors in
cases not mandated by the Standard, the only way the language can survive
and remain useful will be if it acknowledges the common extensions which
made the language implemented by popular compilers vastly more suitable
for many purposes than the language defined by the committee.

highl...@gmail.com

unread,
Feb 14, 2016, 9:22:01 PM2/14/16
to
On Thursday, February 11, 2016 at 5:31:25 AM UTC, Siri Cruz wrote:
> In article <cbe4f640-883a-4848...@googlegroups.com>,
> highl...@gmail.com wrote:
>
> > On Wednesday, February 10, 2016 at 5:23:08 PM UTC, Siri Cruz wrote:
> > > In article <michpc$2b0$1...@dont-email.me>, David Brown
> > > <david...@hesbynett.no>
> > > wrote:
> > >
> > > > Does anyone know why the return value for strcat and similar library
> > > > functions is defined in such an unhelpful (IMHO) way?
> > >
> > > #define strdup(s) (strcpy(malloc(strlen(s)+1), s))
> >
> > Brilliant. I hope you're fond of SEGVs, which is what your suggestion will
> > most likely do if malloc fails. A proper strdup() would return NULL in that
> > case.
>
> My VM is 2^64 bytes. When swap space is full, the kernel grabs more disk blocks
> for swap until the disk, with about 40 GB free, is completely full. Only then
> does malloc fail.
>
> If malloc fails, I could catch the SIGSEGV and deal with impossible events off
> to the side somewhere instead of devoting half the code to recover from
> impossible situations. Or since my response to a malloc failure is to print a
> message (write(2, "memory dead\n", strlen("memory dead\n")) and exit, why not
> just let the default signal hander do that.
>
> In other words, bugger off. Just because you have to run on 6502s doesn't mean
> rest of the world hasn't moved on.

You, Siri, are a moron.

Correct code is correct regardless of its machine environment. Just because you aren't aware of trivial ways that your code can fail, and obviously haven't bothered to think about it, doesn't mean it won't. There's a thing called ulimit, perhaps you've heard of it? Or another thing called containers...

The fact is there are myriad reasons why your supposedly gargantuan virtual memory space will be exhausted in your application's lifetime, even without the rest of the OS panicking or crashing. You're an arrogant, ignorant ass. I pity your users and your employers.

Kenny McCormack

unread,
Feb 14, 2016, 10:15:11 PM2/14/16
to
In article <d2419ea9-6c18-4fe8...@googlegroups.com>,
<highl...@gmail.com> blubbered:
...
>You, Siri, are a moron.
>
>Correct code is correct regardless of its machine environment. Just because you
>aren't aware of trivial ways that your code can fail, and obviously haven't
>bothered to think about it, doesn't mean it won't. There's a thing called ulimit,
>perhaps you've heard of it? Or another thing called containers...
>
>The fact is there are myriad reasons why your supposedly gargantuan virtual
>memory space will be exhausted in your application's lifetime, even without the
>rest of the OS panicking or crashing. You're an arrogant, ignorant ass. I pity
>your users and your employers.

Somebody didn't get enough attention as a child...

--
Amazingly, it's beginning to look like W is the smart one.

But Jeb will still be the nominee (seriously, who else can/could it be?)

Siri Cruz

unread,
Feb 14, 2016, 10:20:39 PM2/14/16
to
In article <d2419ea9-6c18-4fe8...@googlegroups.com>,
highl...@gmail.com wrote:

> Correct code is correct regardless of its machine environment. Just because
> you aren't aware of

My code is appropriate, idiot. If malloc were to ever fail, I don't have a
recovery except to print an error and then correct the program so it no longer
fails. It really doesn't matter how it fails when a customer gets it, a failure
is a failure.

Perhaps you should worry more about palming off bad code to your customers so they
can do your testing for you. You can also do graceful beginnings. Rather than
start your programme assuming the environment is what it should be, verify and
correct it or report where it makes sense: tell your customer they violated the
contract before wasting resources. Give diagnostics that are sensible and
actionable to the customer.

If you're worrying about heap failure, don't do a 'graceful recovery' that
doesn't mean jack shit to your customer. As part of the contract validation,
overestimate how much memory (did you learn about O(...) space computation?) and
underestimate what is available. State up front you probably can't fulfill the
request and point to the contract clause and/or propose remediation that makes
sense to the customer.
INSUFFICIENT CORE. USE THESE COMMANDS.
RFL(70000)
REDUCE,-.

Also do remember that loc generators can overrun just like heap generators.
However since loc allocation is so often implicit, you don't get a 'graceful
exit', you get a stack overflow which is either on faulting an unmappable page
or watching the stack overwrite the heap and descending into chaos. The only
correct way to deal with loc failure is to ensure you have a big enough stack
for any possible recursion and local arrays.

> trivial ways that your code can fail, and obviously haven't bothered to think
> about it, doesn't mean it won't. There's a thing called ulimit, perhaps

rlimit isn't fully enforced on MacOSX.

> you've heard of it? Or another thing called containers...
>
> The fact is there are myriad reasons why your supposedly gargantuan virtual
> memory space will be exhausted in your application's lifetime, even without
> the rest of the OS panicking or crashing. You're an arrogant, ignorant ass. I
> pity your users and your employers.

Actually there's no reason for a program to exhaust memory. If it does, it's
broken. Fix it so it doesn't break. Since you're expected to find your bugs
before giving it to customers, it really doesn't matter to your customer
what diagnostics you send to yourself.

srsly dude, there's been no reason for a divide check since atan2 came along.

Ian Collins

unread,
Feb 14, 2016, 10:45:16 PM2/14/16
to
Siri Cruz wrote:
>
> Actually there's no reason for a program to exhaust memory.

Unless it's called java or firefox....

--
Ian Collins